CN116949014A

CN116949014A - UdgX-SSBE3 protein and method for capturing specific nucleic acid by using same

Info

Publication number: CN116949014A
Application number: CN202310860511.XA
Authority: CN
Inventors: 王猛; 潘文嘉
Original assignee: Tianjin Institute of Industrial Biotechnology of CAS
Current assignee: Tianjin Institute of Industrial Biotechnology of CAS
Priority date: 2023-07-13
Filing date: 2023-07-13
Publication date: 2023-10-27

Abstract

The present invention relates to a fusion protein of specific nucleic acid binding proteins and to a method of capturing specific nucleic acids. The fusion protein is formed by fusing a cytosine base editor and a uracil binding protein. The method captures DNA containing a specific target sequence (containing a PAM and a cytosine C at a specific position) from a sample and comprises the following steps: 1) contacting a sample containing the target DNA with the UdgX-SSBE3 protein to capture the target DNA, and 2) detecting the target DNA. The invention also provides the use of the base editing-uracil binding protein to capture and/or detect DNA containing a specific target sequence from a sample. The method can capture target DNA from complex nucleic acid samples with high sensitivity and specificity, and has practical application value.

Description

UdgX-SSBE3 protein and method for capturing specific nucleic acid by using same

Technical Field

The present disclosure relates to methods of capturing target nucleic acids. In particular, the present disclosure relates to methods and uses for capturing target nucleic acids using the UdgX-SSBE3 protein.

Background

Many methods and applications exist for capturing target nucleic acids from polynucleotide samples (e.g., from a whole genome). In complex DNA samples, background DNA lowers the signal-to-noise ratio and hinders specific recognition of the sequence of interest, so the sequence of interest must first be captured. In research work, specific regions have been captured using gene capture probes: a set of probe sequences of a given length is designed for the region, each probe shifted by a fixed distance along the gene, the probes are produced in large quantities by artificial synthesis, and the captured DNA products are sequenced (Calvo SE, Compton AG, Hershman SG, et al. Molecular diagnosis of infantile mitochondrial disease with targeted next-generation sequencing. Sci Transl Med. 2012;4(118):118ra10). Current methods for capturing specific nucleic acids involve multiple steps, require large amounts of nucleic acid sample and expensive instrumentation, and are cumbersome, time-consuming, labor-intensive, inaccurate and costly. For example, targeted DNA hybrid capture requires the design of specific single-stranded DNA probes more than 100 bp long and a complex hybridization with the sample nucleic acids under specific conditions lasting tens of hours or more. Such long hybridization incubations significantly slow the capture workflow. In addition, high incubation temperatures and prolonged incubation alter the salt ion concentration of the mixture, especially when the hybridization reaction volume is small. The target DNA enrichment achieved by existing solid-phase or liquid-phase targeted DNA hybrid capture reaches only about 400-fold (Tewhey R, Nakano M, Wang X, et al. Enrichment of sequencing targets from the human genome by solution hybridization. Genome Biol. 2009;10(10):R116. doi:10.1186/gb-2009-10-10-r116) and 1000-fold (Okou DT, Steinberg KM, Middle C, et al. Microarray-based genomic selection for high-throughput resequencing. Nat Methods. 2007 Nov;4(11):907-9). Moreover, these processes are costly, have low specificity and low DNA capture efficiency, depend heavily on operator technique, and cannot be automated. There is therefore a need for new methods of specific nucleic acid capture that are rapid, efficient and low cost.

Disclosure of Invention

The invention first provides a fusion protein formed by fusing a cytosine base editor and a uracil binding protein, wherein the uracil binding protein is UdgX and the cytosine base editor is a BE3 editor or an optimized SSBE3 editor thereof.

The uracil binding protein is a UdgX protein from M. smegmatis, M. avium, R. imtechensis, M. haemophilum, Rhodococcus, Streptomyces coelicolor, Gordonia namibiensis, Bradyrhizobium japonicum or Nocardia farcinica. More specifically, its amino acid sequence is shown in SEQ ID No. 22.

The cytosine base editor is a BE3-type editor or a preferred BE3-type editor; a particularly preferred BE3-type editor is SSBE3, whose amino acid sequence is shown in SEQ ID NO: 23.

The uracil binding protein and the cytosine base editor are connected through a linker; preferably, the amino acid sequence of the linker is shown in SEQ ID NO: 24. In one embodiment, the amino acid sequence of the fusion protein is shown in SEQ ID No. 1.

In some embodiments, the fusion protein comprises a UdgX having the sequences of motif A (GEQPG) and motif B (HPSSLL) and comprising the conserved region KRRIH. In some embodiments, the fusion protein may be a protein comprising or consisting of the sequence of SEQ ID No. 1 or the amino acid sequence encoded by SEQ ID No. 2, or a variant thereof.
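As an illustration only, the motif requirement above can be checked with a simple substring search. The sketch below assumes the motifs appear as exact, ungapped matches in a one-letter amino acid sequence; the function names and the toy sequence are hypothetical and are not taken from the sequence listing.

```python
# Motif strings are those named in the text; everything else is illustrative.
UDGX_MOTIFS = {
    "motif A": "GEQPG",
    "motif B": "HPSSLL",
    "conserved region": "KRRIH",
}


def find_udgx_motifs(protein_seq):
    """Return {motif name: 0-based start index, or None if absent}."""
    seq = protein_seq.upper()
    hits = {}
    for name, motif in UDGX_MOTIFS.items():
        idx = seq.find(motif)  # str.find returns -1 when the motif is absent
        hits[name] = idx if idx >= 0 else None
    return hits


def has_all_udgx_motifs(protein_seq):
    """True only if every motif named in the text is present."""
    return all(pos is not None for pos in find_udgx_motifs(protein_seq).values())


# Made-up toy sequence (NOT SEQ ID No. 22), used only to exercise the functions.
toy_sequence = "MAAGEQPGTTTKRRIHAAAHPSSLLGG"
motif_positions = find_udgx_motifs(toy_sequence)
```

A real screen of candidate UdgX homologs would more likely use an alignment-based search, since exact matching misses conservative variants of the motifs.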

In some embodiments, the fusion protein may be a protein comprising or consisting of an amino acid sequence that is about 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or more, or 100%, identical to the sequence of SEQ ID No. 1 or the amino acid sequence encoded by SEQ ID No. 2. In some embodiments, the variant of the fusion protein has one or more unnatural amino acids, one or more amino acid substitutions, one or more amino acid insertions, one or more amino acid deletions, or any combination thereof at one or more positions as compared to the fusion protein. In some embodiments, the variant has substantially similar or comparable activity to the fusion protein. In some embodiments, variants of the UdgX-SSBE3 protein may have about 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or more sequence identity to the amino acid sequence of the fusion protein. As known to those skilled in the art, variants of the protein may be obtained by introducing conservative substitutions, deletions and additions of amino acids when preparing the recombinant protein. The desired substitution, deletion or insertion may also be introduced by altering specific codons of the coding sequence. Alternatively, protein variants may be prepared by random or saturation mutagenesis techniques such as alanine scanning mutagenesis, error-prone polymerase chain reaction mutagenesis and oligonucleotide-directed mutagenesis. In some embodiments, conservative substitutions include substitutions between amino acids of similar nature, such as substitutions among the hydrophobic amino acids Nle, Met, Ala, Val, Leu and Ile; among the neutral hydrophilic amino acids Cys, Ser, Thr, Asn and Gln; between the acidic amino acids Asp and Glu; among the basic amino acids His, Lys and Arg; between the amino acids that influence chain orientation, Gly and Pro; and among the aromatic amino acids Trp, Tyr and Phe. In some embodiments, non-conservative substitutions between the above types may be included.
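The grouping of amino acids described above can be written as a small lookup table. The following is a minimal sketch, assuming standard three-letter residue codes (Ile and Gln for the isoleucine and glutamine entries); the function names are hypothetical and the sketch only classifies a single substitution, it does not predict activity.

```python
# Groups as listed in the text; a substitution within one group is "conservative".
CONSERVATIVE_GROUPS = {
    "hydrophobic": {"Nle", "Met", "Ala", "Val", "Leu", "Ile"},
    "neutral hydrophilic": {"Cys", "Ser", "Thr", "Asn", "Gln"},
    "acidic": {"Asp", "Glu"},
    "basic": {"His", "Lys", "Arg"},
    "chain orientation": {"Gly", "Pro"},
    "aromatic": {"Trp", "Tyr", "Phe"},
}


def group_of(residue):
    """Return the group name for a three-letter residue code, or None."""
    for name, members in CONSERVATIVE_GROUPS.items():
        if residue in members:
            return name
    return None


def is_conservative(original, replacement):
    """True if both residues differ but belong to the same group."""
    group = group_of(original)
    return (
        group is not None
        and original != replacement
        and group == group_of(replacement)
    )
```

For example, `is_conservative("Leu", "Ile")` is true under this table, while `is_conservative("Asp", "Lys")` is false because the residues fall into the acidic and basic groups respectively.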

In some embodiments, the fusion protein also has a purification tag, thereby facilitating isolation and/or purification. In some embodiments, purification tags that may be used include tags commonly used for protein purification, such as Snap-tag, His-tag, Flag-tag, MBP-tag, biotin, and the like.

The invention thus also provides polynucleotides encoding the fusion proteins, expression vectors and recombinant host cells. In particular, the polynucleotide sequence is shown in SEQ ID No.2.

Furthermore, the present invention provides a method for capturing a specific nucleic acid and, more particularly, a method and application in which the fusion protein, combining a uracil-producing base editor with a uracil-binding protein, is used to capture DNA containing a specific target sequence (containing a PAM and a cytosine C at a specific position). Here, PAM-containing means that the target sequence carries an NGG, NG or other type of PAM sequence, and the sequence preceding the PAM is 8-30 bp long, preferably 20 bp followed by NGG. The cytosine C at a specific position means a cytosine located 3 to 25 bases, preferably 5 to 20 bases, more preferably 11 to 17 bases, upstream of the 5' end of the PAM site.
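To make the target definition concrete, the sketch below scans one DNA strand for NGG PAMs and reports those that have a cytosine in the most preferred window (11 to 17 bases upstream of the 5' end of the PAM). It rests on several assumptions that the text does not state: only the given strand is scanned, the base immediately 5' of the PAM counts as distance 1, and the preceding sequence is taken as 20 nt. The function name and returned fields are hypothetical.

```python
def find_candidate_sites(seq, spacer_len=20, c_window=(11, 17)):
    """List NGG PAM sites that have a cytosine inside c_window.

    Distances are counted upstream from the 5' end of the PAM, with the
    base directly adjacent to the PAM at distance 1 (an assumption).
    """
    seq = seq.upper()
    low, high = c_window
    sites = []
    # pam_start is the index of the N in NGG; a full spacer must precede it.
    for pam_start in range(spacer_len, len(seq) - 2):
        if seq[pam_start] not in "ACGT":
            continue
        if seq[pam_start + 1:pam_start + 3] != "GG":
            continue
        c_distances = [
            d for d in range(low, high + 1)
            if pam_start - d >= 0 and seq[pam_start - d] == "C"
        ]
        if c_distances:
            sites.append({
                "pam_start": pam_start,
                "pam": seq[pam_start:pam_start + 3],
                "spacer": seq[pam_start - spacer_len:pam_start],
                "c_distances": c_distances,
            })
    return sites


# Toy input: a 20-nt spacer with a single C, 14 bases upstream of the PAM "TGG".
toy_target = "A" * 6 + "C" + "A" * 13 + "TGG"
toy_sites = find_candidate_sites(toy_target)
```

Passing `c_window=(3, 25)` or `(5, 20)` widens the search to the broader ranges given in the text; scanning the reverse complement would be needed to find sites on the opposite strand.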

In one aspect, provided herein is a method of capturing target DNA from a sample, the method comprising:

1) Contacting a polynucleotide sample containing the target DNA with a specific fusion protein having base editing and uracil binding functions and with sgRNAs targeting the target sequence, to obtain a fusion protein-target DNA complex;

2) Target DNA is detected and captured.

In a specific embodiment, in step 1) of the method, a target sequence DNA (containing a PAM and a cytosine C at a specific position) is selected for capture, and sgRNAs targeting the target DNA direct the fusion protein to the target sequence, so that the target base C in the target sequence is edited to U and the fusion protein covalently binds to the target base U, thereby obtaining a fusion protein-target DNA complex.

In a specific embodiment, washing the fusion protein-target DNA complex at least 3 times, preferably with PBS buffer or Tris-HCl (pH 8.0), is also comprised in step 1) of the method.

In a specific embodiment, step 2) of the method further comprises binding the fusion protein-target DNA complex to an affinity matrix, preferably magnetic beads, more preferably Snap magnetic beads.

In some embodiments, a polynucleotide sample containing the target DNA may be mixed and contacted with the fusion protein and the corresponding sgRNA to capture the target DNA. In some embodiments, a solution containing the fusion protein and the corresponding sgRNAs may be added to a polynucleotide sample containing the target DNA, or the sample may be added to a solution containing the fusion protein and the corresponding sgRNAs, so that they come into contact and the target DNA is captured.

In some embodiments, the DNA of interest is double-stranded DNA or single-stranded DNA or a mixture of both comprising DNA of the target sequence (PAM-containing and cytosine C at a specific position). In some embodiments, the DNA of interest contains one or more naturally occurring or artificially added target sequence DNAs (containing PAM and cytosine C at a specific position). In some embodiments, the target DNA is exogenous DNA or endogenous DNA relative to the subject. In some embodiments, the DNA of interest is from a human, a microorganism, an animal, or a plant. The method of the present invention can advantageously achieve capture and detection of target DNA present in a sample at a low abundance. The method of the present invention is directed to capturing target DNA in a sample, i.e. to solving the problem of obtaining low abundance target DNA in a sample.

In some embodiments, the fusion protein may include a protein subunit that specifically binds uracil, such as the UdgX protein. In some embodiments, the UdgX-SSBE3 protein specifically binds to the uracil produced by base editing, without binding to other bases or with greater affinity than for other bases. From the prior art, one of ordinary skill in the art can determine that uracil binding proteins have a strong affinity for uracil in DNA (Sang PB, Srinath T, Patil AG, Woo EJ, Varshney U. A unique uracil-DNA binding protein of the uracil DNA glycosylase superfamily. Nucleic Acids Res. 2015;43(17):8452-63. doi:10.1093/nar/gkv854).

In some embodiments, the method comprises isolating a complex of the fusion protein and the target DNA to obtain the target DNA. In some embodiments, the complex of fusion protein and target DNA may be isolated by any suitable separation method. For example, in some embodiments, complexes of fusion proteins and target DNA may be separated from complex nucleic acid samples by affinity separation techniques, and then recovered to obtain captured target DNA. Specific binding between molecules is called affinity, and techniques for purifying biomolecules using affinity are called affinity separation techniques. In some embodiments, the separation is performed by differences in affinity between the stationary phase-based ligand and the target molecule and affinity with other molecules. In some embodiments, the complex of fusion protein and target DNA may be obtained by chromatographic separation, such as affinity column chromatography. In some embodiments, the complex of uracil binding protein and target DNA can be obtained by magnetic bead separation, such as affinity magnetic beads. In some embodiments, the fusion protein has a capture tag, thereby facilitating easy isolation and/or purification. In some embodiments, capture tags that may be used include tags commonly used for protein capture such as Snap-tag, his-tag, flag-tag, MBP-tag, biotin, and the like. In some embodiments, complexes of fusion proteins and target DNA can be obtained by affinity separation using Snap magnetic beads (e.g., the NEB commercially available Snap magnetic beads). In some embodiments, the complex of fusion protein and target DNA is washed at least 3 more times with lysis buffer after binding to magnetic beads or affinity columns. In some embodiments, the lysis buffer is PBS buffer or Tris-HCl (pH 8.0). 
It has further been found that de-crosslinking is preferably performed after impurities have been removed, which facilitates capturing and/or detecting the target DNA with increased sensitivity and specificity.

In some embodiments, the method may further comprise the step of de-crosslinking the fusion protein-DNA fragment complex. In some embodiments, this step may be accomplished by, for example, adding a proteolytic enzyme, such as proteinase K, to the resulting eluate of the fusion protein-DNA fragment complex. In some embodiments, the method may further comprise amplifying the captured target DNA. Methods for amplifying nucleic acids of interest are widely known in the art and include, for example, PCR amplification, isothermal amplification, and the like.

In some embodiments, the source of the sample is not particularly limited as long as it is a sample that may contain the target DNA. In some embodiments, the sample comprises a genomic DNA sample, a cell-free DNA sample, an environmental genomic sample, and/or a mixed genomic DNA sample. In some embodiments, the sample comprises double stranded DNA, single stranded DNA, or a mixture of both. In some embodiments, the sample may be a polynucleotide sample, such as a sample containing mixed nucleic acids. In some embodiments, the sample may be from a variety of organisms, such as humans, plants, animals, microorganisms, and the like. In some embodiments, the sample may be from bacteria, archaebacteria, protist, fungi, or the like. In some embodiments, the sample may be an environmental genome sample (metagenomic sample). In some embodiments, the sample may be directly or indirectly from the subject. For example, the sample may be collected directly from the subject, or may be from an isolated sample obtained from the subject. In some embodiments, the sample may be from an animal subject, such as a mammalian subject. In some embodiments, the sample may be from a primate, laboratory animal, farm animal, livestock or pet. In some embodiments, the sample may be from a mammal, a marine animal, an amphibian, a bird, a reptile, an insect, and other invertebrates. In some embodiments, the sample may be from a human subject. In some embodiments, the sample may be from the subject's blood, serum, serosal fluid, plasma, lymph, urine, cerebrospinal fluid, saliva, mucous secretions of secretory tissues and organs, vaginal secretions, milk, tears, ascites, for example, fluid from the pleura, pericardium, peritoneum, abdomen, or other body cavities. In some embodiments, the sample may be cell free DNA obtained from a body fluid of a subject, such as plasma or serum. In some embodiments, the sample may be a genomic DNA sample isolated from the source described above. 
In some embodiments, the sample may be a mixed sample obtained from multiple sources, such as a mixed genomic DNA sample. In some embodiments, the sample may be a bacterial and/or human genomic DNA sample. In some embodiments, the target DNA is in a fetal cell fraction of cell free DNA, and wherein the cell free DNA is from maternal plasma.

In some embodiments, capturing target DNA refers to the process of obtaining a higher percentage of target DNA in a population of polynucleotides. In some embodiments, the percentage of target DNA is increased by about 5%, 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, or more than 90%. In some embodiments, the percentage of target DNA is increased about 2-fold, 5-fold, 10-fold, 50-fold, or 100-fold. In some embodiments, uracil-containing target DNA can be enriched by the methods of the invention more than 200, 300, 400, 500, 600, 700, 800, 900, 1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000, 10000, 20000, 30000, 40000, 50000, 60000, 70000, 80000, 90000, 100000, 200000, 300000, 400000 or 500000-fold, or more. In some embodiments, target DNA may be captured and/or detected by the methods of the present invention from a sample having a low content of target DNA. For example, the target DNA is captured and/or detected from a sample in which the ratio of target DNA to total DNA is 1/400, 1/500, 1/600, 1/700, 1/800, 1/900, 1/1000, 1/2000, 1/3000, 1/4000, 1/5000, 1/6000, 1/7000, 1/8000, 1/9000, 1/10000, 1/20000, 1/30000, 1/40000, 1/50000, 1/60000, 1/70000, 1/80000, 1/90000, 1/100000, 1/200000, 1/300000, 1/400000, 1/500000 or less. In some embodiments, the methods of the invention can capture and/or detect target DNA containing a gene mutation present at as little as 0.2% (or lower), for example in a sample containing about 99.8% (or more) non-mutated DNA. In some embodiments, the methods of the invention have been found to be capable of capturing and/or detecting target DNA from complex nucleic acid samples with high sensitivity and/or specificity.

In some embodiments, provided herein is a method of detecting a target DNA, the method comprising obtaining a captured target DNA by a method described herein, and then detecting the target DNA. In some embodiments, the method comprises amplifying the captured target DNA after obtaining the captured target DNA. In some embodiments, the presence and/or sequence of the target DNA may be detected by methods such as hybridization, PCR, isothermal amplification, sequencing, and/or a biochip.

In some embodiments, the fusion protein is used to capture and/or detect target DNA from a sample. In some embodiments, the fusion protein, or a composition comprising the base editing-uracil binding protein, is used to capture and/or detect DNA comprising a target sequence (containing a PAM and a cytosine C at a specific position) from a sample. In some embodiments, the fusion protein is used to prepare a composition and/or kit for capturing and/or detecting DNA containing such a target sequence. In some embodiments, the invention relates to the use of a fusion protein, preferably as defined above, for the preparation of a reagent for capturing DNA comprising such a target sequence from a sample, preferably a sample as defined above.

The method can capture target DNA from complex nucleic acid samples with high sensitivity and specificity, can capture a plurality of target DNA and long fragment target DNA at the same time, and has great application value.

Drawings

Fig. 1: the UdgX-SSBE3 protein captures DNA containing the sequence of interest.

Fig. 2: expression and purification of UdgX-SSBE3 protein.

Fig. 3: DNA sample concentration standard curve.

Fig. 4: q-PCR quantitative analysis of DNA.

Fig. 5: q-PCR quantitative analysis of the captured target DNA in E.coli genome.

Fig. 6: q-PCR quantitative analysis of captured target DNA in Streptomyces genome.

Fig. 7: effect of inputting different kinds of sgRNAs on the relative DNA capture rate.

Fig. 8: q-PCR quantitative analysis of simultaneous capture of multiple target DNAs in Streptomyces genome.

Fig. 9: sequencing coverage analysis of simultaneous capture of multiple target DNAs in the Streptomyces genome.

Fig. 10: effect of different concentrations of sgRNA input on relative DNA capture rate.

Fig. 11: q-PCR quantitative analysis of the captured long fragment of target DNA in Streptomyces genome.

Fig. 12: sequencing coverage analysis of the capture of long fragment target DNA in the Streptomyces genome.

Fig. 13: q-PCR quantitative analysis of captured target DNA in human saliva genome.

Fig. 14: sequencing coverage analysis of simultaneous capture of target DNA in human saliva genome.

Detailed Description

The invention will be further illustrated by the following examples for a better understanding of the invention, but without limiting the same.

EXAMPLE 1 editing and Capture of target DNA

1.1 expression and purification of UdgX-SSBE3 proteins

UdgX in the fusion protein is a uracil-binding protein with the amino acid sequence of SEQ ID NO: 22, and SSBE3 is a cytosine base editor optimized from the generic BE3-type base editor (amino acid sequence: SEQ ID NO: 23). The two are connected through a linker (SEQ ID NO: 24); the amino acid sequence of the resulting fusion is shown in SEQ ID No. 1, and the coding nucleotide sequence in SEQ ID No. 2.

The UdgX-SSBE3 gene sequence (SEQ ID No. 2) was cloned into the expression vector pSnap-tag (T7) 2-His (containing the Snap tag) to obtain plasmid pSnap-UdgX-SSBE3, which was transformed into E. coli BL21. BL21 cells containing plasmid pSnap-UdgX-SSBE3 were inoculated into LB medium containing 0.05% FeCl₃ and cultured at 37 °C until the OD600 reached 0.6; IPTG was then added to a final concentration of 0.5 mM, and culturing was continued at 16 °C overnight. A 20 mL sample was centrifuged at 5000 rpm for 30 min, the supernatant was discarded, and the pellet was resuspended in 2 mL of lysis buffer (50 mM Tris-HCl, 50 mM NaCl, pH 8). Lysozyme was added to a final concentration of 0.1 mg/mL, and the mixture was left on ice for 30 min and then sonicated. After centrifugation at 5000 rpm for 30 min, the supernatant was incubated with Snap magnetic beads (e.g., from NEB) overnight at 4 °C. Then 2 mL of buffer (50 mM Tris-HCl, 50 mM NaCl, pH 8) was added to the centrifuge tube containing the magnetic beads, the tube was gently inverted several times to resuspend the beads, and the beads were magnetically separated. The beads were washed three times in this way to remove proteins not bound to them.

SDS-PAGE was used to detect the UdgX-SSBE3 protein (FIG. 2), comparing the SSBE3 protein supernatant after ultrasonication with the supernatant after overnight binding to Snap beads at 4 °C. As can be seen from FIG. 2, the supernatant after ultrasonication contains the target protein SSBE3, and the content of the target protein in the supernatant after overnight binding to Snap magnetic beads at 4 °C is reduced. This shows that the target protein SSBE3 can bind to the Snap magnetic beads.

1.2 acquisition of DNA fragments

The forward primer (SEQ ID No. 3) and the reverse primer (SEQ ID No. 4) were synthesized and used to amplify, by PCR from the plasmid template, a GFP fragment (SEQ ID No. 5) containing the editing site of interest.

The forward primer (SEQ ID No. 6) and the reverse primer (SEQ ID No. 4) were synthesized and used to amplify, by PCR from the plasmid template, a GFP fragment (SEQ ID No. 7) without the editing site of interest.

1.3 acquisition of sgRNA fragments

The forward primer (SEQ ID No. 8) and the reverse primer (SEQ ID No. 9) were synthesized and used to amplify, by PCR from the plasmid template, the sgDNA fragment (SEQ ID No. 10) corresponding to the editing site of interest.

The sgDNA obtained in this step was then used as the template for transcription with an sgRNA in vitro transcription kit (in the flourishing industry) to obtain the corresponding sgRNA.

1.4 editing and binding of UdgX-SSBE3 protein and DNA fragments

UdgX-SSBE3 protein bound to the Snap beads and the sgRNA were added to 0.5 mL of reaction buffer (50 mM Tris-HCl pH 8, 50 mM NaCl, 1 mM Na₂EDTA, 1 mM DTT, 25 µg/mL BSA) and incubated at 25 °C for 5 minutes; 500 ng of DNA after double-strand breakage was then added, and the reaction was carried out at 37 °C for 2 hours.

1.5 obtaining of target DNA fragments

After 2 hours of reaction at 37 °C, the centrifuge tube was gently inverted several times to resuspend the beads, and the beads were magnetically separated. The reaction solution was removed, 1 mL of buffer (50 mM Tris-HCl, 50 mM NaCl, pH 8) was added to the centrifuge tube containing the magnetic beads, and the tube was gently inverted several times to resuspend the beads before magnetic separation. This wash was repeated 3 times. Finally, 500 µL of buffer (50 mM Tris-HCl, 50 mM NaCl, pH 8) was added, proteinase K was added to the resulting eluate of the UdgX protein-DNA fragment complex to a final concentration of 100 µg/mL, and the solution was treated at 50 °C for 30 minutes to de-crosslink the UdgX-SSBE3 protein-DNA fragment complex, thereby obtaining the target DNA fragment.

1.6 quantitative analysis of DNA fragments

The DNA fragments after the decrosslinking were quantified by q-PCR.

q-PCR reaction system: distilled water 6.4 µL, SYBR Green Realtime PCR Master Mix 10 µL, upstream primer (SEQ ID No. 11, 10 µM) 0.8 µL, downstream primer (SEQ ID No. 12, 10 µM) 0.8 µL, sample solution 2 µL.
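The per-reaction volumes above sum to 20 µL, which gives each primer a final concentration of 0.4 µM. When several reactions are run, the shared components are usually pooled into a master mix; the sketch below scales the listed volumes for that purpose. The 10% pipetting overage and the function names are assumptions for illustration and do not come from the source.

```python
# Per-reaction volumes in microlitres, as listed in the q-PCR reaction system.
PER_REACTION_UL = {
    "distilled water": 6.4,
    "SYBR Green Realtime PCR Master Mix": 10.0,
    "upstream primer (10 uM)": 0.8,
    "downstream primer (10 uM)": 0.8,
    "sample solution": 2.0,
}


def reaction_volume_ul():
    """Total volume of one reaction in microlitres."""
    return round(sum(PER_REACTION_UL.values()), 2)


def primer_final_conc_um(stock_um=10.0, primer_ul=0.8):
    """Final primer concentration (uM) after dilution into one reaction."""
    return stock_um * primer_ul / reaction_volume_ul()


def master_mix(n_reactions, overage=0.1):
    """Pooled volumes of the shared components for n reactions.

    The sample solution is excluded because it differs per well.
    `overage` is a hypothetical allowance for pipetting loss.
    """
    scale = n_reactions * (1 + overage)
    return {
        name: round(volume * scale, 2)
        for name, volume in PER_REACTION_UL.items()
        if name != "sample solution"
    }


mix_for_ten = master_mix(10)
```

Each well would then receive 18 µL of the pooled mix plus 2 µL of sample solution.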

Cycling conditions for q-PCR: 95 °C for 30 s; PCR cycle (×40 cycles): 95 °C for 5 s, 55 °C for 10 s, 72 °C for 15 s (data collection); melting curve analysis.

The standard curve was drawn after dilution of DNA samples quantified by Nanodrop (fig. 3).

The concentrations of the standards were 0.000001 ng/µL, 0.00001 ng/µL, 0.0001 ng/µL, 0.001 ng/µL and 0.01 ng/µL.
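A q-PCR standard curve of this kind is conventionally a straight-line fit of Ct against log10 of the standard concentration, which is then inverted to quantify unknown samples. The sketch below shows that calculation with a plain least-squares fit. The Ct values are hypothetical placeholders chosen to lie on a line (the source does not report them), so the fitted numbers illustrate the arithmetic only and are not the data of FIG. 3.

```python
import math

# Standard concentrations (ng/uL) from the text.
STANDARDS_NG_PER_UL = [0.000001, 0.00001, 0.0001, 0.001, 0.01]
# HYPOTHETICAL Ct values for illustration only; not measured data.
HYPOTHETICAL_CT = [29.8, 26.5, 23.2, 19.9, 16.6]


def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def fit_standard_curve(concentrations, cts):
    """Fit Ct against log10(concentration)."""
    return fit_line([math.log10(c) for c in concentrations], cts)


def concentration_from_ct(ct, slope, intercept):
    """Invert the standard curve to get a concentration in ng/uL."""
    return 10 ** ((ct - intercept) / slope)


def amplification_efficiency(slope):
    """Efficiency implied by the slope; 1.0 means perfect doubling per cycle."""
    return 10 ** (-1 / slope) - 1


slope, intercept = fit_standard_curve(STANDARDS_NG_PER_UL, HYPOTHETICAL_CT)
unknown_ng_per_ul = concentration_from_ct(23.2, slope, intercept)
```

With real data, the Ct of each de-crosslinked sample would be passed to `concentration_from_ct` and multiplied by the eluate volume to obtain the captured amount.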

The target DNA after the decrosslinking was quantified by q-PCR. The results showed that UdgX-SSBE3 can specifically capture GFP fragments containing the editing site of interest at a recovery concentration of about 314 times that of GFP fragments without the editing site of interest (FIG. 4).

EXAMPLE 2 isolation of target DNA from E.coli genomic DNA sample

2.1 expression and purification of UdgX protein (same as in example 1)

2.2 acquisition of genomic DNA samples

E. coli DH5α genomic DNA was extracted using a genomic extraction kit (Biomega).

2.3 acquisition of sgRNA fragments (same as in example 1)

2.4 editing and binding of UdgX-SSBE3 proteins with DNA fragments

UdgX-SSBE3 protein bound to the Snap beads and the sgRNA were added to 0.5 mL of reaction buffer (50 mM Tris-HCl pH 8, 50 mM NaCl, 1 mM Na₂EDTA, 1 mM DTT, 25 µg/mL BSA) and incubated at 25 °C for 5 minutes. Then 6 µg of fragmented E. coli genomic DNA together with GFP fragment containing the editing site (1 ng or 10 ng), or 6 µg of fragmented E. coli genomic DNA together with GFP fragment containing no editing site (500 ng), was added to 1 mL of reaction buffer (50 mM Tris-HCl pH 8, 50 mM NaCl, 1 mM Na₂EDTA, 1 mM DTT, 25 µg/mL BSA) and reacted at 37 °C for 2 h.

2.5 obtaining of target DNA fragment (same as in example 1)

2.6 quantitative analysis of target DNA fragments

The de-crosslinked target DNA was quantified by q-PCR, with GFP amplified from the de-crosslinked DNA using the primers of SEQ ID No. 11 and SEQ ID No. 12. The results showed that UdgX-SSBE3 can specifically capture GFP fragments containing the editing site of interest: the total amount recovered was about 21.3 times (6 µg genomic DNA : 10 ng GFP fragment containing the editing site) and 2.1 times (6 µg genomic DNA : 1 ng GFP fragment containing the editing site) that of GFP fragments not containing the editing site of interest (FIG. 5). Since the input ratio of GFP fragment containing the editing site to GFP fragment not containing it was 10:500 and 1:500, respectively, the specific capture rate of target DNA in the experiment was 1065-fold (21.3 × 500/10) and 1050-fold (2.1 × 500/1), respectively.
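The specific capture rate quoted above follows from normalizing the recovery ratio by the input ratio of the two GFP fragments. A minimal sketch of that arithmetic is given below; the function name is hypothetical, and the input values are the ones stated in the text.

```python
def specific_capture_fold(recovery_ratio, target_input_ng, control_input_ng):
    """Recovery ratio corrected for unequal input amounts.

    recovery_ratio: recovered target GFP / recovered control GFP (from q-PCR).
    target_input_ng: input of GFP fragment containing the editing site.
    control_input_ng: input of GFP fragment without the editing site.
    """
    return recovery_ratio * control_input_ng / target_input_ng


# 10 ng target vs 500 ng control, recovery ratio 21.3
fold_10ng = specific_capture_fold(21.3, 10, 500)
# 1 ng target vs 500 ng control, recovery ratio 2.1
fold_1ng = specific_capture_fold(2.1, 1, 500)
```

Rounded to whole numbers, the two results reproduce the 1065-fold and 1050-fold figures reported for this example.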

Example 3 isolation of target DNA from Streptomyces genomic DNA samples.

3.1 expression and purification of UdgX protein (same as in example 1)

3.2 acquisition of genomic DNA samples

Streptomyces genomic DNA was extracted using a genome extraction kit (Biomega).

3.3 acquisition of sgRNA fragments

The forward primers (SEQ ID No. 13, SEQ ID No. 14 and SEQ ID No. 15) and the reverse primer (SEQ ID No. 9) were synthesized and used to amplify, by PCR from the plasmid template, the sgDNA fragments (Act, RedD and RedN) corresponding to the target editing sites.

3.4 editing and binding of UdgX proteins and DNA fragments

UdgX-SSBE3 protein bound to the Snap beads was added, together with each sgRNA (Act, RedD and RedN) separately, to 0.5 mL of reaction buffer (50 mM Tris-HCl pH 8, 50 mM NaCl, 1 mM Na₂EDTA, 1 mM DTT, 25 µg/mL BSA) and incubated at 25 °C for 5 minutes, after which 8 µg of Streptomyces genomic DNA after double-strand breakage was added and reacted at 37 °C for 2 hours.

3.5 obtaining of target DNA fragment (same as in example 1)

3.6 quantitative analysis of target DNA fragments

The de-crosslinked target DNA was quantified by q-PCR; Act, RedD and RedN were amplified from the de-crosslinked DNA using the primer pairs SEQ ID No. 16 and SEQ ID No. 17, SEQ ID No. 18 and SEQ ID No. 19, and SEQ ID No. 20 and SEQ ID No. 21, respectively. The results showed that UdgX-SSBE3 can specifically capture DNA fragments containing the editing site of interest in the Streptomyces genome, with capture amounts of 0.007 ng (Act), 0.004 ng (RedD) and 0.01 ng (RedN) (FIG. 6).

The experiment of Example 3 was repeated with one change: the 3 sgRNAs (each at 1-10 ng/µL, 1 µL of each) were put into a single reaction system so that the 3 target DNAs were captured simultaneously. The results of the repeated experiment show that, relative to the results of Example 2, the capture efficiency of the Act gene was reduced to 25%, that of the RedD gene to 28%, and that of the RedN gene to 23% (FIG. 7).

Example 4 multiple target DNA were isolated simultaneously from Streptomyces genomic DNA samples.

4.1 expression and purification of UdgX protein (same as in example 1)

4.2 acquisition of genomic DNA samples

Streptomyces genomic DNA was extracted using a genome extraction kit (Biomega).

4.3 acquisition of sgRNA fragments

The forward and reverse primers were synthesized and amplified from the plasmid template by PCR to obtain sgDNA fragments (Target 1-Target 7) corresponding to the Target editing sites.

4.4 editing and binding of UdgX proteins and DNA fragments

UdgX-SSBE3 protein bound to the Snap magnetic beads and 10 sgRNAs (Target 1-7, Act, RedD and RedN) were added to 0.5 mL of reaction buffer (50 mM Tris-HCl pH 8, 50 mM NaCl, 1 mM Na₂EDTA, 1 mM DTT, 25 µg/mL BSA) and incubated at 25 °C for 5 minutes, after which 8 µg of Streptomyces genomic DNA after double-strand breakage was added and reacted at 37 °C for 2 hours.

4.5 obtaining of target DNA fragment (same as in example 1)

4.6 quantitative analysis of target DNA fragments

After de-crosslinking, the target DNA was quantified by q-PCR, with a separate primer pair used to amplify each target from the de-crosslinked DNA. The results showed that UdgX-SSBE3 can specifically capture DNA fragments containing the target editing sites in the Streptomyces genome, with capture amounts of 0.019 ng (Target 1), 0.039 ng (Target 2), 0.052 ng (Target 3), 0.005 ng (Target 4), 0.03 ng (Target 5), 0.012 ng (Target 6), 0.004 ng (Target 7), 0.016 ng (RedD), 0.002 ng (RedN) and 0.006 ng (Act) (FIG. 8).
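The q-PCR amounts reported for this example can be tabulated to compare targets. The short Python sketch below only restates the ten values given above and derives the total captured mass, the highest and lowest captured target, and the ratio between them; the variable names are illustrative and no additional data are introduced.

```python
# Captured amounts in ng, as reported for the ten targets of Example 4.
captured_ng = {
    "Target 1": 0.019, "Target 2": 0.039, "Target 3": 0.052,
    "Target 4": 0.005, "Target 5": 0.03, "Target 6": 0.012,
    "Target 7": 0.004, "RedD": 0.016, "RedN": 0.002, "Act": 0.006,
}

total_ng = sum(captured_ng.values())
highest = max(captured_ng, key=captured_ng.get)
lowest = min(captured_ng, key=captured_ng.get)
spread = captured_ng[highest] / captured_ng[lowest]

print(f"total captured: {total_ng:.3f} ng")
print(f"highest: {highest} ({captured_ng[highest]} ng)")
print(f"lowest: {lowest} ({captured_ng[lowest]} ng)")
print(f"highest/lowest ratio: {spread:.0f}")
```

The ratio gives a rough sense of how unevenly the ten targets were recovered within a single reaction.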

4.7 second Generation sequencing of target DNA fragments

Using the de-crosslinked target DNA as a template, a second-generation sequencing library was constructed with a library construction kit from Vazyme (VAHTS Universal DNA Library Prep Kit for Illumina V3), and the constructed DNA library was sequenced by Novogene Co., Ltd. The sequencing results showed that 9 target sites could be sequenced, with coverage of 22022-592551 (FIG. 9).

The experiment of example 4 was repeated with one change: the 10 sgRNAs were mixed in equal volumes and 1 uL of the mixture was added in total, with the sgRNAs diluted to 1/10 or 1/20 of the initial concentration, so that the concentration of the sgRNAs in the reaction system was 1/10 or 1/20 of the initial value while the 10 target DNAs were captured simultaneously. The repeated experiment showed (FIG. 10) that, relative to the results of example 4, the capture efficiency of the target DNA was reduced to 25.2%-98.6% after 10-fold dilution of the sgRNA and to 0.02%-1.05% after 20-fold dilution. This test establishes the minimum amount of each sgRNA that must be added, from which the maximum number of sgRNAs that can be added to one reaction is derived; the more sgRNAs that can be added, the more types and the greater the length of target DNA that can be captured.
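The dilution arithmetic in this repeat can be sketched as follows. The stock concentration is not stated for this experiment, so the 10 ng/uL used below is a hypothetical value; the 500 uL reaction volume ignores the small volumes added to the 0.5 mL of buffer; and the function name is illustrative.

```python
def final_sgrna_conc(stock_ng_per_ul, n_sgrnas=10, pooled_ul=1.0,
                     reaction_ul=500.0, dilution_factor=1):
    """Approximate final concentration (ng/uL) of each sgRNA in the reaction."""
    per_sgrna_ul = pooled_ul / n_sgrnas                # equal-volume pool: 0.1 uL each
    diluted_stock = stock_ng_per_ul / dilution_factor  # 1/10 or 1/20 of the stock
    return diluted_stock * per_sgrna_ul / reaction_ul


STOCK_NG_PER_UL = 10.0  # hypothetical stock concentration, not from the source

for factor in (1, 10, 20):
    conc = final_sgrna_conc(STOCK_NG_PER_UL, dilution_factor=factor)
    print(f"stock diluted 1/{factor}: {conc:.5f} ng/uL of each sgRNA")
```

Because the pool is added at a fixed 1 uL, every additional sgRNA lowers the share of each one, which is why the minimum workable amount per sgRNA bounds how many can be combined in one reaction.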

Example 5 capturing long fragments of target DNA (10 kb) from Streptomyces genomic DNA samples.

5.1 expression and purification of UdgX protein (same as in example 1)

5.2 acquisition of genomic DNA samples

Streptomyces genomic DNA was extracted using a genome extraction kit (Biomega).

5.3 acquisition of sgRNA fragments

Forward and reverse primers were synthesized, and the sgDNA fragments (Target 2 and Target 8-Target 26, 20 in total) corresponding to the target editing sites were amplified from the plasmid template by PCR.

5.4 editing and binding of UdgX proteins and DNA fragments

UdgX-SSBE3 protein bound to Snap magnetic beads and the sgRNAs were added to 0.5 mL of reaction buffer (50 mM Tris-HCl, pH 8, 50 mM NaCl, 1 mM Na₂EDTA, 1 mM DTT, 25 ug/ml BSA) and incubated at 25℃ for 5 minutes, after which 8 ug of Streptomyces genomic DNA that had undergone double-strand breakage was added and reacted at 37℃ for 2 hours.

5.5 obtaining of target DNA fragment (same as in example 1)

5.6 quantitative analysis of target DNA fragments

After de-crosslinking, the target DNA was quantified by q-PCR, with a separate primer pair used to amplify each target from the de-crosslinked DNA. The results showed that UdgX-SSBE3 can specifically capture DNA fragments containing the target editing sites in the Streptomyces genome, in amounts of 0.095 ng (Target 8), 0.111 ng (Target 9), 0.004 ng (Target 10), 0.005 ng (Target 11), 0.069 ng (Target 12), 0.035 ng (Target 13), 0.007 ng (Target 14), 0.034 ng (Target 15), 0.19 ng (Target 16), 0.145 ng (Target 2), 0.005 ng (Target 17), 0.082 ng (Target 18), 0.264 ng (Target 19), 0.005 ng (Target 20), 0.016 ng (Target 21), 0.103 ng (Target 22), 0.164 ng (Target 23), 0.292 ng (Target 24), 0.549 ng (Target 25) and 0.493 ng (Target 26) (FIG. 11).

5.7 second Generation sequencing of target DNA fragments

Using the de-crosslinked target DNA as a template, a second-generation sequencing library was constructed with a library construction kit from Vazyme (VAHTS Universal DNA Library Prep Kit for Illumina V3), and the constructed DNA library was sequenced by Novogene Co., Ltd. The sequencing results showed that 20 target sites could be sequenced, with coverage of 21105-1744840, and that a long fragment of 10 kb could be assembled (FIG. 12).

Example 6 simultaneously capturing long fragments of target DNA (10 kb) from a human saliva genomic DNA sample.

6.1 expression and purification of UdgX protein (same as in example 1)

6.2 acquisition of genomic DNA samples

Human saliva genomic DNA was extracted using a saliva genomic DNA rapid extraction kit (Biomed).

6.3 acquisition of sgRNA fragments

Forward and reverse primers were synthesized, and the sgDNA fragments (Target 27-Target 41) corresponding to the target editing sites were amplified from the plasmid template by PCR.

6.4 editing and binding of UdgX protein and DNA fragments

UdgX-SSBE3 protein bound to Snap magnetic beads and the sgRNAs were added to 0.5 mL of reaction buffer (50 mM Tris-HCl, pH 8, 50 mM NaCl, 1 mM Na₂EDTA, 1 mM DTT, 25 ug/ml BSA) and incubated at 25℃ for 5 minutes, after which 8 ug of human saliva genomic DNA that had undergone double-strand breakage was added and reacted at 37℃ for 2 hours.

6.5 obtaining of target DNA fragment (same as in example 1)

6.6 quantitative analysis of target DNA fragments

After de-crosslinking, the target DNA was quantified by q-PCR, with a separate primer pair used to amplify each target from the de-crosslinked DNA. The results showed that UdgX-SSBE3 can specifically capture DNA fragments containing the target editing sites in the human saliva genome, in amounts of 0.00005 ng (Target 27), 0.00005 ng (Target 28), 0.00008 ng (Target 29), 0.0002 ng (Target 30), 0.0006 ng (Target 31), 0.0002 ng (Target 32), 0.002 ng (Target 33), 0.0013 ng (Target 34), 0.001 ng (Target 35), 0.0003 ng (Target 36), 0.0023 ng (Target 37), 0.0012 ng (Target 38), 0.0041 ng (Target 39), 0.0037 ng (Target 40) and 0.003 ng (Target 41) (FIG. 13).

6.7 second Generation sequencing of target DNA fragments

Using the de-crosslinked target DNA as a template, a second-generation sequencing library was constructed with a library construction kit from Vazyme (VAHTS Universal DNA Library Prep Kit for Illumina V3), and the constructed DNA library was sequenced by Novogene Co., Ltd. The sequencing results showed that 20 target sites could be sequenced, with coverage of 1000-2909520, and that long fragments of 10 kb could be assembled (FIG. 14).

Claims

1. A fusion protein formed by fusing a cytosine base editor and a uracil binding protein;

more preferably, the uracil-binding protein is UdgX, in particular a UdgX comprising the sequences of motif A (GEQPG) and motif B (HPSSLL) and the conserved region KRRIH; more specifically, the uracil-binding protein UdgX is derived from M. smegmatis, M. avium, R. imtechensis, M. haemophilum, Rhodococcus spp., Streptomyces coelicolor, Gordonia namibiensis, Bradyrhizobium japonicum and Nocardia farcinica; more preferably, its amino acid sequence is shown in SEQ ID No. 22;

the cytosine base editor is a BE3 editor (e.g., VQR-BE3, VRER-BE3 or EQR-BE3, which recognize different PAM types), an AID editor, or an optimized SSBE3 editor thereof;

the cytosine base editor and the uracil binding protein are connected through a linker; specifically, the amino acid sequence of the linker is shown in SEQ ID No. 24.

2. The fusion protein of claim 1, wherein the fusion protein has an amino acid sequence as set forth in SEQ ID No. 1.

3. The fusion protein of claim 1 or 2, wherein the fusion protein further has a purification tag to facilitate isolation and/or purification; specifically, the purification tag comprises a Snap-tag, His-tag, Flag-tag, MBP-tag or biotin tag for protein purification.

4. A polynucleotide encoding a fusion protein according to any one of claims 1 to 3, an expression vector and a recombinant host cell, in particular the polynucleotide sequence is shown in SEQ ID No.2.

5. A method of capturing target DNA from a sample, the method comprising:

1) Contacting, e.g. mixing, a sample containing a target sequence DNA having PAM and cytosine C at a specific position with its targeting sgRNA and a fusion protein according to any one of claims 1 to 3 to obtain a fusion protein-target DNA complex;

2) Recovering the enriched target sequence DNA, preferably wherein the target sequence DNA is double stranded DNA or single stranded DNA or a mixture of both;

preferably, step 2) further comprises binding the fusion protein-target DNA complex to an affinity matrix, preferably magnetic beads, more preferably Snap magnetic beads;

the specific position containing cytosine C means 3 to 25 bases, preferably 5 to 20 bases, more preferably 11 to 17 bases, upstream of the 5' end of the PAM site.

6. The method of claim 5, wherein the sample is from a plant, animal and/or microorganism, such as from a bacterium, archaebacteria, protozoa, fungus, mammal, amphibian, bird, reptile, insect and/or invertebrate, such as from a human, more preferably the sample is from a subject's blood, serum, serosal fluid, plasma, lymph, urine, cerebrospinal fluid, saliva, mucous secretions of secretory tissues and organs, vaginal secretions, milk, tears and/or ascites;

in particular, wherein the target sequence DNA is exogenous DNA relative to the subject, such as DNA from a pathogenic microorganism, e.g., a virus, bacterium, and/or fungus, or endogenous DNA of the subject, e.g., DNA from the subject comprising a genetic mutation, e.g., DNA comprising a pathogenic genetic mutation;

in addition, preferably, the sample comprises a genomic DNA sample, a cell-free DNA sample, an environmental genomic sample, and/or a mixed genomic DNA sample.

7. The method of claim 5, wherein in step 1) of the method, a sample containing the target sequence DNA is contacted with the fusion protein in combination with a plurality of different targeting sgRNAs to obtain uracil-binding protein-target DNA complexes; specifically, the concentration of each sgRNA is 0.1 ng/uL-1 ng/uL, 0.2 ng/uL-2 ng/uL, or 2 ng/uL-20 ng/uL.

8. A method of detecting a target DNA, the method comprising the steps of capturing the target DNA by the method of any one of claims 5 to 7, and detecting the presence or absence of the target DNA;

preferably, the step of capturing the target DNA is followed by a step of amplifying the captured target DNA.

9. Use of a fusion protein according to any one of claims 1 to 3 for the enrichment and/or detection, from a sample, of DNA containing a target sequence having a PAM and a cytosine C at a specific position.

10. Use of a fusion protein according to any one of claims 1 to 3 for the preparation of a reagent for capturing DNA containing a target sequence from a sample.