CN116949014A - UdgX-SSBE3 protein and method for capturing specific nucleic acid by using same - Google Patents
UdgX-SSBE3 protein and method for capturing specific nucleic acid by using same
- Publication number
- CN116949014A (application number CN202310860511.XA)
- Authority
- CN
- China
- Prior art keywords
- dna
- target
- sample
- fusion protein
- protein
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N9/00—Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
- C12N9/14—Hydrolases (3)
- C12N9/24—Hydrolases (3) acting on glycosyl compounds (3.2)
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N15/00—Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
- C12N15/09—Recombinant DNA-technology
- C12N15/11—DNA or RNA fragments; Modified forms thereof; Non-coding nucleic acids having a biological activity
- C12N15/113—Non-coding nucleic acids modulating the expression of genes, e.g. antisense oligonucleotides; Antisense DNA or RNA; Triplex- forming oligonucleotides; Catalytic nucleic acids, e.g. ribozymes; Nucleic acids used in co-suppression or gene silencing
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N9/00—Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
- C12N9/14—Hydrolases (3)
- C12N9/78—Hydrolases (3) acting on carbon to nitrogen bonds other than peptide bonds (3.5)
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
- C12Q1/6806—Preparing nucleic acids for analysis, e.g. for polymerase chain reaction [PCR] assay
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Y—ENZYMES
- C12Y305/00—Hydrolases acting on carbon-nitrogen bonds, other than peptide bonds (3.5)
- C12Y305/04—Hydrolases acting on carbon-nitrogen bonds, other than peptide bonds (3.5) in cyclic amidines (3.5.4)
- C12Y305/04001—Cytosine deaminase (3.5.4.1)
-
- C—CHEMISTRY; METALLURGY
- C07—ORGANIC CHEMISTRY
- C07K—PEPTIDES
- C07K2319/00—Fusion polypeptide
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N2310/00—Structure or type of the nucleic acid
- C12N2310/10—Type of nucleic acid
Landscapes
- Chemical & Material Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Health & Medical Sciences (AREA)
- Organic Chemistry (AREA)
- Engineering & Computer Science (AREA)
- Genetics & Genomics (AREA)
- Zoology (AREA)
- Wood Science & Technology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- General Engineering & Computer Science (AREA)
- Biotechnology (AREA)
- Biomedical Technology (AREA)
- Molecular Biology (AREA)
- Biochemistry (AREA)
- General Health & Medical Sciences (AREA)
- Microbiology (AREA)
- Physics & Mathematics (AREA)
- Medicinal Chemistry (AREA)
- Biophysics (AREA)
- Analytical Chemistry (AREA)
- Proteomics, Peptides & Aminoacids (AREA)
- Plant Pathology (AREA)
- Chemical Kinetics & Catalysis (AREA)
- Immunology (AREA)
- Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
Abstract
The present invention relates to a fusion protein of specific nucleic acid binding proteins and to a method of capturing specific nucleic acids. The fusion protein is formed by fusing a cytosine base editor with a uracil binding protein. The method captures DNA containing a specific target sequence (containing a PAM and a cytosine C at a specific position) from a sample and comprises the following steps: 1) contacting a sample containing the target DNA with the UdgX-SSBE3 protein to capture the target DNA, and 2) detecting the target DNA. The invention also provides the use of the base editing-uracil binding fusion protein to capture and/or detect DNA containing a specific target sequence from a sample. The method can capture target DNA from complex nucleic acid samples with high sensitivity and specificity, and has practical application value.
Description
Technical Field
The present disclosure relates to methods of capturing target nucleic acids. In particular, the present disclosure relates to methods and uses for capturing target nucleic acids using the UdgX-SSBE3 protein.
Background
Many methods and applications exist for capturing target nucleic acids from polynucleotide samples (e.g., from a whole genome). Complex DNA samples hinder specific recognition of a sequence of interest (e.g., by lowering the signal-to-noise ratio), so the sequence of interest must first be captured. In research work, specific regions have been captured using gene capture probes: probe sequences of a given length are designed across the region, each shifted a certain distance along the gene, the probes are synthesized artificially in large quantities, and the captured DNA products are sequenced (Calvo SE, Compton AG, Hershman SG, et al. Molecular diagnosis of infantile mitochondrial disease with targeted next-generation sequencing. Sci Transl Med, 2012, 4(118): 118ra10). Current methods for capturing specific nucleic acids involve multiple steps, require large amounts of nucleic acid sample and expensive instrumentation, and are cumbersome, difficult, time-consuming, labor-intensive, low in accuracy, and costly. For example, targeted DNA hybrid capture techniques require the design of specific single-stranded DNA more than 100 bp in length and a complex hybridization with the sample nucleic acids, under specific conditions, lasting tens of hours or more. Such long hybridization incubations significantly slow the capture process. In addition, higher incubation temperatures and prolonged incubation affect the salt ion concentration in the mixture, especially when the hybridization reaction volume is small. For example, existing solid-phase or liquid-phase targeted DNA hybrid capture technologies achieve target DNA enrichment of only 400-fold (Enrichment of sequencing targets from the human genome by solution hybridization. Ryan Tewhey, Masakazu Nakano, Xiaoyun Wang, Carlos Pabón-Peña, Barbara Novak, Angelica Giuffre, Eric Lin, Scott Happe, Doug N Roberts, Emily M LeProust, Eric J Topol, Olivier Harismendy and Kelly A Frazer. Genome Biol. 2009;10(10):R116. doi:10.1186/gb-2009-10-10-r116) and 1000-fold (Microarray-based genomic selection for high-throughput resequencing. David T Okou, Karyn Meltz Steinberg, Christina Middle, David J Cutler, Thomas J Albert & Michael E Zwick. Nat Methods. 2007 Nov;4(11):907-9). In addition, these processes are costly, have low specificity and low DNA capture efficiency, depend heavily on the operator's experimental technique, and cannot be automated. There is therefore a need to develop new methods for specific nucleic acid capture that are rapid, efficient, and low cost.
Disclosure of Invention
The invention first provides a fusion protein formed by fusing a cytosine base editor and a uracil binding protein, wherein the uracil binding protein is UdgX, and the cytosine base editor is a BE3 editor or an optimized SSBE3 editor thereof.
The uracil binding protein is a UdgX protein from M. smegmatis, M. avium, R. imtechensis, M. haemophilum, Rhodococcus, Streptomyces coelicolor, Gordonia namibiensis, Bradyrhizobium japonicum or Nocardia farcinica. More specifically, its amino acid sequence is shown in SEQ ID No. 22.
The cytosine base editor is a BE3-type editor, preferably the optimized BE3-type editor SSBE3, whose amino acid sequence is shown in SEQ ID NO: 23.
The uracil binding protein and the cytosine base editor are connected through a linker, and preferably the amino acid sequence of the linker is shown in SEQ ID NO: 24. In one embodiment, the amino acid sequence of the fusion protein is shown in SEQ ID No. 1.
In some embodiments, the fusion protein comprises a UdgX having the sequences of motif A (GEQPG) and motif B (HPSSLL) and comprising the conserved region KRRIH. In some embodiments, the fusion protein may be a protein comprising or consisting of the sequence of SEQ ID No.1 or the amino acid sequence encoded by SEQ ID No.2, or a variant thereof.
In some embodiments, the fusion protein may be a protein comprising or consisting of an amino acid sequence that is about 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or more, or 100%, identical to the sequence of SEQ ID No.1 or the amino acid sequence encoded by SEQ ID No.2. In some embodiments, the variant of the fusion protein has one or more unnatural amino acids, one or more amino acid substitutions, one or more amino acid insertions, one or more amino acid deletions, or any combination thereof at one or more positions as compared to the fusion protein. In some embodiments, the variant has substantially similar or comparable activity to the fusion protein. In some embodiments, variants of the UdgX-SSBE3 protein may have about 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or more sequence identity to the amino acid sequence of the fusion protein. As known to those skilled in the art, variants of the protein may be obtained by introducing conservative substitutions, deletions and additions of amino acids during preparation of the recombinant protein. The desired substitution, deletion or insertion may also be provided by altering specific codons of the coding sequence. Alternatively, protein variants may be prepared by random or saturation mutagenesis techniques such as alanine scanning mutagenesis, error-prone polymerase chain reaction mutagenesis and oligonucleotide-directed mutagenesis. In some embodiments, conservative substitutions include substitutions between amino acids of similar nature, such as substitutions among the hydrophobic amino acids Nle, Met, Ala, Val, Leu and Ile, among the neutral hydrophilic amino acids Cys, Ser, Thr, Asn and Gln, between the acidic amino acids Asp and Glu, among the basic amino acids His, Lys and Arg, between the amino acids that influence chain orientation, Gly and Pro, and among the aromatic amino acids Trp, Tyr and Phe.
In some embodiments, non-conservative substitutions between the above types may be included.
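The grouping of conservative substitutions described above can be illustrated with a small lookup. The following is only a minimal sketch: the group names and the function names `group_of` and `is_conservative` are hypothetical labels chosen for illustration, and the group membership simply restates the residues listed in the text (three-letter codes, with Nle standing for norleucine).

```python
# Residue groups restated from the text above (three-letter codes).
# The dictionary keys are illustrative labels, not terms defined by the patent.
CONSERVATIVE_GROUPS = {
    "hydrophobic": {"Nle", "Met", "Ala", "Val", "Leu", "Ile"},
    "neutral hydrophilic": {"Cys", "Ser", "Thr", "Asn", "Gln"},
    "acidic": {"Asp", "Glu"},
    "basic": {"His", "Lys", "Arg"},
    "chain orientation": {"Gly", "Pro"},
    "aromatic": {"Trp", "Tyr", "Phe"},
}


def group_of(residue):
    """Return the name of the group containing `residue`, or None if unlisted."""
    for name, members in CONSERVATIVE_GROUPS.items():
        if residue in members:
            return name
    return None


def is_conservative(original, replacement):
    """True if the two residues differ but belong to the same listed group."""
    group = group_of(original)
    return (
        group is not None
        and original != replacement
        and group == group_of(replacement)
    )
```

Under this sketch, a Leu-to-Ile change counts as conservative, whereas an Asp-to-Lys change crosses groups and would fall under the non-conservative substitutions mentioned above.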
In some embodiments, the fusion protein also has a purification tag, which facilitates isolation and/or purification. In some embodiments, purification tags that may be used include tags commonly used for protein purification, such as Snap-tag, His-tag, FLAG-tag, MBP-tag, biotin, and the like.
The invention thus also provides polynucleotides encoding the fusion proteins, expression vectors and recombinant host cells. In particular, the polynucleotide sequence is shown in SEQ ID No.2.
Furthermore, the present invention provides a method for capturing a specific nucleic acid, and more particularly a method, and applications thereof, in which the fusion protein, combining a protein that produces uracil by base editing with a uracil-binding protein, is used to capture DNA containing a specific target sequence (containing a PAM and a cytosine C at a specific position). Here, "containing a PAM" means that the target sequence contains an NGG, NG, or other type of PAM sequence, and that the sequence preceding the PAM is 8-30 bp long, preferably 20 bp preceding an NGG PAM. "Containing a cytosine C at a specific position" means that a C is located 3 to 25 bases, preferably 5 to 20 bases, more preferably 11 to 17 bases, upstream of the 5' end of the PAM site.
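The target-sequence criteria above (a PAM, a preceding sequence of fixed length, and a C within a given distance upstream of the PAM) lend themselves to a simple sequence scan. The following is a minimal sketch under stated assumptions, not a procedure taken from the disclosure: the function name `find_capture_sites` and its parameters are hypothetical, only the NGG PAM and the given strand are considered, and the defaults use the preferred values from the text (20 bp preceding the PAM, C located 11 to 17 bases upstream of the 5' end of the PAM).

```python
import re


def find_capture_sites(seq, spacer_len=20, c_min=11, c_max=17):
    """Scan one strand for NGG PAMs preceded by a qualifying target sequence.

    Returns a list of tuples (start, spacer, pam, c_distances), where `start`
    is the 0-based index of the spacer and `c_distances` lists how many bases
    upstream of the PAM's 5' end each qualifying C lies (1 = adjacent base).
    """
    if c_min < 1 or c_max > spacer_len:
        raise ValueError("C window must lie within the spacer")
    seq = seq.upper()
    sites = []
    # A lookahead is used so that overlapping PAMs are all reported.
    for match in re.finditer(r"(?=([ACGT]GG))", seq):
        pam_start = match.start()
        if pam_start < spacer_len:
            continue  # not enough sequence upstream of this PAM
        spacer = seq[pam_start - spacer_len:pam_start]
        c_distances = [
            d for d in range(c_min, c_max + 1) if spacer[spacer_len - d] == "C"
        ]
        if c_distances:
            sites.append((pam_start - spacer_len, spacer, match.group(1), c_distances))
    return sites
```

A fuller implementation would presumably also scan the reverse-complement strand and support other PAM types such as NG; both are omitted here to keep the sketch short.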
In one aspect, provided herein is a method of capturing target DNA from a sample, the method comprising:
1) Contacting a polynucleotide sample containing the target DNA with a specific fusion protein having base editing and uracil binding functions and with sgRNAs targeting the target sequence to obtain a fusion protein-target DNA complex;
2) Detecting and capturing the target DNA.
In a specific embodiment, in step 1) of the method, a target sequence DNA (containing a PAM and a cytosine C at a specific position) is selected for capture, and sgRNAs targeting the target DNA direct the fusion protein to the target sequence, such that the target base C in the target codon is edited to U and the fusion protein can covalently bind the target base U in the target sequence, thereby obtaining a fusion protein-target DNA complex.
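The C-to-U editing step can be pictured with a toy string transformation. This is a deliberately simplified sketch: the function name `edit_window_c_to_u` is hypothetical, the window defaults reuse the more preferred 11 to 17 bases upstream of the PAM stated earlier in this disclosure, and every C in the window is converted, whereas real base editing is not expected to be complete or uniform.

```python
def edit_window_c_to_u(spacer, c_min=11, c_max=17):
    """Return `spacer` with each C inside the window replaced by U.

    `spacer` is the sequence immediately 5' of the PAM. Distances are counted
    upstream from the PAM's 5' end, so the last base of `spacer` has distance 1.
    """
    n = len(spacer)
    if c_min < 1 or c_max > n:
        raise ValueError("window must lie within the spacer")
    edited = []
    for i, base in enumerate(spacer.upper()):
        distance = n - i  # 1 = base adjacent to the PAM
        if base == "C" and c_min <= distance <= c_max:
            edited.append("U")  # idealized deamination of C to U
        else:
            edited.append(base)
    return "".join(edited)
```

In this picture, the U residues produced inside the window are the positions at which the UdgX part of the fusion protein could bind, while a C outside the window is left unchanged.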
In a specific embodiment, washing the fusion protein-target DNA complex at least 3 times, preferably with PBS buffer or Tris-HCl (pH 8.0), is also comprised in step 1) of the method.
In a specific embodiment, step 2) of the method further comprises binding the fusion protein-target DNA complex to an affinity matrix, preferably magnetic beads, more preferably Snap magnetic beads.
In some embodiments, a polynucleotide sample containing the target DNA may be mixed with the fusion protein and the corresponding sgRNA and contacted to capture the target DNA. In some embodiments, a solution containing the fusion protein and the corresponding sgRNAs may be added to a polynucleotide sample containing the target DNA, or the sample may be added to a solution containing the fusion protein and the corresponding sgRNAs, so that they are contacted to capture the target DNA.
In some embodiments, the DNA of interest is double-stranded DNA or single-stranded DNA or a mixture of both comprising DNA of the target sequence (PAM-containing and cytosine C at a specific position). In some embodiments, the DNA of interest contains one or more naturally occurring or artificially added target sequence DNAs (containing PAM and cytosine C at a specific position). In some embodiments, the target DNA is exogenous DNA or endogenous DNA relative to the subject. In some embodiments, the DNA of interest is from a human, a microorganism, an animal, or a plant. The method of the present invention can advantageously achieve capture and detection of target DNA present in a sample at a low abundance. The method of the present invention is directed to capturing target DNA in a sample, i.e. to solving the problem of obtaining low abundance target DNA in a sample.
In some embodiments, the fusion protein may include a protein subunit that specifically binds uracil, such as the UdgX protein. In some embodiments, the UdgX-SSBE3 protein binds specifically to the uracil produced by base editing, either without binding to other bases or with greater affinity than for other bases. From the prior art, one of ordinary skill in the art can determine that uracil binding proteins have a strong affinity for uracil in DNA (A unique uracil-DNA binding protein of the uracil DNA glycosylase superfamily. Pau Biak Sang, Thiruneelakantan Srinath, Aravind Goud Patil, Eui-Jeon Woo and Umesh Varshney. Nucleic Acids Research. 2015 Sep 30;43(17):8452-63. doi:10.1093/nar/gkv854).
In some embodiments, the method comprises isolating a complex of the fusion protein and the target DNA to obtain the target DNA. In some embodiments, the complex of fusion protein and target DNA may be isolated by any suitable separation method. For example, in some embodiments, complexes of fusion proteins and target DNA may be separated from complex nucleic acid samples by affinity separation techniques, and then recovered to obtain captured target DNA. Specific binding between molecules is called affinity, and techniques for purifying biomolecules using affinity are called affinity separation techniques. In some embodiments, the separation relies on the difference between the affinity of a stationary-phase ligand for the target molecule and its affinity for other molecules. In some embodiments, the complex of fusion protein and target DNA may be obtained by chromatographic separation, such as affinity column chromatography. In some embodiments, the complex of uracil binding protein and target DNA can be obtained by magnetic bead separation, such as with affinity magnetic beads. In some embodiments, the fusion protein has a capture tag, which facilitates isolation and/or purification. In some embodiments, capture tags that may be used include tags commonly used for protein capture, such as Snap-tag, His-tag, FLAG-tag, MBP-tag, biotin, and the like. In some embodiments, complexes of fusion proteins and target DNA can be obtained by affinity separation using Snap magnetic beads (e.g., Snap magnetic beads commercially available from NEB). In some embodiments, the complex of fusion protein and target DNA is washed at least 3 more times with lysis buffer after binding to magnetic beads or affinity columns. In some embodiments, the lysis buffer is PBS buffer or Tris-HCl (pH 8.0).
It has been found that de-crosslinking is preferably performed after removal of impurities, which facilitates capturing and/or detecting the target DNA with increased sensitivity and specificity.
In some embodiments, the method may further comprise the step of de-crosslinking the fusion protein-DNA fragment complex. In some embodiments, this step may be accomplished by, for example, adding a proteolytic enzyme, such as proteinase K, to the eluate containing the fusion protein-DNA fragment complex. In some embodiments, the method may further comprise amplifying the captured target DNA. Methods for amplifying nucleic acids of interest are widely known in the art and include, for example, PCR amplification, isothermal amplification, and the like.
In some embodiments, the source of the sample is not particularly limited as long as it is a sample that may contain the target DNA. In some embodiments, the sample comprises a genomic DNA sample, a cell-free DNA sample, an environmental genomic sample, and/or a mixed genomic DNA sample. In some embodiments, the sample comprises double stranded DNA, single stranded DNA, or a mixture of both. In some embodiments, the sample may be a polynucleotide sample, such as a sample containing mixed nucleic acids. In some embodiments, the sample may be from a variety of organisms, such as humans, plants, animals, microorganisms, and the like. In some embodiments, the sample may be from bacteria, archaebacteria, protist, fungi, or the like. In some embodiments, the sample may be an environmental genome sample (metagenomic sample). In some embodiments, the sample may be directly or indirectly from the subject. For example, the sample may be collected directly from the subject, or may be from an isolated sample obtained from the subject. In some embodiments, the sample may be from an animal subject, such as a mammalian subject. In some embodiments, the sample may be from a primate, laboratory animal, farm animal, livestock or pet. In some embodiments, the sample may be from a mammal, a marine animal, an amphibian, a bird, a reptile, an insect, and other invertebrates. In some embodiments, the sample may be from a human subject. In some embodiments, the sample may be from the subject's blood, serum, serosal fluid, plasma, lymph, urine, cerebrospinal fluid, saliva, mucous secretions of secretory tissues and organs, vaginal secretions, milk, tears, ascites, for example, fluid from the pleura, pericardium, peritoneum, abdomen, or other body cavities. In some embodiments, the sample may be cell free DNA obtained from a body fluid of a subject, such as plasma or serum. In some embodiments, the sample may be a genomic DNA sample isolated from the source described above. 
In some embodiments, the sample may be a mixed sample obtained from multiple sources, such as a mixed genomic DNA sample. In some embodiments, the sample may be a bacterial and/or human genomic DNA sample. In some embodiments, the target DNA is in a fetal cell fraction of cell free DNA, and wherein the cell free DNA is from maternal plasma.
In some embodiments, capturing target DNA refers to the process of obtaining a higher percentage of target DNA in a population of polynucleotides. In some embodiments, the percentage of target DNA is increased by about 5%, 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, or more than 90%. In some embodiments, the percentage of target DNA is increased by about 2-fold, 5-fold, 10-fold, 50-fold, or 100-fold. In some embodiments, uracil-containing target DNA can be enriched by the methods of the invention by a factor of more than 200, 300, 400, 500, 600, 700, 800, 900, 1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000, 10000, 20000, 30000, 40000, 50000, 60000, 70000, 80000, 90000, 100000, 200000, 300000, 400000, 500000, or more. In some embodiments, target DNA may be captured and/or detected by the methods of the present invention from a sample having a low content of target DNA. For example, the target DNA is captured and/or detected from a sample in which the ratio of target DNA to total DNA is 1/400, 1/500, 1/600, 1/700, 1/800, 1/900, 1/1000, 1/2000, 1/3000, 1/4000, 1/5000, 1/6000, 1/7000, 1/8000, 1/9000, 1/10000, 1/20000, 1/30000, 1/40000, 1/50000, 1/60000, 1/70000, 1/80000, 1/90000, 1/100000, 1/200000, 1/300000, 1/400000, 1/500000 or less. In some embodiments, the methods of the invention can capture and/or detect target DNA containing a gene mutation present at as little as 0.2% (or lower), for example in a sample containing about 99.8% (or more) non-mutated DNA. In some embodiments, the methods of the invention have been found to be capable of capturing and/or detecting target DNA with high sensitivity and/or specificity from complex nucleic acid samples.
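The relationship between a starting target fraction and an enrichment factor can be illustrated with simple arithmetic. The following is a minimal sketch, assuming that enrichment is expressed as the ratio of the target fraction after capture to the target fraction before capture; the function name and the post-capture fraction used in the example are hypothetical and are not taken from the experiments.

```python
def fold_enrichment(fraction_before: float, fraction_after: float) -> float:
    """Ratio of the target-DNA fraction after capture to the fraction before capture."""
    if not 0 < fraction_before <= 1:
        raise ValueError("fraction_before must lie in (0, 1]")
    if not 0 <= fraction_after <= 1:
        raise ValueError("fraction_after must lie in [0, 1]")
    return fraction_after / fraction_before


# Target present at 1/1000 of total DNA (one of the ratios listed in the text);
# the post-capture fraction of 0.5 is a HYPOTHETICAL value for illustration only.
before = 1 / 1000
after = 0.5
print(f"{fold_enrichment(before, after):.0f}-fold enrichment")
```

Under this definition the achievable factor is bounded by the starting fraction: a target present at 0.2% of the sample can be enriched by at most 500-fold, the point at which the captured material is pure target.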
In some embodiments, provided herein is a method of detecting a target DNA, the method comprising obtaining a captured target DNA by a method described herein, and then detecting the target DNA. In some embodiments, the method comprises amplifying the captured target DNA after obtaining the captured target DNA. In some embodiments, the presence and/or sequence of the target DNA may be detected by methods such as hybridization, PCR, isothermal amplification, sequencing, and/or a biochip.
In some embodiments, the fusion protein is used to capture and/or detect target DNA from a sample. In some embodiments, the fusion protein or composition comprising a base editing-pyrimidine binding protein is used to capture and/or detect DNA comprising a target sequence (PAM-containing and cytosine C-containing at a specific position) from a sample. In some embodiments, the fusion protein is used to prepare a composition and/or kit for capturing and/or detecting DNA containing a target sequence (PAM-containing and cytosine C-containing at a specific position). In some embodiments, it relates to the use of a fusion protein, preferably as defined above, for the preparation of a reagent for capturing a DNA comprising a target sequence (PAM-containing and cytosine C at a specific position) from a sample, preferably as defined above.
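A target sequence as described here must contain a PAM and a cytosine at a specific position relative to it; the claims give the most preferred position as 11 to 17 bases upstream of the 5' end of the PAM. The sketch below shows how candidate sites could be screened in a sequence. It is an illustration only: the NGG PAM is an assumption (the PAM actually recognized depends on the editor variant), the function name is hypothetical, and only the given strand is scanned.

```python
import re


def find_candidate_sites(seq: str, window=(11, 17)):
    """Return (pam_start, [cytosine indices]) for each NGG PAM on the given strand.

    Indices are 0-based positions in `seq`. A distance of 1 means the base
    immediately 5' of the PAM; a cytosine qualifies when its distance to the
    first PAM base lies within `window` (inclusive).
    """
    seq = seq.upper()
    lo, hi = window
    sites = []
    # A zero-width lookahead is used so that overlapping PAMs are all reported.
    for match in re.finditer(r"(?=[ACGT]GG)", seq):
        pam_start = match.start()
        cytosines = [
            pam_start - d
            for d in range(hi, lo - 1, -1)
            if pam_start - d >= 0 and seq[pam_start - d] == "C"
        ]
        if cytosines:
            sites.append((pam_start, cytosines))
    return sites


# Hypothetical 20-nt protospacer with a single C, followed by a TGG PAM.
example = "AAAAC" + "A" * 15 + "TGG"
print(find_candidate_sites(example))  # [(20, [4])]
```

In the example the cytosine sits 16 bases upstream of the PAM, inside the 11 to 17 base window. To cover both strands, the same function could be applied to the reverse complement of the sequence.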
The method can capture target DNA from complex nucleic acid samples with high sensitivity and specificity, can capture a plurality of target DNA and long fragment target DNA at the same time, and has great application value.
Drawings
Fig. 1: the UdgX-SSBE3 protein captures DNA containing the sequence of interest.
Fig. 2: expression and purification of UdgX-SSBE3 protein.
Fig. 3: DNA sample concentration standard curve.
Fig. 4: q-PCR quantitative analysis of DNA.
Fig. 5: q-PCR quantitative analysis of the captured target DNA in the E. coli genome.
Fig. 6: q-PCR quantitative analysis of captured target DNA in the Streptomyces genome.
Fig. 7: effect of the number of different sgRNAs added on the relative DNA capture rate.
Fig. 8: q-PCR quantitative analysis of simultaneous capture of multiple target DNAs in the Streptomyces genome.
Fig. 9: sequencing coverage analysis of simultaneous capture of multiple target DNAs in the Streptomyces genome.
Fig. 10: effect of the concentration of sgRNA added on the relative DNA capture rate.
Fig. 11: q-PCR quantitative analysis of captured long-fragment target DNA in the Streptomyces genome.
Fig. 12: sequencing coverage analysis of the capture of long-fragment target DNA in the Streptomyces genome.
Fig. 13: q-PCR quantitative analysis of captured target DNA in the human saliva genome.
Fig. 14: sequencing coverage analysis of simultaneous capture of target DNA in the human saliva genome.
Detailed Description
The invention will be further illustrated by the following examples for a better understanding of the invention, but without limiting the same.
EXAMPLE 1 editing and Capture of target DNA
1.1 expression and purification of UdgX-SSBE3 proteins
In the fusion protein, UdgX is a uracil-binding protein whose amino acid sequence is SEQ ID NO: 22, and SSBE3 is a cytosine base editor optimized from a generic BE3-type base editor (for the amino acid sequence, see SEQ ID NO: 23). The two are connected through a linker (SEQ ID NO: 24); the amino acid sequence of the resulting fusion is shown in SEQ ID No. 1, and the encoding nucleotide sequence is shown in SEQ ID No. 2.
The UdgX-SSBE3 gene sequence (SEQ ID No. 2) was cloned into the expression vector pSnap-tag (T7) 2-His (containing the Snap tag) to obtain plasmid pSnap-UdgX-SSBE3. The plasmid pSnap-UdgX-SSBE3 was transformed into E. coli BL21. BL21 cells containing plasmid pSnap-UdgX-SSBE3 were inoculated into LB medium containing 0.05% FeCl3 and cultured at 37 ℃ until the OD600 reached 0.6; IPTG was then added to a final concentration of 0.5 mM in the culture, and culture was continued at 16 ℃ overnight. 20 mL of the culture was centrifuged at 5000 rpm for 30 min, the supernatant was discarded, and the pellet was resuspended in 2 mL of lysis buffer (50 mM Tris-HCl buffer, 50 mM NaCl, pH 8). Lysozyme was added to a final concentration of 0.1 mg/mL, and the mixture was left on ice for 30 min and then sonicated. The lysate was centrifuged at 5000 rpm for 30 min, and the supernatant was bound to Snap magnetic beads (e.g., from NEB) overnight at 4 ℃. 2 mL of buffer (50 mM Tris-HCl buffer, 50 mM NaCl, pH 8) was added to the centrifuge tube containing the magnetic beads, the tube was gently inverted several times to resuspend the beads, and the beads were magnetically separated; the beads were washed three times in this way to remove proteins not bound to them.
The UdgX-SSBE3 protein was detected by SDS-PAGE (FIG. 2). SDS-PAGE was run on the SSBE3 protein supernatant after sonication and on the SSBE3 protein supernatant after overnight binding to Snap magnetic beads at 4 ℃. As can be seen from FIG. 2, the protein supernatant after sonication contains the target protein SSBE3, and the content of the target protein in the supernatant is reduced after overnight binding to Snap magnetic beads at 4 ℃. This shows that the target protein SSBE3 can bind to the Snap magnetic beads.
1.2 acquisition of DNA fragments
The forward primer (SEQ ID No. 3) and the reverse primer (SEQ ID No. 4) were synthesized and amplified from the plasmid template by PCR to obtain a GFP fragment (SEQ ID No. 5) containing the editing site of interest.
The forward primer (SEQ ID No. 6) and the reverse primer (SEQ ID No. 4) were synthesized and amplified from the plasmid template by PCR to obtain a GFP fragment (SEQ ID No. 7) without the editing site of interest.
1.3 acquisition of sgRNA fragments
The forward primer (SEQ ID No. 8) and the reverse primer (SEQ ID No. 9) were synthesized and amplified from the plasmid template by PCR to obtain the sgDNA fragment (SEQ ID No. 10) corresponding to the editing site of interest.
The sgDNA obtained in the preceding step was used as the template for transcription with a commercial sgRNA in vitro transcription kit to obtain the corresponding sgRNA.
1.4 editing and binding of UdgX-SSBE3 protein and DNA fragments
UdgX-SSBE3 protein bound to the Snap magnetic beads and the sgRNA were added to 0.5 mL of reaction buffer (50 mM Tris-HCl, pH 8, 50 mM NaCl, 1 mM Na2EDTA, 1 mM DTT, 25 ug/mL BSA) and incubated at 25 ℃ for 5 minutes; 500 ng of DNA after double-strand breakage was then added, and the reaction was carried out at 37 ℃ for 2 hours.
1.5 obtaining of target DNA fragments
After 2 hours of reaction at 37 ℃, the centrifuge tube was gently inverted several times to resuspend the beads, and the beads were magnetically separated. The reaction solution was removed, 1 mL of buffer (50 mM Tris-HCl buffer, 50 mM NaCl, pH 8) was added to the centrifuge tube containing the magnetic beads, the tube was gently inverted several times to resuspend the beads, and the beads were magnetically separated. This wash was repeated 3 times. Finally, 500 uL of buffer (50 mM Tris-HCl buffer, 50 mM NaCl, pH 8) was added, proteinase K was added to the resulting eluate of the UdgX-SSBE3 protein-DNA fragment complex to a final concentration of 100 ug/mL, and the solution was treated at 50 ℃ for 30 minutes to de-crosslink the UdgX-SSBE3 protein-DNA fragment complex, thereby obtaining the target DNA fragment.
1.6 quantitative analysis of DNA fragments
The DNA fragments after the decrosslinking were quantified by q-PCR.
q-PCR reaction system: distilled water 6.4 uL, SYBR Green Realtime PCR Master Mix 10 uL, upstream primer (SEQ ID No. 11, 10 uM) 0.8 uL, downstream primer (SEQ ID No. 12, 10 uM) 0.8 uL, sample solution 2 uL.
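The per-reaction volumes above add up to a 20 uL reaction with each primer at a final concentration of 0.4 uM. When several samples are run, the shared components are usually combined into one master mix; the sketch below scales the recipe for a given number of reactions. The 10% pipetting overage and the function name are assumptions for illustration and are not part of the described protocol.

```python
# Per-reaction volumes (uL) of the shared components, taken from the reaction system in the text.
PER_REACTION_UL = {
    "distilled water": 6.4,
    "SYBR Green master mix": 10.0,
    "upstream primer (10 uM)": 0.8,
    "downstream primer (10 uM)": 0.8,
}
SAMPLE_UL = 2.0  # sample solution, added individually to each well


def master_mix(n_reactions: int, overage: float = 0.1) -> dict:
    """Scale the shared components for n reactions plus an ASSUMED pipetting overage."""
    factor = n_reactions * (1 + overage)
    return {name: round(vol * factor, 2) for name, vol in PER_REACTION_UL.items()}


total_ul = sum(PER_REACTION_UL.values()) + SAMPLE_UL  # reaction volume per well
primer_final_um = 0.8 * 10 / total_ul                 # final concentration of each primer
print(f"{total_ul:.1f} uL per reaction, {primer_final_um:.2f} uM each primer")
print(master_mix(10))
```

After mixing, 18 uL of master mix would be dispensed per well before adding 2 uL of sample.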
Cycling conditions for q-PCR: 95 ℃ for 30 s; 40 PCR cycles of 95 ℃ for 5 s, 55 ℃ for 10 s and 72 ℃ for 15 s (data collection); followed by melting curve analysis.
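For record keeping, the cycling program can be written down as a small data structure. The sketch below does this and sums the programmed hold times; the step labels are conventional names added for readability, not terms used in the protocol, and ramping time and the melting curve stage are not included in the total.

```python
# Cycling program from the text as (step label, temperature in deg C, hold time in s).
# The step labels are ASSUMED conventional names.
INITIAL = [("initial denaturation", 95, 30)]
CYCLE = [
    ("denaturation", 95, 5),
    ("annealing", 55, 10),
    ("extension and data collection", 72, 15),
]
N_CYCLES = 40


def hold_time_seconds(initial, cycle, n_cycles):
    """Sum of programmed hold times; ramping and the melting curve are excluded."""
    return sum(t for _, _, t in initial) + n_cycles * sum(t for _, _, t in cycle)


print(hold_time_seconds(INITIAL, CYCLE, N_CYCLES) / 60, "min")  # 20.5 min
```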
The standard curve was drawn after dilution of DNA samples quantified by Nanodrop (fig. 3).
The standard substance concentrations are respectively as follows: 0.000001ng/ul,0.00001ng/ul,0.0001ng/ul,0.001ng/ul and 0.01ng/ul.
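A standard curve of this kind is normally a straight-line fit of Ct against the logarithm of the standard concentration, which is then inverted to read off unknown samples. The sketch below illustrates the calculation with the five standard concentrations listed above. The Ct values are hypothetical placeholders, since the measured values appear only in FIG. 3, and the variable and function names are assumptions.

```python
import math

# Standard concentrations (ng/uL) from the text; the Ct values are HYPOTHETICAL placeholders.
standards_ng_per_ul = [1e-6, 1e-5, 1e-4, 1e-3, 1e-2]
ct_values = [29.8, 26.5, 23.2, 19.9, 16.6]


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


log_conc = [math.log10(c) for c in standards_ng_per_ul]
slope, intercept = fit_line(log_conc, ct_values)


def ct_to_conc(ct: float) -> float:
    """Convert a sample Ct to a concentration (ng/uL) using the fitted curve."""
    return 10 ** ((ct - intercept) / slope)


efficiency = 10 ** (-1 / slope) - 1  # amplification efficiency implied by the slope
print(f"slope={slope:.2f}, intercept={intercept:.2f}, efficiency={efficiency:.1%}")
print(f"Ct 21.5 -> {ct_to_conc(21.5):.2e} ng/uL")
```

Multiplying the interpolated concentration by the volume of the recovered solution gives the captured amount in ng.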
The target DNA after the decrosslinking was quantified by q-PCR. The results showed that UdgX-SSBE3 can specifically capture GFP fragments containing the editing site of interest at a recovery concentration of about 314 times that of GFP fragments without the editing site of interest (FIG. 4).
EXAMPLE 2 isolation of target DNA from E.coli genomic DNA sample
2.1 expression and purification of UdgX protein (same as in example 1)
2.2 acquisition of genomic DNA samples
The forward primer (SEQ ID No. 3) and the reverse primer (SEQ ID No. 4) were synthesized and amplified from the plasmid template by PCR to obtain a GFP fragment (SEQ ID No. 5) containing the editing site of interest.
The forward primer (SEQ ID No. 6) and the reverse primer (SEQ ID No. 4) were synthesized and amplified from the plasmid template by PCR to obtain a GFP fragment (SEQ ID No. 7) without the editing site of interest.
E. coli DH5α genomic DNA was extracted using a genomic extraction kit (Biomega).
2.3 acquisition of sgRNA fragments (same as in example 1)
2.4 editing and binding of UdgX-SSBE3 proteins with DNA fragments
UdgX-SSBE3 protein bound to the Snap magnetic beads and the sgRNA were added to 0.5 mL of reaction buffer (50 mM Tris-HCl, pH 8, 50 mM NaCl, 1 mM Na2EDTA, 1 mM DTT, 25 ug/mL BSA) and incubated at 25 ℃ for 5 minutes. Then either 6 ug of digested E. coli genomic DNA together with GFP fragment containing the editing site (1 ng or 10 ng), or 6 ug of digested E. coli genomic DNA together with GFP fragment not containing the editing site (500 ng), was added to 1 mL of reaction buffer (50 mM Tris-HCl, pH 8, 50 mM NaCl, 1 mM Na2EDTA, 1 mM DTT, 25 ug/mL BSA) and reacted at 37 ℃ for 2 h.
2.5 obtaining of target DNA fragment (same as in example 1)
2.6 quantitative analysis of target DNA fragments
The de-crosslinked target DNA was quantified by q-PCR, with GFP amplified from the de-crosslinked DNA using the primers SEQ ID No. 11 and SEQ ID No. 12. The results showed that UdgX-SSBE3 can specifically capture GFP fragments containing the editing site of interest: the total amount recovered was about 21.3 times (6 ug genomic DNA : 10 ng GFP fragment containing the editing site) and 2.1 times (6 ug genomic DNA : 1 ng GFP fragment containing the editing site) that of GFP fragments not containing the editing site of interest (FIG. 5). Since the ratio of GFP fragment containing the editing site to GFP fragment not containing the editing site in the initial mixed DNA samples was 10:500 and 1:500, respectively, the specific capture rate of target DNA in this experiment was 1065-fold (21.3 × 500/10) and 1050-fold (2.1 × 500/1), respectively.
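The specific capture rate quoted above is the recovered ratio corrected for the unequal input amounts. The following minimal sketch reproduces that arithmetic with the values given in the text; the function name is an assumption for illustration.

```python
def specific_capture_rate(recovered_ratio: float,
                          input_with_site_ng: float,
                          input_without_site_ng: float) -> float:
    """Recovered ratio (with-site / without-site) divided by the input ratio."""
    input_ratio = input_with_site_ng / input_without_site_ng
    return recovered_ratio / input_ratio


# Values from the text: recovered ratios of 21.3 and 2.1; inputs of 10 ng or 1 ng
# of GFP fragment with the editing site versus 500 ng without it.
print(round(specific_capture_rate(21.3, 10, 500)))  # 1065
print(round(specific_capture_rate(2.1, 1, 500)))    # 1050
```

The two conditions differ tenfold in the amount of site-containing fragment yet give nearly the same corrected value, which is why the text reports them side by side.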
Example 3 isolation of target DNA from Streptomyces genomic DNA samples.
3.1 expression and purification of UdgX protein (same as in example 1)
3.2 acquisition of genomic DNA samples
Streptomyces genomic DNA was extracted using a genome extraction kit (Biomega).
3.3 acquisition of sgRNA fragments
The forward primer (SEQ ID No.13, SEQ ID No.14 and SEQ ID No. 15) and the reverse primer (SEQ ID No. 9) were synthesized and amplified from the plasmid template by PCR to obtain sgDNA fragments (Act, redD and RedN) corresponding to the target editing site.
The sgDNA obtained in the preceding step was used as the template for transcription with a commercial sgRNA in vitro transcription kit to obtain the corresponding sgRNA.
3.4 editing and binding of UdgX proteins and DNA fragments
UdgX-SSBE3 protein bound to the Snap magnetic beads and one of the sgRNAs (Act, RedD or RedN) were added, in separate reactions, to 0.5 mL of reaction buffer (50 mM Tris-HCl, pH 8, 50 mM NaCl, 1 mM Na2EDTA, 1 mM DTT, 25 ug/mL BSA) and incubated at 25 ℃ for 5 minutes, after which 8 ug of Streptomyces genomic DNA after double-strand breakage was added and reacted at 37 ℃ for 2 hours.
3.5 obtaining of target DNA fragment (same as in example 1)
3.6 quantitative analysis of target DNA fragments
The target DNA after the decrosslinking was quantified by q-PCR, and primers (SEQ ID No.16 and SEQ ID No. 17), primers (SEQ ID No.18 and SEQ ID No. 19) and primers (SEQ ID No.20 and SEQ ID No. 21) were used to amplify (Act, redD and RedN) from the DNA after the decrosslinking, respectively. The results showed that UdgX-SSBE3 can specifically capture DNA fragments containing the editing site of interest in the genome of Streptomyces, with capture amounts of 0.007ng (Act), 0.004ng (RedD) and 0.01ng (RedN) (FIG. 6).
The experiment of Example 3 was repeated with one change: the 3 sgRNAs (each at a concentration of 1-10 ng/uL, 1 uL of each) were added together to a single reaction system so that the 3 target DNAs were captured simultaneously. The results of the repeated experiment show that, relative to the separate-capture results of Example 3, the capture efficiency of the Act gene was reduced to 25%, that of the RedD gene to 28%, and that of the RedN gene to 23% (FIG. 7).
Example 4 multiple target DNA were isolated simultaneously from Streptomyces genomic DNA samples.
4.1 expression and purification of UdgX protein (same as in example 1)
4.2 acquisition of genomic DNA samples
Streptomyces genomic DNA was extracted using a genome extraction kit (Biomega).
4.3 acquisition of sgRNA fragments
The forward and reverse primers were synthesized and amplified from the plasmid template by PCR to obtain sgDNA fragments (Target 1-Target 7) corresponding to the Target editing sites.
The sgDNA obtained in the preceding step was used as the template for transcription with a commercial sgRNA in vitro transcription kit to obtain the corresponding sgRNA.
4.4 editing and binding of UdgX proteins and DNA fragments
UdgX-SSBE3 protein bound to the Snap magnetic beads and 10 sgRNAs (Target 1-7, Act, RedD and RedN) were added to 0.5 mL of reaction buffer (50 mM Tris-HCl, pH 8, 50 mM NaCl, 1 mM Na2EDTA, 1 mM DTT, 25 ug/mL BSA) and incubated at 25 ℃ for 5 minutes, after which 8 ug of Streptomyces genomic DNA after double-strand breakage was added and reacted at 37 ℃ for 2 hours.
4.5 obtaining of target DNA fragment (same as in example 1)
4.6 quantitative analysis of target DNA fragments
The target DNA after the decrosslinking is quantified by q-PCR, and primers are respectively applied to amplify the target DNA from the DNA after the decrosslinking. The results showed that UdgX-SSBE3 can specifically capture DNA fragments containing the Target editing site in the genome of Streptomyces, with capture amounts of 0.019ng (Target 1), 0.039ng (Target 2), 0.052ng (Target 3), 0.005ng (Target 4), 0.03ng (Target 5), 0.012ng (Target 6), 0.004ng (Target 7), 0.016ng (RedD), and 0.002ng (RedN) and 0.006ng (Act) (FIG. 8).
4.7 second Generation sequencing of target DNA fragments
Using the de-crosslinked target DNA as the template, a next-generation sequencing library was constructed with a library preparation kit (VAHTS Universal DNA Library Prep Kit for Illumina V3, Vazyme), and the constructed DNA library was sequenced by Novogene Co., Ltd. The sequencing results showed that 9 target sites could be sequenced, with coverage of 22022-592551 (FIG. 9).
The experiment of Example 4 was repeated with one change: the 10 sgRNAs were mixed in equal volumes and 1 uL of the mixture was added in total, after dilution to 1/10 or 1/20 of the initial concentration, so that the sgRNA concentration in the reaction system was 1/10 or 1/20 of the initial value; the 10 target DNAs were captured simultaneously. The results of the repeated experiment (FIG. 10) show that, relative to the results of Example 4, the capture efficiency of the target DNAs was reduced to 25.2%-98.6% after 10-fold dilution of the sgRNA and to 0.02%-1.05% after 20-fold dilution. This test establishes the minimum amount of each sgRNA that must be added, from which the maximum number of sgRNAs that can be added to one reaction can be derived; the more sgRNAs that can be added, the more kinds of target DNA, and the longer the target DNA, that can be captured.
Example 5 capturing long fragments of target DNA (10 KB) from Streptomyces genomic DNA samples.
5.1 expression and purification of UdgX protein (same as in example 1)
5.2 acquisition of genomic DNA samples
Streptomyces genomic DNA was extracted using a genome extraction kit (Biomega).
5.3 acquisition of sgRNA fragments
The forward and reverse primers were synthesized and amplified from the plasmid template by PCR to obtain sgDNA fragments (Target 2, target8-Target26, 20 of which) corresponding to the Target editing site.
The sgDNA obtained in the preceding step was used as the template for transcription with a commercial sgRNA in vitro transcription kit to obtain the corresponding sgRNA.
5.4 editing and binding of UdgX proteins and DNA fragments
UdgX-SSBE3 protein bound to the Snap magnetic beads and the sgRNAs were added to 0.5 mL of reaction buffer (50 mM Tris-HCl, pH 8, 50 mM NaCl, 1 mM Na2EDTA, 1 mM DTT, 25 ug/mL BSA) and incubated at 25 ℃ for 5 minutes, after which 8 ug of Streptomyces genomic DNA after double-strand breakage was added and reacted at 37 ℃ for 2 hours.
5.5 obtaining of target DNA fragment (same as in example 1)
5.6 quantitative analysis of target DNA fragments
The de-crosslinked target DNA was quantified by q-PCR, with each target DNA amplified from the de-crosslinked DNA using its own primer pair. The results showed that UdgX-SSBE3 can specifically capture DNA fragments containing the target editing sites in the Streptomyces genome, in amounts of 0.095 ng (Target 8), 0.111 ng (Target 9), 0.004 ng (Target 10), 0.005 ng (Target 11), 0.069 ng (Target 12), 0.035 ng (Target 13), 0.007 ng (Target 14), 0.034 ng (Target 15), 0.19 ng (Target 16), 0.145 ng (Target 2), 0.005 ng (Target 17), 0.082 ng (Target 18), 0.264 ng (Target 19), 0.005 ng (Target 20), 0.016 ng (Target 21), 0.103 ng (Target 22), 0.164 ng (Target 23), 0.292 ng (Target 24), 0.549 ng (Target 25) and 0.493 ng (Target 26) (FIG. 11).
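The per-target amounts can be summarized to show how evenly the 20 sites were captured. The sketch below tabulates the values listed above, all taken to be in ng, and reports the total and the spread between the best and worst captured targets. It is a descriptive summary only; the variable names are assumptions and no conclusion about the cause of the variation is implied.

```python
# Captured amounts for the 20 targets as listed in the text (all taken to be in ng).
captured_ng = {
    "Target 8": 0.095, "Target 9": 0.111, "Target 10": 0.004, "Target 11": 0.005,
    "Target 12": 0.069, "Target 13": 0.035, "Target 14": 0.007, "Target 15": 0.034,
    "Target 16": 0.19, "Target 2": 0.145, "Target 17": 0.005, "Target 18": 0.082,
    "Target 19": 0.264, "Target 20": 0.005, "Target 21": 0.016, "Target 22": 0.103,
    "Target 23": 0.164, "Target 24": 0.292, "Target 25": 0.549, "Target 26": 0.493,
}

total_ng = sum(captured_ng.values())
lowest = min(captured_ng, key=captured_ng.get)
highest = max(captured_ng, key=captured_ng.get)
spread = captured_ng[highest] / captured_ng[lowest]

print(f"targets: {len(captured_ng)}, total captured: {total_ng:.3f} ng")
print(f"lowest: {lowest} ({captured_ng[lowest]} ng), highest: {highest} ({captured_ng[highest]} ng)")
print(f"highest/lowest ratio: {spread:.1f}")
```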
5.7 second Generation sequencing of target DNA fragments
Using the de-crosslinked target DNA as the template, a next-generation sequencing library was constructed with a library preparation kit (VAHTS Universal DNA Library Prep Kit for Illumina V3, Vazyme), and the constructed DNA library was sequenced by Novogene Co., Ltd. The sequencing results showed that 20 target sites could be sequenced, with coverage of 21105-1744840, and that a long fragment of 10 kb could be assembled (FIG. 12).
Example 6 simultaneously capturing long fragments of target DNA (10 KB) from a human saliva genomic DNA sample.
6.1 expression and purification of UdgX protein (same as in example 1)
6.2 acquisition of genomic DNA samples
Human saliva genomic DNA was extracted using saliva genomic DNA Rapid extraction kit (Biomed).
6.3 acquisition of sgRNA fragments
The forward and reverse primers were synthesized and amplified from the plasmid template by PCR to give the sgDNA fragment (Target 27-Target 41) corresponding to the editing site of interest.
The sgDNA obtained in the preceding step was used as the template for transcription with a commercial sgRNA in vitro transcription kit to obtain the corresponding sgRNA.
6.4 editing and binding of UdgX protein and DNA fragments
UdgX-SSBE3 protein bound to the Snap magnetic beads and the sgRNAs were added to 0.5 mL of reaction buffer (50 mM Tris-HCl, pH 8, 50 mM NaCl, 1 mM Na2EDTA, 1 mM DTT, 25 ug/mL BSA) and incubated at 25 ℃ for 5 minutes, after which 8 ug of human saliva genomic DNA after double-strand breakage was added and reacted at 37 ℃ for 2 hours.
6.5 obtaining of target DNA fragment (same as in example 1)
6.6 quantitative analysis of target DNA fragments
The target DNA after the decrosslinking is quantified by q-PCR, and primers are respectively applied to amplify the target DNA from the DNA after the decrosslinking. The results showed that UdgX-SSBE3 can specifically capture DNA fragments containing the Target editing site in human saliva genome in the amounts of 0.00005ng (Target 27), 0.00005ng (Target 28), 0.00008ng (Target 29), 0.0002ng (Target 30), 0.0006ng (Target 31), 0.0002ng (Target 32), 0.002ng (Target 33), 0.0013ng (Target 34), 0.001ng (Target 35), 0.0003ng (Target 36), 0.0023ng (Target 37), 0.0012ng (Target 38), 0.0041ng (Target 39), 0.0037ng (Target 40) and 0.003ng (Target 41) (fig. 13).
6.7 second Generation sequencing of target DNA fragments
Using the de-crosslinked target DNA as the template, a next-generation sequencing library was constructed with a library preparation kit (VAHTS Universal DNA Library Prep Kit for Illumina V3, Vazyme), and the constructed DNA library was sequenced by Novogene Co., Ltd. The sequencing results showed that 20 target sites could be sequenced, with coverage of 1000-2909520, and that long fragments of 10 kb could be assembled (FIG. 14).
Claims (10)
1. A fusion protein formed by fusing a cytosine base editor and a uracil binding protein;
more preferably, the uracil-binding protein is UdgX, in particular comprising the sequences of motif A (GEQPG) and motif B (HPSSLL) of UdgX and comprising the conserved region KRRIH; more specifically, the uracil-binding protein UdgX is derived from M. smegmatis, M. avium, R. imtechensis, M. haemophilum, Rhodococcus spp., Streptomyces coelicolor, Gordonia namibiensis, Bradyrhizobium japonicum and Nocardia farcinica; more preferably, its amino acid sequence is shown in SEQ ID No. 22;
the cytosine base editor is a BE3 editor (e.g., VQR-BE3, VRER-BE3, EQR-BE3, etc., which recognize different PAM types), an AID editor, or an optimized SSBE3 editor thereof;
the cytosine base editor and the uracil-binding protein are connected through a linker; specifically, the amino acid sequence of the linker is shown in SEQ ID NO: 24.
2. The fusion protein of claim 1, wherein the fusion protein has an amino acid sequence as set forth in SEQ ID No. 1.
3. The fusion protein of claim 1 or 2, wherein the fusion protein further has a purification tag to facilitate isolation and/or purification; specifically, the purification tag comprises a Snap-tag, His-tag, Flag-tag, MBP-tag or biotin tag for protein purification.
4. A polynucleotide encoding a fusion protein according to any one of claims 1 to 3, an expression vector and a recombinant host cell, in particular the polynucleotide sequence is shown in SEQ ID No.2.
5. A method of capturing target DNA from a sample, the method comprising:
1) Contacting, e.g. mixing, a sample containing a target sequence DNA having PAM and cytosine C at a specific position with its targeting sgRNA and a fusion protein according to any one of claims 1 to 3 to obtain a fusion protein-target DNA complex;
2) Recovering the enriched target sequence DNA, preferably wherein the target sequence DNA is double stranded DNA or single stranded DNA or a mixture of both;
preferably, step 2) further comprises binding the fusion protein-target DNA complex to an affinity matrix, preferably magnetic beads, more preferably Snap magnetic beads;
the specific position containing cytosine C means 3 to 25 bases, preferably 5 to 20 bases, more preferably 11 to 17 bases, upstream of the 5' end of the PAM site.
6. The method of claim 5, wherein the sample is from a plant, animal and/or microorganism, such as from a bacterium, archaebacteria, protozoa, fungus, mammal, amphibian, bird, reptile, insect and/or invertebrate, such as from a human, more preferably the sample is from a subject's blood, serum, serosal fluid, plasma, lymph, urine, cerebrospinal fluid, saliva, mucous secretions of secretory tissues and organs, vaginal secretions, milk, tears and/or ascites;
in particular, wherein the target sequence DNA is exogenous DNA relative to the subject, such as DNA from a pathogenic microorganism, e.g., a virus, bacterium, and/or fungus, or endogenous DNA of the subject, e.g., DNA from the subject comprising a genetic mutation, e.g., DNA comprising a pathogenic genetic mutation;
in addition, preferably, the sample comprises a genomic DNA sample, a cell-free DNA sample, an environmental genomic sample, and/or a mixed genomic DNA sample.
7. The method of claim 5, wherein in step 1) of the method, a sample containing the target sequence DNA is contacted with the fusion protein in combination with multiple different kinds of targeting sgRNAs to obtain uracil-binding protein-target DNA complexes; specifically, the concentration of each sgRNA is 0.1 ng/uL-1 ng/uL, 0.2 ng/uL-2 ng/uL, or 2 ng/uL-20 ng/uL.
8. A method of detecting a target DNA, the method comprising the steps of capturing the target DNA by the method of any one of claims 5 to 7, and detecting the presence or absence of the target DNA;
preferably, the step of capturing the target DNA is followed by a step of amplifying the captured target DNA after obtaining the captured target DNA.
9. Use of a fusion protein according to any one of claims 1 to 3 for the enrichment and/or detection, from a sample, of DNA containing a target sequence that has a PAM and contains cytosine C at a specific position.
10. Use of a fusion protein according to any one of claims 1 to 3 for the preparation of a reagent for capturing DNA containing a target sequence from a sample.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310860511.XA CN116949014A (en) | 2023-07-13 | 2023-07-13 | UdgX-SSBE3 protein and method for capturing specific nucleic acid by using same |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116949014A true CN116949014A (en) | 2023-10-27 |
Family
ID=88457716
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||