CN113717256A

CN113717256A - Fusion protein and application thereof

Info

Publication number: CN113717256A
Application number: CN202011308076.2A
Authority: CN
Inventors: 姚红杰; 李尧益; 王新秀; 黄赛南
Original assignee: Guangzhou Institute of Biomedicine and Health of CAS; Bioisland Laboratory
Current assignee: Guangzhou Institute of Biomedicine and Health of CAS; Bioisland Laboratory
Priority date: 2020-11-19
Filing date: 2020-11-19
Publication date: 2021-11-30
Anticipated expiration: 2040-11-19
Also published as: CN113717256B

Abstract

The invention relates to a fusion protein for preparing a single-cell in-situ active R-loop library and application thereof, wherein the fusion protein comprises the R-loop specific binding protein HBD together with MNase nuclease or Tn5 transposase. The fusion protein is mainly used for R-loop detection and high-throughput library construction; it enables in-situ detection of active R-loops, improves library construction efficiency, reduces library background, improves the accuracy of R-loop detection, and simplifies the R-loop detection process.

Description

Fusion protein and application thereof

Technical Field

The invention relates to the technical field of biology, in particular to a fusion protein and application thereof.

Background

R-loop is a three-stranded nucleic acid structure in which an RNA strand hybridizes with one strand of double-stranded DNA to form an RNA-DNA hybrid, while the other DNA strand is displaced as a single-stranded loop. R-loops were long considered harmful to cells and received little attention. In recent years, R-loops have been recognized as an important element of the cell genome that is involved in many biological functions, and they have become an important research field in epigenetics. Currently, most methods for detecting R-loops are DRIP-seq, which is based on the antibody S9.6, and methods derived from DRIP-seq.

DRIP-seq (DNA:RNA hybrid immunoprecipitation and sequencing) is a co-immunoprecipitation and high-throughput sequencing technology based on an antibody that specifically recognizes DNA:RNA hybrid strands, and it has been used to detect R-loop distribution at the whole-genome level in model organisms such as human, mouse and yeast. DRIP-seq and its derivative methods roughly comprise the following steps: collecting cells or other biological samples, extracting genomic DNA, cutting the genomic DNA into fragments of a certain size with restriction endonucleases, immunoprecipitating and enriching the R-loop-containing DNA fragments with the antibody S9.6, purifying and recovering the DNA fragments, and constructing a library by one of several library construction methods. However, DRIP-seq and its existing derivative methods suffer from poor resolution of the detection signal and from the limited specificity of the S9.6 antibody; in particular, S9.6 can also recognize double-stranded RNA. Moreover, means for detecting active R-loops at the single-cell level are scarce.

In general, the existing R-loop detection methods have some defects, mainly including: 1. the amount of cells used is large; 2. not in situ detection; 3. lack of strand-specific information; 4. the time required by the process is long.

Therefore, establishing a novel single-cell active R-loop detection method is of great significance for studying the biological function of R-loops, and the key to establishing such a method is a more suitable polypeptide.

Disclosure of Invention

The invention aims to provide a fusion protein for preparing a single-cell in-situ active R-loop library, which can be used for a novel single-cell active R-loop detection method and has the advantages of less required cell amount, short detection time and the like.

The technical scheme for achieving the purpose is as follows.

A fusion protein comprises a dimer formed by a first functional region and a second functional region, wherein the first functional region comprises R-loop specific binding protein (HBD);

the second functional region comprises MNase nuclease or Tn5 transposase;

a linker connecting the first functional region and the second functional region.

In some embodiments, the HBD is an R-loop specific recognition protein, the amino acid sequence of the R-loop specific recognition protein is shown in SEQ ID NO.1, or the HBD is an amino acid sequence which is substituted, deleted or added with one or more amino acids on the basis of the sequence shown in SEQ ID NO.1 and has the same function.

In some embodiments, the MNase nuclease is a wild-type MNase truncation, and further preferably has an amino acid sequence shown in SEQ ID No.2, or an amino acid sequence with one or more amino acids substituted, deleted or added on the basis of the sequence shown in SEQ ID No.2, and has the same function.

In some embodiments, the Tn5 transposase is a wild type Tn5 transposase mutant, the amino acid sequence of which is shown in SEQ ID No.3, or an amino acid sequence which is obtained by substituting, deleting or adding one or more amino acids on the basis of the sequence shown in SEQ ID No.3 and has the same function.

In some embodiments, the amino acid sequence of the linker is shown in SEQ ID NO. 4.

In some embodiments, the fusion protein further comprises a protein purification tag, wherein the protein purification tag includes a His tag, a GST tag, an MBP tag, a SUMO tag and other affinity chromatography purification tags.

In some of these embodiments, the N-terminus (amino terminus) of the second functional region is linked, through the linker, to the first functional region, as in SEQ ID NO.5 and SEQ ID NO.6, where HBD precedes the nuclease or transposase.

In some embodiments, the monomer of the fusion protein is HBD-MNase or HBD-Tn5, the amino acid sequence of the fusion protein HBD-MNase is shown as SEQ ID NO.5, or the fusion protein HBD-MNase is an amino acid sequence which is obtained by substituting, deleting or adding one or more amino acids on the basis of the sequence shown as SEQ ID NO.5 and has the same function; the amino acid sequence of the fusion protein HBD-Tn5 is shown in SEQ ID NO.6, or is an amino acid sequence which is obtained by substituting, deleting or adding one or more amino acids on the basis of the sequence shown in SEQ ID NO.6 and has the same function.

The invention also aims to provide the application of the fusion protein in preparing an R-loop high-throughput sequencing library of a biological sample.

In some embodiments, the fusion protein is used for preparing an R-loop high-throughput sequencing library of a biological sample, wherein the biological sample includes but is not limited to cultured cell samples and tissue samples, whether or not they have been crosslinked, fixed or frozen; the high-throughput sequencing library is a high-throughput sequencing library for detecting R-loops.

Another objective of the invention is to provide a method for preparing a high-throughput sequencing library of a biological sample.

A method of preparing a high throughput sequencing library of a biological sample comprising the steps of:

(1) collecting and processing a biological sample to obtain a single-cell suspension;

(2) washing the biological sample with a buffer solution, adding an appropriate amount of the fusion protein, allowing it to bind fully, and then washing away the unbound protein;

(3) activating the fusion protein and allowing the reaction to proceed fully, to obtain small DNA fragments or tagged DNA fragments that have been recognized and cut by the fusion protein;

(4) adding a stop solution to terminate the reaction, and purifying and recovering the DNA fragments;

(5) carrying out PCR amplification to complete library construction.

The fusion protein is mainly used for constructing an R-loop high-throughput detection library, and compared with the existing method, the fusion protein has the following beneficial effects:

The invention creatively connects a suitable MNase nuclease or Tn5 transposase to the specific binding protein HBD through a linker to form a brand-new fusion protein. The fusion protein markedly reduces the number of cells required for R-loop high-throughput detection, even down to the single-cell level: traditional methods such as DRIP-seq require on the order of ten million cells, whereas the present detection method requires at most one hundred thousand cells.

The fusion protein constructed by the invention markedly improves the specificity of R-loop detection. The traditional method DRIP-seq uses the S9.6 antibody to capture R-loops in the fragmented genome, but the specificity of S9.6 is not strong: besides the hybrid strand in an R-loop, it can also bind double-stranded RNA, so a detected signal is not necessarily a real R-loop. RNase H specifically digests R-loops, and its specific binding capacity for R-loops is hundreds of times stronger than that of S9.6; using the specific binding of the RNase H binding domain HBD to R-loops therefore effectively improves the specific recognition and capture of R-loops.

The fusion protein constructed by the invention markedly simplifies the experimental workflow of R-loop high-throughput detection, improves library construction efficiency and reduces library background. Library construction takes as little as half an hour, and the hands-on operation can be completed within five minutes. Tn5-based library construction markedly reduces library background: the Tn5 in the fusion protein HBD-Tn5 adds specific adapters to both ends of the cutting site while cutting the genome, so once the adapter-tagged positive fragments are released, end repair, A-tailing and adapter ligation are not needed and the library can be prepared directly and efficiently by PCR, which markedly simplifies the workflow and improves library construction efficiency. The fusion protein thus enables rapid high-throughput single-cell library preparation for R-loop detection: library construction with the present detection method takes at most two hours, and HBD-Tn5-based library construction takes as little as half an hour.

The fusion protein allows in-situ detection during R-loop high-throughput detection, so the spatial in-situ information of the sample is effectively preserved. The detection of the invention requires neither traditional chemical crosslinking nor traditional enzyme digestion or sonication, which damage subcellular structures. After cell permeabilization, the fusion protein of the invention is incubated with the cells and then activated by Ca²⁺ or Mg²⁺, so that the R-loops are cut in situ and the genomic fragments are released; the original structure of the live cell is preserved throughout this process. By contrast, traditional DRIP-seq needs to crosslink tens of millions of cells and to fragment the genome by enzyme digestion or sonication, so the in-situ information is lost.

Drawings

FIG. 1: HBD-MNase expression vector construction map.

FIG. 2: HBD-Tn5 expression vector construction map.

FIG. 3: Purification result of high-purity HBD-MNase.

FIG. 4: Purification result of high-purity HBD-Tn5.

FIG. 5: HBD-MNase activity assay results.

FIG. 6: HBD-Tn5 activity assay results.

FIG. 7: Analysis of R-mapping results obtained with the HBD-MNase fusion protein of the invention, with track graphs compared against traditional DRIP-seq.

FIG. 8: Analysis of R-mapping results obtained with the HBD-Tn5 fusion protein of the invention, with track graphs compared against traditional DRIP-seq.

Detailed Description

In order that the invention may be more readily understood, reference will now be made to the following more particular description of the invention, examples of which are set forth below. This invention may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein. These embodiments are provided so that this disclosure will be thorough and complete. It is to be understood that the experimental procedures in the following examples, where specific conditions are not noted, are generally in accordance with conventional conditions, or with conditions recommended by the manufacturer. The various reagents used in the examples are commercially available.

Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items.

The present invention is further illustrated by the following specific examples, which are not intended to limit the scope of the invention.

The fusion protein of the invention can be used for high-throughput detection of R-loops in the following embodiments.

The fusion protein comprises a dimer formed by a first functional region and a second functional region, wherein the first functional region comprises R-loop specific binding protein HBD, and the amino acid sequence of the R-loop specific binding protein HBD is shown in SEQ ID NO. 1;

the second functional region comprises MNase nuclease (the amino acid sequence of which is shown in SEQ ID NO.2) or Tn5 transposase (the amino acid sequence of which is shown in SEQ ID NO.3);

a connecting structure (linker) connecting the first functional region and the second functional region, wherein the amino acid sequence is shown in SEQ ID NO.4, and the protein purification Tag is 6 × His Tag.

The monomer of the fusion protein is HBD-MNase or HBD-Tn5, the amino acid sequence of the fusion protein HBD-MNase is shown as SEQ ID NO.5, and the amino acid sequence of the fusion protein HBD-Tn5 is shown as SEQ ID NO. 6.

SEQ ID NO.1

>HBD

>MFYAVRRGRRTGVFLSWSECKAQVDRFPAARFKKFATEDEAWAF

SEQ ID NO.2

>MNase

>ATSTKKLHKEPATLIKAIDGDTVKLMYKGQPMTFRLLLVDTPETKHPKKGVEKYGPEAS AFTKKMVENAKKIEVEFDKGQRTDKYGRGLAYIYADGKMVNEALVRQGLAKVAYVYKP NNTHEQHLRKSEAQAKKEKLNIWSEDNADSGQ

SEQ ID NO.3

>Tn5

>MITSALHRAADWAKSVFSSAALGDPRRTARLVNVAAQLAKYSGKSITISSEGSKAMQEG AYRFIRNPNVSAEAIRKAGAMQTVKLAQEFPELLAIEDTTSLSYRHQVAEELGKLGSIQDKS RGWWVHSVLLLEATTFRTVGLLHQEWWMRPDDPADADEKESGKWLAAAATSRLRMGS MMSNVIAVCDREADIHAYLQDKLAHNERFVVRSKHPRKDVESGLYLYDHLKNQPELGGY QISIPQKGVVDKRGKRKNRPARKASLSLRSGRITLKQGNITLNAVLAEEINPPKGETPLKWL LLTSEPVESLAQALRVIDIYTHRWRIEEFHKAWKTGAGAERQRMEEPDNLERMVSILSFVA VRLLQLRESFTPPQALRAQGLLKEAEHVESQSAETVLTPDECQLLGYLDKGKRKRKEKAG SLQWAYMAIARLGGFMDSKRTGIASWGALWEGWEALQSKLDGFLAAKDLMAQGIKI

Linker1 (for HBD-MNase)

>DDDKEF

Linker2 (for HBD-Tn5)

>DDDKEFGGGGS(SEQ ID NO.4)

>6×His Tag

>HHHHHH

SEQ ID NO.5

>HBD-MNase

>MFYAVRRGRRTGVFLSWSECKAQVDRFPAARFKKFATEDEAWAFDDDKEFGGGGSATS TKKLHKEPATLIKAIDGDTVKLMYKGQPMTFRLLLVDTPETKHPKKGVEKYGPEASAFTK KMVENAKKIEVEFDKGQRTDKYGRGLAYIYADGKMVNEALVRQGLAKVAYVYKPNNTH EQHLRKSEAQAKKEKLNIWSEDNADSGQ

SEQ ID NO.6

>HBD-Tn5

>MFYAVRRGRRTGVFLSWSECKAQVDRFPAARFKKFATEDEAWAFDDDKEFGGGGSMIT SALHRAADWAKSVFSSAALGDPRRTARLVNVAAQLAKYSGKSITISSEGSKAMQEGAYRFI RNPNVSAEAIRKAGAMQTVKLAQEFPELLAIEDTTSLSYRHQVAEELGKLGSIQDKSRGW WVHSVLLLEATTFRTVGLLHQEWWMRPDDPADADEKESGKWLAAAATSRLRMGSMMS NVIAVCDREADIHAYLQDKLAHNERFVVRSKHPRKDVESGLYLYDHLKNQPELGGYQISIP QKGVVDKRGKRKNRPARKASLSLRSGRITLKQGNITLNAVLAEEINPPKGETPLKWLLLTS EPVESLAQALRVIDIYTHRWRIEEFHKAWKTGAGAERQRMEEPDNLERMVSILSFVAVRLL QLRESFTPPQALRAQGLLKEAEHVESQSAETVLTPDECQLLGYLDKGKRKRKEKAGSLQW AYMAIARLGGFMDSKRTGIASWGALWEGWEALQSKLDGFLAAKDLMAQGIKI
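The listing above can be cross-checked mechanically. The following is a minimal sketch, not part of the disclosure: it copies SEQ ID NO.1, NO.2, NO.4 and NO.5 as printed (with the line-wrap spaces removed) and extracts whatever residues sit between HBD and MNase in the HBD-MNase fusion. The helper name `find_linker` is hypothetical. For the sequences as printed here, the extracted segment is expected to equal SEQ ID NO.4.

```python
# Sequences copied from the listing above; line-wrap spaces removed.
HBD = "MFYAVRRGRRTGVFLSWSECKAQVDRFPAARFKKFATEDEAWAF"  # SEQ ID NO.1
LINKER = "DDDKEFGGGGS"  # SEQ ID NO.4
MNASE = (  # SEQ ID NO.2
    "ATSTKKLHKEPATLIKAIDGDTVKLMYKGQPMTFRLLLVDTPETKHPKKGVEKYGPEAS"
    "AFTKKMVENAKKIEVEFDKGQRTDKYGRGLAYIYADGKMVNEALVRQGLAKVAYVYKP"
    "NNTHEQHLRKSEAQAKKEKLNIWSEDNADSGQ"
)
HBD_MNASE = (  # SEQ ID NO.5
    "MFYAVRRGRRTGVFLSWSECKAQVDRFPAARFKKFATEDEAWAFDDDKEFGGGGSATS"
    "TKKLHKEPATLIKAIDGDTVKLMYKGQPMTFRLLLVDTPETKHPKKGVEKYGPEASAFTK"
    "KMVENAKKIEVEFDKGQRTDKYGRGLAYIYADGKMVNEALVRQGLAKVAYVYKPNNTH"
    "EQHLRKSEAQAKKEKLNIWSEDNADSGQ"
)


def find_linker(fusion, first, second):
    """Return the residues between `first` (N-terminal) and `second`
    (C-terminal) in `fusion`, or None if the fusion does not start with
    `first` and end with `second`."""
    long_enough = len(fusion) >= len(first) + len(second)
    if long_enough and fusion.startswith(first) and fusion.endswith(second):
        return fusion[len(first):len(fusion) - len(second)]
    return None


linker_found = find_linker(HBD_MNASE, HBD, MNASE)
print("linker in SEQ ID NO.5:", linker_found)
print("matches SEQ ID NO.4:", linker_found == LINKER)
```

The same helper can be applied to SEQ ID NO.6 with SEQ ID NO.3 as the second region.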

Example 1

Design, expression and purification of HBD-MNase fusion protein

1. Construction of HBD-MNase fusion protein expression vector

(1) The first functional region (SEQ ID NO.1) and the second functional region (SEQ ID NO.2) are each amplified by PCR with the following primers:

first functional region forward primer:

ATGGGTCGCGGATCCGAATTCATGTTCTATGCGGTGAGGAG SEQ ID NO.7

first functional region reverse primer:

GAACTCCTTATCGTCATCAAAGGCCCAGGCCTCATCTT SEQ ID NO.8

second functional region forward primer:

GATGACGATAAGGAGTTCGCAACTTCAACTAAAAAATTACA SEQ ID NO.9

second functional region reverse primer:

GGTGGTGGTGGTGGTGCTCGAGTTATTGACCTGAATCAGCGTTGTC SEQ ID NO.10

(2) The two DNA fragments are joined into a single fragment by bridge PCR: the first functional region is amplified with the forward and reverse primers of the first functional region, and the second functional region is amplified with the forward and reverse primers of the second functional region; the PCR product of the first functional region and the PCR product of the second functional region are then used together as templates, and the spliced fragment of the two functional regions is amplified with the forward primer of the first functional region and the reverse primer of the second functional region.
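Bridge PCR works because the two first-round products share a short identical stretch at the junction. As a minimal sketch (the function names are hypothetical, and translating the overlap from its first base is an assumption about the reading frame), the code below takes the primers printed as SEQ ID NO.8 and SEQ ID NO.9, finds the sequence they share on the sense strand, and translates it with the standard genetic code.

```python
# Primers copied from SEQ ID NO.8 and SEQ ID NO.9 above.
HBD_REV = "GAACTCCTTATCGTCATCAAAGGCCCAGGCCTCATCTT"  # first region, reverse
MNASE_FWD = "GATGACGATAAGGAGTTCGCAACTTCAACTAAAAAATTACA"  # second region, forward

COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def reverse_complement(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def junction_overlap(upstream_rev, downstream_fwd):
    """Longest suffix of the upstream sense strand (reverse complement of
    the upstream reverse primer) that is also a prefix of the downstream
    forward primer."""
    sense = reverse_complement(upstream_rev)
    for k in range(min(len(sense), len(downstream_fwd)), 0, -1):
        if sense[-k:] == downstream_fwd[:k]:
            return sense[-k:]
    return ""


def translate(dna):
    """Translate complete codons, starting at the first base (assumed frame)."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODONS[dna[i:i + 3]] for i in range(0, usable, 3))


overlap = junction_overlap(HBD_REV, MNASE_FWD)
print(overlap, len(overlap), translate(overlap))
```

For these two primers the shared stretch is 18 nt long and encodes DDDKEF, the segment listed as Linker1.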

Cloning of the first functional region (source of template sequence: NCBI Reference Sequence NM_011275.3):

PCR reaction system (50 μL; enzyme used: KOD-Plus, Toyobo, Cat# KOD-201):

10× KOD Plus Buffer 5 μL, dNTP 4 μL, MgSO₄ 2 μL, F+R Primer (10 mM) 2 μL, template 1 μL (500 ng), KOD Plus 1 μL, ddH₂O 35 μL;

The procedure is as follows: pre-denaturation at 95 ℃ for 3 min; 35 cycles of denaturation at 95 ℃ for 30 s, annealing at 60 ℃ for 30 s and extension at 68 ℃ for 15 s; final extension at 68 ℃ for 5 min; hold at 12 ℃.
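The component volumes of the reaction system above can be checked against the stated 50 μL total. This is a small illustrative sketch; the dictionary keys and the helper `water_to_volume` are hypothetical names, and the volumes are those listed for the first functional region.

```python
# Component volumes in microlitres, as listed for the 50 uL KOD-Plus reaction.
reaction_ul = {
    "10x KOD Plus Buffer": 5.0,
    "dNTP": 4.0,
    "MgSO4": 2.0,
    "F+R primer": 2.0,
    "template": 1.0,
    "KOD Plus": 1.0,
    "ddH2O": 35.0,
}


def water_to_volume(components, target_ul, water_key="ddH2O"):
    """Water volume needed so that all components add up to target_ul."""
    others = sum(v for name, v in components.items() if name != water_key)
    if others > target_ul:
        raise ValueError("non-water components already exceed the target volume")
    return target_ul - others


total_ul = sum(reaction_ul.values())
print("total:", total_ul, "uL")
print("water for a 50 uL reaction:", water_to_volume(reaction_ul, 50.0), "uL")
```

Raising the template to 2 μL, as in the bridge PCR, lowers the water to 34 μL for the same total.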

Cloning of the second functional region (source of template sequence: GenBank: V01281.1):

PCR reaction system:

10× KOD Plus Buffer 5 μL, dNTP 4 μL, 25 mM MgSO₄ 2 μL, F+R Primer (10 mM) 2 μL, template 1 μL (500 ng), KOD-Plus 1 μL, ddH₂O 35 μL;

The procedure is as follows: pre-denaturation at 95 ℃ for 3 min; 35 cycles of denaturation at 95 ℃ for 30 s, annealing at 60 ℃ for 30 s and extension at 68 ℃ for 30 s; final extension at 68 ℃ for 5 min; hold at 12 ℃.

The PCR products were checked by 1% agarose gel electrophoresis, and the target fragments were recovered with the Biospin Gel Extraction Kit (BIO FLUX, Cat# BSC02M1).

Bridge PCR reaction system:

10× KOD Plus Buffer 5 μL, dNTP 4 μL, 25 mM MgSO₄ 2 μL, F+R Primer (10 mM) 2 μL, template 2 μL (molar ratio of the two functional fragments 1:1), KOD-Plus 1 μL, ddH₂O 34 μL;

The procedure is as follows: pre-denaturation at 95 ℃ for 3 min; 35 cycles of denaturation at 95 ℃ for 30 s, annealing at 60 ℃ for 30 s and extension at 68 ℃ for 40 s; final extension at 68 ℃ for 5 min; hold at 12 ℃.
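For planning purposes, the run time of such a program can be estimated from its steps. The sketch below assumes that the 35 cycles cover the denaturation, annealing and extension steps only, and it ignores ramp times and the final hold, so the result is a lower bound. The class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Program:
    """Thermocycler program with all durations in seconds."""
    initial_s: int        # pre-denaturation
    cycle_steps_s: tuple  # denaturation, annealing, extension
    cycles: int
    final_s: int          # final extension

    def total_seconds(self):
        """Programmed time, excluding ramping and the final hold."""
        return self.initial_s + self.cycles * sum(self.cycle_steps_s) + self.final_s


# Bridge PCR: 95 C 3 min; 35 x (30 s, 30 s, 40 s); 68 C 5 min.
bridge_pcr = Program(initial_s=180, cycle_steps_s=(30, 30, 40), cycles=35, final_s=300)
print(round(bridge_pcr.total_seconds() / 60, 1), "min")
```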

The PCR products were checked by 1% agarose gel electrophoresis, and the target fragment was recovered with the Biospin Gel Extraction Kit (BIO FLUX, Cat# BSC02M1).

(3) The fragment was cloned into the expression vector pET-28a, as shown in FIG. 1.

Homologous recombination was used, with the following reaction system:

20 ng of pET-28a (EcoRI + XhoI double-digestion product), 60 ng of the target fragment recovered in step (2), 2 μL of 5× Ligation-Free Cloning mix (ABM), and ddH₂O to 10 μL; incubate on ice for 30 min. The homologous recombination product was transformed into DH5α (TransGen).

Sanger sequencing was used to confirm the integrity of the vector sequence.

Single clones were sequenced in one direction with the T7 promoter primer.

2. Expression and purification of HBD-MNase fusion protein

(1) The sequence-verified HBD-MNase fusion protein expression plasmid was transformed into the BL21(DE3) expression strain.

(2) A single colony expressing the HBD-MNase fusion protein was inoculated into LB medium containing kanamycin and cultured overnight at 37 ℃ and 220 rpm.

(3) The culture from step (2) was inoculated into 100 mL of LB medium and cultured at 37 ℃ and 220 rpm for 3 h until the OD reached 0.6-0.8.

(4) The culture flask from step (3) was placed in a refrigerator at 4 ℃ for 15 min, and IPTG was added to a final concentration of 0.5 mM.

(5) The culture was grown at 20 ℃ and 160 rpm for 2 h; the cells were collected and disrupted by sonication or high pressure to obtain the protein solution.

(6) The protein solution from step (5) was centrifuged at 13000 g for 30 min at 4 ℃; bacterial genomic DNA in the supernatant was precipitated with PEI (final concentration 0.05%), the solution was centrifuged again at 13000 g for 30 min at 4 ℃, and the supernatant was passed through a 0.45 μm filter.

(7) Affinity purification on a Ni column (Ni-NTA 1ml Pre-Packed gradient Column Industrial Cat # C600791-0010). The Ni column was equilibrated first: after the buffer in the column had drained, pre-cooled 20 mM imidazole (prepared in 50 mM Tris-HCl, pH 7.5, 0.8 M NaCl, 0.2% Triton X-100 and 10% glycerol) was added to equilibrate it. The protein solution was then passed over the column 5 times. The column was washed several times with a total of 150 mL of pre-cooled 20 mM imidazole to remove unbound proteins, then with 10 mL of 50 mM imidazole, and the target protein was finally eluted with 5 mL of pre-cooled 300 mM imidazole. The eluted target protein was dialyzed overnight at 4 ℃ against dialysis solution (20 mM Tris-HCl, pH 7.5, 150 mM NaCl, 10% glycerol). HBD-MNase was concentrated on a 10 kDa protein concentration column; after concentration, the protein concentration was measured by the BCA method, and the residual bacterial genomic DNA was measured by Qubit (the residual genome amount was controlled to less than 0.5 ng/μL).
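The induction step above brings IPTG to a final concentration of 0.5 mM in the 100 mL culture. The stock volume follows from C1·V1 = C2·V2. The sketch below assumes a 1 M IPTG stock, which is a hypothetical value not given in the text, and neglects the small volume added; the function name is likewise illustrative.

```python
def stock_volume_ul(final_mM, culture_ml, stock_mM):
    """Stock volume in microlitres from C1*V1 = C2*V2, neglecting the
    volume contributed by the stock itself."""
    if stock_mM <= final_mM:
        raise ValueError("stock must be more concentrated than the final solution")
    return final_mM * culture_ml * 1000.0 / stock_mM


# 0.5 mM final in 100 mL culture, from an assumed 1 M (1000 mM) stock.
iptg_ul = stock_volume_ul(final_mM=0.5, culture_ml=100.0, stock_mM=1000.0)
print(iptg_ul, "uL of stock")
```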

The purified protein was analyzed by SDS-PAGE and stained with Coomassie Brilliant Blue, as shown in FIG. 3. FT denotes the flow-through of the protein solution from the column; the imidazole lanes show the eluates obtained with imidazole at different concentrations (50 mM imidazole: non-specifically or weakly bound proteins removed by washing; 300 mM imidazole: the finally eluted target protein solution); M denotes the protein marker; and the band marked with a red asterisk in the 300 mM imidazole lane is the purified target protein. The amino acid sequence of the HBD-MNase fusion protein is shown as SEQ ID NO.5. Based on common general knowledge, a person skilled in the art can also realize the function of the HBD-MNase fusion protein with an amino acid sequence that is obtained by substituting, deleting or adding one or more amino acids on the basis of the sequence shown in SEQ ID NO.5 and that has the same function.

Example 2 Design, expression and purification of HBD-Tn5 fusion protein

1. Construction of HBD-Tn5 fusion protein expression vector

The first functional region (SEQ ID NO.1) and the second functional region (SEQ ID NO.3) are PCR-amplified by using primers. The primers are as follows:

first functional region forward primer:

ATGGGTCGCGGATCCGAATTCATGTTCTATGCGGTGAGGAG SEQ ID NO.7

first functional region reverse primer:

GGCTTTAGCCGCTGCCTCCTTTGCGGCAGCAAAGGCCCAGGCCTCATCTT SEQ ID NO.11

second functional region forward primer:

GCTGCCGCAAAGGAGGCAGCGGCTAAAGCCATGATTACCAGTGCACTGCA SEQ ID NO.12

second functional region reverse primer:

GGTGGTGGTGGTGGTGCTCGAGTTAGATTTTAATGCCCTGCGCCATC SEQ ID NO.13

Cloning of the first functional region:

PCR reaction system (50 μL; source of template sequence: NCBI Reference Sequence NM_011275.3): 10× KOD Plus Buffer 5 μL, dNTP 4 μL, 25 mM MgSO₄ 2 μL, F+R Primer (10 mM) 2 μL, template 1 μL (500 ng), KOD Plus 1 μL, ddH₂O 35 μL;

The procedure is as follows: pre-denaturation at 95 ℃ for 3 min; 35 cycles of denaturation at 95 ℃ for 30 s, annealing at 60 ℃ for 30 s and extension at 68 ℃ for 15 s; final extension at 68 ℃ for 5 min; hold at 12 ℃.

Cloning of the second functional region:

PCR reaction system:

The procedure is as follows: pre-denaturation at 95 ℃ for 3 min; 35 cycles of denaturation at 95 ℃ for 30 s, annealing at 60 ℃ for 30 s and extension at 68 ℃ for 1 min 30 s; final extension at 68 ℃ for 5 min; hold at 12 ℃.

Bridge PCR reaction system (enzyme used: KOD-Plus, Toyobo, Cat# KOD-201):

The procedure is as follows: pre-denaturation at 95 ℃ for 3 min; 35 cycles of denaturation at 95 ℃ for 30 s, annealing at 60 ℃ for 30 s and extension at 68 ℃ for 1 min 45 s; final extension at 68 ℃ for 5 min; hold at 12 ℃.

Cloning the fragment into expression vector pET-28a, as shown in FIG. 2.

Homologous recombination reaction system: 20 ng of pET-28a (EcoRI + XhoI double-digestion product), 60 ng of the recovered target fragment, 2 μL of 5× Ligation-Free Cloning mix (ABM), and ddH₂O to 10 μL; incubate on ice for 30 min. The homologous recombination product was transformed into DH5α (TransGen).

Sanger sequencing was used to confirm the integrity of the vector sequence.

Single clones were sequenced bidirectionally with the T7 promoter and T7 terminator primers.

2. Expression and purification of HBD-Tn5 fusion protein

(1) The sequence-verified HBD-Tn5 fusion protein expression plasmid was transformed into the BL21(DE3) expression strain.

(2) A single colony expressing the HBD-Tn5 fusion protein was inoculated into LB medium containing kanamycin and cultured overnight at 37 ℃ and 220 rpm.

(3) The culture from step (2) was inoculated into 100 mL of LB medium and cultured at 37 ℃ and 220 rpm for 3 h until the OD reached 0.6-0.8.

(4) The culture flask from step (3) was placed in a refrigerator at 4 ℃ for 15 min, and IPTG was added to a final concentration of 0.5 mM.

(5) The culture was grown at 37 ℃ and 160 rpm for 2 h; the cells were collected and disrupted by sonication or high pressure to obtain the protein solution.

(6) The protein solution from step (5) was centrifuged at 13000 g for 30 min; bacterial genomic DNA was precipitated with PEI (final concentration 0.05%), the solution was centrifuged again at 13000 g at 4 ℃, and the supernatant was passed through a 0.45 μm filter.

(7) Affinity purification with a Ni column

The Ni column was equilibrated first: after the buffer in the column had drained, pre-cooled 20 mM imidazole (prepared in 50 mM Tris-HCl, pH 7.5, 0.8 M NaCl, 0.2% Triton X-100 and 10% glycerol) was added to equilibrate it. The protein solution was then passed over the column 5 times. The column was washed several times with a total of 150 mL of pre-cooled 20 mM imidazole, then with 10 mL of 35 mM imidazole, and the protein was finally eluted with 5 mL of pre-cooled 300 mM imidazole. The eluted protein was dialyzed directly against buffer (20 mM Tris-HCl, pH 7.5, 150 mM NaCl, 10% glycerol) at 4 ℃ overnight. HBD-Tn5 was concentrated on a 30 kDa protein concentration column; after concentration, the protein concentration was measured by the BCA method, and the residual bacterial genomic DNA was measured by Qubit (the residual genome amount was controlled to less than 0.5 ng/μL).

The purified protein was analyzed by SDS-PAGE and stained with Coomassie Brilliant Blue, as shown in FIG. 4. Lane FT shows the flow-through; the imidazole lanes show the eluates obtained with imidazole at different concentrations (35 mM imidazole: unbound protein washed away; 300 mM imidazole: the finally eluted target protein solution); M denotes the protein marker; and the band marked with a red asterisk in the 300 mM imidazole lane is the purified target protein.

The amino acid sequence of the HBD-Tn5 fusion protein is shown in SEQ ID NO.6. Based on common general knowledge, a person skilled in the art can also realize the function of the HBD-Tn5 fusion protein with an amino acid sequence that is obtained by substituting, deleting or adding one or more amino acids on the basis of the sequence shown in SEQ ID NO.6 and that has the same function.

Example 3 detection of HBD-MNase Activity

Different amounts of HBD-MNase (input amounts shown in FIG. 5) were mixed with 550 ng of genomic DNA, Ca²⁺ was added to a final concentration of 10 mM, and the reaction was carried out in an ice bath for 10 min. Agarose gel electrophoresis showed that HBD-MNase can cut the genomic DNA into fragments with a size of 100 bp (FIG. 5): with the same genomic DNA input (550 ng), as the input of the fusion protein HBD-MNase increased (from 0 ng to 578.4 ng), the genomic band gradually narrowed and eventually disappeared, while an increasingly strong, dispersed smear of fragmented genomic DNA appeared lower in the lane. This in vitro digestion result shows that the fusion protein HBD-MNase has high enzymatic activity.

Example 4 HBD-Tn5 Activity assay

15 pmol of HBD-Tn5 was mixed with 200 ng of genomic DNA, Mg²⁺ was added to a final concentration of 10 mM, the reaction was carried out at 55 ℃ for 10 min, and the products were analyzed by agarose gel electrophoresis (FIG. 6). Compared with the control group not treated with HBD-Tn5, the addition of HBD-Tn5 effectively fragmented the genomic DNA: the distinct genomic band disappeared and a dispersed smear appeared lower in the lane, with most fragments below 1000 bp. This in vitro genome digestion experiment shows that the fusion protein HBD-Tn5 of the invention has high enzymatic activity.

Example 5

The HBD-MNase and HBD-Tn5 obtained by the invention are respectively applied to non-crosslinked cells in a natural state to carry out in-situ detection on R-loop, and the detection method comprises the following steps:

(1) 100000 HEK 293T cells cultured in vitro are collected, washed once with PBS (pH 7.2) and centrifuged at 600 g for 3 min at room temperature; the supernatant is removed, and the cells are washed once with 200 μL of wash buffer (10 mM HEPES (4-hydroxyethyl piperazine ethanesulfonic acid), 150 mM NaCl, 0.5 mM spermidine) and 5% digitonin;

(2) 8 μL of Con A beads (Bangslabs) are washed with binding buffer (10 mM HEPES, 10 mM KCl, 1 mM CaCl₂, 1 mM MnCl₂), then mixed and incubated with the cells from step (1) for 15 min; the cells are briefly spun down at 100 g at 4 ℃ and the supernatant is discarded (magnetic rack); the fusion protein HBD-MNase or HBD-Tn5 is added in 100 μL of wash buffer to a final concentration of 1 μM, together with digitonin and 1 μL of PIC (100×), and the cells are incubated at 4 ℃ for 2-12 h;

(3) the cells are washed three times with 200 μL of wash buffer to remove excess protein, centrifuged at 100 g at 4 ℃, and the supernatant is discarded (magnetic rack);

(4) 10 mM Ca²⁺ or Mg²⁺ is added to activate the fusion protein HBD-MNase or HBD-Tn5 to cut the DNA (R-mapping); the reaction condition for HBD-MNase is 0 ± 0.5 ℃ for 30 min, and the reaction condition for HBD-Tn5 is 37-55 ℃ for 1 h;

(5) the cutting reaction is terminated by adding 10 mM EDTA, and the DNA is extracted with phenol:chloroform:isoamyl alcohol;

(6) for library construction based on HBD-MNase (R-mapping), the extracted genomic fragments are subjected to end repair and adapter ligation (Vazyme VAHTS Adapter-S for Illumina), followed by PCR library construction (Vazyme index for Illumina).

PCR system: 24 μL of DNA, 1 μL of i5 (10 mM), 1 μL of i7 (10 mM), 10 μL of 5× KAPA HiFi Fidelity buffer, 1 μL of KAPA HiFi hot start, 11.5 μL of ddH₂O, 1.5 μL of dNTP; 50 μL in total.

The procedure is as follows: pre-denaturation at 98 ℃ for 45 s; 13 cycles of denaturation at 98 ℃ for 15 s, annealing at 60 ℃ for 30 s and extension at 72 ℃ for 1 min; final extension at 72 ℃ for 1 min; hold at 12 ℃. The library products were sequenced on the Illumina platform.

For library construction based on HBD-Tn5 (R-mapping), the purified genomic DNA is subjected to PCR library construction (Vazyme index for Illumina).

PCR system: 23 μL of DNA, 1 μL of i5 (10 mM), 1 μL of i7 (10 mM), and 25 μL of 2× Mix buffer (NEB Cat# M0541S). The program is as follows: 72 °C for 5 min; pre-denaturation at 98 °C for 30 s; 13 cycles of denaturation at 98 °C for 15 s, annealing at 60 °C for 30 s, and extension at 72 °C for 1 min; final extension at 72 °C for 5 min; hold at 16 °C. The library products are sequenced on an Illumina platform.
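A thermocycler program of this kind can be written down as data, which makes it easy to estimate run time. The sketch below is illustrative only: the `Step` class and function name are hypothetical, and it assumes that the 13 cycles apply to the denaturation, annealing, and extension block of the HBD-Tn5 program, which is the usual reading of such a program. Ramping time between temperatures and the final hold are ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One programmed hold: a label, a temperature in Celsius, and a duration."""
    name: str
    temp_c: int
    seconds: int


# HBD-Tn5 library PCR program as described above.
# Assumption: "13 cycles" refers to the denaturation/annealing/extension block.
INITIAL = [Step("initial 72 C step", 72, 300), Step("pre-denaturation", 98, 30)]
CYCLE = [
    Step("denaturation", 98, 15),
    Step("annealing", 60, 30),
    Step("extension", 72, 60),
]
FINAL = [Step("final extension", 72, 300)]
N_CYCLES = 13


def programmed_seconds(initial, cycle, final, n_cycles):
    """Sum of programmed hold times; ramping and the final hold are not counted."""
    return (
        sum(s.seconds for s in initial)
        + n_cycles * sum(s.seconds for s in cycle)
        + sum(s.seconds for s in final)
    )


total_s = programmed_seconds(INITIAL, CYCLE, FINAL, N_CYCLES)
print(f"programmed hold time: {total_s} s ({total_s / 60:.2f} min)")
```

Changing `N_CYCLES` shows directly how the cycle number, which is often tuned to the amount of input DNA, affects the amplification time.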

The biological sample for detection with the fusion protein of the present application may include, but is not limited to, a cultured cell sample, a tissue sample, or another biological sample, whether untreated (not crosslinked, fixed, or frozen) or processed by crosslinking, fixing, or freezing.

After DNA is obtained using a commercially available kit or method, PCR library construction may also be performed with other polymerases, including isothermal polymerases, other polymerases with strand-displacement activity, and RNA- or DNA-dependent polymerases.

As shown in Figs. 7 and 8, the results obtained by applying the fusion proteins HBD-MNase and HBD-Tn5 of the invention to in situ detection of R-loops (R-mapping) are compared with those of the traditional R-loop detection method DRIP-seq (Ginno et al., "R-loop formation is a distinctive characteristic of unmethylated human CpG island promoters", Mol Cell. 2012 Mar 30; 45(6):814-25).

As shown in Fig. 7: with a relatively small number of cells, HBD-MNase-based R-mapping captures the same positive signals as DRIP-seq. The peak signals at gene loci are more concentrated, whereas the DRIP-seq signals are relatively dispersed, and the signal values of the R-mapping method are higher, so the signal-to-noise ratio is effectively improved compared with the traditional method. The track graph also shows that the invention detects signals that DRIP-seq cannot capture, and that these specifically detected signals are eliminated by RNase H treatment, which indicates that they are genuine R-loops. This demonstrates that the R-mapping method captures R-loops more accurately than the traditional method;

With the HBD-Tn5 fusion protein, the invention not only requires fewer cells but also, because Tn5 cuts the genome and directly attaches adapters, shortens the library construction time to within half an hour, after which the library is amplified by PCR.

As shown in Fig. 8: R-mapping with the HBD-Tn5 fusion protein, applied to in situ detection of R-loops in native, non-crosslinked cells (upper panel of Fig. 8), was compared with DRIP-seq, the traditional R-loop detection method (lower panel of Fig. 8). The R-loop peak signals obtained with R-mapping are more concentrated and relatively more accurate, whereas the DRIP-seq signals are relatively dispersed, and the signal values obtained by the invention are higher. That is, compared with traditional DRIP-seq, in situ R-loop detection with the HBD-Tn5 of the invention effectively improves the signal-to-noise ratio.
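The comparison above is qualitative. One simple way to put a number on "more concentrated peaks with higher signal" is the ratio of mean coverage inside a peak window to mean coverage outside it. The Python sketch below illustrates that idea only: the function name and this particular definition of signal-to-noise are assumptions, and the two coverage lists are made-up toy values, not data from Fig. 7 or Fig. 8.

```python
from statistics import mean


def peak_signal_to_noise(coverage, peak_start, peak_end):
    """Mean coverage inside [peak_start, peak_end) divided by mean coverage outside."""
    inside = coverage[peak_start:peak_end]
    outside = coverage[:peak_start] + coverage[peak_end:]
    if not inside or not outside:
        raise ValueError("peak window must leave bins both inside and outside")
    background = mean(outside)
    if background == 0:
        raise ValueError("background coverage is zero; ratio is undefined")
    return mean(inside) / background


# Toy, made-up per-bin coverage tracks (NOT measurements from the patent figures).
concentrated_track = [1, 1, 1, 2, 20, 24, 20, 2, 1, 1, 1, 1]  # sharp, tall peak
dispersed_track = [3, 4, 5, 6, 8, 9, 8, 6, 5, 4, 3, 3]        # broad, low peak

snr_concentrated = peak_signal_to_noise(concentrated_track, 4, 7)
snr_dispersed = peak_signal_to_noise(dispersed_track, 4, 7)

print(f"concentrated track ratio: {snr_concentrated:.2f}")
print(f"dispersed track ratio:    {snr_dispersed:.2f}")
```

In practice the coverage vectors would come from aligned sequencing reads, and the peak window from a peak caller; both are outside the scope of this sketch.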

The technical features of the embodiments described above may be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the embodiments described above are not described, but should be considered as being within the scope of the present specification as long as there is no contradiction between the combinations of the technical features.

Sequence listing

<110> Guangzhou Institute of Biomedicine and Health, Chinese Academy of Sciences

<120> fusion protein and application thereof

<160> 13

<170> SIPOSequenceListing 1.0

<210> 1

<211> 44

<212> PRT

<213> Artificial Sequence (Artificial Sequence)

<400> 1

Met Phe Tyr Ala Val Arg Arg Gly Arg Arg Thr Gly Val Phe Leu Ser

1 5 10 15

Trp Ser Glu Cys Lys Ala Gln Val Asp Arg Phe Pro Ala Ala Arg Phe

20 25 30

Lys Lys Phe Ala Thr Glu Asp Glu Ala Trp Ala Phe

35 40

<210> 2

<211> 149

<212> PRT

<213> Artificial Sequence (Artificial Sequence)

<400> 2

Ala Thr Ser Thr Lys Lys Leu His Lys Glu Pro Ala Thr Leu Ile Lys

1 5 10 15

Ala Ile Asp Gly Asp Thr Val Lys Leu Met Tyr Lys Gly Gln Pro Met

20 25 30

Thr Phe Arg Leu Leu Leu Val Asp Thr Pro Glu Thr Lys His Pro Lys

35 40 45

Lys Gly Val Glu Lys Tyr Gly Pro Glu Ala Ser Ala Phe Thr Lys Lys

50 55 60

Met Val Glu Asn Ala Lys Lys Ile Glu Val Glu Phe Asp Lys Gly Gln

65 70 75 80

Arg Thr Asp Lys Tyr Gly Arg Gly Leu Ala Tyr Ile Tyr Ala Asp Gly

85 90 95

Lys Met Val Asn Glu Ala Leu Val Arg Gln Gly Leu Ala Lys Val Ala

100 105 110

Tyr Val Tyr Lys Pro Asn Asn Thr His Glu Gln His Leu Arg Lys Ser

115 120 125

Glu Ala Gln Ala Lys Lys Glu Lys Leu Asn Ile Trp Ser Glu Asp Asn

130 135 140

Ala Asp Ser Gly Gln

145

<210> 3

<211> 476

<212> PRT

<213> Artificial Sequence (Artificial Sequence)

<400> 3

Met Ile Thr Ser Ala Leu His Arg Ala Ala Asp Trp Ala Lys Ser Val

1 5 10 15

Phe Ser Ser Ala Ala Leu Gly Asp Pro Arg Arg Thr Ala Arg Leu Val

20 25 30

Asn Val Ala Ala Gln Leu Ala Lys Tyr Ser Gly Lys Ser Ile Thr Ile

35 40 45

Ser Ser Glu Gly Ser Lys Ala Met Gln Glu Gly Ala Tyr Arg Phe Ile

50 55 60

Arg Asn Pro Asn Val Ser Ala Glu Ala Ile Arg Lys Ala Gly Ala Met

65 70 75 80

Gln Thr Val Lys Leu Ala Gln Glu Phe Pro Glu Leu Leu Ala Ile Glu

85 90 95

Asp Thr Thr Ser Leu Ser Tyr Arg His Gln Val Ala Glu Glu Leu Gly

100 105 110

Lys Leu Gly Ser Ile Gln Asp Lys Ser Arg Gly Trp Trp Val His Ser

115 120 125

Val Leu Leu Leu Glu Ala Thr Thr Phe Arg Thr Val Gly Leu Leu His

130 135 140

Gln Glu Trp Trp Met Arg Pro Asp Asp Pro Ala Asp Ala Asp Glu Lys

145 150 155 160

Glu Ser Gly Lys Trp Leu Ala Ala Ala Ala Thr Ser Arg Leu Arg Met

165 170 175

Gly Ser Met Met Ser Asn Val Ile Ala Val Cys Asp Arg Glu Ala Asp

180 185 190

Ile His Ala Tyr Leu Gln Asp Lys Leu Ala His Asn Glu Arg Phe Val

195 200 205

Val Arg Ser Lys His Pro Arg Lys Asp Val Glu Ser Gly Leu Tyr Leu

210 215 220

Tyr Asp His Leu Lys Asn Gln Pro Glu Leu Gly Gly Tyr Gln Ile Ser

225 230 235 240

Ile Pro Gln Lys Gly Val Val Asp Lys Arg Gly Lys Arg Lys Asn Arg

245 250 255

Pro Ala Arg Lys Ala Ser Leu Ser Leu Arg Ser Gly Arg Ile Thr Leu

260 265 270

Lys Gln Gly Asn Ile Thr Leu Asn Ala Val Leu Ala Glu Glu Ile Asn

275 280 285

Pro Pro Lys Gly Glu Thr Pro Leu Lys Trp Leu Leu Leu Thr Ser Glu

290 295 300

Pro Val Glu Ser Leu Ala Gln Ala Leu Arg Val Ile Asp Ile Tyr Thr

305 310 315 320

His Arg Trp Arg Ile Glu Glu Phe His Lys Ala Trp Lys Thr Gly Ala

325 330 335

Gly Ala Glu Arg Gln Arg Met Glu Glu Pro Asp Asn Leu Glu Arg Met

340 345 350

Val Ser Ile Leu Ser Phe Val Ala Val Arg Leu Leu Gln Leu Arg Glu

355 360 365

Ser Phe Thr Pro Pro Gln Ala Leu Arg Ala Gln Gly Leu Leu Lys Glu

370 375 380

Ala Glu His Val Glu Ser Gln Ser Ala Glu Thr Val Leu Thr Pro Asp

385 390 395 400

Glu Cys Gln Leu Leu Gly Tyr Leu Asp Lys Gly Lys Arg Lys Arg Lys

405 410 415

Glu Lys Ala Gly Ser Leu Gln Trp Ala Tyr Met Ala Ile Ala Arg Leu

420 425 430

Gly Gly Phe Met Asp Ser Lys Arg Thr Gly Ile Ala Ser Trp Gly Ala

435 440 445

Leu Trp Glu Gly Trp Glu Ala Leu Gln Ser Lys Leu Asp Gly Phe Leu

450 455 460

Ala Ala Lys Asp Leu Met Ala Gln Gly Ile Lys Ile

465 470 475

<210> 4

<211> 11

<212> PRT

<213> Artificial Sequence (Artificial Sequence)

<400> 4

Asp Asp Asp Lys Glu Phe Gly Gly Gly Gly Ser

1 5 10

<210> 5

<211> 204

<212> PRT

<213> Artificial Sequence (Artificial Sequence)

<400> 5

Met Phe Tyr Ala Val Arg Arg Gly Arg Arg Thr Gly Val Phe Leu Ser

1 5 10 15

Trp Ser Glu Cys Lys Ala Gln Val Asp Arg Phe Pro Ala Ala Arg Phe

20 25 30

Lys Lys Phe Ala Thr Glu Asp Glu Ala Trp Ala Phe Asp Asp Asp Lys

35 40 45

Glu Phe Gly Gly Gly Gly Ser Ala Thr Ser Thr Lys Lys Leu His Lys

50 55 60

Glu Pro Ala Thr Leu Ile Lys Ala Ile Asp Gly Asp Thr Val Lys Leu

65 70 75 80

Met Tyr Lys Gly Gln Pro Met Thr Phe Arg Leu Leu Leu Val Asp Thr

85 90 95

Pro Glu Thr Lys His Pro Lys Lys Gly Val Glu Lys Tyr Gly Pro Glu

100 105 110

Ala Ser Ala Phe Thr Lys Lys Met Val Glu Asn Ala Lys Lys Ile Glu

115 120 125

Val Glu Phe Asp Lys Gly Gln Arg Thr Asp Lys Tyr Gly Arg Gly Leu

130 135 140

Ala Tyr Ile Tyr Ala Asp Gly Lys Met Val Asn Glu Ala Leu Val Arg

145 150 155 160

Gln Gly Leu Ala Lys Val Ala Tyr Val Tyr Lys Pro Asn Asn Thr His

165 170 175

Glu Gln His Leu Arg Lys Ser Glu Ala Gln Ala Lys Lys Glu Lys Leu

180 185 190

Asn Ile Trp Ser Glu Asp Asn Ala Asp Ser Gly Gln

195 200

<210> 6

<211> 531

<212> PRT

<213> Artificial Sequence (Artificial Sequence)

<400> 6

Met Phe Tyr Ala Val Arg Arg Gly Arg Arg Thr Gly Val Phe Leu Ser

1 5 10 15

Trp Ser Glu Cys Lys Ala Gln Val Asp Arg Phe Pro Ala Ala Arg Phe

20 25 30

Lys Lys Phe Ala Thr Glu Asp Glu Ala Trp Ala Phe Asp Asp Asp Lys

35 40 45

Glu Phe Gly Gly Gly Gly Ser Met Ile Thr Ser Ala Leu His Arg Ala

50 55 60

Ala Asp Trp Ala Lys Ser Val Phe Ser Ser Ala Ala Leu Gly Asp Pro

65 70 75 80

Arg Arg Thr Ala Arg Leu Val Asn Val Ala Ala Gln Leu Ala Lys Tyr

85 90 95

Ser Gly Lys Ser Ile Thr Ile Ser Ser Glu Gly Ser Lys Ala Met Gln

100 105 110

Glu Gly Ala Tyr Arg Phe Ile Arg Asn Pro Asn Val Ser Ala Glu Ala

115 120 125

Ile Arg Lys Ala Gly Ala Met Gln Thr Val Lys Leu Ala Gln Glu Phe

130 135 140

Pro Glu Leu Leu Ala Ile Glu Asp Thr Thr Ser Leu Ser Tyr Arg His

145 150 155 160

Gln Val Ala Glu Glu Leu Gly Lys Leu Gly Ser Ile Gln Asp Lys Ser

165 170 175

Arg Gly Trp Trp Val His Ser Val Leu Leu Leu Glu Ala Thr Thr Phe

180 185 190

Arg Thr Val Gly Leu Leu His Gln Glu Trp Trp Met Arg Pro Asp Asp

195 200 205

Pro Ala Asp Ala Asp Glu Lys Glu Ser Gly Lys Trp Leu Ala Ala Ala

210 215 220

Ala Thr Ser Arg Leu Arg Met Gly Ser Met Met Ser Asn Val Ile Ala

225 230 235 240

Val Cys Asp Arg Glu Ala Asp Ile His Ala Tyr Leu Gln Asp Lys Leu

245 250 255

Ala His Asn Glu Arg Phe Val Val Arg Ser Lys His Pro Arg Lys Asp

260 265 270

Val Glu Ser Gly Leu Tyr Leu Tyr Asp His Leu Lys Asn Gln Pro Glu

275 280 285

Leu Gly Gly Tyr Gln Ile Ser Ile Pro Gln Lys Gly Val Val Asp Lys

290 295 300

Arg Gly Lys Arg Lys Asn Arg Pro Ala Arg Lys Ala Ser Leu Ser Leu

305 310 315 320

Arg Ser Gly Arg Ile Thr Leu Lys Gln Gly Asn Ile Thr Leu Asn Ala

325 330 335

Val Leu Ala Glu Glu Ile Asn Pro Pro Lys Gly Glu Thr Pro Leu Lys

340 345 350

Trp Leu Leu Leu Thr Ser Glu Pro Val Glu Ser Leu Ala Gln Ala Leu

355 360 365

Arg Val Ile Asp Ile Tyr Thr His Arg Trp Arg Ile Glu Glu Phe His

370 375 380

Lys Ala Trp Lys Thr Gly Ala Gly Ala Glu Arg Gln Arg Met Glu Glu

385 390 395 400

Pro Asp Asn Leu Glu Arg Met Val Ser Ile Leu Ser Phe Val Ala Val

405 410 415

Arg Leu Leu Gln Leu Arg Glu Ser Phe Thr Pro Pro Gln Ala Leu Arg

420 425 430

Ala Gln Gly Leu Leu Lys Glu Ala Glu His Val Glu Ser Gln Ser Ala

435 440 445

Glu Thr Val Leu Thr Pro Asp Glu Cys Gln Leu Leu Gly Tyr Leu Asp

450 455 460

Lys Gly Lys Arg Lys Arg Lys Glu Lys Ala Gly Ser Leu Gln Trp Ala

465 470 475 480

Tyr Met Ala Ile Ala Arg Leu Gly Gly Phe Met Asp Ser Lys Arg Thr

485 490 495

Gly Ile Ala Ser Trp Gly Ala Leu Trp Glu Gly Trp Glu Ala Leu Gln

500 505 510

Ser Lys Leu Asp Gly Phe Leu Ala Ala Lys Asp Leu Met Ala Gln Gly

515 520 525

Ile Lys Ile

530

<210> 7

<211> 41

<212> DNA

<213> Artificial Sequence (Artificial Sequence)

<400> 7

atgggtcgcg gatccgaatt catgttctat gcggtgagga g 41

<210> 8

<211> 38

<212> DNA

<213> Artificial Sequence (Artificial Sequence)

<400> 8

gaactcctta tcgtcatcaa aggcccaggc ctcatctt 38

<210> 9

<211> 41

<212> DNA

<213> Artificial Sequence (Artificial Sequence)

<400> 9

gatgacgata aggagttcgc aacttcaact aaaaaattac a 41

<210> 10

<211> 46

<212> DNA

<213> Artificial Sequence (Artificial Sequence)

<400> 10

ggtggtggtg gtggtgctcg agttattgac ctgaatcagc gttgtc 46

<210> 11

<211> 50

<212> DNA

<213> Artificial Sequence (Artificial Sequence)

<400> 11

ggctttagcc gctgcctcct ttgcggcagc aaaggcccag gcctcatctt 50

<210> 12

<211> 50

<212> DNA

<213> Artificial Sequence (Artificial Sequence)

<400> 12

gctgccgcaa aggaggcagc ggctaaagcc atgattacca gtgcactgca 50

<210> 13

<211> 47

<212> DNA

<213> Artificial Sequence (Artificial Sequence)

<400> 13

ggtggtggtg gtggtgctcg agttagattt taatgccctg cgccatc 47
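The sequence listing above can be cross-checked programmatically. The following Python sketch, which is not part of the original filing, converts the three-letter residues of SEQ ID NO. 1 (HBD), SEQ ID NO. 4 (linker), SEQ ID NO. 2 (MNase) and SEQ ID NO. 5 (HBD-MNase) to one-letter codes and tests whether SEQ ID NO. 5 is the plain concatenation of the other three. It also translates the primer of SEQ ID NO. 9 with the standard genetic code. The helper names and the choice of reading frame 0 for the primer are assumptions made for illustration.

```python
from itertools import product

THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}

def to_one_letter(text):
    """Convert whitespace-separated three-letter residues to a one-letter string."""
    return "".join(THREE_TO_ONE[res] for res in text.split())

# Residues copied from the sequence listing above.
hbd = to_one_letter("""
Met Phe Tyr Ala Val Arg Arg Gly Arg Arg Thr Gly Val Phe Leu Ser
Trp Ser Glu Cys Lys Ala Gln Val Asp Arg Phe Pro Ala Ala Arg Phe
Lys Lys Phe Ala Thr Glu Asp Glu Ala Trp Ala Phe""")        # SEQ ID NO. 1
linker = to_one_letter("Asp Asp Asp Lys Glu Phe Gly Gly Gly Gly Ser")  # SEQ ID NO. 4
mnase = to_one_letter("""
Ala Thr Ser Thr Lys Lys Leu His Lys Glu Pro Ala Thr Leu Ile Lys
Ala Ile Asp Gly Asp Thr Val Lys Leu Met Tyr Lys Gly Gln Pro Met
Thr Phe Arg Leu Leu Leu Val Asp Thr Pro Glu Thr Lys His Pro Lys
Lys Gly Val Glu Lys Tyr Gly Pro Glu Ala Ser Ala Phe Thr Lys Lys
Met Val Glu Asn Ala Lys Lys Ile Glu Val Glu Phe Asp Lys Gly Gln
Arg Thr Asp Lys Tyr Gly Arg Gly Leu Ala Tyr Ile Tyr Ala Asp Gly
Lys Met Val Asn Glu Ala Leu Val Arg Gln Gly Leu Ala Lys Val Ala
Tyr Val Tyr Lys Pro Asn Asn Thr His Glu Gln His Leu Arg Lys Ser
Glu Ala Gln Ala Lys Lys Glu Lys Leu Asn Ile Trp Ser Glu Asp Asn
Ala Asp Ser Gly Gln""")                                     # SEQ ID NO. 2
hbd_mnase = to_one_letter("""
Met Phe Tyr Ala Val Arg Arg Gly Arg Arg Thr Gly Val Phe Leu Ser
Trp Ser Glu Cys Lys Ala Gln Val Asp Arg Phe Pro Ala Ala Arg Phe
Lys Lys Phe Ala Thr Glu Asp Glu Ala Trp Ala Phe Asp Asp Asp Lys
Glu Phe Gly Gly Gly Gly Ser Ala Thr Ser Thr Lys Lys Leu His Lys
Glu Pro Ala Thr Leu Ile Lys Ala Ile Asp Gly Asp Thr Val Lys Leu
Met Tyr Lys Gly Gln Pro Met Thr Phe Arg Leu Leu Leu Val Asp Thr
Pro Glu Thr Lys His Pro Lys Lys Gly Val Glu Lys Tyr Gly Pro Glu
Ala Ser Ala Phe Thr Lys Lys Met Val Glu Asn Ala Lys Lys Ile Glu
Val Glu Phe Asp Lys Gly Gln Arg Thr Asp Lys Tyr Gly Arg Gly Leu
Ala Tyr Ile Tyr Ala Asp Gly Lys Met Val Asn Glu Ala Leu Val Arg
Gln Gly Leu Ala Lys Val Ala Tyr Val Tyr Lys Pro Asn Asn Thr His
Glu Gln His Leu Arg Lys Ser Glu Ala Gln Ala Lys Lys Glu Lys Leu
Asn Ile Trp Ser Glu Asp Asn Ala Asp Ser Gly Gln""")         # SEQ ID NO. 5

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def translate(dna):
    """Translate complete codons in frame 0; trailing 1-2 bases are ignored."""
    dna = "".join(dna.split()).upper()
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))

primer9 = "gatgacgata aggagttcgc aacttcaact aaaaaattac a"  # SEQ ID NO. 9

print("lengths:", len(hbd), len(linker), len(mnase), len(hbd_mnase))
print("SEQ 5 == SEQ 1 + SEQ 4 + SEQ 2:", hbd_mnase == hbd + linker + mnase)
print("SEQ 9, frame 0:", translate(primer9))
```

The same comparison can be repeated for SEQ ID NO. 6 (HBD-Tn5) by adding the residues of SEQ ID NO. 3 to the sketch.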

Claims

1. A fusion protein, characterized by comprising a dimer formed by a first functional region and a second functional region, wherein the first functional region comprises an R-loop specific binding protein (HBD);

the second functional region comprises MNase nuclease or Tn5 transposase;

and a linking structure connecting the first functional region and the second functional region.

2. The fusion protein of claim 1, wherein the HBD is an R-loop specific recognition protein, and the amino acid sequence of the R-loop specific recognition protein is shown in SEQ ID No.1, or the HBD is an amino acid sequence which is obtained by substituting, deleting or adding one or more amino acids on the basis of the sequence shown in SEQ ID No.1 and has the same function.

3. The fusion protein of claim 1, wherein the MNase nuclease is a wild-type MNase truncation, preferably having an amino acid sequence as shown in SEQ ID No.2, or an amino acid sequence with one or more amino acids substituted, deleted or added based on the sequence shown in SEQ ID No.2, and having the same function.

4. The fusion protein of claim 1, wherein the Tn5 transposase is a wild type Tn5 transposase mutant, and the amino acid sequence thereof is shown in SEQ ID No.3, or is an amino acid sequence that is obtained by substituting, deleting or adding one or more amino acids based on the sequence shown in SEQ ID No.3, and has the same function.

5. The fusion protein of claim 1, wherein the amino acid sequence of the linking structure is DDDKEF or DDDKEFGGGGS.

6. The fusion protein of claim 1, wherein the fusion protein has a protein purification tag attached thereto, wherein the protein purification tag is a His tag, a GST tag, an MBP tag, or a SUMO tag.

7. The fusion protein of claim 1, wherein the second functional domain of the fusion protein is N-terminal to the functional amino acid domain of the first functional domain.

8. The fusion protein of claim 1, wherein the monomer of the fusion protein is HBD-MNase or HBD-Tn5, and the amino acid sequence of the HBD-MNase is shown in SEQ ID NO.5, or the fusion protein is an amino acid sequence which is obtained by substituting, deleting or adding one or more amino acids on the basis of the sequence shown in SEQ ID NO.5 and has the same function; the amino acid sequence of the fusion protein HBD-Tn5 is shown in SEQ ID NO.6, or is an amino acid sequence which is obtained by substituting, deleting or adding one or more amino acids on the basis of the sequence shown in SEQ ID NO.6 and has the same function.

9. Use of the fusion protein of any one of claims 1-8 for preparing an R-loop high-throughput sequencing library of a biological sample.

10. Use of the fusion protein of any one of claims 1-8 in an in situ active R-loop assay.