DNA adaptor, preparation method therefor and application thereof
Technical Field
The invention relates to a DNA adaptor (also referred to herein as a DNA linker), and to a preparation method and application thereof.
Background
In second-generation sequencing, constructing a DNA sequencing library by DNA adaptor ligation is an important and basic experimental technique. Besides being used on its own to construct conventional genome sequencing libraries, the technique is also an important step in constructing RNA-seq, ChIP-seq, RRBS and similar libraries. DNA fragments are converted into a DNA sequencing library by a) end-filling, b) A-tailing, c) DNA adaptor ligation and d) PCR enrichment with universal primers matching the adaptor sequence. Only a fragment that carries a DNA adaptor at both ends can be enriched in the subsequent PCR and then sequenced. The proportion of DNA fragments with adaptors ligated at both ends, relative to all DNA fragments, is called the conversion efficiency. It is one of the core criteria of library construction quality, both for conventional DNA libraries and for libraries in which preserving the original effective molecules matters greatly (such as cfDNA libraries and PCR-free libraries).
To improve ligation efficiency, most research has focused on optimizing the functional enzymes and buffers used during library construction. For example, A-tailing is an important step in DNA library construction: compared with blunt-end ligation, a 3' A overhang on the fragment and a 3' T overhang on the DNA adaptor mediate TA ligation with higher ligation efficiency. Researchers have therefore developed or improved functional enzymes to increase the efficiency of A-tailing. Researchers have also optimized the ligation buffer system by adding reagents such as the polymer polyethylene glycol (PEG) or the small molecules propylene glycol and glycerol to achieve efficient ligation. Although these approaches can improve ligation efficiency, they require complicated, high-standard equipment to produce and modify the enzymes, or cannot be used because of patent protection, or require purchasing expensive commercial reagents. There is therefore a strong need for a simple method of improving library conversion efficiency.
Disclosure of Invention
The technical problem to be solved by the invention is to provide a DNA adaptor capable of improving library conversion efficiency, together with a preparation method and application thereof.
In order to solve the above technical problem, the invention adopts the following technical scheme:
During experiments, the inventors found that a conventional Y-type adaptor can ligate non-specifically to DNA fragments through its non-ligation end, and hypothesized that this is related to the 3' hydroxyl and/or 5' phosphate at the non-ligation end of the DNA adaptor.
To this end, in one aspect, the invention provides a DNA adaptor whose non-ligation end is blocked at the 5' and/or 3' position, so that this end does not interfere with the ligation reaction and the ligation efficiency of the DNA adaptor is improved.
The base composition of the DNA adaptor of the invention may comprise one or more of A (adenine), T (thymine), C (cytosine), G (guanine), U (uracil) and 5mC (5-methylcytosine). Preferably, the non-ligation end of the DNA adaptor is blocked by a fluorescent group.
The fluorescent group in the present invention may be a commonly used fluorescent group, but is preferably one or more of FAM, HEX, and ROX.
The groups used to block the 5' and 3' positions of the non-ligation end may be the same or different.
The DNA adaptor of the invention may be any of various types of adaptor; preferably, it is a Y-type adaptor, including but not limited to a conventional Y-type adaptor or a UMI-containing Y-type adaptor.
Preferably, the non-ligation end is blocked at the 5' phosphate and/or the 3' hydroxyl.
According to a specific and preferred embodiment, the 5' and 3' positions of the non-ligation end of the DNA adaptor are each blocked by a fluorescent group. When this adaptor is used in a ligation system consisting of the most common T4 DNA ligase and buffer, the library conversion efficiency (double-adaptor ligation efficiency) can be raised from 10% to 60% or even higher, so the adaptor of the invention improves library conversion efficiency simply, efficiently and cheaply.
The structure of a typical non-ligated end-blocked Y-type DNA linker of the present invention is shown in FIG. 1.
In another aspect, the invention provides a preparation method for the DNA adaptor: two single-stranded DNAs whose sequences are partially reverse-complementary, corresponding to the two strands of the double-stranded DNA adaptor, are chemically synthesized; the non-ligation end of either or both single strands is blocked; and the DNA adaptor is then assembled from the two strands and purified.
In the invention, methods for assembling the DNA adaptor from the two single-stranded DNAs include, but are not limited to, one or more of annealing, single-strand extension and restriction enzyme cleavage.
The third aspect of the invention provides an application of the DNA adaptor in constructing a DNA sequencing library.
The fourth aspect of the invention provides a method for constructing a DNA sequencing library, wherein the DNA adaptor is used for connecting DNA fragments.
According to one embodiment, the construction method comprises the following steps:
(1) preparing a DNA fragment;
(2) filling in the ends of the DNA fragment from step (1) with T4 DNA polymerase;
(3) adding a phosphate group to the 5' end of the DNA fragment from step (2) with T4 PNK;
(4) adding A to the 3' end of the DNA fragment from step (3) with Klenow exo- or Taq polymerase;
(5) ligating the DNA fragment from step (4) to the DNA adaptor with T4 DNA ligase;
(6) amplifying and enriching, by PCR, the DNA fragments with adaptors ligated at both ends to obtain a DNA sequencing library.
Through extensive and intensive research, the inventors found for the first time that, in the most common ligation system comprising T4 DNA ligase and buffer, using a Y-type adaptor whose non-ligation end is blocked by fluorescent groups avoids non-specific ligation between the non-ligation end and the DNA fragment and improves the ligation efficiency between the adaptor and the DNA fragment. The library conversion efficiency (double-adaptor ligation efficiency) thereby rises from 10% to 60% or even higher, making this a simple, efficient and low-cost way to obtain high library conversion efficiency.
Owing to the above technical scheme, the invention has the following advantages over the prior art:
The DNA adaptor of the invention avoids non-specific ligation between the non-ligation end and the DNA fragment and improves the ligation efficiency between the adaptor and the DNA fragment, thereby improving the library conversion efficiency.
Drawings
FIG. 1 shows a typical Y-type DNA adaptor with a blocked non-ligation end. The two thicker lines represent the two DNA single strands, and the short thin parallel lines between them mark the region where the two strands are complementary and paired. The letter T represents the overhanging thymine base at the 3' end of the adaptor and P represents the phosphate modification at the 5' end of the adaptor; during DNA library construction, both mediate TA ligation between the DNA adaptor and DNA fragments that have been end-filled, 5'-phosphorylated and 3' A-tailed. F and F' represent the fluorescent groups that block the non-ligation end of the DNA adaptor; they are attached to the 5' phosphate of the 5'-terminal base and the 3' hydroxyl of the 3'-terminal base.
FIG. 2 shows the capillary electrophoresis results for the ligation products in Example 1 (blue fluorescence channel). DNAQC is labelled with the blue fluorophore FAM, and its length increases after adaptor ligation, so the type of each ligation product (no adaptor, single adaptor or double adaptor) can be inferred from its length. A) Ligation products of the Basic-AD adaptor; B) ligation products of the fluorophore-blocked Basic-AD-F adaptor. The product peaks assigned to double-adaptor, single-adaptor and unligated products are boxed separately. After the non-ligation end of the DNA adaptor was blocked with fluorophores, the double-adaptor ligation efficiency, which represents the library conversion efficiency, rose from 12% to 60%.
FIG. 3 shows the capillary electrophoresis results for the ligation products in Example 1 (blue/green/red fluorescence channels). DNAQC is labelled with the blue fluorophore FAM and is detected in the blue channel; Basic-R-F is labelled with the green fluorophore HEX and is detected in the green channel; Basic-F-F is labelled with the red fluorophore ROX and is detected in the red channel. Overlapping fluorescence peaks indicate that a ligation product carries several fluorophore labels at once, so the type of ligation product can be determined from its labels. Single-end ligation products with Basic-R-F (blue-green overlap) and with Basic-F-F (blue-red overlap) are detected near 120 bp, and products ligated at both ends, to the Basic-F-F and Basic-R-F single strands respectively (three-colour overlap), are detected near 160 bp.
FIG. 4 shows the capillary electrophoresis results for the ligation products in Example 2 (blue fluorescence channel). DNAQC is labelled with the blue fluorophore FAM, and its length increases after adaptor ligation, so the type of each DS-AD ligation product (no adaptor, single adaptor or double adaptor) can be inferred from its length. The product peaks assigned to double-adaptor, single-adaptor and unligated products are boxed separately.
FIG. 5 shows the capillary electrophoresis results for the ligation products in Example 2 (blue/green/red fluorescence channels). DNAQC is labelled with the blue fluorophore FAM and is detected in the blue channel; DS-R is labelled with the green fluorophore HEX and is detected in the green channel; DS-F is labelled with the red fluorophore ROX and is detected in the red channel. Overlapping fluorescence peaks indicate that a ligation product carries several fluorophore labels at once, so the type of ligation product can be determined from its labels. A single-end DS-R ligation product (blue-green overlap) is detected near 140 bp, a single-end DS-F ligation product (blue-red overlap) near 165 bp, and products ligated at both ends to the DS-F and DS-R single strands (three-colour overlap) near 220 bp.
FIG. 6 shows the results of library construction from low-input cfDNA using different DNA adaptors in Example 3: A) 5 ng of input cfDNA, 10 cycles of PCR amplification; B) 1 ng of input cfDNA, 14 cycles of PCR amplification. Both panels show that ligation efficiency is greatly improved after the DNA adaptor is blocked with fluorescent groups. The DNA ladder bands are 100, 200, 300, 400, 500, 600, 700, 800, 900 and 1000 bp.
Detailed Description
The present invention will be described in further detail with reference to specific examples, but the present invention is not limited to the following examples. The implementation conditions adopted in the embodiments can be further adjusted according to different requirements of specific use, and the implementation conditions not mentioned are conventional conditions in the industry.
Example 1: Verifying the DNA ligation efficiency of a conventional Y-type adaptor by capillary electrophoresis
1. The following primer sequences were synthesized
DNAQC-F:ccg GAATTC TT[6-FAM-dT]GCCTTCATTGAGCGCTACTT(SEQ ID NO.1)
DNAQC-R:aaaa CTGCAG TTCCAGGGTCTTCTCAATCCAG(SEQ ID NO.2)
2. Using Thermus aquaticus genomic DNA as the template, PCR amplification was carried out with the above primers, yielding a specific double-stranded DNA of 89 bp with the following sequence: ccgGAATTCTTTGCCTTCATTGAGCGCTACTTTCAGAGCTTCCCCAAGGTGCGGGCCTGGATTGAGAAGACCCTGGAAACTGCAGtttt (SEQ ID NO. 3).
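As a quick sanity check on the sequences quoted above, the following minimal Python sketch confirms that the stated amplicon is 89 nt long and begins with the forward primer. It assumes the 6-FAM-dT in DNAQC-F can be written as a plain T for string comparison; the variable and function names are illustrative only.

```python
# Sequences copied from the text; the [6-FAM-dT] residue is written as a plain T
# (an assumption made only so that the strings can be compared).
DNAQC_F = "ccgGAATTCTTTGCCTTCATTGAGCGCTACTT"
AMPLICON = "ccgGAATTCTTTGCCTTCATTGAGCGCTACTTTCAGAGCTTCCCCAAGGTGCGGGCCTGGATTGAGAAGACCCTGGAAACTGCAGtttt"


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in a sequence (case-insensitive)."""
    s = seq.upper()
    return (s.count("G") + s.count("C")) / len(s)


amplicon_length = len(AMPLICON)
starts_with_forward_primer = AMPLICON.upper().startswith(DNAQC_F.upper())

print(f"length = {amplicon_length} nt")
print(f"starts with DNAQC-F: {starts_with_forward_primer}")
print(f"GC fraction = {gc_fraction(AMPLICON):.2f}")
```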
3. The resulting quality-control DNA fragment was purified using DNA Clean & Concentrator-5 (200 Preps) w/ Zymo-Spin IC Columns (Capped) from Zymo and named DNAQC.
4. The following adaptor sequences were chemically synthesized and purified by HPLC:
Basic-F:ACACTCTTTCCCTACACGACGCTCTTCCGATCT(SEQ ID NO.4)
Basic-R:GATCGGAAGAGCACACGTCTGAACTCCAGTCAC(SEQ ID NO.5)
Basic-F-F:[ROX]ACACTCTTTCCCTACACGACGCTCTTCCGATCT(SEQ ID NO.6)
Basic-R-F:GATCGGAAGAGCACACGTCTGAACTCCAGTCAC[HEX](SEQ ID NO.7)
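The Y shape of this adaptor follows from the sequences themselves: only the 3' end of Basic-F and the 5' end of Basic-R are complementary. The sketch below measures that paired stem, assuming (as drawn in FIG. 1) that the last base of Basic-F is the single unpaired 3' T overhang. The helper names `revcomp` and `stem_length` are illustrative, not part of the original text.

```python
BASIC_F = "ACACTCTTTCCCTACACGACGCTCTTCCGATCT"  # SEQ ID NO. 4
BASIC_R = "GATCGGAAGAGCACACGTCTGAACTCCAGTCAC"  # SEQ ID NO. 5

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def stem_length(top: str, bottom: str, overhang: int = 1) -> int:
    """Number of bases at the 5' end of `bottom` that pair with the bases
    of `top` immediately upstream of its 3' overhang."""
    paired_top = top[:-overhang] if overhang else top
    k = 0
    limit = min(len(paired_top), len(bottom))
    while k < limit and revcomp(bottom[: k + 1]) == paired_top[-(k + 1):]:
        k += 1
    return k


stem = stem_length(BASIC_F, BASIC_R)
print(f"paired stem: {stem} bp, 3' overhang: {BASIC_F[-1]}")
print(f"unpaired arms: {len(BASIC_F) - stem - 1} nt and {len(BASIC_R) - stem} nt")
```

The unpaired arms are the non-ligation end whose 5' and 3' termini carry ROX and HEX in Basic-F-F and Basic-R-F.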
5. Each adaptor oligonucleotide synthesized above was dissolved in annealing buffer (10 mM Tris-Cl pH 8.0, 1 mM EDTA, 50 mM NaCl) to a final concentration of 100 μM. 25 μl of Basic-F and 25 μl of Basic-R were mixed in a 0.2 ml thin-walled tube to give 50 μl of Basic-AD (final concentration 50 μM); 25 μl of Basic-F-F and 25 μl of Basic-R-F were mixed in a 0.2 ml thin-walled tube to give 50 μl of Basic-AD-F (final concentration 50 μM).
6. Basic-AD and Basic-AD-F were annealed in a PCR instrument: denaturation at 95 ℃ for 10 min, cooling to 25 ℃ at 0.1 ℃/second, then holding at 25 ℃ for 2 hours, to obtain the prepared DNA adaptors.
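The arithmetic behind steps 5 and 6 can be sketched as follows: mixing equal volumes of two 100 μM oligonucleotides halves each concentration, and the cooling ramp contributes its own time to the annealing program. This is only an illustrative calculation; the function names are hypothetical.

```python
def mixed_concentration(stock_um: float, volume_ul: float, total_ul: float) -> float:
    """Concentration of one component after dilution into a larger volume (C1*V1 = C2*V2)."""
    return stock_um * volume_ul / total_ul


def ramp_seconds(start_c: float, end_c: float, rate_c_per_s: float) -> float:
    """Time needed to move between two temperatures at a constant ramp rate."""
    return abs(start_c - end_c) / rate_c_per_s


# Step 5: 25 ul of each 100 uM strand in a 50 ul mix
each_strand_um = mixed_concentration(100.0, 25.0, 50.0)

# Step 6: 10 min at 95 C, ramp to 25 C at 0.1 C/s, then 2 h at 25 C
ramp_s = ramp_seconds(95.0, 25.0, 0.1)
total_min = 10 + ramp_s / 60 + 120

print(f"each strand: {each_strand_um:.0f} uM")
print(f"ramp: {ramp_s:.0f} s, whole program: about {total_min:.0f} min")
```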
7. A library was constructed from the DNAQC prepared above using conventional library construction reagents. The system shown in Table 1 below was prepared for end-filling and A-tailing:
TABLE 1
Amount | Component
5 μl | End-filling + A buffer
1 μl | T4 DNA polymerase
1 μl | T4 PNK
1 μl | Taq
10 ng | DNAQC
Make up to 30 μl | H2O
Reaction conditions: 20 ℃ for 30 minutes, then 70 ℃ for 30 minutes.
8. After end-filling and A-tailing, adaptor ligation was carried out. Basic-AD and Basic-AD-F were first diluted to 2 μM with H2O, and the system in Table 2 below was prepared:
TABLE 2
Amount | Component
30 μl | End-filled + A fragment
26 μl | Ligation buffer
3 μl | T4 DNA ligase
1 μl | 2 μM Basic-AD or Basic-AD-F
Reaction conditions: 20 ℃ for 30 minutes.
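To put the amounts in this ligation into perspective, the sketch below estimates the molar ratio of adaptor to fragment. It assumes an average mass of about 650 g/mol per base pair of double-stranded DNA, a common approximation that is not stated in the text, and that all 10 ng of the 89 bp DNAQC is carried into the ligation. The function names are illustrative.

```python
# Assumption: average molar mass of one base pair of dsDNA (g/mol).
AVG_BP_MASS_G_PER_MOL = 650.0


def dsdna_pmol(mass_ng: float, length_bp: int) -> float:
    """Convert a mass of double-stranded DNA into picomoles of molecules."""
    return mass_ng * 1000.0 / (length_bp * AVG_BP_MASS_G_PER_MOL)


def oligo_pmol(conc_um: float, volume_ul: float) -> float:
    """Picomoles contained in a volume of solution (1 uM x 1 ul = 1 pmol)."""
    return conc_um * volume_ul


fragment_pmol = dsdna_pmol(10.0, 89)   # 10 ng of the 89 bp DNAQC
adaptor_pmol = oligo_pmol(2.0, 1.0)    # 1 ul of 2 uM adaptor
ratio = adaptor_pmol / fragment_pmol

print(f"fragment: {fragment_pmol:.3f} pmol, adaptor: {adaptor_pmol:.1f} pmol")
print(f"adaptor : fragment = {ratio:.1f} : 1 (about {ratio / 2:.1f} per fragment end)")
```

Under these assumptions the adaptor is in roughly ten-fold molar excess over the fragment.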
9. 1 μl of the ligation product was diluted 5-fold, the reaction system listed in Table 3 below was prepared, and the reaction was run under the conditions listed in Table 4 below:
TABLE 3
Reagent | Volume
Hi-Di | 9 μL
Liz500 | 0.1 μL
Diluted amplification product | 1 μL
Total volume | 10 μL
TABLE 4
Step | Reaction temperature | Reaction time | Number of cycles
Denaturation | 95 ℃ | 5 min | 1
Hold | 4 ℃ | Indefinite | 1
10. The samples were loaded onto an ABI 3730 genetic analyzer and run using the SNaPshot/STR program.
11. Data analysis: the raw data were opened in Peakscan software and analyzed using Liz500 as the size standard. The ligation product corresponding to each peak (single-adaptor or double-adaptor ligation) was assigned from the position of the peak and the theoretical product length. For Basic-AD-F, the origin of each product peak can be assigned more accurately because the adaptor itself carries fluorescent labels.
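The peak assignment in step 11 can be illustrated with a small classifier that matches an observed peak size to the nearest theoretical product length. The nominal lengths below are assumptions for illustration only: the FAM-labelled strand is taken as 89 nt plus one added A, and each ligated adaptor strand as 33 nt. The tolerance value and all names are hypothetical.

```python
FRAGMENT_NT = 89 + 1   # assumption: labelled DNAQC strand plus one 3' A
ADAPTOR_NT = 33        # length of Basic-F and of Basic-R

# Nominal lengths of the labelled strand for each product type (assumed values)
EXPECTED_NT = {
    "unligated": FRAGMENT_NT,
    "single": FRAGMENT_NT + ADAPTOR_NT,
    "double": FRAGMENT_NT + 2 * ADAPTOR_NT,
}


def classify_peak(size_nt: float, tolerance_nt: float = 10.0) -> str:
    """Assign a peak to the product type with the closest expected length,
    or 'unassigned' if nothing lies within the tolerance."""
    label, expected = min(EXPECTED_NT.items(), key=lambda kv: abs(kv[1] - size_nt))
    return label if abs(expected - size_nt) <= tolerance_nt else "unassigned"


for size in (90, 120, 160, 300):
    print(size, classify_peak(size))
```

Peaks near 120 and 160, the positions reported for FIG. 3, fall into the single- and double-adaptor classes under these assumed lengths. A peak that matches no expected length is left unassigned and not forced into a class.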
Example 2: Verifying the DNA ligation efficiency of a Y-type adaptor containing a UMI sequence by capillary electrophoresis
1. DNAQC was prepared as in Example 1.
2. The following fragments were chemically synthesized and purified by HPLC:
DS-F:[ROX]AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT(SEQ ID NO.8)
DS-R:TCTTCTACAGTCANNNNNNNNNNNNAGATCGGAAGAGCACACGTCTGAACTCCAGTCAC[HEX](SEQ ID NO.9)
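The run of N bases in DS-R is the UMI. The following minimal sketch locates it and counts how many distinct UMIs a fully degenerate run of that length could encode; it assumes each N position is an equimolar mix of A, C, G and T, which the text does not state.

```python
import re

# SEQ ID NO. 9, written in three pieces so the degenerate run is easy to see
DS_R = "TCTTCTACAGTCA" + "NNNNNNNNNNNN" + "AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC"

match = re.search(r"N+", DS_R)
umi_start, umi_end = match.start(), match.end()
umi_length = umi_end - umi_start

# Assumption: every N is any of four bases, so the diversity is 4 ** length
umi_diversity = 4 ** umi_length

print(f"oligo length: {len(DS_R)} nt")
print(f"UMI at positions {umi_start + 1}-{umi_end}, length {umi_length}")
print(f"possible UMIs: {umi_diversity:,}")
```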
3. The DS-F and DS-R synthesized above were each dissolved in H2O to a final concentration of 100 μM. 25 μl of DS-F and 25 μl of DS-R were mixed in a 0.2 ml thin-walled tube to give 50 μl of DS-AD (final concentration 50 μM).
4. DS-AD was annealed in a PCR instrument: denaturation at 95 ℃ for 10 min, cooling to 25 ℃ at 0.1 ℃/second, then holding at 25 ℃ for 2 hours.
5. Single-strand extension was performed by preparing the system in Table 5 below:
TABLE 5
Amount | Component
50 μl | Annealed product
10 μl | 10X NEB buffer 2
5 μl | Klenow exo- (5 U/μl)
Make up to 100 μl | H2O
The reaction was carried out at 37 ℃ for 1 hour.
6. The product was ethanol-precipitated and resuspended in 85 μl of H2O, and the system in Table 6 below was prepared:
TABLE 6
Amount | Component
85 μl | Single-strand extension product
10 μl | 10X NEB Cutsmart buffer
5 μl | HpyCH4III (5 U/μl)
The reaction was carried out at 37 ℃ for 1 hour.
7. The above product was ethanol-precipitated and resuspended in 50 μl of annealing buffer (10 mM Tris-Cl pH 8.0, 1 mM EDTA, 50 mM NaCl) to a final concentration of about 20 μM, giving the prepared DS-seq adaptor.
8. A library was constructed from DNAQC by the conventional library construction method, with the DNA adaptor replaced by the DS-seq adaptor. As in Example 1, adaptor ligation efficiency was examined on an ABI 3730 genetic analyzer and analyzed with Peakscan software.
9. From the peak areas, the double-adaptor ligation efficiency of DS-AD was calculated to be 51%, the single-adaptor ligation efficiency 32%, and the proportion of fragments with no adaptor ligated 17%. Each peak corresponds to a distinct type of ligation product, and no non-specific ligation was observed, which indicates that blocking with fluorescent groups is also applicable to ligation of UMI-containing adaptors. The experimental results are shown in FIGS. 4 and 5.
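The percentages in step 9 are each peak's share of the total peak area. A minimal sketch of that calculation follows; the area values are hypothetical numbers chosen only so that the shares come out at the reported 51%, 32% and 17%, and are not measured data.

```python
def ligation_fractions(peak_areas: dict[str, float]) -> dict[str, float]:
    """Share of the total peak area contributed by each product type."""
    total = sum(peak_areas.values())
    if total <= 0:
        raise ValueError("total peak area must be positive")
    return {name: area / total for name, area in peak_areas.items()}


# Hypothetical peak areas (arbitrary units), for illustration only
example_areas = {"double": 5100.0, "single": 3200.0, "unligated": 1700.0}
fractions = ligation_fractions(example_areas)

for name, value in fractions.items():
    print(f"{name}: {value:.0%}")

# The double-adaptor share is the library conversion efficiency
conversion_efficiency = fractions["double"]
```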
Example 3: Library construction from low-input cfDNA
1. cfDNA was extracted from 5 ml of plasma using the MagMAX Cell-Free DNA Isolation Kit from Thermo Fisher.
2. Basic-AD and Basic-AD-F were prepared as in Example 1.
3. 5 ng and 1 ng of cfDNA were taken and libraries were constructed using conventional library construction reagents. The system in Table 7 below was prepared for end-filling and A-tailing:
TABLE 7
Amount | Component
5 μl | End-filling + A buffer
1 μl | T4 DNA polymerase
1 μl | T4 PNK
1 μl | Taq
1 ng or 5 ng | cfDNA
Make up to 30 μl | H2O
Reaction conditions: 20 ℃ for 30 minutes, then 70 ℃ for 30 minutes.
4. After end-filling and A-tailing, adaptor ligation was carried out. Basic-AD and Basic-AD-F were first diluted to 2 μM with H2O, and the system in Table 8 below was prepared:
TABLE 8
Amount | Component
30 μl | End-filled + A fragment
26 μl | Ligation buffer
3 μl | T4 DNA ligase
1 μl | 2 μM Basic-AD or Basic-AD-F
Reaction conditions: 20 ℃ for 30 minutes.
5. The ligation products were purified using MagicPure Size Selection DNA Beads from Hokko, Inc., and finally eluted with 15 μl of H2O.
6. The library was enriched by PCR amplification using NEBNext Ultra II Q5 Master Mix from NEB: 14 cycles for 1 ng of input DNA and 10 cycles for 5 ng of input DNA.
7. 3 μl of each PCR product was analyzed by agarose gel electrophoresis. As shown in FIG. 6, the band of the amplification product ligated with the Basic-AD-F adaptor is markedly brighter than that with Basic-AD at both 1 ng and 5 ng input. These results demonstrate that double-adaptor ligation efficiency is greatly improved after the non-ligation end of the adaptor is blocked by fluorescent groups.
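The two inputs in this example were amplified for different numbers of cycles, which is why the adaptors are compared within each input amount. The sketch below shows the idealized arithmetic, assuming perfect doubling in every cycle (real PCR is less efficient); the function names are illustrative.

```python
import math


def ideal_fold(cycles: int) -> int:
    """Fold amplification after `cycles` rounds of perfect doubling (an idealization)."""
    return 2 ** cycles


def cycles_to_offset(input_ratio: float) -> float:
    """Extra cycles of perfect doubling needed to offset an `input_ratio`-fold lower input."""
    return math.log2(input_ratio)


extra_fold = ideal_fold(14) // ideal_fold(10)   # 14 cycles (1 ng) versus 10 cycles (5 ng)
offset_cycles = cycles_to_offset(5.0)           # 5 ng versus 1 ng input

print(f"4 extra cycles give up to {extra_fold}-fold more amplification")
print(f"a 5-fold lower input is offset by about {offset_cycles:.1f} cycles")
```

Because the cycle numbers differ, band brightness is best compared between Basic-AD and Basic-AD-F at the same input rather than between the 1 ng and 5 ng panels.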
The present invention has been described in detail so that those skilled in the art can understand and practice it. The description is not intended to limit the scope of the invention, and all equivalent changes and modifications made in the spirit of the invention fall within its scope.
Sequence listing
<110> Shanghai sky Hao Biotech Co., Ltd
<120> DNA linker and preparation method and application thereof
<160>9
<170>SIPOSequenceListing 1.0
<210>1
<211>31
<212>DNA
<213> Artificial sequence (rengongxulie)
<400>1
ccggaattct tgccttcatt gagcgctact t 31
<210>2
<211>32
<212>DNA
<213> Artificial sequence (rengongxulie)
<400>2
aaaactgcag ttccagggtc ttctcaatcc ag 32
<210>3
<211>89
<212>DNA
<213> Artificial sequence (rengongxulie)
<400>3
ccggaattct ttgccttcat tgagcgctac tttcagagct tccccaaggt gcgggcctgg 60
attgagaaga ccctggaaac tgcagtttt 89
<210>4
<211>33
<212>DNA
<213> Artificial sequence (rengongxulie)
<400>4
acactctttc cctacacgac gctcttccga tct 33
<210>5
<211>33
<212>DNA
<213> Artificial sequence (rengongxulie)
<400>5
gatcggaaga gcacacgtct gaactccagt cac 33
<210>6
<211>33
<212>DNA
<213> Artificial sequence (rengongxulie)
<400>6
acactctttc cctacacgac gctcttccga tct 33
<210>7
<211>33
<212>DNA
<213> Artificial sequence (rengongxulie)
<400>7
gatcggaaga gcacacgtct gaactccagt cac 33
<210>8
<211>58
<212>DNA
<213> Artificial sequence (rengongxulie)
<400>8
aatgatacgg cgaccaccga gatctacact ctttccctac acgacgctct tccgatct 58
<210>9
<211>59
<212>DNA
<213> Artificial sequence (rengongxulie)
<400>9
tcttctacag tcannnnnnn nnnnnagatc ggaagagcac acgtctgaac tccagtcac 59