DNA adaptor, preparation method and application thereof
Technical Field
The invention relates in particular to a DNA adaptor, and to a preparation method and an application thereof.
Background
In second-generation sequencing experiments, constructing DNA sequencing libraries by DNA adaptor ligation is an important and fundamental technique. Beyond conventional genome sequencing libraries, it is also a key step in library construction for RNA-seq, ChIP-seq, RRBS and the like. DNA fragments are converted into a DNA sequencing library by a) end repair, b) A-tailing, c) DNA adaptor ligation and d) PCR enrichment with universal primers matching the adaptor sequences. Clearly, only fragments that carry a DNA adaptor at both ends can be enriched in the subsequent PCR and then sequenced. The proportion of fragments with adaptors ligated at both ends among all DNA fragments is called the conversion efficiency, and it is one of the core criteria of library construction quality. It matters for conventional DNA library construction and must also be considered when constructing libraries that depend heavily on the number of original effective molecules, such as low-input cfDNA libraries and PCR-free libraries.
To improve ligation efficiency, most research has focused on optimizing the functional enzymes and buffers used in library construction. For example, A-tailing is an important step in DNA library construction: adding an A to the 3' end of the fragment mediates TA ligation, which is more efficient than blunt-end ligation. Researchers have therefore developed or improved functional enzymes to increase A-tailing efficiency. Researchers have also optimized ligation buffer systems and achieved efficient ligation by adding reagents such as the polymer polyethylene glycol (PEG) and the small molecules propylene glycol and glycerol. Although these approaches do improve ligation efficiency, they require either complicated, high-standard equipment to produce and engineer enzymes, or the purchase of patent-protected commercial reagents at relatively high cost. There is therefore a strong need for a simple method of improving library conversion efficiency.
Disclosure of Invention
The invention aims to provide a DNA adaptor capable of improving library conversion efficiency, together with a preparation method and an application thereof.
In order to solve the technical problems, the invention adopts the following technical scheme:
During their experiments, the inventors found that conventional Y-type adaptors can undergo non-specific ligation to DNA fragments, presumably through the 3' hydroxyl and/or 5' phosphate at the non-ligating end of the DNA adaptor itself.
To this end, in one aspect, the invention provides a DNA adaptor in which the 5' and/or 3' terminus of the non-ligating end is blocked, so that this end cannot interfere with the ligation reaction and the ligation efficiency of the DNA adaptor is improved.
The base composition of the DNA adaptor of the invention may comprise one or more of A (adenine), T (thymine), C (cytosine), G (guanine), U (uracil) and 5mC (5-methylcytosine). Preferably, the non-ligating end of the DNA adaptor is blocked with a fluorophore.
The fluorophore may be any conventional fluorophore, and is preferably one or more of FAM, HEX and ROX.
The groups blocking the 5' and 3' termini of the non-ligating end may be the same or different.
The DNA adaptor of the invention may be of various types and is preferably a Y-type adaptor, including but not limited to conventional Y-type adaptors and UMI-containing Y-type adaptors.
Preferably, the 5' phosphate and/or 3' hydroxyl group of the non-ligating end is blocked.
According to a specific and preferred embodiment, the 5' and 3' termini of the non-ligating end of the DNA adaptor are each blocked with a fluorophore. When such an adaptor is used in the most common ligation system, consisting of T4 DNA ligase and its buffer, the library conversion efficiency (double-adaptor ligation efficiency) rises from about 10% to 60% or more. The adaptor of the invention thus improves the library conversion rate simply, efficiently and at low cost.
A schematic diagram of a typical Y-type DNA adaptor with a blocked non-ligating end is shown in FIG. 1.
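The Y shape can be checked against the adaptor sequences given in Example 1. The following Python sketch is an illustration only: the helper names `reverse_complement` and `stem_length` are hypothetical, and it assumes that the ligating end is formed by the 3' end of Basic-F (SEQ ID NO.4), carrying a single T overhang, paired with the 5' end of Basic-R (SEQ ID NO.5).

```python
# Illustrative sketch: sequences are SEQ ID NO.4 and NO.5 from Example 1;
# the function names are hypothetical helpers, not part of the described method.
BASIC_F = "ACACTCTTTCCCTACACGACGCTCTTCCGATCT"
BASIC_R = "GATCGGAAGAGCACACGTCTGAACTCCAGTCAC"

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def stem_length(forward: str, reverse: str, overhang: int = 1) -> int:
    """Count base pairs in the duplex stem at the ligating end.

    Assumes the forward strand ends with `overhang` unpaired 3' bases and
    that the 5' end of the reverse strand pairs with the bases before them.
    """
    paired = forward[:-overhang] if overhang else forward
    k = 0
    # Extend the stem one base pair at a time until the strands stop pairing.
    while (k < min(len(paired), len(reverse))
           and paired[len(paired) - k - 1:] == reverse_complement(reverse[:k + 1])):
        k += 1
    return k


if __name__ == "__main__":
    n = stem_length(BASIC_F, BASIC_R)
    print(f"paired stem: {n} bp, 3' overhang base: {BASIC_F[-1]}")
    print(f"unpaired arms: {len(BASIC_F) - n - 1} nt and {len(BASIC_R) - n} nt")
```

The unpaired arms found this way are the non-ligating ends, that is, the 5' end of Basic-F and the 3' end of Basic-R, which carry ROX and HEX in Basic-F-F and Basic-R-F.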
The invention also provides a method for preparing the DNA adaptor: the two strands of the double-stranded DNA adaptor are each chemically synthesized as single-stranded DNAs that are partly reverse complementary to each other, the non-ligating end of either or both single strands is blocked, and the DNA adaptor is then obtained from the two strands by synthesis and purification.
In the present invention, the method of synthesizing the DNA adaptor from the two single-stranded DNAs includes, but is not limited to, one or more of annealing, single-strand extension and restriction enzyme cleavage.
In a third aspect, the invention provides the use of the DNA adaptor in constructing a DNA sequencing library.
In a fourth aspect, the invention provides a method for constructing a DNA sequencing library, in which the DNA adaptor is ligated to DNA fragments.
According to one embodiment, the construction method comprises the following steps:
(1) Preparing a DNA fragment;
(2) End-repairing the DNA fragments of step (1) with T4 DNA polymerase;
(3) Adding a phosphate group to the 5' end of the DNA fragments treated in step (2) using T4 PNK;
(4) Adding an A to the 3' end of the DNA fragments treated in step (3) using Klenow exo- or Taq polymerase;
(5) Ligating the DNA fragments of step (4) to the DNA adaptor using T4 DNA ligase;
(6) Enriching the DNA fragments carrying adaptors at both ends by PCR amplification to obtain a DNA sequencing library.
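Step (6) enriches only fragments carrying an adaptor at both ends, which is why the double-adaptor fraction determines library conversion. The Python sketch below illustrates this under a simplifying assumption that is not stated in the source: each fragment end is ligated independently with the same probability `p_end`. The function name `ligation_fractions` is hypothetical.

```python
# Illustrative model only: independent, equally likely ligation at each
# fragment end is an assumption made here, not a claim of the invention.
def ligation_fractions(p_end: float) -> dict:
    """Return expected fractions of fragments with 0, 1 or 2 adaptors."""
    if not 0.0 <= p_end <= 1.0:
        raise ValueError("p_end must lie between 0 and 1")
    return {
        "no_adaptor": (1 - p_end) ** 2,
        "single_adaptor": 2 * p_end * (1 - p_end),
        "double_adaptor": p_end ** 2,  # only these are enriched by PCR in step (6)
    }


if __name__ == "__main__":
    for p in (0.5, 0.8):
        fractions = ligation_fractions(p)
        print(p, {name: round(value, 3) for name, value in fractions.items()})
```

Under this assumption a per-end ligation probability of 0.5 converts only a quarter of the fragments, which shows how sharply conversion falls when ligation at either end is impaired.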
Through extensive and intensive research, the inventors found for the first time that, in the most common ligation system consisting of T4 DNA ligase and its buffer, using a Y-type adaptor whose non-ligating end is blocked with fluorophores prevents ligation between the non-ligating end and DNA fragments and promotes ligation between the adaptor and DNA fragments. The library conversion efficiency (double-adaptor ligation efficiency) thereby rises from about 10% to 60% or more, making this a simple, efficient and low-cost way to obtain a high library conversion rate.
Due to the implementation of the technical scheme, compared with the prior art, the invention has the following advantages:
The DNA adaptor of the invention avoids non-specific ligation between its non-ligating end and DNA fragments and improves the ligation efficiency between the adaptor and DNA fragments, thereby improving the conversion efficiency of the library.
Drawings
FIG. 1 shows a typical Y-type DNA adaptor with a blocked non-ligating end. The two thick lines represent the two DNA single strands, and the thin parallel lines between them represent the region where the two strands pair by complementarity. T represents the thymine base protruding from the 3' end at the ligating end of the adaptor, and P represents the phosphate modification of the 5' end at the ligating end; both mediate TA ligation between the DNA adaptor and DNA fragments that have been end-repaired, 5'-phosphorylated and 3' A-tailed during library construction. F and F' represent the fluorophores that block the non-ligating end of the DNA adaptor, attached to the 5' phosphate of the 5' terminal base and to the 3' hydroxyl of the 3' terminal base, respectively.
FIG. 2 shows the capillary electrophoresis results for the ligation products of Example 1 (blue fluorescence channel). DNAQC is labelled with the blue fluorophore FAM, and its length increases after adaptor ligation, so the type of ligation product (no adaptor, a single adaptor or two adaptors) can be inferred from the product length. A) Ligation products of the Basic-AD adaptor; B) ligation products of the fluorophore-blocked Basic-AD-F adaptor. The boxes mark the clearly identified product peaks corresponding to double-adaptor ligation, single-adaptor ligation and no adaptor ligation. The double-adaptor ligation efficiency, which represents the library conversion efficiency, rose from 12% to 60% after the non-ligating end of the DNA adaptor was blocked with fluorophores.
FIG. 3 shows the capillary electrophoresis results for the ligation products of Example 1 (blue, green and red fluorescence channels). DNAQC is labelled with the blue fluorophore FAM and is detected in the blue channel; Basic-R-F is labelled with the green fluorophore HEX and is detected in the green channel; Basic-F-F is labelled with the red fluorophore ROX and is detected in the red channel. Overlapping fluorescence peaks indicate that the corresponding ligation product carries several fluorophore labels at once, so the type of ligation product can be determined from the overlap. A clear single-end ligation product with Basic-R-F (blue and green overlap) and a clear single-end ligation product with Basic-F-F (blue and red overlap) are detected near 120 bp, and a clear double-end Basic-F-F/Basic-R-F single-stranded ligation product (three-colour overlap) is detected near 160 bp.
FIG. 4 shows the capillary electrophoresis results for the ligation products of Example 2 (blue fluorescence channel). DNAQC is labelled with the blue fluorophore FAM, and its length increases after adaptor ligation, so the type of DS-AD ligation product (no adaptor, a single adaptor or two adaptors) can be inferred from the product length. The boxes mark the clearly identified product peaks corresponding to double-adaptor ligation, single-adaptor ligation and no adaptor ligation.
FIG. 5 shows the capillary electrophoresis results for the ligation products of Example 2 (blue, green and red fluorescence channels). DNAQC is labelled with the blue fluorophore FAM and is detected in the blue channel; DS-R is labelled with the green fluorophore HEX and is detected in the green channel; DS-F is labelled with the red fluorophore ROX and is detected in the red channel. Overlapping fluorescence peaks indicate that the corresponding ligation product carries several fluorophore labels at once, so the type of ligation product can be determined from the overlap. A clear single-end DS-R product (blue and green overlap) is detected near 140 bp, a clear single-end DS-F product (blue and red overlap) near 165 bp, and a clear double-end DS-F/DS-R single-stranded ligation product (three-colour overlap) near 220 bp.
FIG. 6 shows the results of library construction from low-input cfDNA with different DNA adaptors in Example 3. A) 5 ng input cfDNA, 10 cycles of PCR amplification; B) 1 ng input cfDNA, 14 cycles of PCR amplification. Both panels clearly show that ligation efficiency is greatly improved after the DNA adaptor is blocked with fluorophores. The DNA ladder bands are 100, 200, 300, 400, 500, 600, 700, 800, 900 and 1000 bp.
Detailed Description
The present invention will be described in further detail with reference to the following examples, but the present invention is not limited to the following examples. The implementation conditions adopted in the embodiments can be further adjusted according to different requirements of specific use, and the implementation conditions which are not noted are conventional conditions in the industry.
Example 1: Verifying the ligation efficiency of a conventional Y-type adaptor by capillary electrophoresis
1. The following primer sequences were synthesized:
DNAQC-F:ccg GAATTC TT[6-FAM-dT]GCCTTCATTGAGCGCTACTT(SEQ ID NO.1)
DNAQC-R:aaaa CTGCAG TTCCAGGGTCTTCTCAATCCAG(SEQ ID NO.2)
2. PCR amplification was carried out with the above primers using Thermus aquaticus genomic DNA as the template, yielding a specific double-stranded DNA 89 bp in length with the following sequence: ccgGAATTCTTTGCCTTCATTGAGCGCTACTTTCAGAGCTTCCCCAAGGTGCGGGCCTGGATTGAGAAGACCCTGGAAACTGCAGtttt (SEQ ID NO. 3).
3. The product was purified using the DNA Clean & Concentrator-5 kit (200 preps, with Zymo-Spin IC columns) from Zymo Research to obtain a quality-control DNA fragment, designated DNAQC.
4. The following linker sequences were chemically synthesized and purified by HPLC.
Basic-F:ACACTCTTTCCCTACACGACGCTCTTCCGATCT(SEQ ID NO.4)
Basic-R:GATCGGAAGAGCACACGTCTGAACTCCAGTCAC(SEQ ID NO.5)
Basic-F-F:[ROX]ACACTCTTTCCCTACACGACGCTCTTCCGATCT(SEQ ID NO.6)
Basic-R-F:GATCGGAAGAGCACACGTCTGAACTCCAGTCAC[HEX](SEQ ID NO.7)
5. Each synthesized adaptor strand was dissolved in annealing buffer (10 mM Tris-Cl pH 8.0, 1 mM EDTA, 50 mM NaCl) to a final concentration of 100 μM. In a 0.2 ml thin-walled tube, 25 μl of Basic-F and 25 μl of Basic-R were mixed to give 50 μl of Basic-AD (final concentration 50 μM); likewise, 25 μl of Basic-F-F and 25 μl of Basic-R-F were mixed to give 50 μl of Basic-AD-F (final concentration 50 μM).
6. Basic-AD and Basic-AD-F were annealed in a PCR instrument under the following conditions: denaturation at 95 ℃ for 10 min, cooling to 25 ℃ at 0.1 ℃/sec, and holding at 25 ℃ for 2 hours, giving the finished DNA adaptors.
7. A library was constructed from the DNAQC obtained above using conventional library construction reagents. The system in Table 1 was prepared for end repair and A-tailing:
TABLE 1
Amount | Component
5 μl | End repair + A-tailing buffer
1 μl | T4 DNA polymerase
1 μl | T4 PNK
1 μl | Taq
10 ng | DNAQC
Up to 30 μl | H2O
Reaction conditions: 30 minutes at 20 ℃ and 30 minutes at 70 ℃.
8. After end repair and A-tailing, adaptor ligation was carried out. Basic-AD and Basic-AD-F were first diluted to 2 μM with H2O, and the system in Table 2 was then prepared:
TABLE 2
Amount | Component
30 μl | End-repaired, A-tailed fragment
26 μl | Ligation buffer
3 μl | T4 DNA ligase
1 μl | 2 μM Basic-AD or Basic-AD-F
Reaction conditions: 30 minutes at 20 ℃.
9. 1 μl of the ligation product was taken and diluted 5-fold, the reaction system shown in Table 3 was prepared, and the reaction was run under the conditions shown in Table 4.
TABLE 3
Reagent | Volume
Hi-Di | 9 μL
Liz500 | 0.1 μL
Diluted ligation product | 1 μL
Total volume | 10 μL
TABLE 4
Step | Reaction temperature | Reaction time | Cycle number
Denaturation | 95 ℃ | 5 min | 1
Cooling | 4 ℃ | Hold | 1
10. The samples were loaded on an ABI 3730 genetic analyzer and run using the SNaPshot/STR program.
11. Data analysis: the raw data were opened in Peakscan software and analysed with Liz500 as the size standard. From the position of each product peak and the theoretical product lengths, the ligation product corresponding to each peak (no adaptor, a single adaptor or two adaptors) can be identified. Because the Basic-AD-F adaptor is itself fluorescently labelled, the origin of each product peak can be determined more accurately. The results of this example are shown in FIG. 2 and FIG. 3. From the peak areas, the double-adaptor ligation efficiency of Basic-AD-F was calculated to be 60%, the single-adaptor ligation efficiency 28%, and the proportion of fragments with no adaptor 12% (FIG. 2-B). In the Basic-AD ligation, by contrast, two abnormal ligation peaks appeared in addition to the three product peaks described above and accounted for 37% of the total; the double-adaptor ligation efficiency was 12%, the single-adaptor ligation efficiency 28%, and the proportion of fragments with no adaptor 23% (FIG. 2-A). These results demonstrate that the double-adaptor ligation efficiency is greatly improved after the non-ligating end of the adaptor is blocked with fluorophores.
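The peak-area calculation in step 11 can be sketched as follows. This is a minimal Python illustration, not the Peakscan workflow itself: the function name `peak_fractions` is hypothetical, and the peak areas are placeholders chosen to reproduce the percentages reported above rather than raw instrument data.

```python
# Illustrative sketch: the peak areas below are placeholders chosen to
# reproduce the percentages reported in step 11, not raw instrument data.
def peak_fractions(areas: dict) -> dict:
    """Convert peak areas into percentages of the total signal."""
    total = sum(areas.values())
    if total <= 0:
        raise ValueError("total peak area must be positive")
    return {name: 100.0 * area / total for name, area in areas.items()}


basic_ad_f = {"double": 60.0, "single": 28.0, "none": 12.0}
basic_ad = {"double": 12.0, "single": 28.0, "none": 23.0, "abnormal": 37.0}

if __name__ == "__main__":
    blocked = peak_fractions(basic_ad_f)
    unblocked = peak_fractions(basic_ad)
    fold = blocked["double"] / unblocked["double"]
    print(f"Basic-AD-F conversion: {blocked['double']:.0f}%")
    print(f"Basic-AD conversion:   {unblocked['double']:.0f}%")
    print(f"fold improvement:      {fold:.1f}x")
```

Including the abnormal peaks in the denominator, as done here for Basic-AD, matches the reported figures, which sum to 100% only when the 37% abnormal fraction is counted.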
Example 2: Verifying the ligation efficiency of a UMI-containing Y-type adaptor by capillary electrophoresis
1. DNAQC was prepared as in Example 1.
2. The following fragments were chemically synthesized and purified by HPLC:
DS-F:[ROX]AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT(SEQ ID NO.8)
DS-R:TCTTCTACAGTCANNNNNNNNNNNNAGATCGGAAGAGCACACGTCTGAACTCCAGTCAC[HEX](SEQ ID NO.9)
3. The synthesized DS-F and DS-R were each dissolved in H2O to a final concentration of 100 μM. In a 0.2 ml thin-walled tube, 25 μl of DS-F and 25 μl of DS-R were mixed to give 50 μl of DS-AD (final concentration 50 μM).
4. DS-AD was annealed in a PCR instrument under the following conditions: denaturation at 95 ℃ for 10 min, cooling to 25 ℃ at 0.1 ℃/sec, and holding at 25 ℃ for 2 hours.
5. Single-strand extension was carried out by preparing the system in Table 5:
TABLE 5
Amount | Component
50 μl | Annealed product
10 μl | 10X NEB buffer 2
5 μl | Klenow exo- (5 U/μl)
Up to 100 μl | H2O
Reaction conditions: 1 hour at 37 ℃.
6. The above product was ethanol-precipitated and resuspended in 85 μl of H2O, and the system in Table 6 was prepared:
TABLE 6
Amount | Component
85 μl | Single-strand extension product
10 μl | 10X NEB CutSmart buffer
5 μl | HpyCH4III (5 U/μl)
Reaction conditions: 1 hour at 37 ℃.
7. The above product was ethanol-precipitated and resuspended in 50 μl of annealing buffer (10 mM Tris-Cl pH 8.0, 1 mM EDTA, 50 mM NaCl). At this point the DS-seq adaptor was ready, at a final concentration of about 20 μM.
8. As in Example 1, a library was constructed from DNAQC using the conventional library construction method, with the DNA adaptor replaced by the DS-seq adaptor. Adaptor ligation efficiency was likewise measured on the ABI 3730 genetic analyzer and analysed with Peakscan software.
9. From the peak areas, the double-adaptor ligation efficiency of DS-AD was calculated to be 51%, the single-adaptor ligation efficiency 32%, and the proportion of fragments with no adaptor 17%. Every peak could be assigned to a specific type of ligation product, with no non-specific ligation, indicating that fluorophore blocking is equally applicable to ligation reactions with UMI-containing adaptors (FIG. 4 and FIG. 5).
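The UMI content of the DS-R strand used in this example can be inspected with a short Python sketch. It is an illustration under stated assumptions: the helper names are hypothetical, each N is assumed to be an equal mix of A, C, G and T, and the recognition sequence ACNGT for HpyCH4III is taken as an assumption that should be checked against the enzyme supplier's documentation.

```python
import re

# DS-R is SEQ ID NO.9 from step 2, written with its run of 12 N positions
# factored out; N marks a degenerate (random) UMI position.
DS_R = "TCTTCTACAGTCA" + "N" * 12 + "AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC"

# Assumed recognition sequence for HpyCH4III (ACNGT); verify before relying on it.
HPYCH4III_SITE = re.compile(r"AC[ACGT]GT")


def umi_length(seq: str) -> int:
    """Length of the longest run of N in the sequence."""
    runs = re.findall(r"N+", seq)
    return max((len(run) for run in runs), default=0)


def umi_diversity(seq: str) -> int:
    """Number of distinct UMIs if every N position can be A, C, G or T."""
    return 4 ** umi_length(seq)


def fixed_sites(seq: str) -> list:
    """0-based start positions of sites lying entirely in non-degenerate sequence."""
    return [match.start() for match in HPYCH4III_SITE.finditer(seq)]


if __name__ == "__main__":
    print("UMI length:", umi_length(DS_R))
    print("possible UMIs:", umi_diversity(DS_R))
    print("assumed HpyCH4III sites at:", fixed_sites(DS_R))
```

The sketch only reports sites in the fixed part of the strand. A random UMI could by chance contain the same motif, which is not modelled here.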
Example 3: Library construction from low-input cfDNA
1. cfDNA was extracted from 5 ml of plasma using the MagMAX cell-free DNA isolation kit from Thermo Fisher.
2. Basic-AD and Basic-AD-F were prepared as described in Example 1.
3. 5 ng and 1 ng of cfDNA were taken separately, and libraries were constructed using conventional library construction reagents. The system in Table 7 was prepared for end repair and A-tailing:
TABLE 7
Amount | Component
5 μl | End repair + A-tailing buffer
1 μl | T4 DNA polymerase
1 μl | T4 PNK
1 μl | Taq
1 ng or 5 ng | cfDNA
Up to 30 μl | H2O
Reaction conditions: 30 minutes at 20 ℃ and 30 minutes at 70 ℃.
4. After end repair and A-tailing, adaptor ligation was carried out. Basic-AD and Basic-AD-F were first diluted to 2 μM with H2O, and the system in Table 8 was then prepared:
TABLE 8
Amount | Component
30 μl | End-repaired, A-tailed fragment
26 μl | Ligation buffer
3 μl | T4 DNA ligase
1 μl | 2 μM Basic-AD or Basic-AD-F
Reaction conditions: 30 minutes at 20 ℃.
5. The ligation product was purified with MagicPure Size Selection DNA Beads from TransGen Biotech and eluted in a final volume of 15 μl of H2O.
6. PCR amplification was performed with NEBNext Ultra II Q Master Mix from NEB to enrich the library: 14 cycles for 1 ng of starting DNA and 10 cycles for 5 ng of starting DNA.
7. 3 μl of each PCR product was analysed by agarose gel electrophoresis. As shown in FIG. 6, the bands of amplified product from Basic-AD-F ligation were markedly brighter than those from Basic-AD, whether the input was 1 ng or 5 ng. These results demonstrate that the double-adaptor ligation efficiency is greatly improved after the non-ligating end of the adaptor is blocked with fluorophores.
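The cycle numbers in step 6 and the conversion efficiencies measured in Example 1 can be related with simple arithmetic. The Python sketch below assumes ideal PCR in which the product doubles every cycle, and it assumes that the 12% and 60% conversion efficiencies from Example 1 carry over to cfDNA; neither assumption is confirmed by the source. The function names are hypothetical.

```python
import math


# Illustrative arithmetic only; ideal doubling per cycle is an assumption.
def ideal_fold_amplification(cycles: int) -> int:
    """Fold increase after `cycles` rounds of perfect doubling."""
    return 2 ** cycles


def cycles_saved(conversion_low: float, conversion_high: float) -> float:
    """Extra ideal PCR cycles the lower-conversion library would need
    to reach the same yield as the higher-conversion library."""
    return math.log2(conversion_high / conversion_low)


if __name__ == "__main__":
    print(ideal_fold_amplification(10))  # cycle number used for 5 ng input
    print(ideal_fold_amplification(14))  # cycle number used for 1 ng input
    print(round(cycles_saved(0.12, 0.60), 2))
```

Under these assumptions a five-fold gain in conversion corresponds to a little over two PCR cycles, which is consistent in direction with the brighter Basic-AD-F bands at equal cycle numbers.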
The present invention has been described in detail so that those skilled in the art can understand and implement it. The description does not limit the scope of the invention, and all equivalent changes or modifications made in accordance with the spirit of the invention fall within its scope.
Sequence listing
<110> Shanghai Dimens Biotechnology Co., ltd
<120> a DNA linker, method for preparing the same and use thereof
<160> 9
<170> SIPOSequenceListing 1.0
<210> 1
<211> 31
<212> DNA
<213> Artificial sequence
<400> 1
ccggaattct tgccttcatt gagcgctact t 31
<210> 2
<211> 32
<212> DNA
<213> Artificial sequence
<400> 2
aaaactgcag ttccagggtc ttctcaatcc ag 32
<210> 3
<211> 89
<212> DNA
<213> Artificial sequence
<400> 3
ccggaattct ttgccttcat tgagcgctac tttcagagct tccccaaggt gcgggcctgg 60
attgagaaga ccctggaaac tgcagtttt 89
<210> 4
<211> 33
<212> DNA
<213> Artificial sequence
<400> 4
acactctttc cctacacgac gctcttccga tct 33
<210> 5
<211> 33
<212> DNA
<213> Artificial sequence
<400> 5
gatcggaaga gcacacgtct gaactccagt cac 33
<210> 6
<211> 33
<212> DNA
<213> Artificial sequence
<400> 6
acactctttc cctacacgac gctcttccga tct 33
<210> 7
<211> 33
<212> DNA
<213> Artificial sequence
<400> 7
gatcggaaga gcacacgtct gaactccagt cac 33
<210> 8
<211> 58
<212> DNA
<213> Artificial sequence
<400> 8
aatgatacgg cgaccaccga gatctacact ctttccctac acgacgctct tccgatct 58
<210> 9
<211> 59
<212> DNA
<213> Artificial sequence
<400> 9
tcttctacag tcannnnnnn nnnnnagatc ggaagagcac acgtctgaac tccagtcac 59