CN117512063A

CN117512063A - DNA library construction method, device and application thereof

Info

Publication number: CN117512063A
Application number: CN202311225560.2A
Authority: CN
Inventors: 任悍; 矫昕霖; 刘佳慧; 任重敢; 谭晶晶; 吴德伦; 赵美茹
Original assignee: Beijing Jiyinjia Medical Laboratory Co ltd; Shenzhen Guiinga Medical Laboratory
Current assignee: Beijing Jiyinjia Medical Laboratory Co ltd; Shenzhen Guiinga Medical Laboratory
Priority date: 2023-09-21
Filing date: 2023-09-21
Publication date: 2024-02-06

Abstract

The present disclosure provides a DNA library construction method, a kit and applications thereof. The methods of the present disclosure can be used for quality control of probes or primers. Quality control is completed simply by comparing the sequencing output data of the library with the probe sequences, without evaluating the actual performance of the primers or probes, which saves time and the high cost of practical testing. The method solves the long-standing technical problem that a small number of errors arising during primer or probe synthesis cannot be detected directly from primer or probe performance, and fills a gap in quality control technology based on building a library from the probes themselves.

Description

DNA library construction method, device and application thereof

Technical Field

The invention relates to the field of gene sequencing, in particular to a DNA-based library construction method and device and applications thereof.

Background

To date, quality control methods for oligonucleotide primers or probes fall broadly into two categories: quality control of the purity of the synthesized fragments, and quality control based on actual performance in use, wherein:

Quality control after synthesis and after mixing mainly monitors the purity of the primer or probe, and the method differs with oligonucleotide length. Generally, oligonucleotides shorter than 70 nt are checked by high-performance liquid chromatography, and oligonucleotides longer than 70 nt by polyacrylamide gel electrophoresis. In addition, oligonucleotides of 20-100 nt can be checked by mass spectrometry: electrospray ionization generates a distribution of ions in different charge states for the nucleic acid, and the molecular weight of the nucleic acid is calculated by deconvolution.

These techniques cannot distinguish nucleic acids that have the same nucleotide composition but a different sequence order (isomers). Because a probe set or primer set usually contains thousands or even tens of thousands of oligonucleotide chains, for chains of the same length it cannot be determined, once the chains have been mixed, whether one or more probes were omitted from the mixture or added more than once. Nor can errors in the synthesized nucleotide sequence be detected.

Quality control after purification is generally judged indirectly from the actual enrichment of the target sequences. Either a library is constructed after multiplex PCR, or a library is constructed first and then subjected to probe hybridization capture; the resulting enriched library is sequenced, and the depth of each region to be enriched by the primer or probe under test, together with the average depth, is calculated from the sequencing results by bioinformatic methods. A depth coefficient (target region depth / average depth) is then obtained, and the performance of the probe or primer is judged from the magnitude of this coefficient: if the depth coefficient of the target region of a given probe or primer is too low or equal to 0, that probe or primer is considered to be of poor quality or missing.
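The depth-coefficient calculation described above can be sketched as follows. This is a minimal Python illustration only: the function name, the example depths and the 0.2 cutoff are assumptions made for the example and are not values taken from the present disclosure.

```python
from statistics import mean

def depth_coefficients(region_depths):
    """Return {region: depth / average depth} for the enriched target regions."""
    avg = mean(region_depths.values())
    if avg == 0:
        raise ValueError("average depth is zero; no region was enriched")
    return {region: depth / avg for region, depth in region_depths.items()}

# Hypothetical sequencing depths of the regions targeted by four probes.
depths = {"probe_1": 120.0, "probe_2": 80.0, "probe_3": 0.0, "probe_4": 200.0}
coeffs = depth_coefficients(depths)

LOW_CUTOFF = 0.2  # assumed threshold for "too low"; to be chosen per assay
suspect = sorted(name for name, c in coeffs.items() if c < LOW_CUTOFF)

print(coeffs)
print("suspect probes:", suspect)
```

A probe flagged in this way may be of poor quality or missing, but the calculation by itself gives no information about non-target sequences present in the probe set.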

However, this conventional indirect quality control method makes it difficult to monitor and determine the presence of small amounts of non-targeted sequences in a probe set.

Moreover, practical evaluation experiments often require considerable additional time and labor, and because the probe manufacturer and the probe user usually belong to different organizations, the match between the designed and the actually delivered primers and probes is often not ideal. The efficiency with which a primer or probe binds its target is partly random. In particular, in current pathogen targeted-enrichment applications (targeted next-generation sequencing, tNGS), the number of pathogens covered by a primer or probe set can reach hundreds of thousands, and it is not feasible to obtain standards for all pathogens in order to fully evaluate every pathogen covered by primers or probes designed from a database. A method for sequence-level quality control of probes or primers is therefore highly desirable.

On the other hand, primers or probes used in molecular biology commonly carry a 5' end modification (for example, a thio, amino, or thiol modification at the 5' end of a primer, or a biotin modification at the 5' end of a probe in a biotin-streptavidin hybridization capture system). Conventional library construction schemes cannot directly build a sequencing library from molecules whose 5' end carries any modification other than a phosphate, so sequencing-based quality control cannot be performed on primers or probes bearing a non-phosphate 5' modification.

Existing single-stranded DNA library construction methods show obvious bias, so the accurate proportion of each primer or probe in the set under test cannot be obtained, and semi-quantification of the primers and probes cannot be achieved.

Disclosure of Invention

In order to solve at least one of the above problems, the present disclosure provides a DNA library construction method, apparatus and application thereof. The methods of the present disclosure can be used for quality control and semi-quantitative analysis of probes or primers.

According to a first aspect of the present disclosure, there is provided a method of constructing a DNA library, the method comprising the steps of:

1) Adding a poly(X)_n tail to the 3' end of a single-stranded DNA molecule to obtain a first single-stranded DNA molecule bearing the poly(X)_n tail;

2) Obtaining a double-stranded DNA molecule based on said first single-stranded DNA molecule using an extension primer, wherein said double-stranded DNA molecule comprises said first single-stranded DNA molecule and a second single-stranded DNA molecule complementary to said first single-stranded DNA molecule, said extension primer being capable of annealing to the poly(X)_n tail at the 3' end of said first single-stranded DNA molecule; and

3) Ligating a double-stranded adaptor to the end of the double-stranded DNA molecule distal to the poly(X)_n tail, and amplifying the second single-stranded DNA molecule to obtain an amplification product, wherein the amplification product forms a DNA sequencing library.
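Steps 1) to 3) can be illustrated in silico with the following minimal Python sketch. It assumes X = G and n = m = 8, and it uses made-up placeholder sequences for the 5' part of the extension primer and for the second adaptor single strand; none of these names or sequences come from the present disclosure.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]

# Hypothetical placeholder sequences (assumptions, not from the disclosure).
PRIMER_5P = "ACTGACTGGTCAGTCA"         # 5' part of the extension primer
ADAPTOR_STRAND_2 = "TTGACCATGGCAATCG"  # second adaptor single strand, 5'->3'

def construct(probe: str, x: str = "G", n: int = 8):
    """Return (first strand, amplifiable second strand), both written 5'->3'."""
    # Step 1: tailing adds poly(X)_n to the 3' end of the probe.
    first = probe + x * n
    # Step 2: the extension primer ends in (Y)_m (here m = n) and anneals to
    # the tail; extension copies the probe into the second strand.
    extension_primer = PRIMER_5P + revcomp(x) * n
    second = extension_primer + revcomp(probe)
    # Step 3: the second adaptor strand is ligated to the 3' end of the second
    # strand. The modified 5' end of the first strand is left unligated, so
    # only the second strand carries known sequence at both ends.
    template = second + ADAPTOR_STRAND_2
    return first, template

probe = "ATGGCGTACCTGA"
first, template = construct(probe)
print(first)
print(template)

# The probe sequence is recovered from the insert between the known ends.
insert = template[len(PRIMER_5P) + 8 : -len(ADAPTOR_STRAND_2)]
print(revcomp(insert) == probe)
```

The sketch only tracks sequence strings; it does not model enzyme kinetics, tail-length variation or ligation efficiency.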

In some embodiments, in said step 1), said single stranded DNA molecule is selected from 10 to 150nt in length.

In some specific embodiments, in the step 1), the single-stranded DNA molecule length is selected from 10nt, 20nt, 30nt, 40nt, 50nt, 60nt, 70nt, 80nt, 90nt, 100nt, 110nt, 120nt, 130nt, 140nt, 150nt.

In some specific embodiments, in said step 1), said single stranded DNA molecule is selected from 20 to 120nt in length.

In some embodiments, the single stranded DNA molecule has a modification at the 5' end.

In some specific embodiments, the single stranded DNA molecule 5' end modification comprises a non-phosphorylated modification.

In some specific embodiments, the single stranded DNA molecule 5' terminal modification includes, but is not limited to, amino modification, diphenylcyclooctyne modification, biotin modification, desthiobiotin, sulfhydryl modification, dithiol modification, ferrocene modification, tetrahydrofuran modification, thio modification, phosphorothioate modification, digoxin modification, cholesterol modification, azobenzene modification, methylene blue modification, binaphthyl modification, ruthenium modification, and the like.

In some specific embodiments, the single stranded DNA molecule 5' terminal modification comprises a thio modification, an amino modification, a thiol modification, or a biotin modification.

In some embodiments, in the step 1), the n represents the number of bases X.

In some specific embodiments, n is selected from integers from 6 to 12.

In some specific embodiments, n is selected from 6, 7, 8, 9, 10, 11, or 12.

In some specific embodiments, the X is selected from any one of the four A, T, C, G bases.

In some specific embodiments, the X is selected from the group consisting of base C or G.

In some specific embodiments, the X is selected from a G base.

In some embodiments, in said step 1), the poly(X)_n tail is added to the 3' end of the single-stranded DNA molecule using a terminal transferase.

In some embodiments, in the step 2), the 3' end of the extension primer comprises a (Y)_m base unit, wherein base Y is complementary to base X (e.g., when base X is G, base Y is C) and m represents the number of bases Y.

In some specific embodiments, m is selected from integers from 4 to 12.

In some specific embodiments, m is selected from 4, 5, 6, 7, 8, 9, 10, 11, or 12.

In some embodiments, the extension primer further comprises, at the 5' end of the (Y)_m base unit, one or more bases that are not complementary to the poly(X)_n tail.

In some specific embodiments, the extension primer is selected from 20 to 40nt in length.

In some specific embodiments, the length of the extension primer is selected from 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, or 40nt.

In some embodiments, in said step 3), amplification is performed using amplification primers.

In some embodiments, the amplification primers comprise a first amplification primer and/or a second amplification primer.

In some specific embodiments, the length of the first amplification primer and/or the second amplification primer sequence is each independently selected from 20 to 40nt.

In some specific embodiments, the length of the first amplification primer and/or the second amplification primer sequence is each independently selected from 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, or 40nt.

In some embodiments, in said step 3), said double-stranded adaptor is attached to a single end of said double-stranded DNA molecule.

In some specific embodiments, the double-stranded adaptor comprises: a first adaptor single strand to be ligated to the 5' end of the first single strand DNA molecule; and a second adaptor single strand to be ligated to the 3' -end of the second single strand DNA molecule.

In some specific embodiments, in said step 3), the 3 'end of the first adaptor single strand and the 5' end of the second adaptor single strand in said double-stranded adaptor are blunt ends, and said double-stranded adaptor is linked to said double-stranded DNA molecule by blunt ends.

In some embodiments, the blunt end of the double-stranded adaptor has one or more random complementary base pairs.

In some embodiments, the blunt end of the double-stranded adaptor has 1 to 10 random complementary base pairs.

In some specific embodiments, the blunt end of the double-stranded adaptor has 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10 random complementary base pairs.

In some embodiments, these random complementary base pairs of the double-stranded adaptor can identify the ligated single-stranded molecule, functioning as a molecular tag (UMI). The random complementary base pairs can also reduce the ligation bias of the double-stranded adaptor toward the double-stranded DNA molecule.

In some specific embodiments, the molecular tags (UMI) can be used to correct amplification or sequencing errors.
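One common way to use molecular tags for error correction is to group reads by tag and take a per-position majority vote. The following Python sketch shows that generic approach under stated assumptions: the disclosure does not specify a correction algorithm, and the function name and example reads are hypothetical.

```python
from collections import Counter, defaultdict

def umi_consensus(reads):
    """Collapse (umi, sequence) reads into one majority-vote sequence per UMI."""
    groups = defaultdict(list)
    for umi, seq in reads:
        groups[umi].append(seq)
    consensus = {}
    for umi, seqs in groups.items():
        # Keep only reads of the most frequent length so columns line up.
        length = Counter(len(s) for s in seqs).most_common(1)[0][0]
        same_len = [s for s in seqs if len(s) == length]
        consensus[umi] = "".join(
            Counter(column).most_common(1)[0][0] for column in zip(*same_len)
        )
    return consensus

reads = [
    ("ACG", "ATGGCGTA"),
    ("ACG", "ATGGCGTA"),
    ("ACG", "ATGGAGTA"),  # one read carrying an amplification/sequencing error
    ("TTC", "CCGATTAG"),
]
result = umi_consensus(reads)
print(result)
```

Errors that occur after tagging are outvoted within a tag group, whereas an error already present in the original oligonucleotide appears in every read of its group and is therefore retained.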

In some embodiments, the 5' end of the second adaptor single strand at the blunt end of the double-stranded adaptor is ligated to the 3' end of the second single-stranded DNA molecule, while the 3' end of the first adaptor single strand is not ligated to the modified 5' end of the first single-stranded DNA molecule. Because the 5' end of the first single-stranded DNA molecule is modified, the 3' end of the first adaptor single strand cannot be joined to it, so a nick remains between the blunt end of the double-stranded adaptor and the 5' end of the first single-stranded DNA molecule. As a result, only the second single-stranded DNA molecule can be amplified in the subsequent amplification.

In some embodiments, the double-stranded adaptor comprises one or more complementary base pairs, e.g., comprising 5 to 20 complementary base pairs, in addition to the random complementary base pairs described above.

In some specific embodiments, the double-stranded adaptor comprises 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 complementary base pairs in addition to the random complementary base pairs described above.

In some embodiments, the double-stranded adaptor further comprises a non-complementary unit at the end remote from the random complementary base.

In some embodiments, the two strands of the non-complementary unit of the double-stranded adaptor each independently comprise 10 to 30 bases.

In some specific embodiments, the two strands of the non-complementary unit of the double-stranded adaptor each independently comprise 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, or 30 bases.

In some embodiments, the second amplification primer is reverse complementary to the 3' end of the second adaptor single strand.

In some embodiments, the 5' end of the first adaptor single strand is not complementary to any amplification primer. Thus, even if a primer dimer forms, it cannot be amplified in both directions, and the concentration of primer dimers is effectively reduced.

In some embodiments, the method further comprises sequencing the sequencing library of step 3). In some embodiments, the method further comprises performing a bioinformatic analysis on the sequencing data.

In some embodiments, the bioinformatic analysis comprises merging the paired-end reads of sequencing data that have passed basic quality control, excising the poly(X)_n tail, and then performing subsequent analysis.

In some embodiments, the subsequent analysis comprises length quality control and sequence alignment. The length quality control comprises length statistics of the reads, and the sequence alignment comprises aligning the sequences obtained after poly(X)_n tail excision against the sequences of the probe set. Quality control of the single-stranded DNA primers and probes is completed by analyzing the alignment results and the depth coefficient of each sequence.
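The tail excision, length statistics and comparison against the probe set can be sketched in Python as below. The sketch makes several simplifying assumptions that are not part of the disclosure: reads are already merged and oriented in the sense of the probe, X = G, exact string matching stands in for sequence alignment, and all names and sequences are hypothetical.

```python
import re
from collections import Counter
from statistics import mean

def trim_tail(read, x="G", min_n=6):
    """Remove a 3' poly(X) run of at least min_n bases; None if no tail found.

    Limitation: bases X at the 3' end of the probe itself are trimmed too.
    """
    match = re.search(f"{x}{{{min_n},}}$", read)
    return read[: match.start()] if match else None

def probe_qc(reads, probes, x="G", min_n=6):
    """Return per-probe counts, depth coefficients, off-target count, lengths."""
    by_seq = {seq: name for name, seq in probes.items()}
    counts = {name: 0 for name in probes}
    lengths = Counter()
    off_target = 0
    for read in reads:
        insert = trim_tail(read, x, min_n)
        if insert is None:
            continue  # no tail: not a product of the tailing step
        lengths[len(insert)] += 1
        name = by_seq.get(insert)
        if name is None:
            off_target += 1
        else:
            counts[name] += 1
    avg = mean(counts.values())
    coeffs = {name: (c / avg if avg else 0.0) for name, c in counts.items()}
    return counts, coeffs, off_target, lengths

probes = {"P1": "ATGGCGTACCTGA", "P2": "TTGACCGATAGCT"}
reads = [
    "ATGGCGTACCTGA" + "G" * 8,
    "ATGGCGTACCTGA" + "G" * 7,
    "TTGACCGATAGCT" + "G" * 8,
    "CCCTATAGCAA" + "G" * 7,   # sequence absent from the probe set
    "ATGGCGTACCTGA",           # untailed read, ignored
]
counts, coeffs, off_target, lengths = probe_qc(reads, probes)
print(counts, coeffs, off_target, dict(lengths))
```

In practice an aligner that tolerates mismatches and indels would replace the dictionary lookup, so that synthesis errors are reported as variants of a probe rather than as unrelated sequences.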

According to a second aspect of the present disclosure, there is provided an apparatus for constructing a library based on single stranded DNA, the apparatus comprising:

a tail linking unit for forming a poly(X)_n tail at the 3' end of the single-stranded DNA molecule to obtain a first single-stranded DNA molecule bearing the poly(X)_n tail;

an extension unit for obtaining a double-stranded DNA molecule based on the first single-stranded DNA molecule using an extension primer, wherein the double-stranded DNA molecule comprises the first single-stranded DNA molecule and a second single-stranded DNA molecule complementary to the first single-stranded DNA molecule, the extension primer being capable of annealing to the poly(X)_n tail at the 3' end of the first single-stranded DNA molecule;

a linker linking unit for ligating a double-stranded linker to the end of the double-stranded DNA molecule distal to the poly(X)_n tail; and

an amplification unit for amplifying the second single stranded DNA molecule to obtain amplification products, the amplification products constituting a DNA sequencing library.

In some embodiments, the n in the tail linking unit represents the number of bases X.

In some embodiments, n is selected from integers of 6 to 12.

In some embodiments, n is selected from 6, 7, 8, 9, 10, 11, or 12.

In some embodiments, the X in the tail linking unit is selected from any of the four A, T, C, G bases.

In some specific embodiments, the X is selected from a G base.

In some embodiments, the tail linking unit is used to add the poly(X)_n tail to the 3' end of the single-stranded DNA molecule using a terminal transferase.

In some embodiments, in the amplification unit, amplification is performed using amplification primers.

In some embodiments, the first amplification primer and/or the second amplification primer sequence are each independently selected from 20 to 40nt in length.

In some specific embodiments, the first amplification primer and/or the second amplification primer sequence lengths are each independently selected from 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, or 40nt.

In some embodiments, the 3' end of the extension primer comprises a (Y)_m base unit, wherein base Y is complementary to base X (e.g., when base X is G, base Y is C) and m represents the number of bases Y.

In some embodiments, m is selected from integers from 4 to 12.

In some embodiments, m is selected from 4, 5, 6, 7, 8, 9, 10, 11, or 12.

In some embodiments, the extension primer further comprises, at the 5' end of the (Y)_m base unit, one or more bases that are not complementary to the poly(X)_n tail.

In some embodiments, the extension primer is selected from 20 to 40nt in length.

In some embodiments, the double-stranded adaptor comprises: a first adaptor single strand to be ligated to the 5' end of the first single strand DNA molecule; and a second adaptor single strand to be ligated to the 3' -end of the second single strand DNA molecule.

In some embodiments, the 3' end of the first adaptor single strand and the 5' end of the second adaptor single strand form a blunt end, and the linker linking unit ligates the double-stranded adaptor to the double-stranded DNA molecule through this blunt end.

In some embodiments, the blunt end of the linker connecting unit has one or more random complementary base pairs.

In some embodiments, the blunt end of the linker connecting unit has 1 to 10 random complementary base pairs.

In some embodiments, the blunt end of the linker linking unit has 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10 random complementary base pairs.

In some embodiments, these random complementary base pairs of the linker linking unit can identify the ligated single-stranded molecule, functioning as a molecular tag. The random complementary base pairs can also reduce the ligation bias of the double-stranded adaptor toward the double-stranded DNA molecule.

In some embodiments, the 5' end of the second adaptor single strand is ligated to the 3' end of the second single-stranded DNA molecule. Because the 5' end of the first single-stranded DNA molecule is modified, the 3' end of the first adaptor single strand of the linker linking unit cannot be joined to it, so a nick remains between the blunt end of the linker linking unit and the 5' end of the first single-stranded DNA molecule. As a result, only the second single-stranded DNA molecule can be amplified in the subsequent amplification.

In some embodiments, the linker linking unit comprises one or more complementary base pairs, e.g., comprising 10 to 30 complementary base pairs, in addition to the random complementary base pairs described above.

In some specific embodiments, the linker linking unit comprises 10, 20, or 30 complementary base pairs in addition to the random complementary base pairs described above.

In some embodiments, the linker linking unit further comprises a non-complementary unit at the end remote from the random complementary base.

In some embodiments, the two strands of the non-complementary unit of the linker linking unit each independently comprise 20 to 60 bases.

In some specific embodiments, the two strands of the non-complementary unit of the linker linking unit each independently comprise 20, 30, 40, 50, or 60 bases.

In some embodiments, the device is further used for sequencing the sequencing library obtained by the amplification unit and for performing a bioinformatic analysis on the sequencing data.

In some embodiments, the subsequent analysis comprises length quality control and sequence alignment. The length quality control comprises length statistics of the reads, and the sequence alignment comprises aligning the sequences obtained after poly(X)_n tail excision against the sequences of the probe set. Quality control of the single-stranded DNA primers and probes is completed by analyzing the alignment results and the depth coefficient of each sequence.

According to a third aspect of the present disclosure, there is provided the use of the method of the first aspect and/or the device of the second aspect in quality control of probes or primers.

In some embodiments, the method and/or the device is used to sequence and/or semi-quantitatively or quantitatively analyze one or more probes or primers.

According to the method, quality control of primers and probes is performed by constructing an oligonucleotide sequencing library, without evaluating the actual performance of the primers or probes. Quality control is completed simply by comparing the sequencing output data of the library with the probe sequences, yielding specific information including the sequence accuracy of the primer or probe set, the length of each sequence, the accuracy of the proportions of different sequences, and the degree of contamination by non-target sequences, thereby saving time and the high cost of practical testing.

The method provides effective quality control of base synthesis errors in primers and probes, solves the long-standing technical problem that a small number of errors arising during primer or probe synthesis cannot be detected directly from primer or probe performance, and fills a gap in quality control technology based on building a library from the probes themselves.

Drawings

FIG. 1 shows a library construction procedure of primers and probes.

FIG. 2 shows a schematic representation of blunt end nick ligation.

FIG. 3 shows the bioinformatic alignment of primer and probe sequencing sequences.

FIG. 4 shows the sequence detection results for quality control of a probe set purified twice in succession using the same SPE cartridge.

FIG. 5 compares the quality control results of the probe set with the depth coefficients actually obtained by probe capture. In the figure, the abscissa is the probe number, and the ordinate is the number of sequences detected for each probe and the depth coefficient actually captured by the probe.

FIG. 6 compares the effect of tails of different bases on the library construction results of the present invention.

Detailed Description

In library construction, adaptor ligation is typically performed with T4 DNA ligase. However, T4 DNA ligase shows a terminal-base bias, i.e., its ligation efficiency differs depending on which of the four bases (N) is at the end being ligated. For probe set quality control, this library construction bias makes the library construction efficiency inconsistent among the probes in a probe set, introducing errors into the semi-quantification of the probes. In the single-stranded library construction process for probes, the present disclosure uses a double-stranded adaptor with random complementary bases at its blunt end to perform single-ended nick ligation on the DNA molecules, thereby overcoming the bias of T4 DNA ligase and enabling semi-quantitative detection in probe set quality control.

The present invention will be described in further detail with reference to the following examples in order to make the objects, technical solutions and advantages of the present invention more apparent. The specific embodiments described herein are for purposes of illustration only and are not to be construed as limiting the invention in any way. In addition, in the following description, descriptions of well-known structures and techniques are omitted so as not to unnecessarily obscure the concepts of the present disclosure. Such structures and techniques are also described in a number of publications.

Definitions

Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly used in the art to which this invention belongs. For the purposes of explaining the present specification, the following definitions will apply, and terms used in the singular will also include the plural and vice versa, as appropriate.

The terms "a" and "an" as used herein include plural referents unless the context clearly dictates otherwise.

In the present disclosure, the term "terminal transferase" refers to an enzyme that is capable of adding deoxynucleotides to one or more 3' ends of a DNA molecule in a template-independent manner. "terminal transferase activity" refers to the terminal transferase activity of any enzyme having that ability.

In the present disclosure, the term "basic quality control" includes filtering low quality reads obtained from sequencing to obtain high quality reads.

In some embodiments, the basic quality control further comprises splitting the sequencing data into sequencing data of different sample sources according to the barcode sequence.

In the present disclosure, the term "linker" refers to a nucleic acid that can be attached to an RNA or DNA molecule. The linker may be RNA, single-stranded or double-stranded DNA, or a mixture thereof, and may be attached to the RNA or DNA molecule in single-stranded or double-stranded form. The linker may be a fully complementary double-stranded linker, a hairpin linker (i.e., a molecule that base pairs with itself to form a structure with a double-stranded stem and a loop, wherein the 3' and 5' ends of the molecule are linked to the 5' and 3' ends of the double-stranded DNA molecule), or a Y-shaped linker. The linker may be 15 to 100 bases long, e.g., 50 to 70 bases, although linkers with lengths outside this range may also be used.

In the present disclosure, the term "A-T ligation" refers to a known, common form of cohesive-end ligation in which the ends of a DNA molecule are filled in with a polymerase, an A tail is added to the 3' end, and a linker with a T overhang is then ligated to the A-tailed DNA molecule.

In the present disclosure, the term "single-ended nick ligation" refers to a reaction that differs from conventional adaptor ligation at both ends: the adaptor is ligated to only one end of the DNA molecule, one strand of the double-stranded adaptor cannot be ligated to the double-stranded DNA molecule and therefore leaves a nick, and the other strand is ligated to one strand of the DNA molecule.

In the present disclosure, the term "SPE column purification" refers to one of the common methods for purifying DNA after synthesis: the synthesized product is immobilized on the SPE column by solid-phase extraction, impurities are removed with a wash solution, and the successfully synthesized DNA is then eluted with an elution solvent, completing the purification. Compared with conventional PAGE gel purification, SPE column purification is more efficient and takes less time. Because SPE cartridges are expensive, a single cartridge is typically reused to purify multiple batches of probes. During reuse, a small amount of the previous batch of DNA may remain on the cartridge if the cartridge is not washed completely. Any such residue is eluted into the next batch of DNA, so that the probe set or primer set contains non-target sequences. Oligonucleotide chains of these non-target sequences may impair the practical use of the probe set or primer set, for example by causing non-uniform data or an excessive number of non-target sequences.
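Carryover of this kind can be quantified once the sequenced inserts have been tail-trimmed, by checking each insert against the probe sequences of the current and the previous batch. The Python sketch below is illustrative only; the function name, the probe sequences and the read counts are hypothetical and are not data from the present disclosure.

```python
from collections import Counter

def classify_inserts(inserts, current_batch, previous_batch):
    """Tally tail-trimmed inserts as target, carryover (previous batch) or unknown."""
    tally = Counter()
    for seq in inserts:
        if seq in current_batch:
            tally["target"] += 1
        elif seq in previous_batch:
            tally["carryover"] += 1
        else:
            tally["unknown"] += 1
    return tally

# Hypothetical probe sequences of two batches purified on the same cartridge.
batch_1 = {"ATGGCGTACCTGA", "TTGACCGATAGCT"}
batch_2 = {"GGATCCATTCGAA", "CATTGCAGGTTCA"}

# Hypothetical inserts sequenced from the second batch.
inserts = ["GGATCCATTCGAA"] * 6 + ["CATTGCAGGTTCA"] * 3 + ["ATGGCGTACCTGA"]

tally = classify_inserts(inserts, current_batch=batch_2, previous_batch=batch_1)
carryover_rate = tally["carryover"] / sum(tally.values())
print(dict(tally), carryover_rate)
```

Whether a given carryover rate is acceptable depends on the application and has to be decided by the user.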

The method of the present disclosure adds a poly(X)_n tail to the 3' end of a single-stranded DNA, obtains double-stranded DNA by extension from a primer reverse complementary to the tail, and ligates a double-stranded adaptor to the other end of the double-stranded DNA, yielding a double-stranded DNA library with known sequences at both ends. Compared with poly(C)_n, poly(A)_n and poly(T)_n tails, a poly(G)_n tail makes the number of added bases and the rate of addition easier to control, so that sequencing can use more uniform read lengths. See FIG. 1.

On the other hand, the double-stranded adaptor is ligated to the extension product by single-ended nick ligation, i.e., to the 3' end of one strand of the double-stranded DNA. In the resulting double-stranded DNA product, one strand carries known sequences at both ends, while the other strand contains only the original single-stranded DNA molecule and the poly(G) tail. After ligation, amplification uses the strand with known sequences at both ends as template; the other strand is neither amplified nor detected by sequencing. See FIG. 2.

Placing random complementary base pairs at the ligating end of the adaptor distributes the different terminal base pairs evenly and, compared with conventional A-T ligation, reduces the bias caused by the adaptor's terminal base in the ligation reaction.

The double-stranded adaptor includes a Y-shaped adaptor: the sequences at the ligating end are complementary, and the forked, non-ligating end consists of non-complementary sequences. One strand of the double-stranded adaptor contains a sequence complementary to an amplification primer and is ligated to the 3' end of one strand of the double-stranded DNA; the other strand is not complementary to any amplification primer sequence and is not ligated to the 5' end of the other strand of the double-stranded DNA. That is, the double-stranded adaptor is joined to the double-stranded DNA by single-ended nick ligation. Even if double-stranded adaptors ligate to each other to form adaptor dimers, these cannot be amplified in the subsequent amplification reaction, so the proportion of non-specific products is effectively reduced.

Compared with conventional primer and probe quality control, in which only the fragment length range can be checked by polyacrylamide gel electrophoresis after synthesis, the method of the present disclosure can determine the exact length and sequence of each individual oligonucleotide.

Compared with other library construction methods, the method of the present disclosure joins the double-stranded adaptor to the extension product by 5'-end blunt-end single-ended nick ligation, which is more efficient than single-stranded ligation and has lower bias, gives a more uniform probe library, and enables semi-quantitative detection in probe set quality control.

The method solves the problem that a single-stranded adaptor cannot be added directly to the end of a probe bearing a non-phosphate modification, which would otherwise make library construction impossible.

The workflow of the invention is simple and convenient and requires no probe targeted-capture quality control test, which not only saves considerable experimental time and cost but also eliminates unstable results caused by an operator's lack of proficiency with the complicated hybridization capture workflow.

Examples and figures are provided below to aid in the understanding of the invention. It is to be understood that these examples and drawings are for illustrative purposes only and are not to be construed as limiting the invention in any way. The actual scope of the invention is set forth in the following claims. It will be understood that any modifications and variations may be made without departing from the spirit of the invention.

Examples

Example 1: Quality control of the probe purification process, i.e. monitoring probe molecules retained on the SPE column after probe purification

In this example, the same SPE column was used for two successive purifications, each of which purified 48 different 5'-end biotin-modified probes. The probes from each of the two purifications were quality controlled by the method of the invention, as follows:

1. Poly-dG tail addition:

1.1 100 ng of the probe set to be quality controlled was placed in a 0.2 mL PCR tube and made up to 27.6 μL with double-distilled water.

1.2 After thawing the reagents, the reaction solution was prepared according to Table 1.

TABLE 1 Poly-dG tail addition reaction system

Component	Single reaction volume (μL)
SPE column-purified test probe (100 ng) + water	27.6
10x terminal transferase reaction buffer	4
2.5 mM CoCl₂	4
1 mM dGTP	4
20 U/μL terminal transferase	0.4
Total volume	40

Terminal Transferase for this example was purchased from New England Biolabs; dGTP was purchased from Takara.
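
When several probe sets are tailed in parallel, the shared reagents of Table 1 can be pooled into a master mix. The following is a minimal sketch of that arithmetic and is not part of the original protocol; the function name `master_mix` and the 10% pipetting overage are assumptions made here for illustration only.

```python
from fractions import Fraction

# Per-reaction volumes (uL) of the shared reagents, copied from Table 1.
TABLE_1_SHARED = {
    "10x terminal transferase reaction buffer": Fraction("4"),
    "2.5 mM CoCl2": Fraction("4"),
    "1 mM dGTP": Fraction("4"),
    "20 U/uL terminal transferase": Fraction("0.4"),
}
PROBE_PLUS_WATER_UL = Fraction("27.6")  # sample-specific, added per tube
TOTAL_UL = Fraction("40")               # total volume stated in Table 1


def master_mix(n_reactions, overage=Fraction("0.1")):
    """Return master-mix volumes (uL) for n_reactions.

    `overage` is a hypothetical pipetting allowance (10% by default);
    the source protocol does not specify one.
    """
    if n_reactions < 1:
        raise ValueError("n_reactions must be at least 1")
    factor = n_reactions * (1 + overage)
    return {name: float(vol * factor) for name, vol in TABLE_1_SHARED.items()}


# Sanity check: the Table 1 components add up to the stated total volume.
assert sum(TABLE_1_SHARED.values()) + PROBE_PLUS_WATER_UL == TOTAL_UL

if __name__ == "__main__":
    for name, vol in master_mix(2).items():
        print(f"{name}: {vol:.2f} uL")
```

Exact fractions are used so that the check against the 40 μL total is not disturbed by floating-point rounding.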

1.3 The reaction mixture was mixed on a vortex mixer, briefly centrifuged in a mini centrifuge, and placed in a PCR instrument for reaction under the conditions shown in Table 2.

TABLE 2 Poly-dG tail addition reaction conditions

Temperature	Time
Heated lid (42 °C)	On
37 °C	20 minutes
4 °C	Hold

After the reaction was complete, the product was purified with 100 μL of Axygen magnetic beads and redissolved in 20 μL of double-distilled water.

2. Primer extension:

2.1 After thawing the reagents, the reaction solution was prepared according to Table 3. The extension primer sequence (5'-3') was: GAACGACATGGCTACGATCCGACTTCCCCCCCC (SEQ ID NO: 1).

TABLE 3 Primer binding reaction system

Component	Single reaction volume (μL)
Poly-dG-tailed DNA (purified product of step 1.2)	20
20 μM extension primer	1.5
Total volume	21.5
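
The extension primer anneals to the poly-dG tail through the homopolymer run at its 3' end. As a minimal sketch (the helper names `three_prime_run` and `anneals_to_tail` are hypothetical and introduced here only for illustration), the following checks that SEQ ID NO: 1 ends in a run of C bases complementary to a G tail, and reports the run length and primer length so they can be compared with the ranges given in the claims (m from 4 to 12, primer length 20 to 40 nt).

```python
EXTENSION_PRIMER = "GAACGACATGGCTACGATCCGACTTCCCCCCCC"  # SEQ ID NO: 1, 5'-3'

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def three_prime_run(seq):
    """Return (base, length) of the homopolymer run at the 3' end of seq."""
    base = seq[-1]
    length = len(seq) - len(seq.rstrip(base))
    return base, length


def anneals_to_tail(primer, tail_base):
    """True if the primer's 3' homopolymer run is complementary to tail_base."""
    base, _ = three_prime_run(primer)
    return COMPLEMENT[base] == tail_base


if __name__ == "__main__":
    base, m = three_prime_run(EXTENSION_PRIMER)
    print(f"3' run: poly({base}) x {m}, primer length {len(EXTENSION_PRIMER)} nt")
    print("anneals to poly-dG tail:", anneals_to_tail(EXTENSION_PRIMER, "G"))
```

The same check applies unchanged to primers with poly(T), poly(A) or poly(G) runs by passing the corresponding tail base.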

2.2 The reaction mixture was mixed on a vortex mixer, briefly centrifuged in a mini centrifuge, and placed in a PCR instrument for reaction under the conditions shown in Table 4.

TABLE 4 Primer binding reaction conditions

Temperature	Time
Heated lid (80 °C)	On
75 °C	5 minutes
25 °C	10 minutes
25 °C	Hold

2.3 After the reaction was complete, the reagents shown in Table 5 were added to the reaction solution.

TABLE 5 Primer extension reaction system

Component	Single reaction volume (μL)
DNA (product of step 2.2)	21.5
10 mM dNTP	0.75
10x T4 PNK buffer	5
3 U/μL T4 DNA polymerase	1
ddH2O	21.75
Total volume	50

The T4 DNA Polymerase used in this example was purchased from Enzymatics; the 10X T4 PNK Buffer was purchased from Enzymatics.

2.4 The reaction mixture was mixed on a vortex mixer, briefly centrifuged in a mini centrifuge, and placed in a PCR instrument for reaction under the conditions shown in Table 6.

TABLE 6 Primer extension reaction conditions

Temperature	Time
Heated lid (70 °C)	On
30 °C	15 minutes
65 °C	15 minutes
25 °C	Hold

3. Adaptor ligation:

3.1 After thawing the reagents, the reagents shown in Table 7 were added sequentially to the reaction product. The adaptor sequences used were: top (5'-3'): CAAGGTTCGAATCGGCCTCCGACTTNN (SEQ ID NO: 2); and bottom (5'-3'): NNAAGTCGGAGGCCAAGCGGTCTTAGGAAGAC (SEQ ID NO: 3).

TABLE 7 Ligation reaction system

Component	Single reaction volume (μL)
DNA (product of step 1.3)	50
15 μM adaptor	1
2x Rapid Ligation Buffer	44
600 U/μL T4 DNA ligase	5
Total volume	100

The T4 DNA Ligase used in this example was purchased from Enzymatics; the 2X Rapid Ligation Buffer was purchased from Enzymatics.
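
The Y shape of the adaptor can be read directly from SEQ ID NO: 2 and SEQ ID NO: 3: the 3' end of the top strand pairs with the 5' end of the bottom strand to form the blunt, double-stranded stem, and the remaining bases form the two non-complementary arms. The sketch below, with hypothetical helper names, counts the paired stem length; treating N as pairing with N is an assumption that reflects the random complementary base pairs described for the blunt end.

```python
TOP = "CAAGGTTCGAATCGGCCTCCGACTTNN"          # SEQ ID NO: 2, 5'-3'
BOTTOM = "NNAAGTCGGAGGCCAAGCGGTCTTAGGAAGAC"  # SEQ ID NO: 3, 5'-3'

# Assumption: N (random base) on one strand pairs with N on the other.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C", "N": "N"}


def reverse_complement(seq):
    """Return the reverse complement of seq, written 5'-3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def stem_length(top, bottom):
    """Count consecutive base pairs starting at the top 3' / bottom 5' end."""
    paired = 0
    for a, b in zip(reverse_complement(top), bottom):
        if a != b:
            break
        paired += 1
    return paired


if __name__ == "__main__":
    stem = stem_length(TOP, BOTTOM)
    print("stem (bp):", stem)
    print("top-strand arm (nt):", len(TOP) - stem)
    print("bottom-strand arm (nt):", len(BOTTOM) - stem)
```

For these two sequences the stem is 14 bp including the two N positions, leaving a 13 nt arm on the top strand and an 18 nt arm on the bottom strand.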

3.2 The reaction mixture was mixed on a vortex mixer, briefly centrifuged in a mini centrifuge, and placed in a PCR instrument for reaction under the conditions shown in Table 8.

TABLE 8 Ligation reaction conditions

Temperature	Time
Heated lid	Off
23 °C	15 minutes
4 °C	Hold

The product was purified with 60 μL of Axygen magnetic beads and redissolved in 20 μL of double-distilled water.

4. Library amplification:

4.1 after thawing the reagents, a PCR reaction system was prepared according to the amounts of reagents shown in Table 9.

TABLE 9 library amplification reaction System

The Pro Amplification Mix used in this example was purchased from Yeasen Biotechnology;

4.2 The reaction mixture was mixed on a vortex mixer, briefly centrifuged in a mini centrifuge, and placed in a PCR instrument for reaction under the conditions shown in Table 10.

TABLE 10 library amplification reaction conditions

The product was purified with 45 μL of Axygen magnetic beads and redissolved in 30 μL of TE buffer.

5. Library sequencing and data analysis:

After DNBs were prepared from the library using the MGI DNB preparation kit, sequencing was performed on a DNBSEQ-T7 sequencer. The raw sequencing data were aligned against the probe set sequences using the analysis method of the present invention.
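
To make the comparison of reads with probe sequences concrete, the following toy sketch counts reads that reproduce a probe exactly and then continue into the poly-dG tail. It is not the analysis method of the invention, which is described elsewhere in the disclosure; it assumes reads are already oriented 5'-3' like the probe and adaptor-trimmed, and the function name `assign_reads` and the `min_tail` threshold of 6 (the lower bound of n in the claims) are illustrative choices.

```python
from collections import Counter


def assign_reads(reads, probes, min_tail=6):
    """Count reads that match one probe exactly and continue into a G tail.

    reads    : iterable of read sequences (assumed oriented and trimmed).
    probes   : dict mapping probe name -> expected probe sequence.
    min_tail : number of G bases required immediately after the probe.
    Returns (counts per probe, number of unassigned reads).
    """
    counts = Counter({name: 0 for name in probes})
    unassigned = 0
    for read in reads:
        hits = [
            name
            for name, seq in probes.items()
            if read.startswith(seq)
            and read[len(seq):len(seq) + min_tail] == "G" * min_tail
        ]
        if len(hits) == 1:
            counts[hits[0]] += 1
        else:
            unassigned += 1  # no exact match, or ambiguous between probes
    return counts, unassigned


if __name__ == "__main__":
    # Hypothetical probes and reads, for illustration only.
    probes = {"p1": "ACGTACGTAC", "p2": "TTGACCATGA"}
    reads = [
        "ACGTACGTACGGGGGGGG",  # full-length p1 followed by the G tail
        "TTGACCATGAGGGGGGGG",  # full-length p2 followed by the G tail
        "ACGTACGTAGGGGGGGGG",  # p1 missing its last base: not counted
    ]
    print(assign_reads(reads, probes))
```

Because a read only counts when the tail begins exactly where the expected probe ends, a probe carrying a 3' truncation or a substitution is left unassigned rather than counted. The pairwise scan is only suitable for small examples; a real probe set would need an indexed aligner.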

Experimental results: quality control of 5'-end modified probes by the method of the invention

In this example, the method of the invention was used to quality control the probe sequences directly. Two different probe sets with 5'-end modifications, purified successively on the same SPE column, were each quality controlled, and the sequences found in the actual probe libraries were compared with the target probe sequences. The results are shown in Fig. 4 and Table 11: after the first probe set had been purified on the SPE column and the column had been washed, the second probe set was purified, and the first probe set was carried over into the resulting second probe set at a level of about 0.2%.

TABLE 11 Detection results for different batches of purified probes

Measurement (purification batch)	First batch of probes	Second batch of probes	First batch/second batch
Average depth (first batch probe purification)	359484.6	0	NA
Average depth (second batch probe purification)	1304.2	589717.4	0.22%

Quality control of the probe sets from the two successive SPE column purifications clearly shows that first-set probes remain in the purified second set. The method can therefore effectively monitor whether, and to what extent, probes are retained on the SPE column after purification, and thus serves as quality control for the suitability of SPE column purification.
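
The carry-over figure in Table 11 is the average depth of first-batch probes divided by the average depth of second-batch probes in the second purification. A minimal sketch of that arithmetic follows; the function name `carryover_percent` is hypothetical.

```python
def carryover_percent(residual_depth, target_depth):
    """Residual probe depth as a percentage of the target probe depth."""
    if target_depth <= 0:
        raise ValueError("target_depth must be positive")
    return residual_depth / target_depth * 100


if __name__ == "__main__":
    # Average depths from the second purification row of Table 11.
    pct = carryover_percent(1304.2, 589717.4)
    print(f"{pct:.2f}%")  # 0.22%
```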

Example 2: Comparison of probe quality control with the actual capture efficiency of the probes

This example compares quality control results for a set of 17544 5'-end biotin-modified probes for tumor monitoring:

1. The quality control experiment by the scheme of the invention was carried out as in Example 1;

2. For quality control by actual hybridization capture, the probes and the primary hybridization reagent (product number TC 0023) were used to perform hybridization capture on a prepared human genome library, as follows:

2.1 Library concentration by evaporation to dryness:

1000 ng of the human genome library was taken, and the concentration system was prepared in a 1.5 mL centrifuge tube according to Table 12.

TABLE 12 Library evaporation and concentration system

Component	Volume (μL)
Human genome library	10
Cot-1 DNA (1 μg/μL)	5
MGI blocker (6 μg/μL)	2
Total volume	15

The prepared mixture was evaporated to dryness in a vacuum concentrator at 60 °C.

2.2 hybridization reactions

Hybridization buffer was prepared according to Table 13 and mixed, then added to the dried nucleic acid from step 2.1. After incubation at room temperature for 10 minutes, the solution was transferred to a 0.2 mL centrifuge tube and 4 μL of the probes to be quality controlled was added. After mixing, the hybridization reaction was run with the following program: 95 °C for 10 minutes, then 65 °C for 16 hours.

TABLE 13 Hybridization reaction system

Component	Volume (μL)
Boke 2X hybridization buffer	8.5
Boke hybridization enhancer	2.7
Nuclease-free water	1.8
Total volume	13

2.3 Magnetic bead capture

2.3.1. 10 μL of M270 streptavidin beads were washed with 1X Beads Wash Buffer, and the supernatant was discarded.

2.3.2. 17 μL of the hybridization reaction solution was transferred to the magnetic beads, mixed, and incubated at 65 °C for 45 minutes.

2.3.3. The mixture was vortexed for 3 seconds every 15 minutes to keep the beads in suspension.

2.4 Post-capture washes

2.4.1 Hot wash (65 °C)

100 μL of 1X Wash Buffer 1 preheated to 65 °C was added to the product of step 2.3.3. After mixing, all of the liquid was transferred to a 1.5 mL centrifuge tube, the tube was placed on a magnetic rack, and the supernatant was removed.

200 μL of 1X Wash Buffer S preheated to 65 °C was added to the tube, which was vortexed and incubated at 65 °C for 5 minutes with shaking at 1200 rpm, after which the supernatant was removed. The beads were washed once more with 200 μL of preheated 1X Wash Buffer S, and the supernatant was removed.

2.4.2 Room-temperature wash

180 μL of 1X Wash Buffer I was added to the hot-washed product, the beads were resuspended by vortexing, and the supernatant was removed after brief centrifugation. Subsequently, 180 μL of 1X Wash Buffer II was added, the beads were resuspended by vortexing, and the supernatant was removed after brief centrifugation. 180 μL of 1X Wash Buffer III was then added, the beads were resuspended by vortexing, and the supernatant was removed after brief centrifugation. Finally, the washed beads were resuspended in 20 μL of nuclease-free water.

2.5. Post-hybridization amplification:

To the system from step 2.4.2, 25 μL of 2X Kapa Hotstart Ready Mix and 2.5 μL of F/R Index primer (20 μM) were added. After thorough mixing, the mixture was amplified with the following program: 98 °C for 1 minute; 15 cycles of 98 °C for 10 s, 60 °C for 30 s and 72 °C for 30 s; 72 °C for 5 minutes; hold at 4 °C. After amplification, 50 μL of magnetic beads was added to the amplified product for bead purification, and the purified product was dissolved in 20 μL of TE Buffer.

2.6. Sequencing and data analysis

After DNBs were prepared from the library using the MGI DNB preparation kit, the sequencing library was sequenced on a DNBSEQ-T7 sequencer. The sequencing results of the library constructed by the method of the invention were aligned and analyzed by the bioinformatics method (see Fig. 3) to obtain the number of successfully aligned sequences for each probe. Separately, the sequencing data of the target capture library obtained in actual use of the probes were aligned to the genome, the depth of each probe capture region and the average depth were calculated, and a depth coefficient (depth of a given probe capture region / average depth × 100) was obtained.
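
The depth coefficient defined above can be computed as in the following minimal sketch. The function name `depth_coefficients` and the example depths are hypothetical, and the average depth is assumed here to be the arithmetic mean over all probe capture regions, which the text does not state explicitly.

```python
def depth_coefficients(region_depths):
    """Depth coefficient per region: region depth / average depth x 100.

    region_depths: dict mapping probe region name -> sequencing depth.
    The average is taken over all regions supplied (assumption).
    """
    if not region_depths:
        raise ValueError("region_depths must not be empty")
    mean_depth = sum(region_depths.values()) / len(region_depths)
    if mean_depth == 0:
        raise ValueError("average depth is zero")
    return {name: d / mean_depth * 100 for name, d in region_depths.items()}


if __name__ == "__main__":
    # Hypothetical depths for three probe regions, for illustration only.
    example = {"probe_A": 150, "probe_B": 100, "probe_C": 50}
    for name, coeff in depth_coefficients(example).items():
        print(name, coeff)
```

A coefficient of 100 means that a region is covered at exactly the average depth; values well below 100 flag probes whose capture performance is poor.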

Experimental results: the number of successfully aligned sequences per probe obtained by the method of the invention was compared with the depth coefficient obtained in the actual capture application of the probes; the trend comparison is shown in Fig. 5.

The results show that probes with lower numbers of aligned sequences also have lower depth coefficients in actual hybridization capture (the depth coefficient of a probe region reflects the capture performance of the probe, and a low depth coefficient indicates poor capture). The results of the two quality control methods are essentially consistent, so the method can effectively quality control the individual probes in a probe set.

Example 3: effect of different poly tails on the construction of the library of the present invention

The experimental procedure of Example 3 was essentially the same as that of Example 1, except that the dGTP used in the procedure was replaced with dATP, dTTP and dCTP, respectively, and the poly(C)₈ tail of the extension primer was correspondingly replaced with poly(T)₈, poly(A)₈ and poly(G)₈.

Experimental results: fragment quality control of the libraries was performed on a Qsep capillary electrophoresis instrument (Qsep 100, Beckmann); the comparison of the resulting library fragments is shown in Fig. 6. The peaks labeled 20 and 1000 in Fig. 6 are nucleic acid size standards (markers) of 20 bp and 1000 bp, respectively.

The results show that tails of all four base types can be used in the library construction method of the invention and for the primer and probe quality control of the invention. The library fragment ranges for the A and T tails are broad, mainly because the tailing length is difficult to control. Library fragments for the G and C tails are more tightly distributed, which makes the tailing length of this scheme easier to control; of these, the G tail performs somewhat better and is the most suitable for this scheme.

The technical scheme of the invention is not limited to the specific embodiment, and all technical modifications made according to the technical scheme of the invention fall within the protection scope of the invention.

Claims

1. A method of constructing a DNA library, comprising the steps of:

2. The method according to claim 1, wherein in step 1) the length of the single-stranded DNA molecule is selected from 10 to 150 nt, preferably from 20 to 120 nt; and/or,

the 5' end of the single stranded DNA molecule comprises a non-phosphorylated modification, preferably selected from the group consisting of amino modification, diphenylcyclooctyne modification, biotin modification, desthiobiotin, thiol modification, dithiol modification, ferrocene modification, tetrahydrofuran modification, thio modification, phosphorothioate modification, digoxin modification, cholesterol modification, azobenzene modification, methylene blue modification, binaphthyl modification or ruthenium modification, more preferably selected from the group consisting of thio modification, amino modification, thiol modification or biotin modification.

3. The method according to claim 1, wherein in said step 1), a poly(X)ₙ tail is added to the 3' end of the single-stranded DNA molecule using a terminal transferase;

preferably, said n represents the number of bases X, said n being selected from integers from 6 to 12, preferably from 6, 7, 8, 9, 10, 11 or 12;

preferably, X is selected from any one of bases A, T, C or G, preferably from base C or G, more preferably from base G.

4. The method according to claim 1, wherein in said step 2), the 3' end of said extension primer comprises a (Y)ₘ base unit, wherein base Y is complementary to base X and m is selected from integers from 4 to 12;

preferably, the 5' end of the (Y)ₘ base unit of the extension primer further comprises one or more bases that are not complementary to the poly(X)ₙ tail;

preferably, the length of the extension primer is selected from 20 to 40nt.

5. The method according to claim 1, wherein in the step 3), amplification is performed using an amplification primer,

preferably, the amplification primers comprise a first amplification primer and a second amplification primer;

preferably, the length of the first and second amplification primer sequences is each independently selected from 20 to 40nt.

6. The method according to claim 1, wherein in the step 3), the double-stranded adaptor comprises: a first adaptor single strand to be ligated to the 5' end of the first single strand DNA molecule; and a second adaptor single strand to be ligated to the 3' end of the second single strand DNA molecule;

preferably, the 3 'end of the first adaptor single strand and the 5' end of the second adaptor single strand are blunt ends, and the double-stranded adaptor is connected to the double-stranded DNA molecule through the blunt ends.

7. The method according to claim 6, characterized in that in step 3) the blunt end of the double-stranded adaptor has one or more random complementary base pairs, preferably 1-10 random complementary base pairs, more preferably the blunt end of the double-stranded adaptor has 2-8 random complementary base pairs, even more preferably the blunt end of the double-stranded adaptor has 1-4 random complementary base pairs, even more preferably the blunt end of the double-stranded adaptor has 1 random complementary base pair; and,

the 5 'end of the second adaptor single strand is linked to the 3' end of the second single strand DNA molecule, and the 3 'end of the first adaptor single strand is not linked to the modified 5' end of the first single strand DNA molecule.

8. The method according to claim 1, further comprising the step of sequencing and/or bioinformatic analysis of the sequencing library obtained in step 3).

9. An apparatus for constructing a DNA library, the apparatus comprising:

an extension unit that obtains a double-stranded DNA molecule based on the first single-stranded DNA molecule, wherein said double-stranded DNA molecule comprises said first single-stranded DNA molecule and a second single-stranded DNA molecule complementary to said first single-stranded DNA molecule, said extension primer being capable of annealing to the poly(X)ₙ tail at the 3' end of said first single-stranded DNA molecule;

10. Use of the method of claims 1-8 and/or the device of claim 9 for quality control of probes or primers; preferably, the method and/or the device is used for sequencing one or more probes or primers and/or for semi-quantifying or quantifying one or more probes or primers.