WO2020118596A1

WO2020118596A1 - Tag sequence detection method

Info

Publication number: WO2020118596A1
Application number: PCT/CN2018/120820
Authority: WO
Inventors: 赵霞; 赵静; 章文蔚; 陈奥
Original assignee: 深圳华大生命科学研究院
Priority date: 2018-12-13
Filing date: 2018-12-13
Publication date: 2020-06-18
Also published as: CN113168889B; CN113168889A

Abstract

A tag sequence detection method, comprising: building matched libraries from a set of template sequences and a set of tag sequences to be tested to obtain a set of tag libraries, wherein the template sequences are different gene sequences obtained by amplification or artificial synthesis, the template sequences differ from one another in sequence, and the template sequences and the tag sequences to be tested have a one-to-one or many-to-one correspondence; sequencing the tag libraries to obtain the sequencing reads of each tag library; aligning the sequencing reads against all of the tag sequences to be tested and counting the number of sequencing reads aligned to each tag sequence to be tested; and calculating, from these counts, the rate at which each tag sequence to be tested is contaminated with other tag sequences. Because the template sequences differ from one another in sequence, there is no need to worry that different templates become indistinguishable owing to sequence errors introduced by multiple rounds of PCR amplification.

Description

Tag sequence detection method

Technical field

The invention relates to the field of sequencing technology, in particular to a method for detecting a tag sequence.

Background art

In high-throughput second-generation sequencing, multiple libraries can be mixed together for sequencing by adding a library tag of known sequence to the library, and later the information of different samples is separated according to the library tag.

There are several types of tag library. The first is the single-tag library, in which the tag sequence is carried either on an adapter or on a PCR primer and is added to the library through adapter ligation or PCR amplification, respectively. The second is the dual-tag library, for which there are three ways of adding the two tag sequences. In the first, the first tag sequence is carried on the adapter and the second on a PCR primer, and the two tags are added to the library in succession by adapter ligation followed by PCR. In the second, the two tag sequences are carried on the F primer (forward amplification primer) and the R primer (reverse amplification primer) of the PCR (polymerase chain reaction) primer pair, and both tags are added to the library simultaneously by PCR amplification. In the third, the two tags are carried on the top strand and the bottom strand of the adapter, and both are added to the library simultaneously by adapter ligation. In summary, the oligos used to construct tag libraries fall into two categories: tag adapters and tag primers.

In theory, each tag in a pooled library (a library formed by mixing individual libraries that carry different library tags) corresponds to only one sample. In practice, there are many reasons why one tag may correspond to two or more samples, causing cross-contamination between sequencing data. One of these is tag contamination introduced during the synthesis and purification of the library tag oligonucleotides (barcodes). For this reason, contamination testing of library tag oligonucleotides is an indispensable step in a standard NGS (next-generation sequencing) workflow. Screening the oligonucleotides by quality inspection before they are put into production, so that oligonucleotides contaminated during synthesis are not used, reduces the probability of tag contamination at the source.

At present, the standard oligonucleotide QC (quality control) methods of oligonucleotide synthesis suppliers mainly comprise OD (optical density) measurement, chromatography, and mass spectrometry. These methods cannot detect the accuracy of the synthesized bases. For example, when AATTCCGGA is to be synthesized, 1% of the oligonucleotides actually produced may be AATTCCGGT and another 1% GATTCCGGA. If a wrongly synthesized base lies at the 3' end of the sequencing primer, it directly affects the hybridization success rate of the sequencing primer and the sequencing success rate, reducing the number of effective sequencing templates or causing sequencing errors.

As shown in FIGS. 1 to 4, the prior-art method for detecting the oligonucleotide contamination rate uses as template a 180 bp DNA fragment, amplified from plasmid DNA, that carries a 10 bp sample index sequence. The template is matched with the oligonucleotides to be tested to build NGS libraries, and NGS sequencing is used to count the reads in which the library tag sequence matches the sample index sequence, from which the contamination rate of other, non-matching tags is calculated. In this method, however, the template DNAs differ from one another in only 10 bp. If read 2 is sequenced, the quality of second-strand sequencing cannot be guaranteed because of base imbalance, so second-strand sequencing quality cannot be used to indirectly assess the base synthesis accuracy of the oligonucleotide sequence that hybridizes with the second-strand sequencing primer.

The accuracy of oligonucleotide base synthesis is mainly assessed by PCR-amplifying the full-length oligonucleotide sequence, adding an A base for TA cloning, and Sanger-sequencing the resulting monoclonal libraries. In general, at least 100 monoclonal libraries need to be sequenced for each oligonucleotide.

In addition, Eurofins uses mass spectrometry or liquid chromatography to assess the quality of oligonucleotide synthesis. These methods can detect only the purity of the oligonucleotides, not the accuracy of the synthesized bases. In 2014, Quail et al. reported the SASI-Seq (Sample Assurance Spike-In Sequencing) method, in which specific primers carrying 11 nt tags are used to amplify 384 tagged spike-in products from PhiX174 for quality control, in order to monitor cross-contamination of pooled libraries and prevent confusion between samples. However, this method has not been applied to quality control of the synthesis contamination rate of tag adapters or of the accuracy of base synthesis.

Before synthesized oligonucleotides leave the factory, their purity and quantity are generally subject to quality control, but cross-contamination between oligonucleotides arising during synthesis and the accuracy of the synthesized bases are not. After receiving synthesized oligonucleotides, NGS users therefore usually carry out contamination rate tests and base synthesis quality tests on them (especially on tag adapters and tag primers). In general, the two tests cannot be completed in the same experiment.

Summary of the invention

The invention provides a method for detecting a tag sequence. The method includes:

building matched libraries from a set of template sequences and a set of tag sequences to be tested to obtain a set of tag libraries, wherein the template sequences are different gene sequences obtained by amplification or artificial synthesis, the template sequences differ from one another in sequence, and the template sequences and the tag sequences to be tested have a one-to-one or many-to-one correspondence;

sequencing the tag libraries to obtain the sequencing reads of each tag library;

aligning the sequencing reads of each tag library against all of the tag sequences to be tested, and counting the number of sequencing reads aligned to each tag sequence to be tested; and

calculating, from these counts, the rate at which each tag sequence to be tested is contaminated with other tag sequences.
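The alignment and counting steps can be illustrated with a minimal Python sketch. It is not the procedure of the invention: the function names, the representation of each read pair as an (insert read, tag read) tuple, and the Hamming-distance thresholds are all assumptions made for illustration. Each read pair is assigned to a template by its insert read and to a tag by its tag read, and the pair is counted only when both assignments are unambiguous.

```python
from collections import defaultdict


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def best_match(query, references, max_mismatch):
    """Name of the single reference whose prefix is within max_mismatch
    of the query, or None if there is no hit or more than one hit."""
    hits = [
        name
        for name, seq in references.items()
        if len(seq) >= len(query)
        and hamming(query, seq[: len(query)]) <= max_mismatch
    ]
    return hits[0] if len(hits) == 1 else None


def count_matrix(read_pairs, templates, tags, insert_mm=2, tag_mm=1):
    """counts[template][tag] = number of read pairs assigned to both.

    insert_mm and tag_mm are hypothetical mismatch tolerances.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for insert_read, tag_read in read_pairs:
        template = best_match(insert_read, templates, insert_mm)
        tag = best_match(tag_read, tags, tag_mm)
        if template is not None and tag is not None:
            counts[template][tag] += 1
    return counts


# Toy data (hypothetical sequences, far shorter than real templates).
templates = {"geneA": "ACGTACGTAC", "geneB": "TTGGCCAATT"}
tags = {"bc1": "AAAAA", "bc2": "CCCCC"}
read_pairs = [
    ("ACGTACGT", "AAAAA"),
    ("ACGTACGT", "AAAAA"),
    ("ACGTACGT", "CCCCC"),  # geneA library read carrying the wrong tag
    ("TTGGCCAA", "CCCCC"),
    ("GGGGGGGG", "AAAAA"),  # matches no template, so it is discarded
]
counts = count_matrix(read_pairs, templates, tags)
```

Because the templates differ from one another along their whole length, a small mismatch tolerance on the insert read does not make two templates ambiguous, which is the property the method relies on.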

In a preferred embodiment, the above template sequence is a different gene sequence amplified from genomic DNA.

In a preferred embodiment, the number of the above set of template sequences is N, N=4X, and X is an integer greater than or equal to 1.

In a preferred embodiment, the number of the above-mentioned set of template sequences is 96.

In a preferred embodiment, the size of the above template sequence is 50-1000bp, preferably 180bp.

In a preferred embodiment, all the above template sequences are of equal size.

In a preferred embodiment, within the regions at the 5' end and the 3' end of the template sequences that are equal in length to the sequencing read length, the proportions of the bases A, T, C and G are such that the base signals are balanced.

In a preferred embodiment, the regions at the 5' and 3' ends of the template sequences that are equal in length to the sequencing read length span 20 bp to 200 bp from the 5' and 3' ends, preferably within 30 bp.

In a preferred embodiment, the above base ratio is 10% to 30%, preferably 15%.

In a preferred embodiment, the number of template sequences is N times the number of tag sequences to be tested, N being an integer greater than or equal to 1; the template sequences are divided into as many subgroups as there are tag sequences to be tested, and each subgroup contains N template sequences.

In a preferred embodiment, the tag sequence to be tested is a tag adapter and/or a tag primer.

In a preferred embodiment, the tag sequence to be tested is a single-tag adapter.

In a preferred embodiment, the matched library construction includes: ligating the template sequences to the single-tag adapters in a one-to-one or many-to-one correspondence, and then performing PCR amplification with universal primers to obtain single-tag libraries ready for sequencing.

In a preferred embodiment, the tag sequence to be tested is a single-tag primer.

In a preferred embodiment, the matched library construction includes: ligating the template sequences to a universal adapter to obtain ligation products, and then performing PCR amplification with the single-tag primers in a one-to-one or many-to-one correspondence to obtain single-tag libraries ready for sequencing.

In a preferred embodiment, the tag sequence to be tested is a double-tag sequence composed of a tag adapter and a tag primer.

In a preferred embodiment, the matched library construction includes: ligating the template sequences to the tag adapters in a one-to-one or many-to-one correspondence, and then performing PCR amplification with the tag primers in a one-to-one or many-to-one correspondence to obtain dual-tag libraries ready for sequencing.

In a preferred embodiment, the tag sequence to be tested is a double-tag primer composed of two tag primers.

In a preferred embodiment, the matched library construction includes: ligating the template sequences to a universal adapter to obtain ligation products, and then performing PCR amplification with the double-tag primers in a one-to-one or many-to-one correspondence to obtain dual-tag libraries ready for sequencing.

In a preferred embodiment, the tag sequence to be tested is a double-tag adapter composed of two tag adapters.

In a preferred embodiment, the matched library construction includes: ligating the template sequences to the double-tag adapters in a one-to-one or many-to-one correspondence to obtain ligation products, and then performing PCR amplification with universal primers to obtain dual-tag libraries ready for sequencing.

In a preferred embodiment, the above sequencing is double-end sequencing.

In a preferred embodiment, the above sequencing is PE30+10 sequencing.

In a preferred embodiment, all the tag sequences to be tested are all tag sequences synthesized in the same batch.

In a preferred embodiment, the method further includes obtaining a second-strand (read 2) sequencing quality evaluation result from the sequencing reads; and

optionally, using the evaluation result to indirectly judge the 5'-end base synthesis quality of the adapter, related to second-strand sequencing, of the tag library or of the pooled tag libraries.

In a preferred embodiment, the above template sequence is obtained by amplifying the human genome with 96 primer pairs shown in SEQ ID NO: 1 to 192.

The invention detects tag sequences by building matched libraries from a set of template sequences and the tag sequences to be tested. Once the template sequences have been prepared, they can be re-amplified repeatedly from earlier amplification products, saving template preparation costs. Because the template sequences differ from one another in sequence, there is no need to worry that different templates become indistinguishable owing to sequence errors introduced by multiple rounds of PCR amplification.

In addition, the preferred technical solution indirectly detects the 5'-end base synthesis quality of the tag adapter, or of the adapter related to second-strand sequencing, by detecting the quality of second-strand sequencing. The tag contamination rate and the synthesis quality of the oligonucleotides related to the sequencing primers can thus be detected simultaneously in a single experiment, saving quality control labor and cost.

Brief description of the drawings

FIG. 1 is a flow chart of the preparation of template DNA in the prior art;

FIG. 2 is a schematic diagram of the library construction process in the prior-art quality control method for single-tag adapters;

FIG. 3 is a schematic diagram of the sequencing principle in the prior-art quality control method for single-tag adapters;

FIG. 4 is a schematic diagram of the calculation of the contamination rate in the prior-art quality control method for single-tag adapters;

FIG. 5 is a schematic diagram of the template DNA preparation method in an embodiment of the present invention;

FIG. 6 is a schematic diagram of the library construction process for quality inspection of single-tag adapters in an embodiment of the present invention;

FIG. 7 is a schematic diagram of the library construction process for quality inspection of single-tag primers in an embodiment of the present invention;

FIG. 8 is a schematic diagram of the sequencing principle for libraries built for quality inspection of single-tag sequences in an embodiment of the present invention;

FIG. 9 shows the contamination rate obtained after sequencing the libraries built for quality inspection of single-tag sequences, and the ESR used to evaluate second-strand sequencing quality, in an embodiment of the present invention;

FIG. 10 is a schematic diagram of the library construction process for quality inspection of a tag adapter + tag primer in an embodiment of the present invention;

FIG. 11 is a schematic diagram of the library construction process for quality inspection of double-tag primers in an embodiment of the present invention;

FIG. 12 is a schematic diagram of the library construction process for quality inspection of double-tag adapters in an embodiment of the present invention;

FIG. 13 is a schematic diagram of the sequencing principle for libraries built for quality inspection of double-tag sequences in an embodiment of the present invention;

FIG. 14 shows the contamination rate obtained after sequencing the libraries built for quality inspection of double-tag sequences, and the ESR used to evaluate second-strand sequencing quality, in an embodiment of the present invention;

FIG. 15 shows the adapter sequence statistics obtained by SE200 sequencing of libraries with a 160 bp main band, one with non-elevated second-strand ESR and one with elevated second-strand ESR, in an embodiment of the present invention;

FIG. 16 is a graph of ESR statistics for pooled single-tag libraries constructed with tag adapters from different manufacturers and different batches in an embodiment of the present invention;

FIG. 17 is an agarose gel electrophoresis image of the amplification products in an embodiment of the present invention;

FIG. 18 is an electrophoresis image of the amplification products of the re-purified 180 bp specific products in an embodiment of the present invention;

FIG. 19 is an agarose gel electrophoresis image of the PCR products in an embodiment of the present invention;

FIG. 20 is a graph of the TBU-PAGE gel electrophoresis results of the circularization products in an embodiment of the present invention;

FIG. 21 is a graph of the ESR results of the pooled library in an embodiment of the present invention;

FIG. 22 is a graph of the ESR results, after demultiplexing, of the libraries constructed with all 8 tag adapters to be tested in an embodiment of the present invention.

Detailed description

The present invention will be further described in detail below through specific embodiments and drawings. Corresponding similar element labels are used for similar elements in different embodiments. In the following embodiments, many details are described to enable the present invention to be better understood. However, those skilled in the art can easily recognize that some of the features can be omitted in different situations, or can be replaced by other materials and methods.

In addition, the features, operations, or characteristics described in the specification may be combined in any appropriate manner to form various embodiments. At the same time, the steps or actions in the method description can also be sequentially replaced or adjusted in a manner obvious to those skilled in the art. Therefore, the various orders in the specification and the drawings are only for clearly describing a certain embodiment, and do not mean a necessary order, unless otherwise stated that a certain order must be followed.

The serial numbers themselves, such as "first", "second", etc., are used to distinguish the described objects, and do not have any order or technical meaning.

Definitions

A tag sequence oligonucleotide (barcode), also referred to herein as a "tag sequence" (barcode) or an "oligonucleotide" (oligo), refers to a nucleotide sequence that is used in the construction of a library (such as a sequencing library) and serves to distinguish sample sources and/or molecular sources. It includes tag adapters and tag primers. These tag sequences are obtained by artificial synthesis.

A tag adapter, that is, an adapter containing a tag, refers to an adapter sequence that is used in the construction of a library (such as a sequencing library) and serves to distinguish different sample sources and/or molecular sources. Tag adapters include single-tag adapters and double-tag adapters, a double-tag adapter consisting of two single-tag adapters.

A tag primer, that is, a primer containing a tag, refers to a primer sequence that is used in the construction of a library (such as a sequencing library) and serves to distinguish different sample sources and/or molecular sources. Tag primers include single-tag primers and double-tag primers, a double-tag primer consisting of two single-tag primers.

A tag library refers to a library, in particular a sequencing library, that contains the tag sequence of the present invention and is obtained by a library construction method. Tag libraries include single-tag libraries and dual-tag libraries. A single-tag library contains a single-tag adapter or a single-tag primer. A dual-tag library contains double-tag adapters or double-tag primers.

A template sequence refers to a sequence that is matched with a tag sequence to build a library in the present invention; different template sequences differ from one another in sequence. In a preferred embodiment, the template sequences are different gene sequences amplified from genomic DNA. Because such template sequences differ from one another at every position, there is no need to worry that different templates become indistinguishable owing to sequence errors introduced by multiple rounds of PCR amplification.

Sequencing refers to a method for determining a nucleic acid sequence. In the present invention it refers specifically to determining the sequence of the tag libraries, and includes single-end sequencing and paired-end (double-end) sequencing. The present invention prefers paired-end sequencing, in particular PE30+10 sequencing, a strategy in which 30 bp is read from each end and 10 bp is read for the tag sequence.

The invention can complete, in a single experiment, both quality control of the synthesis contamination rate of tag sequence oligonucleotides and indirect quality control of the oligonucleotide base synthesis quality that affects sequencing quality, providing a new method for quality inspection of tag sequence oligonucleotides. The detection method of the present invention is applicable not only to single-tag sequence oligonucleotides but also to double-tag sequence oligonucleotides.

As shown in FIG. 5, in one embodiment of the present invention, N (N=4X, X≥1; for example, 96) gene sequences with a fixed number of bases are amplified from the genomic DNA of human or any other species. The fragment size may be anything from 50 bp to 1000 bp, preferably a size for which the sequencer gives relatively good sequencing quality, preferably 180 bp. The gene sequences must be such that, within the regions at the 5' and 3' ends that are equal in length to the sequencing read length (for example, 20 bp to 200 bp from the 5' and 3' ends, preferably within 30 bp), the proportion of each of the bases A, T, C and G is no lower than a set value, so that the A/T/C/G base signals are balanced. The set value may be, for example, a percentage in the range of 10% to 30%, such as 15%. This ensures base balance in the insert portion of tag sequence sequencing (such as PE30+10), that is, it ensures that sequencing quality is not affected by template base imbalance. The PCR product purified by gel excision is used as the seed PCR template and is then re-amplified (PCR on PCR) to obtain a large amount of test template DNA.
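The base-balance requirement can be checked computationally before a template set is adopted. The Python sketch below is one possible reading of the criterion, not the specification's own procedure: it assumes the proportion is evaluated position by position (one sequencing cycle at a time) across all templates, and the function names, the 30 bp window, and the 15% threshold defaults are illustrative choices taken from the preferred values in the text.

```python
from collections import Counter


def min_base_fraction(seqs, window=30, from_3prime=False):
    """Lowest per-position fraction of any of A/C/G/T across the templates,
    over the first `window` positions counted from the 5' end
    (or from the 3' end when from_3prime is True)."""
    if not seqs:
        raise ValueError("no template sequences given")
    if any(len(s) < window for s in seqs):
        raise ValueError("every template must be at least `window` bases long")
    worst = 1.0
    for i in range(window):
        column = Counter(s[-1 - i] if from_3prime else s[i] for s in seqs)
        lowest = min(column.get(base, 0) for base in "ACGT") / len(seqs)
        worst = min(worst, lowest)
    return worst


def is_balanced(seqs, window=30, threshold=0.15):
    """True when both template ends keep every base at or above `threshold`."""
    return (
        min_base_fraction(seqs, window) >= threshold
        and min_base_fraction(seqs, window, from_3prime=True) >= threshold
    )


# Toy example with 4 bp "templates": every position holds one of each base.
balanced_set = ["ACGT", "CGTA", "GTAC", "TACG"]
print(is_balanced(balanced_set, window=4))   # True
print(is_balanced(["AAAA"] * 4, window=4))   # False
```

Read 2 sees the reverse complement of the 3' end; since complementing swaps A with T and C with G, the minimum base fraction at each position is unchanged, so checking the forward strand is sufficient under this reading.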

When there are N tag sequence oligonucleotides to be tested, the test template DNAs are divided evenly into N groups, the template DNAs within each group are mixed, and each mixture is matched with one of the tag sequence oligonucleotides to be tested to build a library. For example, when 8 tag adapters are to be quality-inspected with 96 prepared template DNAs, template DNAs 1 to 96 are divided into 8 groups: template DNAs 1 to 12 are mixed in equal mass ratio and matched with tag sequence 1 (Barcode1) to build a library, template DNAs 13 to 24 are mixed in equal proportions and matched with tag sequence 2 (Barcode2), and so on, until template DNAs 85 to 96 are mixed in equal proportions and matched with tag sequence 8 (Barcode8). PE30+10 sequencing is then used to count the reads in which the tag sequence matches the template DNA, from which the contamination rate of other, non-matching tag sequences is calculated. The quality of second-strand sequencing, for example whether the second-strand ESR (Effective Spot Rate, the proportion of effective sequencing sites) is elevated in DNB sequencing, is used to indirectly assess the base synthesis accuracy of the tag sequence oligonucleotide sequence that hybridizes with the second-strand sequencing primer. As another example, when 96 tag adapters are to be quality-inspected with 96 prepared template DNA fragments, each tag adapter is matched with one DNA template to build a library, and PE30+10 sequencing is used to detect the tag sequence contamination rate and, indirectly, the synthesis accuracy of the tag sequence oligonucleotides. The tag sequence oligonucleotide may be a single-tag or a double-tag library-construction oligonucleotide.

It should be noted that when there are fewer tag sequences to be tested than test template DNAs (for example, 96), in theory only some of the test template DNAs need be used. However, using all of the test template DNAs (for example, all 96) helps to ensure base balance at every sequencing position of the template sequences under the chosen sequencing strategy (for example, PE30).
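The grouping in the 96-template, 8-adapter example can be written as a short Python sketch. The function name and the use of integer template IDs are assumptions for illustration; the sketch simply splits the templates into consecutive, equally sized groups, one per tag to be tested.

```python
def group_templates(template_ids, tag_ids):
    """Assign consecutive, equally sized blocks of templates to each tag.

    Raises ValueError when the templates cannot be divided evenly.
    """
    if not tag_ids or len(template_ids) % len(tag_ids) != 0:
        raise ValueError("number of templates must be a multiple of the number of tags")
    size = len(template_ids) // len(tag_ids)
    return {
        tag: template_ids[i * size : (i + 1) * size]
        for i, tag in enumerate(tag_ids)
    }


# The example from the text: 96 templates shared among 8 tag adapters.
templates = list(range(1, 97))
tags = [f"Barcode{i}" for i in range(1, 9)]
groups = group_templates(templates, tags)
print(groups["Barcode1"])      # templates 1 to 12
print(groups["Barcode8"][0], groups["Barcode8"][-1])  # 85 96
```

With 96 tags the same function returns one template per tag, which corresponds to the one-to-one case described above.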

When the tag sequence oligonucleotide is a single-tag library-construction oligonucleotide, that is, a single-tag adapter or a single-tag primer, the quality inspection method is as shown in FIGS. 6 to 9. As shown in FIG. 6, when N (N=4X, X≥1) single-tag adapters (for example, tag 1 adapter to tag N adapter) are to be quality-inspected with N prepared template DNA fragments (for example, template DNA A to template DNA N, or gene A fragment to gene N fragment), the template DNAs are ligated to the tag adapters in a one-to-one correspondence and then PCR-amplified with universal primers to obtain single-tag libraries ready for sequencing. Alternatively, as shown in FIG. 7, when N (N=4X, X≥1) single-tag primers (for example, tag 1 primer to tag N primer) are to be quality-inspected with N prepared template DNA fragments (for example, template DNA A to template DNA N, or gene A fragment to gene N fragment), the DNA templates are first ligated to a universal adapter, and the resulting ligation products are PCR-amplified with the tag primers in a one-to-one correspondence to obtain single-tag libraries ready for sequencing.

As shown in FIG. 8, the libraries are pooled in equal amounts and sequenced by PE30+10, which yields, for each template DNA (for example, gene A fragment to gene N fragment), the number of sequencing reads corresponding to each tag (for example, tag 1 to tag N), such as the values 4995, 4, X, 1, 0, 4998, Y, 2, 2, 8, Z and 4990 in FIG. 9. From these read counts and the tag contamination rate formula, the rate at which each tag adapter/primer is contaminated with other tag adapters/primers can be calculated. As shown in FIG. 9, the contamination rate of the tag 1 adapter is (4+X+1)/(4+X+1+4995)*100%, that of the tag 2 adapter is (0+Y+2)/(0+Y+2+4998)*100%, and that of the tag N adapter is (2+8+Z)/(2+8+Z+4990)*100%. At the same time, the sequencer run report gives the second-strand (read 2) sequencing quality evaluation result (the ESR result in FIG. 9), from which the 5'-end base synthesis quality of the second-strand-sequencing-related adapter of the pooled single-tag libraries can be judged indirectly. Furthermore, demultiplexing the run data gives the sequencing quality evaluation result for each individual tag (not shown in the figure).
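The contamination rate arithmetic for one library can be sketched in Python as follows. The function name and the dictionary layout are illustrative assumptions. The source leaves the count X unspecified, so the example substitutes a hypothetical value of 0 purely to make the arithmetic concrete; it is not a value from FIG. 9.

```python
def contamination_rate(tag_counts, expected_tag):
    """Fraction of a library's reads that carry a tag other than the expected one.

    tag_counts: dict mapping tag name -> number of reads carrying that tag
    expected_tag: the tag oligonucleotide the library was built with
    """
    total = sum(tag_counts.values())
    if total == 0:
        raise ValueError("library has no assigned reads")
    contaminating = total - tag_counts.get(expected_tag, 0)
    return contaminating / total


# Read counts for the tag 1 library, following the pattern 4995, 4, X, 1.
# X is NOT given in the source; 0 is a hypothetical stand-in.
X = 0
tag1_library = {"tag1": 4995, "tag2": 4, "tag_other": X, "tagN": 1}
rate = contamination_rate(tag1_library, "tag1")
print(f"{rate * 100:.2f}%")  # prints 0.10% for the hypothetical X = 0
```

The same function applies unchanged to the tag 2 and tag N libraries by passing their own count dictionaries and expected tags.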

When the tag sequence oligonucleotide is a double-tag library-construction oligonucleotide, that is, a tag adapter + tag primer, a double-tag primer, or a double-tag adapter, the quality inspection method is as shown in FIGS. 10 to 14. As shown in FIG. 10, when N (N=4X, X≥1) tag adapters (for example, tag 1 adapter to tag N adapter) and N tag primers (for example, tag 1 primer to tag N primer) are to be quality-inspected with N prepared template DNA fragments (for example, template DNA A to template DNA N, or gene A fragment to gene N fragment), the template DNAs are ligated to the tag adapters in a one-to-one correspondence and then PCR-amplified with the tag primers in a one-to-one correspondence to obtain dual-tag libraries ready for sequencing. As shown in FIG. 11, when N (N=4X, X≥1) tag primers F (for example, tag 1 primer F to tag N primer F) and N tag primers R (for example, tag 1 primer R to tag N primer R) are to be quality-inspected with N prepared template DNA fragments (for example, template DNA A to template DNA N, or gene A fragment to gene N fragment), the DNA templates are first ligated to a universal adapter, and the resulting ligation products are PCR-amplified with the double-tag primers in a one-to-one correspondence to obtain dual-tag libraries ready for sequencing. As shown in FIG. 12, when N (N=4X, X≥1) tag adapter top strands (top) and N tag adapter bottom strands (bottom) (for example, tag 1 + tag 1 adapter to tag N + tag N adapter) are to be quality-inspected with N prepared template DNA fragments (for example, template DNA A to template DNA N, or gene A fragment to gene N fragment), the DNA templates are ligated to the double-tag adapters in a one-to-one correspondence, and the resulting ligation products are PCR-amplified with universal primers to obtain dual-tag libraries ready for sequencing.

As shown in FIG. 13, the libraries are pooled in equal amounts and sequenced by PE30+10, which yields, for each template DNA (for example, gene A fragment to gene N fragment), the number of sequencing reads corresponding to each tag adapter, tag primer F, or tag adapter top strand (for example, tag 1 to tag N; such as 4995, 4, X, 1, 0, 4998, Y, 2, 2, 8, Z and 4990 in FIG. 14) and the number of sequencing reads corresponding to each tag primer, tag primer R, or tag adapter bottom strand (such as 4990, 10, X, 0, 3, 4995, Y, 2, 3, 12, Z and 4985 in FIG. 14). From these read counts and the tag contamination rate formula, the rate at which each tag adapter, tag primer F, or tag adapter top strand is contaminated with other tag adapters, tag primers F, or tag adapter top strands can be calculated. As shown in the first table of FIG. 14, the contamination rate of the tag 1 adapter, tag 1 primer F, or tag 1 adapter top strand is (4+X+1)/(4+X+1+4995)*100%, that of tag 2 is (0+Y+2)/(0+Y+2+4998)*100%, and that of tag N is (2+8+Z)/(2+8+Z+4990)*100%. As shown in the second table of FIG. 14, the contamination rate of the tag 1 primer, tag 1 primer R, or tag 1 adapter bottom strand is (10+X+0)/(10+X+0+4990)*100%, that of the tag 2 primer, tag 2 primer R, or tag 2 adapter bottom strand is (3+Y+2)/(3+Y+2+4995)*100%, and that of the tag N primer, tag N primer R, or tag N adapter bottom strand is (3+12+Z)/(3+12+Z+4985)*100%. At the same time, the sequencer run report gives the second-strand (read 2) sequencing quality evaluation result (the ESR results in the figure), from which the 5'-end base synthesis quality of the second-strand-sequencing-related adapter of the pooled dual-tag libraries can be judged indirectly.

Furthermore, demultiplexing the run data gives the sequencing quality evaluation result for each individual tag (not shown in the figure).
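The per-tag expressions given for FIGS. 9 and 14 all share one form, which can be stated generally. The symbols below are notation introduced here for illustration and do not appear in the original: $n_{ij}$ is the number of reads from library $i$ (the library built with tag oligonucleotide $i$) that carry tag $j$, $N$ is the number of tags, and $C_i$ is the contamination rate of tag oligonucleotide $i$.

```latex
C_i \;=\; \frac{\sum_{j \neq i} n_{ij}}{\sum_{j=1}^{N} n_{ij}} \times 100\%
```

For tag 1 in the first table, $n_{11} = 4995$ and the remaining counts are $4$, $X$ and $1$, which gives $(4+X+1)/(4+X+1+4995) \times 100\%$ as stated. For a dual-tag library the same expression is evaluated twice, once with the counts for the first tag and once with the counts for the second tag.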

The features of the present invention include: (1) The template DNAs are designed according to the principle of base balance, so that sequencing quality is not impaired by base imbalance in the template DNA. (2) Once the template DNAs have been prepared, they can be re-amplified repeatedly from earlier amplification products (PCR on PCR), saving the cost of template DNA preparation. Because the template DNAs differ from one another at every position, there is no need to worry that different templates become indistinguishable owing to sequence errors introduced by multiple rounds of PCR amplification; the small number of errors introduced by PCR can be handled by allowing an appropriate mismatch tolerance. In the prior art, by contrast, the templates differ in only a 10 bp sequence, so they are not suited to preparation by re-amplification (PCR on PCR). (3) If X template DNAs have been prepared, any number of tag oligonucleotides (tag oligos) from 4 to X can be tested, so the experimental arrangement is not constrained by the number of tag oligonucleotides to be tested. (4) The invention provides a method of indirectly detecting the 5'-end base synthesis quality of the tag adapter, or of the adapter related to second-strand sequencing, by detecting the quality of second-strand sequencing. (5) The tag contamination rate and the synthesis quality of the oligonucleotides related to the sequencing primers can be detected simultaneously with a single quality control system, saving quality control labor and cost. (6) The method of the invention can be used for quality inspection of the tag oligonucleotides used in constructing all types of tag library, and is flexible and convenient.

In the present invention, the method of indirectly assessing the synthesis quality of the oligonucleotide that hybridizes with the second-strand sequencing primer by examining second-strand sequencing quality was arrived at after a series of test investigations. First, the adapter regions were sequenced for a library whose second-strand ESR did not improve (poor quality) and for a library whose second-strand ESR improved (good quality). The proportion of correct adapter sequences in the library without the ESR improvement was found to be lower than in the library with the improvement, from which it was inferred that adapter synthesis quality or library construction quality affects the correctness of the adapter sequence and thereby the sequencing quality of the second strand. Figure 15 shows, for one embodiment of the present invention, the adapter sequence statistics obtained by SE200 sequencing of libraries with a 160 bp main band, one without and one with the second-strand ESR improvement. In the figure, adapterSeq is the adapter sequence obtained by SE200 sequencing of the library with the 160 bp main band, number is the number of reads corresponding to each sequence, and percent% is the share of each sequence among all sequences. The first row is the correct adapter sequence and the other rows are adapter sequences containing erroneous bases (the boxed bases in the figure). It can be seen that the proportion of correct adapter sequences is lower in the library without the second-strand ESR improvement than in the library with it.

In the present invention, a series of comparative tests showed that the main reason the second-strand ESR fails to improve (poor quality) is adapter synthesis quality. In Figure 16, the upper panel compares the ESR results of the single-tag 97 to single-tag 104 adapter mixed libraries from two manufacturers, A and B. The second-strand ESR of manufacturer A's single-tag 97-104 adapter mixed library improved markedly (curve "factory A 97-104" in the figure), whereas the second-strand ESR of manufacturer B's single-tag 97-104 adapter mixed library did not (curve "factory B 97-104" in the figure). The lower panel compares the ESR results of single-tag 49-56 adapter mixed libraries built with adapters purchased from manufacturer B in different batches (first, second and third batch). The second-strand ESR improved markedly for one batch and did not improve markedly for the other two. It was therefore established that the 5' base synthesis quality of the adapter sequence that hybridizes with the second-strand sequencing primer can be judged by checking whether the second-strand ESR improves.

The technical solutions of the present invention are described in detail below by way of examples. It should be understood that the examples are only exemplary and cannot be understood as limiting the protection scope of the present invention.

Example 1: Detection of the tag contamination rate and adapter synthesis quality of 8 single-tag adapters for the DNA nanoball (DNB) sequencing platform

1. Preparation of 96 180 bp template DNAs

1.1 Design 96 primer pairs that each specifically amplify a 180 bp fragment. The fragments come from 96 different genes distributed over the 23 pairs of chromosomes of the human genome. The primers are shown in SEQ ID NO: 1-192, where every two consecutive sequences form the primer pair for one gene: SEQ ID NO: 1-2 are the primer pair for the first gene, SEQ ID NO: 3-4 are the primer pair for the second gene, and so on up to SEQ ID NO: 191-192, the primer pair for the 96th gene. The sequences of the amplification products are shown in SEQ ID NO: 193-288, one sequence per gene: SEQ ID NO: 193-240 are the amplification products of genes 1-48, and SEQ ID NO: 241-288 are the amplification products of genes 49-96.

1.2 Amplify the 96 specific 180 bp products using human NA12878 DNA as the template. The amplification system and program are shown in Table 1 and Table 2 below.

Table 1 Amplification system

Table 2 Amplification procedures

1.3 Use agarose gel electrophoresis to identify the correctness and specificity of the amplified product fragments:

1.3.1 Preparation of 2.5% TAE agarose gel: The size of the gel depends on the number of samples to be checked. Typically, add 2.5 g of agarose to each 100 mL of 1×TAE buffer and heat to boiling until the powder is completely dissolved. Cool in a warm water bath for 2 minutes, add 2 μL of GelStain (TransGen Biotech), mix gently, pour into the gel tray, insert a wide-tooth gel comb, and leave at room temperature for 20-30 minutes until the gel has solidified before use.

1.3.2 Take 10 μL of amplified DNA into the spotting plate, add 3 μL of 6× loading buffer, mix by pipetting, and spot all into the gel well.

1.3.3 Electrophoresis conditions: 150V, 30min.

1.3.4 After electrophoresis, photograph the gel and save the image. The results of agarose gel electrophoresis of the amplified products are shown in Figure 17.

1.4 Gel excision, purification and recovery of the 180 bp specific products

1.4.1 Wash the gel trays, gel tray racks, combs and electrophoresis tanks used to prepare the recovery gel with clean water, rinse 2-3 times with pure water, and dry them before use.

1.4.2 Preparation of 2.5% TAE agarose gel: The size of the gel depends on the number of samples to be checked. Typically, add 2.5 g of agarose (BIO-RAD Megabase Agarose) to 100 mL of 1×TAE buffer and heat to boiling until the powder is completely dissolved and the solution contains no undissolved solids. Cool in a warm water bath for 2 min and, without adding any dye, pour into the gel tray, insert a wide-tooth gel comb, and let stand at room temperature for 20-30 min until the gel has solidified. Make sure there are no bubbles in the agarose solution.

1.4.3 Take 25-30 μL of DNA product amplified in the previous step to the spotting plate, add 6 μL of 6×bromophenol yellow, mix by pipetting, and then spot all into the wells.

1.4.4 Load 2 μL of 50 bp ladder (Tiangen and NEB). Load the ladder into the two outermost wells of the gel, away from the wells containing the samples to be recovered, to avoid cross contamination.

1.4.5 Electrophoresis conditions: 100 V for 2-2.5 h, until the bromophenol yellow has run to the bottom of the gel.

1.4.6 After electrophoresis, stain the agarose gel in TAE containing EB dye for 30 min.

1.4.7 Prepare the gel-cutting supplies: UV protective glasses, headgear, shoe covers, gloves, gel-cutting blades, and EP tubes and EP tube racks for collecting the excised gel.

1.4.8 Lay a piece of clean plastic wrap or a PE glove on the dark reader (Dark Reader), place the stained gel on it, and excise the target band with a blade at the position corresponding to 180 bp. Cut all four sides, changing the blade for each side. The target band is 180 bp and lies between 150 bp and 200 bp.

1.4.9 Put the excised gel slice into a clean EP tube, weigh it to determine the weight of the gel slice, and label the tube.

1.4.10 Use QIAquick Gel Extraction kit for gel purification and recovery.

1.4.11 Take 1μL of the purified product for quantitative detection with Qubit HS, and record the concentration of each sample.

1.5 Re-amplification and purification of the 180 bp products

1.5.1 Take 5 ng of the gel-recovered product for re-amplification and purification. The reaction system and reaction conditions are shown in Tables 3 and 4 below:

Table 3 Amplification system

Table 4 Amplification procedures

1.5.2 Take 5 μL of amplified product and check it on an agarose gel. Part of the agarose gel electrophoresis results for the amplified products is shown in Figure 18.

1.5.3 Remove the AmpureXP magnetic beads from the refrigerator at least 30 minutes in advance, equilibrate to room temperature, and mix thoroughly before use.

1.5.4 Prepare 75% ethanol, which can be stored for 3 days.

1.5.5 Purify with KingFisher, add 90μL of AmpureXP magnetic beads to each tube of sample.

1.5.6 After purification, resuspend the magnetic beads with 30μL TE and prepare for the next reaction or store at -20℃.

1.5.7 Take 1 μL of the purified product for quantification with Qubit HS, and record the concentration of each sample. Because the amplification efficiency differs between primer sets, if the product yield is insufficient for library construction, repeat step 3 to amplify a sufficient amount of product. The product concentration should be not less than 1.67 ng/μL, so that the product can be used multiple times. Among all 96 products, those with relatively low yields are #3, 6, 11, 13, 14, 15, 88 and 97; for these, the number of amplification cycles can be increased to 22.

2. Tag adapter preparation and adapter annealing

2.1 Oligonucleotide synthesis

Synthesize the oligonucleotide sequences shown in Table 5 below, and avoid contamination between the oligonucleotides during synthesis.

Table 5

2.2 Oligonucleotide dissolution

After the oligonucleotide is synthesized, the following dissolution operation is performed:

2.2.1 Turn on the power of the ultra-clean workbench, turn on the ultraviolet lamp, and sterilize the workbench, pipettes, pipette tips, TE and other items.

2.2.2 Checking: Check whether the base sequence on the oligonucleotide synthesis sheet and the name on the tube label are the same as ordered.

2.2.3 Centrifugation: 4℃, 12000rpm, centrifugation for 10min; pay attention to ensure that the oligonucleotide powder is gathered at the bottom of the tube. After centrifugation, gently handle it to prevent the powder from floating.

2.2.4 Turn off the ultraviolet lamp of the ultra-clean workbench, properly ventilate, and dissolve the oligonucleotide in the ultra-clean workbench:

a) Turn off the fan and carefully open the tube cap, taking care not to let the powder float out. Open one tube of oligonucleotide, add (number of nmol × 10) μL of TE solution according to the nmol amount printed on the tube wall, and dissolve the powder to give a stock solution with a final concentration of 100 pmol/μL (i.e. 100 μM). Cap the tube tightly.

b) Turn on the fan, and after about 10 seconds of ventilation, dissolve the next tube of oligonucleotide. Before dissolving each tube of oligonucleotide, it needs to be ventilated for about 10 seconds, and then dissolve the next tube.

c) After dissolving, mix well by shaking, centrifuge briefly and leave at room temperature for 1 hour.

d) Label the tube cap clearly; the stock solution concentration is 100 μM.

e) Store mother liquor at -20℃.
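The "nmol × 10 μL" rule in step a) follows from 1 μM being equal to 1 pmol/μL. A minimal sketch of the arithmetic, with a hypothetical function name and a hypothetical 25 nmol tube:

```python
def te_volume_ul(amount_nmol, stock_um=100.0):
    """Volume of TE (in μL) that dissolves amount_nmol of oligo to stock_um (in μM)."""
    if amount_nmol <= 0 or stock_um <= 0:
        raise ValueError("amount and concentration must be positive")
    # 1 μM = 1 pmol/μL, so volume (μL) = amount (pmol) / concentration (pmol/μL).
    return amount_nmol * 1000.0 / stock_um


# Hypothetical tube labelled 25 nmol: 25 x 10 = 250 μL of TE for a 100 μM stock.
print(te_volume_ul(25))  # 250.0
```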

2.3 Annealing

2.3.1 Turn on the power of the ultra-clean workbench, turn on the ultraviolet lamp, and sterilize the workbench, pipettes, pipette tips, TE and other items.

2.3.2 Take the oligonucleotide stock solutions out of -20°C, let them thaw completely, shake to mix, and centrifuge briefly. Turn off the UV lamp of the ultra-clean workbench and ventilate. In the ultra-clean workbench, take equal volumes of the top strand and bottom strand stock solutions as shown in Table 6 below, mix them with 2× adapter buffer (Table 7), and leave at room temperature for 1 hour. Then label the tube cap clearly with the concentration, 25 μM.

Table 6

Component	Amount
Top strand (100 μM)	25 μL
Bottom strand (100 μM)	25 μL
Adapter buffer (2×)	50 μL
Total	100 μL

Table 7 2× adapter buffer formulation

Reagent	Volume
5 M NaCl	4 μL
1 M Tris-HCl	4 μL
2 mM EDTA	20 μL
Water	172 μL
Total	200 μL

2.3.3 Dilute to 10μM working solution: add 100μL of 25μM adapter solution to 150μL TE buffer, mix thoroughly, and centrifuge quickly.
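The concentrations in steps 2.3.2 and 2.3.3 can be verified with the usual C1·V1 = C2·V2 relation. The helper names below are hypothetical; the volumes and concentrations are those given in Table 6 and step 2.3.3.

```python
def concentration_after_mixing(stock_um, stock_volume_ul, total_volume_ul):
    """Concentration (μM) after bringing stock_volume_ul of stock_um to total_volume_ul."""
    return stock_um * stock_volume_ul / total_volume_ul


def diluent_volume_ul(stock_um, stock_volume_ul, target_um):
    """Volume of diluent (μL) to add so that the stock reaches target_um."""
    if not 0 < target_um <= stock_um:
        raise ValueError("target must be positive and not exceed the stock concentration")
    final_volume_ul = stock_um * stock_volume_ul / target_um
    return final_volume_ul - stock_volume_ul


# Table 6: 25 μL of each 100 μM strand in a 100 μL annealing reaction.
print(concentration_after_mixing(100, 25, 100))  # 25.0 (μM of annealed adapter)
# Step 2.3.3: 100 μL of 25 μM adapter diluted to 10 μM.
print(diluent_volume_ul(25, 100, 10))            # 150.0 (μL of TE)
```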

2.3.4 Store the working solution at -20℃.

3. Construction of the quality inspection library for tag adapter quality and contamination rate

3.1 To test the 8 tag adapters with tags 501-508, the 96 specific 180 bp products are numbered 1-96 and divided into 8 groups (numbers 1-12, 13-24, 25-36, 37-48, 49-60, 61-72, 73-84 and 85-96). The products within each group are mixed in equal mass, and 50 ng of each group mixture is used to build a library with tags 501-508 in order.
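The many-to-one matching between templates and tags in step 3.1 amounts to splitting the template numbers into consecutive, equally sized groups. A minimal sketch, with a hypothetical function name:

```python
def assign_templates(n_templates=96, tags=range(501, 509)):
    """Map each tag to a consecutive, equally sized block of template numbers (1-based)."""
    tags = list(tags)
    if not tags or n_templates % len(tags) != 0:
        raise ValueError("templates must divide evenly among the tags")
    size = n_templates // len(tags)
    return {
        tag: list(range(i * size + 1, (i + 1) * size + 1))
        for i, tag in enumerate(tags)
    }


groups = assign_templates()
print(groups[501])  # templates 1 to 12
print(groups[508])  # templates 85 to 96
```

A read that carries tag 504 but maps to a template outside `groups[504]` would then count toward the contamination of tag 504.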

3.2 A-tailing:

3.2.1 Take out the reagents kept at -20℃ in advance and thaw them on ice. Reagents such as buffer and ATP need to be shaken to mix well and centrifuged at low speed.

3.2.2 Prepare the end-repair mixture about 5 minutes in advance. If only a few reactions are being set up, dilute dATP to 10 mM and add 1 μL per reaction. Keep the prepared reaction mixture at room temperature. If multiple reactions are being prepared, allow for a 15% loss when preparing the mixture. The reaction system is shown in Table 8 below:

Table 8

Component	Amount
Water	23.9 μL
10×PNK buffer	4 μL
dATP (100 mM)	0.1 μL
PNK (10 U/μL)	1 μL
rTaq (5 U/μL)	1 μL
Total	30 μL
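Scaling Table 8 to several reactions with the 15% allowance can be sketched as follows. The interpretation of "15% loss" as preparing 15% extra mixture is an assumption, and the function name and the choice of 8 reactions are illustrative.

```python
# Per-reaction volumes from Table 8, in μL.
TABLE_8_UL = {
    "water": 23.9,
    "10x PNK buffer": 4.0,
    "dATP (100 mM)": 0.1,
    "PNK (10 U/uL)": 1.0,
    "rTaq (5 U/uL)": 1.0,
}


def master_mix(per_reaction_ul, n_reactions, overage=0.15):
    """Scale per-reaction volumes to n_reactions plus a fractional overage."""
    if n_reactions < 1:
        raise ValueError("need at least one reaction")
    factor = n_reactions * (1 + overage)
    return {name: round(vol * factor, 2) for name, vol in per_reaction_ul.items()}


# Eight reactions, one per tag adapter under test (assumed for illustration).
for name, volume in master_mix(TABLE_8_UL, n_reactions=8).items():
    print(f"{name}: {volume} uL")
```

Each reaction still receives 30 μL of the pooled mixture; the overage only covers pipetting losses.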

3.2.3 Add 30 μL of the prepared end-repair mixture to 10 μL of the 180 bp specific product from the previous step (make up the volume with TE if it is insufficient), mix well, and centrifuge.

3.2.4 Place the product in the PCR instrument, the reaction conditions are as follows in Table 9:

Table 9

Temperature	Time
37℃	30 min
65℃	15 min
4℃	Hold
Heated lid	105℃

3.2.5 After the reaction, the product can be subjected to the next reaction or stored in a -20℃ refrigerator.

3.3 Adapter ligation:

3.3.1 Make up 40μL of the product from the previous step to 50μL, mix it with the following reaction system, and centrifuge.

3.3.2 Prepare the reaction system as shown in Table 10 below:

Table 10

Component	Amount
A-tailed product	50 μL
Water	12.2 μL
10×PNK buffer	3 μL
ATP (100 mM)	0.8 μL
50% PEG 8000	12 μL
T4 DNA ligase (600 U/μL)	1 μL
Ad153 new tag (10 μM)	1 μL
Total	80 μL

3.3.3 Place the reaction sample in the PCR instrument for reaction. The reaction conditions are as follows in Table 11:

Table 11

Temperature	Time
23℃	20 min
4℃	Hold
Heated lid	105℃

3.3.4 Purification:

a) After the reaction is completed, add 40 μL of AmpureXP magnetic beads to 80 μL of ligation product, shake and mix, and let stand at room temperature for 10 min;

b) After instantaneous centrifugation, place it on a magnetic stand and let it stand until the liquid is clear. Carefully aspirate the supernatant;

c) Add 150 μL of 75% ethanol to the tube, let it stand for 30 s, and aspirate the supernatant. Repeat once, use a small-volume pipette to remove as much residual ethanol as possible, and dry at room temperature;

d) Resuspend the magnetic beads with 23μL TE, mix by shaking for 5 minutes at room temperature, then place on a magnetic stand until the liquid is clear. Carefully draw 23μL of supernatant into a PCR tube, prepare for the next reaction or store at -20℃.

3.4 PCR

3.4.1 Prepare the reaction system as shown in Table 12 below:

Table 12

Component	Amount
Kapa 2×HotStart ReadyMix	25 μL
AD153-PCR2-1 (20 μM)	2 μL
AD153-PCR2-2 (20 μM)	2 μL
Total	29 μL

3.4.2 Add 29 μL of the reaction system to the 21 μL ligation product in the previous step and mix.

3.4.3 Put the reaction sample into the PCR instrument to react. The reaction conditions are as follows in Table 13:

Table 13

3.4.4 After the reaction is completed, add 50 μL of Ampure XP magnetic beads to 50 μL of PCR product for purification, and 32 μL of TE solution for re-dissolution.

3.5 Detection after PCR

3.5.1 Quantification: Take 1 μL of purified product and use Qubit HS quantitative detection, record the concentration of each sample.

3.5.2 Agarose electrophoresis detection:

Quality control standard: the main band is between 250 and 300 bp. As shown in Figure 19, some template self-ligation products appear above the main band. Experimental verification shows that they do not affect the results, and the self-ligation products disappear after cyclization and digestion.

3.6 Heat denaturation and single-strand separation

3.6.1 Based on the measured concentrations of the 96 PCR products, mix them in equal mass (1.7 ng each) to a total of 160 ng, make up to 48 μL with TE solution, and add 5 μL of 10 μM splint oligo (mediating sequence).
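Equal-mass pooling means that the volume taken from each product is inversely proportional to its measured concentration. The sketch below uses invented product names and concentrations; only the 1.7 ng per product comes from the text (96 × 1.7 ng = 163.2 ng, which the text rounds to 160 ng).

```python
def pooling_volumes_ul(concentrations_ng_per_ul, mass_per_product_ng=1.7):
    """Volume (μL) to take from each product so that each contributes the same mass."""
    volumes = {}
    for name, conc in concentrations_ng_per_ul.items():
        if conc <= 0:
            raise ValueError(f"{name}: concentration must be positive")
        volumes[name] = mass_per_product_ng / conc
    return volumes


# Hypothetical Qubit readings in ng/μL.
readings = {"product_01": 17.0, "product_02": 8.5, "product_03": 3.4}
volumes = pooling_volumes_ul(readings)
for name, vol in volumes.items():
    print(f"{name}: {vol:.2f} uL")
print(f"pooled volume: {sum(volumes.values()):.2f} uL (make up to 48 uL with TE)")
```

If the pooled volume exceeded 48 μL, the products would need to be concentrated first, a case the text does not address.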

3.6.2 Place the sample in the PCR instrument to react, the procedure is as follows in Table 14:

Table 14

Temperature	Time
95℃	3 min
4℃	10 min
4℃	Hold
Heated lid	105℃

3.7 Cyclization

3.7.1 Prepare the reaction mixture 5 minutes in advance and prepare as shown in Table 15 below:

Table 15

Component	Amount
10×TA buffer	6 μL
ATP (100 mM)	0.6 μL
T4 DNA ligase (600 U/μL)	0.2 μL
Total	6.8 μL

3.7.2 Add 6.8 μL of cyclization reaction mixture to the sample and mix well.

3.7.3 Place the reaction on the PCR instrument, the procedure is as follows in Table 16:

Table 16

Temperature	Time
37℃	30 min
4℃	Hold
Heated lid	105℃

3.8 Digestion

3.8.1 Prepare the reaction mixture 5 minutes in advance. ExoIII enzyme is first diluted 10 times with storage buffer, and prepared as shown in Table 17 below:

Table 17

Component	Amount
Water	1 μL
10×TA buffer	0.4 μL
Exo I (20 U/μL)	2 μL
Exo III (10 U/μL, 10× dilution)	0.7 μL
Total	4.1 μL

3.8.2 Put it on the PCR instrument and react at 37℃ for 30min.

3.8.3 After the reaction is completed, centrifuge, add 3μL of 0.5M EDTA to each sample, mix well, and centrifuge. If you need to save, you can put the reaction-stopped sample in the -20℃ refrigerator.

3.8.4 Purification: use 168μL AmpureXP magnetic beads for purification, and 25μL TE solution for re-dissolution; prepare for the next reaction or save to -20℃ refrigerator.

3.8.5 Quantification of single-stranded circles (ssCircle)

The single-stranded circle products remaining after linear digestion are quantified with the Qubit ssDNA Assay Kit. Mix buffer and dye at a ratio of 199:1, vortex, and centrifuge briefly. Add 10 μL of each of the two standards to 190 μL of diluted dye working solution, vortex, and centrifuge briefly. Add 1 μL of sample to 199 μL of diluted dye working solution, vortex, centrifuge briefly, and quantify on the Qubit instrument.

Run a 6% TBU-PAGE gel to check whether the linear DNA has been digested completely. If the gel looks like lanes 1 to 3 in Figure 20, linear digestion is complete. If linear DNA remains, as in lane 4 in Figure 20, it is recommended to repeat the cyclization.

The reagents and materials used in the above examples are shown in Table 18 below:

Table 18

4. Sequencing of the quality inspection library for tag adapter quality and contamination rate

The mixed library was sent to the BGISEQ-500 platform for sequencing by PE30+10 strategy.

5. Information analysis

The results were analyzed using the ESR (Effective Sequencing Site Ratio) and adapter contamination analysis programs.

6. Data results

6.1 ESR:

Figure 21 shows the ESR results of the mixed library. It can be seen that the overall 5' end synthesis quality of this batch of 501-508 tag adapters is good, and the second-strand ESR is improved.

Figure 22 shows the per-tag (split) ESR results for the libraries built with all 8 tag adapters under test, based on data from 6 fields of view (FOV).

6.2 Adapter contamination rate:

For each of the eight groups of template DNA (1-12, 13-24, 25-36, 37-48, 49-60, 61-72, 73-84, 85-96), the numbers of reads carrying tags 501, 502, 503, 504, 505, 506, 507 and 508 were counted, and the tag matching rate and tag contamination rate were then calculated according to the matching correspondence. The results are shown in Table 19 below. The contamination rate of tag 504 is higher than 1%, so it is judged unqualified.

Table 19

Tag number	Tag matching rate	Tag contamination rate
501	99.97%	0.03%
502	99.95%	0.05%
503	99.96%	0.04%
504	98.70%	1.30%
505	99.96%	0.04%
506	99.96%	0.04%
507	99.86%	0.14%
508	99.97%	0.03%
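The pass/fail judgment applied to Table 19 can be expressed as a simple threshold check. The contamination rates below are those of Table 19 and the 1% limit is the one stated in section 6.2; the function name is hypothetical.

```python
# Tag contamination rates from Table 19, in percent.
TABLE_19_CONTAMINATION_PCT = {
    501: 0.03, 502: 0.05, 503: 0.04, 504: 1.30,
    505: 0.04, 506: 0.04, 507: 0.14, 508: 0.03,
}


def unqualified_tags(contamination_pct, threshold_pct=1.0):
    """Return the tags whose contamination rate exceeds the threshold, sorted."""
    return sorted(tag for tag, rate in contamination_pct.items() if rate > threshold_pct)


print(unqualified_tags(TABLE_19_CONTAMINATION_PCT))  # [504]
```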

The above uses specific examples to explain the present invention, which is only used to help understand the present invention, and is not intended to limit the present invention. For those skilled in the technical field to which the present invention belongs, according to the idea of the present invention, several simple deductions, modifications, or replacements can also be made.

Claims

1. A tag sequence detection method, characterized in that the method comprises:

performing matched library construction with a set of template sequences and a set of tag sequences to be tested to obtain a set of tag libraries, wherein the template sequences are different gene sequences obtained by amplification or artificial synthesis, different template sequences differ from each other in sequence, and the template sequences have a one-to-one or many-to-one correspondence with the tag sequences to be tested;

sequencing the tag libraries to obtain the sequencing reads of each tag library;

aligning the sequencing reads of each tag library against all the tag sequences to be tested, and counting the number of sequencing reads aligned to each tag sequence to be tested;

calculating, according to the number, the contamination rate at which each tag sequence to be tested contains other tag sequences.
2. The detection method according to claim 1, wherein the template sequences are different gene sequences amplified from genomic DNA.
3. The detection method according to claim 1, wherein the number of the template sequences is N, N=4X, and X is an integer greater than or equal to 1;

preferably, the number of the template sequences is 96.
4. The detection method according to claim 1, wherein the size of the template sequences is 50-1000 bp, preferably 180 bp;

preferably, all the template sequences are of equal size.
5. The detection method according to claim 1, characterized in that, within the regions at the 5' end and the 3' end of the template sequences that correspond to the sequencing read length, the proportion of each of the bases A, T, C and G is not lower than the base proportion that ensures balance of the four base signals.
6. The detection method according to claim 5, characterized in that the regions at the 5' end and the 3' end of the template sequences that correspond to the sequencing read length lie within 20 bp to 200 bp of the 5' end and the 3' end, preferably within 30 bp.
7. The detection method according to claim 5, characterized in that the base proportion is 10% to 30%, preferably 15%.
8. The detection method according to claim 1, wherein the number of the template sequences is N times the number of the tag sequences to be tested, N is an integer greater than or equal to 1, the template sequences are divided into as many subgroups as there are tag sequences to be tested, and each subgroup comprises N template sequences.
9. The detection method according to claim 1, wherein the tag sequence to be tested is a tag adapter and/or a tag primer.
10. The detection method according to claim 1, wherein the tag sequence to be tested is a single-tag adapter.
11. The detection method according to claim 10, characterized in that the matched library construction comprises: ligating the template sequences to the single-tag adapters in a one-to-one or many-to-one correspondence, and then performing PCR amplification with universal primers to obtain a single-tag library ready for sequencing.
12. The detection method according to claim 1, wherein the tag sequence to be tested is a single-tag primer.
13. The detection method according to claim 12, wherein the matched library construction comprises: ligating the template sequences to a universal adapter to obtain ligation products, and then performing PCR amplification with the single-tag primers in a one-to-one or many-to-one correspondence to obtain a single-tag library ready for sequencing.
14. The detection method according to claim 1, wherein the tag sequence to be tested is a double-tag sequence composed of a tag adapter and a tag primer.
15. The detection method according to claim 14, wherein the matched library construction comprises: ligating the template sequences to the tag adapters in a one-to-one or many-to-one correspondence, and then performing PCR amplification with the tag primers in a one-to-one or many-to-one correspondence to obtain a double-tag library ready for sequencing.
16. The detection method according to claim 1, wherein the tag sequence to be tested is a double-tag primer composed of two tag primers.
17. The detection method according to claim 16, wherein the matched library construction comprises: ligating the template sequences to a universal adapter to obtain ligation products, and then performing PCR amplification with the double-tag primers in a one-to-one or many-to-one correspondence to obtain a double-tag library ready for sequencing.
18. The detection method according to claim 1, characterized in that the tag sequence to be tested is a double-tag adapter composed of two tag adapters.
19. The detection method according to claim 18, wherein the matched library construction comprises: ligating the template sequences to the double-tag adapters in a one-to-one or many-to-one correspondence to obtain ligation products, and then performing PCR amplification with universal primers to obtain a double-tag library.
20. The detection method according to claim 1, wherein the sequencing is paired-end sequencing.
21. The detection method according to claim 20, wherein the sequencing is PE30+10 sequencing.
22. The detection method according to claim 1, wherein all the tag sequences to be tested are tag sequences synthesized in the same batch.
23. The detection method according to claim 1, characterized in that the method further comprises obtaining a second-strand (read2) sequencing quality evaluation result based on the sequencing reads; and

optionally, indirectly judging, from the evaluation result, the 5'-terminal base synthesis quality of the adapter related to second-strand sequencing in the tag library or in a mixed tag library of the tag libraries.
24. The detection method according to claim 1, wherein the template sequences are obtained by amplifying a human genome with the 96 primer pairs shown in SEQ ID NO: 1 to 192.