WO2020118596A1 - Tag sequence detection method - Google Patents

Tag sequence detection method

Info

Publication number
WO2020118596A1
WO2020118596A1 PCT/CN2018/120820 CN2018120820W WO2020118596A1 WO 2020118596 A1 WO2020118596 A1 WO 2020118596A1 CN 2018120820 W CN2018120820 W CN 2018120820W WO 2020118596 A1 WO2020118596 A1 WO 2020118596A1
Authority
WO
WIPO (PCT)
Prior art keywords
tag
sequence
template
sequences
detection method
Prior art date
Application number
PCT/CN2018/120820
Other languages
French (fr)
Chinese (zh)
Inventor
赵霞
赵静
章文蔚
陈奥
Original Assignee
深圳华大生命科学研究院
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳华大生命科学研究院
Priority to CN201880099610.8A (patent CN113168889B)
Priority to PCT/CN2018/120820 (WO2020118596A1)
Publication of WO2020118596A1

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B30/00 ICT specially adapted for sequence analysis involving nucleotides or amino acids

Definitions

  • the invention relates to the field of sequencing technology, in particular to a method for detecting a tag sequence.
  • multiple libraries can be mixed together for sequencing by adding a library tag of known sequence to the library, and later the information of different samples is separated according to the library tag.
  • the first is the single-tag library: a tag sequence is added to an adapter or to a PCR primer, and is then introduced into the library by adapter ligation or PCR amplification, respectively.
  • the second type is the dual-tag library. There are three ways to add the dual-tag sequence to the library. In the first, the first tag sequence is added to the adapter and the second tag sequence is added to the PCR primer, and the two tags are introduced into the library in sequence by adapter ligation followed by PCR.
  • in the second, the two tag sequences are added to the F primer (forward amplification primer) and the R primer (reverse amplification primer) of the PCR (polymerase chain reaction) primers, respectively, and both tags are introduced into the library simultaneously by PCR amplification.
  • in the third, the two tags are added to the top strand (top chain) and the bottom strand (bottom chain) of the adapter sequence, respectively, and both tags are introduced into the library simultaneously by adapter ligation.
  • in summary, the oligonucleotides (oligos) used to construct tag libraries can be divided into two categories: tag adapters and tag primers.
  • the standard oligonucleotide QC (quality control) methods of oligonucleotide synthesis suppliers mainly include OD (optical density) measurement, chromatographic detection, and mass spectrometry detection. These methods cannot detect the base accuracy of oligonucleotide synthesis. For example, the planned sequence is AATTCCGGA, but 1% of the oligonucleotides actually synthesized are AATTCCGGT and 1% are GATTCCGGA. If a wrongly synthesized base is located at the 3' end of the sequencing primer, it directly affects the sequencing primer hybridization success rate and the sequencing success rate, resulting in fewer effective sequencing templates or in sequencing errors.
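  • As a minimal sketch of the synthesis-error example above, the following Python snippet locates where a synthesized oligonucleotide differs from the planned one and reports whether the error falls at the 3' end. It assumes the sequences are written 5' to 3'; the function names and the `window` parameter are illustrative and do not come from the patent.

```python
def mismatch_positions(planned: str, synthesized: str) -> list[int]:
    """Return the 0-based positions at which two equal-length oligos differ."""
    if len(planned) != len(synthesized):
        raise ValueError("sequences must have equal length")
    return [i for i, (p, s) in enumerate(zip(planned, synthesized)) if p != s]


def affects_three_prime_end(planned: str, synthesized: str, window: int = 1) -> bool:
    """True if any mismatch lies within the last `window` bases (the 3' end)."""
    cutoff = len(planned) - window
    return any(pos >= cutoff for pos in mismatch_positions(planned, synthesized))


planned = "AATTCCGGA"
for variant in ("AATTCCGGT", "GATTCCGGA"):
    print(variant, mismatch_positions(planned, variant),
          affects_three_prime_end(planned, variant))
```

  • In this sketch the first variant differs only at its last base (the 3' end), while the second differs only at its first base (the 5' end).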
  • the prior-art method for detecting the oligonucleotide contamination rate uses, as the template, a 180bp DNA fragment carrying a 10bp sample index sequence that is amplified from plasmid DNA. The template is matched with the oligonucleotide to be tested to build an NGS library, and NGS sequencing is used to count the reads whose library tag sequence matches the sample index sequence, from which the contamination rate of other, non-matching tags is calculated.
  • in this method, the template DNAs differ from one another only in a 10bp sequence.
  • because of the resulting base imbalance, the quality of the second-strand sequencing cannot be guaranteed, so the second-strand sequencing quality cannot be used to indirectly assess the synthesis quality of the oligonucleotides.
  • in the prior art, oligonucleotide base synthesis is mainly checked by PCR-amplifying the full length of the oligonucleotide sequence, adding an A base for TA cloning, and Sanger sequencing the resulting monoclonal libraries; generally, at least 100 monoclonal libraries need to be sequenced for each oligonucleotide.
  • the invention provides a method for detecting a tag sequence.
  • the method includes:
  • a set of template sequences are matched with a set of tag sequences to be tested to build a library to obtain a set of tag libraries.
  • the above template sequences are different gene sequences amplified or artificially synthesized. Different template sequences are different in sequence from each other.
  • the above template sequences and the above tag sequences to be tested have a one-to-one or many-to-one correspondence;
  • the above template sequence is a different gene sequence amplified from genomic DNA.
  • the number of the above-mentioned set of template sequences is 96.
  • the size of the above template sequence is 50-1000bp, preferably 180bp.
  • all the above template sequences are of equal size.
  • within the regions at the 5' end and the 3' end of the above template sequences whose length equals the sequencing read length, the proportions of the bases A, T, C, and G satisfy a base ratio that keeps the base signals balanced.
  • the regions at the 5' end and the 3' end whose length equals the sequencing read length lie within 20bp to 200bp of the 5' and 3' ends, preferably within 30bp.
  • the above base ratio is 10% to 30%, preferably 15%.
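  • The base balance criterion above can be sketched in Python as follows. This is one possible reading, assuming that balance is assessed position by position across the whole template set and that every base must reach the minimum fraction; the function names and the toy sequences are illustrative, and the defaults (30bp, 15%) are the preferred values stated above.

```python
from collections import Counter

BASES = "ATCG"


def base_fractions_at(sequences, position):
    """Fraction of each base observed at one position across all templates."""
    counts = Counter(seq[position] for seq in sequences)
    return {base: counts[base] / len(sequences) for base in BASES}


def is_base_balanced(sequences, read_length=30, min_fraction=0.15):
    """Check every position within `read_length` bases of the 5' and 3' ends."""
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("all template sequences must be of equal size")
    if read_length > length:
        raise ValueError("read length exceeds template length")
    positions = set(range(read_length)) | set(range(length - read_length, length))
    return all(
        min(base_fractions_at(sequences, pos).values()) >= min_fraction
        for pos in positions
    )


# Toy 4bp "templates" (hypothetical, far shorter than the 180bp templates).
balanced = ["ATCG", "TCGA", "CGAT", "GATC"]
skewed = ["AAAA", "AAAT", "AAAC", "AAAG"]
print(is_base_balanced(balanced, read_length=2))
print(is_base_balanced(skewed, read_length=2))
```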
  • the number of the template sequences is N times the number of the tag sequences to be tested, where N is an integer greater than or equal to 1; the template sequences are divided into as many subgroups as there are tag sequences to be tested, and each subgroup contains N template sequences.
  • the tag sequence to be tested is a tag adapter and/or tag primer.
  • the tag sequence to be tested is a single-tag adapter.
  • the matching library construction includes: ligating the template sequences to the single-tag adapters in a one-to-one or many-to-one correspondence, and then performing PCR amplification with universal primers to obtain a single-tag library.
  • the tag sequence to be tested is a single-tag primer.
  • the above matching library construction includes: ligating the above template sequences to a universal adapter to obtain ligation products, and then performing PCR amplification with the single-tag primers in a one-to-one or many-to-one correspondence to obtain a single-tag library ready for sequencing.
  • the tag sequence to be tested is a dual-tag sequence composed of a tag adapter and a tag primer.
  • the matching library construction includes: ligating the template sequences to the tag adapters in a one-to-one or many-to-one correspondence, and then performing PCR amplification with the tag primers in a one-to-one or many-to-one correspondence to obtain a dual-tag library ready for sequencing.
  • the tag sequence to be tested is a dual-tag primer composed of two tag primers.
  • the above matching library construction includes: ligating the above template sequences to a universal adapter to obtain ligation products, and then performing PCR amplification with the above dual-tag primers in a one-to-one or many-to-one correspondence to obtain a dual-tag library ready for sequencing.
  • the above tag sequence to be tested is a dual-tag adapter composed of two tag adapters.
  • the matching library construction includes: ligating the template sequences to the dual-tag adapters in a one-to-one or many-to-one correspondence to obtain ligation products, and then performing PCR amplification with universal primers to obtain a dual-tag library.
  • the above sequencing is double-end sequencing.
  • the above sequencing is PE30+10 sequencing.
  • the tag sequences to be tested are all tag sequences synthesized in the same batch.
  • the above method further includes obtaining a second-strand (read 2) sequencing quality evaluation result based on the above sequencing reads;
  • the above template sequence is obtained by amplifying the human genome with 96 primer pairs shown in SEQ ID NO: 1 to 192.
  • the invention matches a set of template sequences with the tag sequences to be tested and builds libraries to detect the tag sequences. Once the template sequences have been prepared successfully, they can be re-amplified many times, which saves template preparation costs. Because the template sequences differ from one another in sequence, there is no need to worry that different templates cannot be distinguished due to sequence errors introduced by multiple rounds of PCR amplification.
  • the preferred technical solution indirectly detects the 5'-end base synthesis quality of the tag adapter, or of the adapter related to second-strand sequencing, by detecting the quality of the second-strand sequencing.
  • the tag contamination rate and the synthesis quality of the oligonucleotides related to the sequencing primers can be detected simultaneously in a single experiment, saving quality control labor and cost.
  • FIG. 2 is a schematic diagram of the library construction process in the prior-art quality control method for single-tag adapters
  • FIG. 3 is a schematic diagram of the sequencing principle in the prior-art quality control method for single-tag adapters
  • FIG. 5 is a schematic diagram of a template DNA preparation method in an embodiment of the present invention.
  • FIG. 6 is a schematic diagram of the library construction process for quality inspection of single-tag adapters in an embodiment of the present invention.
  • FIG. 7 is a schematic diagram of the library construction process for quality inspection of single-tag primers in an embodiment of the present invention.
  • FIG. 8 is a schematic diagram of the library sequencing principle for quality inspection of single-tag sequences in an embodiment of the present invention.
  • FIG. 9 shows the contamination rates obtained after sequencing the libraries for quality inspection of single-tag sequences, and the ESR used for second-strand sequencing quality evaluation, in an embodiment of the present invention.
  • FIG. 10 is a schematic diagram of the library construction process for quality inspection of a tag adapter + tag primer in an embodiment of the present invention
  • FIG. 11 is a schematic diagram of the library construction process for quality inspection of dual-tag primers in an embodiment of the present invention.
  • FIG. 12 is a schematic diagram of the library construction process for quality inspection of dual-tag adapters in an embodiment of the present invention.
  • FIG. 13 is a schematic diagram of the library sequencing principle for quality inspection of dual-tag sequences in an embodiment of the present invention.
  • FIG. 14 shows the contamination rates obtained after sequencing the libraries for quality inspection of dual-tag sequences, and the ESR used for second-strand sequencing quality evaluation, in an embodiment of the present invention.
  • FIG. 15 shows the adapter sequence statistics obtained by SE200 sequencing of libraries with a 160 bp main band, one without and one with second-strand ESR improvement, in an embodiment of the present invention
  • FIG. 16 is a graph of ESR statistics for single-tag mixed libraries constructed with tag adapters from different manufacturers and different batches in an embodiment of the present invention
  • FIG. 17 is a diagram of amplification products identified by agarose gel electrophoresis in an embodiment of the present invention.
  • FIG. 18 is an electrophoresis diagram of the re-purified 180bp specific amplification products in an embodiment of the present invention.
  • FIG. 21 is a graph of the ESR results of the mixed library in an embodiment of the present invention.
  • FIG. 22 is a graph showing the ESR split results of the libraries constructed with all 8 tag adapters to be tested in an embodiment of the present invention.
  • a tag sequence oligonucleotide, also referred to herein as a "tag sequence" (barcode) or "oligonucleotide" (oligo), is a nucleotide sequence used in the construction of a library (such as a sequencing library) that has the function of distinguishing sample sources and/or molecular sources; it includes tag adapters and tag primers. These tag sequences are obtained by artificial synthesis.
  • a tag adapter is an adapter sequence used in the construction of a library (such as a sequencing library) that has the function of distinguishing different sample sources and/or molecular sources; tag adapters include single-tag adapters and dual-tag adapters, where a dual-tag adapter consists of two single-tag adapters.
  • a tag primer is a primer sequence used in the construction of a library (such as a sequencing library) that has the function of distinguishing different sample sources and/or molecular sources; tag primers include single-tag primers and dual-tag primers, where a dual-tag primer consists of two single-tag primers.
  • the tag library refers to a library containing the tag sequence of the present invention obtained by a library construction method, especially a sequencing library.
  • Tag libraries include single tag libraries and dual tag libraries.
  • the single tag library contains a single tag adaptor or a single tag primer.
  • the dual tag library contains dual tag adaptors or dual tag primers.
  • the template sequence refers to a sequence used for matching and building a library with a tag sequence in the present invention, and different template sequences are different from each other in sequence.
  • the template sequence is a different gene sequence amplified from genomic DNA.
  • Sequencing refers to the method for determining the nucleic acid sequence. In the present invention, it specifically refers to the method for determining the sequence of the tag library, which includes single-end sequencing and double-end sequencing.
  • the present invention prefers double-end sequencing, especially PE30+10 sequencing, a strategy in which 30bp is read from each end and 10bp is read for the tag sequence.
  • in one experiment, the invention can simultaneously complete quality control of the synthesis contamination rate of tag sequence oligonucleotides and indirect quality control of the oligonucleotide base synthesis quality that affects sequencing quality, which provides a new method for the quality inspection of tag sequence oligonucleotides.
  • the detection method of the present invention is applicable not only to single-tag sequence oligonucleotides, but also to double-tag sequence oligonucleotides.
  • the proportion of each of the bases A/T/C/G is not less than a set value, so that the A/T/C/G base signals are balanced
  • the set value may be, for example, a percentage in the range of 10% to 30%, such as 15%, to ensure the base balance of the insert sequence during tag sequence sequencing (such as PE30+10), that is, to ensure that the sequencing quality is not affected by an imbalance of template bases.
  • the test template DNAs are all divided into N groups, and then the grouped template DNAs are mixed separately, and matched with the tag sequence oligonucleotides to be tested to build a library.
  • for example, template DNAs No. 1-96 are divided into 8 groups: template DNAs No. 1-12 are mixed in equal mass proportions and matched with tag sequence 1 (Barcode1) for library construction, template DNAs No. 13-24 are mixed in equal proportions and matched with tag sequence 2 (Barcode2), and so on, until template DNAs No. 85-96 are mixed in equal proportions and matched with tag sequence 8 (Barcode8) for library construction.
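  • The grouping described above (96 templates, 8 tags, 12 consecutive templates per tag) can be sketched as follows. The function name and the dictionary layout are illustrative assumptions; only the numbering scheme comes from the text.

```python
def assign_templates_to_tags(template_ids, tag_names):
    """Split templates into consecutive, equally sized groups, one group per tag."""
    n_templates, n_tags = len(template_ids), len(tag_names)
    if n_tags == 0 or n_templates % n_tags != 0:
        raise ValueError("template count must be a multiple of the tag count")
    size = n_templates // n_tags
    return {
        tag: template_ids[i * size:(i + 1) * size]
        for i, tag in enumerate(tag_names)
    }


groups = assign_templates_to_tags(
    list(range(1, 97)),                     # template DNAs No. 1-96
    [f"Barcode{i}" for i in range(1, 9)],   # tag sequences 1-8
)
print(groups["Barcode1"])   # templates 1-12
print(groups["Barcode8"])   # templates 85-96
```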
  • PE30+10 sequencing is used to count the reads whose tag sequence matches the template DNA, from which the contamination rate of other, non-matching tag sequences is calculated; and the quality of the second-strand sequencing, such as the improvement of the second-strand ESR (Effective Spot Rate, the ratio of effective sequencing sites) in DNB sequencing, is used to indirectly detect the base synthesis accuracy of the sequence in the tag sequence oligonucleotide that hybridizes with the second-strand sequencing primer.
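  • A minimal sketch of the read-counting step follows. It assumes each read pair has already been assigned to a template (the alignment step is not shown), uses exact barcode matching with no fault tolerance, and uses made-up 10-base barcodes; none of the names or sequences below come from the patent.

```python
from collections import Counter, defaultdict


def count_reads_per_tag(reads, template_to_tag, tag_sequences):
    """Tally, for each expected tag, how many reads carry each observed tag.

    reads: iterable of (template_id, barcode_read) pairs.
    template_to_tag: maps a template id to the tag it was built with.
    tag_sequences: maps a tag name to its barcode sequence.
    """
    sequence_to_tag = {seq: name for name, seq in tag_sequences.items()}
    counts = defaultdict(Counter)
    for template_id, barcode in reads:
        observed = sequence_to_tag.get(barcode)
        if observed is None:
            continue  # barcode matches no tag under exact matching
        counts[template_to_tag[template_id]][observed] += 1
    return counts


# Hypothetical barcodes and reads, for illustration only.
tags = {"Barcode1": "AAAAACCCCC", "Barcode2": "GGGGGTTTTT"}
template_to_tag = {1: "Barcode1", 13: "Barcode2"}
reads = [
    (1, "AAAAACCCCC"), (1, "AAAAACCCCC"), (1, "GGGGGTTTTT"),
    (13, "GGGGGTTTTT"), (13, "ACGTACGTAC"),
]
counts = count_reads_per_tag(reads, template_to_tag, tags)
print(dict(counts["Barcode1"]))
```

  • Reads whose observed tag differs from the expected tag are the ones that count toward the contamination rate.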
  • the tag sequence oligonucleotide may be a single-tag library-building oligonucleotide or a dual-tag library-building oligonucleotide. It should be noted that when there are fewer tag sequences to be tested than test template DNAs (for example, 96), only part of the test template DNAs could in theory be used. However, the template sequences must be base-balanced at every sequencing position under the selected sequencing strategy (for example, PE30), and using all of the test template DNAs (for example, all 96) helps to ensure this.
  • when the tag sequence oligonucleotide is a single-tag library-building oligonucleotide, that is, a single-tag adapter or a single-tag primer, its quality inspection method is shown in Figures 6-9.
  • when N (N 4X, X ≥ 1) single-tag adapters (such as tag 1 to tag N adapters) and the prepared N template DNA fragments (such as template DNA A to template DNA N, or gene A fragment to gene N fragment) undergo quality inspection, the template DNAs are ligated to the tag adapters in a one-to-one correspondence, and PCR amplification is then performed with universal primers to obtain a single-tag library ready for sequencing.
  • when N single-tag primers (such as tag 1 primer to tag N primer) and the prepared N template DNA fragments (such as template DNA A to template DNA N, or gene A fragment to gene N fragment) undergo quality inspection, the template DNAs are first ligated to a universal adapter, and the ligation products are then PCR-amplified with the tag primers in a one-to-one correspondence to obtain the single-tag library.
  • PE30+10 sequencing gives the number of sequencing reads of each template DNA (such as gene A fragment to gene N fragment) that correspond to each tag (such as tag 1 to tag N), such as the values 4995, 4, X, 1, 0, 4998, Y, 2, 2, 8, Z, 4990 in Figure 9.
  • from these read counts, the contamination rate of each tag adapter/primer by other tag adapters/primers can be calculated.
  • the contamination rate of the tag 1 adapter is (4+X+1)/(4+X+1+4995)*100%
  • the contamination rate of the tag 2 adapter is (0+Y+2)/(0+Y+2+4998)*100%
  • the contamination rate of the tag N adapter is (2+8+Z)/(2+8+Z+4990)*100%.
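  • The contamination rate formula above (mismatched reads divided by all reads for that template, times 100%) can be written as a short Python sketch. The function name and the dictionary layout are illustrative, and the unknown value X from Figure 9 is replaced by a hypothetical 0 purely so that the example can run.

```python
def contamination_rate(read_counts, expected_tag):
    """Percentage of reads for one template that carry a tag other than the expected one."""
    total = sum(read_counts.values())
    if total == 0:
        raise ValueError("no reads for this template")
    mismatched = total - read_counts[expected_tag]
    return 100.0 * mismatched / total


# Read counts for the template matched with tag 1: 4995, 4, X, 1.
# X is not given in the text; 0 is a placeholder assumption.
template_a = {"tag1": 4995, "tag2": 4, "tag3": 0, "tagN": 1}
print(round(contamination_rate(template_a, "tag1"), 2))
```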
  • the sequencing quality evaluation results of the second strand (read2) (such as the ESR results in Figure 9) can be obtained, and from them the 5'-end base synthesis quality of the adapters related to second-strand sequencing in the mixed single-tag library can be judged indirectly.
  • further, through split analysis of the sequencing output data, the sequencing quality evaluation result of each single tag (not shown in the figure) can be obtained.
  • the quality inspection method is shown in Figure 10-14.
  • when N (N 4X, X ≥ 1) tag adapters (such as tag 1 to tag N adapters), N tag primers (such as tag 1 to tag N primers), and the prepared N template DNA fragments (such as template DNA A to template DNA N, or gene A to gene N) undergo quality inspection, the template DNAs are ligated to the tag adapters in a one-to-one correspondence, and PCR is then performed with the tag primers in a one-to-one correspondence to obtain a dual-tag library ready for sequencing.
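  • when N (N 4X, X ≥ 1) tag primers F (such as tag 1 to tag N primer F) and N tag primers R (such as tag 1 to tag N primer R) undergo quality inspection, the library is built as described above for dual-tag primers: the template DNAs are ligated to a universal adapter, and the ligation products are PCR-amplified with the dual-tag primers in a one-to-one correspondence.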
  • when N (N 4X, X ≥ 1) tag adapter top chains (top) and N tag adapter bottom chains (bottom) (such as tag 1 + tag 1 to tag N + tag N adapters) and N template DNA fragments (such as template DNA A to template DNA N, or gene A to gene N fragments) undergo quality inspection, the template DNAs are ligated to the dual-tag adapters in a one-to-one correspondence, and the ligation products are PCR-amplified with universal primers to obtain a dual-tag library ready for sequencing.
  • PE30+10 sequencing gives, for each template DNA (such as gene A fragment to gene N fragment), the number of sequencing reads corresponding to each tag adapter, tag primer F, or tag adapter top chain (such as tag 1 to tag N; for example 4995, 4, X, 1, 0, 4998, Y, 2, 2, 8, Z, 4990 in Figure 14), and the number of reads corresponding to each tag primer, tag primer R, or tag adapter bottom chain (for example 4990, 10, X, 0, 3, 4995, Y, 2, 3, 12, Z, 4985 in Figure 14).
  • from these read counts, the contamination rate of each tag adapter, tag primer F, or tag adapter top chain by other tag adapters, tag primers F, or tag adapter top chains can be calculated with the tag contamination rate formula, as shown in Figure 14
  • the contamination rate of the tag 1 adapter, tag 1 primer F, or tag 1 adapter top chain is (4+X+1)/(4+X+1+4995)*100%
  • the tag 2 contamination rate is (0+Y+2)/(0+Y+2+4998)*100%
  • the contamination rate of tag N is (2+8+Z)/(2+8+Z+4990)*100%
  • the contamination rate of the tag 1 primer, tag 1 primer R, or tag 1 adapter bottom chain is (10+X+0)/(10+X+0+4990)*100%
  • the contamination rate of the tag 2 primer, tag 2 primer R, or tag 2 adapter bottom chain is (3+Y+2)/(3+Y+2+4995)*100%
  • the sequencing quality evaluation results of the second strand (read2) (such as the ESR results in the figure) can be obtained, and from them the 5'-end base synthesis quality of the adapters related to second-strand sequencing in the mixed dual-tag library can be judged indirectly. Further, through split analysis of the sequencing output data, the sequencing quality evaluation result of each single tag (not shown in the figure) can be obtained.
  • the characteristics of the present invention include: (1) The design of the template DNA follows the principle of base balance, so that the sequencing quality is not degraded by base imbalance in the template DNA. (2) Once the template DNA has been prepared successfully, it can be re-amplified many times (PCR on PCR), saving the cost of template DNA preparation. Because the template DNAs differ from one another at every position, there is no need to worry that different templates cannot be distinguished due to sequence errors introduced by multiple rounds of PCR amplification; the small number of errors caused by PCR can be handled by appropriate fault tolerance. In the prior art, the templates differ only in a 10 bp sequence, so they are not suitable for template preparation by re-amplification (PCR on PCR).
  • any number of tag oligonucleotides (tag oligos) between 4 and X can be tested, and the experimental arrangement is not constrained by the number of tag oligonucleotides to be tested.
  • (4) a method is provided for indirectly detecting the 5'-end base synthesis quality of the tag adapter, or of the adapter related to second-strand sequencing, by detecting the quality of second-strand sequencing.
  • the tag contamination rate and the synthesis quality of the oligonucleotides related to the sequencing primers can be detected simultaneously with one quality control system, saving quality control labor and cost.
  • the method of the invention supports quality inspection of the tag oligonucleotides used in all of these types of tag library construction, and is flexible and convenient.
  • the method of indirectly detecting the synthesis quality of the oligonucleotide that hybridizes with the second-strand sequencing primer, by detecting the quality of the second-strand sequencing, was developed after a series of test investigations.
  • the adapter sequence statistics obtained by SE200 sequencing of libraries with a 160 bp main band, one without and one with second-strand ESR improvement, are shown.
  • the figure shows the adapter sequences (adapterSeq) obtained after SE200 sequencing of the libraries with a 160 bp main band.
  • the number (number) is the number of reads corresponding to each sequence, and the percentage (percent%) is the share of each sequence among all sequences.
  • the first line is the correct adapter sequence, and the other lines are adapter sequences containing wrong bases (the boxed bases in the figure). The proportion of correct adapter sequences in the library without second-strand ESR improvement is lower than that in the library with second-strand ESR improvement.
  • the following figure compares the ESR results of mixed libraries built with single-tag adapters 49-56 from manufacturer B purchased in different batches (first batch, second batch, third batch). Among these batches, the second-strand ESR of the mixed library improved significantly in 1 batch and did not improve significantly in 2 batches. It was therefore established that the 5'-end base synthesis quality of the adapter sequence that hybridizes with the second-strand sequencing primer can be judged by detecting whether the second-strand ESR is improved.
  • Example 1: Detection of the tag contamination rate and adapter synthesis quality of 8 single-tag adapters for the DNA nanoball (DNB) sequencing platform
  • Design 96 primer pairs that specifically amplify 180bp fragments from 96 genes located across the 23 pairs of chromosomes of the human genome. The primers are shown in SEQ ID NO: 1 to 192, in which every two sequences constitute the primer pair for amplifying one gene. For example, SEQ ID NO: 1 to 2 are the primer pair for amplifying the first gene, SEQ ID NO: 3 to 4 are the primer pair for amplifying the second gene, and so on; SEQ ID NO: 191 to 192 are the primer pair for amplifying the 96th gene.
  • the sequences of the amplification products are shown in SEQ ID NO: 193 to 288, where each sequence represents the amplification product of one gene: SEQ ID NO: 193 to 240 are the amplification product sequences of genes No. 1-48, and SEQ ID NO: 241 to 288 are the amplification product sequences of genes No. 49-96.
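  • The SEQ ID numbering described above follows a simple arithmetic pattern, sketched here in Python. The function names are illustrative; the pattern itself (primer pair 2k-1 and 2k, product 192+k for gene k) is inferred from the ranges stated in the text.

```python
N_GENES = 96


def primer_seq_ids(gene_number):
    """SEQ ID NOs of the primer pair for gene k: (2k - 1, 2k)."""
    if not 1 <= gene_number <= N_GENES:
        raise ValueError("gene number must be between 1 and 96")
    return (2 * gene_number - 1, 2 * gene_number)


def product_seq_id(gene_number):
    """SEQ ID NO of the amplification product for gene k: 192 + k."""
    if not 1 <= gene_number <= N_GENES:
        raise ValueError("gene number must be between 1 and 96")
    return 2 * N_GENES + gene_number


print(primer_seq_ids(1), primer_seq_ids(96))
print(product_seq_id(1), product_seq_id(48), product_seq_id(49), product_seq_id(96))
```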
  • TAE agarose gel: the size of the gel depends on the number of samples to be checked. Usually, 2.5g of agarose is added to each 100mL of 1× TAE buffer and heated to boiling until the powder is completely dissolved. After cooling in a warm water bath for 2 minutes, add 2 µL of GelStain (TransGen Biotech), mix gently, pour into the gel tray, insert a wide-well gel comb, and leave at room temperature for 20-30 minutes until the gel has solidified before use.
  • Electrophoresis conditions: 150V, 30min.
  • TAE agarose gel: the size of the gel depends on the number of samples to be checked. Usually, 2.5g of agarose (BIO-RAD Megabase Agarose) is added to 100mL of 1× TAE buffer and heated to boiling until the powder is completely dissolved and the solution contains no solid residue. After cooling in a warm water bath for 2min, without adding any dye, pour into the gel tray, insert a wide-well gel comb, and let it stand at room temperature for 20-30min until the gel has solidified. Make sure there are no bubbles in the agarose solution.
  • Electrophoresis conditions: 100V, 2h-2.5h, until the bromophenol yellow reaches the bottom of the gel.
  • Reagent name and volume: 5M NaCl, 4 µL; 1M TrisHCl, 4 µL; 2mM EDTA, 20 µL; water, 172 µL; total, 200 µL.
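  • As a quick arithmetic check of the recipe above, the final concentration of each component follows from the usual dilution relation (stock concentration × stock volume / total volume). The Python sketch below assumes the stock concentrations are exactly as listed; the variable names are illustrative.

```python
def final_concentration(stock_conc_mM, stock_volume_uL, total_volume_uL):
    """Final concentration after dilution: C_final = C_stock * V_stock / V_total."""
    return stock_conc_mM * stock_volume_uL / total_volume_uL


# (stock concentration in mM, volume in µL), taken from the recipe above.
recipe = {
    "NaCl": (5000.0, 4.0),      # 5 M stock
    "TrisHCl": (1000.0, 4.0),   # 1 M stock
    "EDTA": (2.0, 20.0),        # 2 mM stock
}
water_uL = 172.0
total_uL = water_uL + sum(volume for _, volume in recipe.values())

for name, (conc, volume) in recipe.items():
    print(name, final_concentration(conc, volume, total_uL), "mM")
```

  • Under these assumptions the volumes sum to 200 µL and the final concentrations are 100 mM NaCl, 20 mM TrisHCl, and 0.2 mM EDTA.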
  • the 96 specific 180bp products are numbered 1-96 and divided into 8 groups (No. 1-12, 13-24, 25-36, 37-48, 49-60, 61-72, 73-84, and 85-96). The products within each group are mixed in equal mass, and 50ng of each mixture is matched with tags 501-508, in order, to build the libraries.
  • the product can be subjected to the next reaction or stored in a -20°C refrigerator.
  • place the reaction sample into the PCR instrument to react.
  • the reaction conditions are as follows in Table 13:
  • the main band is between 250 and 300bp. As shown in Figure 19, there are some template self-ligation products above the main band. Experimental verification shows that they do not affect the results, and the self-ligation products disappear after circularization and digestion.
  • after digestion of the linear DNA, the single-stranded circular products were quantified using the Qubit single-strand analysis kit (Qubit ssDNA Assay Kit).
  • the buffer and dye are combined at a ratio of 199:1, vortexed, and briefly centrifuged. Take two 190 µL aliquots of the diluted dye working solution, add 10 µL of each of the two standards, then vortex and centrifuge to mix. Add 1 µL of sample to 199 µL of diluted dye working solution, vortex, centrifuge, and quantify with the Qubit instrument
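  • The volumes of buffer and dye needed for a given number of samples follow directly from the ratios above. The sketch below is a plain arithmetic helper with no pipetting overage included; the function name and return layout are illustrative assumptions.

```python
def qubit_working_solution(n_samples, n_standards=2):
    """Volumes (µL) of working solution, dye, and buffer for one quantification run.

    Each standard uses 190 µL of working solution and each sample uses 199 µL;
    the working solution is buffer:dye at 199:1, so dye is 1/200 of the total.
    """
    if n_samples < 0 or n_standards < 0:
        raise ValueError("counts must be non-negative")
    total = n_standards * 190 + n_samples * 199
    dye = total / 200
    return {"total_uL": total, "dye_uL": dye, "buffer_uL": total - dye}


print(qubit_working_solution(8))
```

  • In practice a small excess is usually prepared to cover pipetting losses; that margin is left to the reader.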
  • the mixed library was sent to the BGISEQ-500 platform for sequencing by PE30+10 strategy.
  • Figure 21 shows the ESR results of the mixed library. The overall 5'-end synthesis quality of this batch of tag adapters 501-508 is good, and the second-strand ESR is improved.
  • the ESR split results for the libraries constructed with all 8 tag adapters to be tested, taking data from 6 FOVs.
  • Tag number: tag matching rate, tag contamination rate
  • 501: 99.97%, 0.03%
  • 502: 99.95%, 0.05%
  • 503: 99.96%, 0.04%
  • 504: 98.70%, 1.30%
  • 505: 99.96%, 0.04%
  • 506: 99.96%, 0.04%
  • 507: 99.86%, 0.14%
  • 508: 99.97%, 0.03%
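  • The reported values can be sanity-checked with a few lines of Python: the matching rate and contamination rate of each tag should sum to 100%, and the tag with the highest contamination rate can be picked out. The helper names and the tolerance are illustrative assumptions; the patent text shown here does not state an acceptance threshold, so none is applied.

```python
# (matching rate %, contamination rate %) for tags 501-508, from the results above.
results = {
    501: (99.97, 0.03), 502: (99.95, 0.05), 503: (99.96, 0.04), 504: (98.70, 1.30),
    505: (99.96, 0.04), 506: (99.96, 0.04), 507: (99.86, 0.14), 508: (99.97, 0.03),
}


def rates_are_complementary(table, tolerance=0.01):
    """True if matching + contamination equals 100% for every tag, within rounding."""
    return all(abs(match + contam - 100.0) <= tolerance for match, contam in table.values())


def most_contaminated(table):
    """Tag number with the highest contamination rate."""
    return max(table, key=lambda tag: table[tag][1])


print(rates_are_complementary(results), most_contaminated(results))
```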

Landscapes

  • Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Chemical & Material Sciences (AREA)
  • Analytical Chemistry (AREA)
  • Biophysics (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Health & Medical Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biotechnology (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Theoretical Computer Science (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

A tag sequence detection method, comprising: performing matched library construction on a group of template sequences and a group of tag sequences to be detected to obtain a group of tag libraries, wherein the template sequences are different amplified or artificially synthesized gene sequences, the template sequences differ from one another in sequence, and the template sequences and the tag sequences to be detected have a one-to-one or many-to-one correspondence; sequencing the tag libraries to obtain the sequencing reads of each tag library; comparing the sequencing reads with all tag sequences to be detected, and counting the number of sequencing reads aligned to each tag sequence to be detected; and calculating, according to these numbers, the contamination rate of other tag sequences contained in each tag sequence to be detected. Because the template sequences differ from one another in sequence, there is no need to worry that different templates cannot be distinguished due to sequence errors introduced by multiple rounds of PCR amplification.

Description

Tag sequence detection method
Technical field
The invention relates to the field of sequencing technology, and in particular to a method for detecting tag sequences.
Background art
In high-throughput second-generation sequencing, adding a library tag (barcode) of known sequence to each library allows multiple libraries to be pooled and sequenced together; the data from different samples are later separated according to the library tags.
Tag libraries come in several types. The first is the single-tag library: a tag sequence is placed on an adapter or on a PCR primer and is then incorporated into the library by adapter ligation or PCR amplification, respectively. The second is the dual-tag library, into which the two tag sequences can be incorporated in three ways. In the first, the first tag sequence is placed on the adapter and the second on a PCR (polymerase chain reaction) primer, and the two tags are added to the library in turn by adapter ligation and PCR. In the second, the two tag sequences are placed on the F primer (forward primer) and the R primer (reverse primer) of the PCR primer pair, and both tags are added to the library simultaneously by PCR amplification. In the third, the two tags are placed on the top strand and the bottom strand of the adapter sequence, and both are added to the library simultaneously by adapter ligation. In summary, the oligonucleotides (oligos) used to construct tag libraries fall into two broad classes: tag adapters and tag primers.
In theory, each tag in a barcode-pooling library (a library formed by mixing single libraries that carry different library tags) corresponds to exactly one sample. In practice, many factors can cause one tag to correspond to two or more samples, resulting in cross-contamination between sequencing data sets. Tag contamination introduced during the synthesis and purification of library-tagged oligonucleotides (barcode oligos) is one such cause. Contamination testing of library-tagged oligonucleotides is therefore an indispensable step in a standard NGS (next-generation sequencing) workflow: quality inspection before the oligonucleotides are put into production keeps contaminated oligonucleotides out of use and reduces the probability of tag contamination at the source.
At present, the standard oligonucleotide QC (quality control) methods used by oligonucleotide synthesis suppliers mainly comprise OD (optical density) measurement, chromatography and mass spectrometry. None of these methods can verify the base accuracy of the synthesized oligonucleotide. For example, AATTCCGGA is the intended product, but 1% of the oligonucleotides actually synthesized are AATTCCGGT and another 1% are GATTCCGGA. If such a mis-synthesized base lies at the 3' end of a sequencing primer, it directly affects the hybridization success rate of the sequencing primer and the sequencing success rate, reducing the number of effective sequencing templates or causing sequencing errors.
As shown in Figures 1-4, the prior-art method for measuring the oligonucleotide contamination rate uses as templates 180 bp DNA fragments, amplified from plasmid DNA, that each carry a 10 bp sample tag (index) sequence. The templates are matched with the oligonucleotides to be tested for NGS library construction, and NGS sequencing is used to count the reads whose library tag sequence matches the sample tag sequence, from which the contamination rate of other, non-matching tags is calculated. However, because the template DNAs differ only in this 10 bp sequence, the bases are unbalanced when read 2 is sequenced and the second-strand sequencing quality cannot be guaranteed. The second-strand sequencing quality therefore cannot be used to indirectly assess the base synthesis accuracy of the oligonucleotide sequence that hybridizes with the second-strand primer.
The base synthesis accuracy of an oligonucleotide is mainly tested by amplifying the full-length oligonucleotide sequence by PCR, adding an A base, performing T-A cloning, and Sanger-sequencing the resulting monoclonal libraries; in general, at least 100 monoclonal libraries must be sequenced for each oligonucleotide.
In addition, Eurofins uses mass spectrometry or liquid chromatography to assess oligonucleotide synthesis quality; these methods can determine only the purity of an oligonucleotide, not the accuracy of the synthesized bases. In 2014, Quail et al. reported the SASI-Seq (Sample Assurance Spike-In sequencing) method, in which Y-shaped specific primers carrying 11 nt tags are used to amplify 384 tagged spike-in products from PhiX 174 for quality control of cross-contamination in pooled libraries and to prevent sample mix-ups. However, that method has not been applied to quality control of the tag synthesis contamination rate or the base synthesis accuracy of library-construction tag adapters.
At present, oligonucleotides generally undergo quality control for purity and quantity before leaving the factory, but not for cross-contamination between oligonucleotides arising during synthesis or for the accuracy of the synthesized bases. After receiving synthesized oligonucleotides, NGS users usually test them (especially tag adapters or tag primers) for contamination rate and for synthesized base quality, and the two tests generally cannot be completed in the same experiment.
Summary of the invention
The invention provides a method for detecting tag sequences, the method comprising:
performing matched library construction with a set of template sequences and a set of tag sequences to be tested to obtain a set of tag libraries, wherein the template sequences are different amplified or artificially synthesized gene sequences, the template sequences differ from one another in sequence, and the template sequences and the tag sequences to be tested have a one-to-one or many-to-one correspondence;
sequencing the tag libraries to obtain the sequencing reads of each tag library;
aligning the sequencing reads of each tag library against all of the tag sequences to be tested, and counting the number of sequencing reads aligned to each tag sequence to be tested; and
calculating, from these counts, the rate at which each tag sequence to be tested is contaminated by other tag sequences.
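The counting step above can be sketched in a few lines of Python. This is a minimal illustration rather than the method's actual pipeline: it assumes the tag (barcode) portion of each read has already been extracted, and it uses exact string matching in place of a real alignment. The function name `count_tag_hits` and the example tag sequences are hypothetical.

```python
from collections import Counter


def count_tag_hits(barcode_reads, tag_sequences):
    """Count how many barcode reads from one tag library match each candidate tag.

    barcode_reads: iterable of barcode strings read from a single library.
    tag_sequences: dict mapping a tag name to its expected sequence.
    Returns (hits, unassigned): hits maps every tag name to its read count;
    unassigned counts reads that match no candidate tag exactly.
    """
    # Assumption: exact matching stands in for the alignment step.
    seq_to_name = {seq.upper(): name for name, seq in tag_sequences.items()}
    hits = Counter({name: 0 for name in tag_sequences})
    unassigned = 0
    for read in barcode_reads:
        name = seq_to_name.get(read.upper())
        if name is None:
            unassigned += 1
        else:
            hits[name] += 1
    return dict(hits), unassigned


# Hypothetical 10-nt tags and reads, for illustration only.
tags = {"tag1": "ACGTACGTAC", "tag2": "TTGCAATGCC"}
reads = ["ACGTACGTAC", "ACGTACGTAC", "TTGCAATGCC", "GGGGGGGGGG"]
hits, unassigned = count_tag_hits(reads, tags)
print(hits, unassigned)
```

Running this once per tag library gives one row of read counts per library, which is the input the contamination-rate calculation needs.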
In a preferred embodiment, the template sequences are different gene sequences amplified from genomic DNA.
In a preferred embodiment, the number of template sequences in the set is N, where N = 4X and X is an integer greater than or equal to 1.
In a preferred embodiment, the set contains 96 template sequences.
In a preferred embodiment, the template sequences are 50-1000 bp in size, preferably 180 bp.
In a preferred embodiment, all of the template sequences are of equal size.
In a preferred embodiment, within the stretch of bases at the 5' end and at the 3' end of the template sequences that is equal in length to the sequencing read length, the proportion of each of the bases A, T, C and G is no lower than a value that ensures the signals of the four bases are balanced.
In a preferred embodiment, the stretch at the 5' end and at the 3' end of the template sequences that is equal in length to the sequencing read length spans 20 bp to 200 bp from each end, preferably the terminal 30 bp.
In a preferred embodiment, this base proportion is 10% to 30%, preferably 15%.
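The base-balance condition in the preceding embodiments can be checked computationally. The sketch below rests on one interpretation that the text does not spell out: the proportion is evaluated per sequencing position across the whole pool of equal-length templates, within the first and last `window` bases. The function name `unbalanced_positions` and the toy sequences are hypothetical; the defaults of 30 bp and 15% are the preferred values stated above.

```python
from collections import Counter


def unbalanced_positions(templates, window=30, min_fraction=0.15):
    """Return 0-based positions, within the terminal windows, where any of
    A/C/G/T falls below min_fraction across the template pool.

    Assumption: balance is judged per position across all templates.
    """
    if not templates:
        raise ValueError("at least one template is required")
    length = len(templates[0])
    if any(len(t) != length for t in templates):
        raise ValueError("all templates must have the same length")
    if not 0 < window <= length:
        raise ValueError("window must be between 1 and the template length")
    # Positions covered by the 5' window and the 3' window.
    positions = sorted(set(range(window)) | set(range(length - window, length)))
    flagged = []
    for pos in positions:
        column = Counter(t[pos].upper() for t in templates)
        if any(column[base] / len(templates) < min_fraction for base in "ACGT"):
            flagged.append(pos)
    return flagged


# Toy 4-nt "templates", for illustration only.
balanced = ["ACGT", "CGTA", "GTAC", "TACG"]
skewed = ["AAAA", "AAAA", "AAAA", "AAAA"]
print(unbalanced_positions(balanced, window=2))
print(unbalanced_positions(skewed, window=1))
```

An empty result means every checked position meets the threshold. The 3' window is checked on the given strand; because complementing a sequence only swaps A with T and C with G, the minimum base fraction at each position is the same on the opposite strand.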
In a preferred embodiment, the number of template sequences is N times the number of tag sequences to be tested, where N is an integer greater than or equal to 1; the template sequences are divided into as many subgroups as there are tag sequences to be tested, and each subgroup contains N template sequences.
In a preferred embodiment, the tag sequences to be tested are tag adapters and/or tag primers.
In a preferred embodiment, the tag sequences to be tested are single-tag adapters.
In a preferred embodiment, the matched library construction comprises: ligating the template sequences to the single-tag adapters in a one-to-one or many-to-one correspondence, and then performing PCR amplification with universal primers to obtain single-tag libraries ready for sequencing.
In a preferred embodiment, the tag sequences to be tested are single-tag primers.
In a preferred embodiment, the matched library construction comprises: ligating the template sequences to a universal adapter to obtain ligation products, and then performing PCR amplification with the single-tag primers in a one-to-one or many-to-one correspondence to obtain single-tag libraries ready for sequencing.
In a preferred embodiment, the tag sequences to be tested are dual-tag sequences, each consisting of a tag adapter and a tag primer.
In a preferred embodiment, the matched library construction comprises: ligating the template sequences to the tag adapters in a one-to-one or many-to-one correspondence, and then performing PCR amplification with the tag primers in a one-to-one or many-to-one correspondence to obtain dual-tag libraries ready for sequencing.
In a preferred embodiment, the tag sequences to be tested are dual-tag primers, each consisting of two tag primers.
In a preferred embodiment, the matched library construction comprises: ligating the template sequences to a universal adapter to obtain ligation products, and then performing PCR amplification with the dual-tag primers in a one-to-one or many-to-one correspondence to obtain dual-tag libraries ready for sequencing.
In a preferred embodiment, the tag sequences to be tested are dual-tag adapters, each consisting of two tag adapters.
In a preferred embodiment, the matched library construction comprises: ligating the template sequences to the dual-tag adapters in a one-to-one or many-to-one correspondence to obtain ligation products, and then performing PCR amplification with universal primers to obtain dual-tag libraries ready for sequencing.
In a preferred embodiment, the sequencing is paired-end sequencing.
In a preferred embodiment, the sequencing is PE30+10 sequencing.
In a preferred embodiment, the tag sequences to be tested are all of the tag sequences synthesized in the same batch.
In a preferred embodiment, the method further comprises obtaining a sequencing quality assessment of the second strand (read 2) from the sequencing reads; and
optionally, using the assessment to indirectly judge the 5'-end base synthesis quality of the adapters associated with second-strand sequencing in the tag libraries or in a pooled library of the tag libraries.
In a preferred embodiment, the template sequences are obtained by amplifying the human genome with the 96 primer pairs shown in SEQ ID NO: 1 to 192.
The invention detects tag sequences by matched library construction between a set of template sequences and the tag sequences to be tested. Once the template sequences have been prepared, they can be re-amplified repeatedly from earlier amplification products, which saves template preparation costs; and because the template sequences differ from one another in sequence, there is no concern that sequence errors introduced by multiple rounds of PCR amplification will make different templates indistinguishable.
In addition, the preferred technical solution indirectly assesses the 5'-end base synthesis quality of the tag adapter, or of the adapter associated with second-strand sequencing, by measuring second-strand sequencing quality. The tag contamination rate and the synthesis quality of the oligonucleotides associated with the sequencing primers can thus be assessed simultaneously in a single experiment, saving quality-control labor and cost.
Brief description of the drawings
Figure 1 is a flow chart of template DNA preparation in the prior art;
Figure 2 is a schematic diagram of the library construction workflow in the prior-art quality control method for single-tag adapters;
Figure 3 is a schematic diagram of the sequencing principle in the prior-art quality control method for single-tag adapters;
Figure 4 is a schematic diagram of the contamination rate calculation in the prior-art quality control method for single-tag adapters;
Figure 5 is a schematic diagram of the template DNA preparation method in an embodiment of the invention;
Figure 6 is a schematic diagram of the library construction workflow for quality inspection of single-tag adapters in an embodiment of the invention;
Figure 7 is a schematic diagram of the library construction workflow for quality inspection of single-tag primers in an embodiment of the invention;
Figure 8 is a schematic diagram of the sequencing principle for quality inspection of single-tag sequences in an embodiment of the invention;
Figure 9 shows the contamination rates and the second-strand sequencing quality assessment (ESR) obtained after sequencing the libraries for quality inspection of single-tag sequences in an embodiment of the invention;
Figure 10 is a schematic diagram of the library construction workflow for quality inspection of tag adapter + tag primer in an embodiment of the invention;
Figure 11 is a schematic diagram of the library construction workflow for quality inspection of dual-tag primers in an embodiment of the invention;
Figure 12 is a schematic diagram of the library construction workflow for quality inspection of dual-tag adapters in an embodiment of the invention;
Figure 13 is a schematic diagram of the library sequencing principle for quality inspection of dual-tag sequences in an embodiment of the invention;
Figure 14 shows the contamination rates and the second-strand sequencing quality assessment (ESR) obtained after sequencing the libraries for quality inspection of dual-tag sequences in an embodiment of the invention;
Figure 15 shows the adapter sequence statistics obtained after SE200 sequencing of libraries with a 160 bp main band, one without and one with an increase in second-strand ESR, in an embodiment of the invention;
Figure 16 shows the ESR statistics of single-tag pooled libraries constructed with tag adapters from different manufacturers and different batches in an embodiment of the invention;
Figure 17 shows the identification of amplification products by agarose gel electrophoresis in an embodiment of the invention;
Figure 18 is an electrophoresis image of the re-purified amplification products of the 180 bp specific products in an embodiment of the invention;
Figure 19 shows agarose gel electrophoresis of the PCR products in an embodiment of the invention;
Figure 20 shows the TBU-PAGE gel electrophoresis results of the circularization products in an embodiment of the invention;
Figure 21 shows the ESR results of the pooled library in an embodiment of the invention;
Figure 22 shows the demultiplexed ESR results of the libraries constructed with all 8 tag adapters under test in an embodiment of the invention.
Detailed description
The invention is described in further detail below through specific embodiments with reference to the drawings. Similar elements in different embodiments bear associated, similar reference numerals. In the following embodiments, many details are described so that the invention can be better understood. However, those skilled in the art will readily recognize that some of these features may be omitted in particular cases or replaced by other materials or methods.
In addition, the features, operations or characteristics described in the specification may be combined in any suitable manner to form various embodiments. The steps or actions in the method descriptions may also be reordered or adjusted in ways apparent to those skilled in the art. The various orders given in the specification and drawings therefore serve only to describe a particular embodiment clearly and do not imply a required order, unless it is stated that a particular order must be followed.
The ordinal labels assigned to components herein, such as "first" and "second", serve only to distinguish the objects described and carry no sequential or technical meaning.
Definitions
Tag sequence oligonucleotide (barcode oligo), also referred to herein as a "tag sequence" (barcode) or "oligonucleotide" (oligo): a nucleotide sequence used in library construction (for example, sequencing library construction) that serves to distinguish different sample sources and/or molecular sources, including tag adapters and tag primers. These tag sequences are obtained by artificial synthesis.
Tag adapter: an adapter containing a tag, that is, an adapter sequence used in library construction (for example, sequencing library construction) that serves to distinguish different sample sources and/or molecular sources. Tag adapters include single-tag adapters and dual-tag adapters; a dual-tag adapter consists of two individual tag adapters.
Tag primer: a primer containing a tag, that is, a primer sequence used in library construction (for example, sequencing library construction) that serves to distinguish different sample sources and/or molecular sources. Tag primers include single-tag primers and dual-tag primers; a dual-tag primer consists of two individual tag primers.
Tag library: a library, in particular a sequencing library, that contains the tag sequences of the invention and is obtained by a library construction method. Tag libraries include single-tag libraries and dual-tag libraries. A single-tag library contains a single-tag adapter or a single-tag primer; a dual-tag library contains a dual-tag adapter or a dual-tag primer.
Template sequence: a sequence used in the invention for matched library construction with a tag sequence; different template sequences differ from one another in sequence. In a preferred embodiment, the template sequences are different gene sequences amplified from genomic DNA. Because such template sequences differ from one another throughout their sequence, there is no concern that sequence errors introduced by multiple rounds of PCR amplification will make different templates indistinguishable.
Sequencing: a method of determining a nucleic acid sequence, and in the invention in particular a method of determining the sequence of a tag library, including single-end sequencing and paired-end sequencing. The invention preferably uses paired-end sequencing, in particular PE30+10 sequencing, that is, a sequencing strategy comprising a 30 bp read from each end and a 10 bp tag read.
The invention makes it possible to perform, in a single experiment, both quality control of the synthesis contamination rate of tag sequence oligonucleotides (barcode oligos) and indirect quality control of the oligonucleotide base synthesis quality that affects sequencing quality, providing a new method for quality inspection of tag sequence oligonucleotides. The detection method of the invention is applicable to both single-tag and dual-tag sequence oligonucleotides.
As shown in Figure 5, in one embodiment of the invention, N (N = 4X, X ≥ 1) gene sequences (for example, 96) of a fixed number of bases are amplified from the genomic DNA of human or any other species. The fragment size may be anywhere between 50 and 1000 bp, preferably a size at which the sequencer gives relatively good sequencing quality, preferably 180 bp. Within the stretch of bases at the 5' and 3' ends that is equal in length to the sequencing read length (for example, 20 bp to 200 bp from each end, preferably the terminal 30 bp), the proportion of each of the bases A, T, C and G must be no lower than a set value, which ensures that the A/T/C/G base signals are balanced. The set value may be, for example, a percentage in the range of 10% to 30%, such as 15%. This ensures that the inserts are base-balanced when the tag sequences are sequenced (for example, by PE30+10), so that sequencing quality is not affected by base imbalance in the templates. The PCR products purified by gel excision are used as seed PCR templates and re-amplified (PCR on PCR) to obtain a large quantity of test template DNA.
When N tag sequence oligonucleotides are to be tested, the test template DNAs are divided evenly into N groups, the template DNAs within each group are mixed, and each mixture is used for matched library construction with one of the tag sequence oligonucleotides to be tested. For example, when 8 tag adapters are to be quality-inspected with the 96 prepared template DNAs, template DNAs 1-96 are divided into 8 groups: template DNAs 1-12 are mixed in equal mass proportions and used for matched library construction with tag sequence 1 (Barcode1), template DNAs 13-24 are mixed in equal mass proportions and used with tag sequence 2 (Barcode2), and so on up to template DNAs 85-96, which are mixed in equal mass proportions and used with tag sequence 8 (Barcode8). PE30+10 sequencing is used to count the reads whose tag sequence matches the template DNA, from which the contamination rate of other, non-matching tag sequences is calculated. At the same time, the second-strand sequencing quality, for example the increase in second-strand ESR (Effective Spot Rate) in DNB sequencing, is used to indirectly assess the base synthesis accuracy of the tag sequence oligonucleotide sequence that hybridizes with the second-strand primer. As another example, when 96 tag adapters are to be quality-inspected with the 96 prepared template DNA fragments, matched library construction is carried out with one DNA template per tag adapter, and PE30+10 sequencing is used to determine the tag sequence contamination rate and to indirectly assess the synthesis accuracy of the tag sequence oligonucleotides. The tag sequence oligonucleotides may be oligonucleotides for single-tag or for dual-tag library construction. Note that when there are fewer tag sequences to be tested than test template DNAs (for example, 96), it is in theory possible to use only some of the test template DNAs; however, using all of the test template DNAs (for example, all 96) helps ensure that the template sequences are base-balanced at every sequencing position under the chosen sequencing strategy (for example, PE30).
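The grouping rule in the 8-adapter example (templates 1-12 with Barcode1, 13-24 with Barcode2, and so on) amounts to splitting the ordered template list into equal consecutive blocks. The following is a minimal sketch of that bookkeeping; the function name `assign_templates` is hypothetical, and the requirement that the template count divide evenly by the tag count is an assumption taken from the "divided evenly" wording.

```python
def assign_templates(template_ids, tag_ids):
    """Split templates into equal consecutive groups, one group per tag."""
    if not tag_ids:
        raise ValueError("at least one tag is required")
    if len(template_ids) % len(tag_ids) != 0:
        raise ValueError("templates must divide evenly among the tags")
    size = len(template_ids) // len(tag_ids)
    return {
        tag: list(template_ids[i * size:(i + 1) * size])
        for i, tag in enumerate(tag_ids)
    }


# 96 templates and 8 tag adapters, as in the example in the text.
templates = list(range(1, 97))
tags = [f"Barcode{i}" for i in range(1, 9)]
groups = assign_templates(templates, tags)
print(groups["Barcode1"])
print(groups["Barcode8"])
```

With 96 tags the same function gives the one-to-one case, each tag receiving a single template.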
When the tag sequence oligonucleotides are for single-tag library construction, that is, single-tag adapters or single-tag primers, the quality inspection method is as shown in Figures 6-9. As shown in Figure 6, when N (N = 4X, X ≥ 1) single-tag adapters (tag 1 adapter to tag N adapter) are to be quality-inspected with N prepared template DNA fragments (template DNA A to template DNA N, or gene A fragment to gene N fragment), the DNAs are ligated to the tag adapters in a one-to-one correspondence, and PCR amplification with universal primers then yields single-tag libraries ready for sequencing. Alternatively, as shown in Figure 7, when N (N = 4X, X ≥ 1) single-tag primers (tag 1 primer to tag N primer) are to be quality-inspected with N prepared template DNA fragments (template DNA A to template DNA N, or gene A fragment to gene N fragment), the DNA templates are first ligated to a universal adapter, and the resulting DNA ligation products are PCR-amplified with the tag primers in a one-to-one correspondence to obtain single-tag libraries ready for sequencing.
As shown in Figure 8, the libraries are pooled in equal proportions and sequenced by PE30+10 to obtain, for each template DNA (gene A fragment to gene N fragment), the number of reads assigned to each tag (tag 1 to tag N), such as the values 4995, 4, X, 1, 0, 4998, Y, 2, 2, 8, Z and 4990 in Figure 9. From these read counts, the rate at which each tag adapter/primer is contaminated by other tag adapters/primers can be calculated with the tag contamination rate formula. In Figure 9, for example, the contamination rate of the tag 1 adapter is (4+X+1)/(4+X+1+4995)*100%, that of the tag 2 adapter is (0+Y+2)/(0+Y+2+4998)*100%, and that of the tag N adapter is (2+8+Z)/(2+8+Z+4990)*100%. In addition, the run report provides the sequencing quality assessment of the second strand (read 2) (the ESR results in Figure 9), which is used to indirectly judge the 5'-end base synthesis quality of the adapters associated with second-strand sequencing in the pooled single-tag library. Demultiplexing the output data further yields a sequencing quality assessment for each individual tag (not shown in the figure).
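The contamination rate formula used above divides the reads carrying any other tag by all reads from that library. A minimal Python sketch follows; the function name `contamination_rates` and the dictionary layout are assumptions made for illustration. The example reuses the numeric counts quoted from Figure 9 but leaves out the unspecified X, Y and Z entries, so the resulting rates are illustrative only and are not results from the patent.

```python
def contamination_rates(counts):
    """Compute the contamination rate of each tag.

    counts[expected_tag][observed_tag] is the number of reads from the
    library built with expected_tag that carry observed_tag.
    Returns a dict mapping each expected tag to a fraction between 0 and 1.
    """
    rates = {}
    for expected_tag, observed in counts.items():
        total = sum(observed.values())
        if total == 0:
            raise ValueError(f"no reads for library {expected_tag}")
        matched = observed.get(expected_tag, 0)
        # Reads carrying any other tag count as contamination.
        rates[expected_tag] = (total - matched) / total
    return rates


# Counts from Figure 9 with the X, Y, Z column omitted (illustrative only).
counts = {
    "tag1": {"tag1": 4995, "tag2": 4, "tagN": 1},
    "tag2": {"tag1": 0, "tag2": 4998, "tagN": 2},
    "tagN": {"tag1": 2, "tag2": 8, "tagN": 4990},
}
for tag, rate in contamination_rates(counts).items():
    print(f"{tag}: {rate * 100:.2f}%")
```

Keying the counts by tag name rather than by matrix index keeps the expected tag explicit, so a missing or reordered column cannot silently shift which count is treated as the matching one.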
When the tag-sequence oligonucleotides are for dual-tag library construction, that is, tag adapter + tag primer, dual-tag primers or dual-tag adapters, they are quality-checked as shown in Figures 10-14. As shown in Figure 10, when N (N = 4X, X ≥ 1) tag adapters (e.g., tag 1 to tag N adapters) and N tag primers (e.g., tag 1 to tag N primers) are to be quality-checked with N prepared template DNA fragments (e.g., template DNA A to template DNA N, or gene A fragment to gene N fragment), each template DNA is ligated to its own tag adapter in one-to-one correspondence and then PCR-amplified with its own tag primer, again in one-to-one correspondence, to obtain dual-tag libraries ready for sequencing. As shown in Figure 11, when N (N = 4X, X ≥ 1) tag primers F (e.g., tag 1 to tag N primers F) and N tag primers R (e.g., tag 1 to tag N primers R) are to be quality-checked with N prepared template DNA fragments (template DNA A to template DNA N, or gene A fragment to gene N fragment, in the figure), each template DNA is first ligated to a universal adapter, and each resulting ligation product is then PCR-amplified with its own pair of tag primers in one-to-one correspondence to obtain dual-tag libraries ready for sequencing. As shown in Figure 12, when N (N = 4X, X ≥ 1) tag adapter top strands and N tag adapter bottom strands (e.g., tag 1 + tag 1 to tag N + tag N adapters) are to be quality-checked with N prepared template DNA fragments (template DNA A to template DNA N, or gene A fragment to gene N fragment, in the figure), each template DNA is ligated to its own dual-tag adapter in one-to-one correspondence, and the ligation products are then PCR-amplified with universal primers to obtain dual-tag libraries ready for sequencing.
As shown in Figure 13, the libraries are pooled in equal proportions and sequenced by PE30+10. For each template DNA (gene A fragment to gene N fragment in the figure), two sets of read counts are obtained: the reads assigned to each tag adapter, tag primer F or tag adapter top strand (e.g., tag 1 to tag N; values 4995, 4, X, 1, 0, 4998, Y, 2, 2, 8, Z, 4990 in Figure 14), and the reads assigned to each tag primer, tag primer R or tag adapter bottom strand (values 4990, 10, X, 0, 3, 4995, Y, 2, 3, 12, Z, 4985 in Figure 14). From these read counts, the tag contamination rate formula gives, for each tag adapter, tag primer F or tag adapter top strand, the rate of contamination by the other tag adapters, tag primers F or tag adapter top strands. In the first table of Figure 14, the contamination rate of the tag 1 adapter, tag 1 primer F or tag 1 adapter top strand is (4+X+1)/(4+X+1+4995)*100%, that of tag 2 is (0+Y+2)/(0+Y+2+4998)*100%, and that of tag N is (2+8+Z)/(2+8+Z+4990)*100%. In the second table of Figure 14, the contamination rate of the tag 1 primer, tag 1 primer R or tag 1 adapter bottom strand is (10+X+0)/(10+X+0+4990)*100%, that of the tag 2 primer, tag 2 primer R or tag 2 adapter bottom strand is (3+Y+2)/(3+Y+2+4995)*100%, and that of the tag N primer, tag N primer R or tag N adapter bottom strand is (3+12+Z)/(3+12+Z+4985)*100%. In addition, the sequencing run report provides a quality evaluation of the second strand (read 2) (the ESR result in the figure), from which the synthesis quality of the 5'-end bases of the adapter involved in second-strand sequencing can be judged indirectly for the pooled dual-tag library. Demultiplexing the run data further yields a sequencing quality evaluation for each individual tag (not shown in the figures).
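The worked values above all follow one pattern, which can be written compactly. The notation here is an assumption introduced for illustration and does not appear in the original: $n_{ij}$ is the read count in row $i$, column $j$ of the first table of Figure 14, $m_{ij}$ is the corresponding count in the second table, and the diagonal entry is the expected tag.

```latex
C^{(1)}_i = \frac{\sum_{j \neq i} n_{ij}}{\sum_{j} n_{ij}} \times 100\%,
\qquad
C^{(2)}_i = \frac{\sum_{j \neq i} m_{ij}}{\sum_{j} m_{ij}} \times 100\%
```

For row 1 of the first table this gives (4+X+1)/(4+X+1+4995)*100%, and for row 1 of the second table (10+X+0)/(10+X+0+4990)*100%, matching the expressions in the text.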
The features of the present invention include: (1) The template DNAs are designed according to the principle of base balance, so that sequencing quality is not degraded by base imbalance in the templates. (2) Once the template DNAs have been prepared successfully, they can be re-amplified repeatedly from amplified product (PCR on PCR), which saves template preparation cost. Because the templates differ from one another at every position, sequence errors introduced by repeated PCR amplification will not make the templates indistinguishable, and the small number of errors that PCR introduces can be handled with an appropriate mismatch tolerance. In the prior art, by contrast, the templates differ only in a 10 bp sequence, so preparing templates by PCR on PCR is not advisable. (3) If X template DNAs have been prepared, any number of tag oligonucleotides (tag oligos) between 4 and X can be tested, so the experimental design is not constrained by the number of tag oligos to be tested. (4) The invention provides a novel method of indirectly assessing the synthesis quality of the 5'-end bases of a tag adapter, or of an adapter involved in second-strand sequencing, by examining second-strand sequencing quality. (5) A single quality control system tests both the tag contamination rate and the synthesis quality of the oligonucleotides associated with the sequencing primer, saving quality control labor and cost. (6) The method is flexible and convenient, and can be used for quality testing of tag oligonucleotides for all types of tagged library construction.
In the present invention, the method of indirectly assessing the synthesis quality of the oligonucleotide that hybridizes with the second-strand sequencing primer by examining second-strand sequencing quality was arrived at through a series of tests. First, the adapter regions of libraries whose second-strand ESR did not rise (poor quality) and of libraries whose second-strand ESR rose (good quality) were sequenced through. The proportion of adapter sequences with entirely correct bases was lower in the libraries whose second-strand ESR did not rise, which suggested that adapter synthesis quality or library construction quality affects the correctness of the adapter sequence and hence second-strand sequencing quality. Figure 15 shows, for one embodiment of the invention, adapter sequence statistics obtained by SE200 sequencing of libraries with a 160 bp main band, one whose second-strand ESR did not rise and one whose second-strand ESR rose. In the figure, adapterSeq is the adapter sequence obtained by SE200 sequencing of the library with a 160 bp main band, number is the read count for each sequence, and percent% is each sequence's share of all sequences. The first row is the correct adapter sequence; the other rows are adapter sequences containing erroneous bases (boxed in the figure). The share of the correct adapter sequence is lower in the library whose second-strand ESR did not rise than in the library whose second-strand ESR rose.
Further comparative tests showed that the main reason the second-strand ESR fails to rise (poor quality) is adapter synthesis quality. In Figure 16, the upper panel compares the ESR results of pooled libraries of single tags 97 to 104 from two vendors, A and B: the second-strand ESR of vendor A's pooled single-tag 97-104 adapter library rises clearly (the vendor A 97~104 curve in the figure), whereas that of vendor B's pooled library does not (the vendor B 97~104 curve in the figure). The lower panel compares the ESR results of pooled single-tag 49-56 adapter libraries built with adapters purchased from vendor B in different batches (first, second and third): the second-strand ESR rises clearly for one batch and does not rise clearly for the other two. It was therefore established that whether the second-strand ESR rises can be used to judge the synthesis quality of the 5'-end bases of the adapter sequence that hybridizes with the second-strand sequencing primer.
The technical solutions of the present invention are described in detail below by way of examples. It should be understood that the examples are merely illustrative and are not to be construed as limiting the scope of protection of the invention.
Example 1: Testing the tag contamination rate and adapter synthesis quality of 8 single-tag adapters for the DNA nanoball (DNB) sequencing platform
1. Preparation of 96 180 bp template DNAs
1.1 Design 96 primer sets that specifically amplify 180 bp fragments of the human genome, covering the 23 pairs of chromosomes and taken from 96 different genes. The primers are shown in SEQ ID NO: 1-192, where every two consecutive sequences form the primer pair for one gene: SEQ ID NO: 1-2 amplify gene 1, SEQ ID NO: 3-4 amplify gene 2, and so on up to SEQ ID NO: 191-192, which amplify gene 96. The amplification product sequences are shown in SEQ ID NO: 193-288, one sequence per gene: SEQ ID NO: 193-240 are the products of genes 1-48, and SEQ ID NO: 241-288 are the products of genes 49-96.
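The numbering scheme in step 1.1 can be sketched as a small lookup. This is an illustration only: the function name `seq_ids_for_gene` is hypothetical, and the product numbering assumes that the products are listed in gene order, which the ranges given for genes 1-48 and 49-96 suggest.

```python
def seq_ids_for_gene(gene_number, n_genes=96):
    """Map a gene number (1..n_genes) to its SEQ ID NOs.

    Primer pairs occupy SEQ ID NO 1..2*n_genes, two per gene;
    amplification products follow, one per gene (assumed to be in gene order).
    """
    if not 1 <= gene_number <= n_genes:
        raise ValueError(f"gene_number must be between 1 and {n_genes}")
    first_primer = 2 * gene_number - 1
    return {
        "primers": (first_primer, first_primer + 1),
        "product": 2 * n_genes + gene_number,
    }
```

For example, `seq_ids_for_gene(2)` gives primers SEQ ID NO: 3-4 and product SEQ ID NO: 194.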
1.2 Using human NA12878 DNA as the template, amplify the 96 specific 180 bp products. The amplification system and program are shown in Tables 1 and 2 below.
Table 1 Amplification system
Figure PCTCN2018120820-appb-000001
Table 2 Amplification program
Figure PCTCN2018120820-appb-000002
Figure PCTCN2018120820-appb-000003
1.3 Check the size and specificity of the amplification products by agarose gel electrophoresis:
1.3.1 Prepare a 2.5% TAE agarose gel: the gel size depends on the number of samples to be checked. Typically, add 2.5 g of agarose per 100 mL of 1× TAE buffer and heat to boiling until the agarose powder is completely dissolved. Cool in a warm water bath for 2 min, add 2 μL of GelStain (TransGen), mix gently, pour into the gel tray and insert a wide-tooth comb. Leave at room temperature for 20-30 min until the gel sets; it is then ready for use.
1.3.2 Transfer 10 μL of amplified DNA to a sample-mixing plate, add 3 μL of 6× loading buffer, mix by pipetting, and load the entire volume into a gel well.
1.3.3 Electrophoresis conditions: 150 V, 30 min.
1.3.4 After electrophoresis, photograph the gel and save the image. Representative agarose gel results for the amplification products are shown in Figure 17.
1.4 Gel excision, purification and recovery of the specific 180 bp products
1.4.1 Wash the gel tray, tray holder, comb and electrophoresis tank used for the recovery gel with tap water, rinse 2-3 times with pure water, and air-dry before use.
1.4.2 Prepare a 2.5% TAE agarose gel: the gel size depends on the number of samples to be checked. Typically, add 2.5 g of agarose (BIO-RAD Megabase Agarose) per 100 mL of 1× TAE buffer and heat to boiling until the agarose powder is completely dissolved and no solid remains in the solution. Cool in a warm water bath for 2 min and, without adding any dye, pour into the gel tray and insert a wide-tooth comb. Leave at room temperature for 20-30 min until the gel sets; it is then ready for use. Make sure there are no air bubbles in the agarose solution.
1.4.3 Transfer 25-30 μL of the DNA product amplified in the previous step to a sample-mixing plate, add 6 μL of 6× bromophenol yellow, mix by pipetting, and load the entire volume into a gel well.
1.4.4 Load 2 μL of 50 bp ladder (Tiangen and NEB) into the two outermost wells of the gel, away from the wells holding the samples to be recovered, to avoid cross-contamination.
1.4.5 Electrophoresis conditions: 100 V, 2-2.5 h; stop once the bromophenol yellow reaches the bottom of the gel.
1.4.6 After electrophoresis, stain the agarose gel in TAE containing EB dye for 30 min.
1.4.7 Prepare the gel-cutting supplies: UV protective goggles, head cover, shoe covers, gloves, gel-cutting blades, EP tubes for the excised gel, and an EP tube rack.
1.4.8 Lay down a clean sheet of plastic wrap or a PE glove, place the stained gel on the Dark Reader, and excise the target band with a blade at the position corresponding to 180 bp. Cut on all four sides, using a fresh blade for each side. The target band is 180 bp and lies between the 150 bp and 200 bp markers.
1.4.9 Place the excised gel slice in a clean EP tube, weigh it to determine the weight of the gel slice, and label the tube.
1.4.10 Purify and recover the DNA with the QIAquick Gel Extraction kit.
1.4.11 Quantify 1 μL of the purified product with Qubit HS and record the concentration of each sample.
1.5 Re-amplification and purification of the 180 bp products
1.5.1 Re-amplify and purify 5 ng of the gel-recovered product. The reaction system and conditions are shown in Tables 3 and 4 below:
Table 3 Amplification system
Figure PCTCN2018120820-appb-000004
Table 4 Amplification program
Figure PCTCN2018120820-appb-000005
1.5.2 Run 5 μL of the amplification product on an agarose gel. Representative results are shown in Figure 18.
1.5.3 Take the AmpureXP magnetic beads out of the refrigerator at least 30 minutes in advance, equilibrate to room temperature, and mix thoroughly before use.
1.5.4 Prepare 75% ethanol; it can be stored for 3 days.
1.5.5 Purify on a KingFisher, adding 90 μL of AmpureXP magnetic beads to each sample tube.
1.5.6 After purification, resuspend the beads in 30 μL of TE; proceed to the next reaction or store at -20°C.
1.5.7 Quantify 1 μL of the purified product with Qubit HS and record the concentration of each sample. Because the primer sets differ in amplification efficiency, if the yield of a product is insufficient for library construction, repeat step 3 to amplify enough product. Each product should have a concentration of no less than 1.67 ng/μL so that it can be used multiple times. Of the 96 products, those with relatively low yields were #3, 6, 11, 13, 14, 15, 88 and 97; for these, the number of amplification cycles can be increased to 22.
2. Tag adapter preparation and adapter annealing
2.1 Oligonucleotide synthesis
Synthesize the oligonucleotide sequences shown in Table 5 below, avoiding cross-contamination between oligonucleotides during synthesis.
Table 5
Figure PCTCN2018120820-appb-000006
2.2 Oligonucleotide dissolution
After synthesis, dissolve the oligonucleotides as follows:
2.2.1 Switch on the clean bench and its UV lamp to sterilize the bench, pipettes, tips, TE and other items to be used.
2.2.2 Check: verify that the base sequence on the oligonucleotide synthesis sheet and the name on the tube label match the order.
2.2.3 Centrifuge: 4°C, 12000 rpm, 10 min, so that the oligonucleotide powder collects at the bottom of the tube. Handle the tubes gently after centrifugation to keep the powder from dispersing.
2.2.4 Switch off the UV lamp of the clean bench, ventilate adequately, and dissolve the oligonucleotides on the clean bench:
a) Switch off the fan and open the tube cap carefully so that no powder escapes. Open one tube of oligonucleotide and, according to the nmol amount stated on the tube wall, add (nmol × 10) μL of TE solution to dissolve the powder, giving a stock solution with a final concentration of 100 μM (i.e., 100 pmol/μL). Close the cap tightly.
b) Switch the fan on, ventilate for about 10 seconds, and then dissolve the next tube. Ventilate for about 10 seconds before dissolving each subsequent tube.
c) After dissolving, vortex thoroughly, centrifuge briefly, and leave at room temperature for 1 hour.
d) Clearly mark the stock concentration, 100 μM, on the tube cap.
e) Store the stock solution at -20°C.
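The "nmol × 10" rule in step a) follows from a unit conversion (1 μM = 1 pmol/μL). A minimal sketch, with the hypothetical helper name `te_volume_ul` and the 100 μM target taken from the text:

```python
def te_volume_ul(amount_nmol, target_um=100.0):
    """Volume of TE (in microliters) needed to dissolve `amount_nmol` of oligo
    at `target_um` micromolar.

    1 uM = 1 pmol/uL, so volume_uL = amount_pmol / concentration_uM.
    """
    if amount_nmol <= 0 or target_um <= 0:
        raise ValueError("amount and target concentration must be positive")
    amount_pmol = amount_nmol * 1000.0
    return amount_pmol / target_um
```

A tube labelled 25 nmol would thus receive 250 μL of TE, which is the same as 25 × 10.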
2.3 Annealing
2.3.1 Switch on the clean bench and its UV lamp to sterilize the bench, pipettes, tips, TE and other items to be used.
2.3.2 Take the oligonucleotide stock solutions out of -20°C, thaw completely, vortex, and spin down briefly. Switch off the UV lamp of the clean bench and ventilate. On the clean bench, mix equal volumes of the top-strand and bottom-strand stock solutions with 2× adapter buffer (Table 7) according to Table 6 below, leave at room temperature for 1 hour, and clearly mark the concentration, 25 μM, on the tube cap.
Table 6
Component    Amount
Top strand (100 μM)    25 μL
Bottom strand (100 μM)    25 μL
Adapter buffer (2×)    50 μL
Total    100 μL
Table 7 2× adapter buffer formulation
Reagent    Volume
5 M NaCl    4 μL
1 M Tris-HCl    4 μL
2 mM EDTA    20 μL
Water    172 μL
Total    200 μL
2.3.3 Dilute to a 10 μM working solution: add 150 μL of TE buffer to 100 μL of the 25 μM adapter solution, vortex thoroughly, and spin down briefly.
2.3.4 Store the working solution at -20°C.
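The two concentrations quoted in section 2.3 (25 μM after annealing, 10 μM working solution) can be checked with the usual dilution relation C1·V1 = C2·V2. The helper name `diluted_conc_um` below is hypothetical; the volumes are those given in Table 6 and step 2.3.3.

```python
def diluted_conc_um(stock_conc_um, stock_volume_ul, total_volume_ul):
    """Concentration after diluting `stock_volume_ul` of stock to `total_volume_ul`."""
    if total_volume_ul < stock_volume_ul or total_volume_ul <= 0:
        raise ValueError("total volume must be positive and at least the stock volume")
    return stock_conc_um * stock_volume_ul / total_volume_ul


# Annealing mix: 25 uL of each 100 uM strand in a 100 uL total volume.
annealed_um = diluted_conc_um(100.0, 25.0, 100.0)

# Working solution: 100 uL of annealed adapter plus 150 uL of TE (250 uL total).
working_um = diluted_conc_um(annealed_um, 100.0, 100.0 + 150.0)
```

Each strand is present at 25 μM in the annealing mix, so the annealed adapter is labelled 25 μM, and the subsequent 2.5-fold dilution gives 10 μM.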
3. Library construction for testing tag adapter quality and contamination rate
3.1 To test the 8 tag adapters 501-508, number the 96 specific 180 bp products 1-96 and divide them evenly into 8 groups (1-12, 13-24, 25-36, 37-48, 49-60, 61-72, 73-84 and 85-96). Within each group, pool the products in equal mass, then take 50 ng of each pool and build a library with tags 501-508 in order, one tag per group.
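The grouping in step 3.1 can be expressed as a simple mapping from tag to template numbers. A sketch, assuming consecutive numbering and equal group sizes as described; the function name `assign_templates_to_tags` is hypothetical.

```python
def assign_templates_to_tags(n_templates=96, tags=range(501, 509)):
    """Split templates 1..n_templates into equal consecutive groups, one per tag."""
    tags = list(tags)
    group_size, remainder = divmod(n_templates, len(tags))
    if remainder:
        raise ValueError("templates cannot be divided evenly among the tags")
    return {
        tag: list(range(k * group_size + 1, (k + 1) * group_size + 1))
        for k, tag in enumerate(tags)
    }


groups = assign_templates_to_tags()
```

Here `groups[501]` holds templates 1-12 and `groups[508]` holds templates 85-96. The same mapping is what the later contamination analysis uses to decide whether a read's template and tag match.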
3.2 A-tailing:
3.2.1 Take the reagents stored at -20°C out in advance and thaw them on ice. Vortex reagents such as buffers and ATP thoroughly and spin down at low speed.
3.2.2 Prepare the end-repair mix about 5 min in advance. If the number of reactions is small, dATP can be diluted to 10 mM and 1 μL used instead. Keep the prepared reaction mix at room temperature. When preparing multiple reactions, allow for 15% loss. The reaction system is shown in Table 8 below:
Table 8
Component    Amount
Water    23.9 μL
10× PNK buffer    4 μL
dATP (100 mM)    0.1 μL
PNK (10 U/μL)    1 μL
rTaq (5 U/μL)    1 μL
Total    30 μL
3.2.3 Add 30 μL of the prepared end-repair mix to 10 μL of the specific 180 bp product from the previous step (top up with TE if the volume is short), mix, and spin down.
3.2.4 Place the product in a thermal cycler under the conditions in Table 9 below:
Table 9
Temperature    Time
37°C    30 min
65°C    15 min
4°C    Hold
Heated lid: 105°C
3.2.5 The product can go directly into the next reaction or be stored at -20°C.
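Scaling the Table 8 mix for several reactions with the 15% allowance mentioned in step 3.2.2 can be sketched as follows. The helper name `scale_master_mix` is hypothetical, and treating "15% loss" as a multiplication by 1.15 is an assumption about how the allowance is applied.

```python
# Per-reaction volumes in microliters, copied from Table 8.
A_TAILING_MIX_UL = {
    "water": 23.9,
    "10x PNK buffer": 4.0,
    "dATP (100 mM)": 0.1,
    "PNK (10 U/uL)": 1.0,
    "rTaq (5 U/uL)": 1.0,
}


def scale_master_mix(per_reaction_ul, n_reactions, overage=0.15):
    """Scale per-reaction volumes to `n_reactions`, adding a fractional overage."""
    if n_reactions < 1:
        raise ValueError("n_reactions must be at least 1")
    factor = n_reactions * (1.0 + overage)
    return {name: round(vol * factor, 2) for name, vol in per_reaction_ul.items()}


mix_for_8 = scale_master_mix(A_TAILING_MIX_UL, 8)
```

For the 8 libraries of this example, the scaled dATP volume is just under 1 μL, which is easier to pipette than 0.1 μL per reaction and is one practical reason to prepare a master mix.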
3.3 Adapter ligation:
3.3.1 Top up the 40 μL product from the previous step to 50 μL with water, mix with the reaction system below, and spin down.
3.3.2 Prepare the reaction system as shown in Table 10 below:
Table 10
Component    Amount
A-tailed product    50 μL
Water    12.2 μL
10× PNK buffer    3 μL
ATP (100 mM)    0.8 μL
50% PEG 8000    12 μL
T4 DNA ligase (600 U/μL)    1 μL
Ad153 new tag (10 μM)    1 μL
Total    80 μL
3.3.3 Place the reaction in a thermal cycler under the conditions in Table 11 below:
Table 11
Temperature    Time
23°C    20 min
4°C    Hold
Heated lid: 105°C
3.3.4 Purification:
a) After the reaction, add 40 μL of AmpureXP magnetic beads to the 80 μL ligation product, vortex, and let stand at room temperature for 10 min;
b) Spin down briefly, place on a magnetic rack until the liquid clears, and carefully remove and discard the supernatant;
c) Add 150 μL of 75% ethanol to the tube, let stand for 30 s, and discard the supernatant; repeat once. Remove as much residual ethanol as possible with a small-volume pipette and air-dry at room temperature;
d) Resuspend the beads in 23 μL of TE, vortex, and incubate at room temperature for 5 min. Place on the magnetic rack until the liquid clears, then carefully transfer 23 μL of supernatant to a PCR tube; proceed to the next reaction or store at -20°C.
3.4 PCR
3.4.1 Prepare the reaction system as shown in Table 12 below:
Table 12
Component    Amount
Kapa 2× HotStart ReadyMix    25 μL
AD153-PCR2-1 (20 μM)    2 μL
AD153-PCR2-2 (20 μM)    2 μL
Total    29 μL
3.4.2 Add the 29 μL reaction system to 21 μL of the ligation product from the previous step and mix.
3.4.3 Place the reaction in a thermal cycler under the conditions in Table 13 below:
Table 13
Figure PCTCN2018120820-appb-000007
Figure PCTCN2018120820-appb-000008
3.4.4 After the reaction, purify the 50 μL PCR product with 50 μL of Ampure XP magnetic beads and elute in 32 μL of TE solution.
3.5 Post-PCR checks
3.5.1 Quantification: quantify 1 μL of the purified product with Qubit HS and record the concentration of each sample.
3.5.2 Agarose gel electrophoresis:
Quality control criterion: the main band lies between 250 and 300 bp. As shown in Figure 19, some template self-ligation products appear above the main band. Experiments confirmed that they do not affect the results, and they disappear after circularization and digestion.
3.6 Heat denaturation and single-strand separation
3.6.1 Based on the measured concentrations of the 96 PCR products, pool them in equal mass (1.7 ng each) to 160 ng, make up to 48 μL with TE solution, and add 5 μL of 10 μM splint oligo.
3.6.2 Place the sample in a thermal cycler with the program in Table 14 below:
Table 14
Temperature    Time
95°C    3 min
4°C    10 min
4°C    Hold
Heated lid: 105°C
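The equal-mass pooling in step 3.6.1 amounts to dividing the target mass by each product's measured concentration and topping up with TE. A minimal sketch; the function name `pooling_volumes` and the two example concentrations are hypothetical, while the 1.7 ng per product and 48 μL final volume come from the text.

```python
def pooling_volumes(conc_ng_per_ul, mass_each_ng=1.7, final_volume_ul=48.0):
    """Return (volume of each product in uL, volume of TE to add in uL).

    conc_ng_per_ul maps a product name to its measured concentration.
    """
    volumes = {}
    for name, conc in conc_ng_per_ul.items():
        if conc <= 0:
            raise ValueError(f"concentration for {name} must be positive")
        volumes[name] = mass_each_ng / conc
    used = sum(volumes.values())
    if used > final_volume_ul:
        raise ValueError("products alone exceed the final volume; concentrate them first")
    return volumes, final_volume_ul - used


# Hypothetical concentrations for two products, for illustration only.
example_volumes, example_te = pooling_volumes({"lib_a": 17.0, "lib_b": 8.5})
```

Dilute products drive the pooled volume up quickly, so the check against the 48 μL limit is the part most likely to matter in practice.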
3.7 Circularization
3.7.1 Prepare the reaction mix 5 minutes in advance, as shown in Table 15 below:
Table 15
Component    Amount
10× TA buffer    6 μL
ATP (100 mM)    0.6 μL
T4 DNA ligase (600 U/μL)    0.2 μL
Total    6.8 μL
3.7.2 Add 6.8 μL of the circularization reaction mix to the sample and mix.
3.7.3 Run in a thermal cycler with the program in Table 16 below:
Table 16
Temperature    Time
37°C    30 min
4°C    Hold
Heated lid: 105°C
3.8 Enzymatic digestion
3.8.1 Prepare the reaction mix 5 minutes in advance. First dilute Exo III 10-fold with storage buffer, then prepare the mix as shown in Table 17 below:
Table 17
Component    Amount
Water    1 μL
10× TA buffer    0.4 μL
Exo I (20 U/μL)    2 μL
Exo III (10 U/μL, diluted 10×)    0.7 μL
Total    4.1 μL
3.8.2 Run in a thermal cycler at 37°C for 30 min.
3.8.3 After the reaction, spin down, add 3 μL of 0.5 M EDTA to each sample, mix, and spin down. Stopped reactions can be stored at -20°C if needed.
3.8.4 Purification: purify with 168 μL of AmpureXP magnetic beads and elute in 25 μL of TE solution; proceed to the next reaction or store at -20°C.
3.8.5 Single-strand circle quantification (ssCircle Quant)
Quantify the single-stranded circles remaining after linear digestion with the Qubit ssDNA Assay Kit. Mix buffer and dye at a ratio of 199:1, vortex, and spin down to make the dye working solution. For each of the two standards, add 10 μL of standard to 190 μL of working solution, vortex, and spin down. For each sample, add 1 μL of sample to 199 μL of working solution, vortex, spin down, and read on the Qubit instrument.
Run a 6% TBU-PAGE gel to check that the linear DNA has been fully digested. A result like lanes 1-3 in Figure 20 indicates complete linear digestion. If linear DNA remains, as in lane 4 of Figure 20, repeating the circularization is recommended.
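The amount of dye working solution needed for the Qubit step above depends on the number of samples. A sketch of the arithmetic, using the 199:1 ratio and the 190 μL and 199 μL per-tube volumes from the text; the function name `qubit_working_solution` is hypothetical, and no pipetting overage is included.

```python
def qubit_working_solution(n_samples, n_standards=2):
    """Buffer and dye volumes (uL) for the Qubit dye working solution.

    Each standard tube takes 190 uL of working solution and each sample
    tube takes 199 uL; buffer and dye are mixed 199:1.
    """
    if n_samples < 1:
        raise ValueError("n_samples must be at least 1")
    total_ul = n_standards * 190.0 + n_samples * 199.0
    dye_ul = total_ul / 200.0
    return {"buffer_ul": total_ul - dye_ul, "dye_ul": dye_ul, "total_ul": total_ul}


working_for_8 = qubit_working_solution(8)
```

In practice a small excess is usually prepared, so these figures are a lower bound rather than a recipe.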
The reagents and materials used in the above examples are listed in Table 18 below:
Table 18
Figure PCTCN2018120820-appb-000009
Figure PCTCN2018120820-appb-000010
Figure PCTCN2018120820-appb-000011
4. Quality-control sequencing of tag adapter quality and contamination rate
The pooled libraries were sequenced on the BGISEQ-500 platform using the PE30+10 strategy.
5. Information analysis
The results were analyzed using the ESR (effective sequencing site ratio) and adapter contamination analysis pipelines.
6. Data results
6.1 ESR:
Figure 21 shows the ESR results for the pooled library. The overall 5'-end synthesis quality of this batch of tag adapters 501-508 is good, and the second strand shows an improvement.
Figure 22 shows the ESR results after demultiplexing, using data from 6 FOVs, for the libraries constructed with all 8 tag adapters under test.
6.2 Adapter contamination rate:
For each of the 8 groups of template DNA (1-12, 13-24, 25-36, 37-48, 49-60, 61-72, 73-84, 85-96), the number of reads detected for each of the tags 501, 502, 503, 504, 505, 506, 507, and 508 was counted. The tag matching rate and tag contamination rate were then calculated according to the template-to-tag correspondence; the results are shown in Table 19 below. The contamination rate of tag 504 is higher than 1%, so it is judged unqualified.
Table 19
Tag number    Tag matching rate    Tag contamination rate
501           99.97%               0.03%
502           99.95%               0.05%
503           99.96%               0.04%
504           98.70%               1.30%
505           99.96%               0.04%
506           99.96%               0.04%
507           99.86%               0.14%
508           99.97%               0.03%
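The calculation behind Table 19 can be sketched as follows. This is an illustration under stated assumptions, not the analysis pipeline used in the example: it assumes that reads have already been assigned to a template group and to a tag, and that the matching rate of a tag is the fraction of reads in its template group that carry the expected tag, with the contamination rate as the remainder. The function name `tag_rates` and the read counts are hypothetical; the counts are made up and are not data from this document. The 1% threshold follows the criterion stated above.

```python
def tag_rates(counts, expected_tag, threshold=0.01):
    """Compute per-tag matching and contamination rates.

    counts:       {template_group: {tag: read_count}}
    expected_tag: {template_group: tag used to build that group's library}
    threshold:    a contamination rate above this value is judged unqualified
    """
    results = {}
    for group, tag_counts in counts.items():
        tag = expected_tag[group]
        total = sum(tag_counts.values())
        if total == 0:
            raise ValueError(f"no reads for template group {group}")
        matched = tag_counts.get(tag, 0)
        contamination = (total - matched) / total
        results[tag] = {
            "match": matched / total,
            "contamination": contamination,
            "qualified": contamination <= threshold,
        }
    return results


# Made-up read counts for two template groups (illustration only).
EXAMPLE_COUNTS = {
    "1-12": {"501": 19990, "502": 10},
    "37-48": {"504": 9800, "503": 200},
}
EXAMPLE_EXPECTED = {"1-12": "501", "37-48": "504"}

if __name__ == "__main__":
    for tag, r in tag_rates(EXAMPLE_COUNTS, EXAMPLE_EXPECTED).items():
        status = "qualified" if r["qualified"] else "unqualified"
        print(f"{tag}: match {r['match']:.2%}, "
              f"contamination {r['contamination']:.2%}, {status}")
```

Because every template group is built with exactly one expected tag, the matching rate and contamination rate of each tag sum to 100%, as they do in each row of Table 19.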
The specific examples above are used to explain the present invention. They are intended only to aid understanding of the invention and not to limit it. Those skilled in the technical field to which the invention belongs may make simple deductions, modifications, or substitutions based on the idea of the invention.

Claims (24)

  1. A method for detecting tag sequences, characterized in that the method comprises:
    performing matched library construction with a set of template sequences and a set of tag sequences to be tested to obtain a set of tag libraries, wherein the template sequences are different gene sequences obtained by amplification or artificial synthesis, different template sequences differ from one another in sequence, and the template sequences have a one-to-one or many-to-one correspondence with the tag sequences to be tested;
    sequencing the tag libraries to obtain the sequencing reads of each tag library;
    aligning the sequencing reads of each tag library against all of the tag sequences to be tested, and counting the number of sequencing reads aligned to each tag sequence to be tested;
    calculating, from these numbers, the rate at which each tag sequence to be tested is contaminated with other tag sequences.
  2. The detection method according to claim 1, characterized in that the template sequences are different gene sequences amplified from genomic DNA.
  3. The detection method according to claim 1, characterized in that the number of template sequences in the set is N, where N = 4X and X is an integer greater than or equal to 1;
    preferably, the number of template sequences in the set is 96.
  4. The detection method according to claim 1, characterized in that the size of the template sequences is 50-1000 bp, preferably 180 bp;
    preferably, all of the template sequences are equal in size.
  5. The detection method according to claim 1, characterized in that, within the base sequence ranges at the 5' end and the 3' end of the template sequence whose length equals the sequencing read length, the proportion of each of the bases A, T, C, and G is not lower than the base proportion value that ensures balance of the four base signals.
  6. The detection method according to claim 5, characterized in that the base sequence ranges at the 5' end and the 3' end of the template sequence whose length equals the sequencing read length are ranges of 20 bp to 200 bp, preferably within 30 bp, at the 5' end and the 3' end.
  7. The detection method according to claim 5, characterized in that the base proportion value is 10% to 30%, preferably 15%.
  8. The detection method according to claim 1, characterized in that the number of template sequences is N times the number of tag sequences to be tested, N being an integer greater than or equal to 1; the template sequences are divided into a number of subgroups equal to the number of tag sequences to be tested, and each subgroup contains N template sequences.
  9. The detection method according to claim 1, characterized in that the tag sequences to be tested are tag adapters and/or tag primers.
  10. The detection method according to claim 1, characterized in that the tag sequences to be tested are single-tag adapters.
  11. The detection method according to claim 10, characterized in that the matched library construction comprises: ligating the template sequences to the single-tag adapters in a one-to-one or many-to-one correspondence, and then performing PCR amplification with universal primers to obtain single-tag libraries for loading on the sequencer.
  12. The detection method according to claim 1, characterized in that the tag sequences to be tested are single-tag primers.
  13. The detection method according to claim 12, characterized in that the matched library construction comprises: ligating the template sequences to a universal adapter to obtain ligation products, and then performing PCR amplification with the single-tag primers in a one-to-one or many-to-one correspondence to obtain single-tag libraries for loading on the sequencer.
  14. The detection method according to claim 1, characterized in that the tag sequences to be tested are dual-tag sequences each composed of a tag adapter and a tag primer.
  15. The detection method according to claim 14, characterized in that the matched library construction comprises: ligating the template sequences to the tag adapters in a one-to-one or many-to-one correspondence, and then performing PCR amplification with the tag primers in a one-to-one or many-to-one correspondence to obtain dual-tag libraries for loading on the sequencer.
  16. The detection method according to claim 1, characterized in that the tag sequences to be tested are dual-tag primers each composed of two tag primers.
  17. The detection method according to claim 16, characterized in that the matched library construction comprises: ligating the template sequences to a universal adapter to obtain ligation products, and then performing PCR amplification with the dual-tag primers in a one-to-one or many-to-one correspondence to obtain dual-tag libraries for loading on the sequencer.
  18. The detection method according to claim 1, characterized in that the tag sequences to be tested are dual-tag adapters each composed of two tag adapters.
  19. The detection method according to claim 18, characterized in that the matched library construction comprises: ligating the template sequences to the dual-tag adapters in a one-to-one or many-to-one correspondence to obtain ligation products, and then performing PCR amplification with universal primers to obtain dual-tag libraries for loading on the sequencer.
  20. The detection method according to claim 1, characterized in that the sequencing is paired-end sequencing.
  21. The detection method according to claim 20, characterized in that the sequencing is PE30+10 sequencing.
  22. The detection method according to claim 1, characterized in that all of the tag sequences to be tested are all of the tag sequences synthesized in the same batch.
  23. The detection method according to claim 1, characterized in that the method further comprises obtaining a sequencing quality evaluation result for the second strand (read2) from the sequencing reads; and
    optionally, using the evaluation result to indirectly judge the synthesis quality of the 5'-end bases of the adapters related to second-strand sequencing in the tag library or in a pooled tag library of the tag libraries.
  24. The detection method according to claim 1, characterized in that the template sequences are obtained by amplifying the human genome with the 96 primer pairs shown in SEQ ID NO: 1 to 192.
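The base-composition condition of claims 5 to 7 can be illustrated with a short check. This is a sketch under one possible reading of those claims, namely that the proportion of each base is evaluated separately for every template within a window at each end; the claims could also be read as a requirement on the pooled templates. The function names are hypothetical, and the defaults use the preferred values of a 30 bp window and a 15% minimum proportion.

```python
def terminal_base_fractions(seq, window=30):
    """Return the A/T/C/G fractions in the first and last `window` bases."""
    seq = seq.upper()
    if len(seq) < window:
        raise ValueError("sequence is shorter than the window")
    ends = {"5'": seq[:window], "3'": seq[-window:]}
    return {
        end: {base: sub.count(base) / window for base in "ATCG"}
        for end, sub in ends.items()
    }


def is_balanced(seq, window=30, min_fraction=0.15):
    """True if every base reaches min_fraction at both ends of the template."""
    fractions = terminal_base_fractions(seq, window)
    return all(
        frac >= min_fraction
        for per_end in fractions.values()
        for frac in per_end.values()
    )


if __name__ == "__main__":
    candidate = "ACGT" * 45  # artificial 180 bp sequence, for illustration only
    print(is_balanced(candidate))
    print(is_balanced("A" * 180))
```

A template that fails the check at either end would be a candidate for replacement when assembling the template set.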
PCT/CN2018/120820 2018-12-13 2018-12-13 Tag sequence detection method WO2020118596A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201880099610.8A CN113168889B (en) 2018-12-13 2018-12-13 Method for detecting tag sequence
PCT/CN2018/120820 WO2020118596A1 (en) 2018-12-13 2018-12-13 Tag sequence detection method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2018/120820 WO2020118596A1 (en) 2018-12-13 2018-12-13 Tag sequence detection method

Publications (1)

Publication Number Publication Date
WO2020118596A1 true WO2020118596A1 (en) 2020-06-18

Family

ID=71075863

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/120820 WO2020118596A1 (en) 2018-12-13 2018-12-13 Tag sequence detection method

Country Status (2)

Country Link
CN (1) CN113168889B (en)
WO (1) WO2020118596A1 (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115197999A (en) * 2022-07-15 2022-10-18 纳昂达(南京)生物科技有限公司 Method and device for synthesizing crosstalk by quality control double-end unique tag connector
EP3998343A4 (en) * 2020-08-19 2022-11-02 Nanodigmbio (nanjing) Biotechnology Co., Ltd Double-ended library label composition and application thereof in mgi sequencing platform
CN116004763A (en) * 2022-07-19 2023-04-25 纳昂达(南京)生物科技有限公司 Selection verification and quality control method for combined joint
CN116287161A (en) * 2021-12-31 2023-06-23 安诺优达基因科技(北京)有限公司 Oligonucleotide sequence consistency detection method

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101921841A (en) * 2010-06-30 2010-12-22 深圳华大基因科技有限公司 HLA (Human Leukocyte Antigen) gene high-resolution genotyping method based on Illumina GA sequencing technology
CN106021987A (en) * 2016-05-24 2016-10-12 人和未来生物科技(长沙)有限公司 Ultra-lower frequency clustering and grouping algorithm for mutant peptide labels
CN106755454A (en) * 2017-01-06 2017-05-31 杭州杰毅麦特医疗器械有限公司 A kind of molecular label nucleic acid detection method
CN108932401A (en) * 2018-06-07 2018-12-04 江西海普洛斯生物科技有限公司 It is a kind of be sequenced sample identification method and its application

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2013130512A2 (en) * 2012-02-27 2013-09-06 The University Of North Carolina At Chapel Hill Methods and uses for molecular tags

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3998343A4 (en) * 2020-08-19 2022-11-02 Nanodigmbio (nanjing) Biotechnology Co., Ltd Double-ended library label composition and application thereof in mgi sequencing platform
CN116287161A (en) * 2021-12-31 2023-06-23 安诺优达基因科技(北京)有限公司 Oligonucleotide sequence consistency detection method
CN115197999A (en) * 2022-07-15 2022-10-18 纳昂达(南京)生物科技有限公司 Method and device for synthesizing crosstalk by quality control double-end unique tag connector
CN115197999B (en) * 2022-07-15 2024-01-23 纳昂达(南京)生物科技有限公司 Method and device for synthesizing crosstalk by quality control double-end unique tag connector
CN116004763A (en) * 2022-07-19 2023-04-25 纳昂达(南京)生物科技有限公司 Selection verification and quality control method for combined joint
CN116004763B (en) * 2022-07-19 2024-02-09 纳昂达(南京)生物科技有限公司 Selection verification and quality control method for combined joint

Also Published As

Publication number Publication date
CN113168889B (en) 2023-04-04
CN113168889A (en) 2021-07-23

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18943100

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18943100

Country of ref document: EP

Kind code of ref document: A1