CN111020005A - Method and system for improving database building success rate of next generation sequencing - Google Patents

Info

Publication number
CN111020005A
Authority
CN
China
Prior art keywords
library
sequence
linker
next generation
bases
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201911277727.3A
Other languages
Chinese (zh)
Other versions
CN111020005B (en)
Inventor
郭志伟
李英辉
陈倩
胡荣君
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Jundi Gene Technology Co ltd
Original Assignee
Shanghai Jundi Gene Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Jundi Gene Technology Co ltd
Priority to CN201911277727.3A
Publication of CN111020005A
Application granted
Publication of CN111020005B
Legal status: Active
Anticipated expiration

Classifications

    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6806: Preparing nucleic acids for analysis, e.g. for polymerase chain reaction [PCR] assay
    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N: MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00: Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09: Recombinant DNA-technology
    • C12N15/10: Processes for the isolation, preparation or purification of DNA or RNA
    • C12N15/1034: Isolating an individual clone by screening libraries
    • C12N15/1093: General methods of preparing gene libraries, not provided for in other subgroups
    • C: CHEMISTRY; METALLURGY
    • C40: COMBINATORIAL TECHNOLOGY
    • C40B: COMBINATORIAL CHEMISTRY; LIBRARIES, e.g. CHEMICAL LIBRARIES
    • C40B50/00: Methods of creating libraries, e.g. combinatorial synthesis
    • C40B50/06: Biochemical methods, e.g. using enzymes or whole viable microorganisms
    • C: CHEMISTRY; METALLURGY
    • C40: COMBINATORIAL TECHNOLOGY
    • C40B: COMBINATORIAL CHEMISTRY; LIBRARIES, e.g. CHEMICAL LIBRARIES
    • C40B80/00: Linkers or spacers specially adapted for combinatorial chemistry or libraries, e.g. traceless linkers or safety-catch linkers
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A: TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00: Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10: Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Landscapes

  • Chemical & Material Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Organic Chemistry (AREA)
  • Health & Medical Sciences (AREA)
  • Genetics & Genomics (AREA)
  • Biochemistry (AREA)
  • Engineering & Computer Science (AREA)
  • Wood Science & Technology (AREA)
  • Zoology (AREA)
  • Molecular Biology (AREA)
  • Biotechnology (AREA)
  • General Engineering & Computer Science (AREA)
  • Microbiology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Chemical Kinetics & Catalysis (AREA)
  • Biophysics (AREA)
  • Physics & Mathematics (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • General Health & Medical Sciences (AREA)
  • Analytical Chemistry (AREA)
  • Biomedical Technology (AREA)
  • General Chemical & Material Sciences (AREA)
  • Medicinal Chemistry (AREA)
  • Immunology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Crystallography & Structural Chemistry (AREA)
  • Plant Pathology (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The invention provides a method for improving the success rate of next-generation sequencing library construction, comprising the following steps: 1) contacting a linker comprising a molecular tag with a single-stranded template to obtain a ligation product; and 2) contacting a library amplification primer with the ligation product to obtain a library construction product. The linker is ligated to the single-stranded template by single-stranded, single-ended ligation, and each molecular tag comprises non-contiguous random sequences separated by fixed sequences. In another aspect, the invention provides a system for improving the success rate of next-generation sequencing library construction, comprising an oligonucleotide linker whose molecular tag has random sequences separated by fixed sequences, as provided in the first aspect of the invention. The method and system have a distinctive design, controllable cost, a clearly defined application scenario and stable performance. For library construction by single-stranded, single-ended ligation they achieve results far exceeding the prior art at the same cost, and they have broad application prospects and value for wider adoption.

Description

Method and system for improving database building success rate of next generation sequencing
Technical Field
The invention relates to the field of molecular biology, in particular to a method and a system for improving the library construction success rate of next generation sequencing.
Background
Next-generation sequencing (NGS) technology emerged in the 1990s and has been on the market for more than a decade. It can sequence thousands of DNA template molecules simultaneously, which increases the efficiency and throughput of the sequencing reaction, provides ever higher sequencing speed, and allows greater sequencing depth. However, sequencing accuracy and sensitivity are affected by many error sources, such as sample defects, PCR errors at the amplification stage, and sequencing noise and errors. Increasing sequencing depth alone therefore does not guarantee detection of sequences with very low allele frequency, such as cell-free DNA (cfDNA) sequences in plasma, circulating tumor DNA (ctDNA) sequences, and subclonal mutation sequences of foreign microorganisms.
To suppress the inaccuracies that these error sources introduce when sequencing DNA molecules present in small amounts and/or at low allele frequency, more and more NGS manufacturers use unique molecular identifiers (UMIs) to reduce background noise and correct sequencing errors. In existing NGS technology, the mainstream way of applying UMIs is to embed 6-10 contiguous random bases as a molecular tag in the linker sequences at both ends of the fragment to be sequenced. However, in the single-stranded ligation scenario, because the number of linker molecules is large, a considerable number of random sequences will always pair to a high degree, or even completely, with the library amplification primer. The PCR reaction then generates large amounts of non-target products, which reduces the library construction yield.
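The scale of this problem can be illustrated with a rough calculation. The sketch below assumes uniformly random bases and counts only linkers whose random stretch is an exact complement of one fixed window of the primer. The 2 nM and 2 μL figures are borrowed from the background tests in the examples, and the function name is hypothetical.

```python
AVOGADRO = 6.02214076e23  # molecules per mole

def expected_full_matches(conc_nM: float, volume_uL: float, n_random: int) -> float:
    """Expected number of linker molecules whose n_random contiguous random
    bases exactly complement one fixed primer window (uniform bases assumed)."""
    moles = conc_nM * 1e-9 * volume_uL * 1e-6  # mol = (mol/L) * L
    molecules = moles * AVOGADRO
    return molecules / 4 ** n_random

# Assumed loading: 2 nM linker, 2 uL per reaction (as in the background tests).
for n in (6, 8, 10):
    print(f"{n} random bases: ~{expected_full_matches(2.0, 2.0, n):.0f} fully complementary molecules")
```

Under these assumptions a reaction still contains tens of thousands of linker molecules that fully complement an 8-base window, which is consistent with the statement that some random sequences will always pair with the primer.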
Disclosure of Invention
In view of the above shortcomings of the prior art, one aspect of the present invention provides a method for improving the library construction success rate of next-generation sequencing, comprising the following steps:
1) contacting a linker comprising a molecular tag with a single-stranded template to obtain a ligation product;
2) contacting a library amplification primer with the ligation product to obtain a library construction product;
wherein the linker is ligated to the single-stranded template by single-stranded, single-ended ligation, and each of the molecular tags comprises a non-contiguous random sequence.
In another aspect, the present invention provides a system for improving the library construction success rate of next-generation sequencing, comprising an oligonucleotide linker suitable for use in the method provided by the first aspect of the present invention.
Detailed Description
Through extensive exploratory research, the applicant provides a method for improving the library construction success rate of next-generation sequencing. The method is simple in design and performs well, particularly for next-generation sequencing samples prepared by single-stranded, single-ended ligation. The present invention was completed on this basis.
The invention relates to a method for improving the library construction success rate of next-generation sequencing, comprising the following steps:
1) contacting a linker comprising a molecular tag with a single-stranded template to obtain a ligation product;
2) contacting a library amplification primer with the ligation product to obtain a library construction product;
wherein the linker is ligated to the single-stranded template by single-stranded, single-ended ligation, and each of the molecular tags comprises a non-contiguous random sequence.
In the method provided by the invention, a plurality of linkers are used for the single-stranded ligation, and correspondingly a plurality of molecular tags are used. Each molecular tag is an oligonucleotide sequence that can be used to identify a single molecule of single-stranded DNA fragment in the sample. Each molecular tag contains random sequences that are discontinuous, i.e. non-contiguous. Each molecular tag also contains fixed sequences; because there are a plurality of linkers and molecular tags, there are correspondingly a plurality of fixed sequences. Within the single molecular tag of a single linker, the random sequences and the fixed sequences alternate. The random sequence is divided into a plurality of segments, at least two; the fixed sequence has one or more segments, at least one.
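The alternating structure described above can be sketched as follows. This is a minimal illustration, assuming a tag is written as a string in which N marks a random base and A, C, G or T marks a fixed base. The function names and the example pattern are hypothetical.

```python
import re

def split_tag(tag: str) -> list[tuple[str, str]]:
    """Split a tag pattern into alternating random (N) and fixed (A/C/G/T) segments."""
    tag = tag.upper()
    if not re.fullmatch(r"[ACGTN]+", tag):
        raise ValueError("tag may contain only A, C, G, T and N")
    return [("random" if m.group()[0] == "N" else "fixed", m.group())
            for m in re.finditer(r"N+|[ACGT]+", tag)]

def tag_diversity(tag: str) -> int:
    """Number of distinct tags a pattern can encode: 4 choices per random base."""
    return 4 ** tag.upper().count("N")

pattern = "NNTNNCANNCNN"  # hypothetical pattern: four random segments, three fixed segments
for kind, segment in split_tag(pattern):
    print(kind, segment)
print("distinct tags:", tag_diversity(pattern))
```

Only the random bases contribute to the number of distinguishable tags; the fixed bases lengthen the tag without adding diversity, which is why the text limits their length and number.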
In some embodiments of the invention, each fixed sequence comprises 1-4 bases, i.e. at least 1 base and at most 4 bases. This does not mean that a single fixed sequence of more than 4 bases cannot be used in the molecular tag provided by the invention. However, the applicant has found through extensive experiments that a molecular tag whose random sequences are spaced by a fixed sequence longer than 4 bases identifies single molecules no better than one spaced by fixed sequences of 1-4 bases. Instead, the tag may become so long that it raises cost, and so lacks clinical application value.
In some embodiments of the present invention, when two adjacent fixed sequences separated by a random sequence in the molecular tag each consist of 1 base, the two bases differ from each other. That is, two single-base fixed sequences separated by a random sequence are not both A, both T, both C, or both G. For example, fixed sequences separated by the random sequence NN do not form a sequence such as ANNA in the molecular tag. The applicant found through extensive experiments that when two single-base fixed sequences separated by a random sequence are identical, the tag identifies single molecules no better than a molecular tag of contiguous random sequence.
In some embodiments of the invention, when a single fixed sequence in the molecular tag comprises 2 to 4 bases, it is not a repeat of a single one of the four bases. For example, a fixed sequence of length 2 is not AA, and a fixed sequence of length 4 is not CCCC. The applicant found experimentally that when a single fixed sequence is a repeat of one base, for example AA, TTT or CCCC, the tag identifies single molecules no better than a molecular tag of contiguous random sequence.
In some embodiments of the invention, the fixed sequences that space the random sequences occur 1-7 times in the molecular tag. In general, the fewer bases a single fixed sequence contains, the more often fixed sequences occur in the tag. For example, when each fixed sequence in the tag contains only 1 base, fixed sequences may occur up to 7 times in the molecular tag of the linker, more often than fixed sequences of 2-4 bases. When each fixed sequence contains 4 bases, fixed sequences may occur only 1-2 times. This keeps the molecular tag from becoming too long, which would significantly increase cost and cause other problems, while preserving its ability to distinguish single molecules.
In some embodiments of the invention, each of the random sequences spaced by the fixed sequences comprises the same number of bases, in the range of 1-4, and the total number of random bases is in the range of 8-12. As with the fixed sequences, the fewer bases a single random sequence contains, the more often random sequences occur in the tag, and vice versa, but the total number of random bases stays within the 8-12 range of the contiguous random sequences commonly used in the prior art. The applicant found through extensive experiments that random sequences with a total length of 8-12 bases, spaced by 1-7 fixed sequences of 1-4 bases each, minimize the linker background during library construction, identify single molecules in the original sample efficiently, and do not make the tag significantly more expensive to synthesize than a tag of contiguous random sequence.
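The design rules of these embodiments can be collected into a simple checker. The sketch below is only an illustration: it assumes the same notation as above (N for a random base, A/C/G/T for a fixed base), it treats all of the embodiment rules as if they applied together, and the function name and test patterns are hypothetical rather than sequences from the tables.

```python
import re

def check_tag(tag: str) -> list[str]:
    """Return the embodiment rules (as paraphrased from the text) that a tag pattern violates."""
    tag = tag.upper()
    if not re.fullmatch(r"[ACGTN]+", tag):
        raise ValueError("tag may contain only A, C, G, T and N")
    segments = re.findall(r"N+|[ACGT]+", tag)  # maximal runs, so kinds alternate
    random_segs = [s for s in segments if s[0] == "N"]
    fixed_segs = [s for s in segments if s[0] != "N"]
    problems = []
    if len(random_segs) < 2:
        problems.append("fewer than two random segments")
    if not 1 <= len(fixed_segs) <= 7:
        problems.append("fixed segments should occur 1-7 times")
    if any(len(s) > 4 for s in fixed_segs):
        problems.append("fixed segment longer than 4 bases")
    if any(len(s) > 4 for s in random_segs) or len({len(s) for s in random_segs}) > 1:
        problems.append("random segments should share one length of 1-4 bases")
    if not 8 <= tag.count("N") <= 12:
        problems.append("total random bases outside 8-12")
    if any(len(s) >= 2 and len(set(s)) == 1 for s in fixed_segs):
        problems.append("multi-base fixed segment repeats a single base")
    # Consecutive fixed segments are always separated by exactly one random segment.
    if any(len(a) == 1 and a == b for a, b in zip(fixed_segs, fixed_segs[1:])):
        problems.append("adjacent single-base fixed segments are identical")
    return problems

for pattern in ("NNTNNCANNCNN", "NNANNANNGNN", "NNNNNNNN"):
    print(pattern, check_tag(pattern) or "passes all checks")
```

Because the text introduces each rule with "in some embodiments", a real design need not satisfy every check at once; the list of violations is meant as a review aid, not a pass/fail gate.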
In some embodiments of the invention, the linker comprising the molecular tag in step 1) pairs contiguously with no more than 8 of the last ten bases at the 3' end of the library amplification primer in step 2), which further ensures the usability of linkers whose random sequences are spaced by fixed sequences. If 8-10 contiguous random bases are used as the molecular tag of the linker, the linker readily pairs with 8 contiguous bases at the 3' end of the library amplification primer, forming large amounts of linker-primer dimer. This greatly increases the library construction background, reduces library construction efficiency and, in more severe cases, causes library construction to fail outright. Because the library amplification primer serves to amplify the ligation product of step 1) for sequencing, its sequence is constrained by each sequencing platform and can hardly be changed. To avoid dimers with the library amplification primer, the applicant's solution is to design the linker sequence precisely, so that under single-stranded, single-ended ligation it can still identify single molecules in the sample while pairing as little as possible with the 3' end of the library amplification primer. This prevents primer dimers as far as possible and safeguards the library construction success rate.
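One way to screen a candidate linker against this criterion is to compute the longest contiguous stretch that can pair with the primer's 3' tail. The sketch below makes several assumptions that the text does not spell out: pairing means perfect antiparallel Watson-Crick complementarity, a random base N is treated as a wildcard (the worst case, since some molecule in the pool carries the complementary base), and the primer and linker shown are hypothetical, not sequences from the tables.

```python
def longest_pairing_run(linker: str, primer: str, tail_len: int = 10) -> int:
    """Longest contiguous run of the linker that can pair with the last
    tail_len bases at the 3' end of the primer (N in the linker pairs with anything)."""
    complement = str.maketrans("ACGT", "TGCA")
    # A linker stretch pairs antiparallel with the tail exactly when it equals
    # a substring of the tail's reverse complement.
    target = primer.upper()[-tail_len:].translate(complement)[::-1]
    best = 0
    prev = [0] * (len(target) + 1)
    for base in linker.upper():
        cur = [0] * (len(target) + 1)
        for j, t in enumerate(target, start=1):
            if base == "N" or base == t:
                cur[j] = prev[j - 1] + 1  # extend the run ending at the previous bases
                best = max(best, cur[j])
        prev = cur
    return best

primer = "GGGGGGGGGGACGTTGCAAC"  # hypothetical primer; 3' tail is ACGTTGCAAC
print(longest_pairing_run("NNNNNNNN", primer))       # contiguous random tag
print(longest_pairing_run("CCCCGTTGCTTTT", primer))  # fixed-sequence example
```

With this worst-case wildcard treatment, a contiguous run of random bases can always pair along its full length, whereas fixed bases that mismatch the primer tail break the run. Comparing the result with the limit of 8 stated in the text gives a quick in silico filter before any wet-lab background test.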
In some embodiments of the present invention, to test whether a preliminarily designed linker with a molecular tag is feasible at the library amplification stage, the method further comprises performing background detection on the linker using the library amplification primer of step 2). When the background value of the linker is below a threshold, the linker comprising the molecular tag is selected; when the background value is above the threshold, the linker is discarded. The threshold is set at least lower than the background value, in the library construction product, of a linker whose molecular tag consists of a contiguous random sequence. Among the linkers whose background falls below the threshold, those with relatively lower background values are then selected for library construction.
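The selection rule can be sketched in a few lines. The copy numbers below are those reported for library amplification PCR system 3 in Table 12; using the contiguous random UMI linker as the threshold is the minimum condition stated in the text, and the function name is hypothetical.

```python
def select_linkers(background: dict[str, float], reference: str) -> list[str]:
    """Keep linkers whose background is below the reference linker's background,
    ordered from lowest to highest background."""
    threshold = background[reference]
    passing = {name: copies for name, copies in background.items()
               if name != reference and copies < threshold}
    return sorted(passing, key=passing.get)

# Background copies from Table 12 (library amplification PCR system 3).
table_12 = {
    "continuous random UMI linker 1": 3119.0,
    "UMI linker 1": 20545.9,
    "UMI linker 2": 867.9,
    "UMI linker 3": 140.9,
    "UMI linker 4": 2082.4,
    "UMI linker 5": 44.9,
}
print(select_linkers(table_12, "continuous random UMI linker 1"))
```

In practice the same filter would be applied to every PCR system tested, and only linkers that pass in all of them would be carried forward.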
In some embodiments of the present invention, to check the ligation result, the ligation product is also subjected to quality control by quantitative PCR after the ligation in step 1) is complete. The applicant found through extensive experiments that linkers whose molecular tag is a contiguous random sequence inflate the PCR quantification to varying degrees at the quality control stage, which impairs quality control accuracy. The linker provided by the invention, whose molecular tag has random sequences spaced by fixed sequences, does not readily form dimers with the quality control primer. The PCR quantification is therefore accurate and the quality control result stable, which ensures that subsequent reactions proceed smoothly.
In another aspect, the present invention provides a system for improving the library construction success rate of next-generation sequencing, comprising an oligonucleotide linker carrying a molecular tag sequence suitable for the method provided by the invention. Because the method uses a plurality of molecular tags, there are correspondingly a plurality of oligonucleotide linkers. The design rules for the molecular tag and for the other parts of the oligonucleotide linker sequence are described in detail under the first aspect of the invention and are not repeated here.
In some embodiments of the invention, the oligonucleotide linkers provided herein comprise, in addition to the molecular tag sequences used to distinguish individual molecules, sample tag sequences used to distinguish individual samples and universal sequencing sequences that match the sequencing platform. In some embodiments, the 5' end of the oligonucleotide linker is modified by phosphorylation. In some embodiments of the invention, the oligonucleotide linker has a cohesive end or is single-stranded at the temperature of the ligation reaction, the ligation being a single-stranded ligation.
The invention has the following advantages. First, using a molecular tag in which fixed sequences space the random sequences does not significantly increase the synthesis cost of the linker, which is almost the same as for a molecular tag of 8-12 random bases, and does not significantly increase the difficulty of synthesis or use. Second, such a tag greatly reduces the probability that the linker forms a dimer with the library amplification primer or the quality control primer and generates non-target amplification or quality control products. This safeguards library amplification and quality control and markedly improves the library construction success rate. Third, because the application scenario is single-stranded, single-ended ligation, only one end of each ligation product carries a linker, so the molecular tag must identify single molecules more accurately. A tag with fixed sequences spacing the random sequences greatly improves the specificity with which a single-ended linker identifies single molecules, further safeguarding the library construction success rate. In summary, the method and system provided by the invention have a distinctive design, controllable cost, a clearly defined application scenario, stable performance and outstanding results. For library construction by single-stranded, single-ended ligation they achieve results far exceeding the prior art at the same cost, and they have broad application prospects and value for wider adoption.
The embodiments of the present invention are described below with reference to specific embodiments, and other advantages and effects of the present invention will be easily understood by those skilled in the art from the disclosure of the present specification. The invention is capable of other and different embodiments and of being practiced or of being carried out in various ways, and its several details are capable of modification in various respects, all without departing from the spirit and scope of the present invention.
Example 1.
Purpose of the experiment: to use PCR detection systems to screen UMI linkers of different sequence designs for suitability on the Proton sequencing platform.
Experimental scheme: background detection was performed on the UMI linkers of each sequence design using the ligation product detection PCR systems and the library amplification PCR systems, and UMI linkers whose background values met the requirements in both kinds of system were selected.
Main materials and reagents:
The oligonucleotide sequences used in this example are shown in Table 1 and were synthesized by Shanghai Bailey Biotechnology, Inc.; the UMI sequences are shown in underlined italics. The 5' ends of UMI linkers 1-5 are phosphorylated.
TABLE 1
[Table 1: image in the original publication, not reproduced here]
Other materials and reagents:
Probe qPCR Master Mix (Toroivd);
Realtime PCR Master Mix (TOYOBO);
trP1-1 primer (Shanghai Bailegg Biotech Co., Ltd.);
F858-SL1 primer (Shanghai Bailegg Biotech, Inc.);
A2, A3 and A6 primers (Shanghai Bailegg Biotech Co., Ltd.);
probes MGB1-A2P and MGBA (Shanghai Bailey Biotech Co., Ltd.);
The calibrators used in this example were constructed by conventional molecular cloning methods.
Main equipment: quantitative PCR instrument (Applied Biosystems 7300 PLUS).
Experimental method:
The linkers in Table 1 were diluted to 2 nM, and 2 μL was loaded into each PCR for background detection. The PCR systems are shown in Tables 2-6, respectively.
[Tables 2-5: images in the original publication, not reproduced here]
TABLE 6
[Table 6: image in the original publication, not reproduced here]
The PCR program is shown in Table 7.
TABLE 7
[Table 7: image in the original publication, not reproduced here]
Experimental results:
The results of the experiments are shown in Tables 8-12.
TABLE 8 Ligation product detection PCR system 1
[Table 8: image in the original publication, not reproduced here]
TABLE 9 Ligation product detection PCR system 2
Sample | Ct value | Result (copies)
Calibrator S5 | 18.9 | 200000
Calibrator S4 | 22.2 | 20000
Calibrator S3 | 25.7 | 2000
Calibrator S2 | 29.1 | 200
Calibrator S1 | 32.7 | 20
Blank control | UD | /
Non-UMI linker | UD | /
Continuous random UMI linker 1 | 30.5 | 82.3
UMI linker 1 of the invention | 32.8 | 17.7
UMI linker 2 of the invention | UD |
UMI linker 3 of the invention | 29.3 | 183.4
UMI linker 4 of the invention | UD |
UMI linker 5 of the invention | 37.2 | 0.9
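The copy numbers in Table 9 appear consistent with ordinary standard-curve quantification, in which the Ct values of the calibrators are fitted against the logarithm of their copy numbers. The text does not state how the copies were calculated, so the sketch below is an assumption: it fits a least-squares line to the five calibrators of Table 9 and converts a sample Ct back to copies. The function names are hypothetical.

```python
import math

def fit_standard_curve(ct_values: list[float], copies: list[float]) -> tuple[float, float]:
    """Least-squares fit of Ct = slope * log10(copies) + intercept."""
    xs = [math.log10(c) for c in copies]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ct_values))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def copies_from_ct(ct: float, slope: float, intercept: float) -> float:
    """Invert the standard curve to estimate copies from a sample Ct."""
    return 10 ** ((ct - intercept) / slope)

# Calibrators S5..S1 from Table 9.
calibrator_ct = [18.9, 22.2, 25.7, 29.1, 32.7]
calibrator_copies = [200000, 20000, 2000, 200, 20]
slope, intercept = fit_standard_curve(calibrator_ct, calibrator_copies)

# Continuous random UMI linker 1 (Ct 30.5) and UMI linker 1 of the invention (Ct 32.8).
for ct in (30.5, 32.8):
    print(ct, round(copies_from_ct(ct, slope, intercept), 1))
```

Small differences from the tabulated values are expected, because the Ct values in the table are rounded to one decimal place.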
TABLE 10 Library amplification PCR system 1
Sample | Ct value | Result (copies)
Calibrator S5 | 18.9 | 200000
Calibrator S4 | 22.1 | 20000
Calibrator S3 | 25.6 | 2000
Calibrator S2 | 28.8 | 200
Calibrator S1 | 32.1 | 20
Blank control | UD | /
Non-UMI linker | 31.1 | 40.7
Continuous random UMI linker 1 | 28.0 | 351.4
UMI linker 1 of the invention | 24.6 | 3740.5
UMI linker 2 of the invention | 29.6 | 115.4
UMI linker 3 of the invention | 32.9 | 11.6
UMI linker 4 of the invention | 29.2 | 152.5
UMI linker 5 of the invention | 33.3 | 8.8
TABLE 11 Library amplification PCR system 2
[Table 11: image in the original publication, not reproduced here]
TABLE 12 Library amplification PCR system 3
Sample | Ct value | Result (copies)
Calibrator S5 | 18.2 | 200000
Calibrator S4 | 21.6 | 20000
Calibrator S3 | 24.8 | 2000
Calibrator S2 | 28.2 | 200
Calibrator S1 | 32.0 | 20
Blank control | UD | /
Non-UMI linker | 35.2 | 2.0
Continuous random UMI linker 1 | 24.3 | 3119.0
UMI linker 1 of the invention | 21.5 | 20545.9
UMI linker 2 of the invention | 26.2 | 867.9
UMI linker 3 of the invention | 28.9 | 140.9
UMI linker 4 of the invention | 24.9 | 2082.4
UMI linker 5 of the invention | 30.6 | 44.9
Conclusion:
The results in Tables 8-12 correspond to the PCR systems in Tables 2-6, respectively. As the copy numbers in Tables 8-12 show, the non-UMI linker has a low background in all PCR systems, but it cannot meet the library construction requirement of distinguishing single molecules. The conventional UMI linker with a contiguous random sequence has a low background in the ligation product detection PCR systems but a high background in the library amplification PCR systems, where it can generate large amounts of non-library products, so it does not meet the library construction requirement. The UMI linkers designed by the method of the invention vary in background, some higher and some lower. However, UMI linker 5 of the invention (SEQ ID NO:7) has a low background in all systems, across the different detection primers and library amplification primers used: far below that of the contiguous random UMI linker and close to that of the non-UMI linker. It can therefore distinguish single molecules in a sample without forming large amounts of linker-primer dimer, and it meets the invention's requirement of improving the library construction success rate of next-generation sequencing.
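The comparison drawn in this conclusion can be made concrete by computing how many times lower the background of UMI linker 5 is than that of the contiguous random UMI linker. The sketch below uses only the copy numbers that appear as text in Tables 9, 10 and 12 (Tables 8 and 11 are images in the source and are omitted); the dictionary keys and function name are hypothetical.

```python
def fold_reduction(continuous_copies: float, inventive_copies: float) -> float:
    """How many times lower the inventive linker's background is."""
    return continuous_copies / inventive_copies

# (contiguous random UMI linker 1, UMI linker 5) background copies per table.
background = {
    "Table 9, ligation product detection system 2": (82.3, 0.9),
    "Table 10, library amplification system 1": (351.4, 8.8),
    "Table 12, library amplification system 3": (3119.0, 44.9),
}
for system, (continuous, linker5) in background.items():
    print(f"{system}: {fold_reduction(continuous, linker5):.0f}-fold lower")
```

These ratios come from single measurements per system, so they indicate the order of magnitude of the improvement rather than a precise effect size.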
Example 2.
Purpose of the experiment: to use PCR detection systems to screen UMI linkers of different sequence designs for suitability on the Illumina sequencing platform.
Experimental scheme: background detection was performed on the UMI linkers of each sequence design using the ligation product detection PCR system and the library amplification PCR system, and UMI linkers whose background values met the requirements in both systems were selected.
Main materials and reagents:
The oligonucleotide sequences used in this example are shown in Table 13 and were synthesized by Shanghai Bailey Biotechnology, Inc.; the UMI sequences are shown in underlined italics. The 5' ends of UMI linkers 6-9 are phosphorylated.
TABLE 13
[Table 13: image in the original publication, not reproduced here]
Other materials and reagents:
Realtime PCR Master Mix (TOYOBO);
F4-SP1 primer (Shanghai Bailegg Biotech, Inc.);
F858-SL1 primer (Shanghai Bailegg Biotech, Inc.);
P7-AMP primer (Shanghai Bailegg Biotech Co., Ltd.);
probe MGB-iSP2 (Shanghai Baili Biotechnology Limited);
The calibrators used in this example were constructed by conventional molecular cloning methods.
Main equipment: quantitative PCR instrument (Applied Biosystems 7300 PLUS).
Experimental method:
The linkers in Table 13 were diluted to 2 nM, and 2 μL was loaded into each PCR for background detection. The ligation product detection PCR system is shown in Table 14 and the library amplification PCR system in Table 15.
TABLE 14
[Table 14: image in the original publication, not reproduced here]
TABLE 15
[Table 15: image in the original publication, not reproduced here]
The PCR program is shown in Table 16.
TABLE 16
[Table 16: image in the original publication, not reproduced here]
Experimental results:
The results are shown in Tables 17 and 18.
TABLE 17 Ligation product detection PCR system
Sample | Ct value | Result (copies)
Calibrator S5 | 19.47 | 200000
Calibrator S4 | 22.87 | 20000
Calibrator S3 | 26.05 | 2000
Calibrator S2 | 29.30 | 200
Calibrator S1 | 32.67 | 20
NTC | UD | UD
Continuous random UMI linker 2 | 24.51 | 5981.6
UMI linker 6 of the invention | 36.17 | 1.7
UMI linker 7 of the invention | 33.89 | 8.3
UMI linker 8 of the invention | 31.21 | 54.5
UMI linker 9 of the invention | 32.44 | 23.0
TABLE 18 Library amplification PCR system
[Table 18: image in the original publication, not reproduced here]
Conclusion:
The results in Tables 17 and 18 correspond to the PCR systems in Tables 14 and 15, respectively. Non-UMI linkers cannot meet the invention's library construction requirement of distinguishing single molecules in a sample, so they were not included among the candidate linkers in this example. As the copy numbers in Tables 17 and 18 show, the conventional UMI linker with a contiguous random sequence has a high background in both the ligation product detection system and the library amplification system, and does not meet the library construction requirement of the invention. The UMI linkers designed according to the invention, with random sequences spaced by fixed sequences, show greatly reduced background. In particular, only 1.7 copies were detected for UMI linker 6 of the invention (SEQ ID NO:15) in the ligation product detection PCR system, and no background at all was detected in the library amplification PCR system. This linker can therefore distinguish single molecules in a sample while forming almost no linker-primer dimer, and it meets the invention's requirement of improving the library construction success rate of next-generation sequencing.
Sequence listing
<110> Shanghai Jundi Gene Technology Co., Ltd.
<120> Method and system for improving the library construction success rate of next generation sequencing
<160>22
<170>SIPOSequenceListing 1.0
<210>1
<211>44
<212>DNA
<213> Artificial Sequence (Artificial Sequence)
<400>1
gatcgttctc cttactgagt cggagacacg cagggatgag atgg 44
<210>2
<211>62
<212>DNA
<213> Artificial Sequence (Artificial Sequence)
<400>2
gtcnnnnnnn natcgttctc cttactgagt cggagacacg cagggatgag atggctgcag 60
ag 62
<210>3
<211>66
<212>DNA
<213> Artificial Sequence (Artificial Sequence)
<400>3
gtcnntnnca nncnnatcgt tctccttact gagtcggaga cacgcaggga tgagatggct 60
gcagag 66
<210>4
<211>69
<212>DNA
<213> Artificial Sequence (Artificial Sequence)
<400>4
gtcntncnan cnangntnat cgttctcctt actgagtcgg agacacgcag ggatgagatg 60
gctgcagag 69
<210>5
<211>66
<212>DNA
<213> Artificial Sequence (Artificial Sequence)
<400>5
gtcnnntcnn ancnnatcgt tctccttact gagtcggaga cacgcaggga tgagatggct 60
gcagag 66
<210>6
<211>67
<212>DNA
<213> Artificial Sequence (Artificial Sequence)
<400>6
gtcnnacnng tnnannatcg ttctccttac tgagtcggag acacgcaggg atgagatggc 60
tgcagag 67
<210>7
<211>68
<212>DNA
<213> Artificial Sequence (Artificial Sequence)
<400>7
gtcnnacnna anngtnnatc gttctcctta ctgagtcgga gacacgcagg gatgagatgg 60
ctgcagag 68
<210>8
<211>22
<212>DNA
<213> Artificial Sequence (Artificial Sequence)
<400>8
ctctctatgg gcagtcggtg at 22
<210>9
<211>19
<212>DNA
<213> Artificial Sequence (Artificial Sequence)
<400>9
tactggtgaa aacaccgca 19
<210>10
<211>17
<212>DNA
<213> Artificial Sequence (Artificial Sequence)
<400>10
ccatctcatc cctgcgt 17
<210>11
<211>14
<212>DNA
<213> Artificial Sequence (Artificial Sequence)
<400>11
ccatctcatc cctg 14
<210>12
<211>19
<212>DNA
<213> Artificial Sequence (Artificial Sequence)
<400>12
ctctgcagcc atctcatcc 19
<210>13
<211>22
<212>DNA
<213> Artificial Sequence (Artificial Sequence)
<400>13
tccgactcag taaggagaac ga 22
<210>14
<211>16
<212>DNA
<213> Artificial Sequence (Artificial Sequence)
<400>14
cgtgtctccg actcag 16
<210>15
<211>81
<212>DNA
<213> Artificial Sequence (Artificial Sequence)
<400>15
gnntgnntgn ntgnnctgtc tcttatacac atctccgagc ccacgagacc gagtaatatc 60
tcgtatgccg tcttctgctt g 81
<210>16
<211>87
<212>DNA
<213> Artificial Sequence (Artificial Sequence)
<400>16
gnntgatnnt gatnntgatn nctgtctctt atacacatct ccgagcccac gagaccgagt 60
aatatctcgt atgccgtctt ctgcttg 87
<210>17
<211>85
<212>DNA
<213> Artificial Sequence (Artificial Sequence)
<400>17
gnnntgnnnt gnnntgnnnc tgtctcttat acacatctcc gagcccacga gaccgagtaa 60
tatctcgtat gccgtcttct gcttg 85
<210>18
<211>83
<212>DNA
<213> Artificial Sequence (Artificial Sequence)
<400>18
gnnntgnntg nntgnnnctg tctcttatac acatctccga gcccacgaga ccgagtaata 60
tctcgtatgc cgtcttctgc ttg 83
<210>19
<211>75
<212>DNA
<213> Artificial Sequence (Artificial Sequence)
<400>19
gnnnnnnnnc tgtctcttat acacatctcc gagcccacga gaccgagtaa tatctcgtat 60
gccgtcttct gcttg 75
<210>20
<211>19
<212>DNA
<213> Artificial Sequence (Artificial Sequence)
<400>20
tcgtcggcag cgtcagatg 19
<210>21
<211>24
<212>DNA
<213> Artificial Sequence (Artificial Sequence)
<400>21
caagcagaag acggcatacg agat 24
<210>22
<211>17
<212>DNA
<213> Artificial Sequence (Artificial Sequence)
<400>22
tctcgtgggc tcggaga 17
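Each entry in the listing above declares its length in the <211> field and prints the residues in blocks of ten with a running position count at the end of each line. As a minimal sketch, the entries can be checked for internal consistency as follows; the helper name is hypothetical, and the two entries shown are copied from SEQ ID NO: 3 and SEQ ID NO: 15.

```python
def parse_sequence_lines(lines):
    """Join the residue lines of one <400> entry into a single string.

    Whitespace between blocks is dropped, and purely numeric tokens
    (the running position counts at line ends) are ignored.
    """
    residues = []
    for line in lines:
        for token in line.split():
            if not token.isdigit():
                residues.append(token)
    return "".join(residues)


# (declared <211> length, residue lines) copied from the sequence listing.
ENTRIES = {
    3: (66, [
        "gtcnntnnca nncnnatcgt tctccttact gagtcggaga cacgcaggga tgagatggct 60",
        "gcagag 66",
    ]),
    15: (81, [
        "gnntgnntgn ntgnnctgtc tcttatacac atctccgagc ccacgagacc gagtaatatc 60",
        "tcgtatgccg tcttctgctt g 81",
    ]),
}

for seq_id, (declared_length, lines) in ENTRIES.items():
    sequence = parse_sequence_lines(lines)
    random_bases = sequence.count("n")  # 'n' marks a random (degenerate) base
    status = "OK" if len(sequence) == declared_length else "MISMATCH"
    print(f"SEQ ID NO: {seq_id}: length {len(sequence)} "
          f"(declared {declared_length}, {status}), {random_bases} random bases")
```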

Claims (9)

1. A method for improving the library construction success rate of next generation sequencing, comprising the following steps:
1) contacting a linker comprising a molecular tag with a single-stranded template to obtain a ligation product;
2) contacting a library amplification primer with the ligation product to obtain a library product;
wherein the linker is ligated to the single-stranded template by single-stranded, single-ended ligation, and each molecular tag comprises a non-contiguous random sequence.
2. The method for improving the library construction success rate of next generation sequencing of claim 1, wherein:
the molecular tag further comprises a fixed sequence;
the random sequence and the fixed sequence alternate with each other;
a single molecular tag contains at least one segment of the fixed sequence.
3. The method for improving the library construction success rate of next generation sequencing of claim 2, wherein:
a single segment of the fixed sequence comprises 1-4 bases;
preferably, when two adjacent fixed sequences separated by the random sequence each consist of 1 base, the adjacent fixed bases are not the same;
and/or, when a single segment of the fixed sequence comprises 2-4 bases, the fixed sequence is not a repeat of a single one of the four bases;
preferably, the fixed sequence occurs 1 to 7 times in a single molecular tag.
4. The method for improving the library construction success rate of next generation sequencing of claim 2, wherein:
a single segment of the random sequence comprises 1-4 bases, and the total number of bases of the random sequence is 8-12;
and/or, the linker comprising the molecular tag pairs with no more than 8 consecutive bases among the last ten bases at the 3' end of the library amplification primer of step 2).
5. The method of claim 2,
further comprising performing background detection on the linker comprising the molecular tag using the library amplification primer of step 2);
selecting the linker when its background is below a threshold, and discarding the linker when its background is above the threshold;
wherein the threshold is set lower than the background value of a linker whose molecular tag consists of a continuous random sequence.
6. An oligonucleotide linker for improving the library construction success rate of next generation sequencing,
comprising a molecular tag sequence suitable for use in the method for improving the library construction success rate of next generation sequencing according to any one of claims 1-5.
7. The oligonucleotide linker for improving the library construction success rate of next generation sequencing of claim 6,
further comprising a sample tag sequence and a universal sequencing sequence.
8. The oligonucleotide linker for improving the library construction success rate of next generation sequencing of claim 6, wherein:
the 5' end of the oligonucleotide linker is modified; preferably, the modification is a phosphorylation modification;
and/or, the oligonucleotide linker is single-stranded or has a cohesive end.
9. The oligonucleotide linker for improving the library construction success rate of next generation sequencing of claim 6, comprising the sequence set forth in any one of SEQ ID NOs: 3-7 and SEQ ID NOs: 15-18.
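The structural limits on the molecular tag recited in claims 2-4 can be checked mechanically. The following is a minimal sketch, assuming the tag is written with 'n' for random bases and a, c, g, t for fixed bases; the function names are hypothetical, and the tag boundaries used in the examples (for instance, taking "nntnncanncnn" as the tag region of SEQ ID NO: 3) are an assumption, since the listing does not mark where the tag starts and ends.

```python
from itertools import groupby


def split_segments(tag):
    """Split a tag such as 'nntnncanncnn' into (is_random, segment) runs."""
    return [(is_random, "".join(run))
            for is_random, run in groupby(tag.lower(), key=lambda b: b == "n")]


def check_tag(tag):
    """Return a list of rule violations; an empty list means the tag passes."""
    segments = split_segments(tag)
    random_runs = [seg for is_random, seg in segments if is_random]
    fixed_runs = [seg for is_random, seg in segments if not is_random]
    total_random = sum(len(seg) for seg in random_runs)
    violations = []

    if not fixed_runs:
        violations.append("no fixed sequence (claim 2)")
    if len(fixed_runs) > 7:
        violations.append("more than 7 fixed segments (claim 3, preferred)")
    if any(len(seg) > 4 for seg in random_runs):
        violations.append("random segment longer than 4 bases (claim 4)")
    if not 8 <= total_random <= 12:
        violations.append("total random bases outside 8-12 (claim 4)")
    for seg in fixed_runs:
        if len(seg) > 4:
            violations.append(f"fixed segment '{seg}' longer than 4 bases (claim 3)")
        elif len(seg) >= 2 and len(set(seg)) == 1:
            violations.append(f"fixed segment '{seg}' repeats a single base (claim 3)")
    # Consecutive entries in fixed_runs are always separated by a random run.
    for left, right in zip(fixed_runs, fixed_runs[1:]):
        if len(left) == 1 and len(right) == 1 and left == right:
            violations.append("adjacent single-base fixed segments are identical "
                              "(claim 3, preferred)")
    return violations


def tag_diversity(tag):
    """Number of distinct tags the pattern can encode (4 choices per random base)."""
    return 4 ** tag.lower().count("n")


for pattern in ["nntnncanncnn", "nntgnntgnntgnn", "nnnnnnnn"]:
    print(pattern, check_tag(pattern) or "passes", tag_diversity(pattern))
```

The continuous pattern "nnnnnnnn" fails because it has no fixed sequence and its single random run exceeds 4 bases, while the interrupted patterns keep the same number of random bases and therefore the same tag diversity.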
CN201911277727.3A 2019-12-10 2019-12-10 Method and system for improving next generation sequencing library establishment success rate Active CN111020005B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911277727.3A CN111020005B (en) 2019-12-10 2019-12-10 Method and system for improving next generation sequencing library establishment success rate

Publications (2)

Publication Number Publication Date
CN111020005A true CN111020005A (en) 2020-04-17
CN111020005B CN111020005B (en) 2023-06-30

Family

ID=70206557

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911277727.3A Active CN111020005B (en) 2019-12-10 2019-12-10 Method and system for improving next generation sequencing library establishment success rate

Country Status (1)

Country Link
CN (1) CN111020005B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107083427A (en) * 2016-04-01 2017-08-22 广州市基准医疗有限责任公司 The DNA cloning technology of DNA ligase mediation
CN108504649A (en) * 2017-02-24 2018-09-07 上海基致生物医药科技有限公司 Banking process, kit and detection method is sequenced in coding bis- generations of PCR
CN109797197A (en) * 2019-02-11 2019-05-24 杭州纽安津生物科技有限公司 It a kind of single chain molecule label connector and single stranded DNA banking process and its is applied in detection Circulating tumor DNA
WO2019114146A1 (en) * 2017-12-15 2019-06-20 格诺思博生物科技南通有限公司 Method for enriching gene target regions and library construction kit
CN110438121A (en) * 2018-05-03 2019-11-12 深圳华大临床检验中心 Connector, connector library and its application

Also Published As

Publication number Publication date
CN111020005B (en) 2023-06-30

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant