CN107354209B - Combinatorial tags, linkers and methods for determining nucleic acid sequences containing low frequency mutations - Google Patents

Info

Publication number
CN107354209B
Authority
CN
China
Prior art keywords
tag
nucleic acid
molecular
library
label
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710573056.XA
Other languages
Chinese (zh)
Other versions
CN107354209A (en)
Inventor
高晓峘
曾晓静
李胜
张印新
韩颖鑫
何哲
王佳伟
夏伟成
蒋馥蔓
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Jingke Medical Laboratory Co ltd
Original Assignee
Guangzhou Jingke Medical Laboratory Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Jingke Medical Laboratory Co ltd
Priority to CN201710573056.XA
Priority to PCT/CN2017/100425 (WO2019010776A1)
Publication of CN107354209A
Application granted
Publication of CN107354209B
Legal status: Active
Anticipated expiration

Classifications

    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6869: Methods for sequencing

Landscapes

  • Life Sciences & Earth Sciences (AREA)
  • Chemical & Material Sciences (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Organic Chemistry (AREA)
  • Zoology (AREA)
  • Wood Science & Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Microbiology (AREA)
  • Immunology (AREA)
  • Biotechnology (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Analytical Chemistry (AREA)
  • Physics & Mathematics (AREA)
  • Biochemistry (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Genetics & Genomics (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The invention provides a combined tag, an adaptor containing the combined tag and a composition thereof, and a method for determining a nucleic acid sequence containing a low-frequency mutation in a target region of a sample to be tested. The combined tag comprises a molecular tag and a library tag, with the bases of the molecular tag interleaved with the bases of the library tag. The invention combines the library tag with the random molecular tag, using the defined base sequence of the library tag, which identifies different samples, to break up the random molecular tag. This limits the number of consecutive identical bases without reducing the variety of molecular tags, without adding length to either tag, and without wasting sequencing data.

Description

Combinatorial tags, linkers and methods for determining nucleic acid sequences containing low frequency mutations
Technical Field
The invention relates to the technical field of nucleic acid sequencing, in particular to a combined tag, an adaptor containing the combined tag and a composition thereof, and a method for determining a nucleic acid sequence containing a low-frequency mutation in a target region of a sample to be tested.
Background
High-throughput sequencing is currently the most widely used sequencing technology, but some sequencing errors remain unavoidable; they occur at a rate of 0.1-0.2% or higher. The DNA polymerase used in PCR also has an error rate, of $10^{-7}$ to $10^{-5}$, and this error rate increases with the number of PCR cycles.
To detect base mutations with a frequency below 0.1% (low-frequency mutations) or sequencing errors, researchers developed molecular tagging, in which a specific sequence is added to one or both ends of each sequencing template before PCR. Each position of the molecular tag can be any one of the four bases A, T, C and G, and the tag length is chosen according to the needs of the experiment, so a molecular tag of length n can take $4^n$ different forms. If the molecular tags are distributed completely at random, their diversity ensures that each original template in the original library is unique once a tag is attached. Each original template then gives rise to its own "molecular cluster" during subsequent PCR, and, in the absence of sequencing or PCR errors, the sequences in each cluster are error-free copies of the positive and negative strands of the original template.
In theory, the base at each position of the molecular tag is distributed completely at random. In primer synthesis, however, even when the four bases A, T, C and G are added in equal amounts at a given position, their frequencies at that position are not exactly equal, because the four bases differ in the energy required for synthesis or in synthesis efficiency. Runs of identical bases, e.g. 8 A's or 8 G's, may therefore occur, so that the actual number of random molecular tag species falls short of the theoretical number.
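The effect of runs of identical bases on the usable tag space can be illustrated with a minimal Python sketch. The function names `longest_run` and `count_tags`, and the 6-base tag length, are assumptions chosen for illustration; they do not come from the patent.

```python
from itertools import product

BASES = "ATCG"

def longest_run(seq: str) -> int:
    """Return the length of the longest stretch of identical consecutive bases."""
    best = run = 0
    prev = None
    for base in seq:
        run = run + 1 if base == prev else 1
        prev = base
        best = max(best, run)
    return best

def count_tags(length: int, max_run: int) -> int:
    """Count tags of the given length whose longest run does not exceed max_run."""
    return sum(
        1
        for tag in product(BASES, repeat=length)
        if longest_run("".join(tag)) <= max_run
    )

# Hypothetical illustration: 6-base random tags.
total = 4 ** 6                 # theoretical number of 6-base tags
without_long_runs = count_tags(6, 2)   # tags with no run longer than 2 bases
```

Comparing `without_long_runs` with `total` shows how many random tags would have to be discarded if long runs were simply filtered out, which is the loss the combined tag is meant to avoid.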
Runs of identical bases not only increase the likelihood of sequencing errors but also increase the proportion of dominant molecular sequences. When different molecules with very similar sequences are attached to the same tag sequence, the analyst cannot tell whether a given sequence is a molecule that is genuinely present, the result of a sequencing error, or a low-frequency mutation. Furthermore, when a molecular cluster carrying a low-frequency mutation shares its tag with a cluster of normal abundance, the low-frequency mutation may be dismissed as a sequencing or PCR error and missed. The non-randomness of molecular tags thus reduces their utility and can even limit their application. To address the problem, some researchers insert U bases into the molecular tag, e.g. NNNUUUNNNUNNN, to avoid runs of identical bases, which impair the performance of the molecular tag. However, this approach lengthens the molecular tag, and the U bases do not help distinguish different molecules during analysis, i.e. they do not function as molecular tag bases. It therefore adds ineffective tag length, wastes sequencing length, and raises sequencing cost.
Disclosure of Invention
The invention aims to provide a tag composition and a detection method that effectively control the number of bases in the tag and reduce the waste of sequencing data.
The invention provides a combined tag comprising a molecular tag and a library tag, wherein the bases of the molecular tag are interleaved with the bases of the library tag.
In another aspect, the invention provides an adaptor containing the combined tag, wherein the combined tag is located at any position of the adaptor other than the overhanging T and the 20 bp at the non-overhanging end.
The invention also provides a method for determining a nucleic acid sequence containing a low-frequency mutation in a target region of a sample to be tested, comprising the following steps:
S1, ligating the adaptor to the target-region nucleic acid of the sample to be tested, and PCR-amplifying the adaptor-ligated nucleic acid to obtain amplification products, which form the target-region nucleic acid sequencing library of the sample;
S2, sequencing the target-region nucleic acid sequencing library of the sample to obtain sequenced nucleic acid sequences;
S3, classifying the sequenced nucleic acid sequences by the molecular tag contained in the adaptor, placing sequences that carry the same molecular tag in the same nucleic acid sequence set;
S4, comparing the sequenced nucleic acid sequences within each set with one another, and counting the base types and their frequencies at each base position in the set;
S5, obtaining, by data analysis of the base types and frequencies at each base position, the nucleic acid sequence in the set that has the correct base at each position;
S6, comparing that nucleic acid sequence with the remaining sequences in the set, or with the sequences in parallel sets, to obtain the nucleic acid sequence containing the low-frequency mutation.
The invention combines the library tag with the random molecular tag, using the defined base sequence of the library tag, which identifies different samples, to break up the random molecular tag. This limits the number of consecutive identical bases without reducing the variety of molecular tags, without adding length to either tag, and without wasting sequencing data.
Drawings
The above and/or additional aspects and advantages of the present invention will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings, in which:
FIG. 1 is a flowchart of a method for determining that a target region of a sample contains a low-frequency mutated nucleic acid sequence according to an embodiment of the present invention.
FIG. 2 is a schematic diagram of the structure of a molecular tag in a fully complementary double-stranded linker according to an embodiment of the present invention.
FIG. 3 is a schematic structural diagram of a molecular tag located at a complementary end of a complementary-end-open Y-type linker according to an embodiment of the present invention.
FIG. 4 is a schematic structural diagram of a molecular tag located at an open end in a Y-type linker with a complementary end and an open end according to an embodiment of the present invention.
FIG. 5 is a schematic diagram of a Y-shaped structure in which a molecular tag is not located on a linker but can be introduced into the linker by PCR in an embodiment of the present invention.
Detailed Description
The following describes embodiments of the present invention in detail. The embodiments described below with reference to the accompanying drawings are illustrative only for the purpose of explaining the present invention, and are not to be construed as limiting the present invention.
It is to be noted that, in the description of the present invention, "a plurality" means two or more unless otherwise specified.
The invention provides a combined label, which comprises a molecular label and a library label, wherein bases of the library label are arranged with the molecular label in a crossed way.
The library tag is a tag sequence used to identify different sample libraries during sequencing, so that multiple libraries can be sequenced together. For example, when the sequencing platform is Proton, the library tag used is the barcode; when the sequencing platform is Illumina, the library tag used is the index.
According to a specific embodiment of the invention, every 1-2 bases of the library tag are interleaved with every 1-3 bases of the molecular tag, as detailed below.
First, every 1 base of the library tag is interleaved with every 1 base of the molecular tag, and the combined tag has at most 2 consecutive identical bases. Reference is made to the following specific examples:
1. When the combined tag is $A N_2 T N_4 G N_6 C N_8 \ldots A N_{n-6} T N_{n-4} G N_{n-2} C N_n$, from left to right, positions 1, 3, 5, 7, 9, ..., n-3, n-1 are the library tag (ATGC...ATGC), and positions 2, 4, 6, 8, 10, ..., n-2, n are the molecular tag ($N_2 N_4 N_6 N_8 \ldots N_{n-6} N_{n-4} N_{n-2} N_n$).
Each base of the molecular tag differs from the library-tag base immediately preceding it; e.g., in $A N_2 T N_4 G N_6 C N_8 \ldots$, $N_2$ cannot be A and may be any of T, C, G, and $N_4$ cannot be T and may be any of A, C, G.
For 1 defined library tag, the number of molecular tag combinations is $3^{n/2}$. For example, when n = 16, the library tag is 8 bp long, the molecular tag is 8 bp long, and the number of molecular tag sequence combinations is $3^8 = 6561$.
2. When the combined tag is $N_1 A N_3 T N_5 G N_7 \ldots C N_{n-7} A N_{n-5} T N_{n-3} G N_{n-1} C$, from left to right, positions 2, 4, 6, 8, 10, ..., n are the library tag, and positions 1, 3, 5, 7, 9, ..., n-3, n-1 are the molecular tag.
Each base of the molecular tag differs from the library-tag base immediately following it; e.g., in $N_1 A N_3 T N_5 G N_7 \ldots$, $N_1$ cannot be A and may be any of T, C, G, and $N_3$ cannot be T and may be any of A, C, G.
For 1 defined library tag, the number of molecular tag combinations is $3^{n/2}$. For example, when n = 16, the library tag is 8 bp long, the molecular tag is 8 bp long, and the number of molecular tag sequence combinations is $3^8 = 6561$.
3. When the combined tag is $A N_2 T N_4 G N_6 C N_8 \ldots A N_{n-7} T N_{n-5} G N_{n-3} C N_{n-1} A$, from left to right, positions 1, 3, 5, 7, 9, ..., n-2, n are the library tag, and positions 2, 4, 6, 8, 10, ..., n-1 are the molecular tag.
Each base of the molecular tag differs from the library-tag base immediately preceding it; e.g., in $A N_2 T N_4 G N_6 C N_8 \ldots$, $N_2$ cannot be A and may be any of T, C, G, and $N_4$ cannot be T and may be any of A, C, G.
For 1 defined library tag, the number of molecular tag combinations is $3^{(n-1)/2}$. For example, when n = 17, the library tag is 9 bp long, the molecular tag is 8 bp long, and the number of molecular tag sequence combinations is $3^8 = 6561$.
4. When the combined tag is $N_1 A N_3 T N_5 G N_7 \ldots C N_{n-8} A N_{n-6} T N_{n-4} G N_{n-2} C N_n$, from left to right, positions 2, 4, 6, 8, 10, ..., n-1 are the library tag, and positions 1, 3, 5, 7, 9, ..., n-2, n are the molecular tag.
Each base of the molecular tag differs from the library-tag base immediately following it; e.g., in $N_1 A N_3 T N_5 G N_7 \ldots$, $N_1$ cannot be A and may be any of T, C, G, and $N_3$ cannot be T and may be any of A, C, G.
For 1 defined library tag, the number of molecular tag combinations is $3^{(n+1)/2}$. For example, when n = 17, the library tag is 8 bp long, the molecular tag is 9 bp long, and the number of molecular tag sequence combinations is $3^9 = 19683$.
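The single-base interleaving of example 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the helper names `interleave` and `allowed_molecular_tags` are hypothetical, the library tag `ATGCATGC` follows the ATGC...ATGC pattern quoted in the text, and the rule applied is the one stated for example 1 (each molecular-tag base differs from the library-tag base immediately preceding it).

```python
from itertools import product

BASES = "ATCG"

def interleave(library: str, molecular: str) -> str:
    """Alternate library and molecular bases, library base first (example 1 layout)."""
    if len(library) != len(molecular):
        raise ValueError("this layout needs library and molecular tags of equal length")
    return "".join(lib + mol for lib, mol in zip(library, molecular))

def allowed_molecular_tags(library: str):
    """Yield molecular tags whose i-th base differs from the i-th library base."""
    choices = [[b for b in BASES if b != lib] for lib in library]
    for combo in product(*choices):
        yield "".join(combo)

library = "ATGCATGC"                       # 8 bp library tag, so n = 16
tags = list(allowed_molecular_tags(library))
combined = interleave(library, tags[0])    # one 16-base combined tag
```

With three allowed bases at each of the 8 molecular positions, the enumeration yields $3^8$ tags, matching the count given for n = 16.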
Second, every 1-2 bases of the library tag are interleaved with every 1-2 bases of the molecular tag, and the combined tag has at most 3 consecutive identical bases.
Further, every 1-2 bases of the library tag are interleaved with every 1 base of the molecular tag, and the combined tag has at most 3 consecutive identical bases. Reference is made to the following specific examples:
5. When the combined tag is $A T N_3 G C N_6 \ldots A C N_{n-3} T C N_n$, from left to right, positions 1, 2, 4, 5, 7, 8, ..., n-2, n-1 are the library tag, and positions 3, 6, 9, 12, 15, 18, ..., n-3, n are the molecular tag.
Each base of the molecular tag differs from the base of any library tag adjacent to it.
For 1 defined library tag, the number of molecular tag combinations is $4^{n/3}$. When n = 18, the library tag is 12 bp long, the molecular tag is 6 bp long, and the number of molecular tag sequence combinations is $4^6 = 4096$.
6. When the combined tag is $N_1 A T N_4 G C \ldots N_{n-6} A C N_{n-3} T G N_n$, from left to right, positions 2, 3, 5, 6, 8, 9, ..., n-2, n-1 are the library tag, and positions 1, 4, 7, 10, 13, 16, 19, ..., n-6, n-3, n are the molecular tag.
Each base of the molecular tag differs from the base of any library tag adjacent to it.
For 1 defined library tag, the number of molecular tag combinations is $4^{(n+2)/3}$. When n = 19, the library tag is 12 bp long, the molecular tag is 7 bp long, and the number of molecular tag sequence combinations is $4^7 = 16384$.
7. When the combined tag is $A T N_3 G C N_6 \ldots A C N_{n-4} T G N_{n-1} C$, from left to right, positions 1, 2, 4, 5, 7, 8, ..., n-2, n are the library tag, and positions 3, 6, 9, 12, 15, 18, ..., n-4, n-1 are the molecular tag.
Each base of the molecular tag differs from the base of any library tag adjacent to it.
For 1 defined library tag, the number of molecular tag combinations is $4^{(n-1)/3}$. When n = 19, the library tag is 13 bp long, the molecular tag is 6 bp long, and the number of molecular tag sequence combinations is $4^6 = 4096$.
8. When the combined tag is $T N_2 G C N_5 A C N_8 \ldots T G N_{n-2} C T$, from left to right, positions 1, 3, 4, 6, 7, ..., n-4, n-3, n-1, n are the library tag, and positions 2, 5, 8, 11, 14, 17, ..., n-2 are the molecular tag.
Each base of the molecular tag differs from the base of any library tag adjacent to it.
For 1 defined library tag, the number of molecular tag combinations is $4^{(n-1)/3}$. When n = 13, the library tag is 9 bp long, the molecular tag is 4 bp long, and the number of molecular tag sequence combinations is $4^4 = 256$.
Further, every 1 base of the library tag is interleaved with every 1-2 bases of the molecular tag, and the combined tag has at most 3 consecutive identical bases. Reference is made to the following specific examples:
9. When the combined tag is $A N_2 N_3 T N_5 N_6 \ldots C N_{n-4} N_{n-3} G N_{n-1} N_n$, from left to right, positions 1, 4, 7, ..., n-5, n-2 are the library tag, and positions 2, 3, 5, 6, ..., n-4, n-3, n-1, n are the molecular tag.
Each base of the molecular tag may be any one of the four bases.
For 1 defined library tag, the number of molecular tag combinations is $4^{2n/3}$. When n = 24, the library tag is 8 bp long, the molecular tag is 16 bp long, and the number of molecular tag sequence combinations is $4^{16} = 4294967296$.
10. When the combined tag is $A N_2 N_3 T N_5 N_6 \ldots C N_{n-5} N_{n-4} G N_{n-2} N_{n-1} T$, from left to right, positions 1, 4, 7, ..., n-6, n-3, n are the library tag, and positions 2, 3, 5, 6, ..., n-5, n-4, n-2, n-1 are the molecular tag.
Each base of the molecular tag may be any one of the four bases.
For 1 defined library tag, the number of molecular tag combinations is $4^{2(n-1)/3}$. When n = 25, the library tag is 9 bp long, the molecular tag is 16 bp long, and the number of molecular tag sequence combinations is $4^{16} = 4294967296$.
11. When the combined tag is $N_1 N_2 T N_4 N_5 A \ldots C N_{n-5} N_{n-4} G N_{n-2} N_{n-1} T$, from left to right, positions 3, 6, 9, ..., n-6, n-3, n are the library tag, and positions 1, 2, 4, 5, 7, ..., n-5, n-4, n-2, n-1 are the molecular tag.
Each base of the molecular tag may be any one of the four bases.
For 1 defined library tag, the number of molecular tag combinations is $4^{2n/3}$. When n = 24, the library tag is 8 bp long, the molecular tag is 16 bp long, and the number of molecular tag sequence combinations is $4^{16} = 4294967296$.
12. When the combined tag is $N_1 N_2 T N_4 N_5 A \ldots C N_{n-4} N_{n-3} G N_{n-1} N_n$, from left to right, positions 3, 6, 9, ..., n-5, n-2 are the library tag, and positions 1, 2, 4, 5, 7, ..., n-4, n-3, n-1, n are the molecular tag.
Each base of the molecular tag may be any one of the four bases; for example, in $N_1 N_2 T N_4 N_5 A \ldots$, each N may be any one of A, T, C, G.
For 1 defined library tag, the number of molecular tag combinations is $4^{2(n+1)/3}$. When n = 26, the library tag is 8 bp long, the molecular tag is 18 bp long, and the number of molecular tag sequence combinations is $4^{18} = 68719476736$.
13. When the combined tag is $A N_2 T N_4 N_5 G N_7 C N_9 N_{10} \ldots G N_{n-3} C N_{n-1} N_n$, from left to right, positions 1, 3, 6, 8, ..., n-4, n-2 are the library tag, and positions 2, 4, 5, 7, 9, ..., n-3, n-1, n are the molecular tag.
Each base of the molecular tag may be any one of the four bases.
For 1 defined library tag, the number of molecular tag combinations is $4^{4n/7}$. When n = 21, the library tag is 9 bp long, the molecular tag is 12 bp long, and the number of molecular tag sequence combinations is $4^{12} = 16777216$.
14. When the combined tag is $A N_2 N_3 T N_5 G N_7 N_8 C N_{10} \ldots G N_{n-3} N_{n-2} C N_n$, from left to right, positions 1, 4, 6, 9, ..., n-4, n-1 are the library tag, and positions 2, 3, 5, 7, 8, ..., n-3, n-2, n are the molecular tag.
Each base of the molecular tag may be any one of the four bases.
For 1 defined library tag, the number of molecular tag combinations is $4^{4n/7}$. When n = 21, the library tag is 9 bp long, the molecular tag is 12 bp long, and the number of molecular tag sequence combinations is $4^{12} = 16777216$.
15. When the combined tag is $A N_2 N_3 T N_5 G N_7 N_8 C N_{10} \ldots G N_{n-4} N_{n-3} C N_{n-1} T$, from left to right, positions 1, 4, 6, 9, ..., n-5, n-2, n are the library tag, and positions 2, 3, 5, 7, 8, ..., n-4, n-3, n-1 are the molecular tag.
Each base of the molecular tag may be any one of the four bases.
For 1 defined library tag, the number of molecular tag combinations is $4^{4(n-1)/7}$. When n = 22, the library tag is 10 bp long, the molecular tag is 12 bp long, and the number of molecular tag sequence combinations is $4^{12} = 16777216$.
Further, every 1-2 bases of the library tag are interleaved with every 1-2 bases of the molecular tag, and the combined tag has at most 3 consecutive identical bases. Reference is made to the following specific examples:
16. When the combined tag is $A N_2 N_3 T G N_6 C N_8 N_9 A T N_{12} \ldots G N_{n-4} N_{n-3} C A N_n$, from left to right, positions 1, 4, 5, 7, 10, 11, ..., n-5, n-2, n-1 are the library tag, and positions 2, 3, 6, 8, 9, 12, ..., n-4, n-3, n are the molecular tag.
Each base of the molecular tag may be any one of the four bases.
For 1 defined library tag, the number of molecular tag combinations is $4^{n/2}$. When n = 16, the library tag is 8 bp long, the molecular tag is 8 bp long, and the number of molecular tag sequence combinations is $4^8 = 65536$.
17. When the combined tag is $A T N_3 N_4 G N_6 C T N_9 N_{10} A N_{12} \ldots G C N_{n-3} N_{n-2} A N_n$, from left to right, positions 1, 2, 5, 7, 8, 11, ..., n-5, n-4, n-1 are the library tag, and positions 3, 4, 6, 9, 10, 12, ..., n-3, n-2, n are the molecular tag.
Each base of the molecular tag may be any one of the four bases.
For 1 defined library tag, the number of molecular tag combinations is $4^{n/2}$. When n = 16, the library tag is 8 bp long, the molecular tag is 8 bp long, and the number of molecular tag sequence combinations is $4^8 = 65536$.
Third, every 1-2 bases of the library tag are interleaved with every 2-3 bases of the molecular tag, and the combined tag has at most 4 consecutive identical bases. Reference is made to the following specific examples:
18. When the combined tag is $A N_2 N_3 N_4 T G N_7 N_8 C N_{10} N_{11} N_{12} A T \ldots A N_{n-6} N_{n-5} N_{n-4} T G N_{n-1} N_n$, from left to right, positions 1, 5, 6, 9, 13, 14, ..., n-7, n-3, n-2 are the library tag, and positions 2, 3, 4, 7, 8, 10, 11, 12, ..., n-6, n-5, n-4, n-1, n are the molecular tag.
Each base of the molecular tag may be any one of the four bases.
For 1 defined library tag, the number of molecular tag combinations is $4^{5n/8}$. When n = 24, the library tag is 9 bp long, the molecular tag is 15 bp long, and the number of molecular tag sequence combinations is $4^{15} = 1073741824$.
19. When the combined tag is $A T N_3 N_4 N_5 G C N_8 N_9 N_{10} A T N_{13} N_{14} N_{15} \ldots G C N_{n-7} N_{n-6} N_{n-5} A T N_{n-2} N_{n-1} N_n$, from left to right, positions 1, 2, 6, 7, 11, 12, ..., n-9, n-8, n-4, n-3 are the library tag, and positions 3, 4, 5, 8, 9, 10, 13, 14, 15, ..., n-7, n-6, n-5, n-2, n-1, n are the molecular tag.
Each base of the molecular tag may be any one of the four bases.
For 1 defined library tag, the number of molecular tag combinations is $4^{3n/5}$. When n = 20, the library tag is 8 bp long, the molecular tag is 12 bp long, and the number of molecular tag sequence combinations is $4^{12} = 16777216$.
Fourth, every 1-2 bases of the library tag are interleaved with every 1-3 bases of the molecular tag, and the combined tag has at most 4 consecutive identical bases. Reference is made to the following specific examples:
20. When the combined tag is $A N_2 N_3 N_4 T G N_7 N_8 C N_{10} \ldots A N_{n-8} N_{n-7} N_{n-6} T G N_{n-3} N_{n-2} C N_n$, from left to right, positions 1, 5, 6, 9, ..., n-9, n-5, n-4, n-1 are the library tag, and positions 2, 3, 4, 7, 8, 10, ..., n-8, n-7, n-6, n-3, n-2, n are the molecular tag.
Each base of the molecular tag may be any one of the four bases.
For 1 defined library tag, the number of molecular tag combinations is $4^{6n/10}$. When n = 20, the library tag is 8 bp long, the molecular tag is 12 bp long, and the number of molecular tag sequence combinations is $4^{12} = 16777216$.
21. When the combined tag is $A T N_3 N_4 N_5 G N_7 A T N_{10} N_{11} N_{12} G N_{14} \ldots A T N_{n-4} N_{n-3} N_{n-2} G N_n$, from left to right, positions 1, 2, 6, 8, 9, 13, ..., n-6, n-5, n-1 are the library tag, and positions 3, 4, 5, 7, 10, 11, 12, 14, ..., n-4, n-3, n-2, n are the molecular tag.
Each base of the molecular tag may be any one of the four bases.
For 1 defined library tag, the number of molecular tag combinations is $4^{4n/7}$. When n = 21, the library tag is 9 bp long, the molecular tag is 12 bp long, and the number of molecular tag sequence combinations is $4^{12} = 16777216$.
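The layouts above differ only in which positions belong to the library tag and which to the molecular tag, so they can be described by a position mask. The following Python sketch is an illustration under assumptions: the mask notation (`L` for a library position, `N` for a molecular position) and the function names are hypothetical and not part of the patent. It uses the layout of example 19 with n = 20 and an invented tag sequence.

```python
def split_combined_tag(tag: str, layout: str) -> tuple[str, str]:
    """Split a combined tag into (library_tag, molecular_tag) using a layout mask.

    layout uses 'L' for a library-tag position and 'N' for a molecular-tag position.
    """
    if len(tag) != len(layout):
        raise ValueError("tag and layout must have the same length")
    library = "".join(b for b, kind in zip(tag, layout) if kind == "L")
    molecular = "".join(b for b, kind in zip(tag, layout) if kind == "N")
    return library, molecular

def unrestricted_combinations(layout: str) -> int:
    """Number of molecular tags when every N position may be any of the 4 bases."""
    return 4 ** layout.count("N")

# Example 19 layout with n = 20: two library bases followed by three molecular bases.
layout_19 = "LLNNN" * 4
library, molecular = split_combined_tag("ATCCCGCAAAATGGGGCTTT", layout_19)
n_combinations = unrestricted_combinations(layout_19)
```

For this layout the mask contains 8 library positions and 12 molecular positions, so the count agrees with the $4^{12}$ stated for n = 20.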
The invention addresses the prior-art practice of inserting U bases into the molecular tag to separate it (NNNUUUNNUUUNNNN) and so avoid runs of identical bases. By combining the library tag with the random molecular tag for the first time, it ensures that no ineffective length is introduced, and the effective molecular tag can be lengthened so that both the library tag and the molecular tag are long enough to meet the requirements of a specific scheme.
According to the specific embodiment of the invention, the length of the molecular tag is 6-18 bp, and the length of the library tag is 8-12 bp.
The invention also provides an adaptor containing the combined tag, wherein the combined tag is located at any position of the adaptor other than the overhanging T and the 20 bp at the non-overhanging end.
According to a specific embodiment of the invention, the adaptor further comprises a discriminating signature sequence of 4 non-repeating bases, said discriminating signature sequence being linked to the 3 'end or the 5' end of the combined tag.
The invention also provides a method for determining a nucleic acid sequence containing a low-frequency mutation in a target region of a sample to be tested, which, as shown in FIG. 1, comprises the following steps:
S1, ligating the adaptor to the target-region nucleic acid of the sample to be tested, and PCR-amplifying the adaptor-ligated nucleic acid to obtain amplification products, which form the target-region nucleic acid sequencing library of the sample;
S2, sequencing the target-region nucleic acid sequencing library of the sample to obtain sequenced nucleic acid sequences;
S3, classifying the sequenced nucleic acid sequences by the molecular tag contained in the adaptor, placing sequences that carry the same molecular tag in the same nucleic acid sequence set;
S4, comparing the sequenced nucleic acid sequences within each set with one another, and counting the base types and their frequencies at each base position in the set;
S5, obtaining, by data analysis of the base types and frequencies at each base position, the nucleic acid sequence in the set that has the correct base at each position;
S6, comparing that nucleic acid sequence with the remaining sequences in the set, or with the sequences in parallel sets, to obtain the nucleic acid sequence containing the low-frequency mutation.
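Steps S3 to S5 can be sketched in Python as follows. This is a minimal illustration, not the patent's analysis pipeline: the text only says "data analysis", so the simple majority rule, the `min_fraction` threshold, the function names, and the toy reads are all assumptions. The sketch also assumes reads within a set are already aligned and of equal length.

```python
from collections import Counter, defaultdict

def group_by_tag(reads):
    """S3: collect (molecular_tag, sequence) reads into sets sharing the same tag."""
    families = defaultdict(list)
    for tag, seq in reads:
        families[tag].append(seq)
    return families

def consensus(seqs, min_fraction=0.5):
    """S4-S5: count bases per position, then keep the majority base.

    A position is reported as 'N' when no base exceeds min_fraction
    (an assumed threshold, not one given in the source).
    """
    called = []
    for column in zip(*seqs):
        base, count = Counter(column).most_common(1)[0]
        called.append(base if count / len(column) > min_fraction else "N")
    return "".join(called)

# Toy data: (molecular tag, read sequence) pairs, invented for illustration.
reads = [
    ("CCAT", "ACGTAC"),
    ("CCAT", "ACGTAC"),
    ("CCAT", "ACGAAC"),   # one read differing at position 4, treated as an error
    ("GTTA", "ACTTAC"),
    ("GTTA", "ACTTAC"),
]
families = group_by_tag(reads)
calls = {tag: consensus(seqs) for tag, seqs in families.items()}
```

In this toy data, the isolated difference inside the `CCAT` set is outvoted, while the difference carried by every read in the `GTTA` set survives into its consensus; comparing consensus sequences across sets corresponds to S6.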
The scheme of the invention is explained below with reference to the examples. Persons skilled in the art will appreciate that the following examples are illustrative only and are not to be construed as limiting the invention. Reagents, sequences (adaptors, tags and primers), software and equipment not specifically described in the following examples are conventional commercial products or open source, unless otherwise stated.
In the description herein, references to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
Example 1 Method for determining a nucleic acid sequence containing a low-frequency mutation in a target region of a sample to be tested
1. Design of combinatorial tags and adapters containing the combinatorial tags.
The combinatorial tags are designed so that the bases of the library tag and the bases of the molecular tag alternate one base at a time, and each combinatorial tag contains at most 2 consecutive identical bases. A set of 16 combinatorial tags was designed according to the experimental requirements, as shown in Table 1:
TABLE 1
Figure BDA0001350145790000141
Figure BDA0001350145790000151
The underlined bases are the molecular tag sequence and the non-underlined bases are the library tag sequence.
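The single-base alternation described in step 1 can be sketched as follows. The two tag sequences are hypothetical and are not taken from Table 1, and starting with a library base is an assumption made only for illustration.

```python
def interleave(library_tag, molecular_tag):
    """Alternate one base at a time: library base, molecular base, and so on."""
    if len(library_tag) != len(molecular_tag):
        raise ValueError("this sketch assumes two tags of equal length")
    return "".join(l + m for l, m in zip(library_tag, molecular_tag))

def longest_run(seq):
    """Length of the longest stretch of consecutive identical bases."""
    best = run = 0
    prev = None
    for base in seq:
        run = run + 1 if base == prev else 1
        best = max(best, run)
        prev = base
    return best

library_tag = "ACGT"    # hypothetical library tag
molecular_tag = "CATG"  # hypothetical molecular tag
combined = interleave(library_tag, molecular_tag)
meets_limit = longest_run(combined) <= 2
```

If neither tag contains two adjacent identical bases, single-base alternation cannot produce a run longer than 2, because any three consecutive positions of the combined tag include two bases that were adjacent in the same source tag.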
The combinatorial tags designed above are built into a set of adapters. A combinatorial tag can be located anywhere on the adapter except at the overhang "T" and within the terminal 20 bp of the non-overhang end. NNN...NNN represents the combinatorial tag. The adapter may be a fully complementary double-stranded structure, a Y-shaped structure with one complementary end and one open end, or a Y-shaped structure into which the combinatorial tag is introduced by PCR, as shown in FIGS. 2, 3, 4 and 5. The combinatorial tag may be placed at either end or in the middle of the adapter only, or distributed over 2 or more positions. The number of Ns represents the number of bases in the combinatorial tag; when more distinct combinatorial tags are needed, the number of bases can be increased, for example to 8 bp, 12 bp, 16 bp, 24 bp or more.
Table 2 shows the 16 adapters containing the different combinatorial tags:
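The effect of tag length on the number of available tags can be estimated with a short calculation. The sketch below counts all DNA sequences of a given length that contain no run of identical bases longer than a chosen limit. It is only an upper bound under that single constraint: the function name is hypothetical, and other design rules that a real tag set would follow are not modelled.

```python
def count_tags(length, max_run):
    """Count DNA sequences of the given length with no run longer than max_run."""
    if length == 0:
        return 1
    # ways[r] = number of sequences of the current length ending in a run of r + 1 bases
    ways = [4] + [0] * (max_run - 1)
    for _ in range(length - 1):
        total = sum(ways)
        # start a new run with one of the 3 other bases, or extend each run by one base
        ways = [3 * total] + ways[:-1]
    return sum(ways)

# Upper bounds for the tag lengths mentioned in the text, with at most 2 identical bases in a row
upper_bounds = {n: count_tags(n, 2) for n in (8, 12, 16, 24)}
```

The count grows roughly geometrically with length, which is consistent with the statement that adding bases yields more combinatorial tag types.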
TABLE 2
Figure BDA0001350145790000152
Figure BDA0001350145790000161
When the adapter has a structure such as those shown in FIG. 1 and FIG. 2, the reverse complement of the combinatorial tag must be designed as well; for example, both the F-direction and the R-direction sequences in Table 2 must be designed. For structures such as those shown in FIGS. 3 and 4, only the single-stranded combinatorial tag needs to be designed, for example the F-direction sequence in Table 2, and its reverse complement is not required.
Depending on the needs of the experiment, an identifying signature sequence and/or a library tag may also be added at the 3' or 5' end of the combinatorial tag. For example, when sequencing on the Ion Torrent platform, barcode sequences that identify different samples can be added.
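For double-stranded adapter designs, the R-direction tag is the reverse complement of the F-direction tag. A minimal sketch of that derivation is given below; the forward tag is a hypothetical sequence and is not taken from Table 2.

```python
# Translation table for the four bases plus N (any base)
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5'-3'."""
    return seq.upper().translate(_COMPLEMENT)[::-1]

forward_tag = "ACCAGTTG"                        # hypothetical F-direction combinatorial tag
reverse_tag = reverse_complement(forward_tag)   # corresponding R-direction tag
```

Applying the function twice returns the original tag, which is a convenient sanity check when preparing paired F and R oligonucleotide orders.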
2. Synthesis of adapters containing combinatorial tags
According to the designed adapter sequences, the designed combinatorial tag or its reverse complement is synthesized together with the sequences flanking its 3' and 5' ends to obtain the adapter containing the combinatorial tag. As those skilled in the art will understand, the synthesis may be performed by any method known in the art or outsourced to a primer synthesis company.
3. The adapters obtained are diluted to a working solution for later use.
4. Extraction of sample DNA
10 ml of peripheral blood was drawn from the patient into EDTA anticoagulant, the plasma was separated by centrifugation while the blood was fresh, and plasma DNA was extracted according to methods well known to those skilled in the art.
5. DNA end repair
The extracted DNA solution is mixed with the end-repair reagent mix, reacted according to an end-repair method well known to those skilled in the art, and then separated and purified after the reaction is finished.
5.1 The following reaction system was prepared in a 1.5 ml EP tube:
Reagent                               Volume/µl
DNA                                   50
10× PNK buffer                        5
dNTP solution (10 mM)                 2
T4 DNA polymerase                     1
T4 PNK                                1
Klenow fragment (10-fold dilution)    1
Total volume/µl                       50
Mix well at room temperature, centrifuge briefly, place the reaction in a PCR instrument and incubate at 20 ℃ for 30 minutes. After the reaction is finished, purify with AMPure XP magnetic beads.
5.2 Add 90 µl of AMPure XP magnetic beads to the 50 µl reaction product. After bead binding, wash twice with 500 µl of 75% ethanol and discard the supernatant. Dry at 37 ℃ until the beads are dry. Add 23 µl of water, mix the beads thoroughly, and after the solution clears, draw off 22 µl of supernatant.
6. Ligation reaction
The end-repaired DNA solution is mixed with the working solution of combinatorial-tag adapters obtained in step 3 and the ligation reagent mix, reacted according to an adapter ligation method well known to those skilled in the art, and then separated and purified after the reaction is finished.
6.1 A reaction solution was prepared from the solution obtained in step 5 according to the following system:
Figure BDA0001350145790000171
Figure BDA0001350145790000181
Mix well at room temperature, centrifuge briefly, place the reaction in a PCR instrument and incubate at 20 ℃ for 30 minutes. After the reaction is finished, purify with AMPure XP magnetic beads.
6.2 Magnetic bead purification was carried out as in 5.2, except that 75 µl of magnetic beads were added to the 50 µl reaction product. The beads were washed twice with 500 µl of 75% ethanol and the supernatant was discarded. The beads were dried at 37 ℃. 36 µl of water was added, the beads were mixed thoroughly, and after the solution cleared, 34.5 µl of supernatant was drawn off.
7. PCR enrichment and sequencing library construction
The adapter-ligated DNA is mixed with the PCR reagent mix, and PCR is performed according to a method well known to those skilled in the art. After the reaction, the product is separated and purified. Once the library is constructed, it undergoes QC testing and, if it passes, awaits sequencing.
7.1 A reaction solution was prepared in a new PCR tube according to the following system:
Reagent                               Volume/µl
DNA                                   34.5
10× Pfx Amplification buffer          5
dNTP solution (10 mM)                 5
MgSO4 (50 mM)                         2
PCR primer PE1 (10 pmol/µl)           4
PCR primer PE2 (10 pmol/µl)           4
Pfx DNA polymerase                    1
Total volume/µl                       50
Mix well at room temperature, centrifuge briefly, place the reaction in a PCR instrument, and run under the following conditions:
Figure BDA0001350145790000182
Figure BDA0001350145790000191
After the reaction was completed, purification was performed using AMPure XP magnetic beads.
7.2 Magnetic bead purification was carried out as in 5.2, except that 50 µl of magnetic beads were added to the 50 µl reaction product. Library construction is then complete.
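The three bead purifications use different bead volumes for the same 50 µl of reaction product. The short calculation below expresses them as bead-to-sample volume ratios, using only the volumes stated in 5.2, 6.2 and 7.2; the dictionary keys are descriptive labels introduced here for illustration.

```python
# (bead volume in µl, reaction product volume in µl) for each purification step
cleanups = {
    "5.2 after end repair": (90, 50),
    "6.2 after ligation": (75, 50),
    "7.2 after PCR": (50, 50),
}

# Bead-to-sample volume ratio for each step
ratios = {step: beads / sample for step, (beads, sample) in cleanups.items()}

for step, ratio in ratios.items():
    print(f"{step}: {ratio:.1f}x beads")
```

The ratio decreases from step to step. With bead-based size selection of this kind, a lower ratio generally retains fewer short fragments, although the text does not state the reason for the chosen volumes.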
8. Library quality inspection
The library is checked by qPCR and on an Agilent 2100; libraries that pass quality inspection are scheduled for sequencing.
9. DNA sequencing of the library
The library can be sequenced on a second-generation sequencer such as the Ion Torrent Proton or Ion Torrent PGM.
10. Analysis of sequencing results
The DNA sequencing results are analyzed. The DNA sequences obtained are classified by combinatorial tag, and sequences carrying the same combinatorial tag are treated as one "molecular cluster". A molecular cluster is the group of DNA molecules produced by PCR from a single initial DNA molecule, that is, the copied strands of the plus and minus strands of the original DNA molecule.
The base types at each base position in the molecular cluster and their frequencies are counted.
Based on this data analysis, errors introduced by PCR and sequencing are identified and corrected.
The correct sequence of the original DNA is thereby obtained, and true mutant sequences are identified by comparison within each molecular cluster and in parallel across molecular clusters.
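The parallel comparison across molecular clusters can be sketched as follows. The sketch assumes that one corrected sequence per molecular cluster is already available and that a reference sequence for the target region exists; the reference, the function name and the example sequences are assumptions introduced for illustration and are not specified in the text.

```python
from collections import Counter

def variant_fractions(cluster_sequences, reference):
    """For each position, report the fraction of clusters whose base differs from the reference."""
    n = len(cluster_sequences)
    result = {}
    for pos, ref_base in enumerate(reference):
        counts = Counter(seq[pos] for seq in cluster_sequences)
        for base, count in counts.items():
            if base != ref_base:
                # key: (0-based position, reference base, observed base)
                result[(pos, ref_base, base)] = count / n
    return result

reference = "TTGACA"                                  # hypothetical reference sequence
cluster_sequences = ["TTGACA", "TTGACA", "TTGACA", "TTGATA"]  # one corrected sequence per cluster
calls = variant_fractions(cluster_sequences, reference)
```

Because each molecular cluster represents one original DNA molecule, the fraction is computed over original molecules rather than over raw reads, so uneven PCR amplification does not inflate the apparent mutation frequency in this sketch.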
Example 2
The method for determining a nucleic acid sequence containing a low-frequency mutation in the target region of the sample to be tested is essentially the same as in Example 1, except that in step 1 every 2 bases of the library tag alternate with every 1 base of the molecular tag.
As shown in Table 3 below:
Figure BDA0001350145790000201
Adapter P1 sequence (5'-3'):
SEQ ID NO: 46: CCTCTCTATGGGCAGTCGGTGAT.
The underlined bases are the molecular tag sequence and the non-underlined bases are the library tag sequence.
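The block-wise arrangement of Example 2, in which 2 library tag bases alternate with 1 molecular tag base, can be sketched as follows. The tag sequences are hypothetical and are not taken from Table 3, and starting with a library block is an assumption.

```python
def interleave_blocks(library_tag, molecular_tag, lib_block=2, mol_block=1):
    """Alternate blocks of library bases and molecular bases until both tags are used up."""
    parts = []
    i = j = 0
    while i < len(library_tag) or j < len(molecular_tag):
        parts.append(library_tag[i:i + lib_block])    # next block of library bases
        i += lib_block
        parts.append(molecular_tag[j:j + mol_block])  # next block of molecular bases
        j += mol_block
    return "".join(parts)

library_tag = "ACGTCA"  # hypothetical library tag
molecular_tag = "TGA"   # hypothetical molecular tag
combined = interleave_blocks(library_tag, molecular_tag)
```

With the default block sizes this reproduces the 2:1 pattern; setting both block sizes to 1 gives the single-base alternation used in Example 1.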
Example 3
The method for determining a nucleic acid sequence containing a low-frequency mutation in the target region of the sample to be tested is essentially the same as in Example 1, except that in step 1 every 1-2 bases of the library tag alternate with every 1-2 bases of the molecular tag.
As shown in Table 4 below:
Figure BDA0001350145790000202
Adapter P1 sequence (5'-3'):
SEQ ID NO: 59: CCTCTCTATGGGCAGTCGGTGAT.
The underlined bases are the molecular tag sequence and the non-underlined bases are the library tag sequence.
Example 4
The method for determining a nucleic acid sequence containing a low-frequency mutation in the target region of the sample to be tested is essentially the same as in Example 1, except that in step 1 every 1-2 bases of the library tag alternate with every 2-3 bases of the molecular tag.
As shown in Table 5 below:
Figure BDA0001350145790000211
Adapter P1 sequence (5'-3'):
SEQ ID NO: 72: CCTCTCTATGGGCAGTCGGTGAT.
The underlined bases are the molecular tag sequence and the non-underlined bases are the library tag sequence.
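When the block sizes vary within a tag, as in Examples 3 and 4, the layout can be recorded as a position mask and used to separate a sequenced combinatorial tag back into its library and molecular parts. The sketch below assumes such a mask, written with "L" for library positions and "M" for molecular positions; the mask and the tag sequence are hypothetical and are not taken from the tables.

```python
def split_combined_tag(combined, pattern):
    """Split a combinatorial tag into (library tag, molecular tag) using a position mask."""
    if len(combined) != len(pattern):
        raise ValueError("tag and pattern must have the same length")
    if set(pattern) - {"L", "M"}:
        raise ValueError("pattern may contain only 'L' and 'M'")
    library = "".join(b for b, p in zip(combined, pattern) if p == "L")
    molecular = "".join(b for b, p in zip(combined, pattern) if p == "M")
    return library, molecular

pattern = "LMMLLMMM"   # hypothetical layout: 1-2 library bases, then 2-3 molecular bases
combined = "ACTGATCG"  # hypothetical sequenced combinatorial tag
library, molecular = split_combined_tag(combined, pattern)
```

The library part can then be used to assign the read to a library and the molecular part to assign it to a molecular cluster.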
The embodiments described above merely illustrate preferred embodiments of the present invention and do not limit its scope. Various modifications and improvements made to the technical solution of the invention by those skilled in the art, without departing from the spirit of the invention, shall fall within the scope of protection defined by the claims.

Claims (8)

1. A method for determining a nucleic acid sequence containing a low-frequency mutation in a target region of a sample to be tested, comprising the following steps:
S1, performing an adapter ligation reaction on the target region nucleic acid of the sample to be tested using an adapter, wherein the adapter contains a combinatorial tag, the combinatorial tag comprises a molecular tag and a library tag, the bases of the molecular tag and the bases of the library tag are interleaved, and the combinatorial tag is located at any position of the adapter except the overhang "T" and the terminal 20 bp of the non-overhang end; and performing PCR amplification on the adapter-ligated target region nucleic acid of the sample to be tested to obtain an amplification product, wherein the amplification product constitutes a target region nucleic acid sequencing library of the sample to be tested;
S2, sequencing the target region nucleic acid sequencing library of the sample to be tested to obtain sequenced nucleic acid sequences;
S3, classifying the sequenced nucleic acid sequences according to the molecular tags contained in the adapters, and grouping the sequenced nucleic acid sequences carrying the same molecular tag into the same nucleic acid sequence set;
S4, comparing the sequenced nucleic acid sequences within the nucleic acid sequence set with each other, and counting the base types and their frequencies at each base position in the nucleic acid sequence set;
S5, obtaining, by data analysis of the base types and frequencies at each base position, the nucleic acid sequence in the nucleic acid sequence set that has the correct base at each position;
S6, comparing the nucleic acid sequence having the correct bases with the remaining nucleic acid sequences in the nucleic acid sequence set, or with the nucleic acid sequences in parallel nucleic acid sequence sets, to obtain the nucleic acid sequence containing the low-frequency mutation.
2. The method of claim 1, wherein the adapter further comprises an identifying signature sequence of 4 non-repeating bases, and the identifying signature sequence is linked to the 3' end or the 5' end of the combinatorial tag.
3. The method of claim 1, wherein every 1-2 bases of the library tag are interleaved with every 1-3 bases of the molecular tag.
4. The method of claim 3, wherein every 1 base of the library tag is interleaved with every 1 base of the molecular tag, and the combinatorial tag has at most 2 consecutive identical bases.
5. The method of claim 3, wherein every 1-2 bases of the library tag are interleaved with every 1-2 bases of the molecular tag, and the combinatorial tag has at most 3 consecutive identical bases.
6. The method of claim 3, wherein every 1-2 bases of the library tag are interleaved with every 2-3 bases of the molecular tag, and the combinatorial tag has at most 4 consecutive identical bases.
7. The method of claim 3, wherein every 1-2 bases of the library tag are interleaved with every 1-3 bases of the molecular tag, and the combinatorial tag has at most 4 consecutive identical bases.
8. The method of claim 1, wherein the molecular tag has a length of 6-18 bp and the library tag has a length of 8-12 bp.
CN201710573056.XA 2017-07-14 2017-07-14 Combinatorial tags, linkers and methods for determining nucleic acid sequences containing low frequency mutations Active CN107354209B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201710573056.XA CN107354209B (en) 2017-07-14 2017-07-14 Combinatorial tags, linkers and methods for determining nucleic acid sequences containing low frequency mutations
PCT/CN2017/100425 WO2019010776A1 (en) 2017-07-14 2017-09-04 Combined label, connector and method for determining that low-frequency mutation nucleic acid sequence is comprised

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710573056.XA CN107354209B (en) 2017-07-14 2017-07-14 Combinatorial tags, linkers and methods for determining nucleic acid sequences containing low frequency mutations

Publications (2)

Publication Number Publication Date
CN107354209A CN107354209A (en) 2017-11-17
CN107354209B true CN107354209B (en) 2021-01-08

Family

ID=60293441

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710573056.XA Active CN107354209B (en) 2017-07-14 2017-07-14 Combinatorial tags, linkers and methods for determining nucleic acid sequences containing low frequency mutations

Country Status (2)

Country Link
CN (1) CN107354209B (en)
WO (1) WO2019010776A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110438121A (en) * 2018-05-03 2019-11-12 深圳华大临床检验中心 Connector, connector library and its application
CN111073961A (en) * 2019-12-20 2020-04-28 苏州赛美科基因科技有限公司 High-throughput detection method for gene rare mutation

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103938277A (en) * 2014-04-18 2014-07-23 中国科学院北京基因组研究所 Trace DNA-based next-generation sequencing library construction method
CN105861710A (en) * 2016-05-20 2016-08-17 北京科迅生物技术有限公司 Sequencing joint and preparation method and application thereof in ultra-low frequency mutation detection
WO2016160844A2 (en) * 2015-03-30 2016-10-06 Cellular Research, Inc. Methods and compositions for combinatorial barcoding
CN106048009A (en) * 2016-06-03 2016-10-26 人和未来生物科技(长沙)有限公司 Label joint for detection of ultra-low-frequency gene mutation and application of label joint

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7947446B2 (en) * 2007-05-29 2011-05-24 Ming-Sheng Lee High throughput mutation screening methods and kits using a universalized approach—differential sequence fill-in (DSF)-enabled sequential adapter ligation and amplification
CN104293938B (en) * 2014-09-30 2017-11-03 天津华大基因科技有限公司 Build the method and its application of sequencing library
CN106811460B (en) * 2015-11-30 2020-11-27 浙江安诺优达生物科技有限公司 Construction method and kit of next-generation sequencing library for low-frequency mutation detection
CN106676182B (en) * 2017-02-07 2020-08-14 北京诺禾致源科技股份有限公司 Method and device for detecting low-frequency gene fusion
CN106834275A (en) * 2017-02-22 2017-06-13 天津诺禾医学检验所有限公司 The analysis method of the construction method, kit and library detection data in ctDNA ultralow frequency abrupt climatic changes library

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
RN 914039-67-1; none; STN REGISTRY; 2006-11-27; page 27 *

Also Published As

Publication number Publication date
WO2019010776A1 (en) 2019-01-17
CN107354209A (en) 2017-11-17

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB03 Change of inventor or designer information

Inventor after: Gao Xiaohuan

Inventor after: Zeng Xiaojing

Inventor after: Li Sheng

Inventor after: Zhang Yinxin

Inventor after: Han Yingxin

Inventor after: He Zhe

Inventor after: Wang Jiawei

Inventor after: Xia Weicheng

Inventor after: Jiang Biman

Inventor before: Gao Xiaohuan

Inventor before: Zeng Xiaojing

Inventor before: Zhang Yinxin

Inventor before: Han Yingxin

Inventor before: He Zhe

Inventor before: Wang Jiawei

Inventor before: Xia Weicheng

Inventor before: Li Sheng

GR01 Patent grant