Detailed Description
The following describes embodiments of the present invention in detail. The embodiments described below with reference to the accompanying drawings are illustrative only for the purpose of explaining the present invention, and are not to be construed as limiting the present invention.
It is to be noted that, in the description of the present invention, "a plurality" means two or more unless otherwise specified.
The invention provides a combined tag comprising a molecular tag and a library tag, wherein the bases of the library tag are interleaved with the bases of the molecular tag.
The library tag is a tag sequence used to identify different sample libraries during sequencing, so that a plurality of libraries can be sequenced together. For example, when the sequencing platform is Proton, the library tag used is the barcode; when the sequencing platform is Illumina, the library tag used is the index.
According to a specific embodiment of the invention, every 1-2 bases of the library tag are interleaved with every 1-3 bases of the molecular tag, as detailed below.
First, every 1 base of the library tag is interleaved with every 1 base of the molecular tag, and the combined tag has at most 2 consecutive identical bases. Reference is made to the following specific examples:
1. When the combined tag is A N_2 T N_4 G N_6 C N_8 … A N_{n-6} T N_{n-4} G N_{n-2} C N_n, from left to right, positions 1, 3, 5, 7, 9, …, n-3, n-1 are the library tag (ATGC…ATGC), and positions 2, 4, 6, 8, 10, …, n-2, n are the molecular tag (N_2 N_4 N_6 N_8 … N_{n-6} N_{n-4} N_{n-2} N_n).
Each base of the molecular tag is different from the library tag base immediately preceding it. For example, in A N_2 T N_4 G N_6 C N_8 …, N_2 may be any of T, C, G but not A, and N_4 may be any of A, C, G but not T.
For 1 defined library tag, the number of combinations of the molecular tag is $3^{n/2}$. For example, when n = 16, the length of the library tag is 8 bp, the length of the molecular tag is 8 bp, and the number of molecular tag sequence combinations is $3^{8} = 6561$.
2. When the combined tag is N_1 A N_3 T N_5 G N_7 … C N_{n-7} A N_{n-5} T N_{n-3} G N_{n-1} C, from left to right, positions 2, 4, 6, 8, 10, …, n are the library tag, and positions 1, 3, 5, 7, 9, …, n-3, n-1 are the molecular tag.
Each base of the molecular tag is different from the library tag base adjacent to it. For example, in N_1 A N_3 T N_5 G N_7 …, N_1 may be any of T, C, G but not A, and N_3 may be any of A, C, G but not T.
For 1 defined library tag, the number of combinations of the molecular tag is $3^{n/2}$. For example, when n = 16, the length of the library tag is 8 bp, the length of the molecular tag is 8 bp, and the number of molecular tag sequence combinations is $3^{8} = 6561$.
3. When the combined tag is A N_2 T N_4 G N_6 C N_8 … A N_{n-7} T N_{n-5} G N_{n-3} C N_{n-1} A, from left to right, positions 1, 3, 5, 7, 9, …, n-2, n are the library tag, and positions 2, 4, 6, 8, 10, …, n-1 are the molecular tag.
Each base of the molecular tag is different from the library tag base immediately preceding it. For example, in A N_2 T N_4 G N_6 C N_8 …, N_2 may be any of T, C, G but not A, and N_4 may be any of A, C, G but not T.
For 1 defined library tag, the number of combinations of the molecular tag is $3^{(n-1)/2}$. For example, when n = 17, the length of the library tag is 9 bp, the length of the molecular tag is 8 bp, and the number of molecular tag sequence combinations is $3^{8} = 6561$.
4. When the combined tag is N_1 A N_3 T N_5 G N_7 … C N_{n-8} A N_{n-6} T N_{n-4} G N_{n-2} C N_n, from left to right, positions 2, 4, 6, 8, 10, …, n-1 are the library tag, and positions 1, 3, 5, 7, 9, …, n-2, n are the molecular tag.
Each base of the molecular tag is different from the library tag base adjacent to it. For example, in N_1 A N_3 T N_5 G N_7 …, N_1 may be any of T, C, G but not A, and N_3 may be any of A, C, G but not T.
For 1 defined library tag, the number of combinations of the molecular tag is $3^{(n+1)/2}$. For example, when n = 17, the length of the library tag is 8 bp, the length of the molecular tag is 9 bp, and the number of molecular tag sequence combinations is $3^{9} = 19683$.
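The count for the single-base layout of example 1 can be checked by brute force. The following Python sketch is illustrative only: the function names are hypothetical, the library tag ATGCATGC is assumed from the ATGC…ATGC pattern above, and the only rule encoded is that each molecular base differs from the library base immediately before it.

```python
from itertools import product

BASES = "ATCG"

def constrained_molecular_tags(library_tag):
    """Yield every molecular tag allowed for the layout L1 N2 L3 N4 ...

    Each molecular base must differ from the library base immediately
    before it, which leaves 3 choices per molecular position.
    """
    choices = [[b for b in BASES if b != lib_base] for lib_base in library_tag]
    for combo in product(*choices):
        yield "".join(combo)

def interleave_alternating(library_tag, molecular_tag):
    """Build the combined tag L1 N2 L3 N4 ... from two equal-length tags."""
    return "".join(l + m for l, m in zip(library_tag, molecular_tag))

library_tag = "ATGCATGC"  # assumed 8 bp library tag (n = 16)
tags = list(constrained_molecular_tags(library_tag))
print(len(tags))  # number of allowed molecular tags, 3**8
print(interleave_alternating(library_tag, tags[0]))
```

Enumerating the tags rather than only computing the power makes it easy to draw a random molecular tag from the allowed set when designing adaptors.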
Second, every 1-2 bases of the library tag are interleaved with every 1-2 bases of the molecular tag, and the combined tag has at most 3 consecutive identical bases.
Further, every 1-2 bases of the library tag are interleaved with every 1 base of the molecular tag, and the combined tag has at most 3 consecutive identical bases. Reference is made to the following specific examples:
5. When the combined tag is A T N_3 G C N_6 … A C N_{n-3} T C N_n, from left to right, positions 1, 2, 4, 5, 7, 8, …, n-2, n-1 are the library tag, and positions 3, 6, 9, 12, 15, 18, …, n-3, n are the molecular tag.
Each base of the molecular tag is different from the base of any library tag base to which it is adjacent.
For 1 defined library tag, the number of combinations of the molecular tag is $4^{n/3}$. When n = 18, the length of the library tag is 12 bp, the length of the molecular tag is 6 bp, and the number of molecular tag sequence combinations is $4^{6} = 4096$.
6. When the combined tag is N_1 A T N_4 G C … N_{n-6} A C N_{n-3} T G N_n, from left to right, positions 2, 3, 5, 6, 8, 9, …, n-2, n-1 are the library tag, and positions 1, 4, 7, 10, 13, 16, 19, …, n-6, n-3, n are the molecular tag.
Each base of the molecular tag is different from the base of any library tag base to which it is adjacent.
For 1 defined library tag, the number of combinations of the molecular tag is $4^{(n+2)/3}$. When n = 19, the length of the library tag is 12 bp, the length of the molecular tag is 7 bp, and the number of molecular tag sequence combinations is $4^{7} = 16384$.
7. When the combined tag is A T N_3 G C N_6 … A C N_{n-4} T G N_{n-1} C, from left to right, positions 1, 2, 4, 5, 7, 8, …, n-2, n are the library tag, and positions 3, 6, 9, 12, 15, 18, …, n-4, n-1 are the molecular tag.
Each base of the molecular tag is different from the base of any library tag base to which it is adjacent.
For 1 defined library tag, the number of combinations of the molecular tag is $4^{(n-1)/3}$. When n = 19, the length of the library tag is 13 bp, the length of the molecular tag is 6 bp, and the number of molecular tag sequence combinations is $4^{6} = 4096$.
8. When the combined tag is T N_2 G C N_5 A C N_8 … T G N_{n-2} C T, from left to right, positions 1, 3, 4, 6, 7, …, n-4, n-3, n-1, n are the library tag, and positions 2, 5, 8, 11, 14, 17, …, n-2 are the molecular tag.
Each base of the molecular tag is different from the base of any library tag base to which it is adjacent.
For 1 defined library tag, the number of combinations of the molecular tag is $4^{(n-1)/3}$. When n = 13, the length of the library tag is 9 bp, the length of the molecular tag is 4 bp, and the number of molecular tag sequence combinations is $4^{4} = 256$.
Further, every 1 base of the library tag is interleaved with every 1-2 bases of the molecular tag, and the combined tag has at most 3 consecutive identical bases. Reference is made to the following specific examples:
9. When the combined tag is A N_2 N_3 T N_5 N_6 … C N_{n-4} N_{n-3} G N_{n-1} N_n, from left to right, positions 1, 4, 7, …, n-5, n-2 are the library tag, and positions 2, 3, 5, 6, …, n-4, n-3, n-1, n are the molecular tag.
Each base of the molecular tag may be any one of the four bases.
For 1 defined library tag, the number of combinations of the molecular tag is $4^{2n/3}$. When n = 24, the length of the library tag is 8 bp, the length of the molecular tag is 16 bp, and the number of molecular tag sequence combinations is $4^{16} = 4294967296$.
10. When the combined tag is A N_2 N_3 T N_5 N_6 … C N_{n-5} N_{n-4} G N_{n-2} N_{n-1} T, from left to right, positions 1, 4, 7, …, n-6, n-3, n are the library tag, and positions 2, 3, 5, 6, …, n-5, n-4, n-2, n-1 are the molecular tag.
Each base of the molecular tag may be any one of the four bases.
For 1 defined library tag, the number of combinations of the molecular tag is $4^{2(n-1)/3}$. When n = 25, the length of the library tag is 9 bp, the length of the molecular tag is 16 bp, and the number of molecular tag sequence combinations is $4^{16} = 4294967296$.
11. When the combined tag is N_1 N_2 T N_4 N_5 A … C N_{n-5} N_{n-4} G N_{n-2} N_{n-1} T, from left to right, positions 3, 6, 9, …, n-6, n-3, n are the library tag, and positions 1, 2, 4, 5, 7, …, n-5, n-4, n-2, n-1 are the molecular tag.
Each base of the molecular tag may be any one of the four bases.
For 1 defined library tag, the number of combinations of the molecular tag is $4^{2n/3}$. When n = 24, the length of the library tag is 8 bp, the length of the molecular tag is 16 bp, and the number of molecular tag sequence combinations is $4^{16} = 4294967296$.
12. When the combined tag is N_1 N_2 T N_4 N_5 A … C N_{n-4} N_{n-3} G N_{n-1} N_n, from left to right, positions 3, 6, 9, …, n-5, n-2 are the library tag, and positions 1, 2, 4, 5, 7, …, n-4, n-3, n-1, n are the molecular tag.
Each base of the molecular tag may be any one of the four bases. For example, in N_1 N_2 T N_4 N_5 A …, each N may be any one of A, T, C, G.
For 1 defined library tag, the number of combinations of the molecular tag is $4^{2(n+1)/3}$. When n = 26, the length of the library tag is 8 bp, the length of the molecular tag is 18 bp, and the number of molecular tag sequence combinations is $4^{18} = 68719476736$.
13. When the combined tag is A N_2 T N_4 N_5 G N_7 C N_9 N_{10} … G N_{n-3} C N_{n-1} N_n, from left to right, positions 1, 3, 6, 8, …, n-4, n-2 are the library tag, and positions 2, 4, 5, 7, 9, …, n-3, n-1, n are the molecular tag.
Each base of the molecular tag may be any one of the four bases.
For 1 defined library tag, the number of combinations of the molecular tag is $4^{4n/7}$. When n = 21, the length of the library tag is 9 bp, the length of the molecular tag is 12 bp, and the number of molecular tag sequence combinations is $4^{12} = 16777216$.
14. When the combined tag is A N_2 N_3 T N_5 G N_7 N_8 C N_{10} … G N_{n-3} N_{n-2} C N_n, from left to right, positions 1, 4, 6, 9, …, n-4, n-1 are the library tag, and positions 2, 3, 5, 7, 8, …, n-3, n-2, n are the molecular tag.
Each base of the molecular tag may be any one of the four bases.
For 1 defined library tag, the number of combinations of the molecular tag is $4^{4n/7}$. When n = 21, the length of the library tag is 9 bp, the length of the molecular tag is 12 bp, and the number of molecular tag sequence combinations is $4^{12} = 16777216$.
15. When the combined tag is A N_2 N_3 T N_5 G N_7 N_8 C N_{10} … G N_{n-4} N_{n-3} C N_{n-1} T, from left to right, positions 1, 4, 6, 9, …, n-5, n-2, n are the library tag, and positions 2, 3, 5, 7, 8, …, n-4, n-3, n-1 are the molecular tag.
Each base of the molecular tag may be any one of the four bases.
For 1 defined library tag, the number of combinations of the molecular tag is $4^{4(n-1)/7}$. When n = 22, the length of the library tag is 10 bp, the length of the molecular tag is 12 bp, and the number of molecular tag sequence combinations is $4^{12} = 16777216$.
Further, every 1-2 bases of the library tag are interleaved with every 1-2 bases of the molecular tag, and the combined tag has at most 3 consecutive identical bases. Reference is made to the following specific examples:
16. When the combined tag is A N_2 N_3 T G N_6 C N_8 N_9 A T N_{12} … G N_{n-4} N_{n-3} C A N_n, from left to right, positions 1, 4, 5, 7, 10, 11, …, n-5, n-2, n-1 are the library tag, and positions 2, 3, 6, 8, 9, 12, …, n-4, n-3, n are the molecular tag.
Each base of the molecular tag may be any one of the four bases.
For 1 defined library tag, the number of combinations of the molecular tag is $4^{n/2}$. When n = 16, the length of the library tag is 8 bp, the length of the molecular tag is 8 bp, and the number of molecular tag sequence combinations is $4^{8} = 65536$.
17. When the combined tag is A T N_3 N_4 G N_6 C T N_9 N_{10} A N_{12} … G C N_{n-3} N_{n-2} A N_n, from left to right, positions 1, 2, 5, 7, 8, 11, …, n-5, n-4, n-1 are the library tag, and positions 3, 4, 6, 9, 10, 12, …, n-3, n-2, n are the molecular tag.
Each base of the molecular tag may be any one of the four bases.
For 1 defined library tag, the number of combinations of the molecular tag is $4^{n/2}$. When n = 16, the length of the library tag is 8 bp, the length of the molecular tag is 8 bp, and the number of molecular tag sequence combinations is $4^{8} = 65536$.
Third, every 1-2 bases of the library tag are interleaved with every 2-3 bases of the molecular tag, and the combined tag has at most 4 consecutive identical bases. Reference is made to the following specific examples:
18. When the combined tag is A N_2 N_3 N_4 T G N_7 N_8 C N_{10} N_{11} N_{12} A T … A N_{n-6} N_{n-5} N_{n-4} T G N_{n-1} N_n, from left to right, positions 1, 5, 6, 9, 13, 14, …, n-7, n-3, n-2 are the library tag, and positions 2, 3, 4, 7, 8, 10, 11, 12, …, n-6, n-5, n-4, n-1, n are the molecular tag.
Each base of the molecular tag may be any one of the four bases.
For 1 defined library tag, the number of combinations of the molecular tag is $4^{5n/8}$. When n = 24, the length of the library tag is 9 bp, the length of the molecular tag is 15 bp, and the number of molecular tag sequence combinations is $4^{15} = 1073741824$.
19. When the combined tag is A T N_3 N_4 N_5 G C N_8 N_9 N_{10} A T N_{13} N_{14} N_{15} … G C N_{n-7} N_{n-6} N_{n-5} A T N_{n-2} N_{n-1} N_n, from left to right, positions 1, 2, 6, 7, 11, 12, …, n-9, n-8, n-4, n-3 are the library tag, and positions 3, 4, 5, 8, 9, 10, 13, 14, 15, …, n-7, n-6, n-5, n-2, n-1, n are the molecular tag.
Each base of the molecular tag may be any one of the four bases.
For 1 defined library tag, the number of combinations of the molecular tag is $4^{3n/5}$. When n = 20, the length of the library tag is 8 bp, the length of the molecular tag is 12 bp, and the number of molecular tag sequence combinations is $4^{12} = 16777216$.
Fourth, every 1-2 bases of the library tag are interleaved with every 1-3 bases of the molecular tag, and the combined tag has at most 4 consecutive identical bases. Reference is made to the following specific examples:
20. When the combined tag is A N_2 N_3 N_4 T G N_7 N_8 C N_{10} … A N_{n-8} N_{n-7} N_{n-6} T G N_{n-3} N_{n-2} C N_n, from left to right, positions 1, 5, 6, 9, …, n-9, n-5, n-4, n-1 are the library tag, and positions 2, 3, 4, 7, 8, 10, …, n-8, n-7, n-6, n-3, n-2, n are the molecular tag.
Each base of the molecular tag may be any one of the four bases.
For 1 defined library tag, the number of combinations of the molecular tag is $4^{6n/10}$. When n = 20, the length of the library tag is 8 bp, the length of the molecular tag is 12 bp, and the number of molecular tag sequence combinations is $4^{12} = 16777216$.
21. When the combined tag is A T N_3 N_4 N_5 G N_7 A T N_{10} N_{11} N_{12} G N_{14} … A T N_{n-4} N_{n-3} N_{n-2} G N_n, from left to right, positions 1, 2, 6, 8, 9, 13, …, n-6, n-5, n-1 are the library tag, and positions 3, 4, 5, 7, 10, 11, 12, 14, …, n-4, n-3, n-2, n are the molecular tag.
Each base of the molecular tag may be any one of the four bases.
For 1 defined library tag, the number of combinations of the molecular tag is $4^{4n/7}$. When n = 21, the length of the library tag is 9 bp, the length of the molecular tag is 12 bp, and the number of molecular tag sequence combinations is $4^{12} = 16777216$.
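The layouts listed above all follow the same idea: a fixed pattern says which positions belong to the library tag and which to the molecular tag. As a minimal sketch, the pattern can be written as a string of L and M characters. The layout string, the function names, and the example tags below are assumptions made for illustration, not part of the described embodiments.

```python
from itertools import groupby

def interleave(library_tag, molecular_tag, layout):
    """Merge two tags following `layout`, a string of 'L' and 'M' slots."""
    if layout.count("L") != len(library_tag) or layout.count("M") != len(molecular_tag):
        raise ValueError("layout does not match the tag lengths")
    lib, mol = iter(library_tag), iter(molecular_tag)
    return "".join(next(lib) if slot == "L" else next(mol) for slot in layout)

def split(combined_tag, layout):
    """Recover (library_tag, molecular_tag) from a combined tag."""
    if len(combined_tag) != len(layout):
        raise ValueError("layout does not match the combined tag length")
    library = "".join(b for b, s in zip(combined_tag, layout) if s == "L")
    molecular = "".join(b for b, s in zip(combined_tag, layout) if s == "M")
    return library, molecular

def max_run(seq):
    """Length of the longest run of identical consecutive bases."""
    return max((len(list(group)) for _, group in groupby(seq)), default=0)

# Layout of example 19: 2 library bases, then 3 molecular bases, n = 20.
layout = "LLMMM" * 4
combined = interleave("ATGCATGC", "AAACCCGGGTTT", layout)
print(combined)                 # ATAAAGCCCCATGGGGCTTT
print(split(combined, layout))  # ('ATGCATGC', 'AAACCCGGGTTT')
print(max_run(combined))        # 4
print(4 ** layout.count("M"))   # 16777216 unconstrained molecular tags
```

A helper such as `max_run` can be used to screen candidate combined tags against the stated limit on consecutive identical bases before they are synthesized.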
The invention addresses a problem of the prior art, in which U bases are inserted into the molecular tag to separate it and thereby avoid runs of consecutive identical bases (NNNUUUNNUUUNNNN). Here the library tag and the random molecular tag are combined for the first time, so that the effective length of the molecular tag is increased without adding any non-informative length, and both the library tag and the molecular tag can be made long enough to meet the requirements of a specific scheme.
According to the specific embodiment of the invention, the length of the molecular tag is 6-18 bp, and the length of the library tag is 8-12 bp.
The invention also provides an adaptor containing the combined tag, wherein the combined tag is located at any position of the adaptor other than the overhanging T and the 20 bp of bases at the non-overhanging end.
According to a specific embodiment of the invention, the adaptor further comprises an identifying signature sequence of 4 non-repeating bases, which is linked to the 3' end or the 5' end of the combined tag.
The invention also provides a method for determining that a target region of a sample to be tested contains a low-frequency mutant nucleic acid sequence. As shown in Figure 1, the method comprises the following steps:
S1, performing an adaptor ligation reaction on the target region nucleic acid of the sample to be tested using the adaptor, and performing PCR amplification on the adaptor-ligated target region nucleic acid to obtain an amplification product, wherein the amplification product constitutes a target region nucleic acid sequencing library of the sample to be tested;
S2, sequencing the target region nucleic acid sequencing library of the sample to be tested to obtain sequenced nucleic acid sequences;
S3, classifying the sequenced nucleic acid sequences according to the molecular tags contained in the adaptors, and grouping the sequenced nucleic acid sequences carrying the same molecular tag into the same nucleic acid sequence set;
S4, comparing the sequenced nucleic acid sequences within the nucleic acid sequence set with one another, and counting the base types and their frequencies at each base position in the set;
S5, obtaining, by data analysis of the base types and frequencies at each base position in the nucleic acid sequence set, the nucleic acid sequence having the correct base at each position;
S6, comparing the nucleic acid sequence having the correct bases with the remaining nucleic acid sequences in the same set, or with the nucleic acid sequences in parallel nucleic acid sequence sets, to obtain the nucleic acid sequence containing the low-frequency mutation.
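Steps S3 to S5 can be sketched in Python as follows. This is a minimal illustration, not the data analysis actually used: it assumes the reads are already aligned and of equal length, takes the most frequent base at each position as the correct one, and uses hypothetical function names and toy reads.

```python
from collections import Counter, defaultdict

def group_by_molecular_tag(reads):
    """S3: turn (molecular_tag, sequence) pairs into {tag: [sequences]}."""
    groups = defaultdict(list)
    for tag, seq in reads:
        groups[tag].append(seq)
    return dict(groups)

def base_counts(sequences):
    """S4: count the base types at each position of the aligned sequences."""
    return [Counter(column) for column in zip(*sequences)]

def consensus(sequences):
    """S5 (assumed rule): keep the most frequent base at each position."""
    return "".join(c.most_common(1)[0][0] for c in base_counts(sequences))

reads = [
    ("TAAA", "ACGTAC"),
    ("TAAA", "ACGTAC"),
    ("TAAA", "ACTTAC"),  # one read with a likely PCR or sequencing error
    ("CGGT", "ACGAAC"),
    ("CGGT", "ACGAAC"),
]
groups = group_by_molecular_tag(reads)
for tag, seqs in groups.items():
    print(tag, consensus(seqs))
```

In the toy data, the single discordant read in set TAAA is outvoted, while the base shared by every read in set CGGT is retained as a property of the original molecule.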
The scheme of the invention is explained below with reference to the examples. It will be appreciated by persons skilled in the art that the following examples are illustrative only and are not to be construed as limiting the invention. Reagents, sequences (adaptors, tags and primers), software and equipment not specifically described in the following examples are conventional commercial products or open source, unless otherwise stated.
In the description herein, references to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
Example 1 Method for determining a low-frequency mutant nucleic acid sequence in the target region of a sample to be tested
1. Designing the combined tags and the adaptors containing them.
The combined tag is designed with the library tag and the molecular tag interleaved one base at a time, so that the combined tag contains at most 2 consecutive identical bases. A set of 16 combined tags was designed according to the experimental requirements, as shown in Table 1:
TABLE 1
Wherein underlined bases are molecular tag sequences and non-underlined bases are library tag sequences.
The combined tags designed above are built into a set of adaptors, in which the combined tag can be located anywhere on the adaptor except the overhanging "T" and the 20 bp of bases at the non-overhanging end. NNN…NNN represents the combined tag. The adaptor may be a fully complementary double-stranded structure, a Y-shaped structure with one complementary end and one open end, or a Y-shaped structure into which the combined tag is introduced by PCR, as shown in FIGS. 2, 3, 4 and 5. The combined tag may be located at either end or in the middle of the adaptor, or may be distributed over 2 or more positions. The number of N's represents the number of bases in the combined tag; when more types of combined tag are needed, the number of bases can be increased, for example to 8 bp, 12 bp, 16 bp, 24 bp or more.
Table 2 shows 16 adaptors containing different combined tags:
TABLE 2
When the adaptor is of the type shown in FIG. 1, FIG. 2 and the like, the structure containing the reverse complement of the combined tag must be designed at the same time; for example, both the F-direction and the R-direction sequences in Table 2 must be designed. For adaptors of the type shown in FIGS. 3 and 4 and the like, only the single-stranded combined tag needs to be designed, for example the F-direction sequence in Table 2, and the reverse complement of the combined tag is not required.
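Where both strands are needed, the R-direction sequence is the reverse complement of the F-direction sequence. A short Python sketch of that step follows; the function name and the example sequence are hypothetical and are not taken from Table 2.

```python
# Translation table for complementing DNA bases; N stays N.
COMPLEMENT = str.maketrans("ATCGN", "TAGCN")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]

forward = "ATTAGACA"                   # hypothetical F-direction combined tag
reverse = reverse_complement(forward)  # matching R-direction sequence
print(forward, reverse)                # ATTAGACA TGTCTAAT
```

Applying the function twice returns the original sequence, which gives a quick sanity check on a designed F/R pair.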
Depending on the needs of the experiment, identifying signature sequences and/or library tags may also be added at the 3 'or 5' end of the combinatorial tags. For example, when sequencing using the Ion Torrent platform, Barcode sequences that identify different samples can be added to it.
2. Synthesis of adaptors containing combined tags
The designed combined tag or its reverse complement, together with the sequences at its 3' and 5' ends, is synthesized according to the designed adaptor sequence to obtain the adaptor containing the combined tag. As will be understood by those skilled in the art, any synthesis method known in the art may be used, or the synthesis may be entrusted to a primer synthesis company.
3. Dilute the obtained adaptors into a working solution for later use.
4. Extraction of sample DNA
10 ml of EDTA-anticoagulated peripheral blood was drawn from the patient, the plasma was separated by centrifugation while the blood was fresh, and the plasma DNA was extracted according to methods well known to those skilled in the art.
5. DNA end repair
The extracted DNA solution is mixed with the end-repair reagent mix and reacted according to an end-repair method well known to those skilled in the art; the product is separated and purified after the reaction is finished.
5.1 Prepare the following reaction system in a 1.5 ml EP tube:
Reagent | Volume (µl)
DNA | 50
10× PNK buffer | 5
dNTP solution (10 mM) | 2
T4 DNA polymerase | 1
T4 PNK | 1
Klenow fragment (10-fold dilution) | 1
Total volume (µl) | 50
Mix well at room temperature, centrifuge briefly, place the reaction system in a PCR instrument, and react for 30 minutes at 20 ℃. After the reaction is finished, purify with AMPure XP magnetic beads.
5.2 Add 90 µl of AMPure XP magnetic beads to the 50 µl reaction product. After bead purification, wash twice with 500 µl of 75% ethanol and discard the supernatant. Dry at 37 ℃ until the magnetic beads are dry. Add 23 µl of water, mix the beads well, and aspirate 22 µl of supernatant once the solution has cleared.
6. Ligation reaction
The end-repaired DNA solution is mixed with the working solution of adaptors containing the combined tag obtained in step 3 and with the ligation reagent mix, and reacted according to an adaptor ligation method well known to those skilled in the art; the product is separated and purified after the reaction is finished.
6.1 Prepare the reaction solution from the solution obtained in step 5 according to the following system:
Mix well at room temperature, centrifuge briefly, place the reaction system in a PCR instrument, and react for 30 minutes at 20 ℃. After the reaction is finished, purify with AMPure XP magnetic beads.
6.2 Perform magnetic bead purification as described in 5.2, except that 75 µl of magnetic beads are added to the 50 µl reaction product. Wash twice with 500 µl of 75% ethanol and discard the supernatant. Dry at 37 ℃ until the magnetic beads are dry. Add 36 µl of water, mix the beads well, and aspirate 34.5 µl of supernatant once the solution has cleared.
7. PCR enrichment and sequencing library construction
The adaptor-ligated DNA is mixed well with the PCR reagent mix and amplified by PCR according to a method well known to those skilled in the art; the product is separated and purified after the reaction is finished. Once the library is constructed, it undergoes QC and, if it passes, is held for sequencing.
7.1 Prepare the reaction solution in a new PCR tube according to the following system:
Reagent | Volume (µl)
DNA | 34.5
10× Pfx amplification buffer | 5
dNTP solution (10 mM) | 5
MgSO4 (50 mM) | 2
PCR primer PE1 (10 pmol/µl) | 4
PCR primer PE2 (10 pmol/µl) | 4
Pfx DNA polymerase | 1
Total volume (µl) | 50
Mix well at room temperature, centrifuge briefly, place the reaction system in a PCR instrument, and react under the following conditions:
After the reaction is finished, purify with AMPure XP magnetic beads.
7.2 Perform magnetic bead purification as described in 5.2, except that 50 µl of magnetic beads are added to the 50 µl reaction product. Library construction is then complete.
8. Library quality inspection
The library is checked by qPCR and on the Agilent 2100, and libraries that pass quality inspection are scheduled for sequencing.
9. DNA sequencing of the library
The library can be sequenced using a second generation sequencer such as Ion Torrent Proton, Ion Torrent PGM, and the like.
10. Analysis of sequencing results
The DNA sequencing results are analyzed as follows. The obtained DNA sequences are classified according to their combined tags, and the sequences carrying the same combined tag are treated as 1 "molecular cluster". A molecular cluster is the 1 class of DNA produced by PCR from 1 initial DNA molecule, that is, the copied strands of the positive and negative strands of the original DNA molecule.
The base types at each base position in the molecular cluster, and the frequency of each base type, are counted.
Based on this data analysis, errors introduced by PCR and sequencing are found and corrected.
The correct sequence of the original DNA is thereby obtained, and the true mutant sequences are found by comparison within each molecular cluster and in parallel across molecular clusters.
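The parallel comparison across molecular clusters could look roughly like the Python sketch below. It assumes each cluster has already been reduced to one corrected sequence of equal length, treats the base seen in most clusters as the background, and reports every other base with the fraction of clusters carrying it. The function name, the toy data and this majority rule are illustrative assumptions; the source does not specify the statistical procedure.

```python
from collections import Counter

def call_variants(cluster_consensus):
    """Compare corrected cluster sequences position by position.

    cluster_consensus: {molecular_tag: corrected_sequence}, equal lengths.
    Returns (position, majority_base, variant_base, fraction_of_clusters),
    with positions counted from 1.
    """
    sequences = list(cluster_consensus.values())
    variants = []
    for pos, column in enumerate(zip(*sequences), start=1):
        counts = Counter(column)
        majority_base = counts.most_common(1)[0][0]
        for base, count in counts.items():
            if base != majority_base:
                variants.append((pos, majority_base, base, count / len(sequences)))
    return variants

clusters = {
    "TAAA": "ACGTAC",
    "CGGT": "ACGAAC",  # differs from the other clusters at position 4
    "GTCA": "ACGTAC",
    "ACTG": "ACGTAC",
}
print(call_variants(clusters))  # [(4, 'T', 'A', 0.25)]
```

Because each cluster stands for one original molecule, the reported fraction approximates the share of original molecules carrying the variant rather than the share of reads.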
Example 2
The method for determining a low-frequency mutant nucleic acid sequence in the target region of the sample to be tested is essentially the same as in Example 1, except that in step 1 every 2 bases of the library tag are interleaved with every 1 base of the molecular tag.
As shown in Table 3 below:
Adaptor P1 sequence, 5'-3':
SEQ ID NO 46: CCTCTCTATGGGCAGTCGGTGAT.
Wherein underlined bases are molecular tag sequences and non-underlined bases are library tag sequences.
Example 3
The method for determining a low-frequency mutant nucleic acid sequence in the target region of the sample to be tested is essentially the same as in Example 1, except that in step 1 every 1-2 bases of the library tag are interleaved with every 1-2 bases of the molecular tag.
As shown in Table 4 below:
Adaptor P1 sequence, 5'-3':
SEQ ID NO 59: CCTCTCTATGGGCAGTCGGTGAT.
Wherein underlined bases are molecular tag sequences and non-underlined bases are library tag sequences.
Example 4
The method for determining a low-frequency mutant nucleic acid sequence in the target region of the sample to be tested is essentially the same as in Example 1, except that in step 1 every 1-2 bases of the library tag are interleaved with every 2-3 bases of the molecular tag.
As shown in Table 5 below:
Adaptor P1 sequence, 5'-3':
SEQ ID NO 72: CCTCTCTATGGGCAGTCGGTGAT.
Wherein underlined bases are molecular tag sequences and non-underlined bases are library tag sequences.
The above-mentioned embodiments are merely illustrative of the preferred embodiments of the present invention, and do not limit the scope of the present invention, and various modifications and improvements made to the technical solution of the present invention by those skilled in the art without departing from the spirit of the present invention shall fall within the protection scope defined by the claims of the present invention.