WO2019010775A1 - Molecular tag, joint and method for determining nucleotide sequence containing low-frequency mutation - Google Patents

Molecular tag, joint and method for determining nucleotide sequence containing low-frequency mutation

Publication number
WO2019010775A1
WO2019010775A1 PCT/CN2017/100421 CN2017100421W WO2019010775A1 WO 2019010775 A1 WO2019010775 A1 WO 2019010775A1 CN 2017100421 W CN2017100421 W CN 2017100421W WO 2019010775 A1 WO2019010775 A1 WO 2019010775A1
Authority
WO
WIPO (PCT)
Prior art keywords
nucleic acid
molecular tag
acid sequence
molecular
linker
Prior art date
Application number
PCT/CN2017/100421
Other languages
French (fr)
Chinese (zh)
Inventor
曾晓静
高晓峘
韩颖鑫
张印新
何哲
王佳伟
夏伟成
李胜
Original Assignee
广州精科医学检验所有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 广州精科医学检验所有限公司
Publication of WO2019010775A1

Classifications

    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6869: Methods for sequencing

Definitions

  • the present invention relates to the field of nucleic acid sequencing. Specifically, it relates to a molecular tag and compositions thereof, a linker containing the molecular tag and compositions thereof, and a method for determining nucleic acid sequences containing low-frequency mutations in a target region of a sample to be tested.
  • High-throughput sequencing is currently the most widely used sequencing technology.
  • sequencing errors occur at a rate of 0.1%-0.2% or higher, and the DNA polymerase used in PCR also has an error rate, of 10⁻⁷ to 10⁻⁵, which rises as the number of PCR cycles increases.
  • Each position of the molecular tag can be one of four bases A, T, C, and G.
  • the length of the molecular tag is selected according to actual experimental needs. For a tag of length n, there can be 4^n different tags. If the molecular tags are distributed completely at random over the original templates, the diversity of the tags ensures that each original template is unique once a tag has been attached to it in the original library.
  • each original template serves as the initial template for a "molecular cluster". In the absence of sequencing and PCR errors, every sequence in a cluster is an error-free copy of the plus or minus strand of the initial template.
  • the base sequences at each position of the molecular tag are completely randomly distributed.
  • equal amounts of the four bases A, T, C, and G are added during synthesis; because the energy required or the synthesis efficiency differs among the four bases, their frequencies at each position are not exactly equal.
  • the object of the present invention is, by optimizing the design of the molecular tag, to provide a molecular tag whose bases are completely randomly distributed, and a molecular tag composition in which the ratio of each molecular tag is 0.95-1.05:1. Linkers synthesized from the molecular tag and its composition are used to construct a library, which is then sequenced, so that sequencing errors and low-frequency mutations can be effectively distinguished.
  • the invention provides a molecular tag having up to two consecutive identical bases.
  • Another aspect of the present invention also provides a molecular tag composition comprising the above molecular tag, and the ratio of each molecular tag is from 0.95 to 1.05:1.
  • Another aspect of the present invention also provides a linker comprising the above molecular tag, wherein the molecular tag is located at any position of the linker other than the overhanging "T" and the 20 bp at the non-overhanging end.
  • Another aspect of the present invention also provides a linker composition comprising the above-described linker, and the ratio of each linker is from 0.95 to 1.05:1.
  • Another aspect of the present invention also provides a method for determining nucleic acid sequences containing low-frequency mutations in a target region of a sample to be tested, comprising the steps of:
  • the molecular tag provided by the present invention contains no long runs of identical bases, which avoids the poor sequencing quality caused by such runs. In addition, the different tags are present in equal proportions, which avoids dominant tags and allows the molecular tags to perform to their full potential.
  • FIG. 1 is a schematic diagram of the molecular tag in a fully complementary double-stranded linker according to an embodiment of the present invention.
  • FIG. 2 is a schematic diagram of a Y-type linker, with one complementary end and one open end, in which the molecular tag is located at the complementary end, according to an embodiment of the present invention.
  • FIG. 3 is a schematic diagram of a Y-type linker, with one complementary end and one open end, in which the molecular tag is located at the open end, according to an embodiment of the present invention.
  • FIG. 4 is a schematic diagram of a Y-type structure in which the molecular tag is not located on the linker but can be introduced into the linker by PCR, according to an embodiment of the present invention.
  • FIG. 5 is a flow chart of the method for determining nucleic acid sequences containing low-frequency mutations in a target region of a sample to be tested, according to an embodiment of the present invention.
  • the present invention provides a molecular tag having up to two consecutive identical bases on the molecular tag.
  • the molecular tag is a single strand or a reverse complementary double strand.
  • the number of bases of the molecular tag is 6-24 bp.
  • the present invention also provides a molecular tag composition comprising the molecular tag as described above, wherein the ratio of each molecular tag is from 0.95 to 1.05:1.
  • the ratio includes at least one of a molar ratio, a molecular mass ratio, and a molecular ratio.
  • the number of molecular tag types comprises 4^n, where n is 6-24.
  • for example, 4096, 16384, 65536, 262144, 16777216, 268435456, or even more types can be designed.
  • when the molecular tags are single-stranded, they are mixed at a molar ratio of 0.95-1.05:1, a molecular mass ratio of 0.95-1.05:1, or a molecule-number ratio of 0.95-1.05:1.
  • preferably, when the molecular tags are single-stranded, they are mixed at a molar ratio of 1:1, a molecular mass ratio of 1:1, or a molecule-number ratio of 1:1.
  • when the molecular tags are double-stranded, each single-stranded molecular tag is first annealed to its reverse-complementary sequence at a molar ratio of 0.95-1.05:1, a molecular mass ratio of 0.95-1.05:1, or a molecule-number ratio of 0.95-1.05:1 to form a double-stranded molecular tag, and the double-stranded tags are then mixed at a ratio of 0.95-1.05:1.
  • preferably, each single-stranded molecular tag is first annealed to its reverse-complementary sequence at a molar ratio of 1:1, a molecular mass ratio of 1:1, or a molecule-number ratio of 1:1 to form a double-stranded molecular tag, and the double-stranded tags are then mixed at a ratio of 1:1.
  • the invention also provides for the use of the molecular tag composition for correcting sequencing errors and PCR errors, detecting low frequency mutations, de-redundancy, and calculating the number of specific molecules or cells carrying a particular molecule.
  • Another aspect of the invention provides a linker comprising a molecular tag as described above, wherein the molecular tag is located at any position of the linker other than the overhanging "T" and the 20 bp at the non-overhanging end.
  • the molecular tag "NNN...NNN" may be located at the 3' end, the 5' end, or the middle of the fully complementary duplex of the linker, at any position other than the overhanging "T" and the boxed region at the non-overhanging end; the boxed region is 20 bp long.
  • the molecular tag "NNN...NNN" may be located at the complementary end, the open end, or the middle of the Y-type structure of the linker, at any position other than the overhanging "T" and the 20 bp at the non-overhanging end.
  • the molecular tag may not be located on the linker, but may be introduced into the Y-type structure of the linker by PCR.
  • the molecular tag may also be located at two or more positions of the linker.
  • the linker further comprises a library tag linked to the 3' or 5' end of the molecular tag.
  • the library tag is used to distinguish different sample libraries: after PCR amplification, the PCR products of multiple samples can be pooled and sequenced together, and the sample of origin of each sequence is then determined from its library tag.
  • the linker further comprises an identifying feature sequence of 4 non-repeating bases, for example "ATCG" or "TGAC", which is linked to the 3' or 5' end of the molecular tag.
  • Another aspect of the present invention also provides a linker composition comprising a linker as described above, and the ratio of each linker is from 0.95 to 1.05:1.
  • the types of linker include:
  • the one end of the molecular tag is complementary to the Y-type open end
  • the molecular tag is not located on the linker, but can be introduced into the Y-type structure of the linker by PCR.
  • a linker having a molecular tag at two or more positions is also included.
  • the ratio of the linkers is at least one of a molar ratio, a molecular mass ratio, and a molecule-number ratio of each type of linker.
  • a further aspect of the present invention provides a method for determining nucleic acid sequences containing low-frequency mutations in a target region of a sample to be tested, as shown in FIG. 5, comprising the steps of:
  • said step S6 further comprises filtering sequencing errors brought in by PCR and sequencing.
  • the data analysis can be performed using statistical analysis methods well known to those skilled in the art, such as Z test, T test, run test, and the like.
  • Molecular tags are designed so that each position can be any of the 4 bases at random, with each tag containing at most 2 consecutive identical bases. Depending on experimental needs, M different molecular tags are designed.
  • the number of molecular tag sequence types comprises 4^n, where n is 6-24. Table 1 shows 16 molecular tags:
  • a linker containing any of the above molecular tags is designed, wherein the molecular tag can be located at any position other than the overhang "T" of the linker and the 20 bp base of the non-overhang end.
  • NNN...NNN represents a molecular tag
  • the linker may be a fully complementary double-stranded structure, a Y-type structure with one complementary end and one open end, or a Y-type structure into which the molecular tag is introduced by PCR.
  • the molecular tag may be located at either or both ends of the linker, or may be distributed at two or more positions.
  • the number of N's represents the number of bases in the molecular tag; the number of bases at that position can be increased according to the number of molecular tags required, for example to 8 bp, 12 bp, 16 bp, 24 bp or more.
  • Table 2 shows 16 linkers containing different molecular tags:
  • Identifying feature sequences and/or library tags can also be added at the 3' or 5' end of the molecular tag as the experiment requires. For example, when sequencing on the Illumina platform, an index sequence that identifies different samples can be added.
  • the designed molecular tag, or its reverse complement, is synthesized together with the sequences flanking it at the 3' and 5' ends to obtain a linker containing the molecular tag.
  • synthesis can be carried out by methods well known in the art or outsourced to a primer synthesis company.
  • the different types of synthesized molecular tag-containing linkers were mixed at a molar ratio of 1:1.
  • each type of linker is mixed at a ratio of 1:1.
  • 10 ml of the patient's EDTA-anticoagulated peripheral blood was collected, plasma was separated by centrifuging the fresh sample, and plasma DNA was extracted according to a method well known to those skilled in the art.
  • the extracted DNA solution is mixed with the end-repair reagent mixture, the reaction is carried out according to an end-repair method well known to those skilled in the art, and the product is separated and purified.
  • the reaction system was mixed at room temperature, briefly centrifuged, placed in a PCR machine, and reacted at 20 °C for 30 minutes. After the reaction was completed, the product was purified using AMPure XP magnetic beads.
  • the end-repaired DNA solution is mixed with the "A"-addition reagent mixture, the reaction is carried out according to a method for adding a terminal "A" well known to those skilled in the art, and the product is separated and purified.
  • the reaction system was mixed at room temperature, briefly centrifuged, placed in a PCR machine, and reacted at 37 °C for 30 minutes. After the reaction was completed, the product was purified using AMPure XP magnetic beads.
  • Magnetic bead purification was carried out by the method shown in 5.2, except that 75 ul of magnetic beads were added to the 50 ul system reaction product.
  • the DNA solution after the addition of "A" is mixed with the molecular tag-containing linker obtained in step S3 and the reaction reagent mixture, the ligation is carried out according to a linker-addition method well known to those skilled in the art, and the product is separated and purified after the reaction is complete.
  • the reaction system was mixed at room temperature, briefly centrifuged, placed in a PCR machine, and reacted at 20 °C for 15 minutes. After the reaction was completed, the product was purified using AMPure XP magnetic beads.
  • Magnetic bead purification was carried out using the method shown in 5.2, except that 75 ul of magnetic beads were added to the 50 ul system reaction product.
  • the linker-ligated DNA is mixed with the PCR reagent mixture, and PCR is carried out according to a method well known to those skilled in the art. After the reaction is complete, the product is separated and purified. Once library construction is finished, the library undergoes QC, and libraries that pass await sequencing.
  • Reagent                         Volume /ul
    DNA                             32
    10× Pfx amplification buffer    5
    dNTP solution (10nM)            2
    MgSO4 (50nM)                    2
    PCR primer PE1.0 (10pmol/ul)    4
    index-X (10pmol/ul)             4
    Pfx DNA polymerase              1
    Total volume                    50
  • the reaction system was mixed at room temperature, briefly centrifuged, placed in a PCR machine, and reacted under the following conditions:
  • Magnetic bead purification was carried out using the method shown in 5.2, except that 50 ul of magnetic beads were added to the 50 ul system reaction product. The library construction is over.
  • the library was checked by qPCR and on the Agilent 2100, and libraries that passed quality control were scheduled for sequencing.
  • the library can be sequenced using a second-generation sequencer such as HiSeq 2000, HiSeq 2500, Proton, MiSeq, or NS500.
  • the DNA sequences obtained from sequencing are analyzed and classified by molecular tag; the sequences carrying the same molecular tag are treated as one "molecular cluster", which derives from a single initial DNA molecule.

Abstract

The present invention provides a molecular tag and a composition thereof, a linker containing the molecular tag and a composition thereof, and a method for determining nucleotide sequences containing low-frequency mutations in a target region of a sample to be tested. The molecular tag contains at most two consecutive identical bases.

Description

Molecular tags, linkers, and method for determining nucleic acid sequences containing low-frequency mutations
Technical field
The present invention relates to the field of nucleic acid sequencing. Specifically, it relates to a molecular tag and compositions thereof, a linker containing the molecular tag and compositions thereof, and a method for determining nucleic acid sequences containing low-frequency mutations in a target region of a sample to be tested.
Background art
High-throughput sequencing is currently the most widely used sequencing technology. However, some sequencing errors remain unavoidable, occurring at a rate of 0.1%-0.2% or higher. The DNA polymerase used in PCR also has an error rate, of 10⁻⁷ to 10⁻⁵, and the error rate rises as the number of PCR cycles increases.
To detect base mutations present at below 0.1% (low-frequency mutations) or sequencing errors, researchers devised the molecular tag method, in which a special sequence is added to one or both ends of each sequencing template before PCR. Each position of the molecular tag can be one of the four bases A, T, C, and G. The tag length is chosen according to experimental needs; for a tag of length n, there can be 4^n different tags. If the molecular tags are distributed completely at random over the original templates, the diversity of the tags ensures that each original template is unique once a tag has been ligated to it in the original library. During subsequent PCR, each original template serves as the initial template for a "molecular cluster". In the absence of sequencing and PCR errors, every sequence in a cluster is an error-free copy of the plus or minus strand of the initial template.
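As a rough check on the uniqueness argument in this paragraph, the chance that at least two templates receive the same tag can be estimated with the standard birthday approximation. This is only a sketch: it assumes tags are drawn uniformly and independently from all 4^n sequences (the ideal case the text describes), and the function name and example numbers are illustrative, not from the source.

```python
import math


def collision_probability(n_bases: int, n_templates: int) -> float:
    """Approximate P(at least two templates share a tag).

    Assumes every one of the 4**n_bases tags is equally likely
    (an idealization; real synthesis is biased, as the text notes).
    """
    n_tags = 4 ** n_bases                      # possible tags of this length
    pairs = n_templates * (n_templates - 1) / 2
    # Birthday approximation: 1 - exp(-pairs / n_tags)
    return -math.expm1(-pairs / n_tags)


# Longer tags make a shared tag less likely for the same number of templates.
p8 = collision_probability(8, 1000)
p12 = collision_probability(12, 1000)
```

Under these assumptions, lengthening the tag lowers the collision probability; biased base frequencies would make collisions more likely than this estimate.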
In theory, the base at each position of the molecular tag is completely randomly distributed. In practice, during primer synthesis, equal amounts of A, T, C, and G are added when a given base position is synthesized; because the energy required or the synthesis efficiency differs among the four bases, their frequencies at each position are not exactly equal. Some bases therefore dominate, so that not every position of the tag follows a random distribution over A, T, C, and G. Dominant tag sequences appear, and runs of identical bases can even occur, for example 8 A's or 8 G's, so the number of distinct random tags actually obtained is smaller than the theoretical number.
Runs of identical bases not only increase the likelihood of sequencing errors but also raise the proportion of dominant tag sequences. Because the proportions are not random, several or even many molecules become attached to the same tag sequence. When the molecules sharing a tag are highly homologous or very similar in sequence, one cannot tell which differences are sequencing errors and which are low-frequency mutations. Moreover, when a low-frequency mutant and a sequence of normal abundance fall into the same molecular cluster, the low-frequency mutation is treated as a sequencing or PCR error and is missed. The non-randomness of molecular tags therefore reduces their usefulness and even limits their application.
Summary of the invention
The object of the present invention is, by optimizing the design of the molecular tag, to provide a molecular tag whose bases are completely randomly distributed, and a molecular tag composition in which the ratio of each molecular tag is 0.95-1.05:1. Linkers synthesized from the molecular tag and its composition are used to construct a library, which is then sequenced, so that sequencing errors and low-frequency mutations can be effectively distinguished.
In one aspect, the invention provides a molecular tag containing at most 2 consecutive identical bases.
In another aspect, the invention provides a molecular tag composition comprising the above molecular tags, wherein the ratio of each molecular tag is 0.95-1.05:1.
In another aspect, the invention provides a linker comprising the above molecular tag, wherein the molecular tag is located at any position of the linker other than the overhanging "T" and the 20 bp at the non-overhanging end.
In another aspect, the invention provides a linker composition comprising the above linkers, wherein the ratio of each linker is 0.95-1.05:1.
In another aspect, the invention provides a method for determining nucleic acid sequences containing low-frequency mutations in a target region of a sample to be tested, comprising the following steps:
S1. Using the linker described above, perform a linker ligation reaction on the target-region nucleic acid of the sample to be tested, then amplify the linker-ligated target-region nucleic acid by PCR to obtain amplification products, which constitute the target-region nucleic acid sequencing library of the sample;
S2. Sequence the target-region nucleic acid sequencing library of the sample to obtain sequenced nucleic acid sequences;
S3. Classify the sequenced nucleic acid sequences by the molecular tag contained in the linker, grouping sequences that carry the same molecular tag into one nucleic acid sequence set;
S4. Compare the sequences within each nucleic acid sequence set with one another, and tally the base types and their frequencies at each base position in the set;
S5. From the base types and frequencies at each base position in the set, obtain by data analysis the nucleic acid sequence that has the correct base at each position;
S6. Compare the nucleic acid sequence with the correct bases against the remaining sequences in the same set, or against sequences in parallel nucleic acid sequence sets, to obtain nucleic acid sequences containing low-frequency mutations.
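Steps S3 to S5 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patent's analysis: it assumes the tag is a fixed-length prefix of each read, that reads within a set are already aligned and of equal length, and it replaces the unspecified "data analysis" of S5 with a simple majority rule whose threshold (`min_fraction`) is hypothetical.

```python
from collections import Counter, defaultdict


def group_by_tag(reads, tag_len):
    """S3: group reads by their leading tag; returns {tag: [insert sequences]}."""
    families = defaultdict(list)
    for read in reads:
        families[read[:tag_len]].append(read[tag_len:])
    return dict(families)


def position_counts(family):
    """S4: tally the bases seen at each position across one sequence set."""
    return [Counter(column) for column in zip(*family)]


def consensus(family, min_fraction=0.7):
    """S5 (simplified): majority base per position, 'N' if below min_fraction."""
    out = []
    for counts in position_counts(family):
        base, n = counts.most_common(1)[0]
        out.append(base if n / len(family) >= min_fraction else "N")
    return "".join(out)


# Toy reads: a 4-base tag followed by a 4-base insert (made-up data).
reads = ["ACGTTTGA", "ACGTTTGA", "ACGTTTCA", "GGCATTGA"]
families = group_by_tag(reads, tag_len=4)
relaxed = consensus(families["ACGT"], min_fraction=0.6)
strict = consensus(families["ACGT"])
```

In this toy set, a base seen in only one of three reads sharing a tag is outvoted at the relaxed threshold and masked as "N" at the stricter default. Step S6 would then compare the consensus sequences across sets to call low-frequency mutations.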
The molecular tag provided by the invention contains no long runs of identical bases, which avoids the poor sequencing quality caused by such runs. In addition, the different tags within the composition are present in equal proportions, which avoids dominant tags and allows the molecular tags to perform to their full potential.
Brief description of the drawings
The above and/or additional aspects and advantages of the invention will become apparent and readily understood from the following description of the embodiments taken in conjunction with the accompanying drawings, in which:
FIG. 1 is a schematic diagram of the molecular tag in a fully complementary double-stranded linker according to an embodiment of the invention.
FIG. 2 is a schematic diagram of a Y-type linker, with one complementary end and one open end, in which the molecular tag is located at the complementary end, according to an embodiment of the invention.
FIG. 3 is a schematic diagram of a Y-type linker, with one complementary end and one open end, in which the molecular tag is located at the open end, according to an embodiment of the invention.
FIG. 4 is a schematic diagram of a Y-type structure in which the molecular tag is not located on the linker but can be introduced into the linker by PCR, according to an embodiment of the invention.
FIG. 5 is a flow chart of the method for determining nucleic acid sequences containing low-frequency mutations in a target region of a sample to be tested, according to an embodiment of the invention.
Detailed description
Embodiments of the invention are described in detail below. The embodiments described with reference to the drawings are exemplary; they serve only to explain the invention and are not to be construed as limiting it.
Note that in this description, unless otherwise stated, "a plurality" means two or more.
The invention provides a molecular tag containing at most 2 consecutive identical bases.
According to an embodiment of the invention, the molecular tag is single-stranded or a reverse-complementary double strand.
According to an embodiment of the invention, the molecular tag is 6-24 bp long.
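The two constraints stated so far (at most 2 consecutive identical bases, length 6-24) are easy to check programmatically. The following is a minimal sketch; the function name is illustrative and not part of the source.

```python
import re

# Three or more of the same character in a row breaks the "at most 2" rule.
_RUN_OF_THREE = re.compile(r"(.)\1\1")


def is_valid_tag(tag: str, min_len: int = 6, max_len: int = 24) -> bool:
    """Return True if tag uses only A/T/C/G, has an allowed length,
    and contains no run of three or more identical bases."""
    if not min_len <= len(tag) <= max_len:
        return False
    if set(tag) - set("ATCG"):
        return False
    return _RUN_OF_THREE.search(tag) is None
```

A candidate tag list can be filtered with `[t for t in candidates if is_valid_tag(t)]` before synthesis is ordered.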
The invention also provides a molecular tag composition comprising the molecular tags described above, wherein the ratio of each molecular tag is 0.95-1.05:1.
According to an embodiment of the invention, the ratio is at least one of a molar ratio, a molecular mass ratio, and a molecule-number ratio.
According to an embodiment of the invention, the number of molecular tag types comprises 4^n, where n is 6-24. For example, depending on experimental needs, 4096, 16384, 65536, 262144, 16777216, 268435456, or even more types can be designed.
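The listed counts correspond to 4^n for n = 6, 7, 8, 9, 12, and 14. The sketch below reproduces them and, as an illustration that is not taken from the source, also counts how many of the 4^n sequences satisfy the at-most-two-consecutive-identical-bases rule, using a small dynamic program over the length of the final run.

```python
def total_tags(n: int) -> int:
    """All sequences of length n over A, T, C, G."""
    return 4 ** n


def tags_without_long_runs(n: int) -> int:
    """Count length-n sequences with no run of 3+ identical bases."""
    if n < 1:
        raise ValueError("n must be at least 1")
    run1, run2 = 4, 0  # sequences whose final run has length 1 / length 2
    for _ in range(n - 1):
        # Appending a different base (3 choices) starts a new run of length 1;
        # repeating the last base turns a run of 1 into a run of 2.
        run1, run2 = 3 * (run1 + run2), run1
    return run1 + run2


listed = [total_tags(n) for n in (6, 7, 8, 9, 12, 14)]
```

For example, with n = 6 there are 4096 sequences in total, of which 3276 contain no run of three or more identical bases under this count.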
According to an embodiment of the invention, when the molecular tags are single-stranded, they are mixed at a molar ratio of 0.95-1.05:1, a molecular mass ratio of 0.95-1.05:1, or a molecule-number ratio of 0.95-1.05:1. Preferably, when the molecular tags are single-stranded, they are mixed at a molar ratio of 1:1, a molecular mass ratio of 1:1, or a molecule-number ratio of 1:1.
When the molecular tags are double-stranded, each single-stranded molecular tag is first annealed to its reverse-complementary sequence at a molar ratio of 0.95-1.05:1, a molecular mass ratio of 0.95-1.05:1, or a molecule-number ratio of 0.95-1.05:1 to form a double-stranded molecular tag, and the double-stranded tags are then mixed at a ratio of 0.95-1.05:1. Preferably, when the molecular tags are double-stranded, each single-stranded molecular tag is first annealed to its reverse-complementary sequence at a molar ratio of 1:1, a molecular mass ratio of 1:1, or a molecule-number ratio of 1:1 to form a double-stranded molecular tag, and the double-stranded tags are then mixed at a ratio of 1:1.
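For the double-stranded case, each tag needs its reverse complement, and the two strands should be combined within the 0.95-1.05:1 window. The helpers below are a small sketch of both checks; the function names and the amounts used are illustrative assumptions, and any consistent unit (for example pmol) can be used for the amounts.

```python
_COMPLEMENT = str.maketrans("ATCG", "TAGC")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written with A, T, C, G."""
    return seq.translate(_COMPLEMENT)[::-1]


def within_ratio(amount_a: float, amount_b: float,
                 low: float = 0.95, high: float = 1.05) -> bool:
    """True if amount_a : amount_b falls inside the low-high : 1 window."""
    return low <= amount_a / amount_b <= high


partner = reverse_complement("AACGT")
```

The same `within_ratio` check applies when the annealed double-stranded tags are pooled with one another.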
The invention also provides use of the molecular tag composition in correcting sequencing errors and PCR errors, detecting low-frequency mutations, removing redundancy, and counting specific molecules or cells carrying specific molecules.
In another aspect, the invention provides a linker comprising the molecular tag described above, wherein the molecular tag is located at any position of the linker other than the overhanging "T" and the 20 bp at the non-overhanging end.
According to an embodiment of the invention, as shown in FIG. 1, when the linker is a fully complementary double-stranded structure, the molecular tag "NNN...NNN" may be located at the 3' end, the 5' end, or the middle of the fully complementary duplex, at any position other than the overhanging "T" and the boxed region at the non-overhanging end; the boxed region is 20 bp long.
According to an embodiment of the invention, as shown in FIGS. 2 and 3, when the linker is a Y-type structure with one complementary end and one open end, the molecular tag "NNN...NNN" may be located at the complementary end, the open end, or the middle of the Y-type structure, at any position other than the overhanging "T" and the 20 bp at the non-overhanging end.
According to an embodiment of the invention, as shown in FIG. 4, the molecular tag may also not be located on the linker but be introduced into the Y-type structure of the linker by PCR.
Further, according to an embodiment of the invention, the molecular tag may also be located at 2 or more positions of the linker.
根据本发明的实施例,所述接头还含有文库标签,所述文库标签与所述分子标签的3’端或5’端相连。According to an embodiment of the invention, the linker further comprises a library tag linked to the 3' or 5' end of the molecular tag.
本领域技术人员可以理解的,所述文库标签用于区分不同样品文库,能够在进行PCR扩增后,将多个样本的PCR产物进行混合测序,进而基于文库标签的不同,对各序列的样本来源进行区分。As can be understood by those skilled in the art, the library tag is used to distinguish different sample libraries, and after PCR amplification, PCR products of multiple samples are mixed and sequenced, and samples of each sequence are further based on library tags. Sources are distinguished.
根据本发明的实施例,所述接头还含有识别性特征序列,所述识别性特征序列为4个不重复的碱基,例如:“ATCG”或“TGAC”,所述识别性特征序列与所述分子标签的3’端或5’端相连。According to an embodiment of the invention, the linker further comprises an identifying characteristic sequence, the identifying characteristic sequence being 4 non-repeating bases, for example: "ATCG" or "TGAC", the identifying feature sequence and the The 3' or 5' end of the molecular tag is linked.
本发明另一方面还提供一种接头组合物,所述接头组合物含有如上所述的接头,且每种接头的比例为0.95-1.05:1。Another aspect of the present invention also provides a linker composition comprising a linker as described above, and the ratio of each linker is from 0.95 to 1.05:1.
According to some specific examples of the invention, the kinds of linker include:
a fully complementary double-stranded linker containing a molecular tag, as shown in Figure 1;
a Y-shaped linker containing a molecular tag, complementary at one end and open at the other, as shown in Figures 2 and 3;
and, as shown in Figure 4, a linker on which the molecular tag is not located but into whose Y-shaped structure the tag can be introduced by PCR.
According to an embodiment of the invention, linkers with molecular tags at 2 or more positions are also included.
According to an embodiment of the invention, the ratio of the linkers is at least one of the molar ratio, the molecular mass ratio, and the molecule number ratio of the different kinds of linker.
In a further aspect, the invention provides a method for determining a nucleic acid sequence containing a low-frequency mutation in a target region of a sample to be tested, as shown in Figure 5, comprising the following steps:
S1. Using the linker described above, perform a linker ligation reaction on the target-region nucleic acid of the sample to be tested, then amplify the ligated nucleic acid by PCR to obtain amplification products, which constitute the target-region nucleic acid sequencing library of the sample;
S2. Sequence the target-region nucleic acid sequencing library to obtain sequenced nucleic acid sequences;
S3. Classify the sequenced nucleic acid sequences by the molecular tag contained in the linker, grouping sequences that carry the same molecular tag into the same nucleic acid sequence set;
S4. Compare the sequenced nucleic acid sequences within each nucleic acid sequence set with one another, and count the base types and their frequencies at each base position in the set;
S5. From the base types and frequencies at each base position, obtain by data analysis the nucleic acid sequence in the set that has the correct base at each position;
S6. Compare the nucleic acid sequence having the correct bases with the remaining nucleic acid sequences in the same set, or with the nucleic acid sequences in parallel sets, to obtain the nucleic acid sequence containing the low-frequency mutation.
According to an embodiment of the invention, step S6 further includes filtering out errors introduced by PCR and sequencing.
The data analysis may use statistical methods well known to those skilled in the art, such as the Z-test, the t-test, or the runs test.
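Steps S3 to S5 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation of the invention: it assumes the molecular tag occupies the first `tag_len` bases of each read, it substitutes a simple majority vote with a hypothetical `min_fraction` threshold for the statistical tests mentioned above, and the function names and toy reads are invented for the example.

```python
from collections import Counter, defaultdict


def group_by_tag(reads, tag_len):
    """S3: group reads by molecular tag (assumed to be the first tag_len bases)."""
    clusters = defaultdict(list)
    for read in reads:
        tag, insert = read[:tag_len], read[tag_len:]
        clusters[tag].append(insert)
    return clusters


def position_counts(seqs):
    """S4: count the base types observed at each position of a sequence set."""
    length = min(len(s) for s in seqs)
    return [Counter(s[i] for s in seqs) for i in range(length)]


def consensus(seqs, min_fraction=0.6):
    """S5 (simplified): majority base per position, 'N' if support is too low."""
    out = []
    for counts in position_counts(seqs):
        base, n = counts.most_common(1)[0]
        out.append(base if n / len(seqs) >= min_fraction else "N")
    return "".join(out)


# Toy reads: an 8-base tag from Table 1 followed by an 8-base insert.
reads = [
    "ATCGATGC" + "ACGTTGCA",
    "ATCGATGC" + "ACGTTGCA",
    "ATCGATGC" + "ACGATGCA",  # one read with an error at insert position 3
    "ACGTGCAC" + "ACGTTGCA",
    "ACGTGCAC" + "ACGTTGCA",
]
clusters = group_by_tag(reads, tag_len=8)
consensuses = {tag: consensus(seqs) for tag, seqs in clusters.items()}
```

In this toy data the single discordant read is outvoted within its set, so both sets yield the same corrected sequence; a real analysis would replace the majority vote with an appropriate statistical test.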
The solution of the invention is explained below with reference to examples. Those skilled in the art will understand that the following examples serve only to illustrate the invention and are not to be construed as limiting it. Unless otherwise stated, the reagents, sequences (linkers, tags, and primers), software, and instruments used in the following examples and not specifically described are conventional commercial products or open source.
In this specification, a description referring to "one embodiment", "some embodiments", "an example", "a specific example", or "some examples" means that a specific feature, structure, material, or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the invention. Such expressions do not necessarily refer to the same embodiment or example. Moreover, the specific features, structures, materials, or characteristics described may be combined in a suitable manner in any one or more embodiments or examples.
Example 1: Detection of low-frequency mutant genes
1. Design molecular tags and linkers containing them
Molecular tags are designed by allowing each of the four bases to occur at random at every position, subject to the constraint that a tag contains at most 2 consecutive identical bases. M different molecular tags are designed according to experimental needs. The pool of candidate tag sequences comprises 4^n sequences, where n, the tag length in bases, is 6-24. Table 1 lists 16 molecular tags:
Table 1
SEQ ID NO:   Molecular tag   SEQ ID NO:   Molecular tag
1            ATCGATGC        9            ACTGCATC
2            ACGTGCAC        10           AGACTAGC
3            TCAGCATC        11           TAGTCAGC
4            TGCATGAC        12           TCGACTAC
5            GCATCATC        13           GTATCGAC
6            GTACTATC        14           GACTGATC
7            CATGATGC        15           CTAGTGAC
8            CGGTATTC        16           CGTACATC
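The design rule above (at most 2 consecutive identical bases) can be checked mechanically. The following Python sketch is illustrative only; the helper names `is_valid_tag` and `enumerate_tags` are hypothetical, and exhaustive enumeration is shown for a very short length because the candidate pool grows as 4^n.

```python
import re
from itertools import product


def is_valid_tag(tag, max_run=2):
    """True if tag uses only A/C/G/T and has no run longer than max_run."""
    if not re.fullmatch(r"[ACGT]+", tag):
        return False
    # A base followed by max_run copies of itself is a run of max_run + 1.
    return re.search(r"(.)\1{%d}" % max_run, tag) is None


def enumerate_tags(n, max_run=2):
    """Yield every length-n tag that satisfies the run constraint."""
    for bases in product("ACGT", repeat=n):
        tag = "".join(bases)
        if is_valid_tag(tag, max_run):
            yield tag


# The 16 tags of Table 1.
table1 = [
    "ATCGATGC", "ACGTGCAC", "TCAGCATC", "TGCATGAC",
    "GCATCATC", "GTACTATC", "CATGATGC", "CGGTATTC",
    "ACTGCATC", "AGACTAGC", "TAGTCAGC", "TCGACTAC",
    "GTATCGAC", "GACTGATC", "CTAGTGAC", "CGTACATC",
]
all_valid = all(is_valid_tag(t) for t in table1)
n_unconstrained = 4 ** 3                           # all 3-base sequences
n_valid_3mers = sum(1 for _ in enumerate_tags(3))  # excludes AAA, CCC, GGG, TTT
```

For practical tag lengths, random sampling followed by `is_valid_tag` filtering would be a more reasonable approach than full enumeration.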
A linker containing any of the above molecular tags is designed, with the molecular tag located at any position other than the overhanging "T" and the 20 bp at the non-overhanging end. As shown in Figures 1 to 4, NNN...NNN represents the molecular tag, and the linker may be a fully complementary double-stranded structure, a Y-shaped structure complementary at one end and open at the other, or a Y-shaped structure into which the molecular tag is introduced by PCR. The molecular tag may be located only at either end or in the middle of the linker, or may be distributed over 2 or more positions. The number of N's is the number of bases in the molecular tag; when more kinds of tag are needed, the number of bases is increased, for example to 8 bp, 12 bp, 16 bp, 24 bp, or more. Table 2 lists 16 linkers containing different molecular tags:
Table 2
Figure PCTCN2017100421-appb-000001
Figure PCTCN2017100421-appb-000002
Figure PCTCN2017100421-appb-000003
For linkers with the structures shown in Figures 1 and 2 and similar structures, the reverse complement of the molecular tag must be designed as well, that is, both the F-strand and R-strand sequences in Table 2 are required. For the structures shown in Figures 3 and 4 and similar structures, only a single-stranded molecular tag needs to be designed, such as the F-strand sequence in Table 2, and no reverse-complementary tag sequence is required.
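Deriving the reverse complement needed for the R strand is a simple string operation. The sketch below covers only the tag portion, using the first tag of Table 1; the full linker sequences of Table 2 are not reproduced here, and the names `f_tag` and `r_tag` are illustrative.

```python
# Translation table mapping each base to its Watson-Crick complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


f_tag = "ATCGATGC"                 # tag as it appears on the F strand
r_tag = reverse_complement(f_tag)  # tag as it must appear on the R strand
```

Applying `reverse_complement` twice returns the original tag, which is a convenient sanity check when preparing paired F and R oligonucleotides.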
Depending on the needs of the experiment, an identifying feature sequence and/or a library tag may also be added at the 3' or 5' end of the molecular tag. For example, when sequencing on the Illumina platform, the index sequence that identifies each sample can be included.
2. Synthesize linkers containing molecular tags
Based on the designed linker sequences, the designed molecular tags, their reverse-complementary sequences where applicable, and the flanking 3' and 5' sequences are synthesized to obtain linkers containing the molecular tags. As those skilled in the art will appreciate, synthesis may use methods well known in the art or may be outsourced to a primer synthesis company.
3. Mix the linkers in proportion to obtain a linker composition
The synthesized tag-containing linkers of the different kinds are mixed at a molar ratio of 1:1.
For example, for linkers with the structures shown in Figures 1, 2, and 3 and similar structures, each kind of linker is mixed at a molar ratio of 1:1.
For the structure shown in Figure 4 and similar structures, the molecular tags are mixed directly with the Y-shaped linker at a molar ratio of 1:1 to obtain a linker composition.
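The equimolar mixing in step 3 amounts to a small calculation. The sketch below is an assumption-laden illustration: the stock concentrations, the linker names, and the target amount are hypothetical, and the 0.95-1.05:1 range is interpreted here as a bound on every pairwise molar ratio.

```python
def equimolar_volumes(conc_uM, pmol_each):
    """Volume (in µl) of each stock that contributes pmol_each picomoles."""
    # 1 µM equals 1 pmol/µl, so volume = amount / concentration.
    return {name: pmol_each / c for name, c in conc_uM.items()}


def within_tolerance(amounts, low=0.95, high=1.05):
    """Check that every pairwise ratio of amounts lies within low..high."""
    values = list(amounts.values())
    return all(low <= a / b <= high for a in values for b in values)


# Hypothetical stock concentrations in µM for three linkers.
stocks = {"linker_1": 100.0, "linker_2": 80.0, "linker_3": 125.0}
volumes = equimolar_volumes(stocks, pmol_each=200.0)
amounts = {name: volumes[name] * stocks[name] for name in stocks}
pool_ok = within_tolerance(amounts)
```

In practice the measured concentrations carry their own uncertainty, so a check such as `within_tolerance` is best applied to re-measured amounts rather than to the nominal values used for pipetting.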
4. Extract sample DNA
10 ml of EDTA-anticoagulated peripheral blood is drawn from the patient, plasma is separated by centrifugation while the blood is fresh, and plasma DNA is extracted by methods well known to those skilled in the art.
5. DNA end repair
The extracted DNA solution is mixed with the end-repair reagent mixture and reacted according to end-repair methods well known to those skilled in the art; after the reaction, the product is separated and purified.
5.1 Prepare the following reaction mixture in a 1.5 ml EP tube:
Reagent                          Volume (µl)
DNA                              85
10× PNK buffer                   10
dNTP solution (10 mM)            2
T4 DNA polymerase                1
T4 PNK                           1
Klenow fragment (diluted 10…)    1
Total volume                     100
Mix at room temperature and centrifuge briefly; place the reaction in a PCR instrument and incubate at 20 °C for 30 minutes. After the reaction, purify with AMPure XP magnetic beads.
5.2 Add 150 µl of AMPure XP magnetic beads to the 100 µl reaction product for bead purification, wash twice with 500 µl of 75% ethanol, and discard the supernatant. Dry at 37 °C until the beads are dry. Add 34.5 µl of water, resuspend the beads, wait for the solution to clear, and collect 34 µl of the supernatant.
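When several samples are processed together, the reaction table in 5.1 can be checked and scaled programmatically. The sketch below is only an illustration: the 10% pipetting excess and the helper names are assumptions, not part of the protocol.

```python
# Per-reaction volumes (µl) from the end-repair table in 5.1.
END_REPAIR_UL = {
    "DNA": 85,
    "10x PNK buffer": 10,
    "dNTP solution (10 mM)": 2,
    "T4 DNA polymerase": 1,
    "T4 PNK": 1,
    "Klenow fragment": 1,
}


def total_volume(mix):
    """Sum of all component volumes for one reaction."""
    return sum(mix.values())


def master_mix(mix, n_reactions, sample_key="DNA", excess=0.1):
    """Scale the non-sample reagents for n reactions plus a pipetting excess."""
    factor = n_reactions * (1 + excess)
    return {k: round(v * factor, 2) for k, v in mix.items() if k != sample_key}


total = total_volume(END_REPAIR_UL)          # should match the stated 100 µl
mm = master_mix(END_REPAIR_UL, n_reactions=8)
```

The same two helpers apply unchanged to the A-tailing and PCR tables later in this example, since each is a list of component volumes with a stated total.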
6. A-tailing (adding "A" to the ends)
The end-repaired DNA solution is mixed with the A-tailing reagent mixture and reacted according to A-tailing methods well known to those skilled in the art; after the reaction, the product is separated and purified.
6.1 Prepare the reaction mixture from the solution obtained in step 5 according to the following system:
Reagent                  Volume (µl)
End-repaired DNA         34
10× blue buffer          5
dATP (1 mM)              10
Klenow 3'-5' exo-        1
Total volume             50
Mix at room temperature and centrifuge briefly; place the reaction in a PCR instrument and incubate at 37 °C for 30 minutes. After the reaction, purify with AMPure XP magnetic beads.
6.2 Purify with magnetic beads as described in 5.2, except that 75 µl of magnetic beads is added to the 50 µl reaction product.
7. Linker ligation
The A-tailed DNA solution is mixed with the tag-containing linkers obtained in step 3 and the ligation reagent mixture, and reacted according to linker ligation methods well known to those skilled in the art; after the reaction, the product is separated and purified.
7.1 Prepare the reaction mixture from the solution obtained in step 6 according to the following system:
Figure PCTCN2017100421-appb-000004
Mix at room temperature and centrifuge briefly; place the reaction in a PCR instrument and incubate at 20 °C for 15 minutes. After the reaction, purify with AMPure XP magnetic beads.
7.2 Purify with magnetic beads as described in 5.2, except that 75 µl of magnetic beads is added to the 50 µl reaction product.
8. PCR enrichment and sequencing library construction
The linker-ligated DNA is mixed with the PCR reagent mixture and amplified by PCR according to methods well known to those skilled in the art, then separated and purified. This completes library construction; the library is then subjected to QC and, once it passes, queued for sequencing.
8.1 Prepare the reaction mixture in a new PCR tube according to the following system:
Reagent                              Volume (µl)
DNA                                  32
10× Pfx amplification buffer         5
dNTP solution (10 nM)                2
MgSO4 (50 nM)                        2
PCR primer PE1.0 (10 pmol/µl)        4
index-X (10 pmol/µl)                 4
Pfx DNA polymerase                   1
Total volume                         50
Mix at room temperature and centrifuge briefly; place the reaction in a PCR instrument and run it under the following conditions:
Figure PCTCN2017100421-appb-000005
After the reaction, purify with AMPure XP magnetic beads.
8.2 Purify with magnetic beads as described in 5.2, except that 50 µl of magnetic beads is added to the 50 µl reaction product. Library construction is then complete.
9. Library quality control
The library is checked by qPCR and on an Agilent 2100; libraries that pass quality control are scheduled for sequencing.
10. DNA sequencing of the library
The library can be sequenced on a next-generation sequencer such as the HiSeq 2000, HiSeq 2500, Proton, MiSeq, or NS500.
11. Analyze the sequencing results
The sequencing results are analyzed: the DNA sequences are classified by molecular tag, and sequences carrying the same molecular tag are treated as one "molecular cluster". A molecular cluster is the family of DNA molecules generated by PCR from a single initial DNA molecule, that is, the "copy strands" of the positive and negative strands of the original DNA molecule.
The base types at each base position within a molecular cluster, and the frequency of each, are counted.
Based on data analysis, errors introduced by PCR and sequencing are identified and corrected.
The correct sequence of the original DNA is thereby obtained, and true mutant sequences are identified by comparison within molecular clusters and in parallel between molecular clusters.
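The parallel comparison between molecular clusters can be sketched as follows. This is a hedged illustration rather than the analysis actually used: it assumes each molecular cluster has already been reduced to one corrected sequence, that a reference sequence for the target region is available, and that positions marked "N" are uninformative; `call_variants` and `min_clusters` are hypothetical names.

```python
from collections import Counter


def call_variants(cluster_seqs, reference, min_clusters=1):
    """Return {(position, ref_base, alt_base): fraction of clusters supporting it}."""
    support = Counter()
    for seq in cluster_seqs:
        for pos, (ref_base, base) in enumerate(zip(reference, seq)):
            if base != ref_base and base != "N":
                support[(pos, ref_base, base)] += 1
    total = len(cluster_seqs)
    return {
        key: n / total
        for key, n in support.items()
        if n >= min_clusters
    }


# Toy data: 100 molecular clusters, one of which carries a T>A change.
reference = "ACGTTGCA"
cluster_seqs = ["ACGTTGCA"] * 99 + ["ACGTAGCA"]
variants = call_variants(cluster_seqs, reference)
```

Because each cluster represents one original DNA molecule, the reported fraction is a per-molecule mutation frequency rather than a per-read frequency, which is what makes low-frequency mutations distinguishable from PCR and sequencing errors.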
The above embodiments are intended only to illustrate, not to limit, the technical solutions of the invention. Although the invention has been described in detail with reference to the preferred embodiments above, those of ordinary skill in the art should understand that the technical solutions of the invention may be modified or equivalently substituted without departing from their spirit and scope.

Claims (10)

  1. A molecular tag, characterized in that the molecular tag contains at most 2 consecutive identical bases.
  2. The molecular tag according to claim 1, characterized in that the molecular tag is single-stranded or a reverse-complementary double strand.
  3. The molecular tag according to claim 1, characterized in that the molecular tag is 6-24 bases long.
  4. A molecular tag composition, characterized in that it contains molecular tags according to any one of claims 1-3, and the ratio of each kind of molecular tag is 0.95-1.05:1.
  5. The molecular tag composition according to claim 4, characterized in that the ratio comprises at least one of a molar ratio, a molecular mass ratio, and a molecule number ratio.
  6. A linker, characterized in that the linker contains the molecular tag according to any one of claims 1-3, and the molecular tag is located at any position of the linker other than the overhanging "T" and the 20 bp at the non-overhanging end.
  7. The linker according to claim 6, characterized in that the linker further contains a library tag joined to the 3' end or 5' end of the molecular tag.
  8. The linker according to claim 6, characterized in that the linker further contains an identifying feature sequence consisting of 4 non-repeating bases, the identifying feature sequence being joined to the 3' end or 5' end of the molecular tag.
  9. A linker composition, characterized in that the linker composition contains linkers according to claim 6, and the ratio of each kind of linker is 0.95-1.05:1.
  10. A method for determining a nucleic acid sequence containing a low-frequency mutation in a target region of a sample to be tested, comprising the following steps:
    S1. Using the linker according to claim 6, performing a linker ligation reaction on the target-region nucleic acid of the sample to be tested, then amplifying the ligated nucleic acid by PCR to obtain amplification products, which constitute the target-region nucleic acid sequencing library of the sample;
    S2. Sequencing the target-region nucleic acid sequencing library to obtain sequenced nucleic acid sequences;
    S3. Classifying the sequenced nucleic acid sequences by the molecular tag contained in the linker, grouping sequences that carry the same molecular tag into the same nucleic acid sequence set;
    S4. Comparing the sequenced nucleic acid sequences within each nucleic acid sequence set with one another, and counting the base types and their frequencies at each base position in the set;
    S5. From the base types and frequencies at each base position, obtaining by data analysis the nucleic acid sequence in the set that has the correct base at each position;
    S6. Comparing the nucleic acid sequence having the correct bases with the remaining nucleic acid sequences in the same set, or with the nucleic acid sequences in parallel sets, to obtain the nucleic acid sequence containing the low-frequency mutation.
PCT/CN2017/100421 2017-07-14 2017-09-04 Molecular tag, joint and method for determining nucleotide sequence containing low-frequency mutation WO2019010775A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201710573052.1A CN107385030A (en) 2017-07-14 2017-07-14 Molecular label, the method for joint and determination containing low frequency mutant nucleic acid sequence
CN201710573052.1 2017-07-14

Publications (1)

Publication Number Publication Date
WO2019010775A1 true WO2019010775A1 (en) 2019-01-17

Family

ID=60339624

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/100421 WO2019010775A1 (en) 2017-07-14 2017-09-04 Molecular tag, joint and method for determining nucleotide sequence containing low-frequency mutation

Country Status (2)

Country Link
CN (1) CN107385030A (en)
WO (1) WO2019010775A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109295049A (en) * 2018-09-26 2019-02-01 刘强 Label specific linkers, primer sets and the banking process in the library Blood Trace cfDNA

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2009140191A2 (en) * 2008-05-12 2009-11-19 Biomedomics, Inc. Detection of disease related gene mutations by single molecule amplification with multiplex ligation assays
CN106048009A (en) * 2016-06-03 2016-10-26 人和未来生物科技(长沙)有限公司 Label joint for detection of ultra-low-frequency gene mutation and application of label joint
CN106367485A (en) * 2016-08-29 2017-02-01 厦门艾德生物医药科技股份有限公司 Multi-locating double tag adaptor set used for detecting gene mutation, and preparation method and application of multi-locating double tag adaptor set
CN106676182A (en) * 2017-02-07 2017-05-17 北京诺禾致源科技股份有限公司 Low-frequency gene fusion detection method and device
CN106811460A (en) * 2015-11-30 2017-06-09 安诺优达基因科技(北京)有限公司 For the construction method and kit of two generation sequencing libraries of low frequency abrupt climatic change
CN106834275A (en) * 2017-02-22 2017-06-13 天津诺禾医学检验所有限公司 The analysis method of the construction method, kit and library detection data in ctDNA ultralow frequency abrupt climatic changes library

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7198900B2 (en) * 2003-08-29 2007-04-03 Applera Corporation Multiplex detection compositions, methods, and kits
CN102115789B (en) * 2010-12-15 2013-03-13 厦门大学 Nucleic acid label for second-generation high-flux sequencing and design method thereof
US20160017410A1 (en) * 2014-07-17 2016-01-21 Jay Shendure Highly multiplex single amino acid mutagenesis for massively parallel functional analysis
CN106086162B (en) * 2015-11-09 2020-02-21 厦门艾德生物医药科技股份有限公司 Double-label joint sequence for detecting tumor mutation and detection method

Also Published As

Publication number Publication date
CN107385030A (en) 2017-11-24

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17917625

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 17917625

Country of ref document: EP

Kind code of ref document: A1