WO2021227129A1 - Universal high-throughput sequencing adapter and application thereof - Google Patents


Info

Publication number
WO2021227129A1
Authority
WO
WIPO (PCT)
Prior art keywords
throughput sequencing
sequence
stranded
sequencing adapter
strand
Application number
PCT/CN2020/092418
Other languages
French (fr)
Chinese (zh)
Inventor
曹彦东
周洋
扶媛媛
杨颖�
张丽婷
Original Assignee
北京安智因生物技术有限公司 (Beijing Anzhiyin Biotechnology Co., Ltd.)
Application filed by 北京安智因生物技术有限公司 (Beijing Anzhiyin Biotechnology Co., Ltd.)

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6869 Methods for sequencing
    • C12Q1/6806 Preparing nucleic acids for analysis, e.g. for polymerase chain reaction [PCR] assay
    • C12Q1/6844 Nucleic acid amplification reactions
    • C12Q1/6858 Allele-specific amplification
    • C40 COMBINATORIAL TECHNOLOGY
    • C40B COMBINATORIAL CHEMISTRY; LIBRARIES, e.g. CHEMICAL LIBRARIES
    • C40B50/00 Methods of creating libraries, e.g. combinatorial synthesis
    • C40B50/06 Biochemical methods, e.g. using enzymes or whole viable microorganisms

Definitions

  • the invention relates to the field of gene sequencing, in particular to a universal high-throughput sequencing adapter and its use in sequencing library construction.
  • High-Throughput Sequencing: high-throughput sequencing technology
  • NGS: next-generation sequencing technology
  • Sanger sequencing hybridizes a short synthetic oligonucleotide primer to a single-stranded DNA template, extends the primer with DNA polymerase to synthesize new DNA fragments, and separates the different fragments by polyacrylamide gel electrophoresis to read the DNA sequence.
  • NGS usually uses massively parallel sequencing (MPS), which allows many samples and many sites to be sequenced simultaneously and greatly increases sequencing throughput.
  • MPS: massively parallel sequencing
  • high-throughput sequencing technology has been shown to have high accuracy and sensitivity in clinical genetic testing.
  • however, it is affected by various noise and errors introduced during library construction and sequencing, which produce apparent low-frequency mutations in the sequencing results whose authenticity is difficult to determine.
  • the proportion of index hopping can be as high as 2% (Illumina, Effects of index misassignment on multiplexing and downstream analysis).
  • the mainstream NGS sequencing platforms are Ion Torrent and Illumina.
  • different platforms use different library construction workflows, so libraries are not universal: a library prepared for the Ion Torrent platform cannot generate sequencing data on the Illumina platform, and vice versa. This greatly restricts clinical application, so a universal library that can be used on different sequencing platforms is needed.
  • sequencing adapter design is the focus of library research.
  • adapter design mainly follows two directions. One is to improve the shape of the adapter, for example Y-shaped or U-shaped adapters, in order to reduce or avoid adapter dimers and increase the amount of usable sequencing data; the other is to add specific molecular tags to the adapter structure to identify errors introduced during library construction.
  • however, sequencing libraries prepared along either direction can still only be used on a fixed sequencing platform, and cannot be used on mainstream platforms such as Ion Torrent and Illumina at the same time.
  • the Ion Torrent platform and the Illumina platform have different sequencing principles and different methods for constructing sequencing templates.
  • the Ion Torrent platform uses emulsion PCR to construct the sequencing template; the Illumina platform uses bridge amplification or exclusion amplification (ExAmp) to construct the sequencing template.
  • for library construction, Ion Torrent generally uses straight (fully double-stranded) adapters, whereas Illumina generally uses Y-shaped adapters.
  • the technical problem to be solved by the present invention is to overcome the drawback that prior-art high-throughput sequencing adapters are not compatible across different sequencing platforms and therefore have limited applicability.
  • the first objective of the present invention is to provide a universal high-throughput sequencing adapter suitable for multiple sequencing platforms;
  • the second objective of the present invention is to provide a method for preparing a universal high-throughput sequencing adapter suitable for multiple sequencing platforms;
  • the third objective of the present invention is to provide applications of the universal high-throughput sequencing adapter;
  • the fourth objective of the present invention is to provide a method for detecting low-frequency gene mutations.
  • the present invention provides the following technical solutions:
  • the present invention provides a single-stranded adapter, characterized in that it comprises a free arm and a double-stranded complementary region connected in sequence;
  • the free arm includes a library amplification primer binding region and a carrier binding region;
  • the double-stranded complementary region contains sequencing primer binding regions for two or more sequencing platforms.
  • the double-stranded complementary region further includes a tag sequence.
  • the tag sequence is located at one end of the double-stranded complementary region away from the free arm.
  • the tag sequence consists of 6-12 random bases.
  • the free arm of the single-stranded adapter is 30-56 bp long, and the double-stranded complementary region is 40-58 bp long.
  • the free arm can be composed of the following sequence:
  • the double-stranded complementary region may be composed of the following sequence:
  • the "XXXXXX" represents a tag sequence composed of 6-12 random bases;
  • the "N" represents any one of A, T, C and G, or no base (NA).
  • the free arm further includes a tag sequence identical to the tag sequence in the double-stranded complementary region.
  • the free arm can be composed of the following sequence:
  • the "XXXXXX" represents a tag sequence composed of 6-12 random bases;
  • the "N" represents any one of A, T, C and G, or no base (NA).
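The sequence templates above use two placeholders: a run of "X" for the 6-12-base tag and "N" for a position that may hold any base or no base. As a sketch only, a template could be compiled into a regular expression to check whether a synthesized oligo conforms to it. The helper names (template_to_regex, matches_template) and the example template are hypothetical; the actual template sequences are given in the sequence listing and are not reproduced here.

```python
import re


def template_to_regex(template: str) -> "re.Pattern":
    """Compile an adapter template into a regular expression.

    Placeholder meanings follow the legend above:
      a run of "X" -> the tag, 6-12 random bases (A/C/G/T);
      each "N"     -> any one of A/C/G/T, or no base at all.
    Every other character must match literally.
    """
    parts = []
    for match in re.finditer(r"X+|N|[^XN]+", template):
        token = match.group()
        if token[0] == "X":
            parts.append("[ACGT]{6,12}")   # tag of 6-12 random bases
        elif token == "N":
            parts.append("[ACGT]?")        # any base, or no base (NA)
        else:
            parts.append(re.escape(token))
    return re.compile("".join(parts))


def matches_template(sequence: str, template: str) -> bool:
    """True if the whole sequence fits the template."""
    return template_to_regex(template).fullmatch(sequence.upper()) is not None


# Hypothetical template: 4 fixed bases, one optional base, a tag, 4 fixed bases.
print(matches_template("ACGTGGCCTATTGA", "ACGTNXXXXXXTTGA"))  # True: N absent, 6-base tag
print(matches_template("ACGTGGCCTTGA", "ACGTNXXXXXXTTGA"))    # False: too short for a tag
```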
  • the present invention also provides a Y-type high-throughput sequencing adapter, characterized in that the sequencing adapter includes a first single strand and a second single strand;
  • the first single strand and the second single strand each include a free arm and a double-stranded complementary region;
  • the free arm includes a library amplification primer binding region and a carrier binding region;
  • the double-stranded complementary region contains sequencing primer binding regions for two or more sequencing platforms.
  • the free arm sequences of the first single strand and the second single strand are not complementary, and the first single strand and the second single strand may be annealed to form a Y-shaped structure double strand.
  • the double-stranded complementary region includes a tag sequence, and the tag sequence is located at one end of the double-stranded complementary region away from the free arm.
  • the sequencing platform includes, but is not limited to, Illumina, Ion Torrent, PacBio, Roche, Helicos, and ABI platforms; preferably, the sequencing platform is Ion Torrent and Illumina platforms.
  • the free arm of the second single strand further includes a tag sequence.
  • the tag sequence in the free arm is identical to the tag sequence in the double-stranded complementary region; more preferably, the tag sequence in the free arm is located at the end of the free arm adjacent to the double-stranded complementary region.
  • the double-stranded complementary region of the first and second single strands is 40-58 bp long; the free arm of the first single strand is 30-45 bp long, and the free arm of the second single strand is 35-56 bp long; the tag sequence consists of 6-12 random bases.
  • the 3' end of the free arm of the first or second single strand carries a modification that improves stability;
  • the modification is a thio modification;
  • the phosphodiester bonds between the last 3 bases at the 3' end are replaced by phosphorothioate bonds.
  • the first single-stranded sequence is as follows:
  • Double-stranded complementary region sequence
  • the "XXXXXX" represents a tag sequence composed of 6-12 random bases;
  • the "N" represents any one of A, T, C and G, or no base (NA);
  • the first single strand consists of a free arm and a double-stranded complementary region connected in sequence in the 5'-3' direction.
  • the second single-stranded sequence is as follows:
  • Double-stranded complementary region sequence
  • the "XXXXXX" represents a tag sequence composed of 6-12 random bases;
  • the "N" represents any one of A, T, C and G, or no base (NA);
  • the second single strand consists of a double-stranded complementary region and a free arm connected in sequence in the 5'-3' direction.
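The strand layout described above (first strand: free arm, then double-stranded complementary region; second strand: double-stranded complementary region, then free arm, with a copy of the tag at the non-free end of that free arm) can be sketched as follows. All sequences in the usage lines are made-up placeholders, not the patent's SEQ IDs, and build_y_adapter and ds_region_pairs are hypothetical helpers that only check the base-pairing layout, not annealing thermodynamics.

```python
# Watson-Crick complements for DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def build_y_adapter(free_arm_5p: str, ds_core: str, tag: str, free_arm_3p: str):
    """Return (strand1, strand2) of a Y-type adapter, both written 5'->3'.

    strand1 = 5' free arm + double-stranded region, whose far end (away from
              the free arm) is the tag;
    strand2 = complement of that double-stranded region + a copy of the tag at
              the non-free end of the 3' free arm + the rest of the 3' free arm.
    """
    ds_top = ds_core + tag                  # tag at the end distal to the free arm
    strand1 = free_arm_5p + ds_top
    strand2 = revcomp(ds_top) + tag + free_arm_3p
    return strand1, strand2


def ds_region_pairs(strand1: str, strand2: str, ds_len: int) -> bool:
    """True if the 3'-terminal ds_len bases of strand1 pair with the
    5'-terminal ds_len bases of strand2 (the double-stranded region)."""
    return strand1[-ds_len:] == revcomp(strand2[:ds_len])


# Usage with made-up placeholder sequences (not the patent's SEQ IDs):
s1, s2 = build_y_adapter("GGTTCCAAGG", "ACGTTGCA", "CATGCA", "TTTTGGGG")
print(s1)
print(s2)
print(ds_region_pairs(s1, s2, ds_len=len("ACGTTGCA") + len("CATGCA")))
```

The free arms are simply left unpaired here; a real design would also check that they cannot pair with each other, which this sketch does not do.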
  • the present invention also provides a high-throughput sequencing adapter set, characterized in that the sequencing adapter set includes the above-mentioned high-throughput sequencing adapter.
  • the high-throughput sequencing adapter set further includes another Y-type high-throughput sequencing adapter as follows: the Y-type high-throughput sequencing adapter includes third and fourth single strands;
  • the sequence of this other Y-type high-throughput sequencing adapter is similar to that of the Y-type high-throughput sequencing adapter described above, except that the sequence of the double-stranded complementary region is different;
  • the sequence of the double-stranded complementary region of the third single strand is as follows:
  • the sequence of the double-stranded complementary region of the fourth single strand is complementary to that of the third single strand;
  • the "XXXXXX" represents a tag sequence composed of 6-12 random bases;
  • the "N" represents any one of A, T, C and G, or no base (NA);
  • the free arms and double-stranded complementary regions of the third and fourth single strands are connected in the same order as those of the first and second single strands.
  • the single-stranded sequences of the Y-type high-throughput sequencing adapter are as follows:
  • the first single-stranded sequence (SEQ ID NO.1):
  • the second single-stranded sequence (SEQ ID NO.2):
  • the third single-stranded sequence (SEQ ID NO.3):
  • the fourth single-stranded sequence (SEQ ID NO.4):
  • in another embodiment, the single-stranded sequences of the Y-type high-throughput sequencing adapter are as follows:
  • the first single-stranded sequence (SEQ ID NO.5):
  • the second single-stranded sequence (SEQ ID NO.6):
  • the third single-stranded sequence (SEQ ID NO.7):
  • the fourth single-stranded sequence (SEQ ID NO.8):
  • SEQ ID NO.1-8 in the sequence listing do not contain the tag sequence "XXXXXX".
  • the present invention also provides a composition, characterized in that the composition comprises the above-mentioned high-throughput sequencing adapter or adapter set.
  • the present invention also provides a complex, characterized in that the above-mentioned high-throughput sequencing adapter or adapter set is connected to the complex.
  • the present invention also provides a kit, characterized in that the kit comprises the above-mentioned high-throughput sequencing adapter or adapter set.
  • the kit is a high-throughput sequencing library building kit or a gene sequence enrichment kit.
  • the present invention also provides a method for preparing the above-mentioned high-throughput sequencing adapter, characterized in that it comprises the following steps:
  • S1: synthesize the first-strand and second-strand single-stranded sequences separately;
  • S2: specifically anneal the two single-stranded sequences from S1 to obtain the high-throughput sequencing adapter.
  • the present invention also provides a method for constructing a sequencing library, characterized in that it comprises the following steps:
  • S1: prepare the target fragments of the sample to be tested;
  • S2: ligate the above-mentioned high-throughput sequencing adapter or adapter set to the target fragments from S1 to obtain a ligation product;
  • S3: amplify the ligation product from S2 and purify it to obtain the sequencing library of the sample to be tested.
  • the present invention also provides a method for detecting low-frequency gene mutations, characterized in that it comprises the following steps:
  • S1: prepare the above-mentioned high-throughput sequencing adapters or adapter sets, using the same tag sequence for the same sample;
  • S2: amplify the target fragments of the sample to be tested and digest the primers;
  • S3: ligate the digested product from S2 to the high-throughput sequencing adapter or adapter set from S1 to obtain a ligation product, amplify the ligation product, and purify it to obtain a sequencing library;
  • S4: sequence the library from S3, correct the sequencing data according to the tag sequence of the high-throughput sequencing adapter, and perform mutation analysis on the corrected data.
  • in the mutation analysis of step S4, a specific mutation is judged to be a true low-frequency mutation when it appears in both the sense strand and the antisense strand of the same read.
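The strand-concordance rule of step S4 can be expressed as a filter over per-read observations. This is a minimal sketch, assuming each observation records the read (molecule) it came from, the strand it was read on, and the base called at the position expressed on the reference strand; the StrandObservation record and true_mutation_reads function are hypothetical and not part of any named pipeline.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StrandObservation:
    read_id: str   # molecule the observation belongs to (e.g. grouped by tag)
    strand: str    # "sense" or "antisense"
    base: str      # base called at the position, on the reference strand


def true_mutation_reads(observations, alt_base: str) -> list:
    """Read IDs in which alt_base is seen on both the sense and the antisense strand.

    Reads showing alt_base on only one strand are treated as library-construction
    or sequencing errors and left out, to avoid false positives.
    """
    strands_with_alt = {}
    for obs in observations:
        if obs.base == alt_base:
            strands_with_alt.setdefault(obs.read_id, set()).add(obs.strand)
    return sorted(rid for rid, strands in strands_with_alt.items()
                  if strands == {"sense", "antisense"})


obs = [
    StrandObservation("r1", "sense", "T"), StrandObservation("r1", "antisense", "T"),
    StrandObservation("r2", "sense", "T"), StrandObservation("r2", "antisense", "C"),
    StrandObservation("r3", "antisense", "T"),
]
print(true_mutation_reads(obs, "T"))  # ['r1']
```

Only r1 carries the candidate base on both strands; r2 and r3 would be discarded as single-strand errors.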
  • the sample to be tested is genomic DNA.
  • the present invention also provides the following applications of the above-mentioned high-throughput sequencing adapter, adapter set, composition, complex or kit:
  • the universal high-throughput sequencing adapter provided by the present invention includes paired double-stranded complementary regions and unpaired single-stranded free arms.
  • the distal end of the paired double-stranded part contains the tag sequence, and the non-free ends of the two free arms contain the tag sequence.
  • all adapters used for the same sample carry tag sequences of the same base composition, so the consistency of the tag sequences can be used to judge whether cross-contamination occurred during library construction. After sequencing on some models of the Illumina platform, the sequencing data can be analyzed to determine whether index hopping has occurred, based on whether the tag sequences within the same read have the same base composition.
  • the distal end of the paired double-stranded part contains a tag sequence, and the base composition of the tag sequence should be the same for the sense strand and antisense strand of the same read.
  • a specific mutation can be judged to be true only if it appears in both the sense strand and the antisense strand of the same read. If a read carries the mutation only in the sense strand or only in the antisense strand, the mutation is judged to be an error from library construction or sequencing and is excluded from subsequent analysis to avoid false positives.
  • the universal high-throughput sequencing adapter provided by the present invention uses only a single tag sequence segment for both purposes: a specific mutation must be present in both the sense strand and the antisense strand, and the base composition of a sample's tag sequence should be the same as that of the tag sequence in its reads.
  • if the tag sequence of a sample differs from the tag sequence in a read, this indicates that the read does not belong to this sample, that is, index hopping has occurred.
  • the design of the present invention can therefore effectively overcome the index hopping problem inherent to some sequencing platforms and allows the authenticity of low-frequency mutations to be judged.
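The tag-consistency check described above might look like the following sketch. It assumes, which the text does not state explicitly, that each read of a pair begins with the adapter tag (because the tag sits at the end of the double-stranded region next to the insert) and that the sample's tag length is known; classify_read_pair is a hypothetical helper.

```python
def classify_read_pair(read1: str, read2: str, sample_tag: str) -> str:
    """Classify a read pair by the tags at the start of each read.

    Assumption: both reads begin with the adapter tag, which lies at the end of
    the double-stranded region adjacent to the insert.
    """
    n = len(sample_tag)
    tag1, tag2 = read1[:n], read2[:n]
    if tag1 != tag2:
        # Sense- and antisense-side tags disagree: likely index hopping.
        return "inconsistent"
    if tag1 != sample_tag:
        # Tags agree with each other but belong to another sample:
        # index hopping or cross-contamination.
        return "foreign"
    return "ok"


print(classify_read_pair("ACGTACGGATTC", "ACGTACTTGACA", "ACGTAC"))  # ok
```

Reads classified as "inconsistent" or "foreign" would be dropped from the sample's data before mutation analysis.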
  • the universal high-throughput sequencing adapter includes a PN adapter and an AN adapter; the double-stranded complementary region of the PN adapter consists of 40 to 58 bases; the double-stranded complementary region of the AN adapter consists of 40 to 58 bases; the 5' free arm of the PN or AN adapter consists of 30 to 45 bases; the 3' free arm of the PN or AN adapter consists of 35 to 56 bases; the tag sequence consists of 6 to 12 bases, so as to achieve at least 114048 bases.
  • FIG. 1 Schematic diagram of the structure of the universal high-throughput sequencing adapter in Example 1;
  • FIG. 2 Quality-control profile (2100) of the universal high-throughput sequencing adapter library in Example 2 (sample R19054232);
  • FIG. 3 Quality-control profile (2100) of the universal high-throughput sequencing adapter library in Example 2 (sample R20005128);
  • the terms “comprising”, “including”, “having”, “containing” or “involving” are inclusive or open-ended and do not exclude other unlisted elements or method steps.
  • the term “consisting of” is considered a preferred embodiment of the term “comprising”. If in the following a certain group is defined as comprising at least a certain number of embodiments, this should also be understood as revealing a group preferably consisting of only these embodiments.
  • nucleic acid refers to any molecule comprising ribonucleic acid, deoxyribonucleic acid or its analogue unit, preferably a polymeric molecule.
  • the nucleic acid can be single-stranded or double-stranded.
  • the single-stranded nucleic acid may be a nucleic acid of one strand of denatured double-stranded DNA.
  • the single-stranded nucleic acid may be a single-stranded nucleic acid that is not derived from any double-stranded DNA.
  • complementary refers to hydrogen-bond base pairing between the nucleotide bases G, A, T, C and U, such that when two given polynucleotides or polynucleotide sequences anneal to each other, A pairs with T and G pairs with C in DNA, and G pairs with C and A pairs with U in RNA.
  • the “sequencing adapter” in the present invention refers to a double-stranded oligonucleotide sequence ligated to both ends of the target fragment to be sequenced.
  • the double-stranded oligonucleotide sequence can be fully complementary or only partially complementary, such as a "Y-type" adapter, which forms because part of the terminal sequence is not complementary.
  • the sequencing adapter of the present invention is preferably such a "Y-type" adapter.
  • the composition of the nucleotide sequence of the sequencing adapter is related to the applicable sequencing platform.
  • the composition can include library amplification primer sequence, sample tag sequence, sequencing primer sequence, etc.; and the sequence length of the sequencing adapter is also related to the sequencing platform.
  • specifically, the adapter lengths can be: 35-56 bp for the 3' free arm sequence, 30-45 bp for the 5' free arm sequence, and 40-58 bp for the double-stranded complementary region sequence.
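As a sketch, the length ranges stated above, plus the 6-12-base tag length given in the tag-sequence definition, can be checked for a candidate adapter design. The component names and the length_violations helper are hypothetical, and whether the tag is counted inside the double-stranded region's length is an assumption left to the caller.

```python
# Length ranges (bases) from the description; the tag range comes from the tag definition.
LENGTH_RANGES = {
    "free_arm_3p": (35, 56),
    "free_arm_5p": (30, 45),
    "ds_region": (40, 58),
    "tag": (6, 12),
}


def length_violations(components: dict) -> list:
    """Names of components whose length falls outside the stated range."""
    bad = []
    for name, seq in components.items():
        low, high = LENGTH_RANGES[name]
        if not low <= len(seq) <= high:
            bad.append(name)
    return bad


# Hypothetical design using placeholder sequences of the right lengths.
design = {"free_arm_5p": "A" * 33, "ds_region": "C" * 48,
          "free_arm_3p": "G" * 40, "tag": "ACGTAC"}
print(length_violations(design))  # []
```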
  • FIG. 1 shows a preferred universal high-throughput sequencing "Y-type" adapter of the present invention, which includes a PN adapter and an AN adapter that can be located at either end of the target sequence, respectively.
  • both the PN adapter and the AN adapter comprise a double-stranded complementary region, a single-stranded 5' free arm and a single-stranded 3' free arm.
  • the double-stranded complementary regions of the PN adapter and the AN adapter both include a tag sequence consisting of 6 to 12 bases.
  • the non-free end of the single-stranded 3' free arm of the AN adapter also contains a sequence with the same base composition as the tag sequence;
  • likewise, the non-free end of the single-stranded 3' free arm of the PN adapter contains a sequence with the same base composition as the tag sequence.
  • the 3' end of the 3' free arm of the AN adapter and the PN adapter is thio-modified; preferably, the phosphodiester bonds between the last 3 bases are replaced by phosphorothioate bonds.
  • the double-stranded complementary ends of the AN adapter and the PN adapter can be ligated to the original gene fragments by a ligase.
  • the 5' free arm of the AN adapter and the 3' free arm of the PN adapter are non-complementary single strands that cannot be ligated to the original gene fragments, which ensures the efficiency of ligating the universal high-throughput sequencing adapter to the DNA fragments.
  • the PN adapter and the AN adapter each refer to a partially double-stranded fragment (Y-type structure) containing a double-stranded complementary region and single-stranded 3'/5' free arms; during library construction they are each ligated to one end of the target sequence, and their nucleotide sequences are preferably different.
  • the "free arm" in the present invention refers to the region of the adapter sequence whose bases are not complementary paired, such as the unpaired region of the PN adapter or AN adapter. Therefore, even where it is not explicitly stated that the free-arm sequences are non-complementary, those skilled in the art should understand that the two sequences are not complementary and can form a Y-shaped structure in some cases.
  • the free arm of the present invention includes a library amplification primer region; in other embodiments, the 3' free arm of the present invention also includes a tag sequence.
  • the "double-stranded complementary region" in the present invention refers to the double-stranded complementary region contained in the sequencing adapter. This region usually contains sequencing primer sequences.
  • the double-stranded complementary region of the present invention contains sequencing primer sequences for at least two sequencing platforms.
  • the "tag sequence" in the present invention refers to a nucleotide sequence 6 to 12 bp long that is used to identify different library samples.
  • the "non-free end" in the present invention refers to the end of the single-stranded 3' or 5' free arm of the PN adapter or AN adapter that is joined to the double-stranded complementary region.
  • the "free end" in the present invention refers to the 3' end of the single-stranded 3' free arm or the 5' end of the single-stranded 5' free arm of the PN adapter or AN adapter.
  • the "high-throughput sequencing platform" in the present invention refers to sequencing platforms such as Ion Torrent, Illumina, Roche 454 and ABI. Although the sequencing platforms are preferably Ion Torrent and Illumina in the present invention, they are not limited to these. It is clear to those skilled in the art that, based on the inventive concept of the present invention, primer sequences for any two or more platforms can be selected and incorporated into the adapter sequence to prepare a compatible high-throughput sequencing adapter of the present invention. In addition, because sequencers within the same sequencing platform work on essentially the same sequencing principle, the method of the present invention is applicable to all models of that platform, for example all sequencer models of the Ion Torrent platform.
  • the "low-frequency mutation" in the present invention refers to a mutation whose frequency is less than 5%, for example less than 4%, less than 3%, less than 2% or less than 1%.
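Assuming mutation frequency is measured as the variant allele frequency (reads supporting the mutation divided by the total depth at the position), the 5% cut-off above reduces to a single comparison. The function names below are hypothetical.

```python
def mutation_frequency(alt_reads: int, depth: int) -> float:
    """Variant allele frequency: reads carrying the mutation / total reads at the site."""
    if depth <= 0:
        raise ValueError("depth must be positive")
    return alt_reads / depth


def is_low_frequency(alt_reads: int, depth: int, cutoff: float = 0.05) -> bool:
    """True when the mutation frequency is below the cutoff (5% by default)."""
    return mutation_frequency(alt_reads, depth) < cutoff


print(is_low_frequency(25, 1000))   # True  (2.5%)
print(is_low_frequency(50, 1000))   # False (exactly 5% is not below 5%)
```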
  • AN1/PN1 is a set of universal high-throughput sequencing adapters with shorter sequences
  • AN2/PN2 is another set of universal high-throughput sequencing adapters with longer sequences.
  • the specific preparation method is as follows:
  • the end of the double-stranded complementary region of the AN adapter contains a tag sequence consisting of 6-12 random bases "X".
  • the 3' free arm of the AN adapter also contains a tag sequence of 6-12 random bases "X", located at the non-free end of the 3' free arm.
  • the end of the double-stranded complementary region of the PN adapter contains a tag sequence consisting of 6-12 random bases "X".
  • the 3' free arm of the PN adapter also contains a tag sequence of 6-12 random bases "X", located at the non-free end of the 3' free arm. * denotes a thio-modification site.
  • in the 3' free arms of the AN adapter and the PN adapter, the phosphodiester bonds between the last 3 bases at the 3' end are replaced by phosphorothioate bonds.
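In the common oligo-ordering convention where "*" between two bases marks a phosphorothioate linkage, the 3'-end modification above can be written out as in this sketch. It reads "between the last 3 bases" as the two linkages joining the final three nucleotides; that reading, and the helper name, are assumptions.

```python
def add_terminal_phosphorothioates(seq: str, n_bases: int = 3) -> str:
    """Insert '*' (phosphorothioate linkage) between each of the last n_bases bases.

    n_bases=3 gives two modified linkages, e.g. 'GGTTACGT' -> 'GGTTAC*G*T'.
    """
    if not 2 <= n_bases <= len(seq):
        raise ValueError("n_bases must be between 2 and len(seq)")
    head, tail = seq[:-n_bases], seq[-n_bases:]
    return head + "*".join(tail)


print(add_terminal_phosphorothioates("GGTTACGT"))  # GGTTAC*G*T
```

If the intended meaning is three modified linkages instead, calling the helper with n_bases=4 covers it.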
  • the universal high-throughput sequencing adapters AN1/PN1 and AN2/PN2 prepared in Example 1 were used in the experiment, respectively.
  • the number of universal high-throughput sequencing adapters corresponds to the number of samples to be tested. For example, if the number of samples to be tested is 10, 10 sets of universal high-throughput sequencing adapter sets are prepared correspondingly, and each set of universal high-throughput sequencing adapter sets includes PN1 adapters and AN1 adapters.
  • the base sequence composition of the tag sequence in the same group of PN1 adaptors and AN1 adaptors is the same, and the base sequence composition of the tag sequence in different adaptor groups is different.
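One way to produce such adapter groups, sketched under assumptions: draw one random tag per sample, reuse it in that sample's PN1 and AN1 adapters, and redraw until every group's tag is distinct. With 6 random bases there are 4^6 = 4096 possible tags. The function name and the dictionary layout are hypothetical, and a practical design would likely also enforce a minimum distance between tags, which this sketch does not do.

```python
import random

BASES = "ACGT"


def assign_sample_tags(sample_ids, tag_len: int = 8, seed=None) -> dict:
    """Map each sample ID to one random tag shared by its PN1 and AN1 adapters.

    Tags are tag_len random bases (6-12); a tag already given to another
    sample is redrawn, so every adapter group ends up with a distinct tag.
    """
    if not 6 <= tag_len <= 12:
        raise ValueError("tag length must be 6-12 bases")
    if len(sample_ids) > 4 ** tag_len:
        raise ValueError("more samples than distinct tags of this length")
    rng = random.Random(seed)
    used, groups = set(), {}
    for sample_id in sample_ids:
        tag = "".join(rng.choice(BASES) for _ in range(tag_len))
        while tag in used:
            tag = "".join(rng.choice(BASES) for _ in range(tag_len))
        used.add(tag)
        groups[sample_id] = {"PN1": tag, "AN1": tag}
    return groups


print(assign_sample_tags(["R19054232", "R20005128"], seed=7))
```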
  • Sample genomic DNA extraction: peripheral blood samples 1 and 2 (R19054232 and R20005128, respectively) were taken for genomic DNA extraction.
  • the sample DNA was extracted in accordance with the operating instructions of the nucleic acid extraction reagent (DR181003-48) produced by Beijing Anzhiyin Biotechnology Co., Ltd.
  • the target regions are the entire coding regions and the splicing regions (extending 20 bp from the exons into the introns) of the ACTA2, COL3A1, FBN1, MYH11, MYLK, SMAD3, TGFBR1 and TGFBR2 genes.
  • the multiplex PCR primer pool for the target regions was designed with Ion AmpliSeq Designer and synthesized and supplied by Thermo Fisher.
  • Target fragment amplification: the specific procedure is as follows:
  • the ligase is Fast T4 DNA Ligase produced by Shanghai Yisheng Biotechnology Co., Ltd.
  • the ligation buffer is 5× Fast Ligation Buffer produced by Shanghai Yisheng Biotechnology Co., Ltd.
  • the specific implementation is as follows:
  • the Ion Torrent platform and the Illumina platform are used to perform sequencing verification on the above-mentioned high-throughput library, as follows:
  • Ion 520™ & Ion 530™ Kit–OT: the purified, quality-checked library is diluted and processed with the Ion 520™ & Ion 530™ Kit–OT according to the kit operating procedure. After template preparation on the Ion OneTouch 2 instrument, sequencing and data analysis are performed on the Ion GeneStudio™ S5 Plus gene sequencer.
  • the library was diluted and processed with the MiSeqDx Reagent Kit v3 according to the kit operating procedure; sequencing and data analysis were performed on the MiSeqDx gene sequencer.
  • the average read length of both samples is ≥200 bp, indicating that all target fragments in the samples were read through, that is, every base from the beginning to the end of the target fragments can be identified;
  • Mean depth: average sequencing depth
  • On Target: on-target rate
  • Uniformity is ≥90%, indicating that the amplification efficiency of each amplicon in the target region, and the efficiency of ligating the universal high-throughput adapter, are similar.
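The text does not define how the on-target rate and uniformity are computed. A sketch under one common convention, which is an assumption here: the on-target rate is the share of mapped reads falling in the target regions, and uniformity is the share of amplicons whose depth is at least 0.2× the mean amplicon depth. The function names are hypothetical.

```python
from statistics import mean


def on_target_rate(on_target_reads: int, mapped_reads: int) -> float:
    """Share of mapped reads that fall inside the target regions."""
    if mapped_reads <= 0:
        raise ValueError("mapped_reads must be positive")
    return on_target_reads / mapped_reads


def uniformity(amplicon_depths, fraction: float = 0.2) -> float:
    """Share of amplicons with depth >= fraction x the mean amplicon depth."""
    if not amplicon_depths:
        raise ValueError("need at least one amplicon depth")
    threshold = fraction * mean(amplicon_depths)
    return sum(d >= threshold for d in amplicon_depths) / len(amplicon_depths)


print(on_target_rate(900, 1000))       # 0.9
print(uniformity([100, 120, 80, 10]))  # 0.75: the 10x amplicon is below 0.2 x 77.5
```

Under this convention, a uniformity of ≥90% means at most one amplicon in ten falls below one fifth of the mean depth.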
  • the above parameters all indicate that the two ends of the target segment to be tested are successfully connected to the universal high-throughput sequencing adapter, and the sequencing is successful; it indicates that the library connected to the universal high-throughput sequencing adapter can be sequenced on the Ion GeneStudio TM S5 Plus gene sequencer.
  • the data output of both samples is ≥0.5 Gb, the number of reads is ≥3 M, and the proportion of bases at Q30 or above is ≥75%, indicating that both samples were sequenced successfully and that the universal high-throughput sequencing adapter was ligated to both ends of the target fragments;
  • therefore, the library ligated to the universal high-throughput sequencing adapter can be sequenced on the Miseq DX gene sequencer.
  • libraries built with the sequencing adapter of the present invention therefore meet the sequencing requirements of both the Ion GeneStudio™ S5 Plus platform and the Miseq DX platform, i.e., of both mainstream sequencing platforms, Ion Torrent and Illumina.
  • the sequencing adapter of the present invention has the properties of a universal library-building adapter.
  • a library suitable for the Ion GeneStudio™ S5 Plus sequencer can also be used on other Ion Torrent sequencers, such as the PGM and Proton.
  • a library suitable for the Miseq DX gene sequencer can also be used on other Illumina sequencers, such as the MiniSeq and NextSeq. It follows that libraries ligated with the universal sequencing adapter of the present invention can be used on all types of sequencers on the Ion Torrent and Illumina platforms.
  • This embodiment further verifies the use of the sequencing adapter of the present invention in low-frequency mutation detection and provides a method for judging whether low-frequency mutations are genuine, which can correct sequencing errors introduced by index hopping.
  • the technical workflow is shown in Figure 4 and includes the following steps:
  • the sample is a commercial tumor SNV 5% gDNA reference standard (GW-OGTM005), serially diluted with commercial human genomic DNA (G304A) to mutation frequencies of 2.5%, 1.25%, and 0.5%; the four mutation frequencies (5%, 2.5%, 1.25%, and 0.5%) are designated sample 1, sample 2, sample 3, and sample 4.
  • the target regions are the designated hotspot regions of the EGFR (L858R/T790M/E746_A750del), PIK3CA (E545K), KRAS (G12D/G13D/A146T), and NRAS (Q61K) genes.
  • the multiplex PCR primer pool for the target regions is Thermo Fisher's Ampliseq colon & lung panel; each sample was tested in 3 replicates.
  • Target fragment amplification: the specific procedure is as follows:
  • the high-throughput sequencing adapter sets are PN1/AN1 and PN2/AN2 as described in Example 1. Taking the PN2/AN2 data as an example, 4 adapter sets were prepared with the sample tag sequences ATCACG, CGATGT, TTAGGC, and TGACCA; see Example 1 for the preparation method.
  • the amplified library was purified using Ampure magnetic beads, and the purified library was quantified using QUBIT 4.0.
  • the library concentration is calculated according to the dilution factor.
  • libraries with a concentration above 1 ng/µL can be used in subsequent steps; a concentration below 1 ng/µL indicates that library construction failed.
  • the library was diluted and processed with the Miseq DX Reagent Kit v3 according to the kit instructions, and sequencing and data analysis were performed on the Miseq DX gene sequencer.
  • Sequencing data analysis mainly includes the following contents:
  • for sequencing data assigned to the same sample source, the sample tag sequence is further compared with the tag sequence at the double-stranded end of the universal high-throughput sequencing adapter (AN adapter); agreement between the two is used to identify sequencing errors introduced by sample cross-contamination and index hopping.
  • with the universal high-throughput sequencing adapter, the sequencing data are analyzed after sequencing as follows. First, the sample tag sequence is used to assign reads to their source sample, separating the data into the four mutation frequencies of sample 1, sample 2, sample 3, and sample 4. Next, reads whose tag sequence in the double-stranded part of the adapter differs from the sample tag sequence are identified and removed, eliminating index hopping. Finally, a mutation site is accepted as genuine only if the forward and reverse reads carrying the same tag sequence both show it. Mutations present only in forward reads or only in reverse reads, and mutations in reads whose tag sequence disagrees with the sample tag, are excluded, enabling correct identification of low-frequency mutations.
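The pass/fail criteria scattered through the bullets above can be gathered into a single check. The Python sketch below is an illustration, not part of the patent: the metric names and the `THRESHOLDS` table are hypothetical, the "≥" direction of each run-level threshold is a reading of symbols that are garbled in the source, and a concentration of exactly 1 ng/µL is treated as failing because the text only says "above 1 ng/µL".

```python
# Hypothetical QC gate collecting the thresholds quoted in the examples.
# Metric names are illustrative; ">=" is an assumed reading of garbled symbols.
THRESHOLDS: dict[str, dict[str, float]] = {
    # Ion GeneStudio S5 Plus run (Ion Torrent)
    "ion": {"mean_read_length_bp": 200.0, "uniformity_pct": 90.0},
    # Miseq DX run (Illumina)
    "miseq": {"data_output_gb": 0.5, "reads_millions": 3.0, "q30_pct": 75.0},
}

MIN_LIBRARY_NG_PER_UL = 1.0


def library_concentration(measured_ng_per_ul: float, dilution_factor: float) -> float:
    """Back-calculate the stock concentration from a diluted Qubit reading."""
    return measured_ng_per_ul * dilution_factor


def library_passes(concentration_ng_per_ul: float) -> bool:
    """Require a concentration strictly above 1 ng/uL (exactly 1 counts as a fail)."""
    return concentration_ng_per_ul > MIN_LIBRARY_NG_PER_UL


def failed_checks(platform: str, metrics: dict[str, float]) -> list[str]:
    """Return the names of metrics that are missing or below their minimum."""
    return sorted(
        name
        for name, minimum in THRESHOLDS[platform].items()
        if metrics.get(name, float("-inf")) < minimum
    )


if __name__ == "__main__":
    conc = library_concentration(0.5, 4)  # 0.5 ng/uL measured after a 4x dilution
    print(conc, library_passes(conc))
    print(failed_checks("miseq", {"data_output_gb": 0.6, "reads_millions": 2.5, "q30_pct": 80}))
```

Treating a missing metric as a failure keeps the gate conservative; a real pipeline would read these values from the sequencer's run report.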

Landscapes

  • Chemical & Material Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Organic Chemistry (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Health & Medical Sciences (AREA)
  • Zoology (AREA)
  • Wood Science & Technology (AREA)
  • Engineering & Computer Science (AREA)
  • Biochemistry (AREA)
  • Molecular Biology (AREA)
  • Analytical Chemistry (AREA)
  • Microbiology (AREA)
  • Biophysics (AREA)
  • Immunology (AREA)
  • Physics & Mathematics (AREA)
  • Chemical Kinetics & Catalysis (AREA)
  • Biotechnology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Genetics & Genomics (AREA)
  • General Chemical & Material Sciences (AREA)
  • Medicinal Chemistry (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

Provided are a universal high-throughput sequencing adapter and an application thereof. The universal high-throughput sequencing adapter comprises a double-stranded complementary region and a single-stranded free arm. The sequencing adapter is compatible with multiple sequencing platforms, such as the Ion Torrent and Illumina platforms; it is suitable for clinical testing, saves cost, and can further be used to determine whether low-frequency mutations are genuine.

Description

A universal high-throughput sequencing adapter and its application
Cross-reference to related applications
This application claims priority to Chinese patent application No. 202010407833.5, entitled "A universal high-throughput sequencing adapter and its application", filed with the Chinese Patent Office on May 14, 2020, the entire content of which is incorporated herein by reference.
Technical field
The present invention relates to the field of gene sequencing, and in particular to a universal high-throughput sequencing adapter used in sequencing library construction and its application.
Background
With the development of gene sequencing technology, high-throughput sequencing is increasingly used in clinical practice, for example in newborn screening for high-risk diseases, diagnosis of genetic diseases, carrier testing, and pharmacogenetic testing that guides individualized drug dosing, drug selection, and prediction of drug response. High-throughput sequencing, also known as next-generation sequencing (NGS), is so named in contrast to traditional Sanger sequencing. In Sanger sequencing, a synthetic short oligonucleotide primer hybridized to a single-stranded DNA template is extended by DNA polymerase to synthesize new DNA fragments, which are separated by polyacrylamide gel electrophoresis to read the DNA sequence. Since its advent, Sanger sequencing has been regarded as the gold standard for DNA sequencing because of its long read length and high accuracy, and it remains the gold standard for verifying next-generation sequencing results. However, its throughput is low, so testing many genes and loci at the same time is time-consuming and labor-intensive.
NGS typically uses massively parallel sequencing (MPS), which sequences many samples and loci simultaneously and greatly increases throughput. High-throughput sequencing has been shown to have high accuracy and sensitivity in clinical genetic testing. However, noise and errors introduced during library construction and sequencing make it difficult to tell whether low-frequency mutations in the results are genuine. One example is data contamination caused by index hopping: compared with bridge amplification, the index hopping rate under exclusion amplification (ExAmp) can be as high as 2% (Illumina, "Effects of index misassignment on multiplexing and downstream analysis").
The most widely used NGS platforms are currently Ion Torrent and Illumina. Because their sequencing principles differ, the two platforms use different library construction workflows, so libraries are not interchangeable: a library prepared for the Ion Torrent platform cannot generate sequencing data on the Illumina platform, and vice versa. This greatly limits clinical application, so a universal library that works on different sequencing platforms is needed.
Library construction is an important part of NGS, and sequencing adapter design is central to library research. Current adapter design follows two main directions. One modifies the adapter shape, for example Y-shaped or U-shaped adapters, to reduce or avoid adapter dimers and increase the amount of usable sequencing data; the other adds specific molecular tags to the adapter to identify errors arising during library construction. However, libraries prepared along either direction can still be used only on a specific sequencing platform, not on both mainstream platforms such as Ion Torrent and Illumina. Because the two platforms rely on different sequencing principles, they also build sequencing templates in completely different ways: Ion Torrent uses emulsion PCR, whereas Illumina uses bridge amplification or exclusion amplification. Accordingly, Ion Torrent libraries generally use linear adapters, while Illumina libraries generally use Y-shaped adapters. Conventional high-throughput sequencing adapters are therefore each suited to a single platform and cannot serve both Ion Torrent and Illumina.
The present invention is proposed in view of this.
Summary of the invention
The technical problem addressed by the present invention is that prior-art high-throughput sequencing adapters are not compatible across different sequencing platforms and therefore have limited applicability.
Accordingly, a first objective of the present invention is to provide a universal high-throughput sequencing adapter suitable for multiple sequencing platforms;
a second objective is to provide a method for preparing a universal high-throughput sequencing adapter suitable for multiple sequencing platforms;
a third objective is to provide applications of the universal high-throughput sequencing adapter;
a fourth objective is to provide a method for detecting low-frequency gene mutations.
To achieve the above objectives, the present invention provides the following technical solutions:
The present invention provides a single-stranded adapter, characterized in that the single-stranded adapter comprises, connected in sequence:
1) a free arm, and
2) a double-stranded complementary region, wherein
the free arm comprises a library amplification primer binding region and a carrier binding region;
the double-stranded complementary region comprises sequencing primer binding regions for two or more sequencing platforms.
In some embodiments, the double-stranded complementary region further comprises a tag sequence.
Preferably, the tag sequence is located at the end of the double-stranded complementary region distal to the free arm.
In some embodiments, the tag sequence consists of 6-12 random bases.
In some embodiments, the free arm of the single-stranded adapter is 30-56 bp long and the double-stranded complementary region is 40-58 bp long.
In some specific embodiments,
the free arm may consist of the following sequence:
5’-NNNNNNNNNNNNNNNACCGAGATCTACACTCTTTCCCTACACGAC-3’;
the double-stranded complementary region may consist of the following sequence:
5’-GCTCTTCCGATNNNNNNNNNNNNCCTCTCTATGGGCAGTCGGTGATXXXXXX-3’;
where "XXXXXX" denotes a tag sequence of 6-12 random bases;
and "N" denotes any of the bases A, T, C, or G, or NA (no base).
In some embodiments, the free arm also contains a tag sequence that is identical to the tag sequence in the double-stranded complementary region.
In some specific embodiments,
the free arm may consist of the following sequence:
5’-ACGTCTGAACTCCAGTCACXXXXXXATCTCGTATGNNNNNNNNNNNNNNN-3’,
where "XXXXXX" denotes a tag sequence of 6-12 random bases;
and "N" denotes any of the bases A, T, C, or G, or NA (no base).
The present invention also provides a Y-shaped high-throughput sequencing adapter, characterized in that the sequencing adapter comprises a first single strand and a second single strand;
the first single strand and the second single strand each comprise:
1) a free arm, and
2) a double-stranded complementary region, wherein
the free arm comprises a library amplification primer binding region and a carrier binding region;
the double-stranded complementary region comprises sequencing primer binding regions for two or more sequencing platforms.
In some embodiments, the free-arm sequences of the first and second single strands are not complementary, and the first and second single strands can be annealed to form a Y-shaped double-stranded structure.
In some embodiments, the double-stranded complementary region contains a tag sequence located at the end of the double-stranded complementary region distal to the free arm.
In some embodiments, the sequencing platforms include, but are not limited to, Illumina, Ion Torrent, PacBio, Roche, Helicos, and ABI; preferably, the sequencing platforms are Ion Torrent and Illumina.
In some embodiments, the free arm of the second single strand further contains a tag sequence.
In some preferred embodiments, the tag sequence in the free arm is identical to the tag sequence in the double-stranded complementary region; more preferably, the tag sequence in the free arm is located near the end that adjoins the double-stranded complementary region.
In some embodiments, the double-stranded complementary regions of the first and second single strands are 40-58 bp long; the free arm of the first single strand is 30-45 bp long and the free arm of the second single strand is 35-56 bp long; and the tag sequence consists of 6-12 random bases.
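These length ranges follow from the sequence templates given in this section if each "N" is read as an optional base (present or NA) and each "XXXXXX" run as one tag of 6-12 bases. The Python sketch below is an illustrative check under exactly that reading; the function and constant names are hypothetical, and the N counts are those written in the templates.

```python
import re

TAG_MIN, TAG_MAX = 6, 12  # tag length range stated in the text

# Templates from this section, with N and X runs written out by count.
FREE_ARM_FIRST = "N" * 15 + "ACCGAGATCTACACTCTTTCCCTACACGAC"
FREE_ARM_SECOND = "ACGTCTGAACTCCAGTCAC" + "X" * 6 + "ATCTCGTATG" + "N" * 15
DUPLEX_FIRST = "GCTCTTCCGAT" + "N" * 10 + "CCTGCGTGTCTCCGACTCAGCTA" + "X" * 6
DUPLEX_THIRD = "GCTCTTCCGAT" + "N" * 12 + "CCTCTCTATGGGCAGTCGGTGAT" + "X" * 6


def length_range(template: str) -> tuple[int, int]:
    """Shortest and longest oligo that a template allows.

    Fixed A/C/G/T bases always count; each N counts 0 (NA) or 1; each X run
    is one tag of TAG_MIN..TAG_MAX bases regardless of how many X are written.
    """
    fixed = sum(base in "ACGT" for base in template)
    optional = template.count("N")
    tags = len(re.findall(r"X+", template))
    return fixed + tags * TAG_MIN, fixed + optional + tags * TAG_MAX


for name, template in [
    ("first free arm", FREE_ARM_FIRST),
    ("second free arm", FREE_ARM_SECOND),
    ("first duplex region", DUPLEX_FIRST),
    ("third duplex region", DUPLEX_THIRD),
]:
    print(name, length_range(template))
```

The free arms come out at 30-45 bp and 35-56 bp, and the duplex regions at 40-56 bp and 40-58 bp, so the stated 40-58 bp bound is reached by the template with the longer N run.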
In some embodiments, the 3’ end of the free arm of the first or second single strand carries a stabilizing modification;
preferably, a phosphorothioate modification;
more preferably, the phosphodiester bonds between the last 3 bases at the 3’ end are replaced by phosphorothioate bonds.
In some embodiments, the first single strand has the following sequences:
Free arm sequence:
5’-NNNNNNNNNNNNNNNACCGAGATCTACACTCTTTCCCTACACGAC-3’;
Double-stranded complementary region sequence:
5’-GCTCTTCCGATNNNNNNNNNNCCTGCGTGTCTCCGACTCAGCTAXXXXXX-3’
where "XXXXXX" denotes a tag sequence of 6-12 random bases;
and "N" denotes any of the bases A, T, C, or G, or NA (no base).
In some preferred embodiments, the first single strand consists of the free arm and the double-stranded complementary region joined in sequence in the 5’→3’ direction.
In some embodiments, the second single strand has the following sequences:
Free arm sequence:
5’-ACGTCTGAACTCCAGTCACXXXXXXATCTCGTATGNNNNNNNNNNNNNNN-3’;
Double-stranded complementary region sequence:
5’-XXXXXXTAGCTGAGTCGGAGACACGCAGGNNNNNNNNNNATCGGAAGAGC-3’;
where "XXXXXX" denotes a tag sequence of 6-12 random bases;
and "N" denotes any of the bases A, T, C, or G, or NA (no base).
In some preferred embodiments, the second single strand consists of the double-stranded complementary region and the free arm joined in sequence in the 5’→3’ direction.
The present invention also provides a high-throughput sequencing adapter set, characterized in that the adapter set comprises the high-throughput sequencing adapter described above.
In some embodiments, the adapter set further comprises another Y-shaped high-throughput sequencing adapter, which comprises a third single strand and a fourth single strand;
this other Y-shaped adapter is similar in sequence to the Y-shaped adapter described above and differs only in the sequence of the double-stranded complementary region;
the double-stranded complementary region of the third single strand has the following sequence:
5’-GCTCTTCCGATNNNNNNNNNNNNCCTCTCTATGGGCAGTCGGTGATXXXXXX-3’;
the double-stranded complementary region of the fourth single strand is complementary to that of the third single strand;
where "XXXXXX" denotes a tag sequence of 6-12 random bases;
and "N" denotes any of the bases A, T, C, or G, or NA (no base).
In some preferred embodiments, the free arms and double-stranded complementary regions of the third and fourth single strands are joined in the same order as in the first and second single strands.
In some preferred embodiments, the single strands of the Y-shaped high-throughput sequencing adapters have the following sequences:
First single strand (SEQ ID NO. 1):
5’-ACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCCTGCGTGTCTCCGACTCAGCTAXXXXXX-3’;
Second single strand (SEQ ID NO. 2):
5’-XXXXXXTAGCTGAGTCGGAGACACGCAGGATCGGAAGAGCACGTCTGAACTCCAGTCACXXXXXXATCTCGTATG-3’;
Third single strand (SEQ ID NO. 3):
5’-ACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCCTCTCTATGGGCAGTCGGTGATXXXXXX-3’
Fourth single strand (SEQ ID NO. 4):
5’-XXXXXXATCACCGACTGCCCATAGAGAGGATCGGAAGAGCACGTCTGAACTCCAGTCACXXXXXXATCTCGTATG-3’.
In other preferred embodiments, the single strands of the Y-shaped high-throughput sequencing adapters have the following sequences:
First single strand (SEQ ID NO. 5):
5’-CAAAGAGCGAGGACACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATTCTCCATCCACCTGCGTGTCTCCGACTCAGCTAXXXXXX-3’;
Second single strand (SEQ ID NO. 6):
5’-XXXXXXTAGCTGAGTCGGAGACACGCAGGTGGATGGAGAATCGGAAGAGCACGTCTGAACTCCAGTCACXXXXXXATCTCGTATGGTCCTCGCTCTTTG-3’;
Third single strand (SEQ ID NO. 7):
5’-CAAAGAGCGAGGACACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCTCCGCTTTCGCCCTCTCTATGGGCAGTCGGTGATXXXXXX-3’
Fourth single strand (SEQ ID NO. 8):
5’-XXXXXXATCACCGACTGCCCATAGAGAGGGCGAAAGCGGAGATCGGAAGAGCACGTCTGAACTCCAGTCACXXXXXXATCTCGTATGGTCCTCGCTCTTTG-3’.
Owing to limitations in generating the sequence listing, SEQ ID NO. 1-8 in the sequence listing do not include "XXXXXX".
The present invention also provides a composition, characterized in that the composition comprises the above high-throughput sequencing adapter or adapter set.
The present invention also provides a complex, characterized in that the complex is ligated to the above high-throughput sequencing adapter or adapter set.
The present invention also provides a kit, characterized in that the kit comprises the above high-throughput sequencing adapter or adapter set.
In some embodiments, the kit is a high-throughput sequencing library construction kit or a gene sequence enrichment kit.
The present invention also provides a method for preparing the above high-throughput sequencing adapter, characterized in that it comprises the following steps:
S1: synthesize the first and second single-stranded sequences separately;
S2: specifically anneal the two single-stranded sequences from S1 to obtain the high-throughput sequencing adapter.
The present invention also provides a method for constructing a sequencing library, characterized in that it comprises the following steps:
S1: prepare target fragments from the sample to be tested;
S2: ligate the above high-throughput sequencing adapter or adapter set to the target fragments from S1 to obtain a ligation product;
S3: amplify the ligation product from S2 and purify it to obtain the sequencing library of the sample to be tested.
The present invention also provides a method for detecting low-frequency gene mutations, characterized in that it comprises the following steps:
S1: prepare the above high-throughput sequencing adapter or adapter set, using the same tag sequence for a given sample;
S2: amplify target fragments from the sample to be tested and digest the primers;
S3: ligate the digestion product from S2 to the high-throughput sequencing adapter or adapter set from S1 to obtain a ligation product, then amplify and purify the ligation product to obtain a sequencing library;
S4: sequence the library from S3, correct the sequencing data according to the tag sequences of the high-throughput sequencing adapter, and perform mutation analysis on the corrected data.
In some embodiments, the mutation analysis in step S4 calls a specific mutation a true low-frequency mutation only when it appears in both the sense strand and the antisense strand of the same read.
In some preferred embodiments, the sample to be tested is genomic DNA.
The present invention also provides the following applications of the above high-throughput sequencing adapter, adapter set, composition, complex, or kit:
a. use in sequencing library construction or in the preparation of products for sequencing library construction;
b. use in high-throughput sequencing or in the preparation of high-throughput sequencing products;
c. use in detecting low-frequency gene mutations or in the preparation of products for detecting low-frequency gene mutations;
d. use in in vitro diagnostics or in the preparation of in vitro diagnostic products;
e. use in target gene enrichment or amplification-based enrichment.
Beneficial technical effects of the present invention:
1) Libraries constructed with the universal high-throughput sequencing adapter of the present invention, or a kit containing it, can generate sequencing data on all models of the mainstream Ion Torrent and Illumina sequencing platforms, so library construction kits and methods are no longer tied to an existing platform and can meet increasingly diverse clinical needs. For a given testing need, developing a single library construction kit is enough to generate sequencing data on all Ion Torrent and Illumina models, which also reduces development cost and time for the companies using it.
2) The universal high-throughput sequencing adapter provided by the present invention comprises a paired double-stranded complementary region and unpaired single-stranded free arms. The distal end of the paired double-stranded part carries a tag sequence, and the ends of the two free arms that adjoin the double-stranded part also carry tag sequences. All tag sequences carried by the same sample have the same base composition, so tag consistency reveals whether cross-contamination occurred during library construction. After sequencing on certain Illumina sequencer models, index hopping can be detected by checking whether the tag sequences within the same read have the same base composition.
3) The universal high-throughput sequencing adapter provided by the present invention comprises a paired double-stranded complementary region and unpaired single-stranded free arms. The distal end of the paired double-stranded part carries a tag sequence, and the sense and antisense strands of the same read should carry tag sequences with the same base composition. A specific mutation is judged to be real only if it appears in both the sense and antisense strands of the same read. If a read shows the mutation only in the sense strand or only in the antisense strand, the mutation is attributed to an error in library construction or sequencing and is excluded from downstream analysis, avoiding false positives.
4) A single tag sequence in the universal high-throughput sequencing adapter of the present invention supports two checks: a specific mutation must be present in both the sense and antisense strands, and the base composition of the sample's tag sequence should match that of the tag sequence within the read. When the tag sequence of a sample differs from the tag sequence in a read, the read does not belong to that sample, i.e., index hopping has occurred. The design of the present invention therefore effectively overcomes the index hopping inherent to some sequencing platforms and enables reliable determination of whether low-frequency mutations are genuine.
5) The universal high-throughput sequencing adapter provided by the present invention comprises a paired double-stranded complementary region and unpaired single-stranded free arms. Exemplarily, the universal adapter comprises a PN adapter and an AN adapter: the double-stranded complementary region of the PN adapter consists of 40 to 58 bases, as does that of the AN adapter; the 5' free arm of the PN or AN adapter consists of 30 to 45 bases; the 3' free arm of the PN or AN adapter consists of 35 to 56 bases; and the tag sequence consists of 6 to 12 bases. This allows at least 114,048 universal high-throughput sequencing adapters to be constructed.
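The read-level checks described in items 2) to 4) can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the inventors' analysis pipeline: it assumes each read has already been parsed into the tag copy read from the double-stranded region and the tag copy read from the 3' free arm (the hypothetical fields `duplex_tag` and `arm_tag`), and that sense- and antisense-strand reads of the same original molecule share a hypothetical `molecule_id`; the text does not say how that grouping is done. The tag and variant values are illustrative.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Read:
    duplex_tag: str  # tag copy read from the double-stranded region (hypothetical field)
    arm_tag: str     # tag copy read from the 3' free arm (hypothetical field)


def assign_sample(read: Read, sample_tags: dict[str, str]) -> str:
    """Items 2) and 4): both tag copies must agree and match a known sample tag."""
    if read.duplex_tag != read.arm_tag:
        # One molecule should carry the same tag in both positions;
        # a mismatch suggests index hopping or cross-contamination.
        return "index_hopping_suspected"
    for sample, tag in sample_tags.items():
        if tag == read.duplex_tag:
            return sample
    return "unassigned"


def strand_confirmed(observations) -> set[str]:
    """Item 3): keep a variant only if both strands of one molecule show it.

    observations: iterable of (molecule_id, strand, variant), strand '+' or '-'.
    """
    strands = defaultdict(set)  # (molecule_id, variant) -> strands seen
    for molecule_id, strand, variant in observations:
        strands[(molecule_id, variant)].add(strand)
    return {variant for (_, variant), seen in strands.items() if seen >= {"+", "-"}}


tags = {"S1": "ATCACG", "S2": "CGATGT"}
print(assign_sample(Read("ATCACG", "ATCACG"), tags))  # S1
print(assign_sample(Read("ATCACG", "CGATGT"), tags))  # index_hopping_suspected
obs = [("m1", "+", "KRAS G12D"), ("m1", "-", "KRAS G12D"), ("m2", "+", "EGFR T790M")]
print(strand_confirmed(obs))  # {'KRAS G12D'}
```

Exact string matching keeps the sketch short; a real pipeline would probably tolerate a small number of sequencing errors in each tag, which is one reason to choose tags that are far apart in Hamming distance.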
Description of the drawings
To illustrate the specific embodiments of the present invention and the technical solutions of the prior art more clearly, the drawings used in describing them are briefly introduced below. The drawings described below show some embodiments of the present invention; those of ordinary skill in the art can derive other drawings from them without creative effort.
Figure 1. Schematic diagram of the structure of the universal high-throughput sequencing adapter of Example 1;
Figure 2. Agilent 2100 quality-control trace of the universal high-throughput sequencing library constructed with the universal adapter in Example 2 (sample R19054232);
Figure 3. Agilent 2100 quality-control trace of the universal high-throughput sequencing library constructed with the universal adapter in Example 2 (sample R20005128);
Figure 4. Workflow (technical roadmap) of Example 4.
Detailed description of the embodiments
The technical solutions of the present invention are described clearly and completely below with reference to the drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on these embodiments without creative effort fall within the scope of protection of the present invention.
The following terms and definitions are provided only to aid understanding of the present invention. They should not be construed as having a narrower scope than that understood by those skilled in the art.
Unless otherwise defined below, all technical and scientific terms used in the detailed description have the meanings commonly understood by those skilled in the art. Although these terms are believed to be well understood, the following definitions are set forth to better explain the present invention.
As used herein, the terms "including", "comprising", "having", "containing", and "involving" are inclusive or open-ended and do not exclude other, unrecited elements or method steps. The term "consisting of" is considered a preferred embodiment of the term "comprising". If a group is defined below as comprising at least a certain number of embodiments, this should also be understood as disclosing a group that preferably consists only of these embodiments.
An indefinite or definite article (e.g., "a", "an", "the") used with a singular noun includes the plural of that noun.
The terms "approximately" and "substantially" herein denote an interval of accuracy that a person skilled in the art understands to still ensure the technical effect of the feature in question. The term generally indicates a deviation of ±10%, preferably ±5%, from the indicated value.
In addition, the terms first, second, third, (a), (b), (c), and the like in the description and claims are used to distinguish similar elements and do not necessarily describe a sequential or chronological order. Terms so used are interchangeable under appropriate circumstances, and the embodiments described herein can be carried out in orders other than those described or illustrated.
The term "nucleic acid" or "nucleic acid sequence" herein refers to any molecule, preferably a polymeric molecule, comprising units of ribonucleic acid, deoxyribonucleic acid, or analogues thereof. The nucleic acid may be single-stranded or double-stranded. A single-stranded nucleic acid may be one strand of a denatured double-stranded DNA, or it may be a single-stranded nucleic acid not derived from any double-stranded DNA.
The term "complementary" as used herein refers to hydrogen-bonded base pairing between the nucleotide bases G, A, T, C, and U such that, when two given polynucleotides or polynucleotide sequences anneal to each other, A pairs with T and G pairs with C in DNA, and G pairs with C and A pairs with U in RNA.
Other terms are defined in the description of the various aspects of the invention.
"Sequencing adapter" as used herein refers to a double-stranded oligonucleotide ligated to both ends of a target fragment to be sequenced. The two strands may be fully complementary or only partially complementary, as in a "Y-type" adapter, whose terminal sequences are not complementary; the sequencing adapter of the present invention is preferably such a Y-type adapter. The nucleotide composition of a sequencing adapter depends on the sequencing platform with which it is used; for example, it may include library amplification primer sequences, sample tag sequences, and sequencing primer sequences. The adapter length likewise depends on the platform and can be chosen by those skilled in the art; for example, in some embodiments of the present invention, the 3' free arm is 35 to 56 bp, the 5' free arm is 30 to 45 bp, and the double-stranded complementary region is 40 to 58 bp.
Exemplarily, Figure 1 shows a preferred universal high-throughput sequencing "Y-type" adapter of the present invention, comprising a PN adapter and an AN adapter, which can be located at either end of the target sequence. Both the PN adapter and the AN adapter comprise a double-stranded complementary region, a single-stranded 5' free arm, and a single-stranded 3' free arm. In some embodiments, the double-stranded complementary regions of both the PN and AN adapters contain a tag sequence of 6 to 12 bases. In some preferred embodiments, the non-free end of the single-stranded 3' free arm of the AN adapter also contains a sequence with the same base composition as the tag sequence, and likewise the non-free end of the single-stranded 3' free arm of the PN adapter contains a sequence with the same base composition as the tag sequence. In some embodiments, to enhance adapter stability and prevent hydrolysis, the 3' end of the 3' free arm of both the AN and PN adapters is phosphorothioate-modified; preferably, the phosphodiester bonds between the last 3 bases are replaced by phosphorothioate bonds. Exemplarily, the double-stranded complementary ends of the AN and PN adapters can be joined to the original gene fragment in a ligase-catalyzed ligation reaction. Because the 5' free arm of the AN adapter and the 3' free arm of the PN adapter are unpaired single strands, they cannot be ligated to the original gene fragment, which ensures efficient ligation of the universal adapter to the DNA fragments.
The "PN adapter" and "AN adapter" described herein each refer to a partially double-stranded fragment (Y-type structure) comprising a double-stranded complementary region and single-stranded 3'/5' free arms. During library construction each is ligated to one end of the target sequence, and their nucleotide sequences are preferably different.
"Free arm" as used herein refers to a region of the adapter sequence in which the bases are not complementarily paired, such as the unpaired regions of the PN or AN adapter of the present invention. Accordingly, even where it is not stated explicitly that the free-arm sequences are non-complementary, those skilled in the art will understand that they are, and that in some cases they form a Y-type structure. In some embodiments, the free arms of the present invention include a library amplification primer region; in other embodiments, the 3' free arm of the present invention further contains a tag sequence.
"Double-stranded complementary region" as used herein refers to the complementary double-stranded region of the sequencing adapter. This region usually contains sequencing primer sequences; the double-stranded complementary region of the present invention contains the sequencing primer sequences of at least two sequencing platforms.
"Tag sequence" as used herein refers to a nucleotide sequence 6 to 12 bp in length used to identify different library samples.
"Non-free end" as used herein refers to the end at which the double-stranded complementary region of the PN or AN adapter joins the single-stranded 3' or 5' free arm.
"Free end" as used herein refers to the 3' end of the single-stranded 3' free arm or the 5' end of the single-stranded 5' free arm of the PN or AN adapter.
"High-throughput sequencing platform" as used herein refers to sequencing platforms such as Ion Torrent, Illumina, Roche 454, and ABI. Although Ion Torrent and Illumina are the preferred platforms of the present invention, the invention is not limited to them. Based on the inventive concept, those skilled in the art can select primer sequences for any two or more platforms and incorporate them into the adapter sequence to prepare a compatible high-throughput sequencing adapter of the present invention. Moreover, because sequencers of the same platform operate on essentially the same sequencing principle, the method of the present invention applies to all instrument models of a given platform: for example, all Ion Torrent models, including but not limited to the Ion GeneStudio™ S5 Plus, PGM, and Proton, and all Illumina models, including but not limited to the Miseq DX, MiniSeq, and NextSeq.
"Low-frequency mutation" as used herein refers to a mutation with a frequency below 5%, including below 5%, 4%, 3%, 2%, or 1%.
The present invention is further described by the following drawings and examples, which merely illustrate specific embodiments and should not be construed as limiting the scope of the invention in any way. Unless otherwise specified, the experimental methods disclosed herein use conventional techniques in the art. The universal high-throughput sequencing adapters were synthesized by Sangon Biotech (Shanghai) Co., Ltd., and the reagents and raw materials used in the examples are all commercially available.
Example 1 Design and preparation of high-throughput sequencing adapters
Based on the structure shown in Figure 1, two sets of universal high-throughput sequencing adapters, AN1/PN1 and AN2/PN2, were designed: AN1/PN1 is the set with shorter sequences, and AN2/PN2 is the set with longer sequences.
The specific preparation method is as follows:
The adapter strands shown in Sequences 1 to 4 below were prepared; Sequences 1 and 2 were annealed to form the AN1 adapter, and Sequences 3 and 4 were annealed to form the AN2 adapter.
Sequence 1:
5'-XXXXXXTAGCTGAGTCGGAGACACGCAGGATCGGAAGAGCACGTCTGAACTCCAGTCACXXXXXXATCTCGTA*T*G-3' (3' free-arm strand)
Sequence 2:
5'-ACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCCTGCGTGTCTCCGACTCAGCTAXXXXXX-3' (5' free-arm strand)
Sequence 3:
5'-XXXXXXTAGCTGAGTCGGAGACACGCAGGTGGATGGAGAATCGGAAGAGCACGTCTGAACTCCAGTCACXXXXXXATCTCGTATGGTCCTCGCTCTT*T*G-3' (3' free-arm strand)
Sequence 4:
5'-CAAAGAGCGAGGACACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATTCTCCATCCACCTGCGTGTCTCCGACTCAGCTAXXXXXX-3' (5' free-arm strand)
The adapter strands shown in Sequences 5 to 8 below were prepared; Sequences 5 and 6 were annealed to form the PN1 adapter, and Sequences 7 and 8 were annealed to form the PN2 adapter.
Sequence 5:
5'-ACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCCTCTCTATGGGCAGTCGGTGATXXXXXX-3'
Sequence 6:
5'-XXXXXXATCACCGACTGCCCATAGAGAGGATCGGAAGAGCACGTCTGAACTCCAGTCACXXXXXXATCTCGTA*T*G-3'
Sequence 7:
5'-CAAAGAGCGAGGACACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCTCCGCTTTCGCCCTCTCTATGGGCAGTCGGTGATXXXXXX-3'
Sequence 8:
5'-XXXXXXATCACCGACTGCCCATAGAGAGGGCGAAAGCGGAGATCGGAAGAGCACGTCTGAACTCCAGTCACXXXXXXATCTCGTATGGTCCTCGCTCTT*T*G-3'
In these sequences, the end of the double-stranded complementary region of the AN adapter contains a tag sequence of 6 to 12 random bases "X". The 3' free arm of the AN adapter also contains a tag sequence of 6 to 12 random bases "X", located at the non-free end of the arm. Likewise, the end of the double-stranded complementary region of the PN adapter and the non-free end of its 3' free arm each contain a tag sequence of 6 to 12 random bases "X". An asterisk (*) marks a phosphorothioate modification site: to enhance adapter stability and prevent hydrolysis, the phosphodiester bonds between the last 3 bases at the 3' end of the 3' free arm of both the AN and PN adapters are replaced by phosphorothioate bonds.
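Because the "X" placeholders make the strand pairing hard to see, the following Python sketch fills them with a hypothetical 6-base tag and measures how far Sequence 1 and Sequence 2 pair with each other. It assumes that the X block at the 3' end of Sequence 2 is the reverse complement of the X block at the 5' end of Sequence 1 (which the two strands need in order to anneal) and that the second X block in Sequence 1 carries the same tag; the phosphorothioate marks (*) are dropped.

```python
from __future__ import annotations

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def adapter_regions(arm3_strand: str, arm5_strand: str) -> tuple[int, int, int]:
    """Return (duplex length, 3' free-arm length, 5' free-arm length).

    Both strands are written 5'->3'. The 5' end of the 3'-arm strand pairs
    with the 3' end of the 5'-arm strand, so walk inward from those ends
    until the first non-complementary position (the fork of the Y).
    """
    duplex = 0
    for a, b in zip(arm3_strand, reversed(arm5_strand)):
        if a.translate(COMPLEMENT) != b:
            break
        duplex += 1
    return duplex, len(arm3_strand) - duplex, len(arm5_strand) - duplex


tag = "ATCACG"  # hypothetical 6-base tag substituted for XXXXXX
# Sequence 1 (3' free-arm strand of AN1), * marks removed
seq1 = (tag + "TAGCTGAGTCGGAGACACGCAGGATCGGAAGAGCACGTCTGAACTCCAGTCAC"
        + tag + "ATCTCGTATG")
# Sequence 2 (5' free-arm strand of AN1); its tag is assumed to be revcomp(tag)
seq2 = ("ACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCCTGCGTGTCTCCGACTCAGCTA"
        + revcomp(tag))

print(adapter_regions(seq1, seq2))  # (40, 35, 30)
```

With a 6-base tag, AN1 therefore has a 40-base double-stranded region, a 35-base 3' free arm, and a 30-base 5' free arm, i.e., the lower ends of the 40 to 58, 35 to 56, and 30 to 45 base ranges given earlier; a longer tag lengthens the duplex and the 3' free arm.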
Example 2 Preparation of a universal high-throughput sequencing library for aorta-related genes
The universal high-throughput sequencing adapters AN1/PN1 and AN2/PN2 prepared in Example 1 were each used in this experiment.
The number of universal adapter sets corresponds to the number of samples to be tested. For example, for 10 samples, 10 universal adapter sets are prepared, each containing a PN1 adapter and an AN1 adapter. Within a set, the PN1 and AN1 adapters carry tag sequences with the same base sequence; different sets carry tag sequences with different base sequences.
The target fragments to be tested are ligated to the above adapter sets to obtain ligation products, with gene fragments from the same sample ligated to the same adapter set. The ligation products are amplified, and the amplification products are purified to obtain the universal high-throughput sequencing library of each sample.
The specific steps are as follows:
1. DNA extraction and quality control
(1) Genomic DNA extraction: genomic DNA was extracted separately from peripheral blood samples 1 and 2 (R19054232 and R20005128, respectively), following the instructions of the nucleic acid extraction reagent (DR181003-48) manufactured by Beijing Anzhiyin Biotechnology Co., Ltd.
(2) DNA purity was measured with a NanoDrop One, and double-stranded DNA concentration with a Qubit 4.0. The DNA was diluted to 5 ng/µL for later use.
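The dilution to 5 ng/µL is a standard C1V1 = C2V2 calculation. The sketch below assumes a hypothetical Qubit reading and a hypothetical 20 µL final volume; neither value comes from the text.

```python
from __future__ import annotations


def dilution_volumes(stock_ng_per_ul: float, target_ng_per_ul: float = 5.0,
                     final_volume_ul: float = 20.0) -> tuple[float, float]:
    """Return (DNA stock volume, diluent volume) in uL, using C1*V1 = C2*V2.

    final_volume_ul is a hypothetical working volume, not taken from the text.
    """
    if stock_ng_per_ul < target_ng_per_ul:
        raise ValueError("stock is already below the target concentration")
    stock_ul = target_ng_per_ul * final_volume_ul / stock_ng_per_ul
    return stock_ul, final_volume_ul - stock_ul


# Hypothetical Qubit reading of 50 ng/uL
print(dilution_volumes(50.0))  # (2.0, 18.0)
```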
2. Library construction
The target regions are the entire coding regions and splice regions (exons extended 20 bp into the flanking introns) of the ACTA2, COL3A1, FBN1, MYH11, MYLK, SMAD3, TGFBR1, and TGFBR2 genes. The multiplex PCR primer pool for the target regions was designed with Ion AmpliSeq Designer and was synthesized and supplied by Thermo Fisher.
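Extending each exon by 20 bp into the flanking introns amounts to padding the exon intervals and merging any that then overlap. The following sketch assumes 0-based, half-open (chrom, start, end) intervals, as in BED files; the coordinates in the example are hypothetical, not real exons of the listed genes.

```python
def pad_exons(exons, flank=20):
    """Extend each (chrom, start, end) interval by `flank` bp on both sides
    and merge intervals that overlap or abut after padding.

    Coordinates are assumed 0-based, half-open; starts are clipped at 0.
    """
    padded = sorted((chrom, max(0, start - flank), end + flank)
                    for chrom, start, end in exons)
    merged = []
    for chrom, start, end in padded:
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            # Overlaps the previous interval on the same chromosome: extend it.
            merged[-1] = (chrom, merged[-1][1], max(merged[-1][2], end))
        else:
            merged.append((chrom, start, end))
    return merged


# Two hypothetical exons 30 bp apart become one region once padded
print(pad_exons([("chr1", 100, 200), ("chr1", 230, 300)]))  # [('chr1', 80, 320)]
```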
(1) Target fragment amplification:
Component | Reaction volume
Multiplex PCR master mix | 2 µL
Primer pool 1 or 2 | 5 µL
Genomic DNA (5 ng/µL) | 2 µL
Nuclease-free water | 1 µL
Total volume | 10 µL
Reaction conditions:
Figure PCTCN2020092418-appb-000001
Figure PCTCN2020092418-appb-000002
(2) Digestion reaction:
The amplification products from primer pool 1 and primer pool 2 of the same sample were combined (20 µL in total), 2 µL of digestion reaction premix was added, and the reaction was run under the following conditions:
Reaction temperature | Reaction time
50 °C | 10 min
55 °C | 10 min
60 °C | 20 min
10 °C | Hold
(3) Samples 1 and 2 were ligated to the universal high-throughput sequencing adapters AN1/PN1 and AN2/PN2, respectively, using Fast T4 DNA Ligase and 5× Fast Ligation Buffer, both produced by Yeasen Biotechnology (Shanghai) Co., Ltd. The specific procedure is as follows:
Prepare the ligation reaction according to the following table:
Component | Reaction volume
Ligase | 2 µL
Ligation buffer | 4 µL
Universal high-throughput sequencing PN adapter (10 µM) | 1 µL
Universal high-throughput sequencing AN adapter (10 µM) | 1 µL
Digested PCR product | 22 µL
Total volume | 30 µL
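When several samples are ligated in parallel, only the ligase and buffer can go into a shared master mix, because each sample receives its own tagged PN/AN adapter set and its own digested product. A minimal sketch, assuming a hypothetical 10% pipetting overage:

```python
from __future__ import annotations

# Per-well composition from the ligation table above (uL)
PER_WELL_UL = {
    "master mix (ligase + buffer)": 6.0,
    "PN adapter (10 uM)": 1.0,
    "AN adapter (10 uM)": 1.0,
    "digested PCR product": 22.0,
}
assert sum(PER_WELL_UL.values()) == 30.0  # matches the 30 uL total volume


def ligation_master_mix(n_samples: int, overage: float = 0.1) -> dict[str, float]:
    """Scale only the shared components; adapters and product are per sample."""
    shared_ul = {"ligase": 2.0, "ligation buffer": 4.0}  # per reaction, from the table
    factor = n_samples * (1 + overage)  # overage is a hypothetical pipetting margin
    return {name: round(vol * factor, 2) for name, vol in shared_ul.items()}


print(ligation_master_mix(10))  # {'ligase': 22.0, 'ligation buffer': 44.0}
```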
Reaction conditions:
Reaction temperature | Reaction time
22 °C | 30 min
68 °C | 5 min
72 °C | 5 min
10 °C | Hold
(4) Purification and amplification:
The ligation product was purified with AMPure magnetic beads, and the purified product was amplified by PCR.
Reaction system:
Component | Reaction volume
PCR mix | 25 µL
Upstream primer (5 µM) | 5 µL
Downstream primer (5 µM) | 5 µL
Purified ligation product | 20 µL
Total volume | 50 µL
Reaction conditions:
Figure PCTCN2020092418-appb-000003
(5) Library purification and quantification:
The amplified library was purified with AMPure magnetic beads and then quality-checked and quantified with an Agilent 2100 and a Qubit 4.0. The Agilent 2100 traces (Figures 2 and 3) show a single, sharp main library peak near 400 bp, indicating that universal high-throughput sequencing adapters had been ligated to both ends of the original gene fragments. The library concentration was calculated from the dilution factor; libraries above 1 ng/µL proceeded to subsequent steps, whereas libraries below 1 ng/µL were considered failed.
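The yield criterion can be written as a one-line check. This sketch assumes the measured value is multiplied by the dilution factor used before measurement; the example readings are hypothetical.

```python
from __future__ import annotations


def library_passes(measured_ng_per_ul: float, dilution_factor: float,
                   threshold_ng_per_ul: float = 1.0) -> tuple[float, bool]:
    """Back-calculate the library concentration and apply the 1 ng/uL cut-off."""
    concentration = measured_ng_per_ul * dilution_factor
    # The text does not say how exactly 1 ng/uL is handled; treating it as a
    # pass is an assumption of this sketch.
    return concentration, concentration >= threshold_ng_per_ul


print(library_passes(0.25, 10))  # (2.5, True)
```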
Example 3 Sequencing analysis on the Ion Torrent and Illumina platforms
In this example, the above high-throughput libraries were sequenced for verification on both the Ion Torrent platform and the Illumina platform, as follows:
1. Sequencing on the Ion GeneStudio™ S5 Plus sequencer (Ion Torrent platform):
The purified, quality-checked library was diluted and processed with the Ion 520™ & Ion 530™ Kit–OT according to the kit protocol. After template preparation on the IonTouch 2 instrument, sequencing and data analysis were performed on the Ion GeneStudio™ S5 Plus sequencer.
2. Sequencing on the Miseq DX sequencer (Illumina platform):
The purified, quality-checked library was diluted and processed with the Miseq DX Reagent Kit v3 according to the kit protocol, and sequencing and data analysis were performed on the Miseq DX sequencer.
The sequencing data were analyzed as follows:
For the Ion GeneStudio™ S5 Plus platform:
1. The concentration of the universal high-throughput sequencing library was analyzed; the library concentration and fragment-length distribution met the requirements for subsequent sequencing.
2. The run output of the Ion GeneStudio™ S5 Plus was analyzed, mainly the number of bases ≥Q20, number of reads, mean read length, on-target rate, and uniformity, as shown in the table below:
Figure PCTCN2020092418-appb-000004
3. The mean read length of both samples was ≥200 bp, indicating that all reads were read through, i.e., every base between the start and end of the target fragments could be called. The mean depth was ≥500× for both samples, indicating that the target fragments were each sequenced more than 500 times. The on-target rate was ≥95%, indicating that at least 95% of the sequenced bases mapped to the target regions. Uniformity was ≥90%, indicating similar amplification efficiency and adapter ligation efficiency across the target regions. Together, these parameters show that universal adapters were successfully ligated to both ends of the target fragments of both samples and that sequencing succeeded; that is, libraries ligated with the universal adapter can be sequenced on the Ion GeneStudio™ S5 Plus sequencer.
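The acceptance criteria above can be collected into a small threshold check. The metric names and the example values below are hypothetical, chosen only to exercise the code; they are not the results in the table.

```python
from __future__ import annotations

ION_THRESHOLDS = {            # minimum values stated in the text
    "mean_read_length_bp": 200,
    "mean_depth_x": 500,
    "on_target_pct": 95,
    "uniformity_pct": 90,
}


def failed_metrics(metrics: dict[str, float],
                   thresholds: dict[str, float] = ION_THRESHOLDS) -> list[str]:
    """Return the names of metrics that are missing or below their minimum."""
    return [name for name, minimum in thresholds.items()
            if metrics.get(name, float("-inf")) < minimum]


hypothetical_run = {"mean_read_length_bp": 210, "mean_depth_x": 800,
                    "on_target_pct": 96.5, "uniformity_pct": 88.0}
print(failed_metrics(hypothetical_run))  # ['uniformity_pct']
```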
For the Miseq DX platform:
1. The run output of the Miseq DX platform was analyzed, mainly data yield, number of reads, and percentage of bases ≥Q30, as shown in the table below:
Figure PCTCN2020092418-appb-000005
2. For both samples, data yield was ≥0.5 G, the number of reads was ≥3 M, and the proportion of bases ≥Q30 was ≥75%, indicating that both samples were sequenced successfully, that universal adapters were successfully ligated to both ends of the target fragments, and that libraries ligated with the universal adapter can be sequenced on the Miseq DX sequencer.
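The Q30 percentage is the fraction of base calls with a Phred quality score of at least 30. The sketch below computes it from FASTQ quality strings, assuming the standard Phred+33 encoding, and applies the three run-level minimums from the text; reading "0.5 G" as gigabases of output is an assumption.

```python
from __future__ import annotations


def fraction_at_least(qualities: list[str], q: int = 30, offset: int = 33) -> float:
    """Fraction of base calls with Phred score >= q.

    Assumes Phred+33 FASTQ quality strings. Phred Q = -10*log10(P_error),
    so Q30 corresponds to an estimated error rate of 1 in 1,000.
    """
    total = passed = 0
    for qual in qualities:
        for ch in qual:
            total += 1
            if ord(ch) - offset >= q:
                passed += 1
    return passed / total if total else 0.0


def miseq_run_ok(yield_gb: float, reads_millions: float, q30_fraction: float) -> bool:
    """Minimums from the text: >= 0.5 G data, >= 3 M reads, >= 75% of bases at Q30+."""
    return yield_gb >= 0.5 and reads_millions >= 3 and q30_fraction >= 0.75


print(fraction_at_least(["II5?"]))   # 0.75  ('I' = Q40, '5' = Q20, '?' = Q30)
print(miseq_run_ok(0.6, 3.2, 0.75))  # True
```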
In summary, libraries constructed with the sequencing adapter of the present invention meet the sequencing requirements of both the Ion GeneStudio™ S5 Plus and the Miseq DX, i.e., of the two mainstream platforms Ion Torrent and Illumina. The sequencing adapter of the present invention therefore functions as a universal library-construction adapter.
In addition, because all Ion Torrent sequencer models share the same sequencing principle and workflow, a library suitable for the Ion GeneStudio™ S5 Plus is also suitable for other Ion Torrent sequencers such as the PGM and Proton. Likewise, because all Illumina models share the same principle and workflow, a library suitable for the Miseq DX is also suitable for other Illumina sequencers such as the MiniSeq and NextSeq. Libraries ligated with the universal sequencing adapter of the present invention are therefore applicable to all sequencer models of the Ion Torrent and Illumina platforms.
实施例4、低频突变真实性判断Example 4 Judgment of the authenticity of low-frequency mutations
本实施例进一步验证本发明测序接头在低频检测中的应用,具体提供一种低频突变真实性判断的检测方法,可校正index hopping引入的测序错误。技术线路图见图4所示,具体包含以下步骤:This embodiment further verifies the application of the sequencing adapter of the present invention in low-frequency detection, and specifically provides a detection method for judging the authenticity of low-frequency mutations, which can correct sequencing errors introduced by index hopping. The technical circuit diagram is shown in Figure 4, which specifically includes the following steps:
1. Sample dilution
(1) The starting material was a commercial tumor SNV gDNA reference standard with a 5% mutation frequency (GW-OGTM005). It was serially diluted with commercial human genomic DNA (G304A) to mutation frequencies of 2.5%, 1.25%, and 0.5%. The undiluted standard and the three dilutions were designated sample 1 (5%), sample 2 (2.5%), sample 3 (1.25%), and sample 4 (0.5%).
(2) The double-stranded DNA concentration was measured with a Qubit 4.0, and the DNA was diluted to 5 ng/µL for use.
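The dilution in step 1 follows from mixing proportions. A minimal sketch, assuming the 5% standard and the wild-type human gDNA (G304A) are at the same DNA concentration and that the wild-type DNA carries none of the variant alleles, so the variant frequency scales linearly with the fraction of standard in the mix. The function name and the 20 µL final volume are illustrative only, not taken from the source.

```python
def mix_volumes(stock_vaf_pct: float, target_vaf_pct: float, total_ul: float):
    """Return (stock_ul, wild_type_ul) that give target_vaf_pct in total_ul.

    Assumes equal DNA concentrations in stock and wild-type DNA, and no
    variant allele in the wild-type DNA (hypothetical helper).
    """
    if not 0 < target_vaf_pct <= stock_vaf_pct:
        raise ValueError("target VAF must be > 0 and not above the stock VAF")
    stock_ul = total_ul * target_vaf_pct / stock_vaf_pct
    return stock_ul, total_ul - stock_ul


# Serial dilution 5% -> 2.5% -> 1.25% -> 0.5%, each step diluting the previous sample.
previous = 5.0
for target in (2.5, 1.25, 0.5):
    stock, wild_type = mix_volumes(previous, target, 20.0)
    print(f"{previous}% -> {target}%: {stock:.1f} uL previous sample + {wild_type:.1f} uL wild-type gDNA")
    previous = target
```

Diluting each step from the previous one mirrors the serial (gradient) dilution described; calling `mix_volumes(5.0, 0.5, 20.0)` instead would prepare the 0.5% sample directly from the standard.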
2. Library construction
The target regions are the designated hotspot regions of EGFR (L858R/T790M/ΔE746_A750), PIK3CA (E545K), KRAS (G12D/G13D/A146T), and NRAS (Q61K). The multiplex PCR primer pool for the target regions was Thermo Fisher's AmpliSeq Colon & Lung panel. Each sample was run in triplicate.
(1) Target fragment amplification, performed as follows:
Component | Reaction volume
Multiplex PCR master mix | 4 µL
Primer pool | 10 µL
Genomic DNA (5 ng/µL) | 2 µL
Nuclease-free water | 4 µL
Total volume | 20 µL
Reaction conditions: see Figure PCTCN2020092418-appb-000006 (not reproduced in the text).
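Because each of the 4 samples is amplified in triplicate (12 reactions), the amplification mix is conveniently prepared as a master mix. The sketch below scales the per-reaction volumes from the table above; the 10% overage is an assumed pipetting margin, not a value from the source, and genomic DNA is kept out of the master mix because it differs per tube.

```python
# Per-reaction volumes (uL) from the amplification table; genomic DNA is
# added to each tube separately, so it is excluded from the master mix.
MASTER_MIX_COMPONENTS_UL = {
    "Multiplex PCR master mix": 4.0,
    "Primer pool": 10.0,
    "Nuclease-free water": 4.0,
}
TEMPLATE_UL = 2.0          # genomic DNA at 5 ng/uL, i.e. 10 ng per reaction
REACTION_TOTAL_UL = 20.0


def scale_master_mix(n_reactions: int, overage: float = 0.10) -> dict:
    """Component volumes for n_reactions plus a fractional overage (assumed 10%)."""
    factor = n_reactions * (1.0 + overage)
    return {name: round(vol * factor, 1) for name, vol in MASTER_MIX_COMPONENTS_UL.items()}


per_tube_ul = sum(MASTER_MIX_COMPONENTS_UL.values())
assert per_tube_ul + TEMPLATE_UL == REACTION_TOTAL_UL  # 18 uL mix + 2 uL DNA

for name, vol in scale_master_mix(12).items():  # 4 samples x 3 replicates
    print(f"{name}: {vol} uL")
print(f"Dispense {per_tube_ul} uL per tube, then add {TEMPLATE_UL} uL genomic DNA.")
```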
(2) Digestion reaction, performed as follows:
Add 2 µL of digestion reaction premix to the PCR product above and incubate under the following conditions:
Temperature | Reaction time
50 °C | 10 min
55 °C | 10 min
60 °C | 20 min
10 °C | Hold
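The digestion program can be encoded as a list of (temperature, time) steps, which makes it easy to check the total timed duration and print it as protocol lines. A minimal sketch; the data layout and function names are hypothetical.

```python
from typing import List, Optional, Tuple

# One step = (temperature in degrees C, minutes); None marks an indefinite hold.
Step = Tuple[float, Optional[float]]

DIGESTION_PROGRAM: List[Step] = [(50, 10), (55, 10), (60, 20), (10, None)]


def timed_minutes(program: List[Step]) -> float:
    """Total run time, excluding the final hold step."""
    return sum(minutes for _, minutes in program if minutes is not None)


def describe(program: List[Step]) -> List[str]:
    """Render each step as a protocol line such as '50 C  10 min'."""
    lines = []
    for temp, minutes in program:
        duration = "hold" if minutes is None else f"{minutes:g} min"
        lines.append(f"{temp:g} C  {duration}")
    return lines


print("\n".join(describe(DIGESTION_PROGRAM)))
print(f"Timed steps: {timed_minutes(DIGESTION_PROGRAM):g} min")
```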
(3) Ligation of the universal high-throughput sequencing adapters:
I. Preparation of the universal high-throughput sequencing adapter sets: the adapter sets were PN1/AN1 and PN2/AN2 described in Example 1; the PN2/AN2 test data are shown below as an example. Four adapter sets were prepared, carrying the sample tag sequences ATCACG, CGATGT, TTAGGC, and TGACCA, respectively. See Example 1 for the preparation method.
II. Ligation of the universal high-throughput sequencing adapters, performed as follows:
Reaction system
Component | Reaction volume
Ligase | 2 µL
Ligation buffer | 4 µL
Universal high-throughput sequencing adapter, P adapter (10 µM) | 1 µL
Universal high-throughput sequencing adapter, A adapter (10 µM) | 1 µL
Digested PCR product | 22 µL
Total volume | 30 µL
Reaction conditions
Temperature | Reaction time
22 °C | 30 min
68 °C | 5 min
72 °C | 5 min
10 °C | Hold
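The four sample tags used in step (3) differ from one another at several positions, which is what allows reads to be assigned to samples unambiguously. The sketch below computes pairwise Hamming distances between them. The rule that a minimum distance d tolerates up to (d - 1) // 2 mismatched bases while still mapping each read to a single tag is a general coding-theory property, not something stated in the source.

```python
from itertools import combinations

SAMPLE_TAGS = ["ATCACG", "CGATGT", "TTAGGC", "TGACCA"]


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length tags differ."""
    if len(a) != len(b):
        raise ValueError("tags must have equal length")
    return sum(x != y for x, y in zip(a, b))


def min_pairwise_distance(tags) -> int:
    """Smallest Hamming distance over all pairs of tags."""
    return min(hamming(a, b) for a, b in combinations(tags, 2))


def correctable_mismatches(min_distance: int) -> int:
    """Mismatches that can be tolerated while each read still maps to one tag."""
    return (min_distance - 1) // 2


for a, b in combinations(SAMPLE_TAGS, 2):
    print(a, b, hamming(a, b))
d = min_pairwise_distance(SAMPLE_TAGS)
print("minimum distance:", d, "| tolerable mismatches:", correctable_mismatches(d))
```

For these four tags the minimum pairwise distance is 4, so a read with one mismatched tag base would still be closest to exactly one tag.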
(4) Purification and amplification, performed as follows:
The ligation product was purified with AMPure magnetic beads, and the purified product was amplified by PCR.
Reaction system
Component | Reaction volume
PCR mix | 25 µL
Upstream primer (5 µM) | 5 µL
Downstream primer (5 µM) | 5 µL
Purified ligation product | 20 µL
Total volume | 50 µL
Reaction conditions: see Figures PCTCN2020092418-appb-000007 and PCTCN2020092418-appb-000008 (not reproduced in the text).
(5) Library purification and quantification, performed as follows:
The amplified library was purified with AMPure magnetic beads and quantified with a Qubit 4.0.
The library concentration was calculated from the dilution factor. A library with a concentration above 1 ng/µL proceeds to the subsequent steps; a concentration below 1 ng/µL means that library construction has failed.
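The concentration rule in step (5) can be written as a back-calculation followed by a pass/fail check. A minimal sketch with hypothetical function names; treating a value of exactly 1 ng/µL as a failure is an assumption, because the text only specifies "above" (proceed) and "below" (fail).

```python
def library_concentration(measured_ng_per_ul: float, dilution_factor: float) -> float:
    """Back-calculate the undiluted library concentration from the Qubit reading."""
    return measured_ng_per_ul * dilution_factor


def passes_qc(conc_ng_per_ul: float, threshold_ng_per_ul: float = 1.0) -> bool:
    """True if the library may proceed (strictly above the threshold).

    Exactly 1 ng/uL is treated as a failure here (assumption).
    """
    return conc_ng_per_ul > threshold_ng_per_ul


# Hypothetical readings: (Qubit value in ng/uL, dilution factor)
for reading, factor in [(0.25, 10), (0.08, 10)]:
    conc = library_concentration(reading, factor)
    verdict = "proceed" if passes_qc(conc) else "library construction failed"
    print(f"{conc:.2f} ng/uL -> {verdict}")
```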
3. Sequencing on the Illumina MiSeqDx sequencer, performed as follows:
After purification and quality control, the library was diluted and sequenced on the MiSeqDx sequencer with the MiSeqDx Reagent Kit v3 according to the kit instructions, followed by data analysis.
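Illumina loading instructions are usually expressed in molar units, so the mass concentration from the Qubit has to be converted using the mean library fragment length before dilution. The sketch below uses the common approximation of 660 g/mol per base pair of double-stranded DNA and a C1·V1 = C2·V2 dilution. The fragment length, target concentration, and final volume are assumed example values, not values from the source; the kit instructions remain authoritative.

```python
AVG_DSDNA_BP_MASS = 660.0  # approximate g/mol per base pair of dsDNA


def ng_per_ul_to_nm(conc_ng_per_ul: float, mean_fragment_bp: float) -> float:
    """Convert a dsDNA mass concentration (ng/uL) to nanomolar."""
    return conc_ng_per_ul * 1e6 / (AVG_DSDNA_BP_MASS * mean_fragment_bp)


def dilution_volumes(stock_nm: float, target_nm: float, final_ul: float):
    """C1*V1 = C2*V2: return (library uL, diluent uL) for target_nm in final_ul."""
    if not 0 < target_nm <= stock_nm:
        raise ValueError("target must be > 0 and not above the stock concentration")
    library_ul = target_nm * final_ul / stock_nm
    return library_ul, final_ul - library_ul


# Example with assumed values: 2 ng/uL library, 300 bp mean size, 4 nM target.
stock = ng_per_ul_to_nm(2.0, 300)
lib_ul, diluent_ul = dilution_volumes(stock, 4.0, 10.0)
print(f"{stock:.2f} nM stock: {lib_ul:.2f} uL library + {diluent_ul:.2f} uL diluent")
```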
4. Sequencing data analysis, comprising the following:
(1) Tag sequences are used to assign reads to samples: reads carrying the same tag sequence are identified as originating from the same sample.
(2) For the reads assigned to the same sample, the sample tag sequence is further compared with the tag sequence at the double-stranded end of the universal high-throughput sequencing adapter (AN adapter); whether the two have identical base sequences is used to identify sequencing errors introduced by sample cross-contamination and index hopping.
(3) For the reads assigned to the same sample, the sense and antisense strands covering a mutation site should carry tag sequences of identical base composition; that is, the AN end and the PN end should carry the same tag sequence.
The specific results are shown in the tables below:
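Steps (1) and (2) can be sketched as a read filter. This is a minimal illustration, not the authors' pipeline: it assumes each read has already been parsed into a hypothetical `Read` record holding the tag read at the sample-index (free-arm) position and the tag read from the double-stranded end of the adapter, both in the same orientation and compared by exact match.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Read:
    name: str
    sample_tag: str   # tag read at the sample-index (free-arm) position
    inline_tag: str   # tag read from the double-stranded end of the adapter


def demultiplex(reads: List[Read],
                sample_tags: Dict[str, str]) -> Tuple[Dict[str, List[Read]], List[Read]]:
    """Assign reads to samples by tag and reject tag-inconsistent reads.

    A read is kept only if its sample tag is known and its inline tag matches it;
    a mismatch is treated as possible index hopping or cross-contamination.
    """
    kept: Dict[str, List[Read]] = defaultdict(list)
    rejected: List[Read] = []
    for read in reads:
        sample = sample_tags.get(read.sample_tag)
        if sample is None or read.inline_tag != read.sample_tag:
            rejected.append(read)
        else:
            kept[sample].append(read)
    return dict(kept), rejected


tags = {"ATCACG": "sample A", "CGATGT": "sample B"}  # hypothetical tag-to-sample mapping
reads = [Read("r1", "ATCACG", "ATCACG"),
         Read("r2", "ATCACG", "CGATGT"),   # inline tag disagrees: rejected
         Read("r3", "CGATGT", "CGATGT")]
kept, rejected = demultiplex(reads, tags)
print({s: [r.name for r in rs] for s, rs in kept.items()}, [r.name for r in rejected])
```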
Sample 1 (expected mutation frequency 5%)
Gene | Mutation site | Mutation frequency | Forward reads | Reverse reads
EGFR | L858R | 5% | 198 | 201
EGFR | T790M | 5.5% | 213 | 187
EGFR | ΔE746_A750 | 4.7% | 217 | 168
PIK3CA | E545K | 6.8% | 204 | 196
KRAS | G12D | 5.3% | 199 | 200
KRAS | G13D | 4.5% | 184 | 216
KRAS | A146T | 7% | 214 | 186
NRAS | Q61K | 4.3% | 204 | 193
Sample 2 (expected mutation frequency 2.5%)
Gene | Mutation site | Mutation frequency | Forward reads | Reverse reads
EGFR | L858R | 2.8% | 207 | 191
EGFR | T790M | 1.3% | 212 | 187
EGFR | ΔE746_A750 | 2.1% | 244 | 139
PIK3CA | E545K | 3.8% | 177 | 223
KRAS | G12D | 2.5% | 188 | 212
KRAS | G13D | 2.2% | 197 | 203
KRAS | A146T | 4% | 211 | 189
NRAS | Q61K | 1.3% | 199 | 197
Sample 3 (expected mutation frequency 1.25%)
Gene | Mutation site | Mutation frequency | Forward reads | Reverse reads
EGFR | L858R | 1.5% | 216 | 180
EGFR | T790M | 2.5% | 245 | 152
EGFR | ΔE746_A750 | 1% | 249 | 143
PIK3CA | E545K | 2% | 225 | 175
KRAS | G12D | 1.3% | 199 | 199
KRAS | G13D | 1.8% | 192 | 208
KRAS | A146T | 1% | 203 | 197
NRAS | Q61K | 1.8% | 194 | 204
Sample 4 (expected mutation frequency 0.5%)
Gene | Mutation site | Mutation frequency | Forward reads | Reverse reads
EGFR | L858R | 1.5% | 216 | 181
EGFR | T790M | 0.5% | 245 | 155
EGFR | ΔE746_A750 | 0.3% | 227 | 154
PIK3CA | E545K | 1% | 218 | 182
KRAS | G12D | 0.5% | 182 | 218
KRAS | G13D | 1.5% | 200 | 200
KRAS | A146T | 0.8% | 205 | 195
NRAS | Q61K | 0.5% | 217 | 182
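To compare measured and expected frequencies, the table values can be tabulated programmatically. The sketch below copies the Sample 4 rows verbatim and reports, for each variant, the ratio of observed to expected frequency and the fraction of forward reads. It is an illustrative calculation on the published numbers, not part of the described method.

```python
# Sample 4 rows copied from the table above (expected mutation frequency 0.5%).
SAMPLE4_EXPECTED_PCT = 0.5
SAMPLE4_ROWS = [
    # (gene, mutation site, observed %, forward reads, reverse reads)
    ("EGFR", "L858R", 1.5, 216, 181),
    ("EGFR", "T790M", 0.5, 245, 155),
    ("EGFR", "ΔE746_A750", 0.3, 227, 154),
    ("PIK3CA", "E545K", 1.0, 218, 182),
    ("KRAS", "G12D", 0.5, 182, 218),
    ("KRAS", "G13D", 1.5, 200, 200),
    ("KRAS", "A146T", 0.8, 205, 195),
    ("NRAS", "Q61K", 0.5, 217, 182),
]


def summarize(rows, expected_pct):
    """Observed/expected frequency ratio and forward-read fraction per variant."""
    summary = []
    for gene, site, observed, forward, reverse in rows:
        summary.append({
            "variant": f"{gene} {site}",
            "observed_over_expected": round(observed / expected_pct, 2),
            "forward_fraction": round(forward / (forward + reverse), 3),
        })
    return summary


for row in summarize(SAMPLE4_ROWS, SAMPLE4_EXPECTED_PCT):
    print(row)
```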
Universal high-throughput sequencing adapters were used during library construction, and the resulting sequencing data were analyzed after sequencing. First, tag sequences were used to assign reads to samples, splitting the data into samples 1, 2, 3, and 4 with their four mutation frequencies. Next, the tag sequence in the double-stranded portion of each read's adapter was checked against the sample tag sequence to eliminate index hopping. The authenticity of each mutation was then assessed by checking whether the forward and reverse reads carrying the same tag sequence both contained the same mutation. Mutations supported only by forward reads or only by reverse reads, and mutations on reads whose in-read tag sequence did not match the sample tag, were excluded, enabling correct identification of low-frequency mutations.
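The strand-concordance rule in the paragraph above (also stated in claim 22) can be expressed as a small grouping step. The sketch below is hypothetical: it assumes upstream code has already removed tag-mismatched reads and emits one `(tag, mutation, strand)` tuple per read carrying a candidate mutation, with the strand written as "+" (forward) or "-" (reverse).

```python
from collections import defaultdict
from typing import Iterable, Set, Tuple

# (tag sequence, mutation label, strand), strand being "+" or "-"
Observation = Tuple[str, str, str]


def strand_concordant_mutations(observations: Iterable[Observation]) -> Set[str]:
    """Keep a mutation only if, within some tag, it is seen on both strands."""
    strands_seen = defaultdict(set)
    for tag, mutation, strand in observations:
        strands_seen[(tag, mutation)].add(strand)
    return {mutation for (_, mutation), seen in strands_seen.items() if {"+", "-"} <= seen}


obs = [
    ("ATCACG", "EGFR L858R", "+"),
    ("ATCACG", "EGFR L858R", "-"),
    ("ATCACG", "KRAS G12D", "+"),  # forward reads only: excluded
]
print(strand_concordant_mutations(obs))
```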
5. Results: library construction and sequencing with the universal high-throughput sequencing adapter of the present invention effectively detected low-frequency mutations with frequencies below 5%, including mutations at a frequency of 0.5%, further lowering the detection limit for low-frequency mutations.
The above description of specific embodiments does not limit the present application. Those skilled in the art may make various changes or modifications based on the present application, and all such changes or modifications that do not depart from its spirit shall fall within the scope of the appended claims.

Claims (24)

  1. A Y-type high-throughput sequencing adapter, characterized in that the sequencing adapter comprises a first single strand and a second single strand;
    the first single strand and the second single strand each comprise:
    1) a free arm, and
    2) a double-stranded complementary region, wherein
    the free arm comprises a library amplification primer binding region and a carrier binding region;
    the double-stranded complementary region comprises sequencing primer binding regions for two or more sequencing platforms.
  2. The high-throughput sequencing adapter of claim 1, wherein the free arm sequences of the first and second single strands are not complementary, and the first and second single strands can be annealed to form a Y-shaped double-stranded structure.
  3. The high-throughput sequencing adapter of any one of claims 1-2, wherein the double-stranded complementary region comprises a tag sequence located at the end of the double-stranded complementary region distal to the free arm.
  4. The high-throughput sequencing adapter of any one of claims 1-3, wherein the sequencing platforms include, but are not limited to, the Illumina, Ion Torrent, PacBio, Roche, Helicos, and ABI platforms; preferably, the sequencing platforms are the Ion Torrent and Illumina platforms.
  5. The high-throughput sequencing adapter of any one of claims 1-4, wherein the free arm of the second single strand further comprises a tag sequence.
  6. The high-throughput sequencing adapter of claim 5, wherein the tag sequence in the free arm is identical to the tag sequence in the double-stranded complementary region; preferably, the tag sequence in the free arm is located near the end adjoining the double-stranded complementary region.
  7. The high-throughput sequencing adapter of any one of claims 1-6, wherein the double-stranded complementary region of the first and second single strands is 40-58 bp long; the free arm of the first single strand is 30-45 bp long and the free arm of the second single strand is 35-56 bp long; and the tag sequence consists of 6-12 random bases.
  8. The high-throughput sequencing adapter of any one of claims 1-7, wherein the 3' end of the free arm of the first or second single strand carries a stabilizing modification; preferably, a phosphorothioate modification; more preferably, the phosphodiester bonds between the last 3 bases at the 3' end are replaced by phosphorothioate bonds.
  9. The high-throughput sequencing adapter of any one of claims 1-8, wherein the sequence of the first single strand is as follows:
    Free arm sequence:
    5'-NNNNNNNNNNNNNNNACCGAGATCTACACTCTTTCCCTACACGAC-3';
    Double-stranded complementary region sequence:
    [sequence not reproduced in the source text]
    wherein "XXXXXX" represents a tag sequence of 6-12 random bases;
    and "N" represents any of the bases A, T, C, or G, or NA (no base).
  10. The high-throughput sequencing adapter of claim 9, wherein the sequence of the second single strand is as follows:
    Free arm sequence:
    5'-ACGTCTGAACTCCAGTCACXXXXXXATCTCGTATGNNNNNNNNNNNNNNN-3';
    Double-stranded complementary region sequence:
    5'-XXXXXXTAGCTGAGTCGGAGACACGCAGGNNNNNNNNNNATCGGAAGAGC-3';
    wherein "XXXXXX" represents a tag sequence of 6-12 random bases;
    and "N" represents any of the bases A, T, C, or G, or NA (no base).
  11. A high-throughput sequencing adapter set, characterized in that the adapter set comprises the high-throughput sequencing adapter of any one of claims 1-10.
  12. The high-throughput sequencing adapter set of claim 11, wherein the adapter set further comprises the following Y-type high-throughput sequencing adapter:
    the Y-type high-throughput sequencing adapter comprises a third and a fourth single strand whose sequences differ from those of the high-throughput sequencing adapter of claims 1-10 only in the double-stranded complementary region;
    wherein the double-stranded complementary region of the third single strand has the following sequence:
    5'-GCTCTTCCGATNNNNNNNNNNNNCCTCTCTATGGGCAGTCGGTGATXXXXXX-3';
    the double-stranded complementary region of the fourth single strand is complementary to that of the third single strand;
    wherein "XXXXXX" represents a tag sequence of 6-12 random bases;
    and "N" represents any of the bases A, T, C, or G, or NA (no base).
  13. The high-throughput sequencing adapter set of claim 12, wherein the single-stranded sequences of the Y-type high-throughput sequencing adapters are as follows:
    First single strand:
    5'-ACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCCTGCGTGTCTCCGACTCAGCTAXXXXXX-3';
    Second single strand:
    5'-XXXXXXTAGCTGAGTCGGAGACACGCAGGATCGGAAGAGCACGTCTGAACTCCAGTCACXXXXXXATCTCGTATG-3';
    Third single strand:
    5'-ACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCCTCTCTATGGGCAGTCGGTGATXXXXXX-3';
    Fourth single strand:
    5'-XXXXXXATCACCGACTGCCCATAGAGAGGATCGGAAGAGCACGTCTGAACTCCAGTCACXXXXXXATCTCGTATG-3'.
  14. The high-throughput sequencing adapter set of claim 12, wherein the single-stranded sequences of the Y-type high-throughput sequencing adapters are as follows:
    First single strand:
    5'-CAAAGAGCGAGGACACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATTCTCCATCCACCTGCGTGTCTCCGACTCAGCTAXXXXXX-3';
    Second single strand:
    5'-XXXXXXTAGCTGAGTCGGAGACACGCAGGTGGATGGAGAATCGGAAGAGCACGTCTGAACTCCAGTCACXXXXXXATCTCGTATGGTCCTCGCTCTTTG-3';
    Third single strand:
    5'-CAAAGAGCGAGGACACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCTCCGCTTTCGCCCTCTCTATGGGCAGTCGGTGATXXXXXX-3';
    Fourth single strand:
    5'-XXXXXXATCACCGACTGCCCATAGAGAGGGCGAAAGCGGAGATCGGAAGAGCACGTCTGAACTCCAGTCACXXXXXXATCTCGTATGGTCCTCGCTCTTTG-3'.
  15. A composition, characterized in that the composition comprises the high-throughput sequencing adapter of any one of claims 1-10 or the high-throughput sequencing adapter set of any one of claims 11-14.
  16. A complex, characterized in that the complex is ligated to the high-throughput sequencing adapter of any one of claims 1-10 or to the high-throughput sequencing adapter set of any one of claims 11-14.
  17. A kit, characterized in that the kit comprises the high-throughput sequencing adapter of any one of claims 1-10 or the high-throughput sequencing adapter set of any one of claims 11-14.
  18. The kit of claim 17, wherein the kit is a high-throughput sequencing library construction kit or a gene sequence enrichment kit.
  19. A method for preparing the high-throughput sequencing adapter of any one of claims 1-10, characterized in that it comprises the following steps:
    S1: synthesizing the first-strand and second-strand single-stranded sequences separately;
    S2: specifically annealing the two single-stranded sequences of S1 to obtain the high-throughput sequencing adapter.
  20. A method for constructing a sequencing library, characterized in that it comprises:
    S1: preparing target fragments of a sample to be tested;
    S2: ligating the high-throughput sequencing adapter of any one of claims 1-10 or the high-throughput sequencing adapter set of any one of claims 11-14 to the target fragments of S1 to obtain a ligation product;
    S3: amplifying the ligation product obtained in S2 and purifying it to obtain the sequencing library of the sample to be tested.
  21. A method for detecting low-frequency gene mutations, characterized in that it comprises the following steps:
    S1: preparing the high-throughput sequencing adapter of any one of claims 1-10 or the high-throughput sequencing adapter set of any one of claims 11-14, wherein the same tag sequence is used for the same sample;
    S2: amplifying target fragments of the sample to be tested and digesting the primers;
    S3: ligating the digestion product of S2 to the high-throughput sequencing adapter or adapter set of S1 to obtain a ligation product, amplifying the ligation product, and purifying it to obtain a sequencing library;
    S4: sequencing the library of S3, correcting the sequencing data according to the tag sequences of the high-throughput sequencing adapter, and performing mutation analysis on the corrected sequencing data.
  22. The method for detecting low-frequency gene mutations of claim 21, characterized in that the mutation analysis in step S4 is as follows: a specific mutation is determined to be a true low-frequency mutation if it appears on both the sense strand and the antisense strand of the same read.
  23. The method for detecting low-frequency gene mutations of any one of claims 21-22, wherein the sample to be tested is genomic DNA.
  24. Use of the high-throughput sequencing adapter of any one of claims 1-10, the high-throughput sequencing adapter set of any one of claims 11-14, the composition of claim 15, the complex of claim 16, or the kit of claim 17 or 18, as follows:
    a. in the construction of sequencing libraries or in the preparation of products for sequencing library construction;
    b. in high-throughput sequencing or in the preparation of high-throughput sequencing products;
    c. in the detection of low-frequency gene mutations or in the preparation of products for detecting low-frequency gene mutations;
    d. in in vitro diagnostics or in the preparation of in vitro diagnostic products;
    e. in target gene enrichment or amplification-based enrichment.
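The sequences in claim 13 can be checked for internal consistency: the double-stranded region of the second strand should be the reverse complement of that of the first strand (likewise for the third and fourth strands), and the segment lengths should fall within the ranges of claim 7. The sketch below assumes that "XXXXXX" is a concrete 6-nt tag on the top strand and its reverse complement on the bottom strand; the claim notation uses the same placeholder on both strands and does not spell this out. The tag value is hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]


# Fixed parts of claim 13, with the "XXXXXX" tags removed.
FREE_ARM_1 = "ACCGAGATCTACACTCTTTCCCTACACGAC"              # 5' free arm of strands 1 and 3
FREE_ARM_2_PARTS = ("ACGTCTGAACTCCAGTCAC", "ATCTCGTATG")  # strands 2 and 4; tag goes between
DS_PAIRS = {
    # (top strand before its 3' tag, bottom strand after its 5' tag)
    "strands 1/2": ("GCTCTTCCGATCCTGCGTGTCTCCGACTCAGCTA",
                    "TAGCTGAGTCGGAGACACGCAGGATCGGAAGAGC"),
    "strands 3/4": ("GCTCTTCCGATCCTCTCTATGGGCAGTCGGTGAT",
                    "ATCACCGACTGCCCATAGAGAGGATCGGAAGAGC"),
}


def check_adapter(top_core: str, bottom_core: str, tag: str) -> dict:
    """Complementarity of the duplex region and claim 7 length ranges."""
    top_ds = top_core + tag
    bottom_ds = revcomp(tag) + bottom_core
    free_arm_2_len = len(FREE_ARM_2_PARTS[0]) + len(tag) + len(FREE_ARM_2_PARTS[1])
    return {
        "complementary": revcomp(bottom_ds) == top_ds,
        "ds_length_ok": 40 <= len(top_ds) <= 58,
        "free_arm_1_ok": 30 <= len(FREE_ARM_1) <= 45,
        "free_arm_2_ok": 35 <= free_arm_2_len <= 56,
    }


for name, (top, bottom) in DS_PAIRS.items():
    print(name, check_adapter(top, bottom, "ATCACG"))
```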
PCT/CN2020/092418 2020-05-14 2020-05-26 Universal high-throughput sequencing adapter and application thereof WO2021227129A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010407833.5A CN111471754B (en) 2020-05-14 2020-05-14 Universal high-throughput sequencing joint and application thereof
CN202010407833.5 2020-05-14

Publications (1)

Publication Number Publication Date
WO2021227129A1 true WO2021227129A1 (en) 2021-11-18

Family

ID=71759877

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/092418 WO2021227129A1 (en) 2020-05-14 2020-05-26 Universal high-throughput sequencing adapter and application thereof

Country Status (2)

Country Link
CN (1) CN111471754B (en)
WO (1) WO2021227129A1 (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115831233A (en) * 2023-02-07 2023-03-21 杭州联川基因诊断技术有限公司 mTag-based targeted sequencing data preprocessing method, equipment and medium
WO2023092872A1 (en) * 2021-11-26 2023-06-01 广州达安基因股份有限公司 High-throughput sequencing method based on internal reference of known tag
WO2023092601A1 (en) * 2021-11-29 2023-06-01 京东方科技集团股份有限公司 Umi molecular tag and application, adapter, adapter ligation reagent, and kit thereof, and library construction method

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112301432B (en) * 2020-12-29 2021-04-06 北京贝瑞和康生物技术有限公司 Method and kit for constructing whole genome high-throughput sequencing library
CN115029425B (en) * 2022-05-26 2023-04-18 北京爱普益生物科技有限公司 High-throughput sequencing STR detection kit compatible with various sequencing platforms and application thereof

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107858414A (en) * 2017-10-18 2018-03-30 广州漫瑞生物信息技术有限公司 A kind of high-flux sequence joint, its preparation method and its application in ultralow frequency abrupt climatic change
CN111118001A (en) * 2019-12-31 2020-05-08 苏州贝康医疗器械有限公司 Universal joint for multiple sequencing platforms, library construction method suitable for multiple sequencing platforms and kit

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB201615486D0 (en) * 2016-09-13 2016-10-26 Inivata Ltd Methods for labelling nucleic acids
CN108893466B (en) * 2018-06-04 2021-04-13 上海奥根诊断技术有限公司 Sequencing joint, sequencing joint group and detection method of ultralow frequency mutation
CN110827920B (en) * 2018-08-14 2022-11-22 武汉华大医学检验所有限公司 Sequencing data analysis method and equipment and high-throughput sequencing method
CN110257480A (en) * 2019-07-04 2019-09-20 北京京诺玛特科技有限公司 Nucleic acid sequence sequence measuring joints and its method for constructing sequencing library
CN110734908B (en) * 2019-11-15 2021-06-08 福州福瑞医学检验实验室有限公司 Construction method of high-throughput sequencing library and kit for library construction
CN111073961A (en) * 2019-12-20 2020-04-28 苏州赛美科基因科技有限公司 High-throughput detection method for gene rare mutation

Also Published As

Publication number Publication date
CN111471754A (en) 2020-07-31
CN111471754B (en) 2021-01-29

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20935873

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 15/03/2023)

122 Ep: pct application non-entry in european phase

Ref document number: 20935873

Country of ref document: EP

Kind code of ref document: A1