WO2018041062A1 - Multi-positioning dual-tag adapter set for detecting gene mutations, and preparation method and application thereof - Google Patents


Info

Publication number
WO2018041062A1
Authority
WO
WIPO (PCT)
Application number
PCT/CN2017/099255
Inventor
金保雷
李旭超
林清华
施伟杰
葛会娟
阮力
Original Assignee
厦门艾德生物医药科技股份有限公司
Application filed by 厦门艾德生物医药科技股份有限公司
Priority to JP2018558751A (JP6830496B2)
Priority to US16/322,340 (US11286524B2)
Priority to EP17845364.3A (EP3505640A4)
Publication of WO2018041062A1 (zh)


Classifications

    • C12N15/10 Processes for the isolation, preparation or purification of DNA or RNA
    • C12Q1/6876 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
    • C12N15/1065 Preparation or screening of tagged libraries, e.g. tagged microorganisms by STM-mutagenesis, tagged polynucleotides, gene tags
    • C12N15/1068 Template (nucleic acid) mediated chemical library synthesis, e.g. chemical and enzymatical DNA-templated organic molecule synthesis, libraries prepared by non ribosomal polypeptide synthesis [NRPS], DNA/RNA-polymerase mediated polypeptide synthesis
    • C12N15/11 DNA or RNA fragments; Modified forms thereof; Non-coding nucleic acids having a biological activity
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6806 Preparing nucleic acids for analysis, e.g. for polymerase chain reaction [PCR] assay
    • C12Q1/6869 Methods for sequencing
    • C40B50/06 Biochemical methods, e.g. using enzymes or whole viable microorganisms
    • C12Q2600/156 Polymorphic or mutational markers

Definitions

  • The invention relates to the technical field of nucleic acid sequencing, and in particular to a multi-positioning dual-tag adapter set for detecting gene mutations, and a preparation method and application thereof.
  • In current second-generation (next-generation) sequencing, errors are introduced both by sample preparation (library preparation) and by the instrument system itself: oxidative or deamination damage to the DNA, mutations introduced by the PCR polymerase during library construction, and base-reading errors during sequencing. As a result, the error probability of each sequenced base is between 1/1000 and 1/100, i.e. 1 to 10 erroneous bases per 1000 bases.
  • When the mutation sites in a sample occur only at ratios of 0%, 50% and 100% (as with germline variants), systematic base-reading errors can be corrected during data analysis using the overlapping reads from the same region, achieving high sequencing accuracy;
  • The later-developed UMI (unique molecular identifier) technique can effectively solve this problem.
  • During library construction, each molecule is uniquely labeled before being amplified and sequenced, so that bioinformatic analysis can remove part of the mutations (errors) generated during library construction and sequencing. This reduces the sequencing base error rate to 1×10⁻⁵; assuming tumor mutation detection requires a 10-fold signal-to-noise ratio, the method can accurately detect mutations at a frequency of 1×10⁻⁴.
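The detection limit quoted above is simple arithmetic: if a true mutation must be about 10 times more frequent than the residual error background, the lowest reliably detectable mutation frequency is the error rate multiplied by 10. A minimal Python sketch of that relationship; the function name is hypothetical and the 10-fold signal-to-noise requirement is the assumption stated in the text.

```python
def min_detectable_frequency(error_rate: float, snr: float = 10.0) -> float:
    """Lowest mutation frequency that stands `snr`-fold above the per-base error rate."""
    if not 0.0 < error_rate < 1.0:
        raise ValueError("error_rate must lie strictly between 0 and 1")
    if snr <= 0:
        raise ValueError("snr must be positive")
    return error_rate * snr


# Raw NGS error rates (1/100 and 1/1000) versus the UMI-corrected rate (1e-5)
for rate in (1e-2, 1e-3, 1e-5):
    print(f"error rate {rate:.0e} -> detectable frequency {min_detectable_frequency(rate):.0e}")
```

Applied to the raw error rates of 1/1000 to 1/100, the same estimate gives a floor of 1% to 10%, which is why uncorrected sequencing cannot resolve low-frequency tumor mutations.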
  • Another object of the present invention is to provide a method of preparing the above-described multi-positioning dual-tag adapter set.
  • A multi-positioning dual-tag adapter set for detecting gene mutations comprises dual-tag adapter A, dual-tag adapter B and dual-tag adapter C, which are synthesized from adapter primer P5 together with adapter primer P7-A, adapter primer P7-B and adapter primer P7-C respectively, each P7 primer carrying a biotin modification at its 5' end, wherein:
  • adapter primer P5 is the sequence obtained by joining SEQ ID NO: 01 to the sequence shown in SEQ ID NO: 02 via an I5 index sequence;
  • adapter primer P7-A consists of FFFFFEEEEEJJJJJNNNNNNNNNNNN joined to the 5' end of SEQ ID NO: 03, with SEQ ID NO: 03 in turn joined to the sequence shown in SEQ ID NO: 04 via an I7 index sequence;
  • adapter primer P7-B consists of FFFFFEEEEEKKKKKNNNNNNNNNNNN joined to the 5' end of SEQ ID NO: 03, with SEQ ID NO: 03 in turn joined to the sequence shown in SEQ ID NO: 04 via an I7 index sequence;
  • adapter primer P7-C consists of FFFFFEEEEELLLLLNNNNNNNNNNNN joined to the 5' end of SEQ ID NO: 03, with SEQ ID NO: 03 in turn joined to the sequence shown in SEQ ID NO: 04 via an I7 index sequence;
  • FFFFF are protective bases for the restriction enzyme;
  • EEEEE is a restriction site
  • JJJJJ, KKKKK and LLLLL are positioning tag sequences, and JJJJJ, KKKKK and LLLLL differ from one another;
  • NNNNNNNNNNNN is a random molecular tag sequence
  • FFFFF, JJJJJ, KKKKK, LLLLL and EEEEE are each shown as five bases but are not limited to five bases; the I7 index sequence is 6-8 bases;
  • NNNNNNNNNNNN is 4 to 12 random bases with no four consecutive identical bases;
  • NNNNNNNNNNNN is represented as BDHVBDHV, where B denotes any base other than A, D any base other than C, H any base other than G, and V any base other than T;
  • the I5 index sequence is selected from SEQ ID NOs: 05 to 12, and the I7 index sequence is selected from SEQ ID NOs: 13 to 23; the JJJJJ, KKKKK and LLLLL sequences may partially or completely overlap the EEEEE sequence, in which case the overlapping bases appear only once.
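The BDHVBDHV pattern enforces the "no four consecutive identical bases" rule by construction. Each base is forbidden by exactly one of the four IUPAC codes (A by B, C by D, G by H, T by V), and any four consecutive positions of the repeating pattern contain all four codes. The Python sketch below draws random tags from the pattern and exhaustively checks that property. The function names are hypothetical, and nothing here reflects how the tags are actually synthesized.

```python
import itertools
import random
from typing import Iterator, Optional

# IUPAC degenerate codes used in the molecular-tag pattern
IUPAC = {
    "B": "CGT",  # any base except A
    "D": "AGT",  # any base except C
    "H": "ACT",  # any base except G
    "V": "ACG",  # any base except T
}


def random_tag(pattern: str = "BDHVBDHV", rng: Optional[random.Random] = None) -> str:
    """Draw one concrete tag, choosing uniformly among the bases each code allows."""
    rng = rng or random.Random()
    return "".join(rng.choice(IUPAC[code]) for code in pattern)


def matches_pattern(tag: str, pattern: str = "BDHVBDHV") -> bool:
    """True if every base of `tag` is permitted by the code at the same position."""
    return len(tag) == len(pattern) and all(b in IUPAC[c] for b, c in zip(tag, pattern))


def longest_homopolymer(seq: str) -> int:
    """Length of the longest run of identical consecutive bases."""
    return max(len(list(run)) for _, run in itertools.groupby(seq))


def all_tags(pattern: str = "BDHVBDHV") -> Iterator[str]:
    """Enumerate every concrete tag the pattern can produce (3**len(pattern) of them)."""
    for bases in itertools.product(*(IUPAC[c] for c in pattern)):
        yield "".join(bases)


if __name__ == "__main__":
    print(random_tag(rng=random.Random(1)))
    print(max(longest_homopolymer(t) for t in all_tags()))  # never reaches 4
```

The exhaustive check is cheap here (3⁸ = 6561 tags). For longer or non-periodic patterns, the same `longest_homopolymer` test can be applied to sampled tags instead.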
  • Annealing: mix adapter primer P5, adapter primer P7-A, adapter primer P7-B, adapter primer P7-C and buffer with an appropriate amount of deionized water, then anneal to obtain annealed adapters A, B and C;
  • Extension: extend annealed adapters A, B and C with a polymerase to obtain extended adapters A, B and C;
  • First precipitation: purify extended adapters A, B and C separately by ethanol or isopropanol precipitation to obtain purified extended adapters A, B and C;
  • Second precipitation: purify the digested adapters A, B and C by ethanol or isopropanol precipitation to obtain dual-tag adapters A, B and C;
  • Biotin purification: subject dual-tag adapters A, B and C obtained in step (5) to biotin affinity purification;
  • the multi-positioning dual-tag adapter set is obtained by purifying the product of step (6) by ethanol or isopropanol precipitation.
  • A library construction method comprises: fragmenting 10 ng to 1 µg of the DNA to be tested into 200-500 bp fragments, end-repairing the fragments with an end-repair enzyme, adding an A tail, ligating the multi-positioning dual-tag adapter set, and, after ligation, selecting 340-660 bp fragments with AMPure magnetic beads or gel excision.
  • a sequencing method comprising the following steps:
  • a method of determining a nucleic acid sequence comprising the steps of:
  • the method for determining the result includes the following steps:
  • Some of the I5 index, I7 index and EEEEE sequences are listed in Table 1 and Table 2, but the invention is not limited to them.
  • I5 index sequence code | I5 index sequence | I7 index sequence code | I7 index sequence
    I501 (SEQ ID NO: 05) | TATAGCCT | I701 (SEQ ID NO: 12) | ATTACTCG
    I502 (SEQ ID NO: 06) | ATAGAGGC | I702 (SEQ ID NO: 13) | TCCGGAGA
    I503 (SEQ ID NO: 07) | CCTATCCT | I703 (SEQ ID NO: 14) | CGCTCATT
    I504 (SEQ ID NO: 08) | GGCTCTGA | I704 (SEQ ID NO: 15) | GAGATTCC
    I505 (SEQ ID NO: 09) | AGGCGAAG | I705 (SEQ ID NO: 16) | ATTCAGAA
    I506 (SEQ ID NO: 10) | TAATCTTA | I706 (SEQ ID NO: 17) | GAATTCGT
    I507 (SEQ ID NO: 11) | CAGGACGT | I707 (SEQ ID NO: 18) | CTGAAGCT
    I508 (SEQ ID NO: 12) | GTACTGAC | I708 (SEQ ID |
  • By introducing two different UMIs onto the two strands of the DNA duplex at the same time, and exploiting the double-stranded nature of DNA to correct the sequencing information with both strands, the dual-tag library used in the present invention reduces the sequencing base error rate to as low as 2.4×10⁻⁶ and can accurately detect gene mutations at a frequency of 1×10⁻⁵. This effectively improves the sensitivity of gene mutation detection and, combined with the throughput of high-throughput sequencing, allows multiple mutation sites in multiple genes to be detected in a single sequencing run.
  • The sequencing results are first formatted, and the sequencing quality of each read is assessed from the positioning bases at the end of the adapter: if the positioning bases cannot be found, the read pair is discarded entirely. At the same time, the random bases at the start of both reads of the pair are trimmed and merged into the read ID.
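The read-formatting step can be sketched as follows. Look for a known positioning tag immediately after the random tag in each read, and discard the whole pair if either read lacks one. Otherwise, trim the tag and locator and append both molecular tags to the read ID. This minimal Python sketch rests on several assumptions: the random tag has a fixed length (8 bases for BDHVBDHV), the locator follows it directly as written, and exact matching is used. The `<id>_<tag1>_<tag2>` naming scheme is hypothetical.

```python
from typing import Optional, Sequence, Tuple


def split_read(seq: str, tag_len: int, locators: Sequence[str]) -> Optional[Tuple[str, str]]:
    """Return (molecular_tag, remaining_read) if a known locator follows the tag."""
    tag, rest = seq[:tag_len], seq[tag_len:]
    for loc in locators:
        if rest.startswith(loc):
            return tag, rest[len(loc):]
    return None  # locator not found at the expected position


def format_pair(read_id: str, r1: str, r2: str, tag_len: int,
                locators: Sequence[str]) -> Optional[Tuple[str, str, str]]:
    """Trim tags and locators from both reads; None means the pair is discarded."""
    s1 = split_read(r1, tag_len, locators)
    s2 = split_read(r2, tag_len, locators)
    if s1 is None or s2 is None:
        return None
    (tag1, insert1), (tag2, insert2) = s1, s2
    return f"{read_id}_{tag1}_{tag2}", insert1, insert2
```

With the multi-positioning set, `locators` would hold the three positioning tags. If one locator could be a prefix of another, the longer ones should be tested first.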
  • The filtered reads are aligned to the reference genome (hg19, GRCh37, etc.), and unqualified reads are removed according to set parameters (mapping quality too low, multi-site matching, mismatched Read1 and Read2 alignments, etc.), finally yielding high-quality unique reads for analysis.
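The post-alignment filter is a set of per-pair predicates. A minimal Python sketch, assuming alignment results have already been summarized into a simple record. The field names and the MAPQ cutoff of 30 are illustrative assumptions, not parameters taken from the source.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class PairAlignment:
    name: str
    mapq: int           # mapping quality of the pair
    n_loci: int         # number of genomic locations the pair aligns to
    concordant: bool    # Read1 and Read2 align consistently with each other


def passes_filters(aln: PairAlignment, min_mapq: int = 30) -> bool:
    """Reject low mapping quality, multi-site matches and discordant Read1/Read2."""
    return aln.mapq >= min_mapq and aln.n_loci == 1 and aln.concordant


def unique_high_quality(pairs: Iterable[PairAlignment], min_mapq: int = 30) -> List[str]:
    """Names of the pairs kept for downstream analysis."""
    return [p.name for p in pairs if passes_filters(p, min_mapq)]
```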
  • Using dual-index adapters increases the number of samples that can be sequenced in one run (reducing sequencing cost). Dual indexes also distinguish different samples more effectively, which is very important when detecting low-frequency gene mutations: the mutation rate of a detected site is generally about one in a thousand to one in a hundred, so if cross-contamination occurs between samples with different mutation sites, the final mutation calls can easily be wrong.
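A small demultiplexing sketch shows why dual indexes help. A read pair is assigned to a sample only when both its I5 and its I7 index match that sample's registered combination. A read carrying an unregistered I5/I7 pairing (for example, from cross-contamination between libraries) is therefore left unassigned rather than attributed to the wrong sample. The index sequences come from the index table in this document. The sample names, the pairings and the one-mismatch tolerance are illustrative assumptions.

```python
from typing import Dict, Optional, Tuple


def hamming(a: str, b: str) -> int:
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def assign_sample(i5_read: str, i7_read: str,
                  samples: Dict[str, Tuple[str, str]],
                  max_mismatch: int = 1) -> Optional[str]:
    """Return the unique sample whose (I5, I7) pair matches, or None."""
    hits = [name for name, (i5, i7) in samples.items()
            if hamming(i5_read, i5) <= max_mismatch
            and hamming(i7_read, i7) <= max_mismatch]
    return hits[0] if len(hits) == 1 else None


SAMPLES = {
    "S1": ("TATAGCCT", "ATTACTCG"),  # I501 + I701
    "S2": ("ATAGAGGC", "TCCGGAGA"),  # I502 + I702
    "S3": ("TATAGCCT", "CTGAAGCT"),  # I501 + I707
}
```

A read with I501 and I702 matches no registered pair and is rejected. With the I7 index alone, it would have been assigned to S2.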
  • The adapter used is a full-length (long) adapter, i.e. it already carries the sequences (P5, P7) that bind to the flow cell of the sequencer, so PCR amplification is not required to introduce the P5 and P7 sequences and a PCR-free library can be built. A PCR-free library avoids base errors (mutations) introduced by PCR during library construction, amplification bias toward particular fragments, and non-native chimeric sequences generated by PCR.
  • The structure of the adapter of the present invention is shown in Figure 1.
  • The Y-shaped portion on the left (excluding the molecular tag and the positioning tag) is identical to the standard adapter of the Illumina sequencing platform: in the parallel (stem) part the two strands of the Y adapter are complementary, while the bases in the forked part are unpaired. The reverse complements of P5 and P7 (P7-A, P7-B and P7-C) hybridize with the probes on the Illumina flow cell for subsequent bridge amplification, which amplifies the signal. The I5 and I7 index sequences label the different sequencing libraries so that libraries built from different samples can be distinguished. The Read1 and Read2 sequences bind the sequencing primers during sequencing. The molecular tag, the random tag sequence NNNNNNNNNNNN, gives each template in the high-throughput sequencing library a different label; because the molecular tag sequence is random, a fixed sequence of positioning bases must follow it so that the molecular tag can be located during data analysis.
  • The present invention uses the random tag sequences on the dual-tag adapter to attach a different sequence tag to each DNA template during the ligation step of high-throughput sequencing library construction. In the subsequent PCR enrichment, each original template is copied many times together with its tag sequence, producing multiple copies. After these copies are sequenced, the sequence tags identify which original fragment each read came from (distinguishing the duplicates generated during library construction), so that the sequences can be corrected for amplification errors and sequencer base-calling errors. The sequences are then corrected a second time by pairing the tag sequences of the two complementary strands, which corrects damage such as deamination and oxidation occurring in the DNA itself and during library construction.
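The two rounds of correction described above correspond to the usual single-strand and duplex consensus steps of UMI analysis. The Python sketch below works in three stages. It groups reads by their ordered tag pair and builds a per-position majority consensus for each family. It then keeps only the bases on which a family and its partner family (the same two tags in reverse order, i.e. the complementary strand) agree. It assumes the reads are already aligned, oriented to the reference strand and of equal length. The 70% agreement threshold and the minimum family size of 2 are hypothetical parameters. This is a generic duplex-consensus sketch, not the patent's own software.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

Read = Tuple[str, str, str]  # (tag at one end, tag at the other end, aligned sequence)


def consensus(seqs: List[str], min_agree: float = 0.7) -> str:
    """Per-position majority vote; 'N' where no base reaches `min_agree`."""
    out = []
    for column in zip(*seqs):
        base, count = Counter(column).most_common(1)[0]
        out.append(base if count / len(column) >= min_agree else "N")
    return "".join(out)


def duplex_consensus(reads: List[Read], min_family: int = 2) -> Dict[Tuple[str, str], str]:
    # 1) Group PCR copies of each original strand by their ordered tag pair.
    families: Dict[Tuple[str, str], List[str]] = defaultdict(list)
    for tag_a, tag_b, seq in reads:
        families[(tag_a, tag_b)].append(seq)

    # 2) Single-strand consensus removes PCR and base-calling errors within a family.
    sscs = {key: consensus(seqs) for key, seqs in families.items() if len(seqs) >= min_family}

    # 3) The complementary strand carries the same tags in reverse order; keep only
    #    positions where both strands agree (damage on one strand becomes 'N').
    duplex = {}
    for (tag_a, tag_b), strand1 in sscs.items():
        if not tag_a < tag_b:  # visit each strand pair once; skip pairs with identical tags
            continue
        strand2 = sscs.get((tag_b, tag_a))
        if strand2 is not None:
            duplex[(tag_a, tag_b)] = "".join(
                x if x == y else "N" for x, y in zip(strand1, strand2))
    return duplex
```

In the test data, a single PCR error in one copy is outvoted within its family. A G-to-T change present in every copy of only one strand (as oxidative damage would be) survives the first step and is masked as `N` by the duplex step.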
  • The positioning tag of the dual-tag adapter is used to locate the molecular tag sequence, which is crucial for identifying the molecular tag.
  • A fixed sequence such as ACT, GACT or TGACT is usually used.
  • Illumina sequencing platforms (including NextSeq 500, CN500, MiSeq and NextSeq) calculate the PF value of the template clusters (the proportion of high-quality clusters finally retained out of all clusters) from the base calls of the first 25 cycles of sequencing. Because the cluster density of the flow cell is limited, the PF value determines the yield of valid data from a run.
  • The positioning tag of the dual-tag adapter falls within roughly cycles 9-15 of sequencing. If only a single sequence were used, base diversity at these cycles would be too low (an unbalanced ratio of the four bases), seriously lowering the PF value and ultimately reducing data yield.
  • Adapter primers P7-A, P7-B and P7-C use three different positioning tag sequences, JJJJJ, KKKKK and LLLLL respectively, so that at every position of the positioning bases, read in the 3' to 5' direction, the three adapters carry different bases. This increases the base diversity of the adapter tags, effectively raising the PF value and thereby significantly increasing the yield of valid sequencing data.
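The effect of using three different positioning tags can be checked position by position. Pooled in equal amounts, the three adapters contribute different bases at every locator cycle instead of a single base. A minimal Python sketch; the three 5-base locators are hypothetical examples chosen for illustration, not sequences disclosed in the source.

```python
from collections import Counter
from typing import List


def per_cycle_composition(locators: List[str]) -> List[Counter]:
    """Base counts at each locator position across an equimolar adapter pool."""
    length = len(locators[0])
    if any(len(loc) != length for loc in locators):
        raise ValueError("locators must have equal length")
    return [Counter(loc[i] for loc in locators) for i in range(length)]


def fully_diverse(locators: List[str]) -> bool:
    """True if no two locators share a base at any position."""
    return all(max(c.values()) == 1 for c in per_cycle_composition(locators))


HYPOTHETICAL_LOCATORS = ["ACGTA", "CGTAC", "GTACG"]
for cycle, counts in enumerate(per_cycle_composition(HYPOTHETICAL_LOCATORS), start=1):
    print(cycle, dict(counts))
```

With a single locator, every cycle shows one base, which is the low-diversity situation that depresses the PF value. With three locators, three of the four bases are present at each cycle.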
  • A blunt-end adapter has an OH group at the 3' end of its P5 sequence and at the 5' end of its P7 sequence.
  • Such a blunt-end adapter ligates through the 3'-OH of the P5 sequence to the 5'-phosphate of the double-stranded DNA template (the other strand cannot be ligated because it carries an OH group).
  • (1) If both ends of a double-stranded DNA template are ligated to blunt-end adapters, a nick remains at the junction at each end of the template, subsequent PCR amplification cannot proceed, and part of the DNA template is lost;
  • (2) if a DNA template is ligated to a blunt-end adapter at one end and a normal adapter at the other, the strand running from the blunt-end adapter's P5 sequence through the template to the P7 end of the normal adapter is amplified as an effective PCR template, while the other strand of the template is lost because of the nicks at the adapters on both sides. Both (1) and (2) cause loss of sample DNA template.
  • Losing one strand of the DNA duplex means that no complementary strand can be found for the template during double-stranded random-tag correction, which impairs duplex correction; in addition, the residual 3' end of the P5 strand of the blunt-end adapter contaminates the template sequence, wasting part of Read1 and losing sequencing information.
  • The present invention introduces a biotin modification at the 5' end of the P7 adapter primer, so after P5 and P7 are annealed and extended, the 5' end of the P7 strand carries biotin. A normal adapter loses its biotin label upon restriction digestion, whereas a blunt-end adapter (i.e. residual extension product) keeps it. Purifying the digested adapters with avidin magnetic beads therefore removes the residual blunt-end adapters, and with them the protective-base sequence residues caused by incomplete digestion, as shown in Fig. 1.
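The logic of the biotin clean-up can be written as a simple selection rule. The biotin sits on the 5' end of P7, and restriction digestion cuts that end off. Only molecules that escaped digestion (blunt-end extension products and adapters still carrying the protective bases) therefore remain biotinylated and are captured by the avidin beads, while correctly digested adapters stay in solution. A toy Python model of that rule; the class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class AdapterMolecule:
    label: str
    digested: bool  # True if restriction digestion removed the biotinylated 5' end of P7

    @property
    def biotinylated(self) -> bool:
        return not self.digested


def avidin_bead_cleanup(pool: List[AdapterMolecule]) -> Tuple[List[str], List[str]]:
    """Split a pool into (labels left in the supernatant, labels captured on beads)."""
    supernatant = [m.label for m in pool if not m.biotinylated]
    captured = [m.label for m in pool if m.biotinylated]
    return supernatant, captured
```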
  • FIG. 1 is a schematic structural view of the single-positioning dual-tag adapter prepared in Example 1 of the present invention;
  • FIG. 2 is a schematic diagram showing the effect of a blunt-end adapter (the residual extension product of the extended adapter) on a sequencing library in the present invention;
  • FIG. 3 is a schematic diagram of a dual-tag adapter with the index introduced by PCR in the present invention;
  • FIG. 5 is a schematic flow chart of identifying cell-line mutations with the single-positioning dual-tag adapter according to Example 3 of the present invention;
  • FIG. 6 is a schematic diagram of the preparation process of the multi-positioning dual-tag adapter set according to Example 4 of the present invention.
  • Adapter primers: two primers, adapter primer P5 and adapter primer P7.
  • Adapter primer P5 is the sequence obtained by joining SEQ ID NO: 1 to SEQ ID NO: 2 via the I5 index sequence;
  • adapter primer P7 is the sequence obtained by joining SEQ ID NO: 3 to SEQ ID NO: 4 via the I7 index sequence, with FFFFFEEEEEDDDDDNNNNNNNNNNNN joined in order to the 5' end of SEQ ID NO: 3. The primers (synthesized by Bioengineering (Shanghai) Co., Ltd.) are diluted to 100 µM with ddH2O (or TE buffer).
  • FFFFF are protective bases for the restriction site
  • EEEEE is a restriction site
  • DDDDD is the positioning tag sequence
  • NNNNNNNNNNNN is a random molecular tag sequence
  • the I5 index sequence is selected from SEQ ID NOs: 5-12, and the I7 index sequence from SEQ ID NOs: 12-23.
  • FFFFF, DDDDD and EEEEE are each shown as five bases but are not limited to five bases; NNNNNNNNNNNN is 4 to 12 random bases with no four consecutive identical bases.
  • Second precipitation: add 1/10 volume of NaAc (3 M) and 2.5 volumes of absolute ethanol to the product of step (4), mix, and place at -20 °C for 2 h; centrifuge at 13000 g for 30 min at 4 °C; discard the supernatant, rinse the pellet with 600 µL of 70% ethanol, and centrifuge at 13000 g for 30 min at 4 °C; remove the supernatant, dry the DNA at room temperature for 5-10 min, and resuspend it in 26 µL of TE low buffer to obtain the final single-positioning dual-tag adapter (25 µM, structure shown in Figure 2). Dispense into 5 µL aliquots and store at -80 °C until use.
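The reagent amounts in this precipitation scale with the volume of the step (4) product, which the text does not state. A small Python helper, assuming a hypothetical 100 µL input. It assumes "2.5 volumes" of ethanol is measured against the sample alone; some protocols measure it against sample plus salt, which the `include_salt` flag covers. It also computes how many 5 µL aliquots the 26 µL resuspension yields.

```python
from typing import Tuple


def precipitation_volumes(sample_ul: float, salt_fraction: float = 0.1,
                          ethanol_volumes: float = 2.5,
                          include_salt: bool = False) -> Tuple[float, float]:
    """Return (3 M NaAc volume, absolute ethanol volume) in microliters."""
    salt_ul = sample_ul * salt_fraction
    reference_ul = sample_ul + salt_ul if include_salt else sample_ul
    return salt_ul, reference_ul * ethanol_volumes


def full_aliquots(total_ul: float, aliquot_ul: float = 5.0) -> int:
    """Number of complete aliquots that can be dispensed."""
    return int(total_ul // aliquot_ul)


salt, ethanol = precipitation_volumes(100.0)  # hypothetical 100 uL input from step (4)
print(f"NaAc {salt:.1f} uL, ethanol {ethanol:.1f} uL, aliquots {full_aliquots(26.0)}")
```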
  • The single-positioning dual-tag adapter prepared in Example 1 is used: its protective bases are TCTTCT; its restriction site is shown with the positioning bases boxed (the restriction site and the positioning bases partially overlap); and its molecular tag is BDHVBDHV.
  • the combination of the I5index sequence and the I7index sequence may be: I501-I701, I502-I702, I503-I703, I504-I704, I505-I705, I506-I706, I507-I707, I508-I708, I501-I707, I502-I708, I503-I709, I504-I710.
  • The base sequences corresponding to these codes are shown in Table 1.
  • Sample selection and quality control: plasma samples from 5 lung cancer patients were collected and plasma DNA was extracted with the QIAGEN plasma DNA extraction kit. DNA purity was measured with a spectrophotometer (A260/280 required to be between 1.8 and 2.0); DNA concentration was then determined with Qubit 2.0 (total amount between 5 and 15 ng); the fragment distribution of the DNA samples was checked with a D1000 chip (Agilent) (about 160-200 bp); and the mutation rate of the EGFR T790M locus in each tumor sample was determined by digital PCR (Bio-Rad): 1.9%, 0.8%, 0.18%, 0.12% and 1.44%.
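The acceptance criteria in this step can be collected into a single check. A minimal Python sketch, reading the purity range as A260/280 1.8-2.0 and treating the stated ranges as hard inclusive limits (the fragment-size range is given only approximately in the text). The record and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PlasmaDnaQC:
    sample: str
    a260_280: float      # spectrophotometer purity ratio
    total_ng: float      # Qubit 2.0 total DNA
    main_peak_bp: float  # fragment peak from the D1000 chip


def qc_failures(qc: PlasmaDnaQC) -> List[str]:
    """List the criteria a sample fails; an empty list means it passes."""
    failures = []
    if not 1.8 <= qc.a260_280 <= 2.0:
        failures.append("A260/280 outside 1.8-2.0")
    if not 5 <= qc.total_ng <= 15:
        failures.append("total DNA outside 5-15 ng")
    if not 160 <= qc.main_peak_bp <= 200:
        failures.append("fragment peak outside ~160-200 bp")
    return failures
```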
  • Library construction: libraries were built with the KAPA DNA library kit, using the entire amount of each DNA sample.
  • End repair of DNA samples: add 7 µL of 10× end-repair buffer and 5 µL of end-repair enzyme; incubate at 20 °C for 30 min.
  • A-tailing with the A-tailing enzyme: add 5 µL of 10× end-repair buffer and 3 µL of end-repair enzyme; incubate at 30 °C for 30 min.
  • After purification, the product was divided into two portions for the ligation step, one of which used the single-positioning dual-tag adapter prepared in Example 2.
  • The experimental group used the single-positioning dual-tag adapter prepared in Example 2; the control group was identical to the experimental group except for the adapter sequence used.
  • For the sample group ligated with the conventional library adapter, the upstream and downstream amplification primers were the universal primer (SEQ ID NO: 5) and the Index primer (SEQ ID NO: 6); for the sample group using the single-positioning dual-tag adapter prepared in Example 2, the upstream and downstream primers were PCR-P5 primer + PCR-P7 primer;
  • SEQ ID NO: 27 is the sequence obtained by joining SEQ ID NO: 28 to the I7 index sequence;
  • the I7 index sequence is selected from SEQ ID NOs: 12-23.
  • P5 and P7 primer sequences for the single-positioning dual-tag adapter prepared in Example 1: when PCR was performed with this adapter, the following primer sequences were used:
  • The library was subjected to targeted capture. After library QC (Agilent 2100/2200), the library fragment size was determined: for example, with an insert (template) of 200-350 bp, the P5- and P7-side adapters at the two ends add 140 bp, so the library size distribution should be 340-490 bp. Capture efficiency was judged by qPCR: a mean enrichment factor below 10 indicates failed capture, and the library must be recaptured before sequencing.
  • The sequencing depth of each sample was 20000×.
  • the raw data obtained after sequencing was 8.20G
  • the clean data Q20 was 94.25%
  • the Q30 was 0, 3%
  • the mapping rate was 99.9%
  • the coverage was 99.89%.
  • The conventional-adapter sample group accurately detected the mutation sites in two samples (1.9% and 1.44%), whereas the single-positioning dual-tag adapter sample group detected all five: 1.9%, 0.8%, 0.18%, 0.12% and 1.44%.
  • The NCI-H1650 and HCT cell lines were selected as experimental material. NCI-H1650 DNA was spiked into HCT DNA at mass ratios of 10%, 1% and 0.1%, and 100% NCI-H1650 DNA and 100% HCT DNA were taken as two further samples; these were designated the 10%, 1%, 0.1%, NCI-H1650 and HCT groups, respectively.
  • The NCI-H1650 and HCT groups were used only to determine the genetic background of the cell-line DNA used for mixing (allele information such as heterozygous or homozygous status): from the sequencing data of these two samples, homozygous base sites were identified, and sites carrying different bases at the same position in the two lines were selected as the statistical analysis sites for the other sample groups.
  • A DNA library was prepared from each sample (KAPA DNA Library Kit). After the fragmentation and A-tailing steps, each of the 10%, 1% and 0.1% samples was split into two groups.
  • In the ligation step, a conventional adapter (e.g. the sequences shown in SEQ ID NO: 3 and SEQ ID NO: 4) or the single-positioning dual-tag adapter prepared in Example 1 (shown in Figure 5) was added, followed by the subsequent library preparation and capture steps using the Roche SeqCap EZ custom kit (250k). The libraries were finally sequenced to a depth of 20,000×, and SNPs were called from the filtered, uniquely mapped Q30 reads.
  • In the single-positioning dual-tag adapter, FFFFF is TCTTCT, EEEEE is ACAGT and DDDDD is AGT, which overlaps the EEEEE sequence.
  • NNNNNNNNNNNN is the sequence BDHVBDHV, where B denotes any base other than A, D any base other than C, H any base other than G, and V any base other than T.
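The rule that overlapping bases are written only once can be expressed as a suffix/prefix merge. The Python sketch below assembles the 5' segment of a P7 adapter primer in the order protective bases, restriction site, positioning tag, random tag. It uses the example values from this section (TCTTCT, ACAGT, AGT, BDHVBDHV). It assumes the intended overlap is the longest suffix of the restriction site that is also a prefix of the positioning tag; a real design might specify the overlap explicitly. The function names are hypothetical.

```python
def merge_with_overlap(left: str, right: str) -> str:
    """Join two sequences, writing the longest suffix of `left` that is also a
    prefix of `right` only once."""
    for k in range(min(len(left), len(right)), 0, -1):
        if left.endswith(right[:k]):
            return left + right[k:]
    return left + right


def p7_five_prime_segment(protect: str, site: str, locator: str, tag: str) -> str:
    """5'->3': protective bases, restriction site merged with locator, random tag."""
    return protect + merge_with_overlap(site, locator) + tag


print(p7_five_prime_segment("TCTTCT", "ACAGT", "AGT", "BDHVBDHV"))
```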
  • Results: the data of the NCI-H1650 and HCT samples were analyzed first. Based on the SNP detection information, the base MAF (minor allele frequency) within the 250 kbp capture region of the Roche chip was determined, and sites with 0% base frequency (SNP homozygous negative) and 100% base frequency (SNP homozygous positive) were selected. In practice a threshold is specified, e.g. 0.1%: a site whose MAF is below 0.1% is regarded as a 0% site, i.e. SNP homozygous negative, and 100% sites are defined analogously. Sites at the same genomic position that are homozygous positive in one cell line and homozygous negative in the other were then selected as the analysis sites for the other sample groups, and used to compute each group's detection rate, false positives, false negatives and other statistics.
  • A total of 178 homozygous allelic SNP sites were identified between the NCI-H1650 and HCT groups (100%), i.e. sites homozygous negative in one cell line and homozygous positive in the other. These 178 sites were then analyzed in the 10%, 1% and 0.1% samples prepared with the different adapters; in these samples the expected mutation rate (heterozygous ratio) at the 178 sites is 10%, 1% and 0.1%, respectively.
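The site-selection rule and the expected mutation rates can be sketched in a few lines. A site is informative when it is homozygous for one allele in NCI-H1650 and for the other allele in HCT, using the 0.1% threshold from the text. The expected allele frequency in a mixture is the spike-in fraction times the allele frequency in the spiked-in line, plus the remainder times its frequency in the background line. This minimal Python sketch treats the reported frequencies as non-reference allele frequencies and assumes equal genome copies per ng in both lines. The site names are hypothetical.

```python
from typing import Dict, List


def informative_sites(af_spike: Dict[str, float], af_background: Dict[str, float],
                      threshold: float = 0.001) -> List[str]:
    """Sites homozygous reference in one cell line and homozygous alternate in the other."""
    def hom_ref(af: float) -> bool:
        return af < threshold

    def hom_alt(af: float) -> bool:
        return af > 1.0 - threshold

    sites = []
    for site in af_spike.keys() & af_background.keys():
        a, b = af_spike[site], af_background[site]
        if (hom_alt(a) and hom_ref(b)) or (hom_ref(a) and hom_alt(b)):
            sites.append(site)
    return sorted(sites)


def expected_af(spike_fraction: float, af_spike: float, af_background: float) -> float:
    """Allele frequency expected after mixing DNA from two lines by mass."""
    return spike_fraction * af_spike + (1.0 - spike_fraction) * af_background
```

For a site homozygous alternate in NCI-H1650 and homozygous reference in HCT, `expected_af(0.01, 1.0, 0.0)` returns the 1% spike-in fraction, matching the expected mutation rates listed above.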
  • With the conventional adapter, the positive detection rate was 100% in the 10% group, 98.86% in the 1% group and 81.29% in the 0.1% group.
  • the single positioning double label prepared in Example 1 The detection rate of the joints in the 10%, 1% and 0.1% groups was 100%; the false positive rate: at the sensitivity of 1%, the false positive rate of the common joint was 0.01%, at 0.1% sensitivity, the common joint The false positive rate is above 5%; while the false positive rate of the single-positioned double-labeled joint prepared in Example 1 is 0.001% at 0.1% sensitivity (the point at which the sensitivity value exceeds a certain threshold for the base variation frequency is considered to be
  • the detected mutation site for example, 1% sensitivity refers to a base mutation frequency threshold of 1%, greater than 1% of the site Is detected as a mutation site).
  • Adapter primer P5 and adapter primers P7-A, P7-B, and P7-C, each biotin-modified at the 5' end, were each diluted to 100 μM with ddH2O;
  • Adapter primer P5 is obtained by joining SEQ ID NO: 01 to the sequence shown in SEQ ID NO: 02 via an I5 index sequence;
  • Adapter primer P7-A consists of FFFFFEEEEEJJJJJNNNNNNNNNNNN joined in sequence to the 5' end of SEQ ID NO: 03, with SEQ ID NO: 03 further joined to the sequence shown in SEQ ID NO: 04 via an I7 index sequence;
  • Adapter primer P7-B consists of FFFFFEEEEEKKKKKNNNNNNNNNNNN joined in sequence to the 5' end of SEQ ID NO: 03, with SEQ ID NO: 03 further joined to the sequence shown in SEQ ID NO: 04 via an I7 index sequence;
  • Adapter primer P7-C consists of FFFFFEEEEELLLLLNNNNNNNNNNNN joined in sequence to the 5' end of SEQ ID NO: 03, with SEQ ID NO: 03 further joined to the sequence shown in SEQ ID NO: 04 via an I7 index sequence;
  • FFFFF is the restriction-site protection bases;
  • EEEEE is the restriction site;
  • JJJJJ, KKKKK, and LLLLL are positioning tag sequences and differ from one another;
  • NNNNNNNNNNNN is the random molecular tag sequence; FFFFF, JJJJJ, KKKKK, LLLLL, and EEEEE each include, but are not limited to, five bases; the I7 index sequence is 6 to 8 bases; and NNNNNNNNNNNN is 4 to 12 random bases with no four consecutive identical bases.
  • NNNNNNNNNNNN is BDHVBDHV, where B denotes any base other than A, D any base other than C, H any base other than G, and V any base other than T.
  • The I5 index sequence is selected from SEQ ID NOs: 05 to 12, and the I7 index sequence from SEQ ID NOs: 13 to 23.
  • The JJJJJ, KKKKK, and LLLLL sequences may partially or completely overlap with the EEEEE sequence; where they overlap, the overlapping bases appear only once.
  • Annealing: adapter primer P5, adapter primers P7-A, P7-B, and P7-C, buffer, and an appropriate amount of deionized water are mixed and annealed to obtain annealed adapters A, B, and C. Specifically:
  • The following system was prepared in a 15 mL centrifuge tube: adapter primer P5, 1 mL; adapter primer P7-A, 334 μL; adapter primer P7-B, 334 μL; adapter primer P7-C, 334 μL; NEB buffer 2, 300 μL; ddH2O, 700 μL; 3 mL in total.
  • The mixture was incubated in a 95 °C water bath for 5 min, then immediately transferred to a beaker of 95 °C hot water and allowed to cool slowly at room temperature to 24-27 °C;
  • Extension of the annealed adapters: annealed adapters A, B, and C were extended with a polymerase to obtain extended adapters A, B, and C. Specifically, the following were added to the original 15 mL centrifuge tube: 10× NEB buffer, 200 μL; 25 mM dNTP mix, 200 μL; 500 mM DTT, 6 μL; Klenow exo- (5 U/μL), 100 μL. The volume was brought to 5 mL with ddH2O, mixed, and incubated with rotation in a 37 °C incubator for 1 h;
  • First precipitation: extended adapters A, B, and C were each purified by ethanol or isopropanol precipitation to obtain purified extended adapters A, B, and C. Specifically: 1/10 volume of NaAc (3 M) and 2.5 volumes of absolute ethanol were added to the product of step (2), mixed, and kept at -20 °C for 2 h; the mixture was centrifuged at 13,000 g for 30 min; the supernatant was removed and 5 mL of 70% (v/v) ethanol was added.
  • Second precipitation: digested adapters A, B, and C were purified by ethanol or isopropanol precipitation to obtain double-tag adapters A, B, and C. Specifically: 1/10 volume of NaAc and 2.5 volumes of absolute ethanol were added to the product of step (4), mixed, and kept at -20 °C for 2 h; centrifuged at 13,000 g for 30 min at 4 °C; the supernatant was removed, and the pellet was rinsed with 10 mL of 70% ethanol and centrifuged at 13,000 g for 30 min at 4 °C; the supernatant was removed, and the DNA was air-dried at room temperature for 20-30 min and resuspended in 2 mL of ddH2O;
  • Biotin purification: double-tag adapters A, B, and C obtained in step (5) were subjected to biotin affinity purification. Specifically: 2 mL of Dynabeads MyOne Streptavidin C1 magnetic beads were washed with 1× B&W buffer and resuspended in 2 mL of 2× B&W buffer; 2 mL of the product of step (5) was added to the beads and incubated at 4 °C for 30 min; the tube was placed on a magnetic stand, and the supernatant was transferred to a new 50 mL centrifuge tube;
  • Third precipitation: the product of step (6) was purified by ethanol or isopropanol precipitation to obtain the multi-positioning double-tag adapter set. Specifically: 1/10 volume of NaAc and 2.5 volumes of absolute ethanol were added to the product of step (6), mixed, and kept at -20 °C for 2 h; centrifuged at 13,000 g for 30 min at 4 °C; the supernatant was removed, and the pellet was rinsed with 10 mL of 70% ethanol and centrifuged at 13,000 g for 30 min at 4 °C; the supernatant was removed, and the DNA was air-dried at room temperature for 20-30 min and resuspended in 1.5 mL of TE low buffer. This is the multi-positioning double-tag adapter set, which was aliquoted after passing quality control and stored frozen at -20 °C until use.
  • Libraries were constructed from 30 ng of fragmented leukocyte DNA (average length 220 bp) in two groups: one used the single-positioning double-tag adapter prepared in Example 1 and the other the multi-positioning double-tag adapter set prepared in Example 4. The NEBNext Ultra II DNA Library Prep Kit was used, and the library construction steps were as follows:
  • After passing quality control, the libraries were sequenced on the NextSeq500 platform with the Mid Output kit (300 cycles) and a 1% PhiX spike-in. Each library was sequenced in a separate run; the experiment was repeated three times, each with its own run, and the sequencing QC results were as follows:
  • Libraries were constructed from 30 ng of fragmented leukocyte DNA (average length 220 bp) in two groups, one with the single-positioning double-tag adapter prepared in Example 1 and the other with the multi-positioning double-tag adapter set prepared in Example 4, using the NEBNext Ultra II DNA Library Prep Kit; the library construction steps were as follows:
  • After passing quality control, the libraries were sequenced on the NextSeq500 platform with the Mid Output kit (300 cycles), a 1% PhiX spike-in, and 1 Gb of data per library; the sequencing results were as follows:
  • The invention provides a multi-positioning double-tag adapter set for detecting gene mutations, a preparation method therefor, and specific applications thereof.
  • It can accurately detect a gene mutation rate of 1×10⁻⁵, effectively improves the sensitivity of gene mutation detection, and, combined with the throughput of high-throughput sequencing, can detect multiple mutation sites in multiple genes in a single sequencing run.

Landscapes

  • Chemical & Material Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Organic Chemistry (AREA)
  • Health & Medical Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Zoology (AREA)
  • Wood Science & Technology (AREA)
  • Genetics & Genomics (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Engineering & Computer Science (AREA)
  • Biotechnology (AREA)
  • Biochemistry (AREA)
  • Molecular Biology (AREA)
  • Microbiology (AREA)
  • Biophysics (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Analytical Chemistry (AREA)
  • Biomedical Technology (AREA)
  • Immunology (AREA)
  • Plant Pathology (AREA)
  • Chemical Kinetics & Catalysis (AREA)
  • Crystallography & Structural Chemistry (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Chemical & Material Sciences (AREA)
  • Medicinal Chemistry (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
  • Preparation Of Compounds By Using Micro-Organisms (AREA)

Abstract

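A multi-positioning double-tag adapter set for detecting gene mutations, and a preparation method and application thereof. The multi-positioning double-tag adapter set comprises double-tag adapter A, double-tag adapter B, and double-tag adapter C, which are synthesized from adapter primer P5 together with adapter primers P7-A, P7-B, and P7-C, respectively, each P7 primer being biotin-modified at its 5' end. Using this adapter set, a gene mutation rate of 1×10⁻⁵ can be accurately detected and the sensitivity of gene mutation detection is effectively improved; combined with the throughput of high-throughput sequencing, multiple mutation sites in multiple genes can be detected in a single sequencing run.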

Description

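Multi-positioning double-tag adapter set for detecting gene mutations, and preparation method and application thereof
Technical Field
The present invention relates to the field of nucleic acid sequencing technology, and in particular to a multi-positioning double-tag adapter set for detecting gene mutations, and a preparation method and application thereof.
Background
In current next-generation sequencing, owing to sample preparation (library preparation) and the instrument system itself (oxidative or deamination damage to the DNA itself, mutations introduced by the polymerase during PCR in library construction, errors introduced when the instrument reads bases during sequencing, etc.), the probability that any sequenced base is wrong lies between 1/1000 and 1/100, i.e., 1 to 10 erroneous bases per 1,000 bases.
In germline mutation detection, a mutation site can account for only 0%, 50%, or 100% of the sample, so systematic base-calling errors can be corrected with overlapping reads from the same region during data analysis, achieving very high sequencing accuracy;
Somatic mutations, such as those in tumor cells, are highly heterogeneous between cells (each cell may carry different mutation sites) and account for a very low fraction of the sample (below 1%). They cannot be distinguished with conventional bioinformatics methods, because the signal-to-noise ratio between the systematic base error rate (the noise) and the tumor mutation signal is too low; tumor mutations therefore cannot be detected accurately with conventional sequencing methods.
Molecular tags (UMI, unique molecular identifier), developed later, can effectively solve this problem. Random sequence tags are introduced into the original DNA molecules of the sample so that every molecule carries a unique label; each molecule is then amplified during library construction and finally sequenced, and bioinformatic analysis removes most of the mutations (errors) arising during library construction and sequencing, lowering the sequencing base error rate to 1×10⁻⁵. Assuming that tumor mutation detection requires a 10-fold signal-to-noise ratio, this approach can accurately detect a mutation rate of 1×10⁻⁴.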
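The detection limit in the last sentence follows from simple arithmetic. A minimal sketch, assuming (as the text does) that a mutation is reliably detectable only when its frequency exceeds the residual per-base error rate by a fixed signal-to-noise factor; the function name is illustrative and not part of any software described here.

```python
def min_detectable_rate(error_rate: float, snr: float = 10.0) -> float:
    """Lowest mutation frequency that exceeds the residual per-base error
    rate (the noise) by the required signal-to-noise ratio `snr`."""
    if error_rate <= 0 or snr <= 0:
        raise ValueError("error_rate and snr must be positive")
    return error_rate * snr


# Raw NGS error rates (1/100, 1/1000) versus the UMI-corrected rate of 1e-5
for err in (1e-2, 1e-3, 1e-5):
    print(f"error rate {err:.0e} -> detectable mutation rate {min_detectable_rate(err):.0e}")
```

With the raw error rates quoted above, the same 10-fold margin puts the limit at 1% to 10%, which is why somatic mutations below 1% are out of reach without error correction.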
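How to improve the sensitivity of tumor mutation rate detection is therefore a problem in urgent need of a solution.
Summary of the Invention
An object of the present invention is to provide a multi-positioning double-tag adapter set for detecting gene mutations.
Another object of the present invention is to provide a preparation method for the above multi-positioning double-tag adapter set.
A further object of the present invention is to provide specific applications of the above multi-positioning double-tag adapter set.
A multi-positioning double-tag adapter set for detecting gene mutations comprises double-tag adapter A, double-tag adapter B, and double-tag adapter C, which are synthesized from adapter primer P5 together with adapter primers P7-A, P7-B, and P7-C, respectively, each biotin-modified at its 5' end, wherein:
adapter primer P5 is obtained by joining SEQ ID NO: 01 to the sequence shown in SEQ ID NO: 02 via an I5 index sequence;
adapter primer P7-A consists of FFFFFEEEEEJJJJJNNNNNNNNNNNN joined in sequence to the 5' end of SEQ ID NO: 03, with SEQ ID NO: 03 further joined to the sequence shown in SEQ ID NO: 04 via an I7 index sequence;
adapter primer P7-B consists of FFFFFEEEEEKKKKKNNNNNNNNNNNN joined in sequence to the 5' end of SEQ ID NO: 03, with SEQ ID NO: 03 further joined to the sequence shown in SEQ ID NO: 04 via an I7 index sequence;
adapter primer P7-C consists of FFFFFEEEEELLLLLNNNNNNNNNNNN joined in sequence to the 5' end of SEQ ID NO: 03, with SEQ ID NO: 03 further joined to the sequence shown in SEQ ID NO: 04 via an I7 index sequence;
FFFFF is the restriction-site protection bases, EEEEE is the restriction site, JJJJJ, KKKKK, and LLLLL are positioning tag sequences that differ from one another, and NNNNNNNNNNNN is the random molecular tag sequence; FFFFF, JJJJJ, KKKKK, LLLLL, and EEEEE each include, but are not limited to, five bases; the I7 index sequence is 6 to 8 bases; and NNNNNNNNNNNN is 4 to 12 random bases with no four consecutive identical bases.
In a preferred embodiment of the present invention, NNNNNNNNNNNN is BDHVBDHV, where B denotes any base other than A, D any base other than C, H any base other than G, and V any base other than T.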
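The degenerate notation can be checked mechanically. A minimal sketch using the standard IUPAC ambiguity codes for B, D, H and V; it enumerates every tag encoded by BDHVBDHV and confirms that none contains four consecutive identical bases, which is the constraint stated for the random tag. The helper names are illustrative.

```python
from itertools import product

# IUPAC degenerate base codes used in the preferred tag BDHVBDHV
IUPAC = {
    "B": "CGT",  # any base except A
    "D": "AGT",  # any base except C
    "H": "ACT",  # any base except G
    "V": "ACG",  # any base except T
}


def matches_pattern(tag: str, pattern: str = "BDHVBDHV") -> bool:
    """True if each base of `tag` is allowed by the code at the same position."""
    return len(tag) == len(pattern) and all(
        base in IUPAC[code] for base, code in zip(tag, pattern)
    )


def has_homopolymer(seq: str, run: int = 4) -> bool:
    """True if `seq` contains at least `run` consecutive identical bases."""
    count = 1
    for prev, cur in zip(seq, seq[1:]):
        count = count + 1 if cur == prev else 1
        if count >= run:
            return True
    return False


def expand(pattern: str = "BDHVBDHV"):
    """Yield every concrete sequence encoded by a degenerate pattern."""
    for bases in product(*(IUPAC[code] for code in pattern)):
        yield "".join(bases)


all_tags = list(expand())
print(len(all_tags))                              # 6561, i.e. 3**8
print(any(has_homopolymer(t) for t in all_tags))  # False
print(matches_pattern("CAACCAAC"), matches_pattern("AAAAAAAA"))  # True False
```

The result is no accident: every window of four consecutive positions in BDHVBDHV contains each of B, D, H and V exactly once, and each code excludes a different base, so a run of four identical bases is impossible.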
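In a preferred embodiment of the present invention, the I5 index sequence is selected from SEQ ID NOs: 05 to 12 and the I7 index sequence from SEQ ID NOs: 13 to 23; the JJJJJ, KKKKK, and LLLLL sequences may partially or completely overlap with the EEEEE sequence, and where they overlap, the overlapping bases appear only once.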
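The "overlapping bases appear only once" rule can be expressed as a join with an explicit overlap length. A minimal sketch using the values given for the single-positioning adapter in Example 3 (FFFFF = TCTTCT, EEEEE = ACAGT, DDDDD = AGT, tag BDHVBDHV); the function name is hypothetical, and the overlap is passed explicitly rather than detected, so that an accidental match between segments is never merged silently.

```python
def join_with_overlap(upstream: str, downstream: str, overlap: int) -> str:
    """Concatenate two adapter segments whose ends share `overlap` bases,
    writing the shared bases only once."""
    if not 0 <= overlap <= min(len(upstream), len(downstream)):
        raise ValueError("overlap out of range")
    if overlap and upstream[-overlap:] != downstream[:overlap]:
        raise ValueError("segments do not share the stated overlap")
    return upstream + downstream[overlap:]


protection = "TCTTCT"     # FFFFF in Example 3
site = "ACAGT"            # EEEEE (HpyCH4III site ACNGT)
positioning = "AGT"       # DDDDD, completely contained in the end of the site
umi_pattern = "BDHVBDHV"  # NNNNNNNNNNNN, written as IUPAC codes

# 5' tag region of the P7 oligo (it continues into SEQ ID NO: 03)
tag_region = protection + join_with_overlap(site, positioning, overlap=3) + umi_pattern
print(tag_region)  # TCTTCTACAGTBDHVBDHV
```

A partial overlap works the same way, for example `join_with_overlap("ACAGT", "GTCA", 2)` gives `ACAGTCA`.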
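A preparation method for the above multi-positioning double-tag adapter set for detecting gene mutations comprises the following steps:
(1) Annealing: adapter primer P5, adapter primers P7-A, P7-B, and P7-C, buffer, and an appropriate amount of deionized water are mixed and annealed to obtain annealed adapters A, B, and C;
(2) Extension of the annealed adapters: annealed adapters A, B, and C are extended with a polymerase to obtain extended adapters A, B, and C;
(3) First precipitation: extended adapters A, B, and C are each purified by ethanol or isopropanol precipitation to obtain purified extended adapters A, B, and C;
(4) Digestion: a restriction endonuclease capable of producing a 3'-T overhang is added to each of the purified extended adapters A, B, and C to obtain digested adapters A, B, and C;
(5) Second precipitation: digested adapters A, B, and C are purified by ethanol or isopropanol precipitation to obtain double-tag adapters A, B, and C;
(6) Biotin purification: double-tag adapters A, B, and C obtained in step (5) are subjected to biotin affinity purification;
(7) Third precipitation: the product of step (6) is purified by ethanol or isopropanol precipitation to obtain the multi-positioning double-tag adapter set.
Specific applications of the above multi-positioning double-tag adapter set are as follows:
A library construction method comprises: fragmenting 10 ng to 1 μg of DNA to be tested into 200-500 bp DNA fragments; end-repairing the fragments with an end-repair enzyme; A-tailing; ligating the above multi-positioning double-tag adapter set; and, after ligation, selecting 340-660 bp fragments with Ampure beads or gel excision.
A sequencing method comprises the following steps:
(1) constructing a library by the above library construction method;
(2) sequencing the sequencing library.
A method for determining a nucleic acid sequence comprises the following steps:
(1) constructing a library by the above library construction method;
(2) sequencing the sequencing library;
(3) determining the result from the sequencing data;
the result determination comprises the following steps:
a. according to the set parameters, selecting uniquely mapped reads whose aligned bases have Q values greater than 30;
b. performing duplication determination based on the random tag sequences, and recalibrating the bases accordingly;
c. detecting SNP sites with SNP calling software and compiling SNP site information to obtain the final SNP sites and their MAF information;
d. comparing the detected SNP sites and MAF information with the mutation sites of a control group and with a population genomic variation database, and filtering out shared mutation sites; the mutation sites that remain are the finally detected mutation sites.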
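Step d amounts to a set difference. A minimal sketch, assuming each variant is identified by chromosome, position, reference and alternate base, and that the control calls and the population database have already been loaded into memory; the `Variant` class, the function name and all coordinates below are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    chrom: str
    pos: int
    ref: str
    alt: str
    maf: float  # allele frequency reported by the SNP caller (step c)

    def key(self):
        return (self.chrom, self.pos, self.ref, self.alt)


def report_mutations(sample, control, population_db):
    """Step d: drop variants that also occur in the matched control or in
    the population variation database; whatever remains is reported."""
    excluded = {v.key() for v in control} | set(population_db)
    return [v for v in sample if v.key() not in excluded]


# Hypothetical calls, for illustration only
sample = [
    Variant("chr7", 100, "C", "T", 0.0012),  # present only in the sample
    Variant("chr1", 200, "A", "G", 0.51),    # also in the control (germline)
    Variant("chr2", 300, "G", "A", 0.0030),  # listed in the population database
]
control = [Variant("chr1", 200, "A", "G", 0.49)]
population_db = {("chr2", 300, "G", "A")}

for v in report_mutations(sample, control, population_db):
    print(v.chrom, v.pos, v.ref, v.alt, v.maf)
```

Matching on the full allele rather than on position alone keeps a different alternate base at a known polymorphic site from being filtered out.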
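Tables 1 and 2 list some of the I5 index sequences, I7 index sequences, and EEEEE sequences, without being limited thereto.
Table 1 Selected I5 index and I7 index sequences
I5 index ID I5 index sequence I7 index ID I7 index sequence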
I501(SEQ ID NO:05) TATAGCCT I701(SEQ ID NO:12) ATTACTCG
I502(SEQ ID NO:06) ATAGAGGC I702(SEQ ID NO:13) TCCGGAGA
I503(SEQ ID NO:07) CCTATCCT I703(SEQ ID NO:14) CGCTCATT
I504(SEQ ID NO:08) GGCTCTGA I704(SEQ ID NO:15) GAGATTCC
I505(SEQ ID NO:09) AGGCGAAG I705(SEQ ID NO:16) ATTCAGAA
I506(SEQ ID NO:10) TAATCTTA I706(SEQ ID NO:17) GAATTCGT
I507(SEQ ID NO:11) CAGGACGT I707(SEQ ID NO:18) CTGAAGCT
I508(SEQ ID NO:12) GTACTGAC I708(SEQ ID NO:19) TAATGCGC
    I709(SEQ ID NO:20) CGGCTATG
    I710(SEQ ID NO:21) TCCGCGAA
    I711(SEQ ID NO:22) TCTCGCGC
    I712(SEQ ID NO:23) AGCGATAG
Table 2 Available restriction endonucleases and cleavage sites (partial)
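The double-tag library sequencing used in the present invention introduces two different UMIs on the two strands of the DNA duplex at the same time and, exploiting the double-stranded nature of DNA, uses the two strands to correct each other's sequencing information. This lowers the sequencing base error rate to 2.4×10⁻⁶, so that a gene mutation rate of 1×10⁻⁵ can be accurately detected and the sensitivity of gene mutation detection is effectively improved; combined with the throughput of high-throughput sequencing, multiple mutation sites in multiple genes can be detected in a single sequencing run.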
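The two-strand correction can be illustrated with a toy example. A minimal sketch, assuming the convention common to duplex-style UMI schemes that the two strands of one original molecule report the same pair of random tags in swapped Read1/Read2 order, and that each strand has already been collapsed into a single-strand consensus base; the function names and tag values are hypothetical and do not represent the patent's own software.

```python
from collections import defaultdict


def duplex_key(umi_read1: str, umi_read2: str) -> tuple:
    """Order-independent key shared by both strands of one molecule
    (assumes strand 1 reads tags (a, b) and strand 2 reads (b, a))."""
    return tuple(sorted((umi_read1, umi_read2)))


def duplex_consensus(single_strand_calls: dict) -> dict:
    """Pair single-strand consensus bases by duplex key; keep a base only
    when both strands agree, otherwise report 'N'."""
    families = defaultdict(list)
    for (umi1, umi2), base in single_strand_calls.items():
        families[duplex_key(umi1, umi2)].append(base)
    return {
        key: bases[0] if bases[0] == bases[1] else "N"
        for key, bases in families.items()
        if len(bases) == 2  # both strands observed
    }


calls = {
    ("CAACCAAC", "GTTGGTTG"): "T",  # molecule 1, strand 1
    ("GTTGGTTG", "CAACCAAC"): "T",  # molecule 1, strand 2: agrees
    ("TGAATGAA", "CATGCATG"): "A",  # molecule 2, strand 1
    ("CATGCATG", "TGAATGAA"): "G",  # molecule 2, strand 2: damage or error
}
print(duplex_consensus(calls))
```

A single-strand artifact such as a deamination event appears on only one strand, so it is rejected here even if every read in that strand's family carried it.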
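1. The sequencing results are first format-converted, and the sequencing quality of the reads is assessed using the positioning bases at the end of the adapter; if the positioning bases cannot be found, the whole read pair is discarded. At the same time, the random base sequence at the front of each read in the pair is trimmed off and merged into the read ID.
2. The filtered reads are aligned to the reference genome (hg19, GRCh37, etc.), and unqualified reads are filtered out according to the set parameters (low mapping quality, multi-locus mapping, mismatched Read1 and Read2, etc.), finally yielding high-quality unique mapping reads usable for analysis.
3. Duplication is determined using the random tag sequences added to the read ID in step 1: reads that align to the same position and carry the same tag are considered to derive from the same original DNA template and are grouped into one family for base recalibration.
4. SNP sites are detected with SNP calling software, SNP site information is compiled, and the final SNP sites and the associated MAF information are obtained.
The detected gene mutation information is compared with the mutation sites of the control group (DNA from healthy tissue of the same patient) and with a population genomic variation database, shared mutation sites are filtered out, and the mutation sites that remain are the finally detected gene mutation sites.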
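Steps 1 and 3 can be sketched as follows. This is a minimal illustration, assuming an 8-base random tag (BDHVBDHV) at the start of each read immediately followed by a fixed positioning sequence; the anchor `ACT` is one of the example positioning sequences mentioned later in this description, while the 0.7 majority threshold, the read-ID format and the per-site grouping (real pipelines group whole reads by alignment start) are illustrative assumptions, not parameters given in the text.

```python
from collections import Counter, defaultdict

UMI_LEN = 8      # length of the BDHVBDHV random tag
ANCHOR = "ACT"   # assumed positioning bases following the tag


def split_tag(seq, umi_len=UMI_LEN, anchor=ANCHOR):
    """Return (umi, remainder), or None if the positioning bases are missing."""
    if seq[umi_len:umi_len + len(anchor)] != anchor:
        return None
    return seq[:umi_len], seq[umi_len + len(anchor):]


def tag_read_pair(name, read1, read2):
    """Step 1: discard the pair if either read lacks the positioning bases,
    otherwise trim both tags and append them to the read ID."""
    r1, r2 = split_tag(read1), split_tag(read2)
    if r1 is None or r2 is None:
        return None
    return f"{name}_{r1[0]}+{r2[0]}", r1[1], r2[1]


def family_consensus(observations, min_fraction=0.7):
    """Step 3: reads with the same position and tag form one family; the
    family base is the majority base if it reaches `min_fraction`, else 'N'."""
    families = defaultdict(list)
    for chrom, pos, tag, base in observations:
        families[(chrom, pos, tag)].append(base)
    calls = {}
    for key, bases in families.items():
        base, count = Counter(bases).most_common(1)[0]
        calls[key] = base if count / len(bases) >= min_fraction else "N"
    return calls


print(tag_read_pair("read7", "CAACCAAC" "ACT" "TTGACCA", "GTTGGTTG" "ACT" "GGCATTA"))
print(tag_read_pair("read8", "CAACCAAC" "GGG" "TTGACCA", "GTTGGTTG" "ACT" "GGCATTA"))

tag = "CAACCAAC+GTTGGTTG"
obs = [("chr7", 100, tag, "T")] * 4 + [("chr7", 100, tag, "C")]  # one PCR/sequencing error
print(family_consensus(obs))
```

The first pair is kept with both tags moved into its ID, the second is discarded because the positioning bases are missing, and the single discordant base in the family is outvoted.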
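Beneficial effects of the present invention:
1. The dual-index adapters increase the number of samples per sequencing run (lowering sequencing cost), and the dual-end indexes distinguish different samples more effectively. This matters greatly in low-frequency mutation detection, because the mutation rate of the detected sites is generally about 0.1% to 1%; if different samples carrying different mutation sites are cross-contaminated, the final mutation calls are easily compromised.
2. The adapters are long adapters, i.e., they carry the sequences (P5, P7) that bind the flow cell of the sequencer, so no PCR amplification is needed after ligation to introduce the P5 and P7 sequences. PCR-free library construction is therefore possible, avoiding the base errors (mutations) and amplification bias introduced by PCR during library construction, as well as the non-natural chimeric sequences produced by PCR.
The structure of the adapter of the present invention is shown in FIG. 1. The Y-shaped structure on the left (excluding the molecular tag and positioning tag) is the same as the standard adapter of the Illumina sequencing platform: the bases of the parallel part of the Y-shaped adapter are complementary, while the bases of the forked part are unpaired. P5 and P7 (P7-A, P7-B, and P7-C) and their reverse complements hybridize to the probes on the sequencing chip of the Illumina sequencer for subsequent bridge amplification, which amplifies the signal. The I5 index and I7 index sequences label the different sequencing libraries, distinguishing libraries constructed from different samples. The Read1 and Read2 sequencing sequences bind the sequencing primers for sequencing by synthesis. The molecular tag is the NNNNNNNNNNNN random tag sequence, which applies distinct labels to the DNA library templates for high-throughput sequencing; because the molecular tag is random, positioning bases of fixed sequence must follow it so that the position and sequence of the molecular tag can be determined during data analysis.
Using the random tag sequences on the double-tag adapters, the present invention attaches a distinct sequence tag to each DNA template in the adapter ligation step of high-throughput library construction. In the subsequent PCR enrichment, each original template is copied many times together with its tag sequence, producing multiple copies (duplications). These copies are sequenced, and the origin of each read is identified by its sequence tag (to distinguish the duplicate sequences produced during library construction so that the sequencing results can be corrected during data analysis); the copies of each template are then used for sequence correction (of amplification errors and base-calling errors of the sequencer). After this first correction, the reverse-complementary structure of the two DNA strands is exploited: pairing the tag sequences in reverse-complementary pairs allows a second correction (of deamination, oxidation, and other damage arising before and during library construction).
3. The positioning tag of the double-tag adapter determines the position of the molecular tag sequence and is essential for identifying the molecular tag; it usually has a fixed sequence, such as ACT, GACT, or TGACT. At the start of sequencing, Illumina sequencing platforms (including NextSeq500, CN500, MiSeq, and NextSeq) calculate the PF value of the template clusters (the proportion of high-quality clusters ultimately retained among all clusters) from the base calls of the first 25 cycles. Because the cluster density of the sequencing chip is limited, the PF value determines the yield of usable data.
The positioning tag of the double-tag adapter falls within roughly cycles 9-15 at the start of sequencing. If a single sequence is used, the base diversity at these positions (the proportion of the four bases) is too low, which severely reduces the PF value and ultimately the data yield.
In the annealing step of the present invention, adapter primers P7-A, P7-B, and P7-C use three different positioning tag sequences, JJJJJ, KKKKK, and LLLLL, so that at every position of the positioning bases, from 3' to 5', the three tags carry different bases. This increases the base diversity of the adapter positioning tags, effectively raises the PF value, and thereby significantly increases the yield of usable sequencing data.
4. During adapter production, enzymatic reactions rarely go to completion, so some extension products escape digestion, leaving residual blunt-ended adapters that still carry the restriction-site protection bases (about 8 bp). The 3' end of the P5 strand and the 5' end of the P7 strand of such a blunt-ended adapter each carry an OH group. During adapter ligation (FIG. 2), these blunt-ended adapters ligate through the 3'-OH of the P5 strand to the 5'-phosphate of the double-stranded DNA template (the other strand cannot ligate because both ends carry OH groups). (1) If both ends of a double-stranded DNA template receive blunt-ended adapters, each adapter junction leaves a nick, so the template cannot be amplified in subsequent PCR and is lost. (2) If one end of the template receives a blunt-ended adapter and the other end a normal adapter, the molecule formed by the blunt-ended adapter's P5 strand, one strand of the DNA template, and the P7 end of the normal adapter is amplified as a valid PCR template, whereas the other strand of the template cannot be amplified because of nicks at both adapter junctions, and is therefore lost. Both (1) and (2) cause loss of sample DNA templates. In case (2), moreover, losing one strand of the DNA duplex means that the template cannot find its complementary strand during double-strand random tag correction, which impairs duplex correction; and the residual protection bases at the 3' end of the blunt-ended adapter's P5 strand contaminate the template sequence, wasting part of the Read1 data and losing sequencing information.
In the present invention, a biotin modification is introduced at the 5' end of the P7 adapter primers, so that after annealing and extension of the P5 and P7 adapter primers every adapter carries a biotin label at the 5' end of the P7 strand. Digestion removes the biotin label from normal adapters, while undigested blunt-ended adapters (i.e., residual extension products) keep it. Purifying the digested adapters with streptavidin magnetic beads therefore removes the residual blunt-ended adapters and effectively eliminates the protection-base residue caused by incomplete digestion, as illustrated schematically in FIG. 1.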
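The effect described in item 3 can be quantified by counting how many different bases appear at each positioning-tag cycle. A minimal sketch, assuming the three tags are pooled in equal molar amounts (as the equal 334 μL volumes of P7-A, P7-B and P7-C in Example 4 suggest); the tag sequences are hypothetical, since the text does not disclose JJJJJ, KKKKK and LLLLL.

```python
def distinct_bases_per_cycle(tags):
    """Number of different bases at each position across equal-length tags."""
    length = len(tags[0])
    if any(len(t) != length for t in tags):
        raise ValueError("tags must have equal length")
    return [len({t[i] for t in tags}) for i in range(length)]


def composition(tags, i):
    """Fraction of each base at position i, assuming equimolar tags."""
    return {b: sum(t[i] == b for t in tags) / len(tags) for b in "ACGT"}


single_tag = ["ACTGA"]                    # one positioning tag for every molecule
three_tags = ["ACTGA", "CGACT", "GTCAC"]  # hypothetical JJJJJ, KKKKK, LLLLL

print(distinct_bases_per_cycle(single_tag))  # [1, 1, 1, 1, 1]
print(distinct_bases_per_cycle(three_tags))  # [3, 3, 3, 3, 3]
print(composition(three_tags, 0))
```

With a single tag every cluster reports the same base at these cycles; with three tags that differ at every position, each cycle shows three bases in roughly equal proportions.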
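Brief Description of the Drawings
FIG. 1 is a schematic structural diagram of the single-positioning double-tag adapter prepared in Example 1 of the present invention;
FIG. 2 is a schematic diagram of the effect of blunt-ended adapters (residual extended adapters) on the sequencing library in the present invention;
FIG. 3 is a schematic diagram of a double-tag adapter in which the index is introduced by PCR in the present invention;
FIG. 4 is a flow chart of library construction with the single-positioning double-tag adapter in Example 2 of the present invention;
FIG. 5 is a schematic flow diagram of identifying cell mutations with the single-positioning double-tag adapter in Example 3 of the present invention.
FIG. 6 is a schematic diagram of the preparation process of the multi-positioning double-tag adapter set in Example 4 of the present invention.
Detailed Description
The technical solutions of the present invention are further illustrated and described below through specific embodiments in conjunction with the accompanying drawings.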
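Example 1: Preparation of the single-positioning double-tag adapter
The two primers, adapter primer P5 and adapter primer P7 (adapter primer P5 is the sequence obtained by joining SEQ ID NO: 1 to SEQ ID NO: 2 via an I5 index sequence; adapter primer P7 is the sequence obtained by joining SEQ ID NO: 3 to SEQ ID NO: 4 via an I7 index sequence, with FFFFFEEEEEDDDDDNNNNNNNNNNNN joined in sequence to the 5' end of SEQ ID NO: 3; manufacturer: Sangon Biotech (Shanghai) Co., Ltd.), were diluted to 100 μM with ddH2O (or TE buffer);
Here FFFFF is the restriction-site protection bases, EEEEE is the restriction site, DDDDD is the positioning tag sequence, and NNNNNNNNNNNN is the random molecular tag sequence; the I5 index sequence is selected from SEQ ID NOs: 5-12, and the I7 index sequence from SEQ ID NOs: 12-23.
In addition, FFFFF, DDDDD, and EEEEE each include, but are not limited to, five bases; NNNNNNNNNNNN is 4 to 12 random bases with no four consecutive identical bases.
The single-positioning double-tag adapter was prepared as follows (as shown in FIG. 3):
(1) Annealing: the following system was prepared in a 0.2 mL EP tube: adapter primer P5, 10 μL; adapter primer P7, 10 μL; NEB buffer 2, 3 μL; ddH2O, 7 μL; 30 μL in total. The system was annealed in a PCR instrument: 95 °C for 5 min; ramp from 95 °C to 24 °C at 0.2-0.5 °C/s; hold at 24 °C;
(2) Extension of the annealed fragments: the following were added to the original PCR tube: 10× NEB buffer, 2 μL; 10 mM dNTP mix, 5 μL; ddH2O, 8 μL; Klenow exo- (5 U/μL), 5 μL; 50 μL in total. After mixing, the tube was kept at 37 °C for 1 h.
(3) First precipitation: 1/10 volume of NaAc (3 M) and 2.5 volumes of absolute ethanol were added to the product of step (2), mixed, and kept at -20 °C for 2 h; the mixture was centrifuged at 13,000 g for 30 min; the supernatant was removed, and the pellet was rinsed with 600 μL of 70% ethanol and centrifuged at 13,000 g for 30 min at 4 °C; the supernatant was removed, and the DNA was air-dried at room temperature for 5-10 min and resuspended in 30 μL of ddH2O.
(4) Digestion (taking HpyCH4III as an example; its recognition site is ACNGT, so the corresponding EEEEE in adapter primer P7 is ACAGT): 30 μL of the product of step (3) was combined with 10× NEB CutSmart buffer, 5 μL; ddH2O, 10 μL; HpyCH4III (5 U/μL), 5 μL; 50 μL in total. After mixing, the digestion was carried out at 37 °C for 16 h.
(5) Second precipitation: 1/10 volume of NaAc (3 M) and 2.5 volumes of absolute ethanol were added to the product of step (4), mixed, and kept at -20 °C for 2 h; centrifuged at 13,000 g for 30 min at 4 °C; the supernatant was removed, and the pellet was rinsed with 600 μL of 70% ethanol and centrifuged at 13,000 g for 30 min at 4 °C; the supernatant was removed, and the DNA was air-dried at room temperature for 5-10 min and resuspended in 26 μL of TE low buffer. This is the final single-positioning double-tag adapter (25 μM; structure shown in FIG. 2), which was aliquoted in 5 μL portions and stored frozen at -80 °C until use.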
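Example 2: Detection of plasma DNA mutation rates with the single-positioning double-tag adapter
In this example, the single-positioning double-tag adapter prepared in Example 1 was used. Its protection bases are TCTTCT; its restriction site sequence is [Figure PCTCN2017099255-appb-000003] (the boxed bases are the positioning bases; the restriction site and the positioning bases partially overlap); and its molecular tag is BDHVBDHV.
The I5 index and I7 index sequences may be combined as: I501-I701, I502-I702, I503-I703, I504-I704, I505-I705, I506-I706, I507-I707, I508-I708, I501-I707, I502-I708, I503-I709, I504-I710 (see Table 1 for the base sequences corresponding to these IDs).
Sample selection and quality control: plasma samples from 5 lung cancer patients were collected, and plasma DNA was extracted with the QIAGEN plasma DNA extraction kit. DNA purity was measured with a spectrophotometer (A260/280 required to be between 1.8 and 2.0); DNA concentration was then measured with a Qubit 2.0 (total amount between 5 and 15 ng); the fragment distribution of the DNA samples was checked with a D1000 chip (Agilent) (about 160-200 bp); and the mutation rate of the EGFR T790M site in the tumor samples was measured by digital PCR (Bio-Rad) (1.9%, 0.8%, 0.18%, 0.12%, and 1.44%).
Library construction: libraries were built with the KAPA DNA library preparation kit, using the entire amount of each DNA sample.
KAPA HTP Library Preparation Kit [Figure PCTCN2017099255-appb-000004] platforms: the end-repair enzyme, end-repair buffer, and other reagents used in the following experiments all come from this kit.
DNA samples were end-repaired (7 μL of 10× end-repair buffer and 5 μL of end-repair enzyme; 20 °C, 30 min). After purification, the products were A-tailed with A-tailing enzyme (5 μL of 10× end-repair buffer, 3 μL of end-repair enzyme; 30 °C, 30 min). After purification, each product was divided into two equal portions; in the adapter ligation step, one portion was used to build a library with the single-positioning double-tag adapter prepared in Example 1 (added to the A-tailed fragments at a 10:1 molar ratio; see FIG. 4) and the other with a conventional library adapter (sequences shown in SEQ ID NOs: 24 and 25). Ligation was performed by adding 10 μL of 5× ligation buffer and 5 μL of T4 DNA ligase at 20 °C for 20 min. The ligation products were purified by two rounds of 1× Ampure bead cleanup, and the purified products were amplified with KAPA high-fidelity enzyme mix (25 μL) and 1 μL each of the upstream and downstream amplification primers (25 μM);
The conventional library adapter served as the control, and the experimental group used the single-positioning double-tag adapter prepared in Example 1; the control and experimental groups followed identical steps and differed only in adapter sequence.
The upstream/downstream amplification primers were the universal primer (SEQ ID NO: 5) and the index primer (SEQ ID NO: 6) for the conventional adapter group, and the PCR-P5 and PCR-P7 primers for the group using the single-positioning double-tag adapter prepared in Example 1;
Conventional library adapter sequence information: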
Figure PCTCN2017099255-appb-000005
Upstream and downstream primer sequences for the conventional library adapter:
Universal primer:
Figure PCTCN2017099255-appb-000006
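(-s- denotes a phosphorothioate linkage; the same applies below) SEQ ID NO: 26
Index primer: the sequence obtained by joining SEQ ID NO: 27 to SEQ ID NO: 28 via an I7 index sequence,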
Figure PCTCN2017099255-appb-000007
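where the I7 index sequence is selected from SEQ ID NOs: 12 to 23.
P5 and P7 primer sequences for the single-positioning double-tag adapter prepared in Example 1: when PCR is performed after ligation with the single-positioning double-tag adapter prepared in Example 1, the following primer sequences are used: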
Figure PCTCN2017099255-appb-000008
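Capture: targeted capture of the library was performed according to the Roche SeqCap EZ custom kit (250k) protocol, and the captured library was sequenced after passing quality control. (Quality control: an Agilent 2100/2200 was used to assess the library fragment size distribution; for example, if the insert (template) size during library construction is 200-350 bp, the adapters at both ends, i.e., P5 and P7, add 140 bp, so the library should be distributed between 340 and 490 bp. qPCR was used to assess capture efficiency: an average enrichment fold below 10 indicates that capture failed and must be repeated.)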
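The two QC rules in the capture step reduce to simple checks. A minimal sketch using only the numbers given in the text (140 bp of adapter sequence, a 10-fold minimum mean enrichment); the function names are hypothetical.

```python
ADAPTER_BP = 140  # length added by the P5 and P7 adapters, as stated in the text


def expected_library_size(insert_min_bp, insert_max_bp, adapter_bp=ADAPTER_BP):
    """Expected size range of the adapter-ligated library."""
    return insert_min_bp + adapter_bp, insert_max_bp + adapter_bp


def capture_succeeded(mean_enrichment_fold, min_fold=10.0):
    """qPCR criterion from the text: a mean enrichment below 10-fold means
    the capture failed and must be repeated."""
    return mean_enrichment_fold >= min_fold


print(expected_library_size(200, 350))                  # (340, 490)
print(capture_succeeded(25.0), capture_succeeded(6.5))  # True False
```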
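Results: the sequencing depth for each sample was 20,000×; the raw data per sample was 8.20 G, clean data Q20 was 94.25%, Q30 was 0,3%, the mapping rate was 99.9%, and coverage was 99.89%. In terms of detection, the conventional adapter group accurately detected the mutation sites in the two samples at 1.9% and 1.44%, whereas the single-positioning double-tag adapter group detected the mutation sites in all samples at 1.9%, 0.8%, 0.18%, 0.12%, and 1.44% (using the mutation sites and mutation rates detected by digital PCR in the samples before library construction as reference, the high-throughput sequencing data were analyzed with software (FastQC, samtools, BWA/bowtie2, GATK, Freebayes/picard, etc.) to determine whether these sites were mutated and at what rate, and the results were compared with digital PCR to determine the detection rate). The detection rate was 100% (relative to digital PCR: if digital PCR detects 10 low-frequency mutation sites in these 5 samples and high-throughput sequencing detects all 10, the detection rate is 100%; if it detects 5, the detection rate is 50%).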
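The detection-rate definition in parentheses is a simple ratio against the digital PCR reference. A minimal sketch reproducing the text's own 10-site example; the site names are placeholders.

```python
def detection_rate(reference_sites, detected_sites):
    """Share of the mutation sites found by digital PCR that sequencing also detects."""
    reference = set(reference_sites)
    if not reference:
        raise ValueError("no reference sites")
    return len(reference & set(detected_sites)) / len(reference)


dpcr_sites = [f"site{i}" for i in range(1, 11)]  # placeholder: 10 sites found by digital PCR
print(detection_rate(dpcr_sites, dpcr_sites))      # 1.0
print(detection_rate(dpcr_sites, dpcr_sites[:5]))  # 0.5
```

Calls outside the digital PCR reference do not affect this ratio; they are counted separately as false positives.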
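Example 3: Detection of cell line mutation rates with the single-positioning double-tag adapter
DNA from two cell lines, NCI-H1650 and HCT, was used as experimental material. NCI-H1650 DNA was spiked into HCT DNA at mass ratios of 10%, 1%, and 0.1%, and 100% NCI-H1650 DNA and 100% HCT DNA served as two further samples; these were designated the 10%, 1%, 0.1%, NCI-H1650, and HCT groups, respectively. (The NCI-H1650 and HCT groups serve only to establish the genetic background of the cell line DNA used for mixing, i.e., allele information such as heterozygous or homozygous status. From the sequencing data of these two samples, homozygous base sites are identified, and those at which the two cell lines carry different bases are selected as the analysis sites for the other sample groups.)
After thorough mixing, 2 μg of each DNA sample was used for library preparation (KAPA DNA library preparation kit). After the A-tailing step, each of the 10%, 1%, and 0.1% samples was split equally into two groups, which were ligated in the adapter ligation step with a conventional adapter (sequences shown in SEQ ID NO: 3 and SEQ ID NO: 4) and with the single-positioning double-tag adapter prepared in Example 1 (as shown in FIG. 5), respectively. The subsequent library preparation and capture steps were then performed, with capture using the Roche SeqCap EZ custom kit (250k), followed by sequencing at a depth of 20,000×; SNPs were called from the filtered Q30 unique mapping reads.
In the single-positioning double-tag adapter, FFFFF is TCTTCT and EEEEE is ACAGT; DDDDD is AGT and overlaps with the EEEEE sequence above.
NNNNNNNNNNNN is the sequence BDHVBDHV, where B denotes any base other than A, D any base other than C, H any base other than G, and V any base other than T.
Results: the data from the NCI-H1650 and HCT samples were analyzed first. Based on the SNP calls, the base MAF (minor allele frequency) was determined within the 250K bp capture region of the Roche capture chip, and sites with 0% MAF (SNP homozygous negative) and 100% MAF (SNP homozygous positive) were selected (in practice a threshold is set, e.g. 0.1%: if the MAF of a site is below 0.1%, the site is regarded as a 0% site, i.e. a SNP homozygous negative site; 100% sites are defined analogously). Sites at the same genomic position that are homozygous positive in one cell line and homozygous negative in the other were then selected and used as the analysis sites for the other sample groups, to compute per-sample detection rates, false positives, false negatives, and related statistics.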
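The site-selection rule can be written as a short filter. A minimal sketch, assuming MAF values are given as fractions per site, that a MAF below 0.1% marks a homozygous negative site and, following the text's "defined analogously" rule, that a MAF above 99.9% marks a homozygous positive site; the site keys and values are hypothetical.

```python
def classify(maf, threshold=0.001):
    """'neg' for a 0% site, 'pos' for a 100% site, None otherwise."""
    if maf < threshold:
        return "neg"
    if maf > 1 - threshold:
        return "pos"
    return None


def discordant_homozygous_sites(maf_line1, maf_line2, threshold=0.001):
    """Sites homozygous positive in one cell line and homozygous negative in the other."""
    sites = []
    for site in sorted(maf_line1.keys() & maf_line2.keys()):
        a = classify(maf_line1[site], threshold)
        b = classify(maf_line2[site], threshold)
        if a and b and a != b:
            sites.append(site)
    return sites


# Hypothetical MAF values per (chromosome, position)
h1650 = {("chr1", 100): 1.0, ("chr1", 200): 0.0, ("chr2", 300): 0.5, ("chr3", 400): 0.9995}
hct = {("chr1", 100): 0.0, ("chr1", 200): 0.0, ("chr2", 300): 0.0, ("chr3", 400): 0.0004}
print(discordant_homozygous_sites(h1650, hct))  # [('chr1', 100), ('chr3', 400)]
```

Sites that are heterozygous in either line, or homozygous in the same direction in both, are excluded, which leaves only positions whose expected frequency in each mixture is fixed by the mixing ratio.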
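A total of 178 homozygous allelic SNP sites were detected in the NCI-H1650 and HCT groups (100%), i.e. each site was homozygous negative in one cell line and homozygous positive in the other. These 178 sites were then analyzed in the 10%, 1%, and 0.1% samples prepared with each adapter; in these samples the mutation rates (heterozygous fractions) of the 178 sites are 10%, 1%, and 0.1%, respectively. With the conventional adapter, the positive detection rate was 100% in the 10% group, 98.86% in the 1% group, and 81.29% in the 0.1% group, whereas the single-positioning double-tag adapter prepared in Example 1 achieved a detection rate of 100% in all three groups. False positive rate: at 1% sensitivity the conventional adapter had a false positive rate of 0.01%, and at 0.1% sensitivity a rate above 5%, whereas the single-positioning double-tag adapter prepared in Example 1 had a false positive rate of 0.001% at 0.1% sensitivity (a site whose base variant frequency exceeds the sensitivity threshold is regarded as a detected mutation site; for example, 1% sensitivity means a variant frequency threshold of 1%, so sites above 1% are called as mutation sites).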
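The sensitivity-threshold rule in parentheses, together with the detection and false positive rates, can be sketched as follows. The text does not define the denominator of the false positive rate; this sketch assumes it is the number of evaluated positions that are not true mutation sites, and all site names and frequencies are hypothetical.

```python
def called_sites(vaf_by_site, sensitivity):
    """Sites whose variant frequency exceeds the sensitivity threshold."""
    return {site for site, vaf in vaf_by_site.items() if vaf > sensitivity}


def detection_and_false_positive_rate(vaf_by_site, true_sites, sensitivity):
    called = called_sites(vaf_by_site, sensitivity)
    true_sites = set(true_sites)
    negatives = set(vaf_by_site) - true_sites  # assumed false-positive denominator
    detection = len(called & true_sites) / len(true_sites)
    false_positive = len(called & negatives) / len(negatives) if negatives else 0.0
    return detection, false_positive


# s1, s2: true mutation sites; n1-n3: positions without a true mutation
vaf = {"s1": 0.0012, "s2": 0.0009, "n1": 0.0, "n2": 0.0015, "n3": 0.0002}
print(detection_and_false_positive_rate(vaf, {"s1", "s2"}, sensitivity=0.001))
```

At 0.1% sensitivity, s1 is detected, s2 (0.09%) is missed, and n2 (0.15%, an error) becomes a false positive, giving a detection rate of 0.5 and a false positive rate of 1/3 in this toy case.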
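Example 4: Preparation of the multi-positioning double-tag adapter set
Adapter primer P5 and adapter primers P7-A, P7-B, and P7-C, each biotin-modified at the 5' end (manufacturer: Sangon Biotech (Shanghai) Co., Ltd.), were each diluted to 100 μM with ddH2O;
adapter primer P5 is obtained by joining SEQ ID NO: 01 to the sequence shown in SEQ ID NO: 02 via an I5 index sequence;
adapter primer P7-A consists of FFFFFEEEEEJJJJJNNNNNNNNNNNN joined in sequence to the 5' end of SEQ ID NO: 03, with SEQ ID NO: 03 further joined to the sequence shown in SEQ ID NO: 04 via an I7 index sequence;
adapter primer P7-B consists of FFFFFEEEEEKKKKKNNNNNNNNNNNN joined in sequence to the 5' end of SEQ ID NO: 03, with SEQ ID NO: 03 further joined to the sequence shown in SEQ ID NO: 04 via an I7 index sequence;
adapter primer P7-C consists of FFFFFEEEEELLLLLNNNNNNNNNNNN joined in sequence to the 5' end of SEQ ID NO: 03, with SEQ ID NO: 03 further joined to the sequence shown in SEQ ID NO: 04 via an I7 index sequence;
FFFFF is the restriction-site protection bases, EEEEE is the restriction site, JJJJJ, KKKKK, and LLLLL are positioning tag sequences that differ from one another, and NNNNNNNNNNNN is the random molecular tag sequence; FFFFF, JJJJJ, KKKKK, LLLLL, and EEEEE each include, but are not limited to, five bases; the I7 index sequence is 6 to 8 bases; and NNNNNNNNNNNN is 4 to 12 random bases with no four consecutive identical bases.
Preferably, NNNNNNNNNNNN is BDHVBDHV, where B denotes any base other than A, D any base other than C, H any base other than G, and V any base other than T. The I5 index sequence is selected from SEQ ID NOs: 05 to 12, and the I7 index sequence from SEQ ID NOs: 13 to 23.
Optionally, the JJJJJ, KKKKK, and LLLLL sequences may partially or completely overlap with the EEEEE sequence; where they overlap, the overlapping bases appear only once.
The multi-positioning double-tag adapter set was prepared as follows (FIG. 6):
(1) Annealing: adapter primer P5, adapter primers P7-A, P7-B, and P7-C, buffer, and an appropriate amount of deionized water were mixed and annealed to obtain annealed adapters A, B, and C. Specifically, the following system was prepared in a 15 mL centrifuge tube: adapter primer P5, 1 mL; adapter primer P7-A, 334 μL; adapter primer P7-B, 334 μL; adapter primer P7-C, 334 μL; NEB buffer 2, 300 μL; ddH2O, 700 μL; 3 mL in total. After mixing, the system was incubated in a 95 °C water bath for 5 min, immediately transferred to a beaker of 95 °C hot water, and allowed to cool slowly at room temperature to 24-27 °C;
(2) Extension of the annealed adapters: annealed adapters A, B, and C were extended with a polymerase to obtain extended adapters A, B, and C. Specifically, the following were added to the original 15 mL centrifuge tube: 10× NEB buffer, 200 μL; 25 mM dNTP mix, 200 μL; 500 mM DTT, 6 μL; Klenow exo- (5 U/μL), 100 μL. The volume was brought to 5 mL with ddH2O, mixed, and incubated with rotation in a 37 °C incubator for 1 h;
(3) First precipitation: extended adapters A, B, and C were each purified by ethanol or isopropanol precipitation to obtain purified extended adapters A, B, and C. Specifically: 1/10 volume of NaAc (3 M) and 2.5 volumes of absolute ethanol were added to the product of step (2), mixed, and kept at -20 °C for 2 h; the mixture was centrifuged at 13,000 g for 30 min; the supernatant was removed, and the pellet was rinsed with 5 mL of 70% (v/v) ethanol and centrifuged at 13,000 g for 30 min at 4 °C; the supernatant was removed, and the DNA was air-dried at room temperature for 20-30 min, resuspended in 3 mL of ddH2O, and quantified with a Quantus;
(4) Digestion: a restriction endonuclease capable of producing a 3'-T overhang was added to each of the purified extended adapters A, B, and C to obtain digested adapters A, B, and C. Specifically (taking HpyCH4III as an example; its recognition site is ACNGT, so the corresponding EEEEE in the adapter primer P7 sequences is changed to ACAGT): based on the mass x (μg) of the product of step (3), 2x μL of 10× NEB CutSmart buffer and 2x μL of HpyCH4III (5 U/μL) were added, the volume was brought to 20x μL with ddH2O, and after mixing, the digestion was carried out with rotation in a 37 °C incubator for 16 h;
(5) Second precipitation: digested adapters A, B, and C were purified by ethanol or isopropanol precipitation to obtain double-tag adapters A, B, and C. Specifically: 1/10 volume of NaAc and 2.5 volumes of absolute ethanol were added to the product of step (4), mixed, and kept at -20 °C for 2 h; centrifuged at 13,000 g for 30 min at 4 °C; the supernatant was removed, and the pellet was rinsed with 10 mL of 70% ethanol and centrifuged at 13,000 g for 30 min at 4 °C; the supernatant was removed, and the DNA was air-dried at room temperature for 20-30 min and resuspended in 2 mL of ddH2O;
(6) Biotin purification: double-tag adapters A, B, and C obtained in step (5) were subjected to biotin affinity purification. Specifically: 2 mL of Dynabeads MyOne Streptavidin C1 magnetic beads were washed with 1× B&W buffer and resuspended in 2 mL of 2× B&W buffer; 2 mL of the product of step (5) was added to the beads and incubated at 4 °C for 30 min; the tube was placed on a magnetic stand, and the supernatant was transferred to a new 50 mL centrifuge tube;
(7) Third precipitation: the product of step (6) was purified by ethanol or isopropanol precipitation to obtain the multi-positioning double-tag adapter set. Specifically: 1/10 volume of NaAc and 2.5 volumes of absolute ethanol were added to the product of step (6), mixed, and kept at -20 °C for 2 h; centrifuged at 13,000 g for 30 min at 4 °C; the supernatant was removed, and the pellet was rinsed with 10 mL of 70% ethanol and centrifuged at 13,000 g for 30 min at 4 °C; the supernatant was removed, and the DNA was air-dried at room temperature for 20-30 min and resuspended in 1.5 mL of TE low buffer. This is the multi-positioning double-tag adapter set, which was aliquoted after passing quality control and stored frozen at -20 °C until use.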
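The reagent amounts in steps (1), (2) and (4) can be checked with the dilution relation C1·V1 = C2·V2. A minimal sketch, assuming the 3 mL annealing volume and 5 mL extension volume stated in the text and taking the "25 mM dNTP mix" concentration as labeled; the function names are hypothetical.

```python
def final_conc(stock, added_ul, total_ul):
    """C1*V1 = C2*V2: concentration after diluting `added_ul` of stock into `total_ul`."""
    return stock * added_ul / total_ul


# Step (1) annealing: 3 mL total, primers at 100 uM
p5_um = final_conc(100, 1000, 3000)      # P5
p7_each_um = final_conc(100, 334, 3000)  # each of P7-A, P7-B, P7-C
print(f"P5 {p5_um:.1f} uM, each P7 {p7_each_um:.1f} uM, all P7 {3 * p7_each_um:.1f} uM")

# Step (2) extension: brought to 5 mL
print(f"dNTP {final_conc(25, 200, 5000):.1f} mM, DTT {final_conc(500, 6, 5000):.2f} mM")


def digestion_mix(mass_ug):
    """Step (4): volumes scale with the DNA mass x (ug) measured after step (3)."""
    return {
        "CutSmart_10x_uL": 2 * mass_ug,  # 2x uL in 20x uL gives 1x buffer
        "HpyCH4III_uL": 2 * mass_ug,     # at 5 U/uL this is 10 U per ug DNA
        "total_uL": 20 * mass_ug,
        "enzyme_U": 5 * 2 * mass_ug,
    }


print(digestion_mix(3))
```

The P5 strand and the three P7 strands together are thus added in roughly equimolar amounts (about 33 μM on each side), and the digestion always runs in 1× CutSmart buffer with about 10 U of HpyCH4III per μg of DNA.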
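Example 5: Improvement in sequencing PF value with the multi-positioning double-tag adapter set
Libraries were constructed from 30 ng of fragmented leukocyte DNA (average length 220 bp). The experiment was divided into two groups: one group built libraries with the single-positioning double-tag adapter prepared in Example 1, and the other with the multi-positioning double-tag adapter set prepared in Example 4. The NEBNext Ultra II DNA Library Prep Kit was used, and the library construction steps were as follows:
(1) 30 ng of fragmented DNA was combined with 7 μL of NEBNext Ultra II End Prep Reaction Buffer and 3 μL of NEBNext Ultra II End Prep Enzyme Mix, brought to 60 μL with deionized water, and incubated in a PCR instrument at 20 °C for 30 min → 65 °C for 30 min → hold at 4 °C;
(2) 1 μL of adapter (the single-positioning double-tag adapter or the multi-positioning double-tag adapter set described above) was added to the above system, followed by 30 μL of NEBNext Ultra II Ligation Master Mix and 1 μL of NEBNext Ligation Enhancer; after mixing, the reaction was carried out at 20 °C for 15 min. The ligation product was purified with 0.9× Ampure beads and eluted in 23 μL of purified water;
(3) 23 μL of the above ligation product, 1 μL each of the I5 and I7 index primers (25 μM), and 25 μL of NEBNext Ultra II Q5 Master Mix were added to a PCR tube, mixed, and run in a PCR instrument as follows:
98 °C, 30 s;
98 °C, 10 s → 65 °C, 75 s (8 cycles);
65 °C, 5 min;
hold at 4 °C
After PCR, the products were purified with 0.9× beads, and quality control was performed with a Qubit 2.0 (or Quantus) and an Agilent 2100 Bioanalyzer (or Agilent 2200 TapeStation);
After passing quality control, the libraries were sequenced on the NextSeq500 platform with the Mid Output kit (300 cycles) and a 1% PhiX spike-in, each library in a separate run. The experiment was repeated three times, each with its own sequencing run, and the sequencing QC results were as follows:
Group  Cluster density  PhiX ratio (%)  PF value  Q30
Single-positioning double-tag adapter library-1  190K/mm²  1.2%  33.80%  90.4%
Single-positioning double-tag adapter library-2  186K/mm²  0.8%  31.50%  85.8%
Single-positioning double-tag adapter library-3  200K/mm²  1.5%  33.20%  82.0%
Multi-positioning double-tag adapter set library-1  200K/mm²  0.9%  87%  88.6%
Multi-positioning double-tag adapter set library-2  181K/mm²  1.2%  90%  87.3%
Multi-positioning double-tag adapter set library-3  210K/mm²  1.3%  91%  90.1%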
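The PF improvement in the table above can be summarized as group means. A minimal sketch that only averages the three replicate PF values per group reported in the table; no other data are assumed.

```python
from statistics import mean

pf = {  # PF values (%) from the table above, three replicate runs per group
    "single-positioning": [33.80, 31.50, 33.20],
    "multi-positioning": [87.0, 90.0, 91.0],
}
means = {group: mean(values) for group, values in pf.items()}
for group, value in means.items():
    print(f"{group}: mean PF {value:.1f}%")
print(f"ratio: {means['multi-positioning'] / means['single-positioning']:.2f}x")
```

On these replicates the mean PF rises from about 32.8% to about 89.3%, roughly a 2.7-fold increase, at comparable cluster densities and PhiX ratios.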
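Example 6: Residual adapter sequence in libraries built with biotin-purified adapters
Libraries were constructed from 30 ng of fragmented leukocyte DNA (average length 220 bp). The experiment was divided into two groups: one group built libraries with the single-positioning double-tag adapter prepared in Example 1, and the other with the multi-positioning double-tag adapter set prepared in Example 4. The NEBNext Ultra II DNA Library Prep Kit was used, and the library construction steps were as follows:
(1) 30 ng of fragmented DNA was combined with 7 μL of NEBNext Ultra II End Prep Reaction Buffer and 3 μL of NEBNext Ultra II End Prep Enzyme Mix, brought to 60 μL with deionized water, and incubated in a PCR instrument at 20 °C for 30 min → 65 °C for 30 min → hold at 4 °C;
(2) 1 μL of adapter (the single-positioning double-tag adapter or the multi-positioning double-tag adapter set described above) was added to the above system, followed by 30 μL of NEBNext Ultra II Ligation Master Mix and 1 μL of NEBNext Ligation Enhancer; after mixing, the reaction was carried out at 20 °C for 15 min. The ligation product was purified with 0.9× Ampure beads and eluted in 23 μL of purified water;
(3) 23 μL of the above ligation product, 1 μL each of the I5 and I7 index primers (25 μM), and 25 μL of NEBNext Ultra II Q5 Master Mix were added to a PCR tube, mixed, and run in a PCR instrument as follows:
98 °C, 30 s;
98 °C, 10 s → 65 °C, 75 s (8 cycles);
65 °C, 5 min;
hold at 4 °C
After PCR, the products were purified with 0.9× beads, and quality control was performed with a Qubit 2.0 (or Quantus) and an Agilent 2100 Bioanalyzer (or Agilent 2200 TapeStation);
After passing quality control, the libraries were sequenced on the NextSeq500 platform with the Mid Output kit (300 cycles), a 1% PhiX spike-in, and 1 Gb of data per library; the sequencing results were as follows: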
Figure PCTCN2017099255-appb-000009
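The above are merely preferred embodiments of the present invention and therefore do not limit the scope of its implementation; equivalent changes and modifications made in accordance with the scope of the patent and the content of the description shall still fall within the scope of the present invention.
Industrial Applicability
The present invention provides a multi-positioning double-tag adapter set for detecting gene mutations, and a preparation method and specific applications thereof. It can accurately detect a gene mutation rate of 1×10⁻⁵ and effectively improves the sensitivity of gene mutation detection; combined with the throughput of high-throughput sequencing, multiple mutation sites in multiple genes can be detected in a single sequencing run.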

Claims (7)

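  1. A multi-positioning double-tag adapter set for detecting gene mutations, characterized in that it comprises double-tag adapter A, double-tag adapter B, and double-tag adapter C, which are synthesized from adapter primer P5 together with adapter primers P7-A, P7-B, and P7-C, respectively, each biotin-modified at its 5' end, wherein:
    adapter primer P5 is obtained by joining SEQ ID NO: 01 to the sequence shown in SEQ ID NO: 02 via an I5 index sequence;
    adapter primer P7-A consists of FFFFFEEEEEJJJJJNNNNNNNNNNNN joined in sequence to the 5' end of SEQ ID NO: 03, with SEQ ID NO: 03 further joined to the sequence shown in SEQ ID NO: 04 via an I7 index sequence;
    adapter primer P7-B consists of FFFFFEEEEEKKKKKNNNNNNNNNNNN joined in sequence to the 5' end of SEQ ID NO: 03, with SEQ ID NO: 03 further joined to the sequence shown in SEQ ID NO: 04 via an I7 index sequence;
    adapter primer P7-C consists of FFFFFEEEEELLLLLNNNNNNNNNNNN joined in sequence to the 5' end of SEQ ID NO: 03, with SEQ ID NO: 03 further joined to the sequence shown in SEQ ID NO: 04 via an I7 index sequence;
    FFFFF is the restriction-site protection bases, EEEEE is the restriction site, JJJJJ, KKKKK, and LLLLL are positioning tag sequences that differ from one another, and NNNNNNNNNNNN is the random molecular tag sequence; FFFFF, JJJJJ, KKKKK, LLLLL, and EEEEE each include, but are not limited to, five bases; the I7 index sequence is 6 to 8 bases; and NNNNNNNNNNNN is 4 to 12 random bases with no four consecutive identical bases.
  2. The multi-positioning double-tag adapter set for detecting gene mutations according to claim 1, characterized in that NNNNNNNNNNNN is BDHVBDHV, where B denotes any base other than A, D any base other than C, H any base other than G, and V any base other than T.
  3. The multi-positioning double-tag adapter set for detecting gene mutations according to claim 1, characterized in that the I5 index sequence is selected from SEQ ID NOs: 05 to 12; the I7 index sequence is selected from SEQ ID NOs: 13 to 23; and the JJJJJ, KKKKK, and LLLLL sequences may partially or completely overlap with the EEEEE sequence, where, in the case of partial or complete overlap, the overlapping bases appear only once.
  4. A preparation method for the multi-positioning double-tag adapter set for detecting gene mutations according to any one of claims 1 to 3, characterized by comprising the following steps:
    (1) Annealing: adapter primer P5, adapter primers P7-A, P7-B, and P7-C, buffer, and an appropriate amount of deionized water are mixed and annealed to obtain annealed adapters A, B, and C;
    (2) Extension of the annealed adapters: annealed adapters A, B, and C are extended with a polymerase to obtain extended adapters A, B, and C;
    (3) First precipitation: extended adapters A, B, and C are each purified by ethanol or isopropanol precipitation to obtain purified extended adapters A, B, and C;
    (4) Digestion: a restriction endonuclease capable of producing a 3'-T overhang is added to each of the purified extended adapters A, B, and C to obtain digested adapters A, B, and C;
    (5) Second precipitation: digested adapters A, B, and C are purified by ethanol or isopropanol precipitation to obtain double-tag adapters A, B, and C;
    (6) Biotin purification: double-tag adapters A, B, and C obtained in step (5) are subjected to biotin affinity purification;
    (7) Third precipitation: the product of step (6) is purified by ethanol or isopropanol precipitation to obtain the multi-positioning double-tag adapter set.
  5. A library construction method, characterized by comprising: fragmenting 10 ng to 1 μg of DNA to be tested into 200-500 bp DNA fragments; end-repairing the DNA fragments with an end-repair enzyme; A-tailing; ligating the multi-positioning double-tag adapter set according to any one of claims 1 to 3; and, after ligation, selecting 340-660 bp fragments with Ampure beads or gel excision.
  6. A sequencing method, characterized by comprising the following steps:
    (1) constructing a library by the library construction method according to claim 5;
    (2) sequencing the sequencing library.
  7. A method for determining a nucleic acid sequence, characterized by comprising the following steps:
    (1) constructing a library by the library construction method according to claim 5;
    (2) sequencing the sequencing library;
    (3) determining the result from the sequencing data;
    the result determination comprises the following steps:
    a. according to the set parameters, selecting uniquely mapped reads whose aligned bases have Q values greater than 30;
    b. performing duplication determination based on the random tag sequences, and recalibrating the bases accordingly;
    c. detecting SNP sites with SNP calling software and compiling SNP site information to obtain the final SNP sites and their MAF information;
    d. comparing the detected SNP sites and MAF information with the mutation sites of a control group and with a population genomic variation database, and filtering out shared mutation sites; the mutation sites that remain are the finally detected mutation sites.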
PCT/CN2017/099255 2016-08-29 2017-08-28 Multi-positioning double-tag adapter set for detecting gene mutations, and preparation method and application thereof WO2018041062A1 (zh)

Priority Applications (3)

Application Number Priority Date Filing Date Title
JP2018558751A JP6830496B2 (ja) 2016-08-29 2017-08-28 Multi-positioning double-tag adapter set for detecting gene mutations, and preparation method and application thereof
US16/322,340 US11286524B2 (en) 2016-08-29 2017-08-28 Multi-position double-tag connector set for detecting gene mutation and preparation method therefor and application thereof
EP17845364.3A EP3505640A4 (en) 2016-08-29 2017-08-28 MULTI-POSITION DUAL LABEL CONNECTOR ASSEMBLY FOR DETECTION OF GENE MUTATION AND PREPARATION METHOD AND APPLICATION THEREOF

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201610754636.4 2016-08-29
CN201610754636.4A CN106367485B (zh) 2016-08-29 2016-08-29 Multi-positioning double-tag adapter set for detecting gene mutations, and preparation method and application thereof

Publications (1)

Publication Number Publication Date
WO2018041062A1 true WO2018041062A1 (zh) 2018-03-08

Family

ID=57900571

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/099255 WO2018041062A1 (zh) 2016-08-29 2017-08-28 Multi-positioning double-tag adapter set for detecting gene mutations, and preparation method and application thereof

Country Status (5)

Country Link
US (1) US11286524B2 (zh)
EP (1) EP3505640A4 (zh)
JP (1) JP6830496B2 (zh)
CN (1) CN106367485B (zh)
WO (1) WO2018041062A1 (zh)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110331187A (zh) * 2019-08-12 2019-10-15 天津华大医学检验所有限公司 Combined tag, combined tag adapter, and application thereof
WO2020072829A3 (en) * 2018-10-04 2020-08-13 Bluestar Genomics, Inc. Simultaneous, sequencing-based analysis of proteins, nucleosomes, and cell-free nucleic acids from a single biological sample
JP2022502343A (ja) * 2018-07-03 2022-01-11 中国医学科学院腫瘤医院 Method for constructing a ctDNA library for simultaneously detecting multiple mutations common in liver cancer and for analyzing sequencing data
US11286524B2 (en) 2016-08-29 2022-03-29 Amoy Diagnostics Co., Ltd. Multi-position double-tag connector set for detecting gene mutation and preparation method therefor and application thereof

Families Citing this family (17)

Publication number Priority date Publication date Assignee Title
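CN106906211B (zh) * 2017-04-13 2020-11-20 苏州普瑞迈德医学检验所有限公司 Molecular adapter and application thereof
CN107190067B (zh) * 2017-06-13 2019-12-13 厦门艾德生物医药科技股份有限公司 Improved method for producing random-tag adapters for next-generation sequencing
CN107385030A (zh) * 2017-07-14 2017-11-24 广州精科医学检验所有限公司 Molecular tag, adapter, and method for determining nucleic acid sequences containing low-frequency mutations
EP3707273B1 (en) * 2017-11-08 2024-04-10 Twinstrand Biosciences, Inc. Reagents and adapters for nucleic acid sequencing and methods for making such reagents and adapters
CN107944223B (zh) * 2017-11-10 2019-12-31 深圳裕策生物科技有限公司 Point mutation detection and filtering method, device, and storage medium based on next-generation sequencing
CN108893466B (zh) * 2018-06-04 2021-04-13 上海奥根诊断技术有限公司 Sequencing adapter, sequencing adapter set, and method for detecting ultra-low-frequency mutations
JP2021530219A (ja) 2018-07-12 2021-11-11 ツインストランド・バイオサイエンシズ・インコーポレイテッドTwinstrand Biosciences, Inc. Methods and reagents for characterizing genome editing, clonal expansion, and related applications
CN109735900A (zh) * 2019-03-20 2019-05-10 嘉兴菲沙基因信息有限公司 Method for constructing small-fragment DNA libraries suitable for Hi-C
CN111748613A (zh) * 2019-03-27 2020-10-09 华大数极生物科技(深圳)有限公司 Design method and preparation method for double-tag adapters
CN110129415B (zh) * 2019-05-17 2023-08-18 迈杰转化医学研究(苏州)有限公司 Molecular adapter for NGS library construction, and preparation method and use thereof
CN110257480A (zh) * 2019-07-04 2019-09-20 北京京诺玛特科技有限公司 Nucleic acid sequencing adapter and method for constructing a sequencing library therewith
CN110409001B (zh) * 2019-07-25 2022-11-15 北京贝瑞和康生物技术有限公司 Method and kit for constructing a capture library
CN113257351A (zh) * 2020-02-12 2021-08-13 赛纳生物科技(北京)有限公司 Gene library for multi-base gene sequencing and construction method thereof
CN111471746A (zh) * 2020-04-14 2020-07-31 深圳市新合生物医疗科技有限公司 NGS library preparation adapter for detecting samples with low mutation abundance, and preparation method thereof
CN112226514B (zh) * 2020-11-23 2021-08-03 苏州京脉生物科技有限公司 Marker combination and kit for early gastric cancer detection, and application thereof
CN113005188A (zh) * 2020-12-29 2021-06-22 阅尔基因技术(苏州)有限公司 Method for assessing base damage, mismatches, and variants in sample DNA by first-generation sequencing
CN113981043B (zh) * 2021-11-22 2024-04-16 广州迈景基因医学科技有限公司 Method for preparing next-generation sequencing adapters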

Citations (10)

Publication number Priority date Publication date Assignee Title
WO2007052006A1 (en) * 2005-11-01 2007-05-10 Solexa Limited Method of preparing libraries of template polynucleotides
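CN103882530A (zh) * 2014-03-26 2014-06-25 清华大学 Method for high-throughput paired-end sequencing of DNA fragments using plasmids labeled with random sequences
CN104263726A (zh) * 2014-09-25 2015-01-07 天津诺禾致源生物信息科技有限公司 Primers suitable for amplicon sequencing library construction, and method for constructing an amplicon sequencing library
CN104561294A (zh) * 2014-12-26 2015-04-29 北京诺禾致源生物信息科技有限公司 Method for constructing a genotyping sequencing library, and sequencing method
CN105586427A (zh) * 2016-03-10 2016-05-18 厦门艾德生物医药科技股份有限公司 Primers, kit and method for detecting human BRCA1 and BRCA2 gene mutations
KR101651817B1 (ko) * 2015-10-28 2016-08-29 대한민국 Primer set for NGS library preparation, and NGS library preparation method and kit using the same
CN106086162A (zh) * 2015-11-09 2016-11-09 厦门艾德生物医药科技股份有限公司 Dual-tag adapter sequence for detecting tumor mutations, and detection method
CN106282177A (zh) * 2016-08-23 2017-01-04 厦门基源医疗科技有限公司 Method for constructing a 16S rDNA high-throughput sequencing library
CN106367485A (zh) * 2016-08-29 2017-02-01 厦门艾德生物医药科技股份有限公司 Multi-position double-tag adapter set for detecting gene mutations, and preparation method therefor and application thereof
CN106893774A (zh) * 2017-01-22 2017-06-27 苏州首度基因科技有限责任公司 Method for detecting DNA variation levels using multiple molecular tags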

Family Cites Families (6)

Publication number Priority date Publication date Assignee Title
DE10027218A1 (de) * 2000-05-31 2001-12-06 Hubert Bernauer Artificial genetic labeling with synthetic DNA
CN101967476B (zh) * 2010-09-21 2012-11-14 深圳华大基因科技有限公司 Method for constructing a PCR-free tagged DNA library based on adapter ligation
CN102978205B (zh) * 2012-11-19 2014-08-20 北京诺禾致源生物信息科技有限公司 Adapter for high-throughput sequencing applied to marker development, and method of using the same
CN103667273B (zh) * 2013-12-05 2016-01-20 北京诺禾致源生物信息科技有限公司 Double-stranded adapter, application thereof, and method for constructing a paired-end DNA library
AU2015222268B2 (en) * 2014-02-26 2017-06-29 Ventana Medical Systems, Inc. Photo-selective method for biological sample analysis field
CN104313699A (zh) 2014-10-31 2015-01-28 天津诺禾致源生物信息科技有限公司 Method for constructing a sequencing library, and kit for sequencing library construction

Non-Patent Citations (2)

Title
"Manual NEBNext Multiplex Oligos for Illumina (Dual Index Primers Set 1) E7600S, E7600L", NEW ENGLAND BIOLABS, INC, 31 December 2016 (2016-12-31), XP055587420, Retrieved from the Internet <URL:https://www.neb.com/-/media/catalog/Datacards%20or%20Manuals/manualE7600.pdf> *
See also references of EP3505640A4 *

Cited By (5)

Publication number Priority date Publication date Assignee Title
US11286524B2 (en) 2016-08-29 2022-03-29 Amoy Diagnostics Co., Ltd. Multi-position double-tag connector set for detecting gene mutation and preparation method therefor and application thereof
JP2022502343A (ja) * 2018-07-03 2022-01-11 中国医学科学院腫瘤医院 Method for constructing a ctDNA library that simultaneously detects multiple mutations common in liver cancer, and for analyzing the sequencing data
JP7336461B2 (ja) 2018-07-03 2023-08-31 中国医学科学院腫瘤医院 Method for constructing a ctDNA library that simultaneously detects multiple mutations common in liver cancer, and for analyzing the sequencing data
WO2020072829A3 (en) * 2018-10-04 2020-08-13 Bluestar Genomics, Inc. Simultaneous, sequencing-based analysis of proteins, nucleosomes, and cell-free nucleic acids from a single biological sample
CN110331187A (zh) * 2019-08-12 2019-10-15 天津华大医学检验所有限公司 Combined tags, combined-tag adapters and application thereof

Also Published As

Publication number Publication date
US11286524B2 (en) 2022-03-29
CN106367485B (zh) 2019-04-26
EP3505640A1 (en) 2019-07-03
JP2019523638A (ja) 2019-08-29
JP6830496B2 (ja) 2021-02-17
CN106367485A (zh) 2017-02-01
US20200010892A1 (en) 2020-01-09
EP3505640A4 (en) 2020-04-01

Similar Documents

Publication Publication Date Title
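WO2018041062A1 (zh) Multi-position double-tag adapter set for detecting gene mutations, and preparation method therefor and application thereof
CN106086162B (zh) Dual-tag adapter sequence for detecting tumor mutations, and detection method
US10023906B2 (en) Method for constructing nucleic acid single-stranded cyclic library and reagents thereof
EP2631336B1 (en) Dna library and preparation method thereof, and method and device for detecting snps
CN106591441B (zh) Probes, method, chip and application for detecting α- and/or β-thalassemia mutations based on whole-gene capture sequencing
JP2018501776A (ja) Contiguity-preserving transposition
CN110079592B (zh) Probes and method for high-throughput sequencing targeted capture of target regions for detecting gene mutations and known and unknown gene fusion types
CN104357918A (zh) Method for constructing a plasma cell-free DNA library
CN111041563B (zh) Targeted sequence capture and PCR library construction method
CN108517567B (zh) Adapter, primer set, kit and library construction method for cfDNA library construction
WO2015145133A1 (en) Nucleic acid preparation method
CN114616343A (zh) Compositions and methods for analyzing cell-free DNA in methylation partitioning assays
CN113373524A (zh) ctDNA sequencing tag adapter, library, detection method and kit
CN114317728B (zh) Primer set, kit, method and system for detecting multiple mutations in SMA
CN111575347A (zh) Method for constructing a library for simultaneously obtaining methylation and fragmentation pattern information of cell-free DNA in plasma
CN109686404B (zh) Method and device for detecting sample mix-ups
CN108359723B (zh) Method for reducing deep sequencing errors
CN105803054A (zh) Kit and use thereof in detecting genes associated with cleft lip and palate
CN111575349B (zh) Adapter sequence and application thereof
US20190218606A1 (en) Methods of reducing errors in deep sequencing
CN115369159A (zh) Ultra-low-frequency mutation detection method based on overlapping fragments from paired-end sequencing and complementary fragments of double-stranded DNA
CN116445581A (zh) Method for preparing a high-throughput amplicon library of oligodendroglioma-related genes, multiplex PCR primer pairs and application
CN112301432B (zh) Method and kit for constructing a whole-genome high-throughput sequencing library
CN114032287A (zh) DNA methylation sequencing library, construction method thereof and detection method
WO2014086037A1 (zh) Method for constructing a nucleic acid sequencing library and application thereof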

Legal Events

Date Code Title Description
ENP Entry into the national phase
    Ref document number: 2018558751; Country of ref document: JP; Kind code of ref document: A
121 EP: The EPO has been informed by WIPO that EP was designated in this application
    Ref document number: 17845364; Country of ref document: EP; Kind code of ref document: A1
NENP Non-entry into the national phase
    Ref country code: DE
ENP Entry into the national phase
    Ref document number: 2017845364; Country of ref document: EP; Effective date: 20190329