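WO2017024784A1 - Low-frequency mutation enrichment sequencing method for free target DNA in plasma - Google Patents

Low-frequency mutation enrichment sequencing method for free target DNA in plasma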

Info

Publication number
WO2017024784A1
Authority
WO
WIPO (PCT)
Prior art keywords
sequencing
mutation
sequence
probe
library
Prior art date
Application number
PCT/CN2016/074058
Other languages
English (en)
French (fr)
Inventor
吕小星
易鑫
赵美茹
管彦芳
刘涛
杨玲
Original Assignee
北京吉因加科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京吉因加科技有限公司
Priority to US15/751,722 (US11001837B2)
Publication of WO2017024784A1

Classifications

    • CCHEMISTRY; METALLURGY
    • C12BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12QMEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6869Methods for sequencing
    • C12Q1/6874Methods for sequencing involving nucleic acid arrays, e.g. sequencing by hybridisation
    • AHUMAN NECESSITIES
    • A61MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61PSPECIFIC THERAPEUTIC ACTIVITY OF CHEMICAL COMPOUNDS OR MEDICINAL PREPARATIONS
    • A61P35/00Antineoplastic agents
    • CCHEMISTRY; METALLURGY
    • C12BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12NMICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09Recombinant DNA-technology
    • C12N15/10Processes for the isolation, preparation or purification of DNA or RNA
    • C12N15/1034Isolating an individual clone by screening libraries
    • C12N15/1072Differential gene expression library synthesis, e.g. subtracted libraries, differential screening
    • CCHEMISTRY; METALLURGY
    • C12BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12NMICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09Recombinant DNA-technology
    • C12N15/10Processes for the isolation, preparation or purification of DNA or RNA
    • C12N15/1034Isolating an individual clone by screening libraries
    • C12N15/1058Directional evolution of libraries, e.g. evolution of libraries is achieved by mutagenesis and screening or selection of mixed population of organisms
    • CCHEMISTRY; METALLURGY
    • C12BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12NMICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09Recombinant DNA-technology
    • C12N15/10Processes for the isolation, preparation or purification of DNA or RNA
    • C12N15/1034Isolating an individual clone by screening libraries
    • C12N15/1065Preparation or screening of tagged libraries, e.g. tagged microorganisms by STM-mutagenesis, tagged polynucleotides, gene tags
    • CCHEMISTRY; METALLURGY
    • C12BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12QMEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6806Preparing nucleic acids for analysis, e.g. for polymerase chain reaction [PCR] assay
    • CCHEMISTRY; METALLURGY
    • C12BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12QMEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6844Nucleic acid amplification reactions
    • C12Q1/6853Nucleic acid amplification reactions using modified primers or templates
    • CCHEMISTRY; METALLURGY
    • C12BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12QMEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6869Methods for sequencing
    • CCHEMISTRY; METALLURGY
    • C12BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12QMEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6876Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
    • C12Q1/6883Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material
    • C12Q1/6886Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material for cancer
    • CCHEMISTRY; METALLURGY
    • C40COMBINATORIAL TECHNOLOGY
    • C40BCOMBINATORIAL CHEMISTRY; LIBRARIES, e.g. CHEMICAL LIBRARIES
    • C40B50/00Methods of creating libraries, e.g. combinatorial synthesis
    • C40B50/08Liquid phase synthesis, i.e. wherein all library building blocks are in liquid phase or in solution during library creation; Particular methods of cleavage from the liquid support
    • C40B50/10Liquid phase synthesis, i.e. wherein all library building blocks are in liquid phase or in solution during library creation; Particular methods of cleavage from the liquid support involving encoding steps
    • GPHYSICS
    • G16INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16BBIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B20/20Allele or variant detection, e.g. single nucleotide polymorphism [SNP] detection
    • GPHYSICS
    • G16INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16BBIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B30/00ICT specially adapted for sequence analysis involving nucleotides or amino acids
    • GPHYSICS
    • G16INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16BBIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
    • GPHYSICS
    • G16INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16BBIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
    • G16B40/30Unsupervised data analysis
    • CCHEMISTRY; METALLURGY
    • C12BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12NMICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09Recombinant DNA-technology
    • C12N15/10Processes for the isolation, preparation or purification of DNA or RNA
    • C12N15/1003Extracting or separating nucleic acids from biological samples, e.g. pure separation or isolation methods; Conditions, buffers or apparatuses therefor
    • CCHEMISTRY; METALLURGY
    • C12BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12NMICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09Recombinant DNA-technology
    • C12N15/10Processes for the isolation, preparation or purification of DNA or RNA
    • C12N15/1034Isolating an individual clone by screening libraries
    • C12N15/1093General methods of preparing gene libraries, not provided for in other subgroups
    • CCHEMISTRY; METALLURGY
    • C12BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12QMEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q2600/00Oligonucleotides characterized by their use
    • C12Q2600/156Polymorphic or mutational markers
    • CCHEMISTRY; METALLURGY
    • C40COMBINATORIAL TECHNOLOGY
    • C40BCOMBINATORIAL CHEMISTRY; LIBRARIES, e.g. CHEMICAL LIBRARIES
    • C40B40/00Libraries per se, e.g. arrays, mixtures
    • C40B40/04Libraries containing only organic compounds
    • C40B40/06Libraries containing nucleotides or polynucleotides, or derivatives thereof
    • C40B40/08Libraries containing RNA or DNA which encodes proteins, e.g. gene libraries
    • GPHYSICS
    • G16INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16BBIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B30/00ICT specially adapted for sequence analysis involving nucleotides or amino acids
    • G16B30/10Sequence alignment; Homology search
    • GPHYSICS
    • G16INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16HHEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H20/00ICT specially adapted for therapies or health-improving plans, e.g. for handling prescriptions, for steering therapy or for monitoring patient compliance
    • G16H20/10ICT specially adapted for therapies or health-improving plans, e.g. for handling prescriptions, for steering therapy or for monitoring patient compliance relating to drugs or medications, e.g. for ensuring correct administration to patients
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02ATECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Definitions

  • the invention belongs to the field of high-throughput sequencing in bioinformatics, and in particular relates to a low-frequency mutation enrichment sequencing method for free target DNA in plasma.
  • preMiD™ combines three technologies, the amplification refractory mutation system (ARMS) for mutation-biased amplification, real-time PCR, and high-resolution melting curve analysis (HRM), to detect trace mutations in cell-free plasma DNA, but its detection sensitivity reaches only about 1% and it analyzes only some hotspot mutations. CAPP-Seq combines high-throughput sequencing with target region capture for plasma ctDNA: the target regions are captured and then deep-sequenced, and after data filtering the method not only yields more information on genetic variation but also detects low-frequency mutations of 0.2% and above with 98% specificity; even so, it still falls short of what early screening based on plasma ctDNA requires.
  • Duplex Sequencing performs positive and negative double-strand error correction based on UID (unique identifier) tags, which can correct almost all types of sequencing errors.
  • the detectable mutation frequency can reach 10^-7, but the technology has a major limitation: it needs a much higher throughput per sample than conventional sequencing, and when high-throughput sequencing of plasma ctDNA is aimed at rare mutations of around 0.01%, the large sample requirement is also a challenge.
  • the invention provides a low frequency mutation enrichment sequencing method for free target DNA in plasma to overcome the deficiencies of the prior art.
  • the invention provides a low-frequency mutation enrichment sequencing method for free target DNA in plasma, comprising the following steps:
  • the plasma in step (1) is derived from human peripheral blood, and the library is constructed by a 3-step enzymatic reaction: end repair, addition of an "A" tail, and ligation of the library adapter.
  • the primers used for the library adapter are:
  • the universal-library TT-COLD PCR amplification enrichment in step (2) comprises the following steps: 1) determining the Tm value of the library;
  • the Tm value of the library is determined as follows: the plasma target DNA library is amplified by real-time fluorescence quantitative PCR using a pair of primers, and the Tm value of the library is obtained from melting curve analysis; the sequences of the primers are:
  • the one pair of universal primers is a universal library TT-COLD PCR primer, and the nucleotide sequence thereof is:
  • Upstream primer AATGATAGCGCACCCACCGAGATCTACACTCTTTCC
  • the probe enrichment capture in step (3) is: after the amplified library passes quality control, hybridization capture is performed with the enriched probe chip, and the hybridization capture product is PCR-amplified and then sequenced on the instrument;
  • the enriched probe chip is designed as follows: the chip capture interval is determined by the intended use of the target genes; with reference to the database to which the target DNA belongs, at least one of the most important hotspot mutation sites is identified within a given base range; for the multiple mutation types present at that site, several major types are taken as references, and each type's share of the total probe coverage at the site is set in proportion to its frequency of occurrence. For hotspot variants, the probe designed from the human genome reference sequence hg19 is replaced with a probe designed from the mutated base, while probes at other sites are unchanged, and the ratio of the total coverage of the hotspot mutation probes to the normal probe coverage of other regions is not less than 3:1, thereby enriching hotspot variants during capture.
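The probe design rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the mutation type labels, the frequency counts, and the coverage values are hypothetical and do not come from the patent.

```python
def allocate_hotspot_probes(type_counts, total_coverage):
    """Split the probe coverage at one hotspot site among its major
    mutation types, in proportion to how often each type occurs.

    type_counts: hypothetical occurrence counts per mutation type,
                 e.g. taken from a mutation database.
    total_coverage: total probe coverage assigned to the site.
    """
    total = sum(type_counts.values())
    if total <= 0:
        raise ValueError("occurrence counts must sum to a positive value")
    return {mut: total_coverage * n / total for mut, n in type_counts.items()}


def meets_enrichment_ratio(hotspot_coverage, normal_coverage, min_ratio=3.0):
    """Check the stated design constraint: hotspot probe coverage is at
    least min_ratio times the normal probe coverage of other regions."""
    return hotspot_coverage >= min_ratio * normal_coverage


# Hypothetical site with three major mutation types.
counts = {"type_A": 60, "type_B": 30, "type_C": 10}
shares = allocate_hotspot_probes(counts, total_coverage=6.0)
ok = meets_enrichment_ratio(sum(shares.values()), normal_coverage=2.0)
```

With these made-up numbers, the most frequent type receives the largest share of probes, and the site as a whole just satisfies the 3:1 constraint.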
  • the positive and negative double-strand error-correction low-frequency information analysis (RealSeq Pipeline) in step (4) is:
  • the first 12 bp of sequencing read 1 and the first 12 bp of sequencing read 2 in each read pair are taken as tags; the two tags are joined, alphabetically smaller tag first, into a 24 bp index, and the positive or negative strand is assigned according to the order of the tags.
  • step 4): the duplicate clusters of the same DNA template obtained in step 3) are screened, and subsequent analysis is performed only if the positive strand and the negative strand are each supported by 2 or more reads;
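A minimal sketch of this screening step, assuming each read pair has already been reduced to an (index, strand) tuple. That data layout, the function name, and the "+"/"-" strand labels are assumptions made for illustration, not the RealSeq Pipeline itself.

```python
from collections import Counter


def screen_clusters(read_pairs, min_per_strand=2):
    """Keep only the indexes (DNA templates) that are supported by at
    least min_per_strand read pairs on BOTH the positive ("+") and the
    negative ("-") strand.

    read_pairs: iterable of (index, strand) tuples.
    Returns the sorted list of indexes that pass the screen.
    """
    counts = Counter(read_pairs)          # (index, strand) -> number of pairs
    indexes = {idx for idx, _ in counts}
    return sorted(
        idx for idx in indexes
        if counts[(idx, "+")] >= min_per_strand
        and counts[(idx, "-")] >= min_per_strand
    )


# Hypothetical clusters: only "AB" has 2 or more pairs on each strand.
pairs = [
    ("AB", "+"), ("AB", "+"), ("AB", "-"), ("AB", "-"),
    ("CD", "+"), ("CD", "+"), ("CD", "+"), ("CD", "-"),
    ("EF", "+"), ("EF", "+"),
]
passing = screen_clusters(pairs)
```

Requiring support on both strands is what lets a later step discard errors that appear on only one strand of the original duplex.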
  • the base type distribution at each site in the capture region is obtained, and from it the target region coverage size, the average sequencing depth, the positive and negative strand intermix ratio, and the low-frequency mutation rate;
  • control site mutation rate ≤ 2%; number of mutation-supporting reads after error correction ≥ 2; mutation prediction p value ≤ 0.05;
  • variant annotation: annotating each variant's function, the number of supporting reads, the mutation frequency, the amino acid change, and its status in existing variant databases.
  • the sequence bases at both ends of the insert are used as tags; the insert is the DNA fragment ligated to the adapter in the library, and paired-end sequencing of each fragment yields a read pair. The first 12 bp of read 1 and the first 12 bp of read 2 are taken as tags and joined, alphabetically smaller tag first, into a 24 bp index, which serves as the index of the read pair. A pair whose read 1 tag comes first in the index is labeled as positive strand; a pair whose read 2 tag comes first is labeled as negative strand.
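The tagging rule just described can be sketched as follows. The function name, the "+"/"-" labels, and the handling of the tie case (both tags identical) are assumptions; the source does not say how ties are resolved.

```python
TAG_LEN = 12  # the first 12 bp of each read are used as the tag


def duplex_index(read1, read2, tag_len=TAG_LEN):
    """Build the 24 bp index of a read pair and assign its strand.

    The two tags are joined with the alphabetically smaller tag first.
    If the read 1 tag comes first the pair is labeled "+", otherwise "-".
    Assumption: identical tags are labeled "+".
    """
    tag1, tag2 = read1[:tag_len], read2[:tag_len]
    if len(tag1) < tag_len or len(tag2) < tag_len:
        raise ValueError("both reads must be at least tag_len bases long")
    if tag1 <= tag2:
        return tag1 + tag2, "+"
    return tag2 + tag1, "-"


# Hypothetical reads; the two strands of one duplex swap read 1 and read 2.
r1 = "ACGTACGTACGT" + "TTTT"
r2 = "TGCATGCATGCA" + "GGGG"
index_pos, strand_pos = duplex_index(r1, r2)
index_neg, strand_neg = duplex_index(r2, r1)
```

Because the smaller tag always comes first, both strands of one original molecule receive the same index but opposite strand labels, which is what allows them to be grouped and compared.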
  • the invention provides a low-frequency mutation enrichment sequencing kit for free target DNA in plasma, which comprises an enriched probe chip in which, at hotspot sites, the probes designed from the human genome reference sequence hg19 are replaced with probes designed from the mutated bases, while probes at other sites are unchanged, and the ratio of the total coverage of the hotspot mutation probes to the normal probe coverage of other regions is at least 3:1;
  • the principle for designing a probe from the mutated base of the target DNA is: the chip capture interval is determined by the intended use of the target genes; with reference to the database to which the target DNA belongs, at least one of the most important hotspot mutation sites is identified within a given base range; for the various mutation types present at that site, several major types are taken as references, and each type's share of the total probe coverage at the site is set in proportion to its frequency of occurrence.
  • the invention provides a plasma ctDNA low-frequency mutation enrichment sequencing system, comprising the following operation units:
  • operation unit (1), plasma ctDNA extraction and library construction, operates as follows: 5-10 mL of peripheral blood is drawn from an early-stage patient and stored in an EDTA anticoagulant tube at room temperature or 4 °C; the blood is separated within 4-6 hours to obtain plasma and white blood cells, and the DNA extracted from the white blood cells serves as the control for somatic mutation detection; plasma cfDNA/ctDNA is extracted and quantified; the library is then built by the conventional 3-step enzymatic reaction: end repair, addition of an "A" tail, and ligation of the library adapter.
  • the adapter-ligated library from normal human plasma was amplified by real-time PCR using the library primers, and the Tm value of the library was obtained from melting curve analysis;
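As an illustration of reading a Tm value off a melting curve, the sketch below takes the Tm as the temperature at which the negative derivative of fluorescence, -dF/dT, peaks. That convention is a common one and is assumed here; the source does not state how the instrument software derives the value, and the melting curve data below are synthetic.

```python
import math


def estimate_tm(temperatures, fluorescence):
    """Estimate Tm as the temperature where -dF/dT is largest,
    using central differences on the interior points of the curve."""
    if len(temperatures) != len(fluorescence) or len(temperatures) < 3:
        raise ValueError("need two equal-length series of at least 3 points")
    best_tm, best_slope = None, float("-inf")
    for i in range(1, len(temperatures) - 1):
        d_f = fluorescence[i + 1] - fluorescence[i - 1]
        d_t = temperatures[i + 1] - temperatures[i - 1]
        slope = -d_f / d_t                # negative derivative of fluorescence
        if slope > best_slope:
            best_tm, best_slope = temperatures[i], slope
    return best_tm


# Synthetic melting curve: fluorescence drops sigmoidally around 85 degrees C.
temps = list(range(80, 91))
signal = [1.0 / (1.0 + math.exp(t - 85)) for t in temps]
tm = estimate_tm(temps, signal)
```

On real data the curve would usually be smoothed first, since a single noisy point can otherwise dominate the derivative.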
  • the universal-library TT-COLD PCR amplification unit, operation unit (2), uses universal primers to achieve first-stage mutation enrichment amplification for all mutation types; the nucleotide sequences of the universal primers are:
  • Upstream primer AATGATAGCGCACCCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT,
  • Downstream primer CAAGCAGAAGACGGCATACGAGATxxxxxxxxxxGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT, where xxxxxxxx is the index tag.
  • the probe enrichment capture unit, operation unit (3), performs the second-stage enrichment capture for hotspot variants; the probe enrichment capture of operation unit (3) is achieved using a self-designed tumor-enriched probe chip, followed by amplification and sequencing of the hybridization capture product.
  • the design method of the tumor-enriched probe chip is:
  • the positive and negative double-strand error-correction low-frequency information (RealSeq Pipeline) analysis unit, operation unit (4), is completed by the following steps:
  • the first 12 bp of read 1 and the first 12 bp of read 2 of each read pair are taken as tags and joined, alphabetically smaller tag first, into a 24 bp index, which serves as the index of the read pair.
  • a pair whose read 1 tag comes first in the index is labeled as positive strand; a pair whose read 2 tag comes first is labeled as negative strand.
  • step 4): the duplicate clusters of the same DNA template obtained in step 3) are screened, and subsequent analysis is performed only if the positive strand and the negative strand are each supported by 2 or more read pairs;
  • the base type distribution at each site in the capture region is obtained, and from it the target region coverage size, the average sequencing depth, the positive and negative strand intermix ratio, the low-frequency mutation rate, etc.;
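A small sketch of how a per-site base distribution yields depth and mutation frequency. The input layout (a flat list of (position, base) observations), the function names, and the numbers are hypothetical; a real pipeline would derive these observations from aligned, error-corrected reads.

```python
from collections import Counter, defaultdict


def base_distribution(observations):
    """Count the observed bases at every position.

    observations: iterable of (position, base) tuples.
    Returns a dict mapping position -> Counter of bases.
    """
    dist = defaultdict(Counter)
    for pos, base in observations:
        dist[pos][base] += 1
    return dist


def site_summary(counts, ref_base):
    """Summarize one site: sequencing depth, number of non-reference
    observations, and the non-reference (mutation) fraction."""
    depth = sum(counts.values())
    alt = depth - counts[ref_base]
    return {
        "depth": depth,
        "alt_reads": alt,
        "alt_fraction": alt / depth if depth else 0.0,
    }


# Hypothetical site: 9998 reference "A" observations and 2 mutant "T".
obs = [(100, "A")] * 9998 + [(100, "T")] * 2
dist = base_distribution(obs)
summary = site_summary(dist[100], ref_base="A")
```

In this made-up example the mutant fraction is 2 in 10,000, which shows why very deep coverage is needed before variants near 0.01% can be supported by more than one read.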
  • control site mutation rate ≤ 2%; number of mutation-supporting reads after error correction ≥ 2; mutation prediction p value ≤ 0.05;
  • variant annotation: annotating each variant's function, the number of supporting reads, the mutation frequency, the amino acid change, and its status in existing variant databases.
  • the use of the target DNA low frequency mutation enrichment sequencing method in the present invention or the plasma ctDNA low frequency mutation enrichment sequencing system provided by the present invention in the preparation of an early disease screening kit belongs to the protection scope of the present invention.
  • the disease is a tumor.
  • the low-frequency mutation enrichment sequencing method of the target DNA in plasma of the present invention or the plasma ctDNA low-frequency mutation enrichment sequencing system provided by the present invention is used for preparing a post-surgical monitoring kit.
  • the disease is a tumor.
  • the low-frequency mutation enrichment sequencing method of the target DNA in the plasma of the present invention or the plasma ctDNA low-frequency mutation enrichment sequencing system provided by the present invention is used in the preparation of a disease medication guiding kit.
  • the disease is a tumor.
  • the invention also provides an early screening chip for lung cancer, colorectal cancer, gastric cancer, breast cancer, kidney cancer, pancreatic cancer, ovarian cancer, endometrial cancer, thyroid cancer, cervical cancer, esophageal cancer and liver cancer, named ONCOcare-ZS; the chip includes driver genes, high-frequency mutation genes, and 12 genes involved in high-risk cancers, 228 genes in total covering 680 Kb and 5220 hotspot mutations.
  • the genes corresponding to the probes are:
  • used with the low-frequency mutation enrichment sequencing method of target DNA in the present invention, the above-mentioned chip enables early screening of tumors (lung cancer, colorectal cancer, gastric cancer, breast cancer, kidney cancer, pancreatic cancer, ovarian cancer, endometrial cancer, thyroid cancer, cervical cancer, esophageal cancer, and liver cancer) with accurate results and high sensitivity, and can detect 0.01% low-frequency variants with high specificity.
  • the invention also provides a probe chip for individualized tumor medication guidance, named ONCOcare-Drug, which comprises high-frequency genes of 12 common cancers, important genes in 12 cancer signaling pathways, genes related to common targeted drugs and chemotherapeutic drugs, etc., 559 genes in total covering 850 Kb and 2400 hotspot target variants.
  • the genes corresponding to the probes contained in the chip are as follows:
  • 12 common tumors can be achieved by using the above-mentioned chip in the plasma low-frequency mutation enrichment sequencing method of the present invention.
  • the invention also provides a postoperative monitoring chip for tumors (lung cancer, colorectal cancer, gastric cancer, breast cancer, kidney cancer, pancreatic cancer, ovarian cancer, endometrial cancer, thyroid cancer, cervical cancer, esophageal cancer, liver cancer, etc.), named ONCOcare-JK; the chip includes driver genes, high-frequency mutation genes, and important genes in 12 cancer-related signaling pathways of common high-risk cancers, 508 genes in total covering 500 Kb and 4800 hotspot mutations.
  • the genes corresponding to the probes contained in the chip are as follows:
  • 12 kinds of common tumors can be realized by the above-mentioned chip in the plasma low-frequency mutation enrichment sequencing method of the present invention.
  • the invention provides a low-frequency mutation enrichment sequencing method (ER-seq, Enrich & Rare mutation Sequencing) for target DNA in plasma, which combines three technologies, universal-library TT-COLD PCR, probe enrichment capture, and a unique positive and negative strand error-correction analysis pipeline (RealSeq Pipeline), to achieve efficient, simple and practical detection of low-frequency variants in plasma ctDNA.
  • the present invention has the following advantages: (1) High sensitivity: ER-seq uses universal-library TT-COLD PCR and probe enrichment capture to enrich all mutation types and hotspot mutations to different degrees, so that rare mutations at 0.01% can be detected efficiently from only 5-10 mL of peripheral blood; (2) High specificity: combining mutation enrichment with positive and negative strand error-correction analysis enables more accurate detection of low-frequency variants, with an average specificity above 98%; (3) High throughput: combined with target region capture sequencing on a high-throughput sequencing (NGS) platform, the method scans the genes of interest in a single run, giving more comprehensive information about the examinee and more accurate predictions, and it can test multiple samples simultaneously in a short time, which reduces cost and eases clinical adoption; (4) Multi-dimensional application: the method can fully exploit the application potential of plasma ctDNA, and can be used for a variety of related tumors (lung).
  • Figure 1 is a flow chart of the method of the present invention.
  • Figure 2 shows the Tm value of the adapter-ligated library from normal human plasma.
  • the chemical reagents used in the examples are conventional commercially available reagents, and the technical means used in the examples are conventional means well known to those skilled in the art.
  • the sequencing device used in the embodiment of the present invention is Illumina HiSeq 2500, and the sequencing step of the present invention is not limited to the sequencing device.
  • the gene names are all officially named in NCBI-Gene.
  • Synonymous mutations of the invention are those in which a codon that represents an amino acid is mutated to another codon due to a change in a certain base, but still encodes the same amino acid.
  • the missense mutation: a codon encoding one amino acid undergoes a base substitution and becomes a codon encoding another amino acid, thereby changing the amino acid type and sequence of the polypeptide chain.
  • Certain missense mutations can cause the polypeptide chain to lose its original function, and many protein abnormalities are caused by missense mutations.
  • the stop-gain mutation: also referred to as a nonsense mutation, which means that a codon encoding an amino acid is mutated to a stop codon due to a change in a certain base, thereby prematurely terminating peptide chain synthesis.
  • the stop-loss mutation according to the present invention means that a stop codon is mutated to a codon encoding an amino acid due to a change in a certain base, so that peptide chain synthesis cannot terminate normally.
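The four definitions above can be expressed as a small classifier over the standard genetic code. The function name and the returned labels are illustrative choices, and treating a stop-to-stop change as synonymous is an assumption; real annotation tools work on transcripts and handle many more cases.

```python
# Standard genetic code, built in the conventional T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify_codon_change(ref_codon, alt_codon):
    """Classify a single-codon substitution ("*" denotes a stop codon)."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    if ref_aa == alt_aa:
        return "synonymous"   # same amino acid (or stop retained)
    if alt_aa == "*":
        return "stop-gain"    # nonsense mutation
    if ref_aa == "*":
        return "stop-loss"
    return "missense"


examples = {
    ("GAA", "GAG"): classify_codon_change("GAA", "GAG"),  # Glu -> Glu
    ("GAA", "GAT"): classify_codon_change("GAA", "GAT"),  # Glu -> Asp
    ("TAC", "TAA"): classify_codon_change("TAC", "TAA"),  # Tyr -> stop
    ("TAA", "CAA"): classify_codon_change("TAA", "CAA"),  # stop -> Gln
}
```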
  • The Tm value of the library is determined as follows: real-time (fluorescent quantitative) PCR is run on the plasma target-DNA library with one pair of primers, and the library Tm is obtained from melting-curve analysis; the primer sequences are:
  • Upstream primer: AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT,
  • Downstream primer: CAAGCAGAAGACGGCATACGAGATxxxxxxxxGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT, where xxxxxxxx is the index tag.
  • The one pair of universal primers is the universal-library TT-COLD PCR primer pair, whose nucleotide sequences are: upstream primer: AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT, and downstream primer: CAAGCAGAAGACGGCATACGAGATxxxxxxxxGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT, where xxxxxxxx is the index tag.
  • The one series of cycling conditions is:
  • In probe enrichment capture, the amplified library first passes quality control, hybridization capture is then performed with the enrichment probe chip, and the hybridization capture product is PCR-amplified and then sequenced;
  • The enrichment probe chip is designed as follows: the capture interval of the chip is determined by the intended use of the target genes; with reference to the databases covering the target DNA, at least one most important hotspot variant site is identified within a given base range; for the multiple mutation types present at that site, several major types are taken as reference, and each type's share of the total probe coverage at the site is set according to its frequency of occurrence; for hotspot variants, probes designed from the human genome reference sequence hg19 are replaced with probes designed from the mutated base, while probes for other sites are unchanged, and the ratio of total hotspot-variant probe coverage to normal probe coverage in other regions is not less than 3:1, so that hotspot variants are enriched during capture.
  • The first 12 bp of read 1 and the first 12 bp of read 2 of each read pair are used as tags; the two tags are joined into one 24 bp index with the alphabetically smaller tag first, and this 24 bp sequence serves as the index of the read pair.
  • If the tag of read 1 comes first, the pair is labeled as the positive strand; if the tag of read 2 comes first, it is labeled as the negative strand;
  • 4) Screen the duplicate clusters of the same DNA template obtained in step 3); a cluster proceeds to subsequent analysis only if both its positive-strand and negative-strand reads number at least 2 pairs;
  • From these reads, compute the base-type distribution at every site in the capture region, the size of the covered target region, the average sequencing depth, the positive/negative strand pairing rate, and the low-frequency mutation rate;
  • The filtering parameters are: variant rate at the site in the control ≤ 2%; number of variant-supporting reads after error correction ≥ 2; mutation prediction p value ≤ 0.05;
  • Variant annotation: annotate the variant's function, the number of supporting reads, the variant frequency, the amino acid change, and the variant's status in existing variant databases.
  • Sample library preparation: the cfDNA extracted from plasma is put through a three-step enzymatic reaction according to the KAPA LTP Library Preparation Kit instructions.
  • The adapter primers are the first and second adapter strands in Table 1. Then 50 μL of PEG/NaCl SPRI solution was added twice, for 2 rounds of magnetic-bead purification, and the product was finally redissolved in 25 μL of ddH2O.
  • Using the same instrument and reagents, real-time PCR was run on the ligated library from normal human plasma with the universal library primers; the reaction reagents included KAPA HiFi HotStart ReadyMix and SYBR dye. The Tm value (DNA melting temperature) of the library was obtained from melting-curve analysis, as shown in Figure 2; the universal library primers are listed in Table 1.
  • Within the capture interval, with reference to TCGA, COSMIC and other relevant databases, the single most important hotspot variant site (SNV>3) is identified in every 200 bp; for the multiple mutation types present at that site, several major types are taken as reference, and each type's share of the total probe coverage at the site is set according to its frequency of occurrence;
  • In the chip design, for the relevant hotspot variants, all probes originally designed from the reference (REF) are replaced with probes designed from the mutated base, while the other probes are unchanged.
  • After the amplified library passed quality control, hybridization capture was performed with the above tumor enrichment probe chip following the instructions provided by the chip manufacturer (Roche). The product was finally eluted and redissolved in 21 μL of ddH2O, with the hybridization-elution magnetic beads retained.
  • Amplification system for the hybridization capture product:
  • PCR reaction conditions: initial denaturation at 98 °C for 45 sec; 10 cycles of denaturation at 98 °C for 15 sec, annealing at 65 °C for 30 sec and extension at 72 °C for 30 sec; final extension at 72 °C for 60 sec; hold at 4 °C.
  • Flowcell Primer 1 and Primer 2 are primers supplied with the HiSeq sequencing platform; they are used to amplify the captured DNA template to obtain enough product for loading on the sequencer.
  • Based on the enrichment probe chip design principles, an early-screening chip for tumors (lung cancer, colorectal cancer, gastric cancer, breast cancer, kidney cancer, pancreatic cancer, ovarian cancer, endometrial cancer, thyroid cancer, cervical cancer, esophageal cancer, liver cancer, etc.), ONCOcare-ZS, was completed. It covers driver genes of common high-incidence cancers, frequently mutated genes, and important genes in 12 cancer-related signaling pathways, totaling 227 genes, 680 Kb and 5220 hotspot variants. The gene list is given in Table 3.
  • One patient with small pulmonary nodules was sequenced and analyzed by the method described in Example 1.
  • The probe enrichment capture step used the ONCOcare-ZS chip of this example; the sequencing statistics are shown in Table 4:
  • Positive/negative strand pairing rate: the ratio of clusters with at least 3 reads that contain both strands to all clusters with at least 3 reads, used to evaluate strand pairing in the usable data.
  • Effective data utilization: the ratio of the number of error-corrected reads from clusters meeting at least 2+/2- to the total number of reads.
  • Low-frequency error-correction depth: the average coverage of bases in the target region after error correction of the valid data.
  • ONCOcare-Drug: a probe chip for guiding individualized cancer medication.
  • The chip covers frequently mutated genes in 12 common cancers, important genes in 12 cancer signaling pathways, and genes related to common targeted drugs and chemotherapy drugs, totaling 559 genes, 850 Kb and 2400 hotspot targeted-drug variants. The gene list is shown in Table 6.
  • One patient with advanced colorectal cancer was analyzed by the method described in Example 1.
  • The probe enrichment capture step used the ONCOcare-Drug chip of this example; the sequencing statistics are shown in Table 7:
  • The ONCOcare-JK chip covers driver genes of common high-incidence cancers, frequently mutated genes, and important genes in 12 cancer-related signaling pathways, totaling 508 genes, 500 Kb and 4800 hotspot variants. The gene list is shown in Table 12.
  • One patient 3 months after surgery for lung adenocarcinoma was analyzed following the steps of Example 1.
  • The probe enrichment capture step used the ONCOcare-JK chip of this example.
  • The sequencing statistics are shown in Table 13:
  • The invention provides an enrichment sequencing method for low-frequency mutations in cell-free target DNA in plasma. It achieves accurate low-frequency detection in plasma DNA from 5-10 mL of peripheral blood, is simple to operate and practical, and is sensitive enough to detect variants at a frequency of 0.01%.

Abstract

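The invention provides an enrichment sequencing method for low-frequency mutations in cell-free target DNA in plasma, comprising plasma DNA extraction and library construction, universal-library TT-COLD PCR amplification enrichment, probe enrichment capture, PCR of the captured product and sequencing, and low-frequency information analysis with positive/negative double-strand error correction.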

Description

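Enrichment sequencing method for low-frequency mutations in cell-free target DNA in plasma
Technical Field
The invention belongs to the field of bioinformatics and high-throughput sequencing, and specifically relates to an enrichment sequencing method for low-frequency mutations in cell-free target DNA in plasma.
Background
In recent years, genetic testing and diagnosis based on cell-free circulating tumor DNA (ctDNA) in the blood of tumor patients has become a research hotspot. Studies indicate that circulating tumor DNA may become a new marker for early tumor diagnosis, prognosis and precision medicine. Detecting tumor markers in circulating cell-free DNA differs from conventional tissue-based tumor marker testing: it is non-invasive, allows monitoring at any time and supports early screening, and sampling circulating cell-free DNA avoids the difficulty that current molecular diagnostics has in obtaining cancer tissue as a specimen, which makes it a highly promising tumor marker. However, circulating blood contains cell-free DNA from normal tissue as well as from the tumor, and because of individual differences, the stage of tumor development, the stage of treatment and other factors, the total amount of circulating DNA varies and mutation frequencies are often far lower than in the corresponding cancer tissue; in early-stage cancer in particular, the abundance of plasma ctDNA can be as low as 0.01%. Accurate detection of low-frequency mutations is therefore an urgent problem in the clinical application of plasma ctDNA.
To detect low-frequency mutations in plasma ctDNA efficiently and accurately and to fully exploit its application potential, enrichment amplification must be combined with a highly sensitive detection technique. Existing techniques such as preMiD™, CAPP-Seq and Duplex Sequencing, however, can detect low-frequency variants only to a certain extent, and their practical use remains limited in various ways. preMiD™ integrates three techniques, namely mutation-biased amplification (ARMS), fluorescent quantitative PCR and high-resolution melting (HRM) curve analysis, to detect trace mutations in cell-free plasma, but its sensitivity reaches only about 1% and it analyzes only certain hotspot variants. CAPP-Seq applies high-throughput sequencing together with target-region capture to plasma ctDNA: samples undergo targeted capture followed by deep sequencing, and after data filtering it both yields more information on gene variants and reports low-frequency variants at frequencies of 0.2% and above with a specificity of 98%; it nevertheless remains far from what early screening based on plasma ctDNA requires. Duplex Sequencing performs positive/negative double-strand error correction using UID (unique identifier) tags and can correct almost all types of sequencing error, detecting mutation frequencies down to 10⁻⁷, but it has a major limitation: it needs much higher sequencing throughput than conventional sequencing, and the large sample input needed to detect rare mutations at around 0.01% by high-throughput sequencing of plasma ctDNA is a further challenge.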
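Summary of the Invention
The invention provides an enrichment sequencing method for low-frequency mutations in cell-free target DNA in plasma that overcomes the shortcomings of the prior art.
The enrichment sequencing method provided by the invention comprises the following steps:
(1) extraction of plasma target DNA and library construction;
(2) universal-library TT-COLD PCR amplification enrichment;
(3) probe enrichment capture, amplification of the hybridization capture product, and sequencing;
(4) low-frequency information analysis with positive/negative double-strand error correction.
The flow chart of the method is shown in Figure 1.
In step (1), the plasma comes from human peripheral blood, and the library is constructed by a 3-step enzymatic reaction: end repair, A-tailing (addition of "A") and ligation of the library adapter.
The primers used for the library adapter are:
Adapter first strand: TACACTCTTTCCCTACACGACGCTCTTCCGATCT,
Adapter second strand: GATCGGAAGAGCACACGTCTGAACTCCAGTCAC.
In the method of the invention, step (2), universal-library TT-COLD PCR amplification enrichment, comprises the following steps: 1) determine the Tm value of the library;
2) bypassing the specific Tc value (critical denaturation temperature) of each individual insert, enrich all mutation types on all fragments in the library with one pair of universal primers under one series of cycling conditions: set Tc(min) ≈ Tm − 2.5 °C, then raise Tc stepwise in 0.5 °C increments, and perform FULL COLD PCR at each Tc.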
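The temperature ladder in step 2) can be written out as a short calculation. The sketch below is only an illustration: the function name `tc_series` and the number of steps `n_steps` are assumptions, because the actual cycling program is given in the figures rather than in the text.

```python
from decimal import Decimal


def tc_series(tm_celsius, n_steps=6, offset="2.5", step="0.5"):
    """Return the list of Tc values (in degrees C) for a TT-COLD PCR ladder.

    Starts at Tm - offset and rises by `step` each time.
    n_steps is a hypothetical parameter; the source does not state it in text.
    Decimal arithmetic avoids floating-point drift in the 0.5-degree steps.
    """
    if n_steps < 1:
        raise ValueError("n_steps must be at least 1")
    start = Decimal(str(tm_celsius)) - Decimal(offset)
    return [float(start + Decimal(step) * i) for i in range(n_steps)]


if __name__ == "__main__":
    # Example with an assumed library Tm of 85.0 degrees C.
    print(tc_series(85.0))
```

With a measured library Tm, the first value is the Tc(min) of the empirical formula, and each later value is one FULL COLD PCR stage.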
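Further, the library Tm value is determined as follows: fluorescent quantitative PCR is run on the plasma target-DNA library with one pair of primers, and the library Tm is obtained from melting-curve analysis; the primer sequences are:
Upstream primer:
AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT,
Downstream primer:
CAAGCAGAAGACGGCATACGAGATxxxxxxxxGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT, where xxxxxxxx is the index tag.
In step 2) above, the one pair of universal primers is the universal-library TT-COLD PCR primer pair, whose nucleotide sequences are:
Upstream primer:
AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT,
Downstream primer:
CAAGCAGAAGACGGCATACGAGATxxxxxxxxGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT, where xxxxxxxx is the index tag.
In step 2) above, the one series of cycling conditions is: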
Figure PCTCN2016074058-appb-000001
Figure PCTCN2016074058-appb-000002
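In the method of the invention, the probe enrichment capture of step (3) is carried out once the amplified library has passed quality control: hybridization capture is performed with the enrichment probe chip, and the hybridization capture product is PCR-amplified and then sequenced;
The enrichment probe chip is designed as follows: the capture interval of the chip is determined by the intended use of the target genes; with reference to the databases covering the target DNA, at least one most important hotspot variant site is identified within a given base range; for the multiple mutation types present at that site, several major types are taken as reference, and each type's share of the total probe coverage at the site is set according to its frequency of occurrence; for hotspot variants, probes designed from the human genome reference sequence hg19 are replaced with probes designed from the mutated base, while probes for other sites are unchanged, and the ratio of total hotspot-variant probe coverage to normal probe coverage in other regions is not less than 3:1, so that hotspot variants are enriched during capture.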
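In the method of the invention, step (4), low-frequency information analysis with positive/negative double-strand error correction (RealSeq Pipeline), proceeds as follows:
1) Based on the sequencing results, take the first 12 bp of read 1 and the first 12 bp of read 2 of each read pair as tags; join the two tags into one 24 bp index with the alphabetically smaller tag first, and assign the positive or negative strand according to the order of the tags;
2) Externally sort the indexes so that all duplicate reads derived from the same DNA template are grouped together;
3) Perform center-based clustering on the grouped reads that share an index: using the Hamming distance between sequences, split each large cluster sharing one index into several small clusters in which the Hamming distance between any two read pairs does not exceed 10, so that reads that share an index but come from different DNA templates are separated;
4) Screen the duplicate clusters of the same DNA template obtained in step 3); a cluster proceeds to subsequent analysis only if both its positive-strand and negative-strand reads number at least 2 pairs;
5) Correct errors in the clusters that satisfy the condition in 4) and generate one pair of error-free new reads. For each sequenced base of the DNA template, if a given base type reaches 80% agreement among the positive-strand reads and also 80% agreement among the negative-strand reads, that base is recorded in the new read; otherwise the position is recorded as N. This yields new reads that represent the original DNA template sequence;
6) Realign the new reads to the genome with the bwa mem algorithm and discard reads with a mapping quality below 30;
7) Compute statistics from the reads obtained in 6): the base-type distribution at every site in the capture region, the size of the covered target region, the average sequencing depth, the positive/negative strand pairing rate, and the low-frequency mutation rate;
8) Call SNV/InDel/SV/CNV: by comparing the patient sample with the control sample, call somatic SNVs with the mutect pipeline, somatic InDels with the gatk pipeline, CNVs with the contra.py pipeline, and SVs with the somVar pipeline;
The filtering parameters are: variant rate at the site in the control ≤ 2%; number of variant-supporting reads after error correction ≥ 2; mutation prediction p value ≤ 0.05;
9) Variant annotation: annotate the variant's function, the number of supporting reads, the variant frequency, the amino acid change, and the variant's status in existing variant databases.
Further, in step 1) above, the sequence bases at the two ends of the insert serve as tags; an insert is a DNA fragment in the library that is ligated to the adapter primers, and paired-end sequencing of each fragment yields one read pair. The first 12 bp of read 1 and the first 12 bp of read 2 are taken as tags and joined into one 24 bp index with the alphabetically smaller tag first; this 24 bp sequence serves as the index of the read pair. If the tag of read 1 comes first, the pair is labeled as the positive strand; if the tag of read 2 comes first, it is labeled as the negative strand.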
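The index construction in step 1) can be sketched in a few lines. This is a minimal illustration, not the RealSeq Pipeline itself: the function name `build_index` is hypothetical, and labeling the tie case (both tags identical) as the positive strand is an assumption the source does not address.

```python
def build_index(read1, read2, tag_len=12):
    """Build the 24 bp index and strand label for one read pair.

    The first `tag_len` bases of each read are the tags; the alphabetically
    smaller tag goes first. If the read 1 tag comes first the pair is "+",
    if the read 2 tag comes first it is "-".
    Assumption: identical tags are labeled "+".
    """
    if len(read1) < tag_len or len(read2) < tag_len:
        raise ValueError("reads must be at least tag_len bases long")
    tag1, tag2 = read1[:tag_len], read2[:tag_len]
    if tag1 <= tag2:
        return tag1 + tag2, "+"
    return tag2 + tag1, "-"


if __name__ == "__main__":
    r1 = "ACGTACGTACGTTTTGGG"
    r2 = "TTGCATGCATGCAAACCC"
    print(build_index(r1, r2))  # read 1 tag sorts first
    print(build_index(r2, r1))  # same index, opposite strand label
```

Because both orientations of a template produce the same 24 bp index, sorting by index (step 2) brings the positive-strand and negative-strand copies of one molecule next to each other.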
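The invention provides a kit for enrichment sequencing of low-frequency mutations in cell-free target DNA in plasma. The kit contains an enrichment probe chip in which probes designed from the human genome reference sequence hg19 are replaced with probes designed from the mutated base, while probes for other sites are unchanged, and the ratio of total hotspot-variant probe coverage to normal probe coverage in other regions is at least 3:1;
The principle for designing probes from mutated bases of the target DNA is: the capture interval of the chip is determined by the intended use of the target genes; with reference to the databases covering the target DNA, at least one most important hotspot variant site is identified within a given base range; for the multiple mutation types present at that site, several major types are taken as reference, and each type's share of the total probe coverage at the site is set according to its frequency of occurrence.
The invention provides an enrichment sequencing system for low-frequency mutations in plasma ctDNA, comprising the following operating units:
(1) a unit for extraction of plasma ctDNA and library construction;
(2) a unit for universal-library TT-COLD PCR amplification enrichment;
(3) a probe enrichment capture unit, a unit for amplification of the hybridization capture product, and a sequencing unit;
(4) a unit for low-frequency information analysis with positive/negative double-strand error correction.
Operating unit (1), extraction of plasma ctDNA and library construction, works as follows: 5-10 mL of peripheral blood is drawn from an early-stage patient and stored in an EDTA anticoagulant tube at room temperature or 4 °C; within 4-6 hours the peripheral blood is separated into plasma and white blood cells, and the DNA extracted from the white blood cells later serves as the control for calling somatic mutations; plasma cfDNA/ctDNA is extracted and quantified; a 3-step enzymatic reaction is carried out following a conventional library construction method: end repair, A-tailing and ligation of the library adapter.
Operating unit (2), universal-library TT-COLD PCR amplification enrichment, works as follows:
Using the same instrument and reagents, fluorescent quantitative PCR is run on the ligated library from normal human plasma with the universal library primers, and the library Tm value is obtained from melting-curve analysis;
Bypassing the specific Tc value of each individual insert, all mutation types on all fragments in the library are enriched with one pair of universal primers under one series of cycling conditions. Specifically, the empirical formula gives Tc(min) ≈ Tm − 2.5 °C; Tc is then raised stepwise in 0.5 °C increments, and FULL COLD PCR is performed at each Tc. The PCR program is set as follows: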
Figure PCTCN2016074058-appb-000003
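The universal-library TT-COLD PCR amplification enrichment unit of operating unit (2) uses universal primers to achieve first-level mutation enrichment amplification for all variant types; the nucleotide sequences of the universal primers are:
Upstream primer: AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT,
Downstream primer: CAAGCAGAAGACGGCATACGAGATxxxxxxxxGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT, where xxxxxxxx is the index tag.
In the enrichment sequencing system for low-frequency mutations in plasma ctDNA provided by the invention, the probe enrichment capture unit of operating unit (3) performs a second enrichment capture that targets hotspot variants. It uses a self-designed tumor enrichment probe chip, after which the hybridization capture product is amplified and sequenced. The tumor enrichment probe chip is designed as follows:
1) Determine the capture interval of the chip from databases such as TCGA, ICGC and COSMIC and the relevant literature, following conventional design principles for chip capture probes;
2) Within the capture interval, with reference to TCGA, COSMIC and other relevant databases, identify the single most important hotspot variant site (SNV>3) in every 200 bp; for the multiple mutation types present at that site, take several major types as reference and set each type's share of the total probe coverage at the site according to its frequency of occurrence;
3) In the chip design, for the relevant hotspot variants, replace probes designed from the human genome reference sequence hg19 with probes designed from the mutated base, leave probes for other sites unchanged, and keep the ratio of total hotspot-variant probe coverage to normal probe coverage in other regions at 3:1 or higher, so that hotspot variants are enriched during capture.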
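The two numeric rules in the chip design, frequency-proportional shares per mutation type and a hotspot-to-normal coverage ratio of at least 3:1, can be illustrated with a small calculation. The sketch below is an assumption-laden example: `allocate_hotspot_probes`, the coverage units and the example allele frequencies are all hypothetical and do not come from the source.

```python
def allocate_hotspot_probes(type_frequencies, baseline_coverage=1.0, ratio=3.0):
    """Split hotspot probe coverage across mutation types.

    type_frequencies: mapping of mutant base -> observed frequency (any scale).
    baseline_coverage: normal probe coverage in other regions (arbitrary units).
    ratio: hotspot-to-normal coverage ratio; the text requires at least 3.
    Returns mutant base -> coverage, proportional to its frequency.
    """
    if ratio < 3:
        raise ValueError("hotspot coverage ratio must be at least 3:1")
    total = sum(type_frequencies.values())
    if total <= 0:
        raise ValueError("frequencies must sum to a positive value")
    hotspot_total = baseline_coverage * ratio
    return {alt: hotspot_total * freq / total
            for alt, freq in type_frequencies.items()}


if __name__ == "__main__":
    # Hypothetical hotspot with three major mutation types.
    shares = allocate_hotspot_probes({"T": 60, "A": 30, "G": 10})
    for alt, cov in shares.items():
        print(alt, round(cov, 2))
```

The returned coverages always sum to `ratio` times the baseline, so the 3:1 constraint holds regardless of how many mutation types a hotspot has.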
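In the enrichment sequencing system for low-frequency mutations in plasma ctDNA provided by the invention, the unit of operating unit (4) for low-frequency information analysis with positive/negative double-strand error correction (RealSeq Pipeline) carries out the following steps:
1) Take the first 12 bp of read 1 and the first 12 bp of read 2 of each read pair as tags and join them into one 24 bp index with the alphabetically smaller tag first; this 24 bp sequence serves as the index of the read pair. If the tag of read 1 comes first, the pair is labeled as the positive strand; if the tag of read 2 comes first, it is labeled as the negative strand.
2) Externally sort the indexes so that copies of the same DNA template are grouped together;
3) Perform center-based clustering on the grouped reads that share an index: using the Hamming distance between sequences, split each large cluster sharing one index into several small clusters in which the Hamming distance between any two read pairs does not exceed 10, so that reads that share an index but come from different DNA templates are separated;
4) Screen the duplicate clusters of the same DNA template obtained in step 3); a cluster proceeds to subsequent analysis only if both its positive-strand and negative-strand reads number at least 2 pairs;
5) Correct errors in the clusters that satisfy the condition in 4) and generate one pair of error-free new reads. For each sequenced base of the DNA template, if a given base type reaches 80% agreement among the positive-strand reads and also 80% agreement among the negative-strand reads, that base is recorded in the new read; otherwise the position is recorded as N. This yields new reads that represent the original DNA template sequence;
6) Realign the new reads to the genome with the bwa mem algorithm and discard reads with a mapping quality below 30;
7) Compute statistics from the reads obtained in 6): the base-type distribution at every site in the capture region, the size of the covered target region, the average sequencing depth, the positive/negative strand pairing rate, the low-frequency mutation rate, and so on;
8) Call SNV/InDel/SV/CNV: by comparing the patient sample with the control sample, call somatic SNVs with the mutect pipeline, somatic InDels with the gatk pipeline, CNVs with the contra.py pipeline, and SVs with the somVar pipeline;
The filtering parameters are: variant rate at the site in the control ≤ 2%; number of variant-supporting reads after error correction ≥ 2; mutation prediction p value ≤ 0.05;
9) Variant annotation: annotate the variant's function, the number of supporting reads, the variant frequency, the amino acid change, and the variant's status in existing variant databases.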
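Steps 3) and 4) can be sketched as follows. The source says only "center-based clustering" with a Hamming-distance limit of 10, so the greedy strategy here is an assumption: a read pair joins the first cluster in which it is within the limit of every existing member, which guarantees the stated pairwise bound. The names `hamming`, `split_by_template` and `passes_duplex_filter` are hypothetical, and sequences are assumed to be of equal length.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length sequences")
    return sum(x != y for x, y in zip(a, b))


def split_by_template(pairs, max_dist=10):
    """Split read pairs that share one index into per-template clusters.

    pairs: list of (strand, sequence), where sequence is e.g. read1 + read2.
    Greedy assumption: join the first cluster whose members are all
    within max_dist; otherwise start a new cluster.
    """
    clusters = []
    for strand, seq in pairs:
        for cluster in clusters:
            if all(hamming(seq, other) <= max_dist for _, other in cluster):
                cluster.append((strand, seq))
                break
        else:
            clusters.append([(strand, seq)])
    return clusters


def passes_duplex_filter(cluster, min_per_strand=2):
    """Step 4): keep a cluster only if both strands have >= 2 read pairs."""
    plus = sum(1 for strand, _ in cluster if strand == "+")
    minus = sum(1 for strand, _ in cluster if strand == "-")
    return plus >= min_per_strand and minus >= min_per_strand


if __name__ == "__main__":
    a, b, c = "A" * 30, "A" * 29 + "C", "C" * 30
    groups = split_by_template([("+", a), ("-", b), ("+", a), ("-", a), ("+", c)])
    print([len(g) for g in groups], [passes_duplex_filter(g) for g in groups])
```

The order in which read pairs arrive can change a greedy result, so a production implementation would likely need a more careful choice of cluster centers.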
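The use of the enrichment sequencing method for low-frequency mutations in plasma target DNA of the invention, or of the enrichment sequencing system for low-frequency mutations in plasma ctDNA provided by the invention, in preparing a kit for early disease screening falls within the scope of protection of the invention.
The disease is a tumor.
The use of the enrichment sequencing method for low-frequency mutations in plasma target DNA of the invention, or of the enrichment sequencing system for low-frequency mutations in plasma ctDNA provided by the invention, in preparing a kit for postoperative disease monitoring.
The disease is a tumor.
The use of the enrichment sequencing method for low-frequency mutations in plasma target DNA of the invention, or of the enrichment sequencing system for low-frequency mutations in plasma ctDNA provided by the invention, in preparing a kit for guiding medication for a disease.
The disease is a tumor.
The invention further provides an early-screening chip for lung cancer, colorectal cancer, gastric cancer, breast cancer, kidney cancer, pancreatic cancer, ovarian cancer, endometrial cancer, thyroid cancer, cervical cancer, esophageal cancer and liver cancer, named ONCOcare-ZS. The chip covers driver genes of common high-incidence cancers, frequently mutated genes and important genes in 12 cancer-related signaling pathways, totaling 228 genes, 680 Kb and 5220 hotspot variants. The genes targeted by the probes on the chip are: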
Figure PCTCN2016074058-appb-000004
Figure PCTCN2016074058-appb-000005
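In one embodiment of the invention, using the above chip with the enrichment sequencing method described above enables early screening for tumors (lung cancer, colorectal cancer, gastric cancer, breast cancer, kidney cancer, pancreatic cancer, ovarian cancer, endometrial cancer, thyroid cancer, cervical cancer, esophageal cancer, liver cancer, etc.); the screening results are accurate and highly sensitive, and variants at a frequency of 0.01% can be detected with high specificity.
The invention further provides a probe chip for guiding individualized tumor medication, ONCOcare-Drug. The chip covers frequently mutated genes in 12 common cancers, important genes in 12 cancer signaling pathways, and genes related to common targeted drugs and chemotherapy drugs, totaling 559 genes, 850 Kb and 2400 hotspot targeted-drug variants. The genes targeted by the probes on the chip are: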
Figure PCTCN2016074058-appb-000006
Figure PCTCN2016074058-appb-000007
Figure PCTCN2016074058-appb-000008
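In one embodiment of the invention, using the above chip with the enrichment sequencing method described above enables individualized medication guidance for 12 common tumors (lung cancer, colorectal cancer, gastric cancer, breast cancer, kidney cancer, pancreatic cancer, ovarian cancer, endometrial cancer, thyroid cancer, cervical cancer, esophageal cancer, liver cancer, etc.), with a definite therapeutic effect.
The invention further provides a postoperative monitoring chip for tumors (lung cancer, colorectal cancer, gastric cancer, breast cancer, kidney cancer, pancreatic cancer, ovarian cancer, endometrial cancer, thyroid cancer, cervical cancer, esophageal cancer, liver cancer, etc.), ONCOcare-JK. The chip covers driver genes of common high-incidence cancers, frequently mutated genes, important genes in 12 cancer-related signaling pathways, and so on, totaling 508 genes, 500 Kb and 4800 hotspot variants. The genes targeted by the probes on the chip are: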
Figure PCTCN2016074058-appb-000009
Figure PCTCN2016074058-appb-000010
Figure PCTCN2016074058-appb-000011
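In one embodiment of the invention, using the above chip with the enrichment sequencing method described above enables postoperative monitoring of 12 common tumors (lung cancer, colorectal cancer, gastric cancer, breast cancer, kidney cancer, pancreatic cancer, ovarian cancer, endometrial cancer, thyroid cancer, cervical cancer, esophageal cancer, liver cancer, etc.), allowing accurate monitoring of whether a patient is at risk of recurrence after surgery.
The enrichment sequencing method for low-frequency mutations in plasma target DNA provided by the invention (ER-seq, Enrich & Rare mutation Sequencing) combines three techniques, universal-library TT-COLD PCR, probe enrichment capture and a dedicated positive/negative-strand error-correction analysis (RealSeq Pipeline), to achieve efficient, simple and practical accurate detection of low-frequency variants in plasma ctDNA. Compared with other plasma-based detection techniques, the invention has the following advantages: (1) High sensitivity: ER-seq combines its universal-library TT-COLD PCR with probe enrichment capture, which enrich all mutation types and hotspot variants, respectively, to different degrees, so that only 5-10 mL of peripheral blood is needed and rare mutations at 0.01% can be detected efficiently; (2) High specificity: mutation enrichment together with the positive/negative-strand error-correction strategy for low-frequency analysis allows low-frequency variants to be detected accurately, with an average specificity above 98%; (3) High throughput: combined with target-region capture sequencing on a high-throughput (NGS) platform, the genes of interest can be scanned in a single run to obtain more comprehensive information about the subject and more accurate predictions, and many samples can be tested simultaneously in a short time, which lowers cost and facilitates clinical adoption; (4) Multi-dimensional application: the method can fully exploit the application potential of plasma ctDNA and lays a solid foundation for early screening, postoperative monitoring and precision medicine in a variety of tumors (lung cancer, colorectal cancer, gastric cancer, breast cancer, kidney cancer, pancreatic cancer, ovarian cancer, endometrial cancer, thyroid cancer, cervical cancer, esophageal cancer, liver cancer, etc.), thereby advancing clinical oncology.
Brief Description of the Drawings
Figure 1 is a flow chart of the method of the invention.
Figure 2 shows the Tm value of the ligated library prepared from normal human plasma.
Detailed Description
The following examples further illustrate the invention but should not be construed as limiting it. Modifications or substitutions of the methods, steps or conditions of the invention that do not depart from its spirit and essence fall within the scope of the invention.
Unless otherwise specified, the chemical reagents used in the examples are conventional commercially available reagents, and the techniques used are conventional means well known to those skilled in the art. The sequencing instrument used in the examples is the Illumina HiSeq 2500; the sequencing step of the invention is not limited to this instrument.
In the examples, gene names follow the official symbols in NCBI Gene. A synonymous mutation, as used herein, is one in which a base change turns a codon for an amino acid into another codon that still encodes the same amino acid. A missense mutation is one in which a base substitution turns a codon for one amino acid into a codon for another amino acid, changing the amino acid type and sequence of the polypeptide chain. Some missense mutations cause the polypeptide chain to lose its original function, and many protein abnormalities are caused by missense mutations. A stop-gain mutation, also called a nonsense mutation, is one in which a base change turns a codon for an amino acid into a stop codon, terminating peptide chain synthesis prematurely. A stop-loss mutation, as used herein, is one in which a base change turns a stop codon into another codon, so that peptide chain synthesis cannot terminate normally.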
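Example 1: Enrichment sequencing method for low-frequency mutations in plasma target DNA (ER-seq method)
(1) Extraction of plasma target DNA and library construction. The plasma comes from human peripheral blood, and the library is constructed by a 3-step enzymatic reaction: end repair, A-tailing (addition of "A") and ligation of the library adapter. The primers used for the library adapter are:
Adapter first strand: TACACTCTTTCCCTACACGACGCTCTTCCGATCT,
Adapter second strand: GATCGGAAGAGCACACGTCTGAACTCCAGTCAC.
(2) Universal-library TT-COLD PCR amplification enrichment, comprising the following steps:
1) Determine the Tm value of the library. The library Tm value is determined as follows: fluorescent quantitative PCR is run on the plasma target-DNA library with one pair of primers, and the library Tm is obtained from melting-curve analysis; the primer sequences are:
Upstream primer: AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT,
Downstream primer: CAAGCAGAAGACGGCATACGAGATxxxxxxxxGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT, where xxxxxxxx is the index tag.
2) Bypassing the specific Tc value of each individual insert, enrich all mutation types on all fragments in the library with one pair of universal primers under one series of cycling conditions: set Tc(min) ≈ Tm − 2.5 °C, then raise Tc stepwise in 0.5 °C increments, and perform FULL COLD PCR at each Tc.
The one pair of universal primers is the universal-library TT-COLD PCR primer pair, whose nucleotide sequences are: upstream primer: AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT, downstream primer: CAAGCAGAAGACGGCATACGAGATxxxxxxxxGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT, where xxxxxxxx is the index tag.
The one series of cycling conditions is: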
Figure PCTCN2016074058-appb-000012
Figure PCTCN2016074058-appb-000013
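(3) Probe enrichment capture, amplification of the hybridization capture product, and sequencing. Once the amplified library has passed quality control, hybridization capture is performed with the enrichment probe chip, and the hybridization capture product is PCR-amplified and then sequenced;
The enrichment probe chip is designed as follows: the capture interval of the chip is determined by the intended use of the target genes; with reference to the databases covering the target DNA, at least one most important hotspot variant site is identified within a given base range; for the multiple mutation types present at that site, several major types are taken as reference, and each type's share of the total probe coverage at the site is set according to its frequency of occurrence; for hotspot variants, probes designed from the human genome reference sequence hg19 are replaced with probes designed from the mutated base, while probes for other sites are unchanged, and the ratio of total hotspot-variant probe coverage to normal probe coverage in other regions is not less than 3:1, so that hotspot variants are enriched during capture.
(4) Low-frequency information analysis with positive/negative double-strand error correction (RealSeq Pipeline) proceeds as follows:
1) The sequence bases at the two ends of the insert serve as tags; an insert is a DNA fragment in the library that is ligated to the adapter primers, and paired-end sequencing of each fragment yields one read pair. Take the first 12 bp of read 1 and the first 12 bp of read 2 as tags and join them into one 24 bp index with the alphabetically smaller tag first; this 24 bp sequence serves as the index of the read pair. If the tag of read 1 comes first, the pair is labeled as the positive strand; if the tag of read 2 comes first, it is labeled as the negative strand;
2) Externally sort the indexes so that all duplicate reads derived from the same DNA template are grouped together;
3) Perform center-based clustering on the grouped reads that share an index: using the Hamming distance between sequences, split each large cluster sharing one index into several small clusters in which the Hamming distance between any two read pairs does not exceed 10, so that reads that share an index but come from different DNA templates are separated;
4) Screen the duplicate clusters of the same DNA template obtained in step 3); a cluster proceeds to subsequent analysis only if both its positive-strand and negative-strand reads number at least 2 pairs;
5) Correct errors in the clusters that satisfy the condition in 4) and generate one pair of error-free new reads. For each sequenced base of the DNA template, if a given base type reaches 80% agreement among the positive-strand reads and also 80% agreement among the negative-strand reads, that base is recorded in the new read; otherwise the position is recorded as N. This yields new reads that represent the original DNA template sequence;
6) Realign the new reads to the genome with the bwa mem algorithm and discard reads with a mapping quality below 30;
7) Compute statistics from the reads obtained in 6): the base-type distribution at every site in the capture region, the size of the covered target region, the average sequencing depth, the positive/negative strand pairing rate, and the low-frequency mutation rate;
8) Call SNV/InDel/SV/CNV: by comparing the patient sample with the control sample, call somatic SNVs with the mutect pipeline, somatic InDels with the gatk pipeline, CNVs with the contra.py pipeline, and SVs with the somVar pipeline;
The filtering parameters are: variant rate at the site in the control ≤ 2%; number of variant-supporting reads after error correction ≥ 2; mutation prediction p value ≤ 0.05;
9) Variant annotation: annotate the variant's function, the number of supporting reads, the variant frequency, the amino acid change, and the variant's status in existing variant databases.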
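The double-strand error correction of step 5) amounts to a per-position consensus with an 80% agreement requirement on each strand. The following is a minimal sketch under stated assumptions: reads in a cluster are already oriented to the same reference strand and have equal length, and the function name `strand_consensus` is hypothetical.

```python
from collections import Counter


def strand_consensus(plus_reads, minus_reads, min_agreement=0.8):
    """Build one consensus sequence from the reads of a duplex cluster.

    A base is kept only if it reaches min_agreement among the
    positive-strand reads AND among the negative-strand reads;
    otherwise the position is written as N.
    Assumes equal-length reads already oriented to the same strand.
    """
    if not plus_reads or not minus_reads:
        raise ValueError("both strands need at least one read")
    length = len(plus_reads[0])
    if any(len(r) != length for r in plus_reads + minus_reads):
        raise ValueError("all reads must have the same length")
    consensus = []
    for i in range(length):
        plus_counts = Counter(r[i] for r in plus_reads)
        minus_counts = Counter(r[i] for r in minus_reads)
        base, n_plus = plus_counts.most_common(1)[0]
        plus_ok = n_plus / len(plus_reads) >= min_agreement
        minus_ok = minus_counts[base] / len(minus_reads) >= min_agreement
        consensus.append(base if plus_ok and minus_ok else "N")
    return "".join(consensus)


if __name__ == "__main__":
    plus = ["ACGT", "ACGT", "ACGA"]   # last base agrees in only 2 of 3 reads
    minus = ["ACGT", "ACGT"]
    print(strand_consensus(plus, minus))
```

An error that appears on only one strand, such as a PCR or sequencing artifact, fails the agreement test on the other strand and is masked as N rather than reported as a variant.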
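Example 2: Establishing the enrichment sequencing method for low-frequency mutations in plasma ctDNA
1. Extraction of plasma ctDNA and library construction:
(1) Draw 1-2 tubes (5 mL per tube) of peripheral blood from the subject into EDTA anticoagulant tubes, invert gently 6-8 times to mix thoroughly (gently, to prevent cell rupture), and carry out the following processing within 4-6 hours on the day of collection. Centrifuge at 1600 g for 10 minutes at 4 °C, then aliquot the supernatant (plasma) into several 1.5 mL/2 mL centrifuge tubes, taking care not to aspirate the white blood cells in the middle layer. Centrifuge at 16000 g for 10 minutes at 4 °C to remove residual cells, and transfer the supernatant (plasma) to new 1.5 mL/2 mL centrifuge tubes without aspirating white blood cells from the bottom of the tube; this gives the separated plasma. After processing, store the separated plasma and the remaining blood cells at -80 °C and avoid repeated freeze-thaw cycles.
(2) Extraction and quantification of plasma cfDNA/ctDNA: take about 2-3 mL of the separated plasma and extract plasma cfDNA according to the instructions of the QIAamp Circulating Nucleic Acid Kit (Qiagen). Quantify the extracted DNA with Qubit (Invitrogen, Quant-iT™ dsDNA HS Assay Kit); the total amount is about 30-50 ng.
(3) Sample library preparation: the cfDNA extracted from plasma is put through a 3-step enzymatic reaction according to the KAPA LTP Library Preparation Kit instructions.
3.1 End repair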
Figure PCTCN2016074058-appb-000014
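Mix thoroughly and incubate at 20 °C for 30 min.
Then add 120 μL of Agencourt AMPure XP reagent for magnetic-bead purification, finally redissolve in 42 μL of ddH2O, and carry the beads into the next reaction.
3.2 A-tailing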
Figure PCTCN2016074058-appb-000015
Figure PCTCN2016074058-appb-000016
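Total volume
Mix thoroughly and incubate at 30 °C for 30 min.
Then add 90 μL of PEG/NaCl SPRI solution, mix thoroughly, perform magnetic-bead purification, finally redissolve in (35 − adapter volume) μL of ddH2O, and carry the beads into the next reaction.
3.3 Adapter ligation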
Figure PCTCN2016074058-appb-000017
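Mix thoroughly and incubate at 16 °C for 16 hours.
The adapter primers are the first and second adapter strands in Table 1. Then add 50 μL of PEG/NaCl SPRI solution twice, for 2 rounds of magnetic-bead purification, and finally redissolve in 25 μL of ddH2O.
2. Universal-library TT-COLD PCR:
1) Using the same instrument and reagents, fluorescent quantitative PCR is run on the ligated library from normal human plasma with the universal library primers; the reaction reagents include KAPA HiFi HotStart ReadyMix and SYBR dye. The Tm value (DNA melting temperature) of the library is obtained from melting-curve analysis, as shown in Figure 2; the universal library primers are listed in Table 1.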
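Melting-curve analysis conventionally takes Tm as the temperature at which fluorescence drops fastest, that is, the peak of −dF/dT. The source does not say how its instrument software computes Tm, so the sketch below is an assumption; `melting_temperature` and the example readings are hypothetical.

```python
def melting_temperature(temps, fluorescence):
    """Estimate Tm as the midpoint of the interval with the steepest
    fluorescence drop (largest -dF/dT) in a melting curve.

    temps: increasing temperatures in degrees C.
    fluorescence: SYBR signal measured at each temperature.
    """
    if len(temps) != len(fluorescence) or len(temps) < 2:
        raise ValueError("need two equal-length series with >= 2 points")
    best_tm, best_slope = None, float("-inf")
    for i in range(1, len(temps)):
        slope = -(fluorescence[i] - fluorescence[i - 1]) / (temps[i] - temps[i - 1])
        if slope > best_slope:
            best_slope = slope
            best_tm = (temps[i] + temps[i - 1]) / 2
    return best_tm


if __name__ == "__main__":
    # Hypothetical readings, not data from the patent.
    temps = [80, 81, 82, 83, 84, 85]
    signal = [100, 98, 95, 60, 20, 18]
    print(melting_temperature(temps, signal))
```

Real instruments usually smooth the curve before differentiating; this sketch skips smoothing to keep the idea visible.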
Table 1. Primer sequence information
Figure PCTCN2016074058-appb-000018
Figure PCTCN2016074058-appb-000019
Note: xxxxxxxx is the index tag.
2) Universal-library TT-COLD PCR: the reaction system is:
Figure PCTCN2016074058-appb-000020
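Mix thoroughly.
Bypassing the specific Tc value of each individual insert, all mutation types on all fragments in the library are enriched with the pair of universal library primers in Table 1 under one series of cycling conditions. Specifically, the empirical formula gives Tc(min) ≈ Tm − 2.5 °C; Tc is then raised stepwise in 0.5 °C increments, and FULL COLD PCR is performed at each Tc. The PCR program is given in Table 2.
Table 2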
Figure PCTCN2016074058-appb-000021
Figure PCTCN2016074058-appb-000022
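3. Probe enrichment capture and sequencing:
1) Design of the tumor enrichment probe chip:
Determine the capture interval of the chip from databases such as TCGA, ICGC and COSMIC and the relevant literature, following conventional design principles for chip capture probes;
Within the capture interval, with reference to TCGA, COSMIC and other relevant databases, identify the single most important hotspot variant site (SNV>3) in every 200 bp; for the multiple mutation types present at that site, take several major types as reference and set each type's share of the total probe coverage at the site according to its frequency of occurrence;
In the chip design, for the relevant hotspot variants, replace all probes originally designed from the reference (REF) with probes designed from the mutated base, leave the other probes unchanged, and keep the ratio of total hotspot-variant probe coverage to normal probe coverage in other regions at 3:1 or higher, so that hotspot variants are enriched during capture.
2) Quality control of the amplified library and enrichment probe capture, followed by amplification of the hybridization capture product and sequencing.
After the amplified library passed quality control, hybridization capture was performed with the above tumor enrichment probe chip following the instructions provided by the chip manufacturer (Roche). The product was finally eluted and redissolved in 21 μL of ddH2O, with the hybridization-elution magnetic beads retained.
Amplification system for the hybridization capture product: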
Figure PCTCN2016074058-appb-000023
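PCR reaction conditions: initial denaturation at 98 °C for 45 sec; 10 cycles of denaturation at 98 °C for 15 sec, annealing at 65 °C for 30 sec and extension at 72 °C for 30 sec; final extension at 72 °C for 60 sec; hold at 4 °C.
Flowcell Primer 1 and Primer 2 are primers supplied with the HiSeq sequencing platform; they are used to amplify the captured DNA template to obtain enough product for loading on the sequencer.
First remove the magnetic beads from the previous step, then add 50 μL of fresh Agencourt AMPure XP reagent for magnetic-bead purification, and finally redissolve in 25 μL of ddH2O for QC and sequencing. Sequencing was run on an Illumina HiSeq 2500 with the PE101+8+101 program, following the manufacturer's operating instructions (see the cBot documentation officially published by Illumina/Solexa).
4. Low-frequency information analysis with positive/negative double-strand error correction (RealSeq Pipeline method):
1) The sequence bases at the two ends of the insert serve as tags; an insert is a DNA fragment in the library that is ligated to the adapter primers, and paired-end sequencing of each fragment yields one read pair. Take the first 12 bp of read 1 and the first 12 bp of read 2 as tags and join them into one 24 bp index with the alphabetically smaller tag first; this 24 bp sequence serves as the index of the read pair. If the tag of read 1 comes first, the pair is labeled as the positive strand; if the tag of read 2 comes first, it is labeled as the negative strand;
2) Externally sort the indexes so that copies of the same DNA template are grouped together;
3) Perform center-based clustering on the grouped reads that share an index: using the Hamming distance between sequences, split each large cluster sharing one index into several small clusters in which the Hamming distance between any two read pairs does not exceed 10, so that reads that share an index but come from different DNA templates are separated;
4) Screen the duplicate clusters of the same DNA template obtained in step 3); a cluster proceeds to subsequent analysis only if both its positive-strand and negative-strand reads number at least 2 pairs;
5) Correct errors in the clusters that satisfy the condition in 4) and generate one pair of error-free new reads. For each sequenced base of the DNA template, if a given base type reaches 80% agreement among the positive-strand reads and also 80% agreement among the negative-strand reads, that base is recorded in the new read; otherwise the position is recorded as N. This yields new reads that represent the original DNA template sequence;
6) Realign the new reads to the genome with the bwa mem algorithm and discard reads with a mapping quality below 30;
7) Compute statistics from the reads obtained in 6): the base-type distribution at every site in the capture region, the size of the covered target region, the average sequencing depth, the positive/negative strand pairing rate, and the low-frequency mutation rate;
8) Call SNV/InDel/SV/CNV: by comparing the patient sample with the control sample, call somatic SNVs with the mutect pipeline, somatic InDels with the gatk pipeline, CNVs with the contra.py pipeline, and SVs with the somVar pipeline;
The filtering parameters are: variant rate at the site in the control ≤ 2%; number of variant-supporting reads after error correction ≥ 2; mutation prediction p value ≤ 0.05;
9) Variant annotation: annotate the variant's function, the number of supporting reads, the variant frequency, the amino acid change, and the variant's status in existing variant databases.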
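The three filtering parameters applied after variant calling in step 8) translate directly into a predicate. The sketch below shows only the thresholds named in the text; the `Candidate` record and its field names are hypothetical and do not correspond to the output format of mutect, gatk or any other caller.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical summary of one called variant."""
    control_variant_rate: float   # variant rate at this site in the control sample
    supporting_reads: int         # variant-supporting reads after error correction
    p_value: float                # mutation prediction p value


def passes_filters(c, max_control_rate=0.02, min_support=2, max_p=0.05):
    """Apply the thresholds from the text: control rate <= 2%,
    supporting reads >= 2, p value <= 0.05."""
    return (c.control_variant_rate <= max_control_rate
            and c.supporting_reads >= min_support
            and c.p_value <= max_p)


if __name__ == "__main__":
    calls = [
        Candidate(0.000, 3, 0.010),   # kept
        Candidate(0.050, 5, 0.001),   # also seen in the control: dropped
        Candidate(0.000, 1, 0.010),   # only one supporting read: dropped
    ]
    print([passes_filters(c) for c in calls])
```

The control-sample threshold removes germline and clonal hematopoiesis signals shared with the white blood cell DNA, which is why the control is extracted alongside the plasma.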
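Example 3: Early tumor screening
1. Chip design: based on the enrichment probe chip design principles, an early-screening chip for tumors (lung cancer, colorectal cancer, gastric cancer, breast cancer, kidney cancer, pancreatic cancer, ovarian cancer, endometrial cancer, thyroid cancer, cervical cancer, esophageal cancer, liver cancer, etc.), ONCOcare-ZS, was completed. The chip covers driver genes of common high-incidence cancers, frequently mutated genes and important genes in 12 cancer-related signaling pathways, totaling 227 genes, 680 Kb and 5220 hotspot variants. The gene list is given in Table 3.
Table 3. Gene list of the ONCOcare-ZS early-screening chip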
Figure PCTCN2016074058-appb-000024
Figure PCTCN2016074058-appb-000025
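2. Analysis of sequencing results
One patient with small pulmonary nodules was sequenced and analyzed by the method described in Example 1; the probe enrichment capture step used the ONCOcare-ZS chip of this example. The sequencing statistics are shown in Table 4:
Table 4. Sequencing results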
Figure PCTCN2016074058-appb-000026
Figure PCTCN2016074058-appb-000027
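Notes: Positive/negative strand pairing rate: the ratio of clusters with at least 3 reads that contain both strands to all clusters with at least 3 reads, used to evaluate strand pairing in the usable data. Effective data utilization: the ratio of the number of error-corrected reads from clusters meeting at least 2+/2- to the total number of reads. Low-frequency error-correction depth: the average coverage of bases in the target region after error correction of the valid data.
Result analysis: two driver mutations, TP53 p.[Val272Leu] and EGFR p.[Leu861Arg], were detected in the patient's plasma, indicating a relatively high cancer risk. Subsequent clinical pathology confirmed invasive adenocarcinoma, T1aN0M0, stage IA. In addition, conventional high-throughput sequencing of the corresponding tissue and plasma and plasma digital PCR validation gave the following results:
Table 5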
Figure PCTCN2016074058-appb-000028
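Example 4: Individualized tumor medication guidance
1. Chip design
Based on the enrichment probe chip design principles, a probe chip for guiding individualized tumor medication, ONCOcare-Drug, was completed. The chip covers frequently mutated genes in 12 common cancers, important genes in 12 cancer signaling pathways, and genes related to common targeted drugs and chemotherapy drugs, totaling 559 genes, 850 Kb and 2400 hotspot targeted-drug variants. The gene list is given in Table 6.
Table 6. Gene list of the ONCOcare-Drug individualized medication guidance chip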
Figure PCTCN2016074058-appb-000029
Figure PCTCN2016074058-appb-000030
Figure PCTCN2016074058-appb-000031
Figure PCTCN2016074058-appb-000032
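2. Analysis of sequencing results
One patient with advanced colorectal cancer was analyzed by the method described in Example 1; the probe enrichment capture step used the ONCOcare-Drug chip of this example. The sequencing statistics are shown in Table 7:
Table 7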
Figure PCTCN2016074058-appb-000033
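Notes: Positive/negative strand pairing rate: the ratio of clusters with at least 3 reads that contain both strands to all clusters with at least 3 reads, used to evaluate strand pairing in the usable data. Effective data utilization: the ratio of the number of error-corrected reads from clusters meeting at least 2+/2- to the total number of reads. Low-frequency error-correction depth: the average coverage of bases in the target region after error correction of the valid data.
Result analysis: a total of 6 non-synonymous mutations in exon regions were detected, all consistent with the variants found in tissue. Details are given in Table 8:
Table 8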
Figure PCTCN2016074058-appb-000034
Details of the chemotherapy-related sites are given in Table 9:
Table 9
Gene rsID Genotype Gene rsID Genotype
XPC rs2228001 GT MTHFR rs1801133 AA
TP53 rs1042522 CC CBR3 rs1056892 GG
XRCC1 rs25487 CC MTHFR rs1801133 AA
GSTP1 rs1695 AG ATIC rs4673993 TT
ERCC1 rs11615 GG MTRR rs1801394 AA
ERCC1 rs3212986 CC TP53 rs1042522 CC
MTHFR rs1801133 AA DPYD rs3918290 CC
SOD2 rs4880 AA DPYD rs67376798 TT
GSTP1 rs1695 AG TPMT rs1800460 CC
MTHFR rs1801133 AA TPMT rs1800462 CC
MTHFR rs1801131 TT TPMT rs1800584 CC
GSTP1 rs1695 AG UGT1A1 rs8175347 7TA/7TA
UMPS rs1801019 GG      
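Drug prediction: combining the above results with the targeted-drug and chemotherapy interpretation database gives the following conclusions, which are for the clinician's reference only when drawing up a treatment plan:
Table 10. Medication guidance for targeted drugs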
Figure PCTCN2016074058-appb-000035
Table 11. Medication guidance for chemotherapy drugs
Figure PCTCN2016074058-appb-000036
Figure PCTCN2016074058-appb-000037
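Example 5: Postoperative monitoring of twelve common cancers
1. Chip design
Based on the enrichment probe chip design principles, a postoperative monitoring chip for tumors (lung cancer, colorectal cancer, gastric cancer, breast cancer, kidney cancer, pancreatic cancer, ovarian cancer, endometrial cancer, thyroid cancer, cervical cancer, esophageal cancer, liver cancer, etc.), ONCOcare-JK, was completed. The chip covers driver genes of common high-incidence cancers, frequently mutated genes, important genes in 12 cancer-related signaling pathways, and so on, totaling 508 genes, 500 Kb and 4800 hotspot variants. The gene list is given in Table 12.
Table 12. Gene list of the ONCOcare-JK postoperative monitoring chip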
ABL1 CBLB DOT1L FGF7 IGF2 MSH2 PIK3CB SDHB TRAF7
ABL2 CBR1 DUSP6 FGFR1 IKBKB MSH3 PIK3CG SDHC TSC1
ACVR1B CCND1 EDNRA FGFR2 IKBKE MSH4 PIK3R1 SDHD TSC2
ACVR2A CCND2 EGFR FGFR3 IKZF1 MSH5 PIK3R2 SEMA3A TSHR
AJUBA CCND3 EGR3 FGFR4 IL7R MSH6 PLK1 SEMA3E TSHZ2
AKT1 CCNE1 EIF4A2 FLCN INHBA MSR1 PML SETBP1 TSHZ3
AKT2 CD79A ELAC2 FLT1 IRF4 MTOR PMS1 SETD2 TUBA1A
AKT3 CD79B ELF3 FLT3 IRS2 MUC1 PMS2 SF1 TUBB
ALK CDC25C EML4 FLT4 ITGB2 MUTYH PNRC1 SF3B1 TUBD1
ANGPT1 CDC42 EP300 FNTA JAK1 MYC POLQ SH2B3 TUBE1
ANGPT2 CDC73 EPHA2 FOXA1 JAK2 MYCL1 PPP2R1A SIN3A TUBG1
APC CDH1 EPHA3 FOXA2 JAK3 MYCN PRDM1 SLAMF7 TYR
AR CDK12 EPHA5 FOXL2 JUN NAV3 PRKCA SLC4A1 VEGFA
ARAF CDK2 EPHB1 FPGS KDR NBN PRKCB SLIT2 VEGFB
ARFRP1 CDK4 EPHB2 FUBP1 KEAP1 NCOA1 PRKCG SMAD2 VEZF1
ARID1A CDK6 EPHB6 FYN KIF1B NCOA2 PRKDC SMAD3 VHL
ARID1B CDK8 EPPK1 GAB2 KIF5B NCOR1 PRSS8 SMAD4 WISP3
ASXL1 CDKN1A ERBB2 GATA1 KIT NEK11 PSMB1 SMARCA1 WT1
ATM CDKN1B ERBB3 GATA2 KLF4 NF1 PSMB2 SMC1A WWP1
ATR CDKN2A ERBB4 GATA3 KLHL6 NF2 PSMB5 SMC3 XIAP
ATRX CDKN2B ERCC2 GID4 KRAS NOTCH1 PTCH1 SMO XPA
AURKA CDKN2C ERCC3 GNA11 LCK NOTCH2 PTCH2 SOCS1 XPC
AURKB CDX2 ERG GNA13 LIMK1 NOTCH3 PTEN SOX2 XPO1
AXIN1 CEBPA ESR1 GNAQ LRRK2 NOTCH4 PTP4A3 SOX9 XRCC3
AXIN2 CFLAR ETV1 GNAS MALAT1 NPM1 PTPN11 SPEN YES1
AXL CHD1 ETV6 GNRHR MAP2K1 NR3C1 PTPRD SPRY4 ZNF217
BACH1 CHD2 EWSR1 GPR124 MAP2K2 NRAS RAC1 SRC ZRSR2
BAK1 CHD4 EXT1 GRIN2A MAP2K4 NSD1 RAC2 SRD5A2  
BAP1 CHEK1 EXT2 GRM3 MAP3K1 NTRK1 RAD21 SRSF2  
BARD1 CHEK2 EZH2 GSK3B MAP3K13 NTRK2 RAD50 SSTR2  
BCL2 CHUK FAM46C H3F3A MAPK1 NTRK3 RAD51 STAG2  
BCL2A1 CIC FANCA H3F3C MAPK3 NUP93 RAF1 STAT4  
BCL2L1 CRBN FANCC HCK MAPK8 PAK3 RARA STAT5B  
BCL2L2 CREBBP FANCD2 HDAC1 MAX PAK7 RARB STK11  
BCL6 CRIPAK FANCE HDAC2 MC1R PALB2 RARG SUFU  
BCOR CRKL FANCF HDAC3 MCL1 PARP1 RB1 SUZ12  
BCORL1 CRLF2 FANCG HDAC4 MDM2 PARP2 REL SYK  
BCR CTCF FANCI HDAC6 MDM4 PARP3 RET TAF1  
BLM CTLA4 FANCL HDAC8 MED12 PARP4 RHEB TBX3  
BMPR1A CTNNA1 FANCM HGF MEF2B PCM1 RNF43 TEK  
BRAF CTNNB1 FAT3 HIF1A MEN1 PDGFRA ROBO1 TERT  
BRCA1 CUL4A FBXW7 HNF1A MET PDGFRB ROBO2 TET2  
BRCA2 CUL4B FCGR2A HRAS MITF PDK1 ROS1 TFG  
BRIP1 CYLD FCGR2B HRH2 MLH1 PHF6 RPA1 TGFBR2  
BTG1 DAXX FCGR2C IDH1 MLH3 PIGF RPL5 TIPARP  
BTK DDR1 FCGR3A IDH2 MLL PIK3C2A RPS14 TLR4  
CARD11 DDR2 FCGR3B IFNAR1 MLL2 PIK3C2B RXRA TOP1  
CASP8 DIS3 FGF3 IFNAR2 MLL3 PIK3C2G RXRB TOP2A  
CBFB DNMT1 FGF4 IGF1 MLL4 PIK3C3 RXRG TOP2B  
CBL DNMT3A FGF6 IGF1R MS4A1 PIK3CA SDHAF2 TP53  
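2. Analysis of sequencing results
One patient 3 months after surgery for lung adenocarcinoma was analyzed following the steps of Example 1; the probe enrichment capture step used the ONCOcare-JK chip of this example. The sequencing statistics are shown in Table 13:
Table 13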
Figure PCTCN2016074058-appb-000038
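Notes: Positive/negative strand pairing rate: the ratio of clusters with at least 3 reads that contain both strands to all clusters with at least 3 reads, used to evaluate strand pairing in the usable data. Effective data utilization: the ratio of the number of error-corrected reads from clusters meeting at least 2+/2- to the total number of reads. Low-frequency error-correction depth: the average coverage of bases in the target region after error correction of the valid data.
Result analysis: a total of 5 non-synonymous mutations in exon regions were detected. Details are given in Table 14:
Table 14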
Figure PCTCN2016074058-appb-000039
Figure PCTCN2016074058-appb-000040
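A total of 19 variants were detected, 5 of them non-synonymous mutations in exon regions; relative to the baseline of normal individuals, the number of detected variants is high. In addition, NOTCH1 p.N685T and PDGFRA p.M745I, which were present in the tissue, are still present in the postoperative plasma at relatively high levels, indicating that the patient may be at relatively high risk of recurrence after surgery. Clinical follow-up: the patient's disease has progressed. The results of conventional high-throughput sequencing of plasma and plasma digital PCR validation are shown in Table 15.
Table 15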
Figure PCTCN2016074058-appb-000041
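Industrial Applicability
The enrichment sequencing method for low-frequency mutations in cell-free target DNA in plasma provided by the invention achieves accurate low-frequency detection in plasma DNA from 5-10 mL of peripheral blood and is simple to operate and practical. It is highly sensitive, detecting variants at a frequency of 0.01% with high specificity. It is highly specific, detecting low-frequency variants accurately with an average specificity above 98%. It is high-throughput: the genes of interest can be scanned in a single run to obtain more comprehensive information about the subject and more accurate predictions, and many samples can be tested simultaneously in a short time, which lowers cost and facilitates clinical adoption. It is also applicable in multiple dimensions: it can fully exploit the application potential of plasma ctDNA and lays a solid foundation for early screening, postoperative monitoring and precision medicine in a variety of tumors, thereby advancing clinical tumor diagnosis.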
Figure PCTCN2016074058-appb-000042

Claims (18)

  1. 一种血浆中游离的目标DNA低频突变富集测序方法,包括以下步骤:
    (1)血浆中游离的目标DNA的提取与文库构建;
    (2)通用文库TT-COLD PCR扩增富集;
    (3)探针富集捕获、杂交捕获产物的扩增与上机测序;
    (4)正反双链纠错低频信息分析。
  2. 根据权利要求1所述的方法,其特征在于,步骤(1)所述的血浆来自人类外周血,文库构建方法按照3步酶促反应,即末端修复,加“A”和文库接头连接。
  3. 根据权利要求1所述的方法,其特征在于,步骤(2)通用文库TT-COLD PCR扩增富集包括以下步骤:
    1)确定文库的Tm值;
    2)绕过每个插入片段存在的特异Tc值,基于1对通用引物,在1个系列的循环条件下,对文库中所有片段上的各种突变类型进行富集;设定Tc min≈TM-2.5,之后Tc以0.5℃逐步递增,在每个Tc条件下分别进行FULL COLD PCR。
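  4. The method according to claim 3, characterized in that the Tm value of the library in step 1) is determined as follows: fluorescent quantitative PCR is performed with one pair of primers on an adapter-ligated library of cell-free target DNA from the plasma of healthy individuals, and the library Tm value is obtained from melting-curve analysis; the nucleotide sequences of the pair of primers are:
    Forward primer:
    AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT,
    Reverse primer:
    CAAGCAGAAGACGGCATACGAGATxxxxxxxxGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT, where xxxxxxxx is the index tag.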
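  5. The method according to claim 3, characterized in that the one pair of universal primers in step 2) is the universal library TT-COLD PCR primer pair, with the nucleotide sequences:
    Forward primer:
    AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT,
    Reverse primer:
    CAAGCAGAAGACGGCATACGAGATxxxxxxxxGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT, where xxxxxxxx is the index tag.
  6. The method according to claim 3, characterized in that the one series of cycling conditions is: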
    Figure PCTCN2016074058-appb-100001
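  7. The method according to any one of claims 1-6, characterized in that the probe enrichment capture in step (3) comprises: after the amplified library passes quality control, performing hybrid capture with an enrichment probe chip, PCR-amplifying the hybrid-capture products, and then sequencing;
    the enrichment probe chip is designed as follows: the capture region of the chip is determined according to the intended use of the target genes; with reference to the database to which the target DNA belongs, at least one most important hotspot variant site is determined within a given base range; for the multiple mutation types present at that site, several major types are taken as reference, and the occurrence frequency of each type determines its share of the total probe coverage at that site; for hotspot variants, the probes designed from the human genome reference sequence hg19 are replaced with probes designed from the mutated bases, while probes at other sites are left unchanged, and the ratio of the total hotspot-variant probe coverage to the normal probe coverage of other regions is not less than 3:1, so that hotspot variants are enriched during capture.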
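The frequency-weighted probe allocation described in claim 7 can be illustrated with a short Python sketch. It is only one reading of the claim: the function name, the use of raw occurrence counts as frequencies, and the idea of expressing coverage relative to a normal-region coverage of 1.0 are all assumptions made for the example, and the allele counts are invented.

```python
def allocate_hotspot_probes(type_counts, normal_coverage=1.0, ratio=3.0):
    """Split the probe coverage of one hotspot site among its mutation types.

    type_counts     -- {mutation type: occurrence count or frequency} for the
                       major mutation types at the site (hypothetical input)
    normal_coverage -- probe coverage used for ordinary (non-hotspot) regions
    ratio           -- hotspot-to-normal coverage ratio; the claim requires >= 3
    """
    if ratio < 3.0:
        raise ValueError("the claim requires a hotspot:normal ratio of at least 3:1")
    total = sum(type_counts.values())
    hotspot_total = normal_coverage * ratio
    # Each mutation type gets a share proportional to its occurrence frequency.
    return {alt: hotspot_total * n / total for alt, n in type_counts.items()}


if __name__ == "__main__":
    # Invented counts for three mutation types at one hotspot site.
    shares = allocate_hotspot_probes({"A": 60, "T": 30, "C": 10})
    for alt, cov in shares.items():
        print(alt, round(cov, 2))
```

With these invented counts the most frequent mutation type receives the largest share, and the shares together add up to three times the normal-region coverage.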
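  8. The method according to any one of claims 1-6, characterized in that the low-frequency information analysis with forward/reverse double-strand error correction in step (4) is specifically:
    1) based on the sequencing results, taking the first 12 bp of read 1 and the first 12 bp of read 2 of each read pair as tags, concatenating them in alphabetical order, smaller tag first, into one 24-bp index, and assigning the forward or reverse strand according to the order in which the tags are combined;
    2) externally sorting the indexes, so that all reads from the same DNA template are gathered together;
    3) performing center-based clustering on the gathered reads that share the same index: according to the Hamming distance between their sequences, each large cluster with the same index is divided into several small clusters, in each of which the Hamming distance between any two read pairs does not exceed 10, so as to separate reads that share an index but come from different DNA templates;
    4) screening the duplicate clusters of the same DNA template obtained in step 3): if the forward-strand reads and the reverse-strand reads each number 2 pairs or more, the cluster proceeds to subsequent analysis;
    5) error-correcting the clusters that satisfy the condition in 4) and generating a pair of new, error-free reads: for each sequenced base of the DNA template, if a given base type reaches 80% agreement among the forward-strand reads and also reaches 80% agreement among the reverse-strand reads, that base of the new read is recorded as this base type, otherwise it is recorded as N; this yields new reads representing the original DNA template sequence;
    6) realigning the new reads to the genome with the bwa mem algorithm and discarding reads with a mapping quality below 30;
    7) compiling statistics on the reads obtained in 6) to obtain the base-type distribution at every site in the capture region, and calculating the target region coverage size, average sequencing depth, forward/reverse strand pairing rate, and low-frequency mutation rate;
    8) calling SNV/InDel/SV/CNV: based on a comparison of the patient sample with the control sample, calling somatic SNVs with the mutect pipeline, somatic InDels with the gatk pipeline, CNVs with the contra.py pipeline, and SVs with the somVar pipeline;
    the filtering parameters used are: variant rate at the site in the control ≤ 2%; number of variant reads after error correction ≥ 2; mutation prediction p-value ≤ 0.05;
    9) variant annotation: annotating the function of each variant, the number of supporting variant reads, the variant frequency, the amino acid change, and the status of the variant in existing variant databases.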
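The per-base consensus rule of step 5) can be sketched in Python as follows. This is a minimal illustration rather than the applicant's implementation: it assumes the reads of one cluster have already been brought into the same orientation and have comparable lengths, and the function name and inputs are hypothetical.

```python
from collections import Counter


def duplex_consensus(forward_reads, reverse_reads, min_agree=0.8):
    """Build one consensus sequence from the reads of a single template cluster.

    A base is kept only if the same base type reaches min_agree (80% in the
    claim) among the forward-strand reads AND among the reverse-strand reads;
    otherwise the position is written as 'N'.
    Assumes min_agree > 0.5, so at most one base type can qualify per strand.
    """
    length = min(len(r) for r in forward_reads + reverse_reads)
    consensus = []
    for i in range(length):
        fwd = Counter(r[i] for r in forward_reads)
        rev = Counter(r[i] for r in reverse_reads)
        base, fwd_n = fwd.most_common(1)[0]
        fwd_ok = fwd_n / len(forward_reads) >= min_agree
        rev_ok = rev[base] / len(reverse_reads) >= min_agree
        consensus.append(base if fwd_ok and rev_ok else "N")
    return "".join(consensus)


if __name__ == "__main__":
    # Toy cluster: one forward read carries a likely sequencing error at the last base.
    print(duplex_consensus(["ACGT", "ACGT", "ACGA"], ["ACGT", "ACGT"]))
```

In the toy cluster only 2 of 3 forward reads agree at the last position, which is below 80%, so that position is masked as N instead of being reported as a variant.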
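  9. The method according to claim 8, characterized in that, in step 1), the sequence bases at the two ends of the insert fragment serve as tags; after paired-end sequencing, each fragment yields one read pair; the first 12 bp of read 1 and the first 12 bp of read 2 of the read pair are taken as tags and concatenated in alphabetical order, smaller tag first, into one 24-bp index, and this 24-bp sequence serves as the index of the read pair; if the tag of read 1 comes first, the pair is marked as forward strand; if the tag of read 2 comes first, it is marked as reverse strand.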
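The index construction of claim 9 amounts to a few lines of string handling, sketched here in Python. The function name and the example reads are hypothetical, and treating a tie (both tags identical) as forward strand is an assumption, since the claim does not address that case.

```python
def make_index(read1, read2, tag_len=12):
    """Return (24-bp index, strand label) for one read pair.

    The first tag_len bases of each read are the tags. The alphabetically
    smaller tag goes first; '+' means the read 1 tag came first (forward
    strand), '-' means the read 2 tag came first (reverse strand).
    """
    tag1, tag2 = read1[:tag_len], read2[:tag_len]
    if tag1 <= tag2:  # a tie is labelled '+' here (assumption)
        return tag1 + tag2, "+"
    return tag2 + tag1, "-"


if __name__ == "__main__":
    r1 = "ACGTACGTACGTTTTT"  # hypothetical read 1
    r2 = "TTGCATGCATGCAAAA"  # hypothetical read 2
    print(make_index(r1, r2))
    # Swapping the reads mimics the pair sequenced from the other strand.
    print(make_index(r2, r1))
```

Because both strands of one template produce the same 24-bp index with opposite labels, sorting by index brings them next to each other, which is what the later error-correction steps rely on.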
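  10. A kit for enrichment sequencing of low-frequency mutations in cell-free target DNA in plasma, characterized in that it contains an enrichment probe chip on which the probes designed from the human genome reference sequence hg19 are replaced with probes designed from mutated bases, while probes at other sites are left unchanged, and the ratio of the total hotspot-variant probe coverage to the normal probe coverage of other regions is at least 3:1;
    the method of designing probes from the mutated bases of the target DNA is: the capture region of the chip is determined according to the intended use of the target genes; with reference to the database to which the target DNA belongs, at least one most important hotspot variant site is determined within a given base range; for the multiple mutation types present at that site, several major types are taken as reference, and the occurrence frequency of each type determines its share of the total probe coverage at that site.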
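  11. A system for enrichment sequencing of low-frequency ctDNA mutations in plasma, comprising:
    (1) a plasma ctDNA library construction unit;
    (2) a universal library TT-COLD PCR amplification enrichment unit;
    (3) a probe enrichment capture unit and a unit for amplification of the hybrid-capture products and sequencing;
    (4) a low-frequency information analysis unit with forward/reverse double-strand error correction.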
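  12. The system according to claim 11, characterized in that the universal library TT-COLD PCR amplification enrichment unit of unit (2) achieves first-level mutation enrichment amplification of all variant types based on universal primers; the nucleotide sequences of the universal primers are:
    Forward primer:
    AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT,
    Reverse primer:
    CAAGCAGAAGACGGCATACGAGATxxxxxxxxGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT, where xxxxxxxx is the index tag.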
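  13. The system according to claim 11, characterized in that the probe enrichment capture unit of unit (3) achieves a second enrichment capture of hotspot variants by means of an enrichment probe chip, on which the probes originally designed from the human genome reference sequence hg19 are replaced with probes designed from mutated bases, while probes at other sites are left unchanged, and the ratio of the total hotspot-variant probe coverage to the normal probe coverage of other regions is at least 3:1;
    the principle for designing probes from ctDNA mutated bases is: the capture region of the chip is determined based on the TCGA, ICGC, and COSMIC databases; with reference to the TCGA, ICGC, and COSMIC databases, at least one most important hotspot variant site is determined within every 200 bp; for the multiple mutation types present at that site, several major types are taken as reference, and the occurrence frequency of each type determines its share of the total probe coverage at that site.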
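  14. The system according to any one of claims 11-13, characterized in that the low-frequency information analysis unit with forward/reverse double-strand error correction of unit (4) performs the following:
    1) the sequence bases at the two ends of the insert fragment serve as tags, the insert fragment being the DNA fragment in the library that is ligated to the adapter primers; after paired-end sequencing, each fragment yields one read pair; the first 12 bp of read 1 and the first 12 bp of read 2 of the read pair are taken as tags and concatenated in alphabetical order, smaller tag first, into one 24-bp index, and this 24-bp sequence serves as the index of the read pair; if the tag of read 1 comes first, the pair is marked as forward strand; if the tag of read 2 comes first, it is marked as reverse strand;
    2) externally sorting the indexes, so that all reads from the same DNA template are gathered together;
    3) performing center-based clustering on the gathered reads that share the same index: according to the Hamming distance between their sequences, each large cluster with the same index is divided into several small clusters, in each of which the Hamming distance between any two read pairs does not exceed 10, so as to separate reads that share an index but come from different DNA templates;
    4) screening the duplicate clusters of the same DNA template obtained in step 3): if the forward-strand reads and the reverse-strand reads each number 2 pairs or more, the cluster proceeds to subsequent analysis;
    5) error-correcting the clusters that satisfy the condition in 4) and generating a pair of new, error-free reads: for each sequenced base of the DNA template, if a given base type reaches 80% agreement among the forward-strand reads and also reaches 80% agreement among the reverse-strand reads, that base of the new read is recorded as this base type, otherwise it is recorded as N; this yields new reads representing the original DNA template sequence;
    6) realigning the new reads to the genome with the bwa mem algorithm and discarding reads with a mapping quality below 30;
    7) compiling statistics on the reads obtained in 6) to obtain the base-type distribution at every site in the capture region, and calculating the target region coverage size, average sequencing depth, forward/reverse strand pairing rate, and low-frequency mutation rate;
    8) calling SNV/InDel/SV/CNV: based on a comparison of the patient sample with the control sample, calling somatic SNVs with the mutect pipeline, somatic InDels with the gatk pipeline, CNVs with the contra.py pipeline, and SVs with the somVar pipeline;
    the filtering parameters used are: variant rate at the site in the control ≤ 2%; number of variant reads after error correction ≥ 2; mutation prediction p-value ≤ 0.05;
    9) variant annotation: annotating the function of each variant, the number of supporting variant reads, the variant frequency, the amino acid change, and the status of the variant in existing variant databases.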
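Steps 3) and 4) of the analysis (splitting reads that share an index by Hamming distance, then keeping only clusters supported on both strands) can be sketched in Python. The claims name "center-based clustering" without giving details, so the greedy grouping below is a stand-in chosen for brevity, not the applicant's algorithm; it enforces the stated condition that any two members of a small cluster differ at no more than 10 positions. All names and inputs are hypothetical, and sequences are assumed to be of equal length.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def split_clusters(seqs, max_dist=10):
    """Greedily split sequences sharing one index into small clusters.

    A sequence joins the first cluster in which it is within max_dist of
    every existing member; otherwise it starts a new cluster.
    """
    clusters = []
    for s in seqs:
        for cluster in clusters:
            if all(hamming(s, m) <= max_dist for m in cluster):
                cluster.append(s)
                break
        else:
            clusters.append([s])
    return clusters


def passes_strand_filter(strands, min_pairs=2):
    """Step 4): require at least min_pairs read pairs on each strand."""
    return strands.count("+") >= min_pairs and strands.count("-") >= min_pairs


if __name__ == "__main__":
    a, b, c = "A" * 30, "A" * 29 + "C", "C" * 30
    print(len(split_clusters([a, b, c])))           # a and b group together, c is separate
    print(passes_strand_filter(["+", "+", "-"]))    # only one reverse-strand pair
```

Checking each candidate against every cluster member is quadratic in cluster size, which is acceptable for the small duplicate clusters expected per index but would need a different strategy for very deep data.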
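  15. Use of the method according to any one of claims 1-9 or the system according to any one of claims 11-14 in the preparation of a kit for early disease screening.
  16. The use according to claim 15, characterized in that the disease is a tumor.
  17. Use of the method according to any one of claims 1-9 or the system according to any one of claims 11-14 in the preparation of a kit for postoperative disease monitoring.
  18. Use of the method according to any one of claims 1-9 or the system according to any one of claims 11-14 in the preparation of a kit for guiding medication for a disease.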
PCT/CN2016/074058 2015-08-10 2016-02-18 Low-frequency mutations enrichment sequencing method for free target DNA in plasma WO2017024784A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US15/751,722 US11001837B2 (en) 2015-08-10 2016-02-18 Low-frequency mutations enrichment sequencing method for free target DNA in plasma

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201510487759.1A CN105063208B (zh) 2015-08-10 2015-08-10 Low-frequency mutations enrichment sequencing method for free target DNA in plasma
CN201510487759.1 2015-08-10

Publications (1)

Publication Number Publication Date
WO2017024784A1 (zh) 2017-02-16

Family

ID=54492711

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2016/074058 WO2017024784A1 (zh) 2015-08-10 2016-02-18 Low-frequency mutations enrichment sequencing method for free target DNA in plasma

Country Status (4)

Country Link
US (1) US11001837B2 (zh)
CN (1) CN105063208B (zh)
HK (1) HK1216184A1 (zh)
WO (1) WO2017024784A1 (zh)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
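CN104153003A (zh) * 2014-08-08 2014-11-19 上海美吉生物医药科技有限公司 Method for constructing a large-fragment DNA library based on the Illumina sequencing platform
CN105063208A (zh) * 2015-08-10 2015-11-18 北京吉因加科技有限公司 Low-frequency mutations enrichment sequencing method for free target DNA in plasma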

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101457253B (zh) * 2008-12-12 2011-08-31 深圳华大基因研究院 Sequencing read error-correction method, system and device
EP2354243A1 (en) * 2010-02-03 2011-08-10 Lexogen GmbH Complexity reduction method
EP2545189B1 (en) * 2010-03-08 2018-01-10 Dana-Farber Cancer Institute, Inc. Full cold-pcr enrichment with reference blocking sequence
WO2013019361A1 (en) * 2011-07-07 2013-02-07 Life Technologies Corporation Sequencing methods
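CN103865993B (zh) * 2012-12-18 2018-09-25 深圳华大基因股份有限公司 Tumor targeted-drug efficacy detection, mutation enrichment method, primer pair and reagents
CN103320531B (zh) * 2013-06-22 2015-07-29 福建医科大学附属第一医院 Novel method for simultaneously detecting multiple HBV drug-resistance mutation sites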
US10913977B2 (en) * 2013-07-24 2021-02-09 Dana-Farber Cancer Institute, Inc. Methods and compositions to enable enrichment of minor DNA alleles by limiting denaturation time in PCR or simply enable enrichment of minor DNA alleles by limiting the denaturation time in PCR

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
JIN LI ET AL.: "Two-round coamplification at lower denaturation temperature-PCR (COLD-PCR)-based Sanger sequencing identifies a novel spectrum of low-level mutations in lung adenocarcinoma.", HUMAN MUTATION, vol. 30, no. 11, 31 December 2009 (2009-12-31) *
XU LIN-LIN ET AL.: "The principle and application of COLD-PCR.", JOURNAL OF BIOLOGY., vol. 29, no. 6, 31 December 2012 (2012-12-31), pages 84 - 85 *

Also Published As

Publication number Publication date
US20180371453A1 (en) 2018-12-27
CN105063208B (zh) 2018-03-06
HK1216184A1 (zh) 2016-10-21
CN105063208A (zh) 2015-11-18
US11001837B2 (en) 2021-05-11

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 16834412

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 30.05.2018)

122 Ep: pct application non-entry in european phase

Ref document number: 16834412

Country of ref document: EP

Kind code of ref document: A1