WO2022126938A1 - Method for detecting polynucleotide variation - Google Patents

Method for detecting polynucleotide variation

Info

Publication number
WO2022126938A1
Authority
WO
WIPO (PCT)
Prior art keywords
methylation
polynucleotide
hydroxymethylation
sequencing
analysis
Prior art date
Application number
PCT/CN2021/086104
Other languages
English (en)
French (fr)
Inventor
陈志伟
范建兵
Original Assignee
广州市基准医疗有限责任公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 广州市基准医疗有限责任公司
Publication of WO2022126938A1
Priority to US18/335,453 (US20240002953A1)

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6876 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
    • C12Q1/6883 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material
    • C12Q1/6886 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material for cancer
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00 ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B20/20 Allele or variant detection, e.g. single nucleotide polymorphism [SNP] detection
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00 ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
    • G16B40/20 Supervised data analysis
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q2600/00 Oligonucleotides characterized by their use
    • C12Q2600/118 Prognosis of disease development
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q2600/00 Oligonucleotides characterized by their use
    • C12Q2600/154 Methylation markers
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q2600/00 Oligonucleotides characterized by their use
    • C12Q2600/156 Polymorphic or mutational markers
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q2600/00 Oligonucleotides characterized by their use
    • C12Q2600/16 Primer sets for multiplex assays

Definitions

  • The invention belongs to the field of biotechnology, and in particular relates to a method for detecting polynucleotide variation by using methylation and hydroxymethylation surrogate markers.
  • Detecting and quantifying polynucleotide variations, i.e. somatic mutations (including single nucleotide variations (SNVs), insertions and deletions (InDels), fusions and copy number variations (CNVs)), is important for molecular biology and for medical applications such as diagnosis and prediction.
  • NGS next-generation sequencing
  • Liquid biopsy is a method for monitoring cell-free nucleic acids in samples from different bodily fluids. Compared with tissue specimens, it is less invasive, allows real-time monitoring during treatment, is simple enough to repeat frequently, and reduces and/or eliminates the influence of disease heterogeneity [Rossi, G. and M. Ignatiadis, Promises and pitfalls of using liquid biopsy for precision medicine. Cancer Research 2019. 79(11): p. 2798-2804.].
  • Methylation or hydroxymethylation at CpG sites is an epigenetic regulator of gene expression, often resulting in gene silencing or activation.
  • Extensive perturbation of DNA methylation has been noted in various diseases, especially in cancer, where it causes changes in gene regulation that promote cancer development [Das, P.M. and R. Singal, DNA methylation and cancer. Journal of Clinical Oncology 2004. 22(22): p. 4632-4642.].
  • Certain changes in methylation are repeatedly found in almost all specific types of cancer, and these changes show great potential as biomarkers for early screening, treatment response prediction, and prognosis. This suggests that it is reasonable and feasible to use methylation or hydroxymethylation biomarkers as a surrogate to detect polynucleotide variants, thereby circumventing the limitations of traditional detection techniques.
  • a method for detecting polynucleotide variation comprising the following steps:
  • the polynucleotide comprises DNA.
  • the polynucleotide comprises RNA.
  • the polynucleotide variations comprise single nucleotide variations (SNVs).
  • the polynucleotide variation comprises insertions and/or deletions (InDels).
  • the polynucleotide variations comprise fusion polynucleotides (Fusions).
  • the polynucleotide variations comprise copy number changes (CNVs).
  • the biological sample includes a biological fluid sample such as blood, serum, plasma, vitreous, sputum, urine, tears, sweat, saliva, and the like.
  • the biological sample comprises a tissue sample.
  • the biological sample comprises a cell line sample.
  • the separation methods include phenol and/or chloroform-based DNA extraction methods, magnetic bead separation, and silica column separation.
  • the methylation and/or hydroxymethylation identification and characterization uses methylation-specific PCR methods.
  • the methylation and/or hydroxymethylation identification and characterization is detected using the MassARRAY (Agena) method.
  • the methylation and/or hydroxymethylation is identified and characterized using microarray hybridization technology.
  • The methylation and/or hydroxymethylation identification and characterization uses sequencing-based methods, preferably combined with bisulfite treatment, to analyze the 5-methylcytosine or 5-hydroxymethylcytosine distribution using whole-genome bisulfite sequencing or targeted methylation sequencing.
  • Methods for inferring and/or determining polynucleotide variations include bioinformatics analysis, including determining optimal biomarkers by Spearman or Pearson correlation analysis and/or by modeling, preferably using random forest, LASSO regression, logistic regression or deep-learning networks.
  • The method also includes a simple quantitative detection step after the methylation and/or hydroxymethylation biomarkers have been identified and characterized. For candidate markers selected by the high-throughput method, the quantitative detection methods include methylation-specific primer extension methods, methylation-specific PCR (MSP), methylation-specific qPCR analysis, MassARRAY, targeted methylation sequencing, etc.
  • The genes carrying single nucleotide variations include AKT1, ALK, APC, AR, ARF, ARID1A, ATM, BRAF, BRCA1, BRCA2, CCND1, CCND2, CCNE1, CDH1, CDK4, CDK6, CDKN2A, CTNNB1, DDR2, EGFR, ERBB2, ESR1, EZH2, FBXW7, FGFR1, FGFR2, FGFR3, GATA3, GNA11, GNAQ, GNAS, HNF1A, HRAS, IDH1, IDH2, JAK2, JAK3, KIT, KRAS, MEK1, MEK2, ERK2, ERK1, MET, MLH1, MPL, MTOR, MYC, NF1, NFE2L2, NOTCH1, NPM1, NRAS, NTRK1, NTRK3, PDGFRA, PIK3CA, PTEN, PTPN11, RAF1, RB1, RET, RHEB, RHOA, RIT1, ROS
  • The inserted and/or deleted polynucleotides include at least one of the ATM, APC, ARID1A, BRCA1, BRCA2, CDH1, CDKN2A, EGFR, ERBB2, GATA3, KIT, MET, MLH1, MTOR, NF1, PDGFRA, PTEN, RB1, SMAD4, STK11, TP53, TSC1 and VHL genes.
  • the fused polynucleotide comprises at least one of the ALK, FGFR2, FGFR3, NTRK1, RET, ROS1, EML4 genes.
  • The copy number variant polynucleotides include AR, BRAF, CCND1, CCND2, CCNE1, CDK4, CDK6, EGFR, ERBB2 (HER2), FGFR1, FGFR2, KIT, KRAS, MET, MYC, PDGFRA, PIK3CA, RAF1 and other genes.
  • the inferring and/or determining the polynucleotide variation comprises detection of ERBB2 (HER2) gene amplification (CNV).
  • the present invention detects polynucleotide variation by using methylation and hydroxymethylation surrogate markers.
  • The polynucleotide variation detection method of the present invention, as a non-invasive auxiliary diagnostic method in cancer precision medicine, can be applied to samples from various sources and is particularly effective for identifying surrogate biomarkers in blood. With the detection method of the present invention, only 1 ng of cell-free DNA (equivalent to a 0.5 mL blood sample) extracted from plasma or serum is needed to complete the detection, and the polynucleotide variation detection of the present invention can be used for disease detection, prediction, precision treatment or postoperative monitoring.
  • Fig. 1: Flow chart of the method for detecting polynucleotide variation in an embodiment of the present invention.
  • Fig. 2: Flow chart of the non-invasive methylation-based detection of ERBB2 (HER2) amplification in gastric cancer.
  • Fig. 3: Identification of methylation biomarkers associated with ERBB2 (HER2) amplification from gastric cancer tissue samples.
  • Fig. 4: Simplified procedure of the methylation-specific qPCR assay for the detection of ERBB2 (HER2) amplification.
  • Fig. 7: Effectiveness of methylation-specific qPCR in detecting ERBB2 (HER2) amplification in gastric cancer plasma.
  • Fig. 8: Average AUC of the test set results of the logistic regression modeling analysis in Example 2.
  • Fig. 9: Average AUC of the test set results of the random forest modeling analysis in Example 2.
  • Fig. 10: Average AUC of the test set results in Example 3.
  • Fig. 11: Average AUC of the test set results in Example 4.
  • Fig. 12: Average AUC of the test set results in Example 5.
  • The term "plurality" used in the present invention means two or more.
  • "And/or" which describes the association relationship of the associated objects means that there can be three kinds of relationships, for example, A and/or B, which can mean that A exists alone, A and B exist at the same time, and B exists alone.
  • the character "/" generally indicates that the associated objects are an "or" relationship.
  • The terms "first" and "second" used in the present invention only distinguish names and do not represent a specific number or order.
  • the present invention provides a method for detecting polynucleotide variations including single nucleotide variations (SNVs), insertions and deletions (InDels), fusions (Fusions) and copy number changes (CNVs) in biological samples.
  • The method includes sample preparation, i.e. extraction and isolation of nucleic acids from biological samples; subsequent high-throughput methylation and/or hydroxymethylation analysis of the polynucleotides by techniques known in the art; and application of bioinformatics tools to identify the methylation and/or hydroxymethylation markers with the best correlation and/or to build models that infer single nucleotide variations, insertions and deletions, fusions and copy number changes.
  • The method may also include a database or collection of different methylation and/or hydroxymethylation signatures for various diseases as an additional reference to aid in the detection of methylation and/or hydroxymethylation biomarkers, and a subsequent simplification and optimization of the detection techniques for quantifying the methylation and/or hydroxymethylation surrogate biomarkers. The present invention therefore provides a method for detecting polynucleotide variation (Fig. 1), which can be used for early diagnosis, companion diagnosis and prognosis of genetic diseases.
  • polynucleotide as used herein includes any related biopolymer.
  • The polynucleotides include, but are not limited to: DNA, RNA, amplicon, cDNA, dsDNA, ssDNA, plasmid DNA, cosmid DNA, high molecular weight (MW) DNA, chromosomal DNA, genomic DNA, viral DNA, bacterial DNA, mtDNA (mitochondrial DNA), mRNA, rRNA, tRNA, nRNA, siRNA, snRNA, scaRNA, microRNA, dsRNA, ribozymes, riboswitches, and viral RNA (e.g., retroviral RNA).
  • A biological sample can come from a variety of sources, including human, mammalian, non-human mammalian, ape, monkey, chimpanzee, reptile, amphibian, or avian sources, and can take any form, such as: 1) tissue-based material, including but not limited to fresh frozen tissue and formalin-fixed paraffin-embedded (FFPE) tissue specimens; 2) body fluid material from animal body fluids, including but not limited to blood, serum, plasma, vitreous, sputum, urine, tears, sweat, saliva, semen, mucosal excretions, mucus, spinal fluid, amniotic fluid and lymph, where the cell-free polynucleotides may originate from the fetus (via fluids taken from a pregnant subject) or from the subject's own tissue; 3) cell lines.
  • Isolation, purification and preparation of polynucleotides can be performed by various techniques known in the art. Suitable methods include those described in the Examples herein, as well as variants of these methods, including but not limited to treatment with proteinase K followed by phenol and/or chloroform extraction [Laird, P.W., et al., Simplified mammalian DNA isolation procedure. Nucleic Acids Research 1991. 19(15): p. 4293], and column-based or particle-based (bead-based) separation methods using commercial kits from Sigma-Aldrich, Life Technologies, Qiagen, Promega, Affymetrix, IBI and similar companies. Kits and extraction methods may also be non-commercial.
  • Polynucleotides such as cell-free DNA are first extracted by separation techniques that separate the cell-free DNA from cells and other insoluble components of the biological sample.
  • separation techniques may include, but are not limited to, techniques such as centrifugation or filtration.
  • DNA can be precipitated using isopropanol precipitation after adding buffer and other washing steps specific to different kits. Further washing steps, such as silica gel columns, can then be used to remove contaminants or salts. This common procedure can be optimized for specific applications. The purpose of this step is to allow purification of DNA or RNA from a larger number of samples and to increase the amount of detectable polynucleotide material (DNA or RNA in most cases) to facilitate analysis and improve accuracy.
  • After isolation and prior to analysis by downstream high-throughput analytical techniques (e.g., sequencing-based methods), polynucleotides may be premixed with one or more additional materials or reagents (e.g., ligases, proteases, restriction enzymes, polymerases, etc.).
  • isolated nucleotides in the sample can also be amplified.
  • Standard nucleic acid amplification systems can be used, including PCR, ligase chain reaction, nucleic acid sequence-based amplification (NASBA), isothermal amplification methods (e.g., multiple displacement amplification (MDA), helicase-dependent amplification (HDA)), branched-chain DNA methods, etc.
  • the preferred amplification method usually involves PCR.
  • After a polynucleotide has been extracted and isolated from a biological sample, it is treated so that it can be determined whether the polynucleotide is methylated at a given site.
  • This treatment can be of any type, including chemical or enzymatic conversion methods. Preferred chemical conversion methods include treatment with commercial or non-commercial bisulfite reagents.
  • The enzymatic conversion method can be a commercial or non-commercial TET-APOBEC-based conversion method. After conversion, methylation analysis can be performed to determine the methylation status of multiple CpG sites in the polynucleotide sequence.
  • Biotechnologies known in the art can be used, including but not limited to: 1) microarray hybridization technology, such as Illumina's Infinium HumanMethylation450 BeadChip (HM450K), Infinium CytoSNP-850K BeadChip or any custom-designed array (Affymetrix), etc. [Sandoval, J., et al., Validation of a DNA methylation microarray for 450,000 CpG sites in the human genome. Epigenetics 2011. 6(6): p. 692-702.]; 2) sequencing-based methods combined with bisulfite treatment to analyze the 5-methylcytosine distribution.
  • Sequencing methods may include, but are not limited to: Sanger sequencing, high-throughput sequencing, pyrosequencing, sequencing-by-synthesis, single-molecule sequencing, nanopore sequencing, semiconductor sequencing, sequencing by ligation, sequencing by hybridization, digital gene expression (Helicos), next-generation sequencing, single molecule sequencing by synthesis (SMSS) (Helicos), massively parallel sequencing, clonal single molecule arrays (Solexa/Illumina), shotgun sequencing, Maxam-Gilbert sequencing, primer walking, sequencing using PacBio, SOLiD, Ion Torrent or Nanopore platforms, and any other sequencing method known in the art.
  • a sequencing method can include multiple sample processing units.
  • A sample processing unit may include, but is not limited to, multi-channel, multi-well, or other devices that can process multiple sample sets simultaneously. Additionally, the sample processing unit may include multiple sample chambers to allow simultaneous processing of multiple samples. In some embodiments, cell-free polynucleotides of multiple different types can be sequenced. Nucleic acids can be polynucleotides or oligonucleotides, including but not limited to DNA or RNA.
  • Bioinformatics tools involve two parts: 1) Converting the raw data of the high-throughput platform into relative quantitative assays, which will allow downstream computation and analysis of changes.
  • Such bioinformatic tools are established in the art. Array-based data, such as data from Illumina's HM450K, typically quantify the relative abundance of methylated and unmethylated sites as fluorescence intensities, and the conversion can be performed with the software provided by Illumina. Bisulfite-converted data, such as data from whole-genome bisulfite sequencing or targeted methylation bisulfite sequencing, involve methylation calls for individual cytosines, and statistical tests are required to assess differential methylation. Processing includes sequencing adapter trimming, sequencing read quality assessment, alignment to a reference genome, and calculation of methylation levels.
  • Cutadapt: adapter trimming [Martin, M., Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet.journal 2011. 17(1): p. 10-12.]
  • Bismark: alignment and methylation calling [Krueger, F. and S.R. Andrews, Bismark: a flexible aligner and methylation caller for Bisulfite-Seq applications. Bioinformatics 2011. 27(11): p. 1571-1572.]
  • UCSC Genome Browser: data visualization
  • MethGo: post-alignment analysis
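The last processing step, calculating methylation levels from the methylation calls, can be sketched as follows. This is a minimal illustration, not the pipeline used in the patent. It assumes the calls are available as tab-separated lines in the layout of Bismark's coverage output (chromosome, start, end, methylation percentage, methylated count, unmethylated count). The function name `methylation_levels`, the `min_depth` cutoff of 10 reads and the example coordinates are hypothetical.

```python
from typing import Dict, Iterable, Tuple


def methylation_levels(
    cov_lines: Iterable[str], min_depth: int = 10
) -> Dict[Tuple[str, int], float]:
    """Return {(chrom, position): methylated fraction} for well-covered sites.

    Each input line is expected to hold six tab-separated fields:
    chrom, start, end, methylation %, methylated count, unmethylated count.
    Sites with fewer than `min_depth` reads (an assumed cutoff) are skipped.
    """
    levels: Dict[Tuple[str, int], float] = {}
    for line in cov_lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        chrom, start, _end, _pct, n_meth, n_unmeth = line.split("\t")
        meth, unmeth = int(n_meth), int(n_unmeth)
        depth = meth + unmeth
        if depth < min_depth:
            continue  # too few reads for a stable estimate
        # Recompute the fraction from the counts instead of trusting the % column.
        levels[(chrom, int(start))] = meth / depth
    return levels


if __name__ == "__main__":
    # Hypothetical coordinates and counts, for illustration only.
    example = [
        "chr17\t39700000\t39700000\t80.0\t16\t4",
        "chr17\t39700050\t39700050\t50.0\t1\t1",
    ]
    print(methylation_levels(example))
```

Only the first example site passes the depth filter; the second has two reads and is dropped. Recomputing the fraction from the two count columns keeps the result independent of how the percentage column was rounded.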
  • Beta values typically estimate methylation levels as the ratio of intensities between methylated and unmethylated alleles; 2) Determining the best biomarkers for characterizing DNA mutations and other variants, which can be done by simple correlation analysis using a single biomarker (such as Spearman analysis or Pearson analysis), or by modeling with multiple markers simultaneously, such as random forest regression [Liaw, A. and M.J.R.n.
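The single-biomarker correlation analysis mentioned above can be sketched as below. This is an illustrative example under stated assumptions, not the inventors' implementation: each candidate marker is assumed to have one beta value per sample, the variant status is assumed to be coded 0/1 per sample, and markers are ranked by the absolute Spearman coefficient. The helper names (`average_ranks`, `spearman`, `rank_markers`) and the example marker names and values are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values receive the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation = Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


def rank_markers(
    beta_by_marker: Dict[str, Sequence[float]], variant_status: Sequence[int]
) -> List[Tuple[str, float]]:
    """Sort markers by |Spearman rho| against the variant status, strongest first."""
    scored = [(name, spearman(betas, variant_status))
              for name, betas in beta_by_marker.items()]
    return sorted(scored, key=lambda item: abs(item[1]), reverse=True)


if __name__ == "__main__":
    # Hypothetical beta values for two markers across four samples.
    betas = {"marker_A": [0.1, 0.2, 0.8, 0.9], "marker_B": [0.5, 0.4, 0.5, 0.6]}
    status = [0, 0, 1, 1]  # 1 = variant present (assumed coding)
    for name, rho in rank_markers(betas, status):
        print(name, round(rho, 3))
```

Ranking by the absolute coefficient keeps markers that are hypomethylated in variant-positive samples alongside hypermethylated ones. Multi-marker modeling such as random forest or LASSO would replace `rank_markers` with a fitted model and is not shown here.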
  • Targeted methylation and/or hydroxymethylation patterns can be optimized into simple quantitative detection methods using existing techniques, including but not limited to oligonucleotide arrays, MassARRAY, MS-based primer extension methods, methylation-specific PCR (MSP) and methylation-specific qPCR analysis.
  • MSP is a mature technology for detecting the degree of methylation in selected gene sequences [Herman, J.G., et al., Methylation-specific PCR: a novel PCR assay for methylation status of CpG islands. Proceedings of the National Academy of Sciences 1996. 93(18): p. 9821-9826.].
  • Methylation-specific qPCR detection is a high-throughput quantitative methylation detection method that uses fluorescence-based real-time PCR (TaqMan®) with PCR primers that distinguish methylated from unmethylated DNA. No further operations such as electrophoresis or hybridization are required after amplification, which reduces contamination and operational errors [Eads, C.A., et al., MethyLight: a high-throughput assay to measure DNA methylation. Nucleic Acids Research 2000. 28(8): p. e32.].
  • This real-time quantitative PCR reaction includes methylation-sensitive probes complementary to the methylation sites to be detected, such as TaqMan probes.
  • When the TaqMan probe specific for bisulfite-converted methylated DNA hybridizes with the template, a fluorescent signal is released. The signal intensity is proportional to the amount of PCR product, from which the methylation degree of the sample can be calculated.
  • Gastric cancer is the fifth most common cancer in the world and the second in Asia.
  • The human epidermal growth factor receptor 2 (ERBB2 (HER2)) gene is amplified or overexpressed in 9% to 38% of gastric cancer patients [Rüschoff, J., et al., HER2 testing in gastric cancer: a practical approach. Modern Pathology 2012. 25(5): p. 637-650.].
  • The phase III ToGA trial of trastuzumab in gastric cancer showed that the combination of chemotherapy and trastuzumab, a monoclonal HER2-inhibiting antibody, improved survival compared with chemotherapy alone [Van Cutsem, E., et al., Efficacy results from the ToGA trial: A phase III study of trastuzumab added to standard chemotherapy (CT) in first-line human epidermal growth factor receptor 2 (HER2)-positive advanced gastric cancer (GC). Journal of Clinical Oncology 2009. 27(18_suppl): p. LBA4509.].
  • Tumor tissue should be assessed for HER2 overexpression and/or amplification by immunohistochemistry (IHC) and fluorescence or silver in situ hybridization (FISH or SISH) [Carlson, R.W., et al., HER2 testing in breast cancer: NCCN Task Force report and recommendations. Journal of the National Comprehensive Cancer Network 2006. 4(S3): p. S-1-S-22].
  • The IHC method is more popular for detecting HER2 protein expression because of its cost and ease of operation.
  • The FISH/SISH method is the gold standard for detecting the CNV status of the HER2 gene.
  • the present invention provides a non-invasive method for HER2 amplification analysis in liquid biopsy based on methylation technology (Fig. 2).
  • Tissue FFPE and plasma samples of gastric cancer patients were obtained from the Department of Pathology, Southern Medical University, Guangzhou. The project was approved by the Southern University Medical Ethics Committee. Patient informed consent was obtained in each case. 2-5 FFPE slide samples were collected from each patient postoperatively, and 3-5 mL of plasma was collected from each patient using a vacuum blood collection tube (BD, Cat#367525) prior to surgery. The HER2 amplification status of each patient, determined by immunohistochemical staining, was obtained from the hospital's official pathology report.
  • Tissue genomic DNA was isolated from FFPE tissue samples using the QIAamp DNA FFPE Tissue Kit (Qiagen, Cat#56404).
  • Cell-free DNA (cfDNA) was isolated from plasma using the QIAamp Circulating Nucleic Acid Kit (Qiagen, Cat#55114). Repeated freezing and thawing of plasma was avoided to prevent cfDNA degradation.
  • The concentration and quality of cfDNA were determined using the Qubit dsDNA HS Assay Kit (Thermo Fisher Scientific, Cat#Q32854) and a 2100 Bioanalyzer (Agilent) with the Agilent High Sensitivity DNA Kit (Cat#5067-4626). Sequencing library construction was performed on cfDNA with yields greater than 3 ng and without excessive genomic DNA contamination.
  • Bisulfite conversion of cfDNA was performed using Zymo Lightning Conversion Reagent (Zymo Research, Cat#D5031) following the kit instructions. The sample was passed through a Zymo-Spin™ IC column, washed and desulphonated, and the bisulfite-converted DNA was eluted twice with M-Elution Buffer to a final volume of 17 μL.
  • Genomic DNA was fragmented to ~200 bp (peak size) using an M220 focused sonicator (Covaris, Inc.) according to the manufacturer's instructions, after which 800 ng of purified fragmented genomic DNA was used for bisulfite conversion. After bisulfite conversion and purification, bisulfite-converted DNA was quantified by NanoDrop (Thermo Fisher Scientific) at A260. Then, 150 ng of the bisulfite conversion product was used for library preparation of FFPE tissue samples.
  • NGS pre-library preparation was accomplished using the AnchorDx EpiVisio™ Methylation Library Preparation Kit (AnchorDx, Cat#A0UX00019) and the AnchorDx EpiVisio™ Indexed PCR Kit (AnchorDx, Cat#A2DX00025).
  • the amplified DNA was purified using 1:6 Agencourt AMPure XP magnetic beads (Beckman Coulter, Cat#A63882).
  • the amplified pre-libraries were purified with XP magnetic beads after ligation of the 3' end adapters of the reverse complementary DNA and indexing PCR (i5 and i7).
  • Pre-hybridization libraries containing more than 800 ng of DNA can be used for subsequent targeted enrichment analysis.
  • Target enrichment was performed using the AnchorDx EpiVisio™ Target Enrichment Kit (AnchorDx, Cat#A0UX00031) and a methylation panel, the AnchorDx BrGcMet panel. A total of 1000 ng of DNA from up to 4 pre-hybridization libraries was pooled for targeted enrichment with the AnchorDx BrGcMet methylation panel.
  • The AnchorDx BrGcMet panel includes 12,892 preselected regions enriched for cancer-specific methylation, and the targeted genomic regions cover 123,269 CpG sites in total.
  • the enriched library was sequenced by the Illumina HiSeq X-Ten sequencing system according to the instructions.
  • The beta value (β) was defined by the ratio of methylated and unmethylated allele intensities and was used to estimate methylation levels. Beta values are between 0 and 1, with 0 being unmethylated and 1 being fully methylated [Du, P., et al., Comparison of Beta-value and M-value methods for quantifying methylation levels by microarray analysis. BMC Bioinformatics 2010. 11(1): p. 587.].
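Written out, the beta value described above can be expressed as follows. This is a sketch of the usual formulation rather than a formula quoted from the patent: the symbols M_i and U_i (methylated and unmethylated signal at CpG site i, either array intensities or read counts) and the optional offset α are notation introduced here, and the text itself does not say whether an offset was used.

```latex
% Beta value at CpG site i.
% M_i, U_i : methylated / unmethylated signal (intensities or read counts).
% \alpha   : optional non-negative offset that stabilizes low-signal sites
%            (an assumption; \alpha = 0 gives the plain ratio).
\beta_i = \frac{M_i}{M_i + U_i + \alpha},
\qquad M_i, U_i \ge 0,\ \alpha \ge 0
\quad\Longrightarrow\quad 0 \le \beta_i \le 1
```

For example, with α = 0, a site covered by 16 methylated and 4 unmethylated reads has β = 16 / 20 = 0.8.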
  • Methylation markers were designed and optimized, and methylation-specific qPCR analysis (AnchorDx, China) was performed according to the instruction manual.
  • EpiTect PCR Control DNA Set (Qiagen, Germany) was set as positive control and negative control.
  • qPCR reactions were performed on a QuantStudio 3 real-time PCR system (Thermo Fisher, USA) using an EpiMark qPCR reaction system (NEB, Cat#M0490) under the following cycling conditions: denaturation at 98°C for 30s, followed by 40 cycles of 95°C for 10s and 62°C for 20s.
  • the recommended amount of bisulfite-converted cfDNA is 10 ng. All purified cfDNA was used for bisulfite conversion when the cfDNA yield was between 1 and 10 ng. After bisulfite conversion, we used all bisulfite-converted cfDNA for subsequent methylation-specific qPCR detection.
  • An artificial ΔCt of 35 was assigned for regions with uncertain Ct values.
  • HER2 status was scored from the methylation biomarker ΔCt values (scores were based on a linear regression model of 2 methylation markers). Based on this score, we found significant differences between HER2+ and HER2- gastric cancer cell lines and between HER2+ and HER2- breast cancer cell lines (Figure 6).
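The scoring step can be sketched as below, under several assumptions that the text does not confirm. ΔCt is taken here as the marker Ct minus the Ct of a reference assay run on the same sample; the text does not state how ΔCt was computed. The fallback value of 35 is the one given in the text. The weights and intercept of the two-marker linear model are placeholders, because the fitted coefficients are not given. All function names are hypothetical.

```python
from typing import Optional, Sequence

# Value assigned in the text when a Ct value is uncertain.
UNDETERMINED_DELTA_CT = 35.0


def delta_ct(marker_ct: Optional[float], reference_ct: Optional[float]) -> float:
    """Delta Ct of a methylation marker relative to a reference assay.

    Assumption: delta Ct = marker Ct - reference Ct. A missing (None) Ct on
    either side is treated as uncertain and mapped to the fallback value.
    """
    if marker_ct is None or reference_ct is None:
        return UNDETERMINED_DELTA_CT
    return marker_ct - reference_ct


def linear_score(
    delta_cts: Sequence[float], weights: Sequence[float], intercept: float
) -> float:
    """Score = intercept + sum(weight_k * delta_Ct_k) over the markers."""
    if len(delta_cts) != len(weights):
        raise ValueError("one weight is required per marker")
    return intercept + sum(w * d for w, d in zip(weights, delta_cts))


if __name__ == "__main__":
    # Hypothetical Ct readings for two markers and a reference assay.
    d1 = delta_ct(30.0, 25.0)   # marker 1 detected
    d2 = delta_ct(None, 25.0)   # marker 2 undetermined, falls back to 35
    # Placeholder coefficients; real values would come from the fitted model.
    print(linear_score([d1, d2], weights=[1.0, -1.0], intercept=0.0))
```

Keeping the fallback in one named constant makes it easy to check how sensitive the score is to the choice of 35 for undetermined reactions.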
  • LASSO Least Absolute Shrinkage and Selection Operator
  • RF Random Forest
  • LR Linear Regression
  • Example 2 Analysis of insertion and/or deletion mutations (InDels) in the ERBB2 gene in lung cancer tissue samples using methylation biomarkers
  • Lung cancer is one of the cancers with the highest morbidity and mortality in the world, ranking first among all cancers in China.
  • the ERBB2 (HER2) gene belongs to the human epidermal growth factor receptor family (HER), and HER2 gene mutations are widely found in many solid tumors, including breast, gastric, and lung cancers.
  • ERBB2 mutation is one of the common driver mutation genes in lung cancer, which can be detected in 2-4% of lung cancers, with exon 20/insertion mutation (INDEL) being the most common, which can activate kinase activity and downstream signaling pathways to promote Cell survival and tumorigenesis [Wang SE, et al. HER2 kinase domain mutation results in constitutive phosphorylation and activation of HER2 and EGFR and resistance to EGFR tyrosine kinase inhibitors. 2006;10(1):25-38.].
  • FFPE tissue from lung cancer patients was collected at the First Affiliated Hospital of Guangzhou Medical University, Guangzhou.
  • The project was approved by the Medical Ethics Committee of the First Affiliated Hospital of Guangzhou Medical University. Informed consent was obtained from the patient in each case.
  • 2-5 FFPE slide samples were collected from each patient, and the relevant patient's personal pathological information was obtained from the hospital's official pathology report.
  • For DNA extraction, bisulfite conversion, library construction, and methylation sequencing of tissue samples, see Example 1.
  • The protein encoded by the ATM gene belongs to the PI3/PI4 kinase family. It is an important cell cycle checkpoint kinase that regulates a series of important downstream proteins through phosphorylation, including the tumor suppressor proteins p53 and BRCA1, the checkpoint kinase CHK2, the checkpoint proteins RAD17 and RAD9, and the DNA repair protein NBS1. It is mainly involved in DNA damage repair and the maintenance of genome stability.
  • The mutation status of ATM kinase in lung cancer cells can serve as a new tumor marker for gauging patient sensitivity to MEK inhibitors. This could greatly improve the diagnosis and subsequent treatment of patients with this subtype, and is expected to extend the use of this class of drugs to patients with tumor types other than those carrying RAS or BRAF mutations [Ji X, et al. Protein-altering germline mutations implicate novel genes related to lung cancer development. 2020;11(1):1-14.].
  • FFPE tissue samples were subjected to whole-genome sequencing analysis by a third party (ClearCode Biotechnology Co., Ltd.). For DNA extraction, bisulfite conversion, library construction, and methylation sequencing of tissue samples, see Example 1.
  • EGFR is a member of the epidermal growth factor receptor (HER) family and is widely distributed on the surface of mammalian epithelial cells, fibroblasts, glial cells, keratinocytes, and other cells. The EGFR signaling pathway plays an important role in physiological processes such as cell growth, proliferation, and differentiation.
  • HER epidermal growth factor receptor
  • EGFR is one of the most common driver genes in non-small cell lung cancer (NSCLC).
  • NSCLC non-small cell lung cancer
  • Clinically, EGFR testing is mainly used for pre-treatment evaluation of patients with advanced non-small cell lung cancer. An EGFR mutation means that a corresponding targeted drug is available, with a response rate of 60%-70% or higher and relatively mild side effects.
  • The advent of EGFR-TKI targeted drugs has significantly improved the survival of patients with EGFR mutation-positive advanced lung cancer and brought the clinical treatment of lung cancer into the era of precision medicine.
  • The most common EGFR mutation sites are located in exons 18-21: G719X in exon 18; E19del in exon 19; T790M, S768I, and E20ins in exon 20; and L858R and L861Q in exon 21.
  • Of these, the exon 19 deletion E19del and the exon 21 point mutation L858R are the most common, and patients carrying them are the main population for oral EGFR-targeted drug therapy [Yamamoto H, Toyooka S, and Mitsudomi TJLc. Impact of EGFR mutation analysis in non-small cell lung cancer. 2009;63(3):315-21.].
  • FFPE tissue from patients with non-small cell lung cancer was collected at the First Affiliated Hospital of Guangzhou Medical University, Guangzhou.
  • FFPE tissue samples were subjected to whole-genome sequencing analysis by a third party (ClearCode Biotechnology Co., Ltd.). For DNA extraction, bisulfite conversion, library construction, and methylation sequencing of tissue samples, see Example 1.
  • Results: identification and modeling analysis of methylation surrogate biomarkers associated with the EGFR exon 21 L858R point mutation in non-small cell lung cancer tissue.
  • The p53 gene is an important tumor suppressor gene. Deletions or mutations of p53 are found in 50% of human tumors and are closely related to tumor development and progression.
  • Mutation of the p53 gene is one of the important causes of many tumors, including lung cancer. The mutations mainly comprise point mutations and allelic deletions. It has been reported that, across roughly 200 different tumor types, 50% of tumors carry p53 mutations. Four mutation hotspots have been found in exons 5-8 of the p53 gene. Although the p53 mutation spectrum differs among tumors of different tissues and organs, about 90% of the mutations are concentrated in this region. The hotspots encode amino acids 132-143, 174-179, 236-248, and 272-281, respectively [Rodin SN, and Rodin ASJ PotNAoS. Human lung cancer and p53: the interplay between mutagenesis and selection. 2000;97(22):12244-9.].
  • FFPE tissue samples were subjected to whole-genome sequencing analysis by a third party (ClearCode Biotechnology Co., Ltd.). For DNA extraction, bisulfite conversion, library construction, and methylation sequencing of tissue samples, see Example 1.

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Chemical & Material Sciences (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Physics & Mathematics (AREA)
  • Organic Chemistry (AREA)
  • Genetics & Genomics (AREA)
  • Analytical Chemistry (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Theoretical Computer Science (AREA)
  • Immunology (AREA)
  • Zoology (AREA)
  • Wood Science & Technology (AREA)
  • Pathology (AREA)
  • Biotechnology (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Biophysics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Oncology (AREA)
  • Microbiology (AREA)
  • Biochemistry (AREA)
  • Hospice & Palliative Care (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Computing Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Bioethics (AREA)
  • Databases & Information Systems (AREA)
  • Epidemiology (AREA)
  • Public Health (AREA)

Abstract

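The present invention relates to a method for detecting polynucleotide variations using methylation and hydroxymethylation surrogate markers. The method comprises the following steps: 1) isolating polynucleotides from a biological sample; 2) identifying and characterizing methylation and/or hydroxymethylation biomarkers; 3) identifying relevant methylation and/or hydroxymethylation markers, or building a model from candidate markers, to infer and/or detect polynucleotide variations. As a non-invasive auxiliary diagnostic method in cancer precision medicine, the method is particularly effective for identifying surrogate biomarkers in blood. The detection of polynucleotide variations according to the invention can be used for disease detection, prediction, precision treatment, or postoperative monitoring.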

Description

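Method for detecting polynucleotide variations
Technical Field
The present invention belongs to the field of biotechnology and specifically relates to a method for detecting polynucleotide variations using methylation and hydroxymethylation surrogate markers.
Background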
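Polynucleotide variations, i.e., somatic mutations (including the detection and quantification of single nucleotide variations (SNVs), insertions and deletions (InDels), fusions, and copy number variations (CNVs)), are of great significance for molecular biology and for medical applications such as diagnosis and prediction. "Personalized medicine" is increasingly referred to as "precision medicine"; its core goal is to combine patient-specific genetic information with treatment regimens matched to the patient's genetic characteristics [Ashley,E.A.J.N.R.G.,Towards precision medicine.2016.17(9):p.507.]. To achieve this goal, reliable genetic tests must be established that can dependably determine the genetic status of relevant genes, for example in diseases caused by genetic alterations (such as polynucleotide variations) or epigenetic changes (such as DNA methylation and DNA hydroxymethylation).
Early detection and monitoring of genetic diseases (such as cancer) at the level of genetic aberrations (such as SNVs, InDels, fusions, and CNVs) is usually necessary for appropriate treatment, genetic counseling, and prevention strategies [Garofalo,A.,et al.,The impact of tumor profiling approaches and genomic data strategies for cancer precision medicine.2016.8(1):p.1-10.]. Various methods for directly detecting genetic variation have been developed, such as polymerase chain reaction (PCR), multiplex ligation-dependent probe amplification (MLPA), and DNA chip technology [Jameson,J.L.,D.L.J.O.Longo,and g.survey,Precision medicine—personalized,problematic,and promising.2015.70(10):p.612-614.]. In recent years, technologies such as next-generation sequencing (NGS) have emerged and been greatly improved, enabling rapid, high-throughput, and highly accurate detection of multiple genetic variants [Dong,L.,et al.,Clinical next generation sequencing for precision medicine in cancer.2015.16(4):p.253-263.].
The type of biological sample (such as blood or tissue) greatly affects the reliability of a detection method. Liquid biopsy is a method that tests cell-free nucleic acid samples derived from different types of body fluids. Compared with tissue specimens, it has the following advantages: it is minimally invasive, allows real-time monitoring during treatment, is simple and can be repeated frequently, and reduces and/or eliminates the effect of disease heterogeneity [Rossi,G.and M.J.C.r.Ignatiadis,Promises and pitfalls of using liquid biopsy for precision medicine.2019.79(11):p.2798-2804.]. However, because the amount of nucleic acid in body fluids is limited, conventional detection methods often suffer from limited sensitivity and a low signal-to-noise ratio [Wang,J.,et al.,Application of liquid biopsy in precision medicine:opportunities and challenges.2017.11(4):p.522-527.]. There is therefore a need in the art for improved techniques and/or systems that detect genetic variation using alternative strategies, such as surrogate biomarkers, to detect and monitor disease.
Methylation or hydroxymethylation of CpG sites is an epigenetic regulator of gene expression and usually leads to gene silencing or activation. Widespread perturbation of DNA methylation has been noted in various diseases, particularly cancer, where it alters gene regulation and thereby promotes carcinogenesis [Das,P.M.and R.J.J.o.c.o.Singal,DNA methylation and cancer.2004.22(22):p.4632-4642.]. Certain methylation changes are found recurrently in nearly all cancers of a given type, and these changes show great potential as biomarkers for early screening, prediction of treatment response, and prognosis. This indicates that using methylation or hydroxymethylation biomarkers as surrogates to detect polynucleotide variations, and thereby circumvent the limitations of conventional detection techniques, is reasonable and feasible.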
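Summary of the Invention
The technical solution for achieving the above object is as follows.
A method for detecting polynucleotide variations, comprising the following steps:
1) isolating polynucleotides from a biological sample;
2) identifying and characterizing methylation and/or hydroxymethylation biomarkers;
3) identifying relevant methylation and/or hydroxymethylation markers, or building a model from candidate markers, to infer and/or determine polynucleotide variations.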
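In some embodiments, the polynucleotides comprise DNA.
In some embodiments, the polynucleotides comprise RNA.
In some embodiments, the polynucleotide variations include single nucleotide variations (SNVs).
In some embodiments, the polynucleotide variations include insertions and/or deletions (InDels).
In some embodiments, the polynucleotide variations include fusion polynucleotides (Fusions).
In some embodiments, the polynucleotide variations include copy number variations (CNVs).
In some embodiments, the biological sample includes a biological fluid sample, such as blood, serum, plasma, vitreous humor, sputum, urine, tears, sweat, or saliva.
In some embodiments, the biological sample includes a tissue sample.
In some embodiments, the biological sample includes a cell line sample.
In some embodiments, the isolation method includes phenol- and/or chloroform-based DNA extraction, magnetic bead separation, and silica column separation.
In some embodiments, the identification and characterization of methylation and/or hydroxymethylation uses methylation-specific PCR.
In some embodiments, the identification and characterization of methylation and/or hydroxymethylation uses the MassARRAY (Agena) method.
In some embodiments, the identification and characterization of methylation and/or hydroxymethylation uses microarray hybridization.
In some embodiments, the identification and characterization of methylation and/or hydroxymethylation uses a sequencing-based method, preferably analyzing the distribution of 5-methylcytosine or 5-hydroxymethylcytosine in combination with bisulfite treatment, using whole-genome bisulfite sequencing or targeted methylation sequencing.
In some embodiments, the method for inferring and/or determining polynucleotide variations includes bioinformatic analysis, which includes determining the optimal biomarkers and/or model by Spearman or Pearson analysis, preferably modeling with random forest, LASSO regression, logistic regression, or a deep-learning network.
In some embodiments, the method further includes, after the identification and characterization of the methylation and/or hydroxymethylation biomarkers, performing a simple quantitative detection method on candidate markers selected from the high-throughput methods, including methylation-specific primer extension-related methods, methylation-specific PCR (MSP), methylation-specific qPCR analysis, MassARRAY, targeted methylation sequencing, and the like.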
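In some embodiments, the genes carrying the single nucleotide variations include at least one of AKT1, ALK, APC, AR, ARF, ARID1A, ATM, BRAF, BRCA1, BRCA2, CCND1, CCND2, CCNE1, CDH1, CDK4, CDK6, CDKN2A, CTNNB1, DDR2, EGFR, ERBB2, ESR1, EZH2, FBXW7, FGFR1, FGFR2, FGFR3, GATA3, GNA11, GNAQ, GNAS, HNF1A, HRAS, IDH1, IDH2, JAK2, JAK3, KIT, KRAS, MEK1, MEK2, ERK2, ERK1, MET, MLH1, MPL, MTOR, MYC, NF1, NFE2LE, NOTCH1, NPM1, NRAS, NTRK1, NTRK3, PDGFRA, PI3CA, PTEN, PTPN11, RAF1, RB1, RET, RHEB, RHOA, RIT1, ROS1, SMAD4, SMO, STK11, TERT, TP53, TSC1, and VHL.
In some embodiments, the polynucleotides carrying the insertions and/or deletions include at least one of the genes ATM, APC, ARID1A, BRCA1, BRCA2, CDH1, CDKN2A, EGFR, ERBB2, GATA3, KIT, MET, MLH1, MTOR, NF1, PDGFRA, PTEN, RB1, SMAD4, STK11, TP53, TSC1, and VHL.
In some embodiments, the fusion polynucleotides comprise at least one of the genes ALK, FGFR2, FGFR3, NTRK1, RET, ROS1, and EML4.
In some embodiments, the polynucleotides carrying the copy number variations include genes such as AR, BRAF, CCND1, CCND2, CCNE1, CDK4, CDK6, EGFR, ERBB2 (HER2), FGFR1, FGFR2, KIT, KRAS, MET, MYC, PDGFRA, PI3CA, and RAF1.
In some embodiments, inferring and/or determining polynucleotide variations includes detecting amplification (CNV) of the ERBB2 (HER2) gene.
The present invention detects polynucleotide variations by using methylation and hydroxymethylation surrogate markers. As a non-invasive auxiliary diagnostic method in cancer precision medicine, the method can be applied to samples from various sources and is particularly effective for identifying surrogate biomarkers in blood. According to the method, only 1 ng of cell-free DNA extracted from plasma or serum (equivalent to 0.5 ml of blood) is needed to complete the test. The detection of polynucleotide variations according to the invention can be used for disease detection, prediction, precision treatment, or postoperative monitoring.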
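Brief Description of the Drawings
Figure 1: Flowchart of the method for detecting polynucleotide variations in an embodiment of the invention.
Figure 2: Flowchart of non-invasive methylation-based detection of ERBB2 (HER2) amplification in gastric cancer.
Figure 3: Identification of methylation biomarkers associated with ERBB2 (HER2) amplification from gastric cancer tissue samples.
Figure 4: Simplified procedure of the methylation-specific qPCR method for detecting ERBB2 (HER2) amplification.
Figure 5: Performance of methylation-specific qPCR analysis in detecting ERBB2 (HER2) amplification in independent tissue samples.
Figure 6: Performance of methylation-specific qPCR in detecting ERBB2 (HER2) amplification in gastric and breast cancer cell lines.
Figure 7: Performance of methylation-specific qPCR in detecting ERBB2 (HER2) amplification in gastric cancer plasma.
Figure 8: Mean AUC of the test-set results from logistic regression modeling in Example 2.
Figure 9: Mean AUC of the test-set results from random forest modeling in Example 2.
Figure 10: Mean AUC of the test-set results in Example 3.
Figure 11: Mean AUC of the test-set results in Example 4.
Figure 12: Mean AUC of the test-set results in Example 5.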
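Detailed Description
Experimental methods for which no specific conditions are given in the following examples are generally carried out under conventional conditions or under the conditions recommended by the manufacturer. The common chemical reagents used in the examples are all commercially available products.
Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by those skilled in the art to which the invention belongs. The terms used in the specification are for the purpose of describing specific embodiments only and are not intended to limit the invention.
The terms "comprise" and "have", and any variations thereof, are intended to cover non-exclusive inclusion. For example, a process, method, apparatus, product, or device comprising a series of steps is not limited to the listed steps or components, but optionally also includes steps that are not listed, or optionally also includes other steps or components inherent to such a process, method, product, or device.
"A plurality" as used herein means two or more. "And/or" describes an association between associated objects and indicates that three relationships are possible; for example, A and/or B may mean A alone, both A and B, or B alone. The character "/" generally indicates an "or" relationship between the objects before and after it.
Unless otherwise specified or defined, "first, second, …" as used herein only distinguish names and do not represent a specific quantity or order.
To facilitate understanding, the invention is described more fully below. The invention can be implemented in many different forms and is not limited to the embodiments described herein. Rather, these embodiments are provided so that the disclosure will be understood more thoroughly and completely.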
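The present invention provides a method for detecting polynucleotide variations in a biological sample, including single nucleotide variations (SNVs), insertions and deletions (InDels), fusions, and copy number variations (CNVs). The method includes sample preparation, i.e., extraction and isolation of nucleic acids from a biological sample; subsequent high-throughput methylation and/or hydroxymethylation analysis of the polynucleotides by techniques known in the art; and application of bioinformatic tools to identify the best-correlated methylation and/or hydroxymethylation markers and/or to build models, thereby inferring SNVs, InDels, fusions, and CNVs. The method may further include a database or collection of distinct methylation and/or hydroxymethylation signatures of various diseases, used as an additional reference to assist in detecting methylation and/or hydroxymethylation biomarkers, as well as subsequent simplification and optimization of the technique used to quantify the methylation and/or hydroxymethylation surrogate biomarkers. The invention thus provides a method for detecting polynucleotide variations (Figure 1) that can be used for early diagnosis, companion diagnosis, and prognosis of genetic diseases.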
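The term "polynucleotide" as used herein includes any related biopolymer. Polynucleotides include, but are not limited to: DNA, RNA, amplicons, cDNA, dsDNA, ssDNA, plasmid DNA, cosmid DNA, high molecular weight (MW) DNA, chromosomal DNA, genomic DNA, viral DNA, bacterial DNA, mtDNA (mitochondrial DNA), mRNA, rRNA, tRNA, nRNA, siRNA, snRNA, scaRNA, microRNA, dsRNA, ribozymes, riboswitches, and viral RNA (e.g., retroviral RNA).
The term "biological sample" covers samples from various sources, including humans, mammals, non-human mammals, apes, monkeys, chimpanzees, reptiles, amphibians, or birds, and in any form, for example: 1) tissue-based samples, including but not limited to fresh frozen tissue and formalin-fixed paraffin-embedded (FFPE) tissue specimens; 2) body fluid material from animal body fluids, including but not limited to blood, serum, plasma, vitreous humor, sputum, urine, tears, sweat, saliva, semen, mucosal excretions, mucus, spinal fluid, amniotic fluid, and lymph fluid, where the cell-free polynucleotides may originate from a fetus (via fluid taken from a pregnant subject) or from the subject's own tissue; 3) cell lines.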
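Isolation, purification, and preparation of polynucleotides can be carried out by various techniques known in the art. Suitable methods include those described in the examples herein and variants thereof, including but not limited to proteinase K treatment followed by phenol and/or chloroform extraction or commercial kits [Laird,P.W.,et al.,Simplified mammalian DNA isolation procedure.1991.19(15):p.4293], as well as column-based or particle-based (magnetic bead) separation methods offered by Sigma-Aldrich, Life Technologies, Qiagen, Promega, Affymetrix, IBI, and similar companies. Kits and extraction methods may also be non-commercial. In general, polynucleotides are first extracted by a separation technique; for cell-free DNA, for example, the cell-free DNA is separated from cells and other insoluble components of the biological sample. Separation techniques may include, but are not limited to, centrifugation or filtration. After the addition of buffers and other kit-specific washing steps, the DNA can be precipitated with isopropanol. Further cleanup steps, such as silica columns, can then be used to remove contaminants or salts. These general steps can be optimized for specific applications. The purpose of this step is to allow DNA or RNA to be purified from larger sample volumes and to increase the amount of polynucleotide material (in most cases DNA or RNA) available for detection, which facilitates analysis and improves accuracy.
In some embodiments, after isolation and before analysis by a downstream high-throughput technique (such as a sequencing-based method), the polynucleotides may be premixed with one or more additional materials or reagents (e.g., ligases, proteases, restriction enzymes, polymerases).
In some embodiments, the nucleic acids isolated from the sample may also be amplified, for example with standard nucleic acid amplification systems including PCR, ligase chain reaction, nucleic acid sequence-based amplification (NASBA), isothermal amplification methods (e.g., multiple displacement amplification (MDA), helicase-dependent amplification (HDA)), and branched DNA methods. The preferred amplification method generally includes PCR.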
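After extraction and isolation from the biological sample, the polynucleotides undergo a treatment that makes it possible to determine whether a polynucleotide is methylated at a given site. This treatment can be of any type, including chemical or enzymatic conversion. A preferred chemical conversion method uses commercial or non-commercial bisulfite treatment. The enzymatic conversion method can be a commercial or non-commercial TET-APOBEC-based conversion. After conversion, methylation analysis can be performed to determine the methylation status of multiple CpG sites in the polynucleotide sequence. Various techniques known in the art can be used for this purpose, including but not limited to: 1) microarray hybridization, such as Illumina's Infinium HumanMethylation450 BeadChip (HM450K), the Infinium CytoSNP-850K BeadChip, or any custom-designed array (Affymetrix) [Sandoval,J.,et al.,Validation of a DNA methylation microarray for 450,000 CpG sites in the human genome.2011.6(6):p.692-702.]; 2) sequencing-based analysis of 5-methylcytosine distribution in combination with bisulfite treatment. Sequencing methods may include, but are not limited to: Sanger sequencing, high-throughput sequencing, pyrosequencing, sequencing by synthesis, single-molecule sequencing, nanopore sequencing, semiconductor sequencing, sequencing by ligation, sequencing by hybridization, Digital Gene Expression (Helicos), next-generation sequencing, single-molecule sequencing by synthesis (SMSS) (Helicos), massively parallel sequencing, clonal single-molecule array (Solexa/Illumina), shotgun sequencing, Maxam-Gilbert sequencing, primer walking, sequencing on the PacBio, SOLiD, Ion Torrent, or nanopore platforms, and any other sequencing method known in the art. In some cases, the sequencing method may involve multiple sample processing units. Sample processing units may include, but are not limited to, multiple lanes, multiple channels, multiple wells, or other means of processing multiple sample sets simultaneously. In addition, a sample processing unit may include multiple sample chambers to allow simultaneous processing of multiple samples. In some embodiments, a mixture containing several different types of cell-free polynucleotides may be sequenced. The nucleic acids may be polynucleotides or oligonucleotides, including but not limited to DNA or RNA.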
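Subsequent analysis of the polynucleotides with bioinformatic tools involves two parts. 1) Converting the raw data from the high-throughput platform into relative quantitative measurements, which allows downstream calculation and variation analysis. The relevant bioinformatic tools are established in the art. Array-based data, such as data from Illumina's HM450K, usually quantify the relative abundance of methylated and unmethylated sites by fluorescence intensity and can be converted with software provided by Illumina. Bisulfite conversion data, such as data from whole-genome bisulfite sequencing or targeted methylation bisulfite sequencing, involve methylation calls for individual Cs and require statistical tests to assess differential methylation; processing includes adapter trimming, read quality assessment, alignment to a reference genome, and calculation of methylation levels. Many tools have been developed, including but not limited to Cutadapt (trimming) [Martin,M.J.E.j.,Cutadapt removes adapter sequences from high-throughput sequencing reads.2011.17(1):p.10-12.], Bismark (alignment) [Krueger,F.and S.R.J.b.Andrews,Bismark:a flexible aligner and methylation caller for Bisulfite-Seq applications.2011.27(11):p.1571-1572.], the UCSC Genome Browser (data visualization), and methygo (post-alignment analysis). Quantitative measures such as the β value (β) typically estimate the methylation level from the ratio of intensities between methylated and unmethylated alleles. 2) Determining the best methylation markers for characterizing DNA mutations and other variations. This can be done by simple correlation analysis with a single biomarker (such as Spearman or Pearson analysis), or by building a model with multiple markers simultaneously, for example random forest regression [Liaw,A.and M.J.R.n.Wiener,Classification and regression by randomForest.2002.2(3):p.18-22.], LASSO regression [Tibshirani,R.J.J.o.t.R.S.S.S.B.,Regression shrinkage and selection via the lasso.1996.58(1):p.267-288.], logistic regression, or deep-learning neural networks.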
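The single-marker correlation analysis mentioned above can be sketched as follows. This is a minimal, standard-library illustration of the Spearman coefficient (Pearson correlation computed on average ranks); the function names and the idea of correlating per-sample β values with a 0/1 variant status are assumptions made for illustration, not the patent's actual pipeline.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical example: β values of one CpG site across six samples,
# and the variant status (1 = variant present) of the same samples.
beta = [0.82, 0.75, 0.91, 0.20, 0.15, 0.30]
status = [1, 1, 1, 0, 0, 0]
rho = spearman(beta, status)
```

A site whose coefficient is close to +1 or -1 would be a candidate surrogate marker; in practice a library routine such as SciPy's `spearmanr` would normally replace this hand-written version.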
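In some embodiments, after marker identification, existing techniques can be used to turn the targeted methylation and/or hydroxymethylation patterns into simple quantitative assays, including but not limited to oligonucleotide arrays, MassARRAY, MS-based primer extension methods, methylation-specific PCR (MSP), and methylation-specific qPCR analysis. MSP is a well-established technique for detecting the degree of methylation in selected gene sequences [Herman,J.G.,et al.,Methylation-specific PCR:a novel PCR assay for methylation status of CpG islands.1996.93(18):p.9821-9826.]. Methylation-specific qPCR is a high-throughput quantitative methylation assay that uses fluorescence-based real-time PCR (TaqMan.RTM) with PCR primers that distinguish methylated from unmethylated DNA; it requires no further manipulation such as electrophoresis or hybridization after PCR amplification, which reduces contamination and handling errors [Eads,C.A.,et al.,MethyLight:a high-throughput assay to measure DNA methylation.2000.28(8):p.e32-00.]. This real-time quantitative PCR reaction includes a methylation-sensitive probe complementary to the methylation site under test, for example a TaqMan probe. Depending on the methylation status of the target sequence, only the fluorescently labeled TaqMan probe specific for bisulfite-converted methylated DNA releases a fluorescent signal after hybridizing to the template; the signal intensity is proportional to the amount of PCR product, from which the degree of methylation of the sample can be calculated.
The invention is described below through full examples, but is not limited to the examples described.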
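Example 1 Analysis of ERBB2 (HER2) amplification status in gastric cancer plasma and tissue samples using methylation biomarkers
Gastric cancer is the fifth most common cancer worldwide and ranks second in Asia. The human epidermal growth factor receptor 2 (ERBB2 (HER2)) gene is amplified or overexpressed in 9% to 38% of gastric cancer patients [Rüschoff,J.,et al.,HER2 testing in gastric cancer:a practical approach.2012.25(5):p.637-650.]. The phase III Trastuzumab for Gastric Cancer (ToGA) trial showed that combining chemotherapy with trastuzumab (a monoclonal HER2-inhibiting antibody) improved survival compared with chemotherapy alone [Van Cutsem,E.,et al.,Efficacy results from the ToGA trial:A phase III study of trastuzumab added to standard chemotherapy(CT)in first-line human epidermal growth factor receptor 2(HER2)-positive advanced gastric cancer(GC).2009.27(18_suppl):p.LBA4509-LBA4509.]. The adoption of trastuzumab as the standard targeted therapy for HER2-positive gastric cancer has increased the importance of HER2 testing.
According to the National Comprehensive Cancer Network Clinical Practice Guidelines in Oncology (NCCN Guidelines), tumor tissue should be assessed for HER2 overexpression and/or amplification by immunohistochemistry (IHC) and fluorescence or silver in situ hybridization (FISH or SISH) [Carlson,R.W.,et al.,HER2 testing in breast cancer:NCCN Task Force report and recommendations.2006.4(S3):p.S-1-S-22]. IHC, which detects HER2 at the protein expression level, is more widely used because of its cost and ease of operation, whereas FISH/SISH detects the CNV status of the HER2 gene and is the gold standard. Studies have shown that clinical IHC testing of HER2 correlates very closely with FISH testing and is a recognized and accepted method for detecting altered HER2 expression [Vincent‐Salomon A,MacGrogan G,Couturier J,et al:Calibration of immunohistochemistry for assessment of HER2 in breast cancer:results of the French Multicentre GEFPICS*Study.42:337-347,2003][Furrer D,Jacob S,Caron C,et al:Concordance of HER2 immunohistochemistry and fluorescence in situ hybridization using tissue microarray in breast cancer.37:3323-3329,2017][Arnould,L.,et al.,Accuracy of HER2 status determination on breast core-needle biopsies(immunohistochemistry,FISH,CISH and SISH vs FISH.2012.25(5):p.675-682].
However, because most gastric cancer patients are diagnosed with inoperable, advanced, or metastatic cancer, it is difficult to obtain enough tissue for HER2 testing [Hofmann,M.,et al.,Assessment of a HER2 scoring system for gastric cancer:results from a validation study.2008.52(7):p.797-805.]. In addition, because gastric cancer lesions are highly heterogeneous, conventional tissue biopsy, immunohistochemical staining, and in situ hybridization place high demands on sample collection, sample quantity, and processing, and repeated sampling can harm the patient. Problems also continue to emerge in testing practice: HER2 testing of gastroscopic biopsy specimens has not become widespread; the rate of in situ hybridization testing is low, so that the HER2 status of most IHC 2+ gastric cancer cases is never finally determined; and the HER2 positivity rates at some institutions differ considerably from those reported in the domestic and international literature [Lee,H.E.,et al.,Clinical significance of intratumoral HER2 heterogeneity in gastric cancer.2013.49(6):p.1448-1457.].
The present invention provides a non-invasive, methylation-based liquid biopsy method for analyzing HER2 amplification (Figure 2).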
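Patients
FFPE tissue and plasma specimens from gastric cancer patients were obtained from the Department of Pathology, Southern Medical University, Guangzhou. The project was approved by the Medical Ethics Committee of Southern University. Informed consent was obtained from the patient in every case. After surgery, 2-5 FFPE slide samples were collected from each patient, and before surgery 3-5 ml of plasma was collected from each patient using vacuum blood collection tubes (BD, Cat#367525). Each patient's HER2 amplification status was determined by immunohistochemical staining and taken from the hospital's official pathology report.
Sample collection and DNA extraction
Genomic DNA was isolated from FFPE tissue samples with the QIAamp DNA FFPE Tissue Kit (Qiagen, Cat#56404). Cell-free DNA (cfDNA) was isolated from plasma with the QIAamp Circulating Nucleic Acid Kit (Qiagen, Cat#55114). Repeated freezing and thawing of plasma was avoided to prevent cfDNA degradation. The concentration and quality of cfDNA were measured with the Qubit dsDNA HS Assay Kit (Thermo Fisher Scientific, Cat#Q32854) and the Agilent High Sensitivity DNA Kit (Cat#5067-4626) on a 2100 Bioanalyzer (Agilent). cfDNA with a yield greater than 3 ng and without excessive genomic DNA contamination was used for sequencing library construction.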
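Bisulfite conversion and library construction of tissue samples
Bisulfite conversion of cfDNA was performed with the Zymo Lightning Conversion Reagent (Zymo Research, Cat#D5031) according to the kit instructions. After washing and desulphonation on a Zymo-Spin™ IC column, the bisulfite-converted DNA was eluted twice with M-Elution Buffer in a final volume of 17 μL.
For tissue samples, 2 μg of genomic DNA was fragmented to ~200 bp (peak size) with an M220 focused ultrasonicator (Covaris, Inc.) according to the instructions, and 800 ng of the purified fragmented genomic DNA was then used for bisulfite conversion. After bisulfite conversion and purification, the bisulfite-converted DNA was quantified at A260 on a NanoDrop (Thermo Fisher Scientific). Then 150 ng of bisulfite-converted product was used for library preparation of FFPE tissue samples.
NGS pre-library preparation was performed with the AnchorDx-Epivision™ Methylation Library Preparation Kit (AnchorDx, Cat#A0UX00019) and the AnchorDx-EpiVisio™ Indexing PCR Kit (AnchorDx, Cat#A2DX00025). After end repair, 3' adapter ligation, and amplification of the reverse-complement DNA, the amplified DNA was purified with 1:6 Agencourt AMPure XP magnetic beads (Beckman Coulter, Cat#A63882). After adapter ligation at the 3' end of the reverse-complement DNA and indexing PCR (i5 and i7), the amplified pre-library was purified with XP magnetic beads. Pre-hybridization libraries containing more than 800 ng of DNA could be used for subsequent target enrichment.
Target enrichment was performed with the AnchorDx-Epivision™ Target Enrichment Kit (AnchorDx, Cat#A0UX00031) and a methylation panel, the AnchorDx BrGcMet panel. With the AnchorDx-BrGcMet methylation panel, 1000 ng of DNA containing up to 4 pre-hybridization libraries was pooled for target enrichment. The AnchorDx-BrGcMet panel comprises 12892 preselected regions enriched for cancer-specific methylation, and the targeted genomic regions include a total of 123269 CpG sites. The procedures for probe hybridization, purification, and final PCR amplification followed a published protocol [Liang,W.,et al.,Non-invasive diagnosis of early-stage lung cancer using high-throughput targeted DNA methylation sequencing of circulating tumor DNA(ctDNA).2019.9(7):p.2056.].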
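DNA sequencing and calculation of DNA methylation levels
The enriched libraries were sequenced on an Illumina HiSeq X-Ten sequencing system according to the manufacturer's instructions. The β value (β), defined from the ratio of methylated to unmethylated allele intensities, was used to estimate the methylation level. β lies between 0 and 1, where 0 is unmethylated and 1 is fully methylated [Du,P.,et al.,Comparison of Beta-value and M-value methods for quantifying methylation levels by microarray analysis.2010.11(1):p.587.].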
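As an illustration of the β value, the sketch below assumes the common sequencing-based form β = M / (M + U), where M and U are the numbers of methylated and unmethylated reads covering a CpG site. The text defines β only through the methylated/unmethylated ratio, so this exact formula, the function name, and the read counts are assumptions for illustration.

```python
def beta_value(methylated: int, unmethylated: int) -> float:
    """Fraction of methylated reads at one CpG site, between 0 and 1."""
    total = methylated + unmethylated
    if total == 0:
        # No reads cover the site, so a methylation level cannot be estimated.
        raise ValueError("no coverage at this site")
    return methylated / total


# Hypothetical (methylated, unmethylated) read counts; not data from this document.
counts = {"cpg_a": (0, 40), "cpg_b": (30, 10), "cpg_c": (25, 0)}
betas = {site: beta_value(m, u) for site, (m, u) in counts.items()}
```

With these counts, `cpg_a` is fully unmethylated (β = 0), `cpg_c` is fully methylated (β = 1), and `cpg_b` is intermediate.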
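Establishment and validation of the methylation-specific qPCR assay for plasma samples
Methylation markers were designed and optimized, and methylation-specific qPCR analysis was performed according to the instructions (AnchorDx, China). The EpiTect PCR Control DNA Set (Qiagen, Germany) served as the positive and negative controls. qPCR reactions were run on a QuantStudio 3 real-time PCR system (Thermo Fisher, USA) with the Epimark qPCR reaction system (NEB, Cat#M0490) under the following cycling conditions: denaturation at 98°C for 30 s, then 40 cycles of 95°C for 10 s and 62°C for 20 s.
For plasma samples, the recommended input of bisulfite-converted cfDNA is 10 ng. When the cfDNA yield was between 1 and 10 ng, all of the purified cfDNA was used for bisulfite conversion. After bisulfite conversion, we used all of the bisulfite-converted cfDNA for the subsequent methylation-specific qPCR assay.
For methylation-specific qPCR analysis, ΔCt was used to represent the co-methylation level of the target region, where ΔCt = mean Ct (target region) - mean Ct (internal control region). For regions with an undetermined Ct value, an artificial ΔCt of 35 was assigned.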
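The ΔCt rule above can be sketched in a few lines. The function name and the convention of representing an undetermined Ct as `None` are hypothetical, and treating a region as undetermined when any target replicate lacks a Ct is an assumption; the text only states that such regions receive an artificial ΔCt of 35.

```python
from statistics import mean
from typing import Optional, Sequence

UNDETERMINED_DELTA_CT = 35.0  # artificial value the text assigns to undetermined regions


def delta_ct(target_cts: Sequence[Optional[float]],
             control_cts: Sequence[float]) -> float:
    """ΔCt = mean Ct(target region) - mean Ct(internal control region)."""
    # Assumption: a missing Ct in any target replicate makes the region undetermined.
    if not target_cts or any(ct is None for ct in target_cts):
        return UNDETERMINED_DELTA_CT
    return mean(target_cts) - mean(control_cts)


# Hypothetical replicate Ct values for one marker and the internal control.
amplified = delta_ct([30.0, 31.0], [25.0, 26.0])
not_detected = delta_ct([None, 31.0], [25.0, 26.0])
```

A lower ΔCt means the methylated target amplified earlier relative to the control, that is, a higher methylation signal.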
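Data processing
The R pROC package was used to analyze the clinical performance of individual markers and of the final classification model. The Python sklearn package was used to build the logistic regression model. Student's t-test was used for statistical analysis of the distribution of HER2 amplification probabilities across test groups.
Results
Identification of HER2 amplification-associated methylation surrogate biomarkers in tissue material
To identify methylation signatures specific to HER2 amplification status in gastric cancer, we collected 74 FFPE tissue samples, including 44 HER2- samples (IHC 0 or 1+) and 33 HER2+ samples (IHC 3+), all of them late stage (stage III or IV). Using high-throughput targeted methylation sequencing, we cleaned and processed the raw sequencing data [Liang,W.,et al.,Non-invasive diagnosis of early-stage lung cancer using high-throughput targeted DNA methylation sequencing of circulating tumor DNA(ctDNA).2019.9(7):p.2056.] and determined the percentage of methylated cytosines at each site (β value) from the read counts. Statistical differential analysis of each site between HER2+ and HER2- samples identified 102 candidate methylation marker sites that differed significantly between the HER2- and HER2+ groups (FDR<0.01, Figure 3).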
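Establishment of the multiplex methylation-specific qPCR assay and biomarker screening
Next, we designed methylation-specific qPCR assays. From the 102 candidate markers, qPCR primers and probes were designed for 64 candidate markers (β value difference >5% between HER2+ and HER2- tissue specimens and consistent with basic primer design principles [Figure PCTCN2021086104-appb-000001 R.S.,et al.,Methylation-specific PCR:four steps in primer design.2014.9(12):p.1127-1139]). Using diluted reference gene and unconverted DNA as controls, we tested the linearity and specificity of these assays and eliminated 13 biomarkers because of poor assay performance (Figure 4).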
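Validation in tissue samples, cell lines, and plasma samples
We further validated the optimized methylation-specific qPCR assays in 1) independent tissue samples, 2) gastric and breast cancer cell lines, and 3) gastric cancer plasma samples, to strengthen our findings on these markers.
In independent FFPE gastric cancer samples (42 HER2- vs 31 HER2+), we used a LASSO model package (R package glmnet) to analyze and model the 64 candidate markers and built a linear regression model based on 10 methylation markers, which determined HER2 amplification with an AUC of 0.94 (Figure 5).
In cell line samples, HER2 was scored according to the methylation biomarker ΔCt values (the score was based on a linear regression model of 2 methylation markers). Based on this score, we found significant differences between HER2+ and HER2- gastric cancer cell lines and between HER2+ and HER2- breast cancer cell lines (Figure 6).
In plasma samples, we tested three modeling approaches: least absolute shrinkage and selection operator (LASSO) [Tibshirani,R.J.J.o.t.R.S.S.S.B.,Regression shrinkage and selection via the lasso.1996.58(1):p.267-288.], random forest (RF) [Liaw,A.and M.J.R.n.Wiener,Classification and regression by randomForest.2002.2(3):p.18-22.], and linear regression (LR) [Long,J.S.and L.H.J.T.A.S.Ervin,Using heteroscedasticity consistent standard errors in the linear regression model.2000.54(3):p.217-224.].
As expected, according to our HER2 classification score, there was also a significant difference between gastric cancer HER2- plasma samples (N=7) and HER2- plasma samples (N=20) (Figure 7).
Table 1 Software packages used for modeling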
Figure PCTCN2021086104-appb-000002
Figure PCTCN2021086104-appb-000003
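In summary, these data indicate that cell-free methylation biomarkers can be used to accurately assess the HER2 amplification status of gastric cancer patients and can therefore serve as a companion diagnostic for targeted therapy of gastric cancer.
Example 2 Analysis of insertion and/or deletion mutations (INDEL) of the ERBB2 gene in lung cancer tissue samples using methylation biomarkers
Lung cancer is one of the cancers with the highest incidence and mortality worldwide and ranks first among all cancers in China. The ERBB2 (HER2) gene belongs to the human epidermal growth factor receptor (HER) family, and HER2 mutations are found in many solid tumors, including breast, gastric, and lung cancer. ERBB2 is one of the common mutated driver genes in lung cancer; mutations are detected in 2-4% of lung cancers, with exon 20 insertion mutations (INDEL) being the most common. These mutations activate kinase activity and downstream signaling pathways and promote cell survival and tumorigenesis [Wang SE,et al.HER2 kinase domain mutation results in constitutive phosphorylation and activation of HER2 and EGFR and resistance to EGFR tyrosine kinase inhibitors.2006;10(1):25-38.].
Patients
FFPE tissue from lung cancer patients was obtained from the First Affiliated Hospital of Guangzhou Medical University, Guangzhou. The project was approved by the Medical Ethics Committee of the First Affiliated Hospital of Guangzhou Medical University. Informed consent was obtained from the patient in every case. After surgery, 2-5 FFPE slide samples were collected from each patient, and the patients' pathological information was taken from the hospital's official pathology reports.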
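Whole-genome sequencing analysis of tissue samples
FFPE tissue samples were subjected to whole-genome sequencing analysis by a third party (明码生物技术公司).
For DNA extraction, bisulfite conversion, library construction, and methylation sequencing of tissue samples, see Example 1.
Results
Identification and modeling analysis of methylation surrogate biomarkers associated with ERBB2 exon 20 INDEL in lung cancer tissue
To identify methylation signatures specific to ERBB2 INDEL status in lung cancer, we performed genome-wide INDEL analysis on the 78 collected FFPE tissue samples and found ERBB2 exon 20 INDEL variants in 18 samples; the remaining 60 were normal (see Table 2).
Table 2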
sample chr pos region gene type band
B2-C-028 chr7 55242464 exonic EGFR nonframeshift deletion 7p11.2
B5-C-036 chr7 55242464 exonic EGFR nonframeshift deletion 7p11.2
B3-C-018 chr7 55242464 exonic EGFR nonframeshift deletion 7p11.2
B2-C-007 chr7 55242464 exonic EGFR nonframeshift deletion 7p11.2
B3-C-039 chr7 55242464 exonic EGFR nonframeshift deletion 7p11.2
B2-C-027 chr7 55242464 exonic EGFR nonframeshift deletion 7p11.2
B2-C-023 chr7 55242464 exonic EGFR nonframeshift deletion 7p11.2
B3-C-026 chr7 55242464 exonic EGFR nonframeshift deletion 7p11.2
B2-C-029 chr7 55242465 exonic EGFR nonframeshift deletion 7p11.2
B4-C-006 chr7 55242466 exonic EGFR nonframeshift deletion 7p11.2
B3-C-081 chr7 55242466 exonic EGFR nonframeshift deletion 7p11.2
B2-C-018 chr7 55242469 exonic EGFR nonframeshift deletion 7p11.2
B4-C-021 chr7 55242469 exonic EGFR nonframeshift deletion 7p11.2
B5-C-038 chr7 55242469 exonic EGFR nonframeshift deletion 7p11.2
B3-C-067 chr7 55242469 exonic EGFR nonframeshift deletion 7p11.2
B3-C-068 chr7 55248998 exonic EGFR nonframeshift insertion 7p11.2
B3-C-040 chr7 55249002 exonic EGFR nonframeshift insertion 7p11.2
B3-C-048 chr7 55249010 exonic EGFR nonframeshift insertion 7p11.2
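Using high-throughput targeted methylation sequencing, we cleaned and processed the raw sequencing data [Liang,W.,et al.,Non-invasive diagnosis of early-stage lung cancer using high-throughput targeted DNA methylation sequencing of circulating tumor DNA(ctDNA).2019.9(7):p.2056.] and determined the percentage of methylated cytosines at each site (β value) from the read counts. Statistical differential analysis of each site between ERBB2 EXON20 INDEL+ samples and ERBB2 EXON20 INDEL- samples, with the criteria p value<0.001, FDR<0.1, and |diff|>0.1, identified 5 candidate methylation marker sites with significant differences.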
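The three-part filter described above (p value, FDR, and absolute β difference) can be sketched as follows. The text does not say which statistical test or which FDR procedure was used; the Benjamini-Hochberg adjustment here is an assumption, the per-site p values are taken as given, and all site names and numbers are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone adjusted values.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


def select_sites(sites, p_cut=0.001, fdr_cut=0.1, diff_cut=0.1):
    """Keep sites passing all three criteria.

    sites: list of (name, p_value, mean_beta_positive, mean_beta_negative).
    """
    qvalues = benjamini_hochberg([s[1] for s in sites])
    return [
        name
        for (name, p, pos, neg), q in zip(sites, qvalues)
        if p < p_cut and q < fdr_cut and abs(pos - neg) > diff_cut
    ]


# Hypothetical per-site summary statistics (not data from this document).
sites = [
    ("site_1", 0.0001, 0.80, 0.30),  # passes all three criteria
    ("site_2", 0.0005, 0.52, 0.50),  # significant, but |diff| is only 0.02
    ("site_3", 0.2000, 0.90, 0.10),  # large difference, but not significant
    ("site_4", 0.0002, 0.20, 0.60),  # passes (hypomethylated in positives)
]
selected = select_sites(sites)
```

Requiring a minimum |diff| in addition to significance keeps only sites whose methylation shift is large enough to be measured reliably by a downstream assay.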
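The 78 samples were split 7:3, with the split repeated 100 times; logistic regression modeling with these 5 candidate methylation markers gave a mean test-set AUC of 0.874 (Figure 8).
With the same 7:3 split repeated 100 times, random forest modeling with these 5 candidate methylation markers gave a mean test-set AUC of 0.907 (Figure 9). This indicates that a model using the 5 markers can accurately distinguish the ERBB2 exon 20 INDEL mutation status of the samples.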
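The evaluation scheme above (a 7:3 split repeated 100 times, averaging the test-set AUC) can be sketched with the standard library alone. Several points are assumptions made for illustration: the split is stratified by class, the AUC is computed with the rank-based (Mann-Whitney) formula, and a simple class-centroid linear scorer stands in for the logistic regression and random forest models actually named in the text. The data are synthetic.

```python
import random


def auc(scores, labels):
    """Probability that a positive outscores a negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def fit_centroid_scorer(X_train, y_train):
    """Stand-in model: project samples onto the difference of class centroids."""
    n_feat = len(X_train[0])

    def centroid(label):
        rows = [x for x, v in zip(X_train, y_train) if v == label]
        return [sum(r[j] for r in rows) / len(rows) for j in range(n_feat)]

    w = [p - n for p, n in zip(centroid(1), centroid(0))]
    return lambda x: sum(wj * xj for wj, xj in zip(w, x))


def mean_test_auc(X, y, fit, n_repeats=100, train_frac=0.7, seed=0):
    """Average test-set AUC over repeated stratified random splits."""
    rng = random.Random(seed)
    groups = [[i for i, v in enumerate(y) if v == c] for c in (1, 0)]
    aucs = []
    for _ in range(n_repeats):
        train, test = [], []
        for group in groups:  # split each class separately (stratification)
            shuffled = group[:]
            rng.shuffle(shuffled)
            cut = int(round(train_frac * len(shuffled)))
            train += shuffled[:cut]
            test += shuffled[cut:]
        model = fit([X[i] for i in train], [y[i] for i in train])
        scores = [model(X[i]) for i in test]
        aucs.append(auc(scores, [y[i] for i in test]))
    return sum(aucs) / len(aucs)


# Synthetic, perfectly separable β values for 18 positive and 60 negative samples.
X = [[0.8 + 0.01 * i, 0.7] for i in range(18)] + [[0.2 + 0.001 * i, 0.3] for i in range(60)]
y = [1] * 18 + [0] * 60
mean_auc = mean_test_auc(X, y, fit_centroid_scorer)
```

Averaging over many random splits reduces the dependence of the reported AUC on any single partition, which matters with cohorts of this size. With scikit-learn available, `fit` would typically wrap `LogisticRegression` or `RandomForestClassifier` and return their predicted probabilities.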
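Example 3 Analysis of ATM fusion mutations (FUSION) in lung cancer tissue samples using methylation biomarkers
Lung cancer is one of the cancers with the highest incidence and mortality worldwide and ranks first among all cancers in China. The protein encoded by the ATM gene belongs to the PI3/PI4 kinase family. It is an important cell cycle checkpoint kinase that regulates a series of important downstream proteins through phosphorylation, including the tumor suppressor proteins p53 and BRCA1, the checkpoint kinase CHK2, the checkpoint proteins RAD17 and RAD9, and the DNA repair protein NBS1; it is mainly involved in DNA damage repair and the maintenance of genome stability.
Mutations of the ATM gene are closely related to the development of lung cancer. Studies have shown that the ATM gene is strongly associated with tumor sensitivity to radiotherapy. In addition, the mutation status of ATM kinase in lung cancer cells can serve as a new tumor marker for gauging patient sensitivity to MEK inhibitors; this could greatly improve the diagnosis and subsequent treatment of patients with this subtype and is expected to extend the use of this class of drugs to patients with tumor types other than those carrying RAS or BRAF mutations [Ji X,et al.Protein-altering germline mutations implicate novel genes related to lung cancer development.2020;11(1):1-14.].
Patients
FFPE tissue from lung cancer patients was obtained from the First Affiliated Hospital of Guangzhou Medical University, Guangzhou. The project was approved by the Medical Ethics Committee of the First Affiliated Hospital of Guangzhou Medical University. Informed consent was obtained from the patient in every case. After surgery, 2-5 FFPE slide samples were collected from each patient, and the patients' pathological information was taken from the hospital's official pathology reports.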
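Whole-genome sequencing analysis of tissue samples
FFPE tissue samples were subjected to whole-genome sequencing analysis by a third party (明码生物技术公司). For DNA extraction, bisulfite conversion, library construction, and methylation sequencing of tissue samples, see Example 1.
Results
Identification and modeling analysis of methylation surrogate biomarkers associated with ATM FUSION in lung cancer tissue
To identify methylation signatures specific to ATM FUSION status in lung cancer, we collected 6 FFPE tissue samples with ATM fusion mutations (ATM FUSION+) and 20 samples without ATM fusion mutations (ATM FUSION-), as verified by whole-genome sequencing and analysis.
Using high-throughput targeted methylation sequencing, we cleaned and processed the raw sequencing data [Liang,W.,et al.,Non-invasive diagnosis of early-stage lung cancer using high-throughput targeted DNA methylation sequencing of circulating tumor DNA(ctDNA).2019.9(7):p.2056.] and determined the percentage of methylated cytosines at each site (β value) from the read counts. Statistical differential analysis of each site between ATM FUSION+ and ATM FUSION- samples, with the criteria p value<0.001 and FDR<0.05, identified 4 candidate methylation marker sites with significant differences.
The 26 samples were split 5:5, with the split repeated 50 times; random forest modeling with these 4 candidate methylation markers gave a mean test-set AUC of 0.933 (Figure 10). This indicates that a model using the 4 markers can accurately distinguish the ATM fusion mutation status of the samples.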
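Example 4 Analysis of the EGFR exon 21 L858R point mutation (SNV) in lung cancer tissue samples using methylation biomarkers
Lung cancer is one of the cancers with the highest incidence and mortality worldwide and ranks first among all cancers in China.
EGFR is a member of the epidermal growth factor receptor (HER) family and is widely distributed on the surface of mammalian epithelial cells, fibroblasts, glial cells, keratinocytes, and other cells. The EGFR signaling pathway plays an important role in physiological processes such as cell growth, proliferation, and differentiation.
EGFR is one of the most common driver genes in non-small cell lung cancer (NSCLC). Clinically, EGFR testing is mainly used for pre-treatment evaluation of patients with advanced NSCLC. An EGFR mutation means that a corresponding targeted drug is available, with a response rate of 60%-70% or higher and relatively mild side effects. The advent of EGFR-TKI targeted drugs has significantly improved the survival of patients with EGFR mutation-positive advanced lung cancer and brought the clinical treatment of lung cancer into the era of precision medicine.
The most common EGFR mutation sites are located in exons 18-21: G719X in exon 18; E19del in exon 19; T790M, S768I, and E20ins in exon 20; and L858R and L861Q in exon 21. Of these, the exon 19 deletion E19del and the exon 21 point mutation L858R are the most common, and patients carrying them are the main population for oral EGFR-targeted drug therapy [Yamamoto H,Toyooka S,and Mitsudomi TJLc.Impact of EGFR mutation analysis in non-small cell lung cancer.2009;63(3):315-21.].
Patients
FFPE tissue from patients with non-small cell lung cancer was obtained from the First Affiliated Hospital of Guangzhou Medical University, Guangzhou. The project was approved by the Medical Ethics Committee of the First Affiliated Hospital of Guangzhou Medical University. Informed consent was obtained from the patient in every case. After surgery, 2-5 FFPE slide samples were collected from each patient, and the patients' pathological information was taken from the hospital's official pathology reports.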
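Whole-genome sequencing analysis of tissue samples
FFPE tissue samples were subjected to whole-genome sequencing analysis by a third party (明码生物技术公司). For DNA extraction, bisulfite conversion, library construction, and methylation sequencing of tissue samples, see Example 1.
Results: identification and modeling analysis of methylation surrogate biomarkers associated with the EGFR exon 21 L858R point mutation in non-small cell lung cancer tissue.
To identify methylation signatures specific to EGFR L858R point mutation status in lung cancer, we collected 39 FFPE tissue samples with the EGFR L858R point mutation (L858R+) and 39 samples without the point mutation (L858R-), as verified by whole-genome sequencing and analysis.
Using high-throughput targeted methylation sequencing, we cleaned and processed the raw sequencing data [Liang,W.,et al.,Non-invasive diagnosis of early-stage lung cancer using high-throughput targeted DNA methylation sequencing of circulating tumor DNA(ctDNA).2019.9(7):p.2056.] and determined the percentage of methylated cytosines at each site (β value) from the read counts. Statistical differential analysis of each site between L858R+ and L858R- samples, with the criteria p value<0.001 and FDR<0.05, identified 20 candidate methylation marker sites with significant differences.
The 78 samples were split 5:5, with the split repeated 50 times; random forest modeling with these 20 candidate methylation markers gave a mean test-set AUC of 0.867 (Figure 11). This indicates that a model using the 20 markers can accurately distinguish the EGFR L858R point mutation status of the samples.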
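Example 5 Analysis of point mutation (SNV) status in exons 5-8 of the P53 gene in lung cancer tissue samples using methylation biomarkers
Lung cancer is one of the cancers with the highest incidence and mortality worldwide and ranks first among all cancers in China.
The P53 gene is an important tumor suppressor gene. Deletions or mutations of p53 are found in 50% of human tumors and are closely related to tumor development and progression.
Mutation of the p53 gene is one of the important causes of many tumors, including lung cancer. The mutations mainly comprise point mutations and allelic deletions. It has been reported that, across more than about 200 different tumor types, 50% of tumors carry p53 mutations. Four mutation hotspots have been found in exons 5-8 of the p53 gene. Although the p53 mutation spectrum differs among tumors of different tissues and organs, about 90% of the mutations are concentrated in this region. The hotspots encode amino acids 132-143, 174-179, 236-248, and 272-281, respectively [Rodin SN,and Rodin ASJPotNAoS.Human lung cancer and p53:the interplay between mutagenesis and selection.2000;97(22):12244-9.].
Patients
FFPE tissue from lung cancer patients was obtained from the First Affiliated Hospital of Guangzhou Medical University, Guangzhou. The project was approved by the Medical Ethics Committee of the First Affiliated Hospital of Guangzhou Medical University. Informed consent was obtained from the patient in every case. After surgery, 2-5 FFPE slide samples were collected from each patient, and the patients' pathological information was taken from the hospital's official pathology reports.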
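Whole-genome sequencing analysis of tissue samples
FFPE tissue samples were subjected to whole-genome sequencing analysis by a third party (明码生物技术公司). For DNA extraction, bisulfite conversion, library construction, and methylation sequencing of tissue samples, see Example 1.
Results
Identification and modeling analysis of methylation surrogate biomarkers associated with P53 exon 5-8 point mutation status in lung cancer tissue.
To identify methylation signatures specific to P53 exon 5-8 point mutation status in lung cancer, we collected 40 FFPE tissue samples with point mutations in P53 exons 5-8 and 38 samples without point mutations, as verified by whole-genome sequencing and analysis.
Using high-throughput targeted methylation sequencing, we cleaned and processed the raw sequencing data [Liang,W.,et al.,Non-invasive diagnosis of early-stage lung cancer using high-throughput targeted DNA methylation sequencing of circulating tumor DNA(ctDNA).2019.9(7):p.2056.] and determined the percentage of methylated cytosines at each site (β value) from the read counts. Statistical differential analysis of each site between samples positive and negative for P53 exon 5-8 point mutations, with the criteria p value<0.001 and FDR<0.05, identified 20 candidate methylation marker sites with significant differences.
The 78 samples were split 5:5, with the split repeated 50 times; random forest modeling with these 20 candidate methylation markers gave a mean test-set AUC of 0.902 (Figure 12). This indicates that a model using the 20 markers can accurately distinguish whether point mutations are present in exons 5-8 of the P53 gene in the samples.
The preferred embodiments of the invention have been described in detail above, but the invention is not limited to these embodiments. Those skilled in the art can make various equivalent modifications or substitutions without departing from the spirit of the invention, and all such equivalent modifications or substitutions fall within the scope defined by the claims of this application.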

Claims (23)

  1. 一种检测多核苷酸变异的方法,其特征在于,包括以下步骤:
    1)从生物样品中分离多核苷酸;
    2)对甲基化和/或羟甲基化生物标志物的鉴定和表征;
    3)鉴定相关甲基化和/或羟甲基化标记物或根据候选标记物建立模型来推断和/或确定多核苷酸变异。
  2. 根据权利要求1所述方法,其特征在于,所述多核苷酸包含DNA。
  3. 根据权利要求1所述方法,其特征在于,所述多核苷酸包含RNA。
  4. 根据权利要求1所述方法,其特征在于,所述多核苷酸变异包括单核苷酸变异(single-nucleotide variations,SNVs)。
  5. 根据权利要求1所述方法,其特征在于,所述多核苷酸变异包括插入和/或缺失(insertions and deletions,InDels)。
  6. 根据权利要求1所述方法,其特征在于,所述多核苷酸变异包括融合多核苷酸(fusions)。
  7. 根据权利要求1所述方法,其特征在于,所述多核苷酸变异包括拷贝数变化(copy number variations,CNVs)。
  8. 根据权利要求1所述方法,其特征在于,所述生物样品包括生物流体样品,优选为血液、血清、血浆、玻璃体、痰、尿、眼泪、汗液、唾液。
  9. 根据权利要求1所述方法,其特征在于,所述生物样品包括组织样品。
  10. 根据权利要求1所述方法,其特征在于,所述生物样品包括细胞系样品。
  11. 根据权利要求1所述方法,其特征在于,所述分离方法包括基于苯酚和/或氯仿的DNA提取方法,磁珠分离法和硅胶柱分离法。
  12. 根据权利要求1所述方法,其特征在于,多核苷酸从生物样品中提取和分离后,所述甲基化和/或羟甲基化生物标志物的鉴定和表征中的甲基化处理方法包括化学或酶转化方法;优选地,化学转化方法包括使用商业或非商业化的亚硫酸氢盐(Bisulfite)处理;酶转化法是商业化的或非商业化的基于TET-APOBEC的转化法。
  13. 根据权利要求1所述方法,其特征在于,所述甲基化和/或羟甲基化生物标志物的鉴定和表征包括甲基化特异性PCR方法。
  14. 根据权利要求1所述方法,其特征在于,所述甲基化和/或羟甲基化鉴定和表征使用MassARRAY方法检测。
  15. 根据权利要求1所述方法,其特征在于,所述甲基化和/或羟甲基化鉴定和表征使用微阵列杂交技术。
  16. 根据权利要求1所述方法,其特征在于,所述甲基化和/或羟甲基化鉴定和表征使用基于测序的方法,优选地,通过结合亚硫酸氢盐处理来分析5-甲基胞嘧啶或5-羟甲基胞嘧啶分布,使用全基因组亚硫酸氢盐测序或靶向甲基化测序。
  17. 根据权利要求1所述方法,其特征在于,所述用于推断和/或确定多核苷酸变异的方法包括使用生物信息学分析,该生物信息学分析包括通过斯皮尔曼分析或皮尔森分析确定最佳生物标记物和/或模型,优选为使用随机森林、LASSO回归、逻辑回归、deep-learning network进行建模。
  18. 根据权利要求1所述方法,其特征在于,还包括在所述甲基化和/或羟甲基化生物标志物的鉴定和表征之后,进行简单的定量检测方法,所述定量检测包括基于从高通量方法中选择的候选标记,基于甲基化特异性引物的延伸方法、甲基化特异性PCR、甲基化特异性qPCR分析、MassARRAY、靶向甲基化测序。
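  19. The method according to claim 4, characterized in that the gene bearing the single-nucleotide variation comprises at least one of the AKT1, ALK, APC, AR, ARF, ARID1A, ATM, BRAF, BRCA1, BRCA2, CCND1, CCND2, CCNE1, CDH1, CDK4, CDK6, CDKN2A, CTNNB1, DDR2, EGFR, ERBB2, ESR1, EZH2, FBXW7, FGFR1, FGFR2, FGFR3, GATA3, GNA11, GNAQ, GNAS, HNF1A, HRAS, IDH1, IDH2, JAK2, JAK3, KIT, KRAS, MEK1, MEK2, ERK2, ERK1, MET, MLH1, MPL, MTOR, MYC, NF1, NFE2LE, NOTCH1, NPM1, NRAS, NTRK1, NTRK3, PDGFRA, PI3CA, PTEN, PTPN11, RAF1, RB1, RET, RHEB, RHOA, RIT1, ROS1, SMAD4, SMO, STK11, TERT, TP53, TSC1, and VHL genes.
  20. The method according to claim 5, characterized in that the polynucleotide bearing the insertion and/or deletion comprises at least one of the ATM, APC, ARID1A, BRCA1, BRCA2, CDH1, CDKN2A, EGFR, ERBB2, GATA3, KIT, MET, MLH1, MTOR, NF1, PDGFRA, PTEN, RB1, SMAD4, STK11, TP53, TSC1, and VHL genes.
  21. The method according to claim 6, characterized in that the fused polynucleotide comprises at least one of the ALK, FGFR2, FGFR3, NTRK1, RET, ROS1, and EML4 genes.
  22. The method according to claim 7, characterized in that the polynucleotide with copy number variation comprises the AR, BRAF, CCND1, CCND2, CCNE1, CDK4, CDK6, EGFR, ERBB2 (HER2), FGFR1, FGFR2, KIT, KRAS, MET, MYC, PDGFRA, PI3CA, and RAF1 genes.
  23. The method according to claim 1, characterized in that the polynucleotide variation comprises amplification of the ERBB2 (HER2) gene.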
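
Claims 12 and 16 rely on bisulfite conversion to read out cytosine modification. The following is a minimal sketch of that readout, not the applicant's pipeline. It assumes reads that are already aligned, without indels, to the forward strand of a short reference; the `reference` string, the `reads` list, and both function names are hypothetical. After conversion, an unmodified cytosine is read as T, whereas 5-methylcytosine and 5-hydroxymethylcytosine are both protected and read as C, so standard bisulfite data alone does not separate the two marks.

```python
def cpg_positions(reference):
    """Return 0-based positions of the C in every CpG dinucleotide."""
    return [i for i in range(len(reference) - 1) if reference[i:i + 2] == "CG"]


def methylation_levels(reference, reads):
    """Map each CpG position to (protected_count, informative_count, fraction).

    reads: list of (start_offset, sequence) pairs aligned to `reference`.
    A 'C' at a CpG means the cytosine was protected (5mC or 5hmC);
    a 'T' means it was converted (unmodified). Other bases are skipped.
    """
    counts = {pos: [0, 0] for pos in cpg_positions(reference)}
    for start, seq in reads:
        for pos in counts:
            offset = pos - start
            if 0 <= offset < len(seq):
                base = seq[offset]
                if base == "C":
                    counts[pos][0] += 1
                    counts[pos][1] += 1
                elif base == "T":
                    counts[pos][1] += 1
    return {
        pos: (m, n, m / n if n else None)
        for pos, (m, n) in counts.items()
    }


# Hypothetical toy data: CpG sites sit at positions 1 and 5 of the reference.
reference = "ACGTTCGGA"
reads = [
    (0, "ACGTTTGGA"),  # site 1 protected, site 5 converted
    (0, "ACGTTCGGA"),  # both sites protected
    (0, "ATGTTTGGA"),  # both sites converted
    (4, "TCGGA"),      # covers only site 5, protected
]
levels = methylation_levels(reference, reads)
for pos, (protected, total, fraction) in sorted(levels.items()):
    print(pos, protected, total, fraction)
```

Per-site fractions of this kind are the sort of quantity a downstream marker-selection or modeling step would take as input.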
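
Claim 17 names Spearman or Pearson analysis for choosing markers and lists logistic regression among the modeling options. One way those two steps could fit together is sketched below using only the Python standard library. Everything in it is an assumption made for illustration: the marker names, the methylation values, the binary variant labels, and the choice of a single-feature logistic model trained by gradient descent. It is not the method disclosed in the application.

```python
import math


def ranks(values):
    """Average ranks (1-based); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient; 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def spearman(x, y):
    """Spearman rho is the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def rank_markers(methylation, labels):
    """Sort markers by |rho| against variant status, strongest first."""
    scored = [(name, spearman(vals, labels)) for name, vals in methylation.items()]
    return sorted(scored, key=lambda item: abs(item[1]), reverse=True)


def fit_logistic(x, y, lr=0.5, epochs=2000):
    """Single-feature logistic regression fitted by batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(x)
    for _ in range(epochs):
        preds = [1 / (1 + math.exp(-(w * xi + b))) for xi in x]
        w -= lr * sum((p - yi) * xi for p, xi, yi in zip(preds, x, y)) / n
        b -= lr * sum(p - yi for p, yi in zip(preds, y)) / n
    return w, b


# Hypothetical data: 1 = variant present (for example, an amplification), 0 = absent.
labels = [0, 0, 0, 1, 1, 1]
methylation = {
    "marker_A": [0.10, 0.15, 0.20, 0.70, 0.80, 0.75],
    "marker_B": [0.50, 0.40, 0.60, 0.50, 0.45, 0.55],
    "marker_C": [0.90, 0.85, 0.30, 0.80, 0.20, 0.25],
}

ranked = rank_markers(methylation, labels)
best_name = ranked[0][0]
w, b = fit_logistic(methylation[best_name], labels)


def predict(value):
    """Predicted probability that the variant is present."""
    return 1 / (1 + math.exp(-(w * value + b)))


print(ranked)
print(best_name, predict(0.8), predict(0.1))
```

With real data the correlation step would typically run over many candidate sites, and the model would be validated on held-out samples before any marker is considered informative.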
PCT/CN2021/086104 2020-12-15 2021-04-09 Method for detecting polynucleotide variations WO2022126938A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US18/335,453 US20240002953A1 (en) 2020-12-15 2023-06-15 Method for detecting polynucleotide variations

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
CN202011479880 2020-12-15
CN202011479880.7 2020-12-15
CN202110269166.3 2021-03-12
CN202110269166.3A CN114634982A (zh) 2020-12-15 2021-03-12 Method for detecting polynucleotide variations

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US18/335,453 Continuation US20240002953A1 (en) 2020-12-15 2023-06-15 Method for detecting polynucleotide variations

Publications (1)

Publication Number Publication Date
WO2022126938A1 (zh) 2022-06-23

Family

ID=81946691

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/086104 WO2022126938A1 (zh) 2020-12-15 2021-04-09 Method for detecting polynucleotide variations

Country Status (3)

Country Link
US (1) US20240002953A1 (zh)
CN (1) CN114634982A (zh)
WO (1) WO2022126938A1 (zh)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1882700A (zh) * 2003-09-22 2006-12-20 特莱索根生物科技有限合伙公司 Methods and kits for detecting changes in locus copy number
WO2018094031A1 (en) * 2016-11-16 2018-05-24 Progenity, Inc. Multimodal assay for detecting nucleic acid aberrations
WO2018204408A1 (en) * 2017-05-02 2018-11-08 Sanford Burnham Prebys Medical Discovery Institute Methods of diagnosing and treating alzheimer's disease
CN109689891A (zh) * 2016-07-06 2019-04-26 夸登特健康公司 Methods for fragmentome profiling of cell-free nucleic acids
CN110438228A (zh) * 2019-07-31 2019-11-12 南通大学附属医院 DNA methylation markers for colorectal cancer
CN110621788A (zh) * 2017-05-18 2019-12-27 基美健有限公司 DNA methylation and mutation analysis methods for bladder cancer surveillance
CN111712582A (zh) * 2017-11-02 2020-09-25 香港中文大学 Using nucleic acid size range for non-invasive prenatal testing and cancer detection

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080221056A1 (en) * 2007-02-12 2008-09-11 Johns Hopkins University Early Detection and Prognosis of Colon Cancers
CA2858144C (en) * 2011-12-06 2021-05-04 Mdxhealth Sa Methods of detecting mutations and epigenetic changes
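CN108292328B (zh) * 2015-11-10 2022-04-19 美国陶氏益农公司 Methods and systems for predicting the risk of transgene silencing
CN108920904B (zh) * 2018-07-26 2022-05-27 深圳市易基因科技有限公司 Method for analyzing homologous gene-specific methylation time-series data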

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1882700A (zh) * 2003-09-22 2006-12-20 特莱索根生物科技有限合伙公司 Methods and kits for detecting changes in locus copy number
CN109689891A (zh) * 2016-07-06 2019-04-26 夸登特健康公司 Methods for fragmentome profiling of cell-free nucleic acids
WO2018094031A1 (en) * 2016-11-16 2018-05-24 Progenity, Inc. Multimodal assay for detecting nucleic acid aberrations
WO2018204408A1 (en) * 2017-05-02 2018-11-08 Sanford Burnham Prebys Medical Discovery Institute Methods of diagnosing and treating alzheimer's disease
CN110621788A (zh) * 2017-05-18 2019-12-27 基美健有限公司 DNA methylation and mutation analysis methods for bladder cancer surveillance
CN111712582A (zh) * 2017-11-02 2020-09-25 香港中文大学 Using nucleic acid size range for non-invasive prenatal testing and cancer detection
CN110438228A (zh) * 2019-07-31 2019-11-12 南通大学附属医院 DNA methylation markers for colorectal cancer

Also Published As

Publication number Publication date
US20240002953A1 (en) 2024-01-04
CN114634982A (zh) 2022-06-17

Similar Documents

Publication Publication Date Title
JP6664025B2 (ja) Systems and methods for detecting rare mutations and copy number variation
US20200048697A1 (en) Compositions and methods for detection of genomic variance and DNA methylation status
CN113330121A (zh) Methods for circulating cell analysis
EP3541934B1 (en) Methods for preparing dna reference material and controls
US11939636B2 (en) Methods and systems for improving patient monitoring after surgery
US20170298427A1 (en) Nucleic acids and methods for detecting methylation status
US20190309352A1 (en) Multimodal assay for detecting nucleic acid aberrations
US20220228219A1 (en) Target-enriched multiplexed parallel analysis for assessment of tumor biomarkers
WO2016071369A1 (en) Method for determining the presence of a biological condition by determining total and relative amounts of two different nucleic acids
CN116219020B (zh) Methylation internal reference gene and use thereof
WO2022126938A1 (zh) Method for detecting polynucleotide variations
US20220127601A1 (en) Method of determining the origin of nucleic acids in a mixed sample
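WO2023106415A1 (ja) Method for predicting post-chemotherapy prognosis of dogs with lymphoma
US20240002946A1 (en) Methods and systems for improving patient monitoring after surgery
WO2022188776A1 (zh) Gene methylation marker or combination thereof usable for HER2 companion diagnosis of gastric cancer, and use thereof
CN110870017B (zh) Method for generating a background allele frequency distribution from cell-free nucleic acids and detecting mutations
Taylor Biomarkers of Lung Cancer Risk and Progression
CN115772564A (zh) Methylation biomarker for assisting detection of somatic ATM gene fusion mutations in lung cancer, and use thereof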

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21904877

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21904877

Country of ref document: EP

Kind code of ref document: A1
