WO2024056008A1 - Methylation markers for identifying cancer and applications thereof - Google Patents

Methylation markers for identifying cancer and applications thereof

Info

Publication number
WO2024056008A1
Authority
WO
WIPO (PCT)
Prior art keywords
sequence
methylation
seq
dna
marker
Prior art date
Application number
PCT/CN2023/118675
Other languages
English (en)
French (fr)
Inventor
徐敏杰
陈桦
孙津
马成城
何其晔
苏志熙
刘蕊
Original Assignee
江苏鹍远生物科技股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from CN202211129987.8A (CN117821585A)
Priority claimed from CN202211190564.7A (CN117778568A)
Application filed by 江苏鹍远生物科技股份有限公司
Publication of WO2024056008A1

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/11 DNA or RNA fragments; Modified forms thereof; Non-coding nucleic acids having a biological activity
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6844 Nucleic acid amplification reactions
    • C12Q1/6851 Quantitative amplification
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6876 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
    • C12Q1/6883 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material
    • C12Q1/6886 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material for cancer

Definitions

  • This application relates to the field of biomedicine, specifically a method for early diagnosis and screening of cancer.
  • Colorectal cancer is the third most common and lethal cancer in the world. In recent years, with changes in diet and other lifestyle changes, the incidence of colorectal cancer has gradually increased, which has greatly harmed human health. The cure rate of colorectal cancer is closely related to the cancer stage. The five-year survival rate of patients with stage I and stage II colorectal cancer reaches 80%, while the five-year survival rate of stage III patients drops to 50%, and the five-year survival rate of stage IV patients is only 8%. Unfortunately, most patients with colorectal cancer have no obvious symptoms in the early stages and are already in the middle and late stages of cancer when they seek treatment, missing the best period for treatment.
  • Gastric cancer is the second most common type of cancer worldwide, and almost two-thirds of cases occur in developing countries. According to existing data, gastric cancer is the fourth most common cancer among men and the seventh most common cancer among women. Gastric cancer has become a serious threat to people's health. Finding convenient and effective early diagnosis methods for gastric cancer plays a vital role in reducing its mortality and improving survival. Tumor marker testing is an important examination method: it is simple and economical, provides effective evidence for clinical diagnosis and treatment, and reduces screening costs for patients. Blood is the preferred source of candidate tumor markers for gastric cancer screening, and blood-based biomarkers provide a profile of the entire patient body, including the primary tumor, metastatic disease, immune response, and peritumoral stroma.
  • Common gastric cancer blood markers include CEA, CA19-9, CA72-4, etc. These tumor markers all have low sensitivity, and the detection rate is only about 50%. In addition, poor specificity is a major drawback. For example, serum CA19-9 levels are elevated in various adenocarcinomas (including pancreatic cancer, hepatobiliary cancer, and gastric cancer). CEA is elevated in a variety of cancers and even in non-cancer diseases. Because of their low sensitivity and poor specificity, the use of these blood markers is relatively limited in clinical practice, especially in early screening for gastric cancer.
  • Esophageal cancer is one of the most common malignant tumors in the world. It has high morbidity and mortality and has become a serious threat to people's health. The symptoms of early esophageal cancer are not obvious, and there is no specific diagnostic method. Therefore, most esophageal cancer patients are already in the middle or late stage when diagnosed. Tumor markers are also an important examination method for esophageal cancer. Previous studies have focused on differences in single serum markers, such as miR-138, between esophageal cancer patients and normal controls, but their sensitivity and specificity have not yet reached expectations. Some studies have combined several serum markers, for example by detecting multiple small RNAs together, but the resulting improvements in sensitivity and specificity are limited.
  • Circulating tumor DNA (ctDNA) molecules are derived from apoptotic or necrotic tumor cells and carry tumor-specific DNA methylation markers from early malignant tumors. In recent years, they have been studied as a promising new target for developing non-invasive early screening tools for various cancers. However, most of these studies did not achieve valid results.
  • Liver cancer is a cancer that seriously threatens health in China.
  • The onset of liver cancer is insidious. Once patients develop clinical symptoms, the disease is often already in the middle or late stage, and they have lost the opportunity for radical treatment.
  • The prognosis is extremely poor; therefore, the earlier liver cancer is diagnosed, the better the treatment effect and the higher the survival rate.
  • Common detection methods such as the alpha-fetoprotein (α-FP) assay measure an embryonic antigen by immunological methods. This assay is currently one of the most specific methods for diagnosing hepatocellular carcinoma.
  • Hepatocellular carcinoma can be diagnosed if α-FP counter-immunoelectrophoresis is positive or the quantitative value remains >500 ng/ml for more than one month, provided that pregnancy, active liver disease, gonadal embryonal tumors, etc. can be ruled out.
  • Blood enzymology tests: γ-glutamyl transpeptidase, alkaline phosphatase and lactate dehydrogenase isoenzymes in the serum of liver cancer patients can be higher than normal, but because they lack specificity, they are mostly used for auxiliary diagnosis.
  • This application provides methylation markers for early, non-invasive identification of cancer (for example, colorectal cancer, gastric cancer, esophageal cancer and/or liver cancer) and applications thereof. Based on the methylation levels of the biomarker panel of this application in plasma, patients with cancer (for example, colorectal cancer, gastric cancer, esophageal cancer and/or liver cancer) can be identified conveniently, accurately and efficiently, which provides a new method for the early diagnosis of these cancers.
  • the detection process of this application is non-invasive, highly safe and convenient for large-scale clinical application.
  • This application only needs to detect the methylation level of several genes, or even a single gene, to distinguish benign from malignant samples. This significantly reduces the target detection region, broadens the applicability of the technology, and allows more samples to be included.
  • the methylation markers, detection methods and/or kits of the present application have the characteristics of high sensitivity and specificity in applications such as early diagnosis and recurrence monitoring of cancer.
  • This application detects the methylation level of DNA methylation markers in patient samples and uses the detected methylation level data to calculate prediction scores according to the diagnostic model, thereby distinguishing cancer patients from non-cancer patients. This achieves early screening and early diagnosis of cancer with higher accuracy and lower cost.
  • the application provides a colorectal cancer methylation marker, which is an isolated nucleic acid molecule from a mammal, and the sequence of the nucleic acid molecule includes: (1) any one or more (for example, at least 6, at least 7, at least 8 or at least 9) or all of the sequences shown in SEQ ID NO: 1-47, or complementary sequences or variants thereof, wherein a variant has at least 70% sequence identity to the corresponding sequence and its methylation sites are not mutated, or (2) a treated sequence of (1), wherein the treatment converts unmethylated cytosine into a base with a lower binding capacity for guanine than cytosine.
  • said (1) is selected from any of the following groups:
  • SEQ ID NO:4 or its complement or variant, SEQ ID NO:11 or its complement or variant, SEQ ID NO:15 or its complement or variant, SEQ ID NO:18 or its complement or variant, SEQ ID NO:19 or its complement or variant, SEQ ID NO:30 or its complement or variant, SEQ ID NO:34 or its complement or variant, SEQ ID NO:37 or its complement or variant, SEQ ID NO:41 or its complement or variant, optionally also including any one or more of the remaining sequences in SEQ ID NO:1-47 or complementary sequences or variants thereof,
  • SEQ ID NO:6 or its complement or variant, SEQ ID NO:10 or its complement or variant, SEQ ID NO:13 or its complement or variant, SEQ ID NO:14 or its complement or variant, SEQ ID NO:22 or its complement or variant, SEQ ID NO:28 or its complement or variant, SEQ ID NO:43 or its complement or variant, optionally also including any one or more of the remaining sequences in SEQ ID NO:1-47 or complementary sequences or variants thereof.
  • the methylation sites are contiguous CpGs.
  • the methylation marker can be any one or more CpG sites in the sequence region.
  • the nucleic acid molecule is used as an internal standard or control for detecting the DNA methylation level of the corresponding sequence in a sample.
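The variant condition described above (at least 70% sequence identity with the methylation sites left unmutated) can be illustrated with a minimal sketch. It assumes the reference and the variant are already aligned and of equal length, with no gaps, and it treats every CpG in the reference as a methylation site; the function names and the toy sequences are hypothetical and do not come from the application.

```python
def cpg_positions(seq: str) -> list:
    """Return the 0-based start index of every CpG dinucleotide in seq."""
    seq = seq.upper()
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] == "CG"]


def is_acceptable_variant(reference: str, variant: str,
                          min_identity: float = 0.70) -> bool:
    """Check identity >= min_identity and that no reference CpG is mutated.

    Assumption: both sequences are pre-aligned and of equal, non-zero length.
    """
    if not reference or len(reference) != len(variant):
        raise ValueError("sequences must be non-empty and of equal length")
    ref, var = reference.upper(), variant.upper()
    matches = sum(a == b for a, b in zip(ref, var))
    identity = matches / len(ref)
    # Every CpG present in the reference must still be a CpG in the variant.
    sites_intact = all(var[i:i + 2] == "CG" for i in cpg_positions(ref))
    return identity >= min_identity and sites_intact


# Toy sequences for illustration only (not SEQ ID NO entries).
reference = "ACGTTCGAAT"
print(is_acceptable_variant(reference, "ACGTTCGTTT"))  # CpGs kept, 80% identity
print(is_acceptable_variant(reference, "ACGTTTGAAT"))  # one CpG destroyed
```

A real comparison would normally start from a pairwise alignment that allows gaps; the equal-length shortcut only keeps the sketch short.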
  • the present application provides a reagent for detecting DNA methylation, which is used to screen the risk of colorectal cancer, diagnose colorectal cancer, and evaluate the prognosis of colorectal cancer.
  • the reagent includes a reagent for detecting the methylation level of the marker in a sample of the subject.
  • the DNA sequence includes one or more or all selected from CACNA1E, PITX2, CRHBP, TBX20, SORCS3, B3GAT1, GLT1D1 and LRRK1, optionally also including one or more or all of the other gene sequences in (p).
  • the DNA sequence includes one or more or all selected from TTLL10, ACTR3B, BARX1, CUX2, DNM2 and SIM2, optionally also including one or more or all of the other gene sequences in (p).
  • the DNA sequence includes one or more or all selected from UBE2F, HMX1, IRX4, IRX1, VIPR2, OSR2 and MYO15B, optionally also including one or more or all of the other gene sequences in (p).
  • the marker comprises at least 3 CpG dinucleotides.
  • the DNA sequence includes a sense or antisense strand of DNA.
  • the fragment length is 1-1000 bp, preferably 1-700 bp.
  • the fragment is the promoter region of a gene sequence or a portion thereof.
  • the fragment contains at least 1, preferably at least 3 CpG dinucleotides.
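The fragment criteria listed above (length of 1-1000 bp, at least 1 and preferably at least 3 CpG dinucleotides) can be checked mechanically. The following is a minimal sketch; the function names and the example sequence are hypothetical.

```python
def count_cpg(seq: str) -> int:
    """Count CpG dinucleotides (a C immediately followed by a G)."""
    return seq.upper().count("CG")


def fragment_meets_criteria(seq: str, max_len: int = 1000,
                            min_cpg: int = 1) -> bool:
    """Return True if the fragment length is 1..max_len bp and it holds
    at least min_cpg CpG dinucleotides."""
    return 1 <= len(seq) <= max_len and count_cpg(seq) >= min_cpg


fragment = "ACGTCGCG"  # toy sequence, for illustration only
print(count_cpg(fragment))
print(fragment_meets_criteria(fragment, max_len=700, min_cpg=3))
```

Passing `max_len=700` and `min_cpg=3` corresponds to the preferred ranges named in the text.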
  • the marker has the sequence of a nucleic acid molecule described herein.
  • the reagent is a primer molecule that hybridizes to the marker or transformed sequence thereof.
  • the primer molecule can amplify the marker or its transformed variant.
  • the primer sequences are methylation specific or non-specific.
  • the primer molecule is at least 9 bp.
  • the agent is a probe molecule that hybridizes to a marker or its transformed sequence.
  • the probe further contains a detectable substance.
  • the detectable substance is a 5' fluorescent reporter group and a 3' labeled quencher group.
  • the fluorescent reporter group is selected from Cy5, FAM and VIC.
  • the probe molecule is at least 12 bp.
  • the sample is from a mammal, preferably a human.
  • the present application provides a medium recorded with DNA sequences or fragments thereof and/or methylation information thereof, and the DNA sequences include:
  • the DNA sequence includes one or more or all selected from CACNA1E, PITX2, CRHBP, TBX20, SORCS3, B3GAT1, GLT1D1 and LRRK1, optionally also including one or more or all of the other gene sequences in (p).
  • the DNA sequence includes one or more or all selected from TTLL10, ACTR3B, BARX1, CUX2, DNM2 and SIM2, optionally also including one or more or all of the other gene sequences in (p).
  • the DNA sequence includes one or more or all selected from UBE2F, HMX1, IRX4, IRX1, VIPR2, OSR2 and MYO15B, optionally also including one or more or all of the other gene sequences in (p).
  • the medium is used for comparison with gene methylation sequencing data to determine the presence, content and/or methylation levels of nucleic acid molecules containing the sequence or fragments thereof.
  • the marker comprises at least 3 CpG dinucleotides.
  • the DNA sequence includes a sense or antisense strand of DNA.
  • the fragment length is 1-1000 bp, preferably 1-700 bp. In one or more embodiments, the fragment is the promoter region of a gene sequence or a portion thereof. In one or more embodiments, the fragment contains at least 1, preferably at least 3 CpG dinucleotides.
  • the marker has the sequence shown in any one of the nucleic acid molecules SEQ ID NO: 1-47 described in the application.
  • the medium is a carrier printed with the DNA sequence or fragments thereof and/or methylation information thereof, including cards, such as paper, plastic, metal, and glass cards.
  • the medium is a computer program storing the sequence and/or its methylation information.
  • a computer-readable medium storing the computer program; when the computer program is executed by a processor, the following steps are implemented: comparing the methylation sequencing data of the sample with the sequence or information, thereby obtaining the presence, content and/or methylation level of nucleic acid molecules containing the sequence in the sample. The presence, content and/or methylation level of nucleic acid molecules containing the sequence are used to screen for colorectal cancer risk, diagnose colorectal cancer, and assess colorectal cancer prognosis.
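The comparison step described above can be sketched as follows. This is an illustration under stated assumptions, not the program of the application: the recorded marker information is assumed to be a list of genomic regions, the sequencing data is assumed to be per-CpG read counts, and the region names, coordinates and counts are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Region:
    """A recorded marker region (hypothetical layout)."""
    name: str
    chrom: str
    start: int  # inclusive
    end: int    # inclusive


# One CpG call: (chromosome, position, methylated reads, total reads)
CpgCall = Tuple[str, int, int, int]


def region_methylation(regions: List[Region],
                       cpg_calls: List[CpgCall]) -> Dict[str, Optional[float]]:
    """Pool the reads of all CpG calls that fall inside each region and
    return methylated/total per region, or None if nothing was covered."""
    result: Dict[str, Optional[float]] = {}
    for region in regions:
        methylated = total = 0
        for chrom, pos, m, t in cpg_calls:
            if chrom == region.chrom and region.start <= pos <= region.end:
                methylated += m
                total += t
        result[region.name] = methylated / total if total else None
    return result


regions = [Region("marker_A", "chr1", 100, 200),
           Region("marker_B", "chr2", 50, 60)]
calls = [("chr1", 120, 8, 10), ("chr1", 150, 2, 10), ("chr1", 300, 10, 10)]
print(region_methylation(regions, calls))
```

A region with no covered CpG is reported as `None` rather than 0, so that absence of data is not mistaken for absence of methylation.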
  • this application also provides the use of the following (a) and optional (b) in preparing kits for screening colorectal cancer risk, diagnosing colorectal cancer, and evaluating colorectal cancer prognosis,
  • a reagent or device for determining the methylation level of a marker in a sample of a subject, the marker being a DNA sequence together with the 5 kb upstream and 5 kb downstream of the DNA sequence, or a fragment thereof, or one or more CpG dinucleotides therein,
  • the DNA sequence includes one, more or all of the following gene sequences: (p) TTLL10, ST6GALNAC5, KCNA3, CACNA1E, TRAPPC12, UBE2F, ZIC4, ZNF595, EVC2, HMX1, PITX2, POU4F2, IRX4, IRX1, CRHBP, KCNMB1, KCNQ5, TBX20, ACTR3C, ACTR3B, VIPR2, SOX17, MOS, PREX2, GDF6, OSR2, BARX1, SORCS3, VAX1, DPYSL4, UTF1, B3GAT1, HOXC13, CUX2, GLT1D1, ITGBL1, SKOR1, TM6SF1, LRRK1, FOXL1, MYO15B, DNM2, ZNF536, YTHDF1 and SIM2.
  • the DNA sequence includes one or more or all selected from CACNA1E, PITX2, CRHBP, TBX20, SORCS3, B3GAT1, GLT1D1 and LRRK1, optionally also including one or more or all of the other gene sequences in (p).
  • the DNA sequence includes one or more or all selected from TTLL10, ACTR3B, BARX1, CUX2, DNM2 and SIM2, optionally also including one or more or all of the other gene sequences in (p).
  • the DNA sequence includes one or more or all selected from UBE2F, HMX1, IRX4, IRX1, VIPR2, OSR2 and MYO15B, optionally also including one or more or all of the other gene sequences in (p).
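The marker is defined above as a gene sequence together with its 5 kb upstream and 5 kb downstream flanks. A minimal sketch of deriving such a window from gene coordinates is given below; it assumes 1-based inclusive coordinates, and the coordinates used are hypothetical, not the real positions of any listed gene.

```python
def marker_window(gene_start: int, gene_end: int, flank: int = 5000):
    """Extend a gene interval by `flank` bp on each side.

    Coordinates are assumed to be 1-based and inclusive; the lower bound
    is clipped at 1 so the window never runs off the chromosome start.
    """
    if gene_start < 1 or gene_start > gene_end:
        raise ValueError("expected 1 <= gene_start <= gene_end")
    return max(1, gene_start - flank), gene_end + flank


# Hypothetical gene coordinates, for illustration only.
print(marker_window(12_000, 15_000))
print(marker_window(3_000, 4_000))
```

Clipping the upper bound would additionally require the chromosome length, which this sketch leaves out.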
  • the marker comprises at least 3 CpG dinucleotides.
  • the DNA sequence includes a sense or antisense strand of DNA.
  • the fragment length is 1-1000 bp, preferably 1-700 bp. In one or more embodiments, the fragment is the promoter region of a gene sequence or a portion thereof. In one or more embodiments, the fragment contains at least 1, preferably at least 3 CpG dinucleotides.
  • the marker has the sequence shown in any one of SEQ ID NOs: 1-47 of the nucleic acid molecules described in the present application.
  • the nucleic acid molecule is a nucleic acid molecule comprising the sequence shown in any one of SEQ ID NOs: 1-47.
  • the reagents comprise primer molecules and/or probe molecules.
  • the reagent comprises a primer molecule that hybridizes to the marker or transformed sequence thereof.
  • the primer molecules can amplify the DNA sequence or fragments thereof or their transformed variants.
  • the primer sequences are methylation specific or non-specific.
  • the primer molecule is at least 9 bp.
  • the agent is a probe molecule that hybridizes to the marker or its transformed sequence.
  • the probe further contains a detectable substance.
  • the detectable substance is a 5' fluorescent reporter group and a 3' labeled quencher group.
  • the fluorescent reporter group is selected from Cy5, FAM, and VIC.
  • the probe molecule is at least 12 bp.
  • the reagent comprises a medium as described in any embodiment herein.
  • the kit is a non-invasive diagnostic kit.
  • the subject is a mammal, preferably a human.
  • the sample is from a mammalian tissue, cell or body fluid, such as an intestinal tissue sample, blood, serum or plasma.
  • the mammal is preferably a human.
  • the sample includes genomic DNA.
  • the sample is blood.
  • the DNA sequence is: the sequence of the corresponding marker in the genome, or its transformed sequence, or its sequence treated with a methylation-sensitive restriction endonuclease, wherein the conversion converts unmethylated cytosine into a base that has a lower binding capacity for guanine than cytosine.
  • the conversion is performed using enzymatic methods, preferably deaminase treatment, or the conversion is performed using non-enzymatic methods, preferably treatment with bisulfite, acid sulfite or metabisulfite or a combination thereof.
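The effect of the conversion can be illustrated in silico. With bisulfite treatment, unmethylated cytosine is deaminated to uracil, which is read as thymine after amplification, while methylated cytosine is left unchanged. The sketch below models only that outcome on a single strand; the function name, the toy sequence and the set of methylated positions are hypothetical.

```python
def bisulfite_convert(seq: str, methylated_positions=frozenset()) -> str:
    """Simulate bisulfite conversion of one DNA strand.

    Every C whose 0-based index is not in methylated_positions is written
    as T (uracil is read as thymine after PCR); methylated C is kept.
    """
    converted = []
    for index, base in enumerate(seq.upper()):
        if base == "C" and index not in methylated_positions:
            converted.append("T")
        else:
            converted.append(base)
    return "".join(converted)


sequence = "ACGTCCGA"  # toy sequence, for illustration only
print(bisulfite_convert(sequence, methylated_positions={1}))
print(bisulfite_convert(sequence))
```

Comparing the converted sequence with the original shows which cytosines were protected by methylation, which is what methylation-specific primers and probes exploit.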
  • the kit further includes PCR reaction reagents.
  • the PCR reaction reagents include DNA polymerase, PCR buffer, dNTPs, and Mg²⁺.
  • the kit further includes additional reagents for detecting DNA methylation, the other reagents being reagents used in one or more of the following methods: bisulfite conversion-based PCR (e.g., methylation-specific PCR), DNA sequencing (e.g., bisulfite sequencing, whole-genome methylation sequencing, simplified methylation sequencing), methylation-sensitive restriction endonuclease assays, fluorescence quantification methods, methylation-sensitive high-resolution melting curve methods, chip-based methylation profiling, and mass spectrometry (e.g., time-of-flight mass spectrometry).
  • the other reagents are selected from one or more of the following: bisulfite, acid sulfite or metabisulfite or derivatives thereof, methylation-sensitive or -insensitive restriction endonucleases, digestion buffer, fluorescent dye, fluorescent quencher, fluorescent reporter, exonuclease, alkaline phosphatase, internal standard and control.
  • the PCR reaction solution contains Taq DNA polymerase, PCR buffer, dNTPs, KCl, MgCl₂ and (NH₄)₂SO₄.
  • the Taq DNA polymerase is hot start Taq DNA polymerase.
  • the final concentration of Mg²⁺ is 1.0-10.0 mM.
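As a worked example of the Mg²⁺ range above, the volume of MgCl₂ stock needed for a given final concentration follows from the dilution relation C1·V1 = C2·V2. The stock concentration (25 mM) and reaction volume (25 µL) below are assumptions chosen for illustration; only the 1.0-10.0 mM range comes from the text.

```python
def mgcl2_stock_volume_ul(final_mm: float, reaction_ul: float,
                          stock_mm: float) -> float:
    """Volume of MgCl2 stock (in microlitres) giving the requested final
    Mg2+ concentration, from C1 * V1 = C2 * V2."""
    if not 1.0 <= final_mm <= 10.0:
        raise ValueError("final Mg2+ concentration outside 1.0-10.0 mM")
    if stock_mm <= final_mm:
        raise ValueError("stock must be more concentrated than the target")
    return final_mm * reaction_ul / stock_mm


# Assumed values: 25 mM stock, 25 uL reaction, 2.0 mM final Mg2+.
print(mgcl2_stock_volume_ul(final_mm=2.0, reaction_ul=25.0, stock_mm=25.0))
```

The calculation ignores any Mg²⁺ already supplied by the PCR buffer, which would have to be subtracted in practice.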
  • screening for colorectal cancer risk, diagnosing colorectal cancer, and assessing colorectal cancer prognosis include: comparing the methylation levels of the markers with corresponding reference levels and, based on the scores, screening for colorectal cancer risk, diagnosing colorectal cancer, or assessing colorectal cancer prognosis.
  • the comparison includes directly comparing the methylation level of the marker with a reference level, or by calculating a score and comparing the score of the methylation level of the marker with a corresponding reference score.
  • the calculation is performed by constructing a logistic regression model.
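The score calculation and comparison described above can be sketched with a logistic function. This is a minimal illustration, not the fitted model of the application: the coefficients, intercept, methylation levels and reference score are all hypothetical, and in practice the coefficients would be estimated from training samples.

```python
import math
from typing import Dict


def logistic_score(levels: Dict[str, float],
                   coefficients: Dict[str, float],
                   intercept: float) -> float:
    """Logistic regression score in (0, 1):
    1 / (1 + exp(-(intercept + sum(coefficient * methylation level))))."""
    z = intercept + sum(coef * levels[name]
                        for name, coef in coefficients.items())
    return 1.0 / (1.0 + math.exp(-z))


def exceeds_reference(score: float, reference_score: float) -> bool:
    """True when the sample score is greater than the reference score."""
    return score > reference_score


# Hypothetical coefficients and methylation levels, for illustration only.
coefficients = {"CACNA1E": 2.0, "PITX2": 1.0}
levels = {"CACNA1E": 0.5, "PITX2": 1.0}
score = logistic_score(levels, coefficients, intercept=-2.0)
print(score, exceeds_reference(score, reference_score=0.4))
```

The direct comparison also mentioned in the text corresponds to calling `exceeds_reference` on a single methylation level and its reference level, without computing a score.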
  • the present application also provides a method for screening colorectal cancer risk, diagnosing colorectal cancer, or assessing colorectal cancer prognosis, including:
  • detecting the methylation level of a marker in a sample of the subject, the marker being a DNA sequence together with the 5 kb upstream and 5 kb downstream of the DNA sequence, or a fragment thereof, or one or more CpG dinucleotides therein,
  • the DNA sequence includes one or more or all of the following gene sequences: (p) TTLL10, ST6GALNAC5, KCNA3, CACNA1E, TRAPPC12, UBE2F, ZIC4, ZNF595, EVC2, HMX1, PITX2, POU4F2, IRX4, IRX1, CRHBP, KCNMB1, KCNQ5, TBX20, ACTR3C, ACTR3B, VIPR2, SOX17, MOS, PREX2, GDF6, OSR2, BARX1, SORCS3, VAX1, DPYSL4, UTF1, B3GAT1, HOXC13, CUX2, GLT1D1, ITGBL1, SKOR1, TM6SF1, LRRK1, FOXL1,
  • the DNA sequence includes one or more or all selected from CACNA1E, PITX2, CRHBP, TBX20, SORCS3, B3GAT1, GLT1D1 and LRRK1, optionally also including one or more or all of the other gene sequences in (p).
  • the DNA sequence includes one or more or all selected from TTLL10, ACTR3B, BARX1, CUX2, DNM2 and SIM2, optionally also including one or more or all of the other gene sequences in (p).
  • the DNA sequence includes one or more or all selected from UBE2F, HMX1, IRX4, IRX1, VIPR2, OSR2 and MYO15B, optionally also including one or more or all of the other gene sequences in (p).
  • the marker comprises at least 3 CpG dinucleotides.
  • the DNA sequence includes a sense or antisense strand of DNA.
  • the fragment length is 1-1000 bp, preferably 1-700 bp. In one or more embodiments, the fragment is the promoter region of a gene. In one or more embodiments, the fragment contains at least 1, preferably at least 3 CpG dinucleotides.
  • the marker has the sequence of the nucleic acid molecule described in the first aspect of the invention.
  • the method further includes the step of obtaining a biological sample containing DNA from the subject before step (1), such as DNA extraction and/or quality inspection.
  • step (1) includes performing the detection using primer molecules, probe molecules and/or media as described herein, and optionally nucleic acid molecules as described herein.
  • the detection includes, but is not limited to: bisulfite conversion-based PCR, DNA sequencing, methylation-sensitive restriction endonuclease analysis, fluorescence quantification, methylation-sensitive high-resolution melting curve method, chip-based methylation profile analysis, and mass spectrometry.
  • the detection is DNA sequencing.
  • the DNA sequencing has a sequencing depth of at least 10X, preferably 20X, and more preferably 30X.
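One way a depth requirement can enter the analysis is as a coverage filter when computing methylation levels. The sketch below computes the methylation level of a site as methylated reads divided by total reads and skips sites below a minimum depth. Applying the 10X figure per site is an assumption made for illustration (the text states only the sequencing depth), and the read counts are hypothetical.

```python
from typing import Optional


def methylation_level(methylated_reads: int, total_reads: int,
                      min_depth: int = 10) -> Optional[float]:
    """Fraction of reads supporting methylation at a site, or None when
    the site is covered by fewer than min_depth reads."""
    if methylated_reads < 0 or methylated_reads > total_reads:
        raise ValueError("expected 0 <= methylated_reads <= total_reads")
    if total_reads < min_depth:
        return None
    return methylated_reads / total_reads


# Hypothetical read counts, for illustration only.
print(methylation_level(15, 30))  # covered at 30X
print(methylation_level(3, 8))    # below the 10X threshold
```

Raising `min_depth` to 20 or 30 mirrors the preferred depths and trades the number of usable sites for more precise per-site estimates.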
  • the sample is from a mammalian tissue, cell, body fluid, such as an intestinal tissue sample, blood, serum or plasma.
  • the mammal is preferably a human.
  • the sample is blood.
  • the sample includes genomic DNA.
  • the DNA sequence is: the sequence of the corresponding marker in the genome, or its transformed sequence, or its sequence treated with a methylation-sensitive restriction endonuclease, wherein the conversion converts unmethylated cytosine into a base that does not bind to guanine.
  • the conversion is performed using enzymatic methods, preferably deaminase treatment, or the conversion is performed using non-enzymatic methods, preferably treatment with bisulfite, acid sulfite or metabisulfite or a combination thereof.
  • the comparison in step (2) includes: directly comparing the methylation level of the marker in step (1) with a reference level, or calculating a score and comparing the score of the methylation level of the marker with the corresponding reference score.
  • the score is calculated using a logistic regression model.
  • step (3) includes: when the methylation level of the marker is greater than the reference level, or the score of the methylation level is greater than the reference score, the subject is identified as being at risk of developing colorectal cancer, as having colorectal cancer, or as having a poor colorectal cancer prognosis.
  • this application also provides a kit for screening colorectal cancer risk, diagnosing colorectal cancer, or assessing colorectal cancer prognosis, including:
  • a reagent or device for determining the methylation level of a marker in a sample of a subject, the marker being a DNA sequence together with the 5 kb upstream and 5 kb downstream of the DNA sequence, or a fragment thereof, or one or more CpG dinucleotides therein, and
  • the treatment converts unmethylated cytosine into a base with a lower guanine-binding capacity than cytosine,
  • the DNA sequence includes one, more or all of the following gene sequences: (p) TTLL10, ST6GALNAC5, KCNA3, CACNA1E, TRAPPC12, UBE2F, ZIC4, ZNF595, EVC2, HMX1, PITX2, POU4F2, IRX4, IRX1, CRHBP, KCNMB1, KCNQ5, TBX20, ACTR3C, ACTR3B, VIPR2, SOX17, MOS, PREX2, GDF6, OSR2, BARX1, SORCS3, VAX1, DPYSL4, UTF1, B3GAT1, HOXC13, CUX2, GLT1D1, ITGBL1, SKOR1, TM6SF1, LRRK1, FOXL1, MYO15B, DNM2, ZNF536, YTHDF1 and SIM2.
  • the DNA sequence includes one or more or all selected from CACNA1E, PITX2, CRHBP, TBX20, SORCS3, B3GAT1, GLT1D1 and LRRK1, optionally also including one or more or all of the other gene sequences in (p).
  • the DNA sequence includes one or more or all selected from TTLL10, ACTR3B, BARX1, CUX2, DNM2 and SIM2, optionally also including one or more or all of the other gene sequences in (p).
  • the DNA sequence includes one or more or all selected from UBE2F, HMX1, IRX4, IRX1, VIPR2, OSR2 and MYO15B, optionally also including one or more or all of the other gene sequences in (p).
  • the marker comprises at least 3 CpG dinucleotides.
  • the DNA sequence includes a sense or antisense strand of DNA.
  • the fragment length is 1-1000 bp, preferably 1-700 bp. In one or more embodiments, the fragment is the promoter region of a gene. In one or more embodiments, the fragment contains at least 1, preferably at least 3 CpG dinucleotides.
  • the marker comprises the sequence of a nucleic acid molecule described herein.
  • the kit is suitable for use as described in any of the embodiments herein.
  • the nucleic acid molecule is a nucleic acid molecule described herein.
  • the reagents comprise primer molecules and/or probe molecules.
  • the reagents comprise primer molecules that hybridize to the DNA sequences or fragments thereof or transformed sequences thereof.
  • the primer molecules can amplify the DNA sequence or fragments thereof or their transformed variants.
  • the primer sequences are methylation specific or non-specific.
  • the primer molecule is at least 9 bp.
  • the agent is a probe molecule that hybridizes to the DNA sequence or a fragment thereof or a transformed sequence thereof.
  • the probe further contains a detectable substance. In one or more embodiments, the detectable substance is a 5' end fluorescent reporter group and a 3' end labeled quenching group. In one or more embodiments, the fluorescent reporter group is selected from Cy5, FAM, and VIC.
  • the probe molecule is at least 12 bp.
  • the reagent comprises a medium as described in any embodiment herein.
  • the kit is a non-invasive diagnostic kit.
  • the subject is a mammal, preferably a human.
  • the sample is from a mammalian tissue, cell or body fluid, such as an intestinal tissue sample, blood, serum or plasma.
  • the mammal is preferably a human.
  • the sample includes genomic DNA.
  • the sample is blood.
  • the DNA sequence is: the sequence of the corresponding marker in the genome, or its transformed sequence, or its sequence treated with a methylation-sensitive restriction endonuclease, wherein the conversion converts unmethylated cytosine into a base that has a lower binding capacity for guanine than cytosine.
  • the conversion is performed using enzymatic methods, preferably deaminase treatment, or the conversion is performed using non-enzymatic methods, preferably treatment with bisulfite, acid sulfite or metabisulfite or a combination thereof.
  • the kit further includes PCR reaction reagents.
  • the PCR reaction reagents include DNA polymerase, PCR buffer, dNTPs, and Mg²⁺.
  • the kit further includes a reagent for detecting DNA methylation, the reagent being a reagent used in one or more of the following methods: bisulfite conversion-based PCR (for example, methylation-specific PCR), DNA sequencing (such as bisulfite sequencing, whole-genome methylation sequencing, simplified methylation sequencing), methylation-sensitive restriction endonuclease analysis, fluorescence quantification, methylation-sensitive high-resolution melting curve methods, chip-based methylation profiling, and mass spectrometry (e.g. time-of-flight mass spectrometry).
  • the reagent is selected from one or more of the following: bisulfite and its derivatives, methylation-sensitive or -insensitive restriction endonucleases, enzyme digestion buffers, fluorescent dyes, fluorescence quenchers, fluorescent reporters, exonucleases, alkaline phosphatase, internal standards, and control substances.
  • the present application also provides a device for screening colorectal cancer risk, diagnosing colorectal cancer, or assessing colorectal cancer prognosis.
  • the device includes a memory, a processor, and a computer program stored on the memory and executable on the processor.
  • the DNA sequence includes one or more or all of the following gene sequences: (p) TTLL10, ST6GALNAC5, KCNA3, CACNA1E, TRAPPC12, UBE2F, ZIC4, ZNF595, EVC2, HMX1, PITX2, POU4F2, IRX4, IRX1, CRHBP, KCNMB1, KCNQ5, TBX20, ACTR3C, ACTR3B, VIPR2, SOX17, MOS, PREX2, GDF6, OSR2, BARX1, SORCS3, VAX1, DPYSL4, UTF1, B3GAT1, HOXC13, CUX2, GLT1D1, ITGBL1, SKOR1, TM6SF1, LRRK1, FO
  • the DNA sequence includes the following gene sequence:
  • the DNA sequence includes one or more or all of the gene sequences selected from CACNA1E, PITX2, CRHBP, TBX20, SORCS3, B3GAT1, GLT1D1 and LRRK1, optionally also including one or more or all of the other gene sequences in (p).
  • the DNA sequence includes one or more or all of the gene sequences selected from TTLL10, ACTR3B, BARX1, CUX2, DNM2 and SIM2, optionally also including one or more or all of the other gene sequences in (p).
  • the DNA sequence includes one or more or all of the gene sequences selected from UBE2F, HMX1, IRX4, IRX1, VIPR2, OSR2 and MYO15B, optionally also including one or more or all of the other gene sequences in (p).
  • the marker comprises at least 3 CpG dinucleotides.
  • the DNA sequence includes a sense or antisense strand of DNA.
  • the fragment length is 1-1000 bp, preferably 1-700 bp. In one or more embodiments, the fragment is the promoter region of a gene. In one or more embodiments, the fragment contains at least 1, preferably at least 3 CpG dinucleotides. Preferably, the marker has the sequence of a nucleic acid molecule described herein.
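The fragment criteria above (a length of 1-1000 bp and at least 1, preferably at least 3, CpG dinucleotides) can be checked with a few lines of code. The following is a minimal sketch only: the function names `count_cpg` and `meets_fragment_criteria` and their default thresholds are illustrative assumptions, not part of the application, and the count is taken on the single strand passed in.

```python
def count_cpg(sequence: str) -> int:
    """Count CpG dinucleotides (a C immediately followed by a G) on the given strand."""
    seq = sequence.upper()
    return sum(1 for i in range(len(seq) - 1) if seq[i:i + 2] == "CG")


def meets_fragment_criteria(sequence: str, min_cpg: int = 3, max_length: int = 1000) -> bool:
    """Return True if the fragment is 1..max_length bp long and has at least min_cpg CpGs.

    The defaults mirror the preferred values quoted in the text; both are
    hypothetical parameters of this sketch and can be overridden.
    """
    return 1 <= len(sequence) <= max_length and count_cpg(sequence) >= min_cpg
```

Passing `min_cpg=1` or `max_length=700` reproduces the broader CpG requirement or the narrower preferred length range, respectively.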
  • step (1) is preceded by a step of obtaining DNA, such as DNA extraction and/or quality inspection.
  • step (1) includes using the primer molecules, probe molecules and/or media described herein, and optionally the nucleic acid molecules described herein, to detect the methylation level of the sequence in the sample.
  • the detection includes, but is not limited to: bisulfite conversion-based PCR, DNA sequencing (such as bisulfite sequencing, whole-genome methylation sequencing, reduced-representation methylation sequencing), methylation-sensitive restriction endonuclease analysis, fluorescence quantification, methylation-sensitive high-resolution melting curve analysis, chip-based methylation profile analysis, and mass spectrometry (e.g. time-of-flight mass spectrometry).
  • the detection is DNA sequencing.
  • the DNA sequencing has a sequencing depth of at least 10X, preferably 20X, and more preferably 30X.
  • the sample is from a mammalian tissue, cell or body fluid, such as an intestinal tissue sample, blood, serum or plasma.
  • the mammal is preferably a human.
  • the sample includes genomic DNA.
  • the sample is blood.
  • the DNA sequence is: the sequence of the corresponding marker in the genome, or its converted sequence, or its sequence after treatment with a methylation-sensitive restriction endonuclease, wherein the conversion converts unmethylated cytosine into a base that does not bind to guanine.
  • the conversion is performed using enzymatic methods, preferably deaminase treatment, or the conversion is performed using non-enzymatic methods, preferably treatment with bisulfite, acid sulfite or metabisulfite or a combination thereof.
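The effect of the conversion described above can be illustrated in silico. Bisulfite treatment deaminates unmethylated cytosine to uracil, which is read as thymine after amplification, while methylated cytosine is left unchanged. The sketch below is an assumption-laden illustration: `bisulfite_convert` is a hypothetical helper, it models only the strand passed in, and it assumes complete conversion with 0-based methylated positions supplied by the caller.

```python
from typing import Iterable


def bisulfite_convert(sequence: str, methylated_positions: Iterable[int]) -> str:
    """Return the strand as it would be read after conversion and amplification.

    Unmethylated C is reported as T; C at a 0-based index listed in
    methylated_positions is protected and stays C. Other bases are unchanged.
    """
    protected = set(methylated_positions)
    converted = []
    for index, base in enumerate(sequence.upper()):
        if base == "C" and index not in protected:
            converted.append("T")
        else:
            converted.append(base)
    return "".join(converted)
```

Comparing the converted read with the reference at each CpG position is what distinguishes methylated from unmethylated cytosines in sequencing-based or primer-based detection.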
  • the comparison in step (2) includes: directly comparing the methylation level of the marker in step (1) with a reference level, or calculating a score from the methylation level of the marker and comparing that score with the corresponding reference score.
  • the score is calculated using a logistic regression model.
  • step (3) includes: when the methylation level of the marker is greater than the reference level, or the score of the methylation level is greater than the reference score, the subject is at risk of developing colorectal cancer, has colorectal cancer, or has a poor colorectal cancer prognosis.
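Steps (2) and (3) can be sketched as a logistic regression score followed by a comparison with a reference score. The text does not give coefficients, an intercept, or a reference value, so every number and marker name used with the functions below is a hypothetical placeholder; `methylation_score` and `exceeds_reference` are illustrative names, not the applicant's implementation.

```python
import math
from typing import Mapping


def methylation_score(levels: Mapping[str, float],
                      coefficients: Mapping[str, float],
                      intercept: float) -> float:
    """Logistic regression score in (0, 1) from per-marker methylation levels.

    levels maps marker name to methylation level; coefficients maps the same
    names to model weights. Both are assumptions supplied by the caller.
    """
    missing = set(coefficients) - set(levels)
    if missing:
        raise ValueError(f"missing methylation levels for: {sorted(missing)}")
    linear = intercept + sum(coefficients[name] * levels[name] for name in coefficients)
    return 1.0 / (1.0 + math.exp(-linear))


def exceeds_reference(score: float, reference_score: float) -> bool:
    """Step (3): flag the sample when its score is greater than the reference score."""
    return score > reference_score
```

For example, with hypothetical levels `{"marker_a": 0.8, "marker_b": 0.2}`, weights `{"marker_a": 2.0, "marker_b": -1.0}` and intercept `-1.0`, the linear term is 0.4, and the resulting score is then compared with whatever reference score the trained model defines.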
  • the present application provides the use of a reagent for detecting the methylation status or level of at least one CpG dinucleotide of one or more target markers in preparing a detection reagent or diagnostic kit for diagnosing gastric cancer, and the use of a device for determining the methylation status or level of at least one CpG dinucleotide of one or more target markers in preparing a diagnostic kit for diagnosing gastric cancer; wherein the one or more target markers are selected from any 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47 or all 48 of the following sequences (1)-(48):
  • (21) Contains chr5:140871317:140871517 (SEQ ID NO: 68) and its sequence within 5kb upstream and/or within 5kb downstream;
  • (22) Contains chr5:92906255:92906617 (SEQ ID NO:69) and its sequence within 5kb upstream and/or within 5kb downstream;
  • the one or more gastric cancer target markers include the sequences described in items (3), (8), (13), (15), (17), (19), (22), (25), (29), (31), (37), (38), (40), (41), (42), (43), (45), (47) and (48).
  • the one or more gastric cancer target markers include the sequences described in items (2), (6), (7), (8), (12), (15), (19), (25), (28), (32), (33), (36), (37), (40), (42), (43), (44), (46) and (48).
  • the one or more gastric cancer target markers include the sequences described in items (3), (13), (14), (20), (22), (28), (30) and (36); or
  • the one or more gastric cancer target markers include the sequences described in items (3), (13), (27), (30) and (35).
  • the one or more gastric cancer target markers include the sequences described in items (7), (14), (22), (26), (35), (38), (40), (43), (47) and (48).
  • the one or more gastric cancer target markers are selected from any 1, 2, 3, 4, 5, 6, 7, 8 or 9 of the sequences described in items (7), (14), (22), (26), (35), (38), (40), (43), (47) and (48).
  • the gastric cancer target marker includes the sequence described in item (40), and any one or more sequences in items (1)-(39) and (41)-(48).
  • the gastric cancer target marker includes the sequence described in item (47), and any one or more sequences in items (1)-(46) and (48).
  • the gastric cancer target marker includes the sequence described in item (43), and any one or more sequences in items (1)-(42) and (44)-(48).
  • the gastric cancer target marker includes the sequence described in item (26), and any one or more sequences in items (1)-(25) and (27)-(48).
  • the gastric cancer target marker includes the sequence described in item (35), and any one or more sequences in items (1)-(34) and (36)-(48).
  • the gastric cancer target marker includes the sequence described in item (14), and any one or more sequences in items (1)-(13) and (15)-(48).
  • the gastric cancer target marker includes the sequence described in item (38), and any one or more sequences in items (1)-(37) and (39)-(48).
  • the gastric cancer target marker includes the sequence described in item (22), and any one or more sequences in items (1)-(21) and (23)-(48).
  • the gastric cancer target marker includes the sequence described in item (7), and any one or more sequences in items (1)-(6) and (8)-(48).
  • the gastric cancer target marker includes the sequence described in item (48), and any one or more sequences in items (1)-(47).
  • the gastric cancer target marker includes the sequence within 1 kb, preferably within 500 bp, more preferably within 300 bp, more preferably within 100 bp upstream of the start site of any sequence in SEQ ID NO: 48-95, and/or the sequence within 1 kb, preferably within 500 bp, preferably within 300 bp, preferably within 100 bp downstream of its end site; preferably, the target marker is a gene sequence of less than 400 bp in length that contains any sequence of SEQ ID NO: 48-95.
  • sequences described in items (1) to (48) are the sequences shown in SEQ ID NO: 48-95 respectively.
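The marker definitions above combine a core region, written as `chr:start:end`, with flanking sequence of up to 5 kb (or, preferably, 1 kb, 500 bp, 300 bp or 100 bp) upstream and/or downstream. A minimal sketch of that coordinate arithmetic follows. It assumes 1-based coordinates on the forward strand, treats "upstream" as lower coordinates, and does not clamp to chromosome length; `parse_region` and `extend_region` are hypothetical helper names.

```python
from typing import Tuple


def parse_region(region: str) -> Tuple[str, int, int]:
    """Split a 'chr:start:end' string, the format used in items (1)-(48), into its parts."""
    chrom, start, end = region.split(":")
    return chrom, int(start), int(end)


def extend_region(region: str, upstream: int = 5000, downstream: int = 5000) -> Tuple[str, int, int]:
    """Widen a region by the given flanks, never going below coordinate 1.

    Assumes the forward strand, so upstream lowers the start and downstream
    raises the end; chromosome ends are not checked in this sketch.
    """
    chrom, start, end = parse_region(region)
    return chrom, max(1, start - upstream), end + downstream
```

Calling `extend_region` with `upstream=100, downstream=100` gives the narrowest of the preferred windows, while the defaults give the full 5 kb flanks.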
  • the present application provides the use of a reagent for detecting the methylation status or level of at least one CpG dinucleotide of one or more target markers in the preparation of detection reagents or diagnostic kits for diagnosing esophageal cancer, and the use of a device for determining the methylation status or level of at least one CpG dinucleotide of one or more target markers in preparing a diagnostic kit for diagnosing esophageal cancer; wherein the one or more esophageal cancer target markers are selected from any 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42 or all 43 sequences, and the sequences within 5 kb upstream and/or within 5 kb downstream thereof.
  • the target marker includes the sequence within 1 kb, preferably within 500 bp, more preferably within 300 bp, more preferably within 100 bp upstream of the start site of any sequence in SEQ ID NO: 96-138, and/or the sequence within 1 kb, preferably within 500 bp, preferably within 300 bp, preferably within 100 bp downstream of its end site; preferably, the target marker is a gene sequence of less than 400 bp in length that contains any sequence of SEQ ID NO: 96-138.
  • the present application provides a method for assessing the presence and/or progression of esophageal cancer, comprising determining the presence and/or content of the modification state of a DNA region selected from chromosome range numbers 1 to 43 in Table 1 below, or a complementary region thereof, or a fragment of the above:
  • the present application provides a method for assessing the presence and/or progression of esophageal cancer, which includes determining, in the sample to be tested, the presence and/or content of the modification state of a DNA region within 5 kbp upstream or downstream of any one of SEQ ID NO: 96 to 138, or its complementary region, or a fragment of the above.
  • the present application provides a method for assessing the presence and/or progression of esophageal cancer, including determining, in the sample to be tested, the presence and/or content of the modification state of a DNA region selected from the region within 5 kbp upstream or downstream of SEQ ID NO: 105 and the DNA regions where genes numbered 1 to 76 in Table 2 below are located, or a fragment thereof.
  • the present application provides a nucleic acid comprising a sequence capable of binding to a DNA region selected from chromosome range numbers 1 to 43 in Table 1 of the present application, or a complementary region thereof, or a fragment of the above, for determining the presence and/or content of the modification state thereof.
  • the present application provides a nucleic acid comprising a sequence capable of binding to a DNA region within 5 kbp upstream or downstream of any one of SEQ ID NO: 96 to 138, or a complementary region thereof, or a fragment of the above, for determining the presence and/or content of the modification state thereof.
  • the present application provides a nucleic acid comprising a region capable of binding to a region selected from the group consisting of SEQ ID NO: 105 upstream or within 5k bp downstream of SEQ ID NO: 105 and a region where genes numbered 1 to 76 in Table 2 of the present application are located.
  • the present application provides a kit comprising the nucleic acid described in the present application.
  • the present application provides a method for preparing a nucleic acid, comprising designing, based on the modification state of a DNA region selected from chromosome range numbers 1 to 43 of the present application, or its complementary region, or a fragment of the above, a nucleic acid capable of binding to the DNA region, or its complementary region, or the above-mentioned converted region, or the above-mentioned fragment.
  • the present application provides a method for preparing a nucleic acid, comprising designing, based on the modification state of a DNA region within 5 kbp upstream or downstream of any one of SEQ ID NO: 96 to 138, or a complementary region thereof, or a fragment of the above, a nucleic acid capable of binding to the DNA region, or its complementary region, or the above-mentioned converted region, or the above-mentioned fragment.
  • the present application provides a method for preparing a nucleic acid, comprising designing, based on the modification state of a DNA region selected from the region within 5 kbp upstream or downstream of SEQ ID NO: 105 and the DNA regions where genes numbered 1 to 76 in Table 2 of the present application are located, or a fragment thereof, a nucleic acid capable of binding to the DNA region, its complementary region, or the above-mentioned converted region, or the above-mentioned fragment.
  • the present application provides the use of nucleic acids, nucleic acid sets and/or kits for determining the modification state of a DNA region in the preparation of substances for assessing the presence and/or progression of esophageal cancer, wherein the DNA region to be determined includes a DNA region selected from chromosome range numbers 1 to 43 in Table 1 of the present application, or its complementary region, or a fragment of the above, and wherein a sequence capable of binding to the DNA region, or its complementary region, or the above-mentioned converted region, or the above-mentioned fragment is designed based on the modification state.
  • the present application provides the use of nucleic acids, nucleic acid sets and/or kits for determining the modification state of a DNA region in the preparation of substances for assessing the presence and/or progression of esophageal cancer, wherein the DNA region to be determined includes a DNA region within 5 kbp upstream or downstream of any one of SEQ ID NO: 96 to 138, or its complementary region, or a fragment of the above, and wherein a sequence capable of binding to the DNA region, or its complementary region, or the above-mentioned converted region, or the above-mentioned fragment is designed based on the modification state.
  • the present application provides the use of nucleic acids, nucleic acid sets and/or kits for determining the modification state of a DNA region in the preparation of substances for assessing the presence and/or progression of esophageal cancer, wherein the DNA region to be determined includes the region within 5 kbp upstream or downstream of SEQ ID NO: 105 and the DNA regions where the genes numbered 1 to 76 in Table 2 of the application are located, or a fragment thereof, and wherein a sequence capable of binding to the DNA region, or its complementary region, or the above-mentioned converted region, or the above-mentioned fragment is designed based on the modification state.
  • the present application provides the use of a reagent for detecting the methylation status or level of at least one CpG dinucleotide of one or more target markers in preparing detection reagents or diagnostic kits for diagnosing liver cancer, and the use of a device for determining the methylation status or level of at least one CpG dinucleotide of one or more target markers in preparing a diagnostic kit for diagnosing liver cancer; wherein the one or more liver cancer target markers are selected from the sequences of SEQ ID NO: 139-340, in any number of 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70,
  • the target marker includes the sequence within 1 kb, preferably within 500 bp, more preferably within 300 bp, more preferably within 100 bp upstream of the start site of any one of the SEQ ID NO: 139-340 sequences, and/or the sequence within 1 kb, preferably within 500 bp, preferably within 300 bp, preferably within 100 bp downstream of its end site; preferably, the target marker is a gene sequence of less than 400 bp in length that contains any sequence of SEQ ID NO: 139-340.
  • the present application provides a method for assessing the presence and/or progression of liver cancer, comprising determining, in the sample to be tested, the presence and/or content of the modification state of a DNA region within 5 kbp upstream or downstream of chromosome range numbers 44 to 245, or its complementary region, or a fragment of the above.
  • This application provides a method for assessing the presence and/or progression of liver cancer, which includes determining, in a sample to be tested, the presence and/or content of the modification state of a DNA region within 5 kbp upstream or downstream of any one of SEQ ID NO: 139 to 340, or its complementary region, or a fragment of the above.
  • This application provides a method for assessing the presence and/or progression of liver cancer, which includes determining, in the sample to be tested, the presence and/or content of the modification state of a DNA region within 5 kbp upstream or downstream of the genes numbered 77 to 354 in Table 4, or a fragment thereof.
  • the present application provides a nucleic acid comprising a sequence capable of binding to a DNA region within 5 kbp upstream or downstream of chromosome range numbers 44 to 245 selected from Table 3 above, or a complementary region thereof, or a fragment of the above, for determining the presence and/or content of the modification state thereof.
  • the present application provides a nucleic acid comprising a sequence capable of binding to a DNA region within 5 kbp upstream or downstream of any one of SEQ ID NO: 139 to 340, or a complementary region thereof, or a fragment of the above, for determining the presence and/or content of the modification state thereof.
  • the present application provides a nucleic acid comprising a gene capable of binding to genes numbered 77 to 354 selected from the group consisting of genes in Table 4 above.
  • the present application provides a kit comprising the nucleic acid described in the present application.
  • the present application provides a method for preparing nucleic acids, comprising designing a DNA region within 5k bp upstream or downstream of chromosome range numbers 44 to 245 selected from Table 3, or a complementary region thereof, or the modification state of the above fragments.
  • the present application provides a method for preparing nucleic acids, comprising designing, based on the modification state of a DNA region within 5 kbp upstream or downstream of any one of SEQ ID NO: 139 to 340, or a complementary region thereof, or a fragment of the above, nucleic acids capable of binding to the DNA region, or its complementary region, or the above-mentioned converted region, or the above-mentioned fragment.
  • the present application provides a method for preparing nucleic acids, which includes designing, based on the modification state of the DNA region within 5 kbp upstream or downstream of the genes numbered 77 to 354 in Table 4, or a fragment thereof, a nucleic acid capable of binding to the DNA region, or its complementary region, or the above-mentioned converted region, or the above-mentioned fragment.
  • the present application provides the use of nucleic acids, nucleic acid sets and/or kits for determining the modification state of a DNA region in the preparation of substances for assessing the presence and/or progression of liver cancer, wherein the DNA region to be determined includes a DNA region selected from chromosome range numbers 44 to 245 in Table 3 above, or within 5 kbp upstream or downstream thereof, or its complementary region, or a fragment of the above, and wherein a sequence capable of binding to the DNA region, or its complementary region, or the above-mentioned converted region, or the above-mentioned fragment is designed based on the modification state.
  • the present application provides the use of nucleic acids, nucleic acid sets and/or kits for determining the modification state of a DNA region in the preparation of substances for assessing the presence and/or progression of liver cancer, wherein the DNA region to be determined includes a DNA region within 5 kbp upstream or downstream of any one of SEQ ID NO: 139 to 340, or its complementary region, or a fragment of the above, and wherein a sequence capable of binding to the DNA region, or its complementary region, or the above-mentioned converted region, or the above-mentioned fragment is designed based on the modification state.
  • the present application provides the use of nucleic acids, nucleic acid sets and/or kits for determining the modification state of a DNA region in the preparation of substances for assessing the presence and/or progression of liver cancer, wherein the DNA region to be determined includes a DNA region within 5 kbp upstream or downstream of the genes numbered 77 to 354 of the application, or a fragment thereof, and wherein a sequence capable of binding to the DNA region, or its complementary region, or the above-mentioned converted region, or the above-mentioned fragment is designed based on the modification state.
  • the reagents include primers and/or probe molecules; preferably, the primer molecules are identical to, complementary to, or hybridize under stringent conditions to the one or more target markers and contain at least 9 consecutive nucleotides, and the probe molecule hybridizes to the amplification product of the one or more target markers under stringent conditions.
  • the reagents are those required to perform reduced-representation genome methylation sequencing.
  • the present application also provides a diagnostic reagent or diagnostic kit for diagnosing cancer by detecting the methylation status or methylation level of at least one CpG dinucleotide of one or more target markers, which includes a reagent for detecting the methylation status or level of at least one CpG dinucleotide of one or more target markers; wherein the one or more target markers are as described above.
  • the diagnostic reagent or diagnostic kit includes primers and/or probe molecules; preferably, the primer molecules are identical to, complementary to, or hybridize under stringent conditions to the one or more target markers and comprising at least 9 consecutive nucleotides, and the probe molecule hybridizes to the amplification product of the one or more target markers under stringent conditions.
  • the diagnostic reagent or diagnostic kit further includes primer molecules and/or probe molecules for detecting the internal reference gene ACTB.
  • the diagnostic reagent or diagnostic kit further includes one or more substances selected from the following: PCR buffer, polymerase, dNTPs, restriction endonucleases, enzyme digestion buffer, fluorescent dye, fluorescent quencher, fluorescent reporter, exonuclease, alkaline phosphatase, internal standard, control, KCl, MgCl₂ and (NH₄)₂SO₄.
  • the reagents further include reagents for use in one or more of the following methods: bisulfite conversion-based PCR, DNA sequencing, methylation-sensitive restriction enzyme analysis, fluorescence quantification method, methylation-sensitive high-resolution melting curve method, chip-based methylation profile analysis and mass spectrometry.
  • the reagent is selected from one or more of the following: bisulfite and its derivatives, fluorescent dyes, fluorescence quenchers, fluorescent reporters, internal standards, and controls.
  • the present application also provides at least one reagent or set of reagents for distinguishing methylated and unmethylated CpG dinucleotides within at least one target region of genomic DNA prepared for use in detecting and/or classifying cancer in an individual.
  • the method includes contacting genomic DNA isolated from a biological sample of the individual with the at least one reagent or set of reagents, wherein the target region is identical, equivalent, or complementary to a sequence of at least 16 contiguous nucleotides of one or more target markers, wherein the contiguous nucleotides comprise at least one CpG dinucleotide sequence, thereby providing, at least in part, the detection and/or classification of cancer, wherein the one or more target markers are as described above.
  • the present application also provides one or more reagents, amplifications that convert the unmethylated cytosine base at position 5 into uracil or other bases that are detectably different from cytosine in hybridization performance.
  • the one or more target markers are as described above.
  • the genomic DNA or fragments thereof are treated with a reagent selected from the group consisting of bisulfite, acid sulfite, metabisulfite and combinations thereof.
  • thermostable DNA polymerase as the amplification enzyme, using a polymerase lacking 5'-3' exonuclease activity, using polymerase chain reaction and/or generating amplification products with detectable labels for contacting or amplifying nucleic acid molecules.
  • contacting or amplification in c) involves the use of methylation-specific primers.
  • the present application also provides one or more methylation-sensitive restriction enzymes and amplification enzymes and at least one primer comprising at least 9 consecutive nucleotides for use in the preparation of detection and/or classification of cancer in individuals.
  • the primer is identical to, complementary to, or hybridizes to one or more target markers under stringent conditions; the method includes:
  • the one or more target markers are as described above.
  • the presence or absence of an amplification product is determined by hybridization to at least one nucleic acid or peptide nucleic acid that is identical or complementary to a fragment, at least 16 bases long, of the sequence of one or more of the target markers.
  • the present application also provides a method for detecting and/or classifying cancer in an individual, the method comprising the following steps:
  • b) b1) treating the genomic DNA or fragments thereof of a) with one or more reagents that convert the unmethylated cytosine base at position 5 into uracil or another base that is detectably different from cytosine in hybridization properties; or b2) digesting the genomic DNA or fragments thereof of a) with one or more methylation-sensitive restriction enzymes,
  • the one or more target markers are as described above.
  • the genomic DNA or fragments thereof are treated with a reagent selected from the group consisting of bisulfite, acid sulfite, metabisulfite and combinations thereof.
  • thermostable DNA polymerase as the amplification enzyme, using a polymerase lacking 5'-3' exonuclease activity, using polymerase chain reaction and/or producing amplification products with detectable labels for contacting or amplifying nucleic acid molecules.
  • contacting or amplification in c) involves the use of methylation-specific primers.
  • the presence or absence of an amplification product is determined by hybridization to at least one nucleic acid or peptide nucleic acid that is identical, equivalent or complementary to a fragment, at least 16 bases long, of the sequence of one or more of the target markers.
  • the present application also provides the use of processed nucleic acids derived from one or more target markers in the preparation of a kit for diagnosing cancer, wherein the processing is suitable for converting at least one unmethylated cytosine base of the one or more target markers into uracil or another base that is detectably different from cytosine in hybridization, and the one or more target markers are as described above.
  • the present application also provides an apparatus for detecting and diagnosing cancer in an individual, the apparatus comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, the processor implementing the following steps when executing the program: (1) obtaining the methylation level or methylation status of at least one CpG dinucleotide of one or more target markers in the sample, and (2) diagnosing cancer according to the methylation level or methylation status of (1); wherein the one or more target markers are as described above.
  • the present application provides a storage medium recording a program that can run the method described in the present application.
  • the present application provides a device, which includes the storage medium described in the present application, and optionally further includes a processor coupled to the storage medium, the processor being configured to execute the program stored in the storage medium to implement the method described in this application.
  • Figure 1 shows the process for screening colorectal cancer methylation markers.
  • Figure 2 shows the methylation level distribution of colorectal cancer and non-colorectal cancer samples in the training set.
  • Figure 3 shows the distribution of methylation levels of colorectal cancer and non-colorectal cancer samples in the test set.
  • Figure 4 shows the distribution of colorectal cancer ALLMODEL prediction scores.
  • Figure 5 shows the colorectal cancer ALLMODEL ROC curve.
  • Figure 6 shows the distribution of SUBMODEL1 prediction scores for colorectal cancer.
  • Figure 7 shows the ROC curve of SUBMODEL1 in colorectal cancer.
  • Figure 8 shows the distribution of SUBMODEL2 prediction scores for colorectal cancer.
  • Figure 9 shows the ROC curve of SUBMODEL2 in colorectal cancer.
  • Figure 10 shows the distribution of SUBMODEL3 prediction scores for colorectal cancer.
  • Figure 11 shows the ROC curve of SUBMODEL3 in colorectal cancer.
  • Figure 12 shows a flow chart for performance discrimination of single methylation markers in gastric cancer.
  • Figure 13 shows the distribution of model prediction scores in the training set and test set samples of the model constructed for all target markers of gastric cancer.
  • Figure 14 shows the ROC curve of the model constructed with all target markers of gastric cancer in diagnosing gastric cancer in the training set and test set samples.
  • Figure 15 shows the distribution of model prediction scores in the training set and test set samples of the model constructed with gastric cancer markers.
  • Figure 16 shows the ROC curve of the model constructed with gastric cancer markers in diagnosing gastric cancer in the training set and test set samples.
  • Figure 17 shows the distribution of model prediction scores in the training set and test set samples of the model constructed with gastric cancer markers.
  • Figure 18 shows the ROC curve of the model constructed with gastric cancer markers in diagnosing gastric cancer in the training set and test set samples.
  • Figure 19 shows the model prediction score distribution diagram of the model constructed with gastric cancer markers in the training set and test set samples.
  • Figure 20 shows the ROC curve of the model constructed with gastric cancer markers in diagnosing gastric cancer in the training set and test set samples.
  • Figure 21 shows the distribution of model prediction scores of the model constructed with gastric cancer markers in the training set and test set samples.
  • Figure 22 shows the ROC curve of the model constructed with gastric cancer markers in diagnosing gastric cancer in the training set and test set samples.
  • Figure 23 shows the ROC curve of the prediction model for diagnosing esophageal cancer.
  • Figure 24 shows the prediction score distribution of the esophageal cancer prediction model in each group.
  • Figure 25 shows the ROC curve of the prediction model for 16 esophageal cancer methylation marker combinations in diagnosing esophageal cancer.
  • Figure 26 shows the prediction score distribution of the prediction model for 16 esophageal cancer methylation marker combinations in each group.
  • Figure 27 shows the ROC curve of the prediction model for 16 esophageal cancer methylation marker combinations in diagnosing esophageal cancer.
  • Figure 28 shows the prediction score distribution of the prediction model for 16 esophageal cancer methylation marker combinations in each group.
  • Figure 29 shows the ROC curve of the prediction model for diagnosing esophageal cancer.
  • Figure 30 shows the prediction score distribution of the esophageal cancer prediction model in each group.
  • Figure 31 shows the ROC curve of the prediction model of seven esophageal cancer methylation marker combinations in diagnosing esophageal cancer.
  • Figure 32 shows the distribution of prediction scores in each group of the prediction model of seven esophageal cancer methylation marker combinations.
  • Figure 33 shows the ROC curve of the prediction model of seven esophageal cancer methylation marker combinations in diagnosing esophageal cancer.
  • Figure 34 shows the distribution of prediction scores in each group of the prediction model of seven esophageal cancer methylation marker combinations.
  • Figure 35 shows the ROC curve of the prediction model for diagnosing esophageal cancer.
  • Figure 36 shows the prediction score distribution of the esophageal cancer prediction model in each group.
  • Figure 37 shows the ROC curve of the prediction model for 17 esophageal cancer methylation marker combinations in diagnosing esophageal cancer.
  • Figure 38 shows the prediction score distribution of the prediction model for 17 esophageal cancer methylation marker combinations in each group.
  • Figure 39 shows the ROC curve of the prediction model for 15 esophageal cancer methylation marker combinations in diagnosing esophageal cancer.
  • Figure 40 shows the prediction score distribution of the prediction model for 15 esophageal cancer methylation marker combinations in each group.
  • Figure 41 shows the ROC curve of the prediction model for diagnosing liver cancer.
  • Figure 42 shows the prediction score distribution of the liver cancer prediction model in each group.
  • Figure 43 shows the ROC curve of the prediction model of 25 liver cancer methylation marker combinations for diagnosing liver cancer.
  • Figure 44 shows the prediction score distribution of the prediction model for 25 liver cancer methylation marker combinations in each group.
  • Figure 45 shows the ROC curve of the prediction model of 52 liver cancer methylation marker combinations for diagnosing liver cancer.
  • Figure 46 shows the prediction score distribution of the prediction model for 52 liver cancer methylation marker combinations in each group.
  • The term "methylation marker" refers to a nucleic acid or gene region of interest, or a methylation site, whose methylation level, or the score of a computational model based on that methylation level, is indicative of cancer status.
  • the term “marker of interest” refers to a nucleic acid or gene region of interest whose methylation level is indicative of whether a subject has cancer.
  • the term “methylation marker” or “target marker” shall be considered to include all transcript variants thereof and all promoter and regulatory elements thereof. As understood by those skilled in the art, certain genes are known to exhibit allelic variation or single nucleotide polymorphisms ("SNPs”) between individuals.
  • SNPs include insertions and deletions of simple repeat sequences of varying lengths, such as di- and tri-nucleotide repeats. Therefore, this application should be understood to extend to all forms of markers/genes resulting from any other mutation, polymorphism or allelic variation.
  • methylation marker shall include both the sense strand sequence of the marker or gene and the antisense strand sequence of the marker or gene.
  • "methylation marker" or "target marker" as used herein is to be interpreted broadly to include both 1) the original marker (in a particular methylation state) found in a biological sample or genomic DNA; and 2) its processed sequence (for example, the corresponding region after bisulfite conversion or the corresponding region after treatment with a methylation-sensitive restriction endonuclease (MSRE)).
  • the bisulfite-converted corresponding region differs from the target marker in the genomic sequence in that one or more unmethylated cytosine residues are converted to uracil bases, thymine bases, or other bases that behave differently from cytosine during hybridization.
  • the MSRE-treated corresponding region differs from the target marker in the genomic sequence in that the sequence is cleaved at one or more MSRE cleavage sites.
  • the methylation markers or target markers of the present application also include the corresponding regions obtained after non-enzymatic conversion (such as bisulfite conversion) and the corresponding regions obtained after enzymatic conversion (such as MSRE treatment).
  • the target markers of the present application also include various variants of each of the above genes.
  • Variants include nucleic acid sequences from the same region that have at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% sequence identity to a gene or region described herein (i.e., that have one or more deletions, insertions, substitutions, inverted sequences, etc.). Therefore, the present application should be understood to extend to such variants that achieve the same results despite the fact that there is minor genetic variation in the actual nucleic acid sequence between individuals.
  • "percent (%) sequence identity" refers to the percentage of amino acid (or nucleic acid) residues in a candidate sequence that are identical to the amino acid (or nucleic acid) residues of a reference sequence after sequence alignment. Gaps can be introduced during alignment, if necessary, to maximize the number of identical amino acids (or nucleic acids). In other words, the percent sequence identity (%) of an amino acid sequence (or nucleic acid sequence) can be calculated by dividing the number of amino acid residues (or bases) that are identical to the reference sequence by the number of amino acid residues (or bases) in the candidate sequence or the reference sequence, whichever is shorter. Conservative substitutions of amino acid residues may or may not be considered identical residues.
  • Percent amino acid (or nucleic acid) sequence identity can be determined, for example, using publicly available tools such as BLASTN, BLASTp (available at the website of the National Center for Biotechnology Information (NCBI), see also Altschul S.F. et al., J. Mol. Biol., 215:403–410 (1990); Stephen F.
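  • The calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure used by BLAST or by the applicants: it assumes the two sequences have already been aligned (gaps written as "-"), and the function name `percent_identity` and the example sequences are hypothetical.

```python
def percent_identity(aligned_candidate: str, aligned_reference: str, gap: str = "-") -> float:
    """Percent identity of two pre-aligned sequences (assumed equal aligned length)."""
    if len(aligned_candidate) != len(aligned_reference):
        raise ValueError("aligned sequences must have the same length")
    # Count aligned columns where both sequences carry the same residue (gaps never match).
    matches = sum(
        1
        for a, b in zip(aligned_candidate.upper(), aligned_reference.upper())
        if a == b and a != gap
    )
    # Denominator: number of residues in the shorter of the two ungapped sequences.
    len_candidate = len(aligned_candidate.replace(gap, ""))
    len_reference = len(aligned_reference.replace(gap, ""))
    shorter = min(len_candidate, len_reference)
    if shorter == 0:
        raise ValueError("sequences must contain at least one residue")
    return 100.0 * matches / shorter


# Hypothetical example: one gap and one mismatch in an 8-column alignment.
example = percent_identity("ACGT-CGA", "ACGTTCGT")
```

  • Here six columns are identical and the shorter ungapped sequence has seven bases, so `example` is 6/7 expressed as a percentage. Real tools may define the denominator differently (for instance, alignment length), so values need not match BLAST output.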
  • the methylation markers or target markers of this application also include the corresponding regions of the 5 kb upstream of the start site and the 5 kb downstream of the end site of the above-mentioned genes after non-enzymatic conversion (such as bisulfite conversion) or enzymatic conversion (such as methylation-sensitive restriction enzyme treatment).
  • the "methylation level” mentioned in this application refers to the methylation level of the CpG site involved or the average methylation level of multiple or all CpG sites in the involved sequence.
  • the methylation level of a site generally refers to the percentage of methylated Cs at the site. If all Cs at the CpG site are unmethylated, the methylation level is zero. Methylation levels may also be the result of other types of calculations, which are within the knowledge of those skilled in the art.
  • an increase or decrease in the methylation level of a sequence does not mean that the methylation level of all CpG sites in the region is increased or decreased.
  • the process of converting results from methods of detecting DNA methylation (e.g., simplified methylation sequencing) into methylation levels is known in the art. For example, based on the methylation levels of the CpG sites detected in the promoter region of each gene, the average methylation level is calculated and used as the DNA methylation level of the promoter region of the gene.
  • the methylation level is obtained by the MethylTitan (CN201910515830, Kunyuan) methylation sequencing method. Methylation levels can be normalized.
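  • As a rough illustration of the averaging described above, the following Python sketch computes a per-site methylation level from read counts and then averages the sites of a region. The read-count input format, the function names, and the example counts are assumptions for illustration; levels are returned as fractions between 0 and 1 rather than percentages.

```python
from typing import Iterable, Tuple


def site_methylation_level(methylated_reads: int, unmethylated_reads: int) -> float:
    """Fraction of reads in which the C at one CpG site is methylated."""
    total = methylated_reads + unmethylated_reads
    if total == 0:
        raise ValueError("site has no read coverage")
    return methylated_reads / total


def region_methylation_level(site_counts: Iterable[Tuple[int, int]]) -> float:
    """Average of the per-site levels over all CpG sites detected in a region."""
    levels = [site_methylation_level(m, u) for m, u in site_counts]
    if not levels:
        raise ValueError("region contains no CpG sites")
    return sum(levels) / len(levels)


# Hypothetical promoter with three CpG sites: (methylated reads, unmethylated reads).
promoter_sites = [(8, 2), (5, 5), (0, 10)]
promoter_level = region_methylation_level(promoter_sites)
```

  • Averaging per-site levels weights every CpG site equally; pooling all reads before dividing is another common choice and gives a different value when coverage is uneven.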
  • the "methylation information" described in this application includes characteristic information related to cytosines that may be methylated in the sequence.
  • the cytosine that may be methylated is usually the C in CpG.
  • Such characteristics include, but are not limited to: whether any cytosine (C) residue within the sequence is methylated; the location of one or more methylation sites (e.g., CpG dinucleotides) and/or their methylation level; the methylation level of any specific region of a nucleic acid; the frequency or percentage of methylated C; the relative concentration, absolute concentration or pattern of methylated C or unmethylated C; the methylation haplotype ratio (MHL); the average methylation level (AMF); and allelic differences in methylation due to, for example, differences in allelic origin.
  • If one or more cytosine (C) residues within a nucleic acid sequence are methylated, the sequence may be said to be "hypermethylated" or to have "increased methylation", whereas if one or more cytosine (C) residues within a DNA sequence are unmethylated, the sequence may be said to be "demethylated" or to have "reduced methylation".
  • the methylation levels of the tested genes can be mathematically analyzed to obtain a score.
  • the term "methylation score" refers to a numerical value obtained by calculating methylation levels using mathematical methods (e.g., mathematical models). For the tested sample, when the score is greater than the threshold, the determination result is positive, that is, the subject has cancer, is at risk of developing cancer, or has a poor prognosis; otherwise, the result is negative.
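  • The threshold rule above can be sketched as follows. This is a minimal illustration only: the scores, labels, and the threshold of 0.5 are made-up values, and the function names are hypothetical rather than taken from the application.

```python
from typing import List, Sequence, Tuple


def call_positive(scores: Sequence[float], threshold: float) -> List[bool]:
    """A sample is called positive when its methylation score exceeds the threshold."""
    return [score > threshold for score in scores]


def sensitivity_specificity(
    scores: Sequence[float], labels: Sequence[int], threshold: float
) -> Tuple[float, float]:
    """Sensitivity and specificity of the threshold rule (label 1 = cancer, 0 = control)."""
    calls = call_positive(scores, threshold)
    tp = sum(1 for c, y in zip(calls, labels) if c and y == 1)
    fn = sum(1 for c, y in zip(calls, labels) if not c and y == 1)
    tn = sum(1 for c, y in zip(calls, labels) if not c and y == 0)
    fp = sum(1 for c, y in zip(calls, labels) if c and y == 0)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one sample of each class")
    return tp / (tp + fn), tn / (tn + fp)


# Illustrative values only (not data from the application).
toy_scores = [0.91, 0.35, 0.62, 0.10, 0.48]
toy_labels = [1, 0, 1, 0, 1]
sens, spec = sensitivity_specificity(toy_scores, toy_labels, threshold=0.5)
```

  • Raising the threshold generally trades sensitivity for specificity, which is why the threshold is chosen on training data before it is applied to new samples.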
  • Conventional mathematical analysis methods and the process of determining thresholds are known in the art. Exemplary methods are mathematical models, including but not limited to regression models, support vector machines, random forests, etc.
  • For example, a support vector machine (SVM) is constructed on the training set samples; the model is used to calculate the accuracy, sensitivity and specificity of the detection results as well as the area under the receiver operating characteristic (ROC) curve (AUC), and to compute the prediction scores of the test set samples.
  • Another example is to construct a logistic regression model on the methylation levels of differentially methylated markers, and to use the model to calculate the accuracy, sensitivity and specificity of the detection results as well as the area under the ROC curve (AUC), and to compute the prediction scores of the test set samples.
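  • A minimal sketch of the logistic regression workflow described above, written in plain Python so that it runs without third-party packages. It is not the applicants' implementation: the gradient-descent fit, the learning rate and epoch count, the two hypothetical markers, and all methylation values are assumptions chosen for illustration.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(X: Sequence[Sequence[float]], y: Sequence[int],
                 lr: float = 0.5, epochs: int = 2000) -> Tuple[List[float], float]:
    """Fit weights and bias by batch gradient descent on the mean log-loss."""
    n, n_features = len(X), len(X[0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_scores(X: Sequence[Sequence[float]], w: Sequence[float], b: float) -> List[float]:
    """Prediction score (probability of the positive class) for each sample."""
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) for xi in X]


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one sample of each class")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Illustrative methylation levels of two hypothetical markers (not data from the application).
train_X = [[0.80, 0.70], [0.65, 0.90], [0.75, 0.60], [0.10, 0.20], [0.25, 0.05], [0.15, 0.30]]
train_y = [1, 1, 1, 0, 0, 0]
test_X = [[0.70, 0.80], [0.20, 0.10], [0.60, 0.55], [0.30, 0.25]]
test_y = [1, 0, 1, 0]

weights, bias = fit_logistic(train_X, train_y)
test_scores = predict_scores(test_X, weights, bias)
test_auc = roc_auc(test_scores, test_y)
```

  • The model is fitted on the training set only and then scored on the held-out test set, mirroring the training/test split referred to in the figures. In practice an established library implementation with regularization would normally replace this hand-written fit.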
  • subject or “individual” as used herein includes both humans and non-human animals.
  • Non-human animals include all vertebrate animals, such as mammals and non-mammals.
  • the subject is a human.
  • the term "gene” includes the coding and non-coding sequences of the gene in question on the genome.
  • Non-coding sequences include introns, promoters, regulatory elements or sequences, etc.
  • Molecular diagnosis in the present invention includes, in addition to early diagnosis of cancer (for example, colorectal cancer, gastric cancer, esophageal cancer and/or liver cancer), late-stage diagnosis of cancer (for example, colorectal cancer, gastric cancer, esophageal cancer and/or liver cancer), and also includes cancer (e.g., colorectal, gastric, esophageal, and/or liver cancer) screening, risk assessment, prognosis, and disease identification.
  • Early diagnosis refers to the possibility of detecting cancer before its occurrence and/or metastasis, preferably before morphological changes in tissues or cells are observable.
  • "mutant" refers to a polynucleotide whose nucleic acid sequence is changed by the insertion, deletion or substitution of one or more nucleotides compared to a reference sequence, while retaining its ability to hybridize to other nucleic acids.
  • the mutants described in any embodiment of the present application include nucleotide sequences having at least 70%, preferably at least 80%, preferably at least 85%, preferably at least 90%, preferably at least 95%, preferably at least 97% sequence identity to the reference sequence and retaining the biological activity of the reference sequence. Sequence identity between two aligned sequences can be calculated using, for example, NCBI's BLASTn.
  • Mutants also include nucleotide sequences that have one or more mutations (insertions, deletions, or substitutions) relative to the reference sequence while still retaining the biological activity of the reference sequence.
  • the plurality of mutations usually refers to 1-10 mutations, such as 1-8, 1-5 or 1-3.
  • the substitution may be between purine nucleotides and pyrimidine nucleotides, or between purine nucleotides or between pyrimidine nucleotides.
  • Substitutions are preferably conservative substitutions. For example, in the art, conservative substitutions with nucleotides of similar or closely related properties generally do not alter the stability and function of the polynucleotide.
  • Conservative substitutions include the exchange of purine nucleotides (A and G) and the exchange of pyrimidine nucleotides (T or U and C). Therefore, substitution of one or several positions in a polynucleotide of the invention with residues of the same class will not materially affect its activity. Furthermore, methylation sites (e.g., consecutive CGs) are not mutated in the variants of the invention. That is, the method of the present invention detects the methylation status of methylatable sites in the corresponding sequence, and mutations can occur in bases at non-methylatable sites. Typically, methylation sites are contiguous CpG dinucleotides.
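  • The two conditions above (a substitution within the same base class, at a position outside any CpG) can be checked with a short Python sketch. The function names and the example sequence are hypothetical, and positions are 0-based string indices, which is an assumption of this illustration.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T", "U"}


def is_conservative_substitution(ref_base: str, alt_base: str) -> bool:
    """True for purine-to-purine or pyrimidine-to-pyrimidine exchanges."""
    ref, alt = ref_base.upper(), alt_base.upper()
    if ref == alt:
        return False  # not a substitution at all
    return {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES


def touches_cpg(reference: str, position: int) -> bool:
    """True if the 0-based position is the C or the G of a CG dinucleotide."""
    seq = reference.upper()
    is_c_of_cpg = seq[position:position + 2] == "CG"
    is_g_of_cpg = position > 0 and seq[position - 1:position + 1] == "CG"
    return is_c_of_cpg or is_g_of_cpg


def is_allowed_variant(reference: str, position: int, alt_base: str) -> bool:
    """Conservative substitution that leaves every existing CpG site untouched."""
    return (not touches_cpg(reference, position)
            and is_conservative_substitution(reference[position], alt_base))


# Hypothetical reference with a single CpG at positions 2-3.
reference = "ATCGGA"
```

  • For this reference, changing position 4 from G to A passes both checks, while any change at positions 2 or 3 is rejected because it would alter the CpG site whose methylation status is being measured.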
  • bases of DNA or RNA can be converted.
  • the "conversion", "cytosine conversion" or "CT conversion" described in this application refers to the process of treating DNA by non-enzymatic or enzymatic methods to convert unmodified cytosine bases (cytosine, C) into bases whose ability to bind guanine is lower than that of cytosine (such as uracil (U)).
  • Non-enzymatic or enzymatic methods for converting cytosine are well known in the art.
  • non-enzymatic methods include treatment with conversion reagents such as bisulfite, acid sulfite or metabisulfite, such as calcium bisulfite, sodium bisulfite, potassium bisulfite, ammonium bisulfite, magnesium bisulfite, aluminum bisulfite, bisulfite ion, sodium bisulfate, potassium bisulfate and ammonium bisulfate, and any combination thereof.
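  • The effect of cytosine conversion on a sequence can be illustrated in silico. The sketch below is a simplification under stated assumptions: it handles a single strand, treats conversion as complete, takes the methylated positions as a given set of 0-based indices, and uses a hypothetical function name and example sequence.

```python
from typing import Collection


def convert_cytosines(seq: str, methylated_positions: Collection[int] = (),
                      as_pcr_product: bool = False) -> str:
    """Convert every unmethylated C to U (or to T, as read after amplification).

    Methylated cytosines, given as 0-based positions, are left unchanged.
    """
    target = "T" if as_pcr_product else "U"
    protected = set(methylated_positions)
    converted = []
    for index, base in enumerate(seq.upper()):
        if base == "C" and index not in protected:
            converted.append(target)
        else:
            converted.append(base)
    return "".join(converted)


# Hypothetical fragment: the C at index 1 (in a CpG) is methylated, the others are not.
fragment = "ACGTCCGA"
after_conversion = convert_cytosines(fragment, methylated_positions={1})
after_amplification = convert_cytosines(fragment, methylated_positions={1}, as_pcr_product=True)
```

  • Comparing the converted sequence with the original reveals which cytosines were protected by methylation: a C that survives conversion was methylated, while a C read as U or T was not.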
  • Diagnosis in the present invention includes, in addition to early diagnosis of colorectal cancer, late diagnosis of colorectal cancer, and also includes colorectal cancer screening, risk assessment, prognosis, and disease identification.
  • Early diagnosis refers to the possibility of detecting cancer before metastasis, preferably before morphological changes in tissues or cells are observable.
  • The methylation levels of the following genes are related to the nature of colorectal cancer: TTLL10, ST6GALNAC5, KCNA3, CACNA1E, TRAPPC12, UBE2F, ZIC4, ZNF595, EVC2, HMX1, PITX2, POU4F2, IRX4, IRX1, CRHBP, KCNMB1, KCNQ5, TBX20, ACTR3C, ACTR3B, VIPR2, SOX17, MOS, PREX2, GDF6, OSR2, BARX1, SORCS3, VAX1, DPYSL4, UTF1, B3GAT1, HOXC13, CUX2, GLT1D1, ITGBL1, SKOR1, TM6SF1, LRRK1, FOXL1, MYO15B, DNM2, ZNF536, YTHDF1, SIM2.
  • the present invention provides the methylation detection of the above-mentioned genes in samples (especially blood), and uses mathematical models to distinguish colorectal cancer based on their methylation levels, so as to achieve the purpose of non-invasive and accurate diagnosis of colorectal cancer.
  • the methylation marker of colorectal cancer includes a DNA sequence and the 5 kb upstream and 5 kb downstream of the DNA sequence, or a fragment thereof, or one or more CpG dinucleotides thereof, and the DNA sequence includes 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46 or 47 of the above gene sequences, for example at least 6, at least 7, at least 8 or at least 9.
  • the DNA sequence includes 1, 2, 3, 4, 5, 6, 7 or 8 selected from CACNA1E, PITX2, CRHBP, TBX20, SORCS3, B3GAT1, GLT1D1 and LRRK1, optionally also including one or more or all of the other gene sequences in (p).
  • In one or more embodiments, the DNA sequence includes 1, 2, 3, 4, 5 or 6 selected from TTLL10, ACTR3B, BARX1, CUX2, DNM2 and SIM2, optionally also including one or more or all of the other gene sequences in (p).
  • the DNA sequence includes 1, 2, 3, 4, 5, 6 or 7 selected from UBE2F, HMX1, IRX4, IRX1, VIPR2, OSR2 and MYO15B, optionally also including one or more or all of the other gene sequences in (p).
  • the present invention provides uses and methods of these markers and their detection reagents in screening colorectal cancer risk, diagnosing colorectal cancer, and evaluating colorectal cancer prognosis.
  • "colorectal cancer" as used in this application has its ordinary meaning in the art and includes tumors present in the colon, rectum, and/or appendix.
  • the properties of colorectal cancer are related to methylation of fragments of the above-mentioned genes.
  • Such fragments may be derived from one or more of the gene sequences described.
  • the length of the fragment is 1bp-1kb, preferably 1bp-700bp; the fragment includes one or more methylation sites in the chromosomal region of the corresponding gene.
  • the fragment is, for example, the promoter region of the above-mentioned gene.
  • the DNA sequence from 1 kb upstream to 200 bp downstream of the transcription start site (TSS) is defined as the promoter region. If a gene has multiple transcripts (that is, multiple promoter regions), any of the promoter regions can be selected.
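  • The promoter definition above translates directly into genomic coordinates. The following sketch assumes 1-based inclusive coordinates and flips the window for minus-strand genes; both conventions, the function name, and the example TSS positions are assumptions of this illustration rather than statements from the application.

```python
from typing import Tuple


def promoter_region(tss: int, strand: str,
                    upstream: int = 1000, downstream: int = 200) -> Tuple[int, int]:
    """Return (start, end) of the window from `upstream` bp before to `downstream` bp after the TSS.

    For a minus-strand gene, "upstream" lies at higher genomic coordinates.
    The start is clamped at position 1.
    """
    if strand == "+":
        start, end = tss - upstream, tss + downstream
    elif strand == "-":
        start, end = tss - downstream, tss + upstream
    else:
        raise ValueError("strand must be '+' or '-'")
    return max(start, 1), end


# Hypothetical TSS positions.
plus_promoter = promoter_region(5000, "+")
minus_promoter = promoter_region(5000, "-")
```

  • When a gene has several transcripts, the function can simply be called once per TSS and any one of the resulting windows used, in line with the definition above.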
  • the fragment detected contains at least 3 CpG dinucleotides. Therefore, further, the properties of colorectal cancer are related to the methylation level of the fragments shown in SEQ ID NO: 1-47 of each gene shown in Table 5.
  • "colorectal cancer-related sequences" include any of the above 47 genes, sequences within 20 kb upstream or downstream (preferably within 5 kb), or their fragments, or any combination of the above 47 sequences (SEQ ID NO: 1-47) or their complementary sequences.
  • sequences of the above genes in the Hg19 genome, as well as the sequences of 20 kb upstream or downstream of each gene, can be obtained in public databases (such as the NCBI website).
  • gastric cancer methylation markers described in this application are selected from any 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47 or all 48:
  • chr6:391738:391938 (SEQ ID NO: 51) and its sequence within 5 kb upstream and/or within 5 kb downstream;
  • chr1:119532788:119532988 (SEQ ID NO: 91) and its sequence within 5 kb upstream and/or within 5 kb downstream;
  • one or more gastric cancer methylation markers described in the application include: chr17:76929754:76929954 (SEQ ID NO: 50) and its sequence within 5 kb upstream and/or within 5 kb downstream; chr7:8482114:8482413 (SEQ ID NO: 55) and its sequence within 5 kb upstream and/or within 5 kb downstream; chr8:143613755:143613955 (SEQ ID NO: 60) and its sequence within 5 kb upstream and/or within 5 kb downstream.
  • one or more gastric cancer methylation markers described in the application include: chr11:11600237:11600617 (SEQ ID NO: 40) and its sequence within 5 kb upstream and/or within 5 kb downstream; chr2:177030134:177030449 (SEQ ID NO: 53) and its sequence within 5 kb upstream and/or within 5 kb downstream; chr7:35301095:35301411 (SEQ ID NO: 54) and its sequence within 5 kb upstream and/or within 5 kb downstream; chr7:8482114:8482413 (SEQ ID NO: 55) and its sequence within 5 kb upstream and/or within 5 kb downstream; chr12:113901298:113901498 (SEQ ID NO: 59) and its sequence within 5 kb upstream and/or within 5 kb downstream; chr7:107499318:107499518 (SEQ ID NO
  • one or more gastric cancer methylation markers described in the application include: chr17:76929754:76929954 (SEQ ID NO: 50) and its sequence within 5 kb upstream and/or within 5 kb downstream; chr8:143613755:143613955 (SEQ ID NO: 60) and its sequence within 5 kb upstream and/or within 5 kb downstream; chr8:20375580:20375780 (SEQ ID NO: 61) and its sequence within 5 kb upstream and/or within 5 kb downstream.
  • one or more gastric cancer methylation markers described in the application include: chr17:76929754:76929954 (SEQ ID NO: 50) and its sequence within 5 kb upstream and/or within 5 kb downstream; chr8:143613755:143613955 (SEQ ID NO: 60) and its sequence within 5 kb upstream and/or within 5 kb downstream; chr16:82660460:82660774 (SEQ ID NO: 74) and its sequence within 5 kb upstream and/or within 5 kb downstream; chr10:123923943:123924143 (SEQ ID NO: 77) and its sequence within 5 kb upstream and/or within 5 kb downstream; and chr6:108488634:108488917 (SEQ ID NO: 82) and its sequence within 5 kb upstream and/or within 5 kb downstream.
  • the Hg coordinate region of one or more gastric cancer methylation markers described in this application is selected from any one or a combination of any more of the following sequences: chr7:35301095:35301411 (SEQ ID NO: 54) and its sequence within 5 kb upstream and/or within 5 kb downstream; chr8:20375580:20375780 (SEQ ID NO: 61) and its sequence within 5 kb upstream and/or within 5 kb downstream; chr5:92906255:92906617 (SEQ ID NO: 69) and its sequence within 5 kb upstream and/or within 5 kb downstream; chr7:73407894:73408161 (SEQ ID NO: 73) and its sequence within 5 kb upstream and/or within 5 kb downstream; chr6:108488634:108488917 (SEQ ID NO: 82) and its sequence within 5 kb upstream and/or within 5 kb downstream
  • the gastric cancer methylation markers described in the present application include, for each sequence of SEQ ID NO: 48-95, the sequence within 3 kb, preferably within 2 kb, more preferably within 1 kb, more preferably within 500 bp, more preferably within 300 bp, more preferably within 100 bp upstream of its start site, and/or the sequence within 3 kb, preferably within 2 kb, preferably within 1 kb, preferably within 500 bp, preferably within 300 bp, preferably within 100 bp downstream of its end site.
  • the gastric cancer methylation marker described in the present application is a gene sequence that contains any of the above-mentioned sequences of SEQ ID NO: 48-95 and has a length within 1000 bp, preferably within 600 bp, and more preferably within 400 bp.
  • the gastric cancer methylation marker described in this application is selected from any 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47 or all 48 sequences in SEQ ID NO: 48-95.
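  • Extending a marker's coordinates by a flanking distance can be sketched as below. The parser accepts the "chr:start:end" notation used in this document; treating lower coordinates as "upstream" (forward-strand orientation), clamping the start at 1, and not checking the chromosome length are simplifying assumptions, and the function names are hypothetical.

```python
from typing import Tuple

Region = Tuple[str, int, int]


def parse_region(text: str) -> Region:
    """Parse 'chr:start:end' (spaces around the numbers are tolerated)."""
    chrom, start, end = (part.strip() for part in text.split(":"))
    return chrom, int(start), int(end)


def with_flanks(region: Region, upstream: int = 5000, downstream: int = 5000) -> Region:
    """Extend a region by the given number of bases on each side."""
    chrom, start, end = region
    return chrom, max(start - upstream, 1), end + downstream


# Coordinates of SEQ ID NO: 50 as listed in this document.
marker = parse_region("chr17:76929754:76929954")
marker_5kb = with_flanks(marker)
marker_100bp = with_flanks(marker, upstream=100, downstream=100)
```

  • Narrower windows such as 3 kb, 1 kb, or 100 bp are obtained by passing smaller `upstream` and `downstream` values.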
  • the gastric cancer methylation markers described in this application include: SEQ ID NO:50, SEQ ID NO:55, SEQ ID NO:60, SEQ ID NO:62, SEQ ID NO:64, SEQ ID NO:66, SEQ ID NO:69, SEQ ID NO:72, SEQ ID NO:76, SEQ ID NO:78, SEQ ID NO:84, SEQ ID NO:85, SEQ ID NO:87, SEQ ID NO:88, SEQ ID NO:89, SEQ ID NO:90, SEQ ID NO:92, SEQ ID NO:94 and SEQ ID NO: 95.
  • the gastric cancer methylation markers described in the application include: SEQ ID NO: 49, SEQ ID NO: 53, SEQ ID NO: 54, SEQ ID NO: 55, SEQ ID NO: 59, SEQ ID NO: 62, SEQ ID NO: 66, SEQ ID NO: 74, SEQ ID NO: 75, SEQ ID NO: 79, SEQ ID NO: 80, SEQ ID NO: 83, SEQ ID NO: 84, SEQ ID NO: 87, SEQ ID NO: 89, SEQ ID NO: 90, SEQ ID NO: 91, SEQ ID NO: 93 and SEQ ID NO: 95.
  • the gastric cancer methylation markers described in this application include: SEQ ID NO: 50, SEQ ID NO: 60, SEQ ID NO: 61, SEQ ID NO: 67, SEQ ID NO: 69, SEQ ID NO:75, SEQ ID NO:77 and SEQ ID NO:83.
  • the gastric cancer methylation markers described in the application include: SEQ ID NO: 50, SEQ ID NO: 60, SEQ ID NO: 74, SEQ ID NO: 77 and SEQ ID NO: 82.
  • the gastric cancer methylation markers described in this application include any one or a combination of any multiple of the following sequences: SEQ ID NO: 54, SEQ ID NO: 61, SEQ ID NO: 69, SEQ ID NO: 73, SEQ ID NO: 82, SEQ ID NO: 85, SEQ ID NO: 87, SEQ ID NO: 90, SEQ ID NO: 94 and SEQ ID NO: 95.
  • the gastric cancer methylation markers described in this application include SEQ ID NO: 87, and any one or more of SEQ ID NO: 1-39 and 41-48.
  • the gastric cancer methylation markers described in this application include SEQ ID NO: 94, and any one or more of SEQ ID NO: 1-46 and 48.
  • the gastric cancer methylation markers described in this application include SEQ ID NO: 90, and any one or more of SEQ ID NO: 1-42 and 44-48.
  • the gastric cancer methylation markers described in this application include SEQ ID NO: 73, and any one or more of SEQ ID NO: 1-25 and 27-48.
  • the gastric cancer methylation markers described in this application include SEQ ID NO: 82, and any one or more of SEQ ID NO: 1-34 and 36-48.
  • the gastric cancer methylation markers described in this application include SEQ ID NO: 61, and any one or more of SEQ ID NO: 1-13 and 15-48.
  • the gastric cancer methylation markers described in this application include SEQ ID NO: 85, and any one or more of SEQ ID NO: 1-37 and 39-48.
  • the gastric cancer methylation markers described in this application include SEQ ID NO: 67, and any one or more of SEQ ID NO: 1-21 and 23-48.
  • the gastric cancer methylation markers described in this application include SEQ ID NO: 54, and any one or more of SEQ ID NO: 1-6 and 8-48.
  • the gastric cancer methylation markers described in this application include SEQ ID NO: 95, and any one or more of SEQ ID NO: 1-47.
  • the present application provides a method for assessing the presence and/or progression of esophageal cancer, comprising determining, in a sample to be tested, the presence and/or content of the modification state of a DNA region selected from the chromosome range numbers 1 to 43 of the present application, or a complementary region thereof, or a fragment of the above. For example, the method of the present application determines one or more DNA regions selected from the chromosome range numbers 1 to 43 of the present application in the sample to be tested.
  • the method of this application determines, in the sample to be tested, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, or 43 DNA regions selected from the chromosome range numbers 1 to 43 of the present application.
  • One or more of the DNA regions of the above chromosome range numbers 1 to 43 may be methylation markers of esophageal cancer.
  • This application provides a method for assessing the presence and/or progression of esophageal cancer, comprising determining, in a sample to be tested, the presence and/or content of the modification state of a DNA region within 5 kb upstream or downstream of any one of SEQ ID NO: 96 to 138, or a complementary region thereof, or a fragment of the above.
  • the method of this application determines, in the sample to be tested, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, or 43 selected from SEQ ID NO: 96 to 138
  • the above-mentioned one or more DNA regions or fragments within 5k bp upstream or downstream shown in any one of SEQ ID NO: 96 to 138 can be methylation markers of esophageal cancer.
  • the present application provides a method for assessing the presence and/or progression of esophageal cancer, which includes determining, in a sample to be tested, the presence and/or content of the modification state of a DNA region, or a fragment thereof, selected from the region within 5 kb upstream or downstream of SEQ ID NO: 105 and the DNA regions where the genes numbered 1 to 76 of the present application are located.
  • the method of the present application determines the presence and/or content of the modification state of the DNA regions, or fragments thereof, where 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, or 76 of the genes are located.
  • the above-mentioned regions selected within 5 kbp upstream or downstream of SEQ ID NO: 105 and the DNA regions or fragments where the genes numbered from 1 to 76 of this application are located can be methylation markers of esophageal cancer.
  • the present application provides a method for assessing the presence and/or progression of esophageal cancer, comprising determining, in a sample to be tested, the presence and/or content of the modification state of the DNA regions of chromosome range numbers 3, 5, 7, 8, 12, 13, 14, 16, 17, 18, 19, 21, 22, 26, 28, 29, 30, 32, 33, 35, 36, 38, 39, 40, 41, 42 and 43, or their complementary regions, or fragments of the above.
  • For example, the method of the present application determines one or more DNA regions selected from the chromosome range numbers 1 to 27 of the present application in the sample to be tested.
  • the method of this application determines, in the sample to be tested, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, or 27 DNA regions selected from the chromosome range numbers 1 to 27 of the present application.
  • the present application provides a method for assessing the presence and/or progression of esophageal cancer, comprising determining, in a sample to be tested, the presence and/or content of the modification state of a DNA region within 5 kb upstream or downstream of any one of SEQ ID NO: 98, 100, 102, 103, 107, 108, 109, 111, 112, 113, 114, 116, 117, 121, 123, 124, 125, 127, 128, 130, 131, 133, 134, 135, 136, 137 and 138, or a complementary region thereof, or a fragment of the above.
  • the method of this application determines, in the sample to be tested, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, or 27 DNA regions within 5 kb upstream or downstream of any one of SEQ ID NO: 1 to 27.
  • the present application provides a method for assessing the presence and/or progression of esophageal cancer, comprising determining the presence and/or content of a DNA region in a sample to be tested, or a modification state of a fragment thereof, of a gene selected from the following group: IRF2BP2, LZTS1, DMRTA2, CLVS1, CSNK2A3, HOXD12, FAM109A, HOXD13, FKBP4, TOMM20, AVPR1A, ELAVL4, CARKD, GALNT18, DLL4, CUX2, NR2F2, CACNA1C, ZSCAN10, CARS2, MTSS1L, VPS18, TIMP2, IL32, TBCD, VAC14, RAB3D, LGALS3BP, HOXD10, ZNF750, HOXD1, HOXD11, FAM150B, HOXD4, TNFRSF6B, TMEM18, ETV5, ARFRP1, BDH1, DGKG
  • the method of the present application determines, in the sample to be tested, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46 genes selected from the above genes
  • the present application provides a method for assessing the presence and/or progression of esophageal cancer, which includes determining the presence and/or content of the modification state of the DNA regions of chromosome range numbers 1, 2, 4, 6, 9, 10, 11, 15, 20, 23, 24, 25, 26, 27, 30, 31, 34, 35, 37, 38, 39, 40, and 42, or their complementary regions, or fragments of the above.
  • the method of the present application determines one or more DNA regions selected from the chromosome range numbers 1 to 23 of the present application in the sample to be tested.
  • the method of this application determines, in the sample to be tested, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, or 23 DNA regions or fragments selected from the chromosome range numbers 1 to 23 of the present application.
  • the present application provides a method for assessing the presence and/or progression of esophageal cancer, comprising determining, in a sample to be tested, the presence and/or content of the modification state of a DNA region within 5k bp upstream or downstream of the sequence shown in any one of SEQ ID NO: 96, 97, 99, 101, 104, 105, 106, 110, 115, 118, 119, 120, 121, 122, 125, 126, 129, 130, 132, 133, 134, 135 and 137, or its complementary region, or a fragment of the above.
  • the method of this application determines, in the sample to be tested, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, or 23 DNA regions or fragments selected from the regions within 5k bp upstream or downstream of the sequences shown in any one of SEQ ID NO: 96 to 118.
  • the present application provides a method for assessing the presence and/or progression of esophageal cancer, comprising determining, in a sample to be tested, the presence and/or content of the modification state of a region selected from the group consisting of the region within 5k bp upstream or downstream of SEQ ID NO: 111 and the DNA regions, or fragments thereof, in which CSF1, DNM2, EPS8L3, RAB3D, RER1, RPL18, SKI, DBP, ARHGEF16, HOXD1, PRDM16, HOXD4, RNF207, PDCD1, ICMT, EP300, TBX5, RBX1, TBX3, ETV5, CHFR, DGKG, ZNF605, SLC2A9, DIO3, DRD5, ENSG00000269375, PCDHGC5, CTU2, DIAPH1, RNF166, MPC1, MBP, RPS6KA2, ZNF236, ELN, ICAM5, WBSCR28, ZGLP1, and LZTS1 are located.
  • the method of the present application determines, in the sample to be tested, the presence and/or content of the modification state of the region within 5k bp upstream or downstream of SEQ ID NO: 111 and of the DNA regions, or fragments thereof, in which 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, or 40 of the above-mentioned genes provided by the application are located.
  • the present application provides a method for assessing the presence and/or progression of liver cancer, comprising determining within 5k bp upstream or downstream of the chromosome range numbers 44 to 245 (SEQ ID NO: 139-340) selected from the present application in a sample to be tested.
  • the method of the present application determines, in the sample to be tested, one or more DNA regions selected from the DNA regions within 5k bp upstream or downstream of the chromosome range numbers 44 to 245 (SEQ ID NO: 139-340) of the present application.
  • the method of this application determines, in the sample to be tested, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 50, 60, 70, 80, 90, 100, 150, 200, or 202 DNA regions selected from the DNA regions within 5k bp upstream or downstream of chromosome range numbers 44 to 245 (SEQ ID NO: 139-340) of the present application.
  • the DNA region within 5k bp upstream or downstream of chromosome range numbers 44 to 245 (SEQ ID NO: 139-340), or its complementary region or fragment can be a liver cancer methylation marker.
  • This application provides a method for assessing the presence and/or progression of liver cancer, which includes determining a DNA region within 5k bp upstream or downstream of any one of SEQ ID NO: 139 to 340 in a sample to be tested, or its The presence and/or content of the complementary region, or modification state of the above-mentioned fragments.
  • the method of this application determines, in the sample to be tested, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 50, 60, 70, 80, 90, 100, 150, 200, or 202 DNA regions selected from the DNA regions within 5k bp upstream or downstream of any one of SEQ ID NO: 139 to 340.
  • the DNA region selected from the upstream or downstream 5k bp shown in any one of SEQ ID NO: 139 to 340 or its complementary region or fragment can be a liver cancer methylation marker.
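  • The "within 5k bp upstream or downstream" window used throughout this section can be illustrated with a minimal sketch. The `Region` type, the function names, and the coordinates below are hypothetical and are not taken from the application; the sketch assumes 1-based inclusive coordinates and clamps only at the chromosome start.

```python
from typing import NamedTuple


class Region(NamedTuple):
    chrom: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


def extend_region(region: Region, flank_bp: int = 5000) -> Region:
    """Widen a region by flank_bp on each side, clamping the start at position 1."""
    return Region(region.chrom, max(1, region.start - flank_bp), region.end + flank_bp)


def overlaps(a: Region, b: Region) -> bool:
    """True if two regions on the same chromosome share at least one base."""
    return a.chrom == b.chrom and a.start <= b.end and b.start <= a.end


# Hypothetical marker coordinates, for illustration only.
marker = Region("chr1", 12_000, 12_300)
window = extend_region(marker)
print(window)
```

  A real implementation would also clamp the window end to the chromosome length, which this sketch omits.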
  • the present application provides a method for assessing the presence and/or progression of liver cancer, which includes determining, in a sample to be tested, the presence and/or content of the modification state of the DNA region within 5k bp upstream or downstream of genes numbered 77 to 354 in the application, or a fragment thereof. For example, the method of the present application determines 1, 2, 3, 4, 5, 6, 7, 8, and 9 genes selected from the gene numbers 77 to 354 of the present application in the sample to be tested.
  • the marker may be derived from a biological sample of any individual of interest.
  • the term "individual" includes humans and non-human animals. Non-human animals include all vertebrate animals, such as mammals and non-mammals. An "individual" may also be a livestock animal, such as cattle, pigs, sheep, poultry, and horses; or a rodent, such as a rat or a mouse; or a non-human primate, such as an ape, monkey, or rhesus monkey; or a domesticated animal, such as a dog or cat. In some embodiments, the individual is a human or non-human primate. In some embodiments, the individual is a human. In this application, "individual" and "subject" are used interchangeably.
  • sequences given in Part I "Markers" above are human sequences.
  • existing techniques can be used to easily determine the corresponding positions and corresponding sequences of the above-mentioned genes in the genome of the non-human animal.
  • sample refers to a biological composition obtained or derived from an individual that contains cells and/or other molecular entities (such as DNA) to be characterized and/or identified based on physical, biochemical, chemical and/or physiological characteristics.
  • Biological samples include, but are not limited to, cells, tissues, organs and/or biological fluids of an individual obtained by any method known to those skilled in the art.
  • the sample or biological sample to be tested is selected from the group consisting of histological sections, tissue biopsies, paraffin-embedded tissues, body fluids, surgical resection samples, isolated blood cells, cells isolated from blood, and any combination thereof.
  • the body fluid is selected from the group consisting of whole blood, serum, plasma, and any combination thereof. Choosing the most appropriate sample will depend on the nature of the situation.
  • the sample or biological sample to be tested is whole blood of an individual.
  • the sample or biological sample to be tested is an individual's plasma.
  • plasma is obtained by centrifuging whole blood from an individual one, two, three, four, five or more times.
  • the sample or biological sample to be tested is a gastric cancer biopsy.
  • the DNA to be detected can be isolated from the biological sample.
  • the DNA to be detected can be isolated and purified from the biological sample using various methods known in the art. Isolation and purification can be performed using commercially available kits. For example, DNA is isolated from cells and tissues by lysing the starting material under highly denaturing and reducing conditions, in part using protein-degrading enzymes, purifying the resulting nucleic acid fraction by phenol/chloroform extraction, and recovering the nucleic acids from the aqueous phase by dialysis or ethanol precipitation (see, e.g., Sambrook, J., Fritsch, E. F., and Maniatis, T., Molecular Cloning, CSH, 1989).
  • there are also reagent systems that are particularly suitable for purifying DNA fragments from agarose gels, isolating plasmid DNA from bacterial lysates, and isolating longer nucleic acids (genomic DNA, total cellular RNA) from blood, tissue, or cell culture.
  • Many of these commercially available purification systems are based on the fairly well-known principle of binding nucleic acids to mineral supports in the presence of solutions of different chaotropic salts. In these systems, suspensions of finely ground glass powder, diatomaceous earth or silica gel are used as carrier materials.
  • Some other methods of isolating and purifying DNA from biological samples are described, for example, in US7888006B2 and EP1626085A1. Choosing between methods will be influenced by several factors, including time, cost, and the amount of DNA required.
  • the DNA contained in the sample or biological sample to be tested includes genomic DNA.
  • genomic DNA refers to DNA comprising the complete genome of a cell or organism and fragments or portions thereof. Genomic DNA is a large segment of DNA derived from an individual (e.g., longer than about 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 200, or 300 kb) and may have natural modifications, such as DNA methylation .
  • the DNA contained in the sample or biological sample to be tested includes cellular DNA.
  • cellular DNA refers to DNA present within a cell, or DNA obtained from cells in vivo and isolated in vitro, or otherwise manipulated in vitro, as long as the DNA is not removed from cells in vivo.
  • the DNA contained in the sample or biological sample to be tested includes extracellular free DNA.
  • extracellular free DNA refers to DNA fragments that exist outside cells in the body. The term may also be used to refer to DNA fragments obtained from extracellular sources in vivo and isolated, or manipulated in vitro. DNA fragments in extracellular free DNA usually have a length of about 100 to 200 bp, which is presumably related to the length of the DNA fragment wrapped in nucleosomes.
  • Extracellular cell-free DNA includes, for example, extracellular cell-free fetal DNA and circulating tumor DNA.
  • Extracellular cell-free fetal DNA circulates in the body (eg, blood) of pregnant women and represents the fetal genome, while circulating tumor DNA circulates in the body (eg, blood) of cancer patients.
  • the extracellular cell-free DNA may be substantially free of an individual's cellular DNA.
  • the extracellular free DNA can comprise less than about 1,000 ng/mL, less than about 100 ng/mL, less than about 10 ng/mL, less than about 1 ng/mL of cellular DNA.
  • Extracellular cell-free DNA can be prepared using conventional techniques known in the art.
  • a blood sample can be centrifuged at about 200-20,000g, about 200-10,000g, about 200-5,000g, about 300-4,000g, etc., for about 3-30 minutes, about 3-15 minutes, about 3-10 minutes, about 3-5 minutes, etc., to obtain extracellular cell-free DNA from the blood sample.
  • cell-free DNA of a blood sample can be obtained by centrifuging an individual's plasma or serum one, two, three, four, five or more times.
  • the biological sample can be obtained by microfiltration in order to separate cells and their fragments from cell-free components containing soluble DNA.
  • microfiltration can be performed by using a filter, for example, a 0.1-0.45 micron membrane filter, such as a 0.22 micron membrane filter.
  • extracellular cell-free DNA is extracted from whole blood, serum, or plasma for analysis using commercially available DNA extraction products.
  • This extraction method is claimed to have high recovery rates (>50%) of circulating DNA, and some products (such as the QIAamp Circulating Nucleic Acid Kit produced by Qiagen) are claimed to extract small-sized DNA fragments.
  • Typical sample volumes used are 1-5 mL of serum or plasma.
  • extracellular cell-free DNA includes circulating tumor DNA.
  • Circulating tumor DNA ("ctDNA") is tumor-derived fragmented DNA that is found in body fluids (e.g., blood, urine, saliva, sputum, feces, pleural fluid, cerebrospinal fluid, etc.) and is not associated with cells.
  • ctDNA is highly fragmented, with an average length of approximately 150 base pairs.
  • ctDNA typically includes a very small fraction of extracellular cell-free DNA in body fluids (eg, plasma), eg, ctDNA may constitute less than about 10% of plasma DNA. Typically, this percentage is less than about 1%, such as less than about 0.5% or less than about 0.01%.
  • the total amount of plasma DNA is usually very low, such as approximately 10 ng/mL plasma.
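  • To show why such low amounts call for sensitive detection, the figures above can be turned into a rough estimate of how many tumor-derived genome copies a plasma draw contains. This is only a back-of-the-envelope sketch: the plasma volume and the 1% tumor fraction are example inputs, and the approximate mass of 3.3 pg per haploid human genome is an assumption that does not come from the application.

```python
HAPLOID_GENOME_PG = 3.3  # assumed approximate mass of one haploid human genome, in picograms


def ctdna_mass_ng(plasma_ml: float, cfdna_ng_per_ml: float, tumor_fraction: float) -> float:
    """Mass of tumor-derived DNA (ng) in a plasma sample."""
    return plasma_ml * cfdna_ng_per_ml * tumor_fraction


def genome_copies(mass_ng: float) -> float:
    """Approximate number of haploid genome copies in a given DNA mass (1 ng = 1000 pg)."""
    return mass_ng * 1000.0 / HAPLOID_GENOME_PG


# Example inputs: 4 mL plasma, 10 ng/mL cell-free DNA, 1% of it tumor-derived.
mass = ctdna_mass_ng(plasma_ml=4.0, cfdna_ng_per_ml=10.0, tumor_fraction=0.01)
copies = genome_copies(mass)
print(f"{mass:.2f} ng ctDNA, roughly {copies:.0f} haploid genome copies")
```

  Under these assumptions, any single locus is represented by only on the order of a hundred tumor-derived copies.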
  • the amount of ctDNA varies from person to person and depends on the type of tumor, its location, and, for cancerous tumors, the stage of the cancer.
  • ctDNA is usually very rare in body fluids and can only be detected with extremely sensitive and specific techniques. Detection of ctDNA may help detect and diagnose tumors, guide tumor-specific treatment, monitor treatment, and monitor cancer response.
  • DNA methylation is the biological process of adding a methyl group to a DNA molecule (eg, to one or more cytosine bases of the DNA molecule) (eg, through the action of DNA methyltransferases).
  • DNA methylation occurs at the 5' position of cytosine in cytosine-phosphate-guanine (CpG) dinucleotides (i.e., "CpG sites"). When it occurs at 5'-CpG-3' dinucleotides in the promoter or first exon of a gene, it will cause epigenetic inactivation of the gene.
  • DNA methylation has been well documented to play an important role in regulating gene expression, tumorigenesis, and other genetic and epigenetic diseases.
  • methylated cytosine residue refers to a derivative of a cytosine residue in which a methyl group is attached to a carbon atom of the cytosine ring (eg, C5).
  • unmethylated cytosine residue refers to an underivatized cytosine residue in which, as opposed to a "methylated cytosine residue", no methyl group is attached to a carbon atom of the cytosine ring (e.g., C5).
  • CpG sites in which the cytosine residues are methylated are methylated CpG sites, while CpG sites in which the cytosine residues are not methylated are unmethylated CpG sites .
  • conversions can occur between bases of DNA or RNA.
  • Conversion refers to the use of non-enzymatic or enzymatic methods to process DNA so as to convert unmodified cytosine bases (cytosine, C) into bases that do not bind to guanine (G), such as uracil (U).
  • Some reagents are able to distinguish between unmethylated and methylated CpG sites in DNA, thereby obtaining processed DNA. This reagent acts selectively on unmethylated cytosine residues but does not act significantly on methylated cytosine residues.
  • the reagent may act selectively on methylated cytosine residues without significantly acting on unmethylated cytosine residues.
  • some reagents can selectively convert unmethylated cytosine residues to uracil, thymine, or other bases that differ from cytosine in hybridization behavior, while the methylated cytosine residues remain unconverted; for another example, some reagents can selectively cleave methylated residues, or selectively cleave unmethylated residues.
  • the original DNA is converted into treated DNA in a manner that depends on whether it is methylated, so that the treated DNA can be distinguished from the original DNA by its hybridization behavior.
  • processed DNA refers to DNA, nucleic acid sequences, and gene fragments that have been treated with a reagent used to distinguish between unmethylated and methylated CpG sites.
  • cytosine conversion can be performed using non-enzymatic or enzymatic methods.
  • non-enzymatic methods include bisulfite (hydrogen sulfite) treatment.
  • reagents used in non-enzymatic methods include bisulfite reagents.
  • bisulfite reagent refers to, for example, a reagent comprising bisulfite, disulfite, hydrogen sulfite ions, or any combination thereof, which, as disclosed herein, can be used to distinguish between methylated and unmethylated CpG dinucleotide sequences.
  • “bisulfite reaction” or “bisulfite treatment” refers to the reaction that converts unmethylated cytosine residues. Specifically, in the presence of bisulfite ions, unmethylated cytosine residues in nucleic acids are converted into uracil bases, thymine bases, or other bases that differ from cytosine in hybridization behavior, while methylated cytosine residues are not significantly converted. In other words, bisulfite treatment can be used to distinguish methylated CpG dinucleotides from unmethylated CpG dinucleotides.
  • the statement that “methylated cytosine residues are not significantly converted” does not exclude that a very small percentage (e.g., less than 0.1%, less than 0.2%, less than 0.3%, less than 0.4%, less than 0.5%, less than 0.6%, less than 0.7%, less than 0.8%, less than 0.9%, less than 1%, less than 2%, less than 3%, less than 4%, less than 5%, less than 6%, less than 7%, less than 8%, less than 9%, less than 10%, less than 11%, less than 12%, less than 13%, less than 14%, less than 15%, less than 16%, less than 17%, less than 18%, less than 19%, less than 20%) of methylated cytosine residues are converted to uracil, thymine, or other bases that differ in hybridization behavior from cytosine, although the intention is to convert only unmethylated cytosine residues.
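  • The effect of bisulfite treatment on a sequence can be mimicked in silico. The following is a minimal sketch that assumes ideal, complete conversion on a single strand; the function name and the way methylated positions are supplied (0-based indices) are illustrative choices, not part of the application.

```python
def bisulfite_convert(seq: str, methylated_positions: set, as_pcr_product: bool = True) -> str:
    """Idealized bisulfite conversion of one DNA strand.

    Unmethylated C is converted to U (read as T after PCR amplification);
    C at a position listed in methylated_positions (0-based) is left unchanged.
    """
    replacement = "T" if as_pcr_product else "U"
    converted = []
    for i, base in enumerate(seq.upper()):
        if base == "C" and i not in methylated_positions:
            converted.append(replacement)
        else:
            converted.append(base)
    return "".join(converted)


# The cytosine at index 1 (first CpG) is methylated; all other cytosines are not.
print(bisulfite_convert("ACGTCGCC", {1}))         # ACGTTGTT
print(bisulfite_convert("ACGTCGCC", {1}, False))  # ACGTUGUU
```

  Real conversion is not perfectly complete or perfectly selective, as the preceding paragraph notes, so measured data would carry a small error rate that this sketch ignores.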
  • the bisulfite reagent is selected from the group consisting of ammonium bisulfite, sodium bisulfite, potassium bisulfite, calcium bisulfite, magnesium bisulfite, aluminum bisulfite, bisulfite ions, and any combination thereof.
  • the bisulfite reagent is sodium bisulfite.
  • bisulfite reagents are commercially available, for example, MethylCode™ Bisulfite Conversion Kit, EpiMark™ Bisulfite Conversion Kit, EpiJET™ Bisulfite Conversion Kit, EZ DNA Methylation-Gold™ Kit, etc. In some embodiments, the bisulfite reaction is carried out according to the kit's instructions.
  • Exemplary enzymatic methods include deaminase treatment and the use of reagents to selectively cleave unmethylated residues but not methylated residues, or to selectively cleave methylated residues but not unmethylated residues.
  • the reagent is a methylation sensitive restriction enzyme (MSRE).
  • methylation-sensitive restriction enzyme refers to an enzyme that selectively digests nucleic acids based on the methylation status of its recognition site. For restriction enzymes that specifically cleave when the recognition site is unmethylated or hemimethylated, cleavage does not occur or cleaves with significantly reduced efficiency when the recognition site is methylated. . For restriction enzymes that specifically cleave when the recognition site is methylated, when the recognition site is not methylated, cleavage does not occur, or it cleaves with significantly reduced efficiency.
  • the recognition sequence of the methylation-sensitive restriction enzyme contains a CG dinucleotide (eg, cgcg or cccggg). In some embodiments, the methylation-sensitive restriction enzyme does not cleave when the cytosine in the CG dinucleotide is methylated at the C5 carbon atom.
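  • How a methylation-sensitive restriction enzyme discriminates between templates can be sketched as a simple in-silico digest. The example below models HpaII, which recognizes CCGG and is blocked when the internal cytosine of the site is methylated; the function name, the 0-based position convention, and the single-strand simplification are assumptions made for illustration.

```python
def msre_digest(seq: str, methylated_positions: set, site: str = "CCGG",
                cut_offset: int = 1, cpg_c_offset: int = 1) -> list:
    """Cut seq at every recognition site whose CpG cytosine is unmethylated.

    cut_offset: index within the site after which the strand is cut (C^CGG for HpaII).
    cpg_c_offset: index within the site of the cytosine whose methylation blocks cutting.
    """
    seq = seq.upper()
    fragments = []
    last = 0
    i = seq.find(site)
    while i != -1:
        if (i + cpg_c_offset) not in methylated_positions:
            fragments.append(seq[last:i + cut_offset])
            last = i + cut_offset
        i = seq.find(site, i + 1)
    fragments.append(seq[last:])
    return fragments


template = "AACCGGTTTTCCGGAA"  # two CCGG sites; their CpG cytosines are at indices 3 and 11
print(msre_digest(template, set()))    # both sites cut
print(msre_digest(template, {11}))     # second site protected by methylation
print(msre_digest(template, {3, 11}))  # fully methylated template stays intact
```

  Only the fully methylated template survives as a single intact fragment, which is what downstream amplification or quantification of intact DNA relies on.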
  • Exemplary MSREs are selected from the group consisting of: HpaII enzyme, SalI enzyme, enzyme, ScrFI enzyme, BbeI enzyme, NotI enzyme, SmaI enzyme, XmaI enzyme, MboI enzyme, BstBI enzyme, ClaI enzyme, MluI enzyme, NaeI enzyme, NarI enzyme, PvuI enzyme, SacII enzyme, HhaI enzyme and any combination thereof.
  • a methylation-sensitive restriction enzyme capable of distinguishing between methylated CpG dinucleotides and unmethylated CpG dinucleotides within the region of interest, using methods known in the art.
  • a series of restriction enzyme reagents are used to determine methylation, such as, but not limited to, differential methylation hybridization ("DMH").
  • DNA in a biological sample can be cleaved prior to treatment with methylation-sensitive restriction enzymes.
  • Such methods are known in the art and may include both physical and enzymatic means. Particularly preferred is the use of one or more restriction enzymes that are insensitive to methylation and whose recognition sites are AT-rich and do not contain CG dinucleotides. The use of such enzymes allows CpG sites and CpG-rich regions in DNA fragments to be preserved.
  • such restriction enzymes are selected from the group consisting of MseI enzyme, BfaI enzyme, Csp6I enzyme, Tru1I enzyme, Tru9I enzyme, MaeI enzyme, XspI enzyme, and any combination thereof.
  • Converted DNA is optionally purified.
  • DNA purification methods suitable for use herein are well known in the art.
  • the detection reagents and diagnostic kits described in this application can be used to detect the methylation status or methylation level.
  • methylation status refers to the presence or absence of one or more methylated nucleotide bases in a nucleic acid molecule.
  • a nucleic acid molecule containing methylated cytosine is considered methylated (eg, the methylation state of the nucleic acid molecule is methylated).
  • Nucleic acid molecules that do not contain any methylated nucleotides are considered unmethylated.
  • a nucleic acid may be characterized as "unmethylated" if it is not methylated at a particular locus (e.g., the locus of a particular single CpG dinucleotide) or a particular combination of loci, even if it is methylated at other loci in the same gene or molecule.
  • methylation status describes the state of methylation of a nucleic acid, such as a genomic sequence or a marker of interest as described herein, a DNA region, or a fragment thereof.
  • methylation status refers to the methylation-related characteristics of a nucleic acid segment at a specific genomic locus. Such characteristics include, but are not limited to, whether any cytosine (C) residue within the DNA sequence is methylated, the location of one or more methylated C residues, the frequency or percentage of methylated C throughout any specific region of the nucleic acid, and allelic differences in methylation due to, for example, differences in allelic origin.
  • Methylation status also refers to the relative concentration, absolute concentration, or pattern of methylated C or unmethylated C throughout any particular region of nucleic acid in a biological sample. For example, if one or more cytosine (C) residues within a nucleic acid sequence are methylated, the sequence may be said to be "hypermethylated" or to have "increased methylation," whereas if one or more cytosine (C) residues within a DNA sequence are unmethylated, the sequence may be said to be "hypomethylated" or to have "reduced methylation."
  • Likewise, if one or more cytosine (C) residues within a nucleic acid sequence are methylated compared to another nucleic acid sequence (e.g., from a different region or from a different individual, etc.), the sequence is considered to be hypermethylated or to have increased methylation compared to the other nucleic acid sequence.
  • Alternatively, if one or more cytosine (C) residues within a DNA sequence are unmethylated compared to another nucleic acid sequence (e.g., from a different region or from a different individual, etc.), the sequence is considered to be hypomethylated or to have reduced methylation compared to the other nucleic acid sequence.
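  • The comparison between a sequence and a reference sequence described above can be expressed as a small helper. This is only a sketch: it assumes methylation has already been summarized as a fraction between 0 and 1, it uses "hypomethylated" for what the text calls reduced methylation, and the `min_delta` tolerance is a hypothetical parameter that the application does not specify.

```python
def compare_methylation(sample_level: float, reference_level: float,
                        min_delta: float = 0.1) -> str:
    """Label a sample relative to a reference by the difference in methylated fraction.

    min_delta is an illustrative tolerance below which no call is made.
    """
    delta = sample_level - reference_level
    if delta >= min_delta:
        return "hypermethylated"
    if delta <= -min_delta:
        return "hypomethylated"
    return "no clear difference"


print(compare_methylation(0.80, 0.20))  # hypermethylated
print(compare_methylation(0.10, 0.60))  # hypomethylated
print(compare_methylation(0.50, 0.45))  # no clear difference
```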
  • Methylation level represents the proportion (or percentage, fraction, ratio, degree) of one or more sites in a methylated state.
  • the methylation level of a region (or a group of sites) is the average of the methylation levels of all sites in the region (or of all sites in the group). Therefore, an increase or decrease in methylation levels in a region does not mean an increase or decrease in methylation levels at all methylation sites in the region.
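  • The averaging described above can be written out directly. In this minimal sketch each site is represented by a pair of read counts (methylated, total), which is an assumed input format; the example also shows a region whose level rises although one of its sites falls.

```python
def site_level(methylated_reads: int, total_reads: int) -> float:
    """Fraction of reads supporting methylation at a single site."""
    return methylated_reads / total_reads


def region_level(sites: list) -> float:
    """Mean of per-site methylation levels; sites without coverage are skipped."""
    levels = [site_level(m, t) for m, t in sites if t > 0]
    if not levels:
        raise ValueError("no covered sites in region")
    return sum(levels) / len(levels)


before = [(2, 10), (2, 10), (6, 10)]  # site levels 0.2, 0.2, 0.6
after = [(8, 10), (8, 10), (4, 10)]   # site levels 0.8, 0.8, 0.4

print(round(region_level(before), 3))  # 0.333
print(round(region_level(after), 3))   # 0.667
# The region level increases even though the third site drops from 0.6 to 0.4.
```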
  • the process of converting results from methods of detecting DNA methylation (eg, simplified methylation sequencing) into methylation levels is known in the art. Methylation levels can be determined, for example, by quantitative analysis of the amount of intact DNA present after restriction digestion with methylation-sensitive restriction enzymes.
  • methylation levels as in the above example can be used as a quantitative indicator of methylation status. This is particularly useful when the methylation levels of sequences in a sample need to be compared to threshold levels.
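  • As one hedged illustration of quantifying intact DNA after digestion with a methylation-sensitive restriction enzyme, the sketch below estimates the surviving fraction from real-time PCR cycle thresholds and compares it with a threshold. It assumes an enzyme that cuts only unmethylated sites, perfect doubling per PCR cycle, and a mock-digested control; the function names and the example threshold are hypothetical and not taken from the application.

```python
def intact_fraction(ct_digested: float, ct_mock: float) -> float:
    """Estimate the fraction of template surviving digestion from Ct values.

    Assumes 100% PCR efficiency (one doubling per cycle); capped at 1.0 so that
    measurement noise cannot yield a fraction above 100%.
    """
    return min(1.0, 2.0 ** -(ct_digested - ct_mock))


def exceeds_threshold(level: float, threshold: float) -> bool:
    """True if the estimated methylation level reaches the chosen threshold."""
    return level >= threshold


level = intact_fraction(ct_digested=27.0, ct_mock=25.0)
print(level)                          # 0.25
print(exceeds_threshold(level, 0.2))  # True
```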
  • the methylation level/status of one or more CpG dinucleotide sequences within a DNA sequence can be determined by various analysis methods known in the art, preferably quantitative analysis methods.
  • Exemplary analytical methods include: polymerase chain reaction, including real-time polymerase chain reaction, digital polymerase chain reaction, and bisulfite conversion-based PCR (e.g., methylation-specific PCR (MSP)); nucleic acid sequencing; whole-genome methylation sequencing; simplified methylation sequencing (reduced representation bisulfite sequencing, RRBS); mass-based separation (e.g.
  • detection includes detection of either strand at a gene or locus.
  • quantitative analysis is performed by real-time PCR.
  • real-time PCR methods include HeavyMethyl™ PCR described by Cottrell et al., Nucl. Acids Res. 32:e10, 2003; MethyLight™ PCR described by Eads et al., Cancer Res. 59:2302-2306, 1999; and Headloop PCR described by Rand et al., Nucl. Acids Res. 33:e127, 2005.
  • HeavyMethyl™ PCR refers to an art-recognized real-time PCR technique in which one or more non-extensible nucleic acid (e.g., oligonucleotide) blockers bind to bisulfite-treated nucleic acid in a methylation-specific manner (i.e., the blocker specifically binds to unmutated DNA under moderate to high stringency conditions).
  • the amplification reaction is performed using one or more primers, which may optionally be methylation specific but which flank the one or more blockers.
  • the blocker binds and no PCR product is produced.
  • the methylation level of the nucleic acids in the sample is determined using a TaqMan™ assay substantially as described, for example, in Holland et al., Proc. Natl. Acad. Sci. USA, 88:7276-7280, 1991.
  • This is an art-recognized fluorescence-based real-time PCR technology employing dual-labeled fluorescent oligonucleotide probes, called TaqMan™ probes, which are designed to hybridize to CpG-rich sequences located between the forward and reverse amplification primers.
  • the TaqMan™ probes contain a fluorescent "reporter moiety" and a "quencher moiety" covalently bound to a linker moiety (e.g., phosphoramidite) linked to the nucleotides of the TaqMan™ oligonucleotide.
  • TaqMan™ probes that hybridize to CpG-rich sequences are cleaved by the 5' nuclease activity of Taq polymerase, thereby generating a signal that is detected in a real-time manner during the PCR reaction.
  • molecular beacons can be used as detectable probes, and the system does not rely on the 5'-3' exonuclease activity of the DNA polymerase used (see Mhlanga and Malmberg, Methods 25: 463-471, 2001).
  • Headloop PCR refers to an art-recognized type of real-time PCR that selectively amplifies target nucleic acids but suppresses amplification of non-target variants by extending a 3' stem-loop to form a hairpin structure that cannot further serve as a template for amplification.
  • the real-time PCR is multiplex real-time PCR.
  • the term “multiplex” may refer to an assay or other analytical method that uses more than one marker, each marker having at least one different detection characteristic, such as a fluorescence characteristic (e.g., excitation wavelength, emission wavelength, emission intensity, FWHM (full width at half maximum height) or fluorescence lifetime) or a unique nucleic acid or protein sequence characteristic, and that can simultaneously determine the presence and/or amount of multiple markers (e.g., multiple nucleic acid sequences).
  • quantitative analysis is performed by nucleic acid sequencing.
  • Exemplary methods of nucleic acid sequencing are known in the art, see, e.g., Frommer et al., Proc. Natl. Acad. Sci. USA 89:1827-1831, 1992; Clark et al., Nucl. Acids Res. 22: 2990-2997,1994.
  • identification of methylated cytosines in a DNA sequence can be facilitated by comparing a sequence obtained from a sample without bisulfite treatment, or a known nucleotide sequence of a target region, with a sequence obtained from a sample treated with bisulfite.
  • a thymine residue detected at any cytosine position in a bisulfite-treated sample compared to an untreated sample can be considered a mutation caused by bisulfite treatment, i.e., the cytosine at that site was unmethylated, whereas a cytosine that is retained at that position indicates a methylated cytosine.
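  • The comparison between an untreated (or reference) sequence and a bisulfite-treated sequence can be sketched as follows. The sketch assumes both sequences are already aligned, of equal length, and from the same strand, and it relies on the usual reading that a retained C indicates methylation while a C-to-T change indicates an unmethylated cytosine; the function name is illustrative.

```python
def call_cpg_methylation(reference: str, treated: str) -> dict:
    """Call methylation at each CpG cytosine by comparing reference and treated reads.

    Returns {position: True (methylated), False (unmethylated), or None (other base)}.
    Positions are 0-based indices of the cytosine in each CG dinucleotide.
    """
    if len(reference) != len(treated):
        raise ValueError("sequences must be aligned to the same length")
    reference, treated = reference.upper(), treated.upper()
    calls = {}
    for i in range(len(reference) - 1):
        if reference[i:i + 2] == "CG":
            base = treated[i]
            calls[i] = True if base == "C" else False if base == "T" else None
    return calls


reference = "ACGTCGCC"
treated = "ACGTTGTT"  # first CpG retained as C, second converted to T
print(call_cpg_methylation(reference, treated))  # {1: True, 4: False}
```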
  • Methods for sequencing DNA include, for example, the dideoxy chain termination method or the Maxam-Gilbert method (see Sambrook et al., Molecular Cloning, A Laboratory Manual (2nd Ed., CSHP, New York 1989)), pyrosequencing (see Uhlmann et al., Electrophoresis, 23:4072-4079, 2002), solid-phase pyrosequencing (see Landegren et al., Genome Res., 8(8):769-776, 1998), solid-phase microsequencing (see, e.g., Southern et al., Genomics, 13:1008-1017, 1992), microsequencing using FRET (see, e.g., Chen and Kwok, Nucleic Acids Res. 25:347-353, 1997), ligation sequencing or ultra-deep sequencing (see Margulies et al., Nature 437(7057):376-80 (2005)).
  • quantitative analysis is performed by mass-based separation (eg, electrophoresis, mass spectrometry).
  • an exemplary method is combined bisulfite restriction analysis (COBRA).
  • This method utilizes differences in restriction enzyme recognition sites between methylated and unmethylated nucleic acids following treatment with compounds that selectively mutate unmethylated cytosine residues (e.g., bisulfite).
  • For example, the restriction endonuclease TaqI cleaves the sequence TCGA, which upon bisulfite treatment of unmethylated nucleic acids will be TTGA and therefore will not be cleaved.
  • the digested and/or undigested nucleic acids are then detected using detection means known in the art, such as electrophoresis and/or mass spectrometry.
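The TaqI example can be sketched in silico as follows. This is a simplified illustration only: it models a single strand of the amplified product, the helper names `bisulfite_convert` and `digest` and the example amplicon are hypothetical, and the cut position (T^CGA) is modeled as a fixed offset within the recognition site.

```python
def bisulfite_convert(seq, methylated_positions=()):
    """Convert every unmethylated C to T; Cs at methylated positions are kept."""
    protected = set(methylated_positions)
    return "".join(
        "T" if base == "C" and i not in protected else base
        for i, base in enumerate(seq.upper())
    )


def digest(seq, site="TCGA", cut_offset=1):
    """Cut seq at every occurrence of site (after cut_offset bases) and return the fragments."""
    fragments, start = [], 0
    idx = seq.find(site)
    while idx != -1:
        fragments.append(seq[start:idx + cut_offset])
        start = idx + cut_offset
        idx = seq.find(site, idx + 1)
    fragments.append(seq[start:])
    return fragments


amplicon = "GGATTCGAAGGT"  # hypothetical target with one TCGA site (C at index 5)

methylated = bisulfite_convert(amplicon, methylated_positions=[5])
unmethylated = bisulfite_convert(amplicon)

print(digest(methylated))    # site retained, so two fragments are expected
print(digest(unmethylated))  # site lost (TTGA), so the amplicon stays intact
```

The fragment pattern, rather than the sequence itself, is what would then be read out by electrophoresis or mass spectrometry.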
  • different techniques are used to detect nucleic acid differences in amplified products based on differences in nucleotide sequence and/or secondary structure after treatment with compounds that selectively mutate unmethylated cytosine residues, e.g. Methylation-specific single-strand conformation analysis (MS-SSCA) (Bianco et al., Hum.
  • MS-SSCA Methylation-specific single-strand conformation analysis
  • MS-DGGE methylation-specific denaturing gradient gel electrophoresis
  • MS-DHPLC methylation-specific denaturing high-performance liquid chromatography
  • quantitative analysis is performed by target capture (e.g., hybridization, microarray).
  • Suitable detection methods by hybridization are known in the art, such as Southern, dot blot, slot blot or other nucleic acid hybridization formats (Kawai et al., Mol. Cell. Biol. 14:7421-7427, 1994; Gonzalgo et al., Cancer Res. 57:594-599, 1997).
  • probes used in hybridization analysis are detectably labeled.
  • nucleic acid-based probes used in hybridization analysis are unlabeled.
  • Such unlabeled probes can be immobilized on a solid support such as a microarray and can hybridize to detectably labeled target nucleic acid molecules.
  • a microarray is a methylation-specific microarray, which can be used to distinguish sequences with converted cytosine residues from sequences with unconverted cytosine residues (see Adorjan et al., Nucl. Acids Res., 30:e21, 2002).
  • Hybridization-based analysis can also be used on nucleic acids treated with methylation-sensitive restriction enzymes.
  • the methylation status of CpG dinucleotide sequences within a DNA sequence can be determined by oligonucleotide probes that hybridize to bisulfite-treated DNA simultaneously with a PCR amplification primer.
  • the primers may be methylation-specific primers or standard primers.
  • detection reagent is a reagent used to detect the presence, absence or amount of nucleic acid in a quantitative analysis step.
  • detection reagent is selected from the group consisting of fluorescent probes, intercalating dyes, chromophore-labeled probes, radioisotope-labeled probes, and biotin-labeled probes.
  • quantitative analysis includes amplifying the treated DNA using a quantitative primer pair and a DNA polymerase.
  • quantitative primer pair refers to one or more primer pairs used in the quantitative analysis step.
  • the quantitative primer pair is capable of hybridizing to at least 9 consecutive nucleotides of the treated DNA under stringent conditions, moderate stringency conditions or high stringency conditions.
  • the quantitative analysis includes determining the methylation levels of one or more markers of interest based on the presence or levels of multiple CpG dinucleotides, TpG dinucleotides, or CpA dinucleotides in the treated DNA. In some embodiments, the quantitative analysis includes determining the methylation level of cytosine residues based on the presence or level of one or more CpG dinucleotides in the treated DNA. In some embodiments, the quantitative analysis includes determining the methylation level of cytosine residues based on the presence or levels of one or more TpG dinucleotides in the treated DNA. In some embodiments, the quantitative analysis includes determining the methylation level of cytosine residues based on the presence of CpA dinucleotides in the treated DNA.
  • the quantitative analysis step is performed by dividing the processed DNA product into multiple components. In some embodiments, a plurality of different quantitative analysis tests are performed on the plurality of components, wherein a different combination of the processed DNA products, if present in the component, is quantified in each of the plurality of components. In some embodiments, control markers in each component are quantified.
  • methylation-specific PCR (MSP; Herman et al., Methylation-specific PCR: a novel PCR assay for methylation status of CpG islands. Proc Natl Acad Sci USA. 1996 September 3; 93(18):9821-6, and United States Patent No. 6,265,171) is used to quantitatively analyze the methylation level of each target marker separately. For example, by using one or more primers that specifically hybridize to unconverted sequences under moderate and/or high stringency conditions, an amplification product is produced only when the template contains methylated cytosines at CpG sites.
  • the quantitative primer pair is designed to amplify at least a portion of the processed DNA product, i.e., the quantitative analysis is designed as a nested PCR.
  • Nested PCR is a modification of PCR designed to increase sensitivity and specificity. Nested PCR involves the use of two primer sets and two consecutive PCR reactions. A first round of amplification is performed to produce a first amplicon, and a second round of amplification is performed using a primer pair in which one or both primers anneal to sites within the region bounded by the initial primer pair, i.e., the second primer pair is considered "nested" within the first primer pair. In this way, background amplification products from the first PCR reaction that do not contain the correct internal sequence are not further amplified in the second PCR reaction.
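The nesting idea can be illustrated with a small in-silico sketch. It assumes exact primer matches on a single template strand and ignores mismatches, melting temperatures and primer dimers; the template, the primer sequences and the function names `revcomp` and `amplify` are all hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def amplify(template, forward, reverse):
    """Return the product bounded by the two primers, or None if either does not anneal."""
    start = template.find(forward)
    site = template.find(revcomp(reverse))  # reverse primer binds the opposite strand
    if start == -1 or site == -1 or site < start:
        return None
    return template[start:site + len(reverse)]


template = "TTGACCATGGCTAGCTAGGATCCGATTACAGGCTTAAGCCGGTT"
outer_fwd, outer_rev = "GACCATGG", "CCGGCTTA"
inner_fwd, inner_rev = "GCTAGGAT", "GCCTGTAA"

first_round = amplify(template, outer_fwd, outer_rev)
second_round = amplify(first_round, inner_fwd, inner_rev)
print(first_round)
print(second_round)

# A background product that carries the outer primer sites but lacks the
# internal sequence is not amplified in the second round.
background = "GACCATGGAAAAAAAATAAGCCGG"
print(amplify(background, inner_fwd, inner_rev))
```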
  • the PCR reaction solution contains Taq DNA polymerase, PCR buffer, primers, probes, dNTPs, and Mg2+.
  • the Taq DNA polymerase is hot start Taq DNA polymerase.
  • the final concentration of Mg2+ is 1.0-20.0 mM; the concentration of each primer is 100-500 nM; the concentration of each probe is 100-500 nM.
  • Exemplary PCR reaction conditions are: pre-denaturation at 95°C for 5 minutes; denaturation at 95°C for 15 seconds, annealing and extension at 60°C for 60 seconds, 50 cycles.
  • the methods of the present application include a pre-amplification step.
  • One of the purposes of preamplifying a target marker is to increase the amount of the target marker in the processed DNA.
  • the term “amplification” generally refers to any process that results in an increase in the number of copies of a molecule or a group of related molecules.
  • “Amplification” when used with respect to a polynucleotide molecule refers to the production of multiple copies of a polynucleotide molecule or multiple copies of a portion of a polynucleotide molecule, usually starting from a small amount of polynucleotide, in which the amplified substance (amplicons, PCR amplicons) is usually detectable.
  • Amplification of polynucleotides encompasses multiple chemical and enzymatic processes.
  • Forms of amplification include polymerase chain reaction (PCR, including reverse transcription PCR), strand displacement amplification (SDA) reaction, transcription-mediated amplification (TMA) reaction, nucleic acid sequence-based amplification (NASBA) reaction or ligase chain reaction (LCR), which generate multiple copies of DNA from one or a few copies of a template RNA or DNA molecule.
  • the markers in the treated DNA can be pre-amplified with pre-amplification primers.
  • primer refers to a single-stranded oligonucleotide that is capable of serving as the starting point for template-directed DNA synthesis under appropriate conditions (e.g., buffer and temperature) in the presence of four different nucleoside triphosphates and a reagent used for polymerization (for example, DNA polymerase).
  • the length of the primer depends, for example, on the intended use of the primer, and is usually in the range of 15 to 30 nucleotides. Short primer molecules generally require lower temperatures to form sufficiently stable hybridization complexes with the template.
  • a primer site is the area on the template to which the primer hybridizes.
  • a primer pair is a set of primers that includes a 5' forward primer that hybridizes to the 5' end of the sequence to be amplified and a 3' reverse primer that hybridizes to the complementary strand at the 3' end of the sequence to be amplified.
  • Those skilled in the art can design primers based on the markers to be amplified based on common knowledge in the art (see, for example, PCR Primer: A Laboratory Manual, Cold Spring Harbor Laboratories, NY, 1995).
  • a primer designed for the purposes of the present invention may include at least one CpG site, or the amplification product obtained from the primer may include at least one CpG site.
  • Tools for designing primers for detecting DNA methylation status are also known in the art, such as MethPrimer (Li LC and Dahiya R. MethPrimer: designing primers for methylation PCRs. Bioinformatics.
  • any target marker (each at least a part of the target marker or a subregion of the target marker) in the processed DNA can be preamplified.
  • complementary refers to hybridization or base pairing between nucleotides or nucleic acids, for example, between the two strands of a double-stranded DNA molecule, or between an oligonucleotide primer and the primer binding site on a single-stranded nucleic acid to be sequenced or amplified.
  • complementary nucleotides are usually A and T (or A and U), or C and G.
  • Two single-stranded RNA or DNA molecules are said to be complementary when the nucleotides of one strand, optimally aligned and compared and with appropriate nucleotide insertions or deletions, pair with at least about 80% (usually at least about 90% to 95%, more preferably about 98% to 100%) of the nucleotides of the other strand.
  • complementarity exists when an RNA strand or a DNA strand hybridizes to its complementary sequence under selective hybridization conditions.
  • selective hybridization will occur when there is at least about 65% (preferably at least about 75%, more preferably at least about 90%) complementarity over a stretch of at least 14 to 25 nucleotides. See M. Kanehisa, Nucleic Acids Res. 12:203 (1984), incorporated herein by reference.
  • the pre-amplification primer pool includes at least one methylation-specific primer pair. In some embodiments, the pre-amplification primer pool includes a plurality of methylation-specific primer pairs. In some embodiments, the pre-amplification step is performed by methylation-specific PCR ("MSP"), which is PCR using methylation-specific primers. This technique (i.e. MSP) has been described in Herman et al., (supra).
  • methylation-specific primer pair refers to a primer pair specifically designed to recognize CpG sites to exploit differences in methylation to amplify a specific target marker in processed DNA.
  • Primers act only on molecules with or without a specific methylation state.
  • a primer can be an oligonucleotide that specifically hybridizes in a methylation-specific manner to a specific methylated CpG site under stringent conditions, moderate stringency conditions, or high stringency conditions, but does not hybridize to that CpG site when it is unmethylated. Therefore, the primers will specifically amplify target markers that are methylated at specific CpG sites.
  • the primer may be an oligonucleotide that specifically hybridizes to a specific unmethylated CpG site in a methylation-specific manner under stringent conditions, moderate stringency conditions, or high stringency conditions, but cannot hybridize to that CpG site when it is methylated. Therefore, the primers will specifically amplify target markers that are not methylated at specific CpG sites. Accordingly, in the present application, methylated and unmethylated CpG sites can be distinguished by using methylation-specific primers in the pre-amplification of at least one target marker within the treated DNA.
  • Methylation-specific primer pairs of the present application comprise at least one primer that hybridizes to bisulfite-treated CpG dinucleotides.
  • sequence of the primer specific for methylated DNA contains at least one CpG dinucleotide
  • sequence of the primer specific for unmethylated DNA contains a "T" at the C position of the CpG, and/or contains an "A" at the G position of the CpG.
  • a methylation-specific primer pair typically includes a forward primer and a reverse primer, each of which includes an oligonucleotide sequence that hybridizes to at least 9 consecutive nucleotides of one of the target markers (or a subregion of the target marker) under stringent conditions, moderate stringency conditions, or high stringency conditions, wherein the at least 9 consecutive nucleotides contain at least one (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 or more) CpG sites.
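The distinction between primers for methylated and unmethylated templates can be sketched as follows. The sketch assumes a single (top) strand, complete bisulfite conversion, and uniform methylation at all CpG sites of the template; the target sequence, the 12-nucleotide primer length and the function name `convert` are hypothetical.

```python
def convert(seq, cpg_methylated):
    """Bisulfite-convert the top strand.

    Non-CpG cytosines always become T. Cytosines in a CpG context are kept
    when cpg_methylated is True and become T otherwise.
    """
    seq = seq.upper()
    out = []
    for i, base in enumerate(seq):
        in_cpg = base == "C" and seq[i + 1:i + 2] == "G"
        if base == "C" and not (in_cpg and cpg_methylated):
            out.append("T")
        else:
            out.append(base)
    return "".join(out)


target = "GACGTTCCGATCGGA"  # hypothetical marker subregion with three CpG sites

methylated_template = convert(target, cpg_methylated=True)
unmethylated_template = convert(target, cpg_methylated=False)

# Forward primers taken as the first 12 nucleotides of each converted template.
m_primer = methylated_template[:12]
u_primer = unmethylated_template[:12]
print("M primer:", m_primer)  # retains CG at the CpG positions
print("U primer:", u_primer)  # carries TG where the CpG sites were
```

A primer for the complementary strand would show the corresponding "A" at the G position of the CpG; that case is omitted here for brevity.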
  • hybridization may refer to a process in which two single-stranded polynucleotides combine non-covalently to form a stable double-stranded polynucleotide.
  • the resulting double-stranded polynucleotide may be referred to as a "hybrid" or "duplex."
  • the salt concentration in “hybridization conditions” is typically less than about 1M, often less than about 500mM and can be less than about 200mM.
  • “Hybridization buffer” includes buffered saline solutions, such as 5% SSPE, or other such buffers known in the art.
  • Hybridization temperatures can be as low as 5°C, but are usually above 22°C, and more typically above about 30°C, and often above 37°C.
  • Hybridization is usually performed under stringent conditions, i.e., conditions under which the sequence will hybridize to its target sequence but not to other non-complementary sequences. Stringent conditions depend on the sequence and vary from case to case. For example, longer fragments may require higher hybridization temperatures than short fragments to hybridize specifically. Because other factors may affect the stringency of hybridization, including base composition and length of complementary strands, the presence of organic solvents, and the extent of base mismatching, a combination of parameters is more important than an absolute measurement of any one parameter alone.
  • Tm melting temperature
  • Other references (e.g., Allawi and SantaLucia, Jr., Biochemistry, 36:10581-94 (1997)) include alternative calculation methods that take structural and environmental as well as sequence characteristics into account when calculating Tm.
  • hybrid stability is a function of ion concentration and temperature.
  • hybridization reactions are performed under lower stringency conditions and then washed in a different but higher stringency wash solution.
  • Exemplary stringent conditions include a pH of about 7.0 to about 8.3, a temperature of at least 25°C, and a sodium ion (or other salt) concentration of at least 0.01M to no more than 1M.
  • 5x SSPE 750mM NaCl, 50mM sodium phosphate, 5mM EDTA, pH 7.4
  • a temperature of approximately 30°C is suitable for allele-specific hybridization, although the appropriate temperature depends on the length and/or GC content of the hybridization region.
  • the "hybridization stringency" to determine the percentage of mismatches can be as follows: 1) High stringency: 0.1x SSPE, 0.1% SDS, 65°C; 2) Medium stringency (also called moderate stringency): 0.2 x SSPE, 0.1% SDS, 50°C; 3) Low stringency: 1.0x SSPE, 0.1% SDS, 50°C. It is understood that the same stringency can be achieved using alternative buffers, salts, and temperatures.
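As a worked illustration of the salt concentrations behind these stringency levels, the sketch below scales the 5x SSPE composition given above (750 mM NaCl, 50 mM sodium phosphate, 5 mM EDTA) linearly to 0.1x, 0.2x and 1.0x. Linear scaling of a stock recipe is the only assumption; the function name `sspe_composition` is hypothetical.

```python
# Composition of 5x SSPE as stated in the text, in mM.
SSPE_5X_MM = {"NaCl": 750.0, "sodium phosphate": 50.0, "EDTA": 5.0}


def sspe_composition(fold):
    """Scale the 5x recipe linearly to the requested fold concentration (values in mM)."""
    if fold <= 0:
        raise ValueError("fold concentration must be positive")
    return {name: round(mm * fold / 5.0, 3) for name, mm in SSPE_5X_MM.items()}


for label, fold in [("high stringency", 0.1), ("medium stringency", 0.2), ("low stringency", 1.0)]:
    print(f"{label}: {fold}x SSPE -> {sspe_composition(fold)}")
```

Lower salt at a given temperature destabilizes mismatched hybrids, which is why the high-stringency wash uses the most dilute buffer.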
  • moderately stringent hybridization may refer to conditions that allow a nucleic acid molecule (eg, a probe) to bind to a complementary nucleic acid molecule.
  • Hybridizing nucleic acid molecules generally have at least 60% identity, including, for example, at least 70%, 75%, 80%, 85%, 90%, or 95% identity.
  • Moderately stringent conditions can be equivalent to the following conditions: 42°C, 50% formamide, 5x Denhardt's solution, 5x SSPE, 0.2% SDS for hybridization, followed by washing at 42°C, 0.2x SSPE, 0.2% SDS.
  • Highly stringent conditions can be provided by hybridizing at 42°C, 50% formamide, 5x Denhardt's solution, 5x SSPE, 0.2% SDS, followed by washing at 65°C, 0.1x SSPE and 0.1% SDS.
  • Low stringency hybridization can be equivalent to the following conditions: 22°C, 10% formamide, 5x Denhardt's solution, 6x SSPE, 0.2% SDS hybridization, followed by washes in 1x SSPE, 0.2% SDS at 37°C.
  • Denhardt's solution contains 1% polysucrose, 1% polyvinylpyrrolidone and 1% bovine serum albumin (BSA).
  • 20x SSPE sodium chloride, sodium phosphate, EDTA
  • the preamplification primer pool also includes a control primer pair for amplifying a control marker.
  • a control marker is a nucleic acid with known characteristics (e.g., the sequence is known, the copy number per cell is known) that is compared with the experimental target (e.g., a nucleic acid of unknown concentration).
  • the control may be an endogenous, preferably invariant, gene against which the experimental nucleic acid or target nucleic acid under analysis may be normalized. Such normalization controls correct for inter-sample differences that may arise from, e.g., sample processing and analytical efficiency, and allow accurate inter-sample data comparison and quantitative analysis of amplification efficiency and bias.
  • the present application uses reduced representation bisulfite sequencing (RRBS) technology to detect the methylation level of the CpG sites of the target marker of interest, and then calculates the methylation haplotype ratio (MHF) of the marker as the DNA methylation level of the marker. Calculation of MHF can be performed as described in this application.
  • methylation levels of one or more of the target markers described herein can be used to determine cancer.
  • the methylation level of CpG sites in a target marker described herein can be detected in a sample, and then the methylation haplotype ratio (MHF) of the target marker can be calculated as the DNA methylation level of this marker.
  • MHF can be calculated by the following formula: $\mathrm{MHF}_{i,h} = \dfrac{N_{i,h}}{N_{i}}$
  • i represents the target methylation interval
  • h represents the target methylation haplotype
  • $N_{i}$ represents the number of reads located in the target methylation interval
  • $N_{i,h}$ represents the number of reads containing the target methylation haplotype.
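The MHF definition above (reads containing the target haplotype divided by all reads in the interval) can be sketched as follows. The encoding of each read as a string of per-CpG states ("1" methylated, "0" unmethylated) and the function name `mhf` are assumptions made for illustration; the example reads are invented.

```python
def mhf(reads, haplotype):
    """Fraction of reads in the interval that contain the target methylation haplotype.

    reads     -- per-read CpG state strings for one methylation interval
    haplotype -- target pattern, e.g. "1111" for four consecutive methylated CpGs
    """
    if not reads:
        raise ValueError("no reads cover the target methylation interval")
    n_total = len(reads)                                   # reads in the interval
    n_haplotype = sum(1 for r in reads if haplotype in r)  # reads with the haplotype
    return n_haplotype / n_total


# Hypothetical reads covering one interval with four CpG sites.
reads = ["1111", "1101", "0000", "1111", "0011"]
print(mhf(reads, "1111"))
```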
  • the average methylation level can also be calculated: for each target region, the average level of methylation within the region is calculated.
  • the formula is as follows: $\text{average methylation level} = \dfrac{\sum_{i} N_{C,i}}{\sum_{i} \left(N_{C,i} + N_{T,i}\right)}$, where the sums run over the CpG sites i in the target region
  • $N_{C,i}$ is the number of reads whose base at CpG site i is C (that is, the number of reads that are methylated at this site)
  • $N_{T,i}$ is the number of reads whose base at CpG site i is T (that is, the number of sequencing reads that are unmethylated at this site).
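A minimal sketch of the regional average follows. It assumes that the average pools the C and T read counts over all CpG sites in the region (rather than averaging per-site ratios); that pooling choice, the function name `average_methylation` and the example counts are assumptions.

```python
def average_methylation(site_counts):
    """Pooled methylation level of a region.

    site_counts -- list of (n_c, n_t) tuples, one per CpG site, where n_c is the
                   number of reads showing C (methylated) and n_t the number of
                   reads showing T (unmethylated) at that site.
    """
    total_c = sum(n_c for n_c, _ in site_counts)
    total = sum(n_c + n_t for n_c, n_t in site_counts)
    if total == 0:
        raise ValueError("no reads cover the target region")
    return total_c / total


# Hypothetical counts for a region with three CpG sites.
print(average_methylation([(8, 2), (5, 5), (9, 1)]))
```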
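  • the model is a logistic regression model, LogisticRegression(), of the standard form $y = \dfrac{1}{1 + e^{-(w^{T}x + b)}}$, where:
  • x is the MHF methylation level value of the target marker(s)
  • w is the vector of marker coefficients
  • b is the intercept value
  • T denotes the transpose
  • y is the prediction model score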
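How a trained logistic regression model turns MHF values into a prediction score can be sketched with the standard logistic function. The coefficient values, intercept and threshold below are invented for illustration and are not taken from the application; the function names are hypothetical.

```python
import math


def prediction_score(mhf_values, weights, intercept):
    """Standard logistic score: 1 / (1 + exp(-(w . x + b)))."""
    if len(mhf_values) != len(weights):
        raise ValueError("one weight is required per marker")
    z = sum(w * x for w, x in zip(weights, mhf_values)) + intercept
    return 1.0 / (1.0 + math.exp(-z))


def is_positive(score, threshold):
    """A sample is called positive when its score is higher than the threshold."""
    return score > threshold


# Hypothetical two-marker model and sample.
weights, intercept = [4.0, 2.5], -1.5
sample_mhf = [0.4, 0.1]
score = prediction_score(sample_mhf, weights, intercept)
print(score, is_positive(score, threshold=0.5))
```

In practice the coefficients and intercept would come from fitting the model on the training set, for example with a library implementation of logistic regression.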
  • herein, a model is trained based on the DNA methylation level of each marker in the training set samples.
  • the threshold defined by the Youden index of the training set is used as the cancer prediction threshold, and the cancer prediction threshold of each marker described herein is thereby obtained.
  • the cancer prediction threshold of each marker can be found in Table 8, Table 11, Table 15 and Table 19 of the present application.
  • the MHF of the target marker in each sample is calculated according to the above formula, and the prediction score of colorectal cancer risk for the target marker is obtained through the trained model.
  • the MHF of the target marker in each sample is calculated according to the above formula, and the prediction score of the target marker is obtained through the trained model. If the prediction score is higher than the threshold of the target marker shown in Table 11, it is determined that the patient has gastric cancer or is at risk of gastric cancer.
  • the MHF of the target marker in each sample is calculated according to the above formula, and the prediction score of the target marker is obtained through the trained model. If the prediction score is higher than the threshold of the target marker shown in Table 15, it is determined that the patient has esophageal cancer or is at risk of esophageal cancer.
  • the MHF of the target marker in each sample is calculated according to the above formula, and the prediction score of the target marker is obtained through the trained model. If the prediction score is higher than the threshold of the target marker shown in Table 19, it is determined that the patient has liver cancer or is at risk of liver cancer.
  • for each sample, the MHF of each target marker can be calculated from the detected methylation levels of the CpG sites in that target marker.
  • the MHFs of the two or more target markers obtained from all samples are used for training to obtain the parameters of the above prediction model formula.
  • the prediction model score y is obtained and compared with the threshold defined by the Youden index obtained for the two or more target markers in the training set; a patient whose score is above the threshold is determined to have cancer or to be at risk of cancer.
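Choosing a threshold by the Youden index (J = sensitivity + specificity - 1) can be sketched as follows. The sketch assumes that candidate thresholds are the observed training scores and that a sample is called positive when its score is higher than the threshold; the scores, labels and function name `youden_threshold` are invented for illustration.

```python
def youden_threshold(scores, labels):
    """Return (threshold, J) maximizing Youden's J on the training set.

    scores -- prediction model scores of the training samples
    labels -- 1 for cancer samples, 0 for non-cancer samples
    """
    n_pos = sum(1 for label in labels if label == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both cancer and non-cancer samples are required")
    best_threshold, best_j = None, -1.0
    for t in sorted(set(scores)):
        tp = sum(1 for s, label in zip(scores, labels) if label == 1 and s > t)
        tn = sum(1 for s, label in zip(scores, labels) if label == 0 and s <= t)
        j = tp / n_pos + tn / n_neg - 1.0  # sensitivity + specificity - 1
        if j > best_j:
            best_threshold, best_j = t, j
    return best_threshold, best_j


# Hypothetical training scores and labels.
scores = [0.1, 0.2, 0.35, 0.4, 0.6, 0.8, 0.9]
labels = [0, 0, 0, 1, 0, 1, 1]
threshold, j = youden_threshold(scores, labels)
print(threshold, j)
```

A new sample would then be scored with the trained model and called positive if its score exceeds the returned threshold.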
  • those skilled in the art can also determine an individual's risk of cancer based on various factors, such as age, gender, medical history, family history, symptoms, etc.
  • the present invention provides a methylation detection or diagnostic kit and a diagnostic reagent or diagnostic composition for identifying cancer (e.g., colorectal cancer, gastric cancer, esophageal cancer and/or liver cancer), the kit and the composition comprising reagents for detecting the methylation status or level of at least one CpG dinucleotide of one or more markers of interest described herein.
  • primers and/or probe molecules may be included in the kits and compositions.
  • the primers include a pair of primers capable of hybridizing to the target marker to be detected or its target region under stringent conditions, medium stringency conditions or high stringency conditions. Primers may also include primers that detect internal controls such as ACTB.
  • the primers are packaged in a single container or in separate containers.
  • the kit further comprises one or more blocking oligonucleotides.
  • kits and compositions further comprise detection reagents.
  • the detection reagent is selected from the group consisting of fluorescent probes, intercalating dyes, chromophore-labeled probes, radioisotope-labeled probes, and biotin-labeled probes.
  • the kit may also include a DNA polymerase and/or a container suitable for storing a biological sample obtained from the individual.
  • the kit further contains instructions for use and/or explanations of the test results of the kit.
  • kits and compositions may also include reagents for enzymatic or non-enzymatic transformation.
  • the kit further includes a bisulfite reagent or a methylation-sensitive restriction enzyme (MSRE).
  • the bisulfite reagent is selected from the group consisting of ammonium bisulfite, sodium bisulfite, potassium bisulfite, calcium bisulfite, magnesium bisulfite, aluminum bisulfite, bisulfite ions, and any combination thereof.
  • the bisulfite reagent is sodium bisulfite.
  • the MSRE is selected from the group consisting of HpaII enzyme, SalI enzyme, enzyme, ScrFI enzyme, BbeI enzyme, NotI enzyme, SmaI enzyme, XmaI enzyme, MboI enzyme, BstBI enzyme, ClaI enzyme, MluI enzyme, NaeI enzyme, NarI enzyme, PvuI enzyme, SacII enzyme, HhaI enzyme and any combination thereof.
  • kits and compositions may also include converted positive standards in which unmethylated cytosine is converted to a base that does not bind to guanine.
  • the positive standard may be fully methylated.
  • kits and compositions may also include PCR reaction reagents.
  • the PCR reaction reagents include Taq DNA polymerase, PCR buffer, dNTPs, and Mg2+.
  • kits and compositions further comprise standard reagents useful for performing CpG position-specific methylation analysis, wherein the analysis includes one or more of the following technologies: MS-SNuPE, MSP, MethyLight™, HeavyMethyl™, COBRA and nucleic acid sequencing.
  • kits and compositions may include additional reagents selected from the group consisting of buffers (e.g., restriction enzyme, PCR, storage or wash buffers), DNA recovery reagents or kits (e.g., precipitation, ultrafiltration, affinity column) and DNA recovery components, etc.
  • the kit of the present application may further comprise one or more of the following components known in the field of DNA enrichment: a protein component that selectively binds methylated DNA; a triplex-forming nucleic acid component; one or more linkers, optionally in a suitable solution; substances or solutions for ligation, such as ligases and buffers; substances or solutions for column chromatography; substances or solutions for immunology-based enrichment (e.g., immunoprecipitation); substances or solutions for nucleic acid amplification, e.g., PCR; one or more dyes, if appropriate with a coupling agent and if appropriate in solution; substances or solutions for carrying out hybridization; and/or substances or solutions for carrying out washing steps.
  • compositions of the present application contain one or more isolated nucleic acid molecules selected from the sequences shown in any one of SEQ ID NOs: 1-47.
  • compositions of the present application contain one or more isolated nucleic acid molecules selected from the sequences shown in any one of SEQ ID NOs: 48-95.
  • compositions of the present application contain one or more isolated nucleic acid molecules selected from the sequences shown in any one of SEQ ID NOs: 96-138.
  • compositions of the present application contain one or more isolated nucleic acid molecules selected from the sequences shown in any one of SEQ ID NOs: 139-340.
  • the present application also includes a medium recording the sequence of the isolated nucleic acid molecule described herein and optionally its methylation information, for comparison with gene methylation sequencing data to determine the presence, content and/or methylation level of the nucleic acid molecule.
  • the medium is a card printed with the sequence and optionally its methylation information, such as a paper, plastic, metal, glass card.
  • the medium is a computer-readable medium storing the sequence and optionally its methylation information and a computer program.
  • when the computer program is executed, the following step is achieved: the methylation sequencing data of the sample are compared with the sequence to obtain the presence, content and/or methylation level of nucleic acid molecules containing the sequence in the sample.
  • the present application also includes an apparatus for identifying cancer (e.g., colorectal cancer, gastric cancer, esophageal cancer, and/or liver cancer), the apparatus including a memory, a processor, and a computer program stored in the memory and executable on the processor; the processor implements the following steps when executing the program: (1) obtaining the methylation level of one or more target markers described herein, or target regions thereof, in the sample; (2) determining whether cancer (for example, colorectal cancer, gastric cancer, esophageal cancer and/or liver cancer) is present based on the methylation level obtained in (1).
  • the obtaining step is performed by any one of the methods described in Part IV of this application; preferably, the interpretation is performed by any one of the methods described in Part V of this application.
  • This application also provides the use of the isolated nucleic acid molecules described in this application as detection targets in the diagnosis of cancer (e.g., colorectal cancer, gastric cancer, esophageal cancer and/or liver cancer).
  • the methylation markers and technical solutions provided by this application effectively solve the problem of low sensitivity of current diagnostic technology.
  • it helps in early diagnosis and early treatment of cancer (for example, colorectal cancer, gastric cancer, esophageal cancer and/or liver cancer) to improve the cure rate.
  • the present invention provides a method for screening for the risk of cancer (e.g., colorectal cancer, gastric cancer, esophageal cancer, and/or liver cancer), diagnosing cancer (e.g., colorectal cancer, gastric cancer, esophageal cancer, and/or liver cancer), or assessing the prognosis of cancer (e.g., colorectal cancer, gastric cancer, esophageal cancer, and/or liver cancer), comprising: (1) detecting the methylation level of the cancer (e.g., colorectal cancer, gastric cancer, esophageal cancer, and/or liver cancer)-related sequences described in the present application (one or more markers) in a sample of the subject, for example, by sequencing; (2) comparing the methylation levels of the markers in step (1) with corresponding reference levels; (3) screening for the risk of cancer, diagnosing cancer, or evaluating the prognosis of cancer (e.g., colorectal cancer, gastric cancer, esophageal cancer, and/or liver cancer) based on the comparison results.
  • the method further comprises: (1) detecting
  • step (1) can be any detection method suitable for detecting genomic DNA methylation.
  • step (1) includes: treating genomic DNA with a conversion reagent to convert unmethylated cytosine into a base with a lower guanine-binding capacity than cytosine (such as uracil); performing PCR amplification using primers suitable for amplifying the converted sequences of the cancer (for example, colorectal cancer, gastric cancer, esophageal cancer and/or liver cancer)-related sequences described in the application; and determining the methylation level of at least one CpG by the presence or absence of the amplification product, or by sequence identification (e.g., probe-based PCR detection or DNA sequencing).
  • step (1) may also include: treating genomic DNA with a methylation-sensitive restriction endonuclease; performing PCR amplification using primers suitable for amplifying a sequence containing at least one CpG dinucleotide of the colorectal cancer-related sequences described in the present application; and determining the methylation level of at least one CpG by the content of the amplification product.
  • the comparison in step (2) includes: directly comparing the methylation level of the marker in step (1) with a reference level, or calculating a score and comparing the score of the methylation level of the marker and corresponding reference scores.
  • the score is calculated using a logistic regression model.
  • step (3) includes: when the methylation level of the marker is greater than the reference level, or the score of the methylation level is greater than the reference score, the subject is identified as having cancer (e.g., colorectal cancer, gastric cancer, esophageal cancer, and/or liver cancer), being at risk of cancer (e.g., colorectal cancer, gastric cancer, esophageal cancer, and/or liver cancer), or having a poor prognosis of cancer (e.g., colorectal cancer, gastric cancer, esophageal cancer, and/or liver cancer).
  • the reference level or reference score is a reference methylation level or score that can be used as a basis for diagnosis or screening.
  • levels or scores may be determined by comparing samples from subjects with cancer (e.g., colorectal cancer, gastric cancer, esophageal cancer, and/or liver cancer) or at-risk subjects with samples from healthy subjects or subjects without cancer (e.g., colorectal cancer, gastric cancer, esophageal cancer, and/or liver cancer) or risk.
  • the reference level or reference score may also be that of a healthy subject, or of a subject free of cancer (e.g., colorectal cancer, gastric cancer, esophageal cancer, and/or liver cancer) or risk.
  • the reference level or reference score can be derived from one subject or a population of at least two subjects. Reference levels can be selected by those skilled in the art based on the desired sensitivity and specificity.
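One way to pick a reference score for a desired specificity can be sketched as follows. The source does not state how the cutoff is chosen, so the rule here (the smallest control score that keeps the requested share of controls at or below the cutoff) and the function names are assumptions made for illustration.

```python
import math


def threshold_for_specificity(control_scores, target_specificity):
    """Smallest cutoff t, taken from the control scores, such that the share
    of controls with score <= t is at least `target_specificity`.
    A sample is called positive when its score is strictly greater than t."""
    ordered = sorted(control_scores)
    needed = max(1, math.ceil(target_specificity * len(ordered)))
    return ordered[needed - 1]


def sensitivity_at(case_scores, threshold):
    """Share of case samples whose score exceeds the cutoff."""
    return sum(s > threshold for s in case_scores) / len(case_scores)


# Hypothetical reference population: scores from controls and from cases.
controls = [0.1, 0.2, 0.15, 0.3, 0.25, 0.4, 0.6, 0.35]
cases = [0.5, 0.7, 0.3, 0.9]

cutoff = threshold_for_specificity(controls, 0.75)
sens = sensitivity_at(cases, cutoff)
```

Raising the target specificity moves the cutoff up and generally lowers sensitivity, which is the trade-off the skilled person balances when selecting a reference level.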
  • use of a reagent for detecting the methylation status or level of at least one CpG dinucleotide of one or more target markers in the preparation of a detection reagent or diagnostic kit for diagnosing gastric cancer, and use of a device for determining the methylation status or level of at least one CpG dinucleotide of one or more target markers in the preparation of a diagnostic kit for diagnosing gastric cancer; wherein the one or more target markers are selected from any 1 to 47, or all 48, of the following sequences (1)-(48):
  • (22) a sequence comprising chr5:92906255:92906617 (SEQ ID NO: 69) and the sequence within 5 kb upstream and/or within 5 kb downstream of it;
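The "within 5 kb upstream and/or downstream" wording amounts to widening a genomic interval. A minimal sketch, assuming 1-based coordinates written as `chrom:start:end` as in the item above; `parse_region` and `with_flanks` are hypothetical helper names, and clipping at the chromosome end is omitted.

```python
def parse_region(text):
    """Parse 'chrom:start:end' as written in the source text."""
    chrom, start, end = text.split(":")
    return chrom, int(start), int(end)


def with_flanks(region, flank_bp=5000):
    """Extend a region by `flank_bp` on both sides, never going below position 1."""
    chrom, start, end = region
    return chrom, max(1, start - flank_bp), end + flank_bp


core = parse_region("chr5:92906255:92906617")
extended = with_flanks(core)  # core region plus 5 kb on each side
```

Passing `flank_bp=1000` or `flank_bp=500` gives the narrower windows mentioned elsewhere in the text.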
  • the one or more target markers include the (3), (8), (13), (15), (17), (19), (22), (25), (29), ( or
  • the one or more target markers include the (2), (6), (7), (8), (12), (15), (19), (25), (28), ( or
  • the one or more target markers include the sequences described in items (3), (13), (14), (20), (22), (28), (30) and (36); or
  • the one or more target markers include the sequences described in items (3), (13), (27), (30) and (35); or
  • the one or more target markers include the sequences described in items (7), (14), (22), (26), (35), (38), (40), (43), (47) and (48).
  • the one or more target markers are selected from any 1, 2, 3, 4, 5, 6, 7, 8 or 9 of the sequences described in items (7), (14), (22), (26), (35), (38), (40), (43), (47) and (48).
  • the target marker includes the sequence described in item (40), and any one or more sequences in items (1)-(39) and (41)-(48); or
  • the target marker includes the sequence described in item (47), and any one or more sequences in items (1)-(46) and (48); or
  • the target marker includes the sequence described in item (43), and any one or more sequences in items (1)-(42) and (44)-(48); or
  • the target marker includes the sequence described in item (26), and any one or more sequences in items (1)-(25) and (27)-(48); or
  • the target marker includes the sequence described in item (35), and any one or more sequences in items (1)-(34) and (36)-(48); or
  • the target marker includes the sequence described in item (14), and any one or more sequences in items (1)-(13) and (15)-(48); or
  • the target marker includes the sequence described in item (38), and any one or more sequences in items (1)-(37) and (39)-(48); or
  • the target marker includes the sequence described in item (22), and any one or more sequences in items (1)-(21) and (23)-(48); or
  • the target marker includes the sequence described in item (7), and any one or more sequences in items (1)-(6) and (8)-(48); or
  • the target marker includes the sequence described in item (48), and any one or more sequences in items (1)-(47).
  • the target marker includes the sequence within 1 kb, preferably within 500 bp, of the start site of each of the sequences of SEQ ID NO: 48-95.
  • the target marker is a gene sequence that contains any one of the sequences of SEQ ID NO: 48-95 and is no more than 400 bp long.
  • reagents include primers and/or probe molecules
  • the primer molecule is identical to, complementary to, or hybridizes under stringent conditions to the one or more target markers and contains at least 9 consecutive nucleotides
  • the probe molecule hybridizes under stringent conditions to the amplification products of the one or more target markers.
  • the reagent is a reagent required for performing reduced-representation genome methylation sequencing.
  • a diagnostic reagent or diagnostic kit for diagnosing gastric cancer by detecting the methylation status or methylation level of at least one CpG dinucleotide of one or more target markers, which contains a reagent for detecting the methylation status or level of at least one CpG dinucleotide of the one or more target markers; wherein the one or more target markers are as described in any one of embodiments 1-6.
  • the diagnostic reagent or diagnostic kit includes primers and/or probe molecules; preferably, the primer molecule is identical to, complementary to, or hybridizes under stringent conditions to the one or more target markers and contains at least 9 consecutive nucleotides, and the probe molecule hybridizes under stringent conditions to the amplification product of the one or more target markers;
  • the diagnostic reagent or diagnostic kit further includes primer molecules and/or probe molecules for detecting the internal reference gene ACTB.
  • use of at least one reagent or set of reagents for distinguishing between methylated and unmethylated CpG dinucleotides within at least one target region of genomic DNA in the preparation of a kit for a method of detecting and/or classifying gastric cancer in an individual
  • the method includes contacting genomic DNA isolated from the individual biological sample with the at least one reagent or set of reagents, wherein the target region is equivalent to or complementary to one or more target markers
  • the target marker is as described in any one of embodiments 1-6.
  • One or more reagents, an amplification enzyme, and at least one compound comprising: converting an unmethylated cytosine base at position 5 into uracil or other bases detectably different from cytosine in hybridization performance
  • the one or more target markers are as described in any one of Embodiments 1-6.
  • step b) the genomic DNA or fragments thereof are treated with a reagent selected from the group consisting of bisulfite, acid sulfite, metabisulfite and combinations thereof.
  • the contacting or amplifying comprises using a thermostable DNA polymerase as the amplification enzyme, using a polymerase lacking 5'-3' exonuclease activity, using a polymerase chain reaction, and/or producing a detectably labeled amplification product.
  • the one or more target markers are as described in any one of Embodiments 1-6.
  • the amplification product is detected by hybridization with at least one nucleic acid or peptide nucleic acid that is identical or complementary to a fragment, at least 16 bases long, of the sequence of the one or more target markers.
  • a device for detecting and diagnosing gastric cancer in an individual, comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor performing the following steps when executing the program: (1) obtaining the methylation level or methylation status of at least one CpG dinucleotide of one or more target markers in the sample, and (2) interpreting gastric cancer according to the methylation level or methylation status from (1);
  • the one or more target markers are as described in any one of Embodiments 1-6.
  • a method for assessing the presence and/or progression of esophageal cancer, comprising determining, in the sample to be tested, the presence and/or content of the modification state of any one or more DNA regions selected from Table 1, or complementary regions thereof, or fragments of the above.
  • a method for assessing the presence and/or progression of esophageal cancer, comprising determining, in the sample to be tested, the presence and/or content of the modification state of a DNA region within 5 kbp upstream or downstream of any one of SEQ ID NO: 96 to 138, or a complementary region thereof, or a fragment of the above.
  • a method for assessing the presence and/or progression of esophageal cancer, comprising determining, in the sample to be tested, the presence and/or content of the modification state of a DNA region selected from the region within 5 kbp upstream or downstream of SEQ ID NO: 105 and the DNA regions where any one or more genes in Table 2 are located, or a fragment thereof.
  • nucleic acid comprises cell-free nucleic acid.
  • after conversion, a base with the modification state remains substantially unchanged, whereas a base without the modification state is changed to a different base or is cleaved.
  • the deamination reagent comprises bisulfite or an analog thereof.
  • the method for determining the presence and/or content of a modification state comprises determining the presence and/or content of a DNA region or a fragment thereof having the modification state.
  • a nucleic acid comprising a sequence capable of binding to any one or more DNA regions selected from Table 1, or complementary regions thereof, or converted regions of the above, or fragments of the above.
  • a nucleic acid comprising a sequence capable of binding to a DNA region within 5 kbp upstream or downstream of any one of SEQ ID NO: 96 to 138, or a complementary region thereof, or a converted region of the above, or a fragment of the above.
  • a nucleic acid comprising a sequence capable of binding to a DNA region selected from the region within 5 kbp upstream or downstream of SEQ ID NO: 105 and the DNA regions where any one or more genes in Table 2 are located, or a complementary region thereof, or a converted region of the above, or a fragment of the above.
  • a kit comprising the nucleic acid of any one of embodiments 19-21.
  • use of the nucleic acid according to any one of embodiments 19-21, and/or the kit according to embodiment 22, in the preparation of disease detection products.
  • use of the nucleic acid according to any one of embodiments 19-21, and/or the kit according to embodiment 22, in the preparation of a substance for assessing the presence and/or progression of esophageal cancer.
  • use of the nucleic acid according to any one of embodiments 19-21, and/or the kit according to embodiment 22, in the preparation of a substance for determining the modification state of the DNA region or a fragment thereof.
  • a method for preparing a nucleic acid, comprising designing, based on the modification state of any one or more DNA regions selected from Table 1, or complementary regions thereof, or converted regions of the above, or fragments of the above, a nucleic acid capable of binding to the DNA region, or its complementary region, or the converted region, or the fragment.
  • a method for preparing a nucleic acid, comprising designing, based on the modification state of a DNA region within 5 kbp upstream or downstream of any one of SEQ ID NO: 96 to 138, or a complementary region thereof, or a converted region of the above, or a fragment of the above, a nucleic acid capable of binding to the DNA region, or its complementary region, or the converted region, or the fragment.
  • a method for preparing a nucleic acid, comprising designing, based on the modification state of a DNA region selected from the region within 5 kbp upstream or downstream of SEQ ID NO: 105 and the DNA regions where any one or more genes in Table 2 are located, or a complementary region thereof, or a converted region of the above, or a fragment of the above, a nucleic acid capable of binding to the DNA region, or its complementary region, or the converted region, or the fragment.
  • nucleic acids, nucleic acid sets and/or kits for determining the modification status of a DNA region, the DNA region comprising any one or more DNA regions selected from Table 1, or complementary regions thereof, or converted regions of the above, or fragments of the above.
  • nucleic acids, nucleic acid sets and/or kits for determining the modification status of a DNA region, the DNA region comprising a DNA region within 5 kbp upstream or downstream of any one of SEQ ID NO: 96 to 138, or a complementary region thereof, or a converted region of the above, or a fragment of the above.
  • nucleic acids, nucleic acid sets and/or kits for determining the modification status of a DNA region, the DNA region comprising a DNA region selected from the region within 5 kbp upstream or downstream of SEQ ID NO: 105 and the DNA regions where any one or more genes in Table 2 are located, or a complementary region thereof, or a converted region of the above, or a fragment of the above.
  • a storage medium recording a program capable of executing the method described in any one of embodiments 1-18.
  • An apparatus comprising the storage medium of embodiment 33, and optionally further comprising a processor coupled to the storage medium, the processor configured to execute the program to implement the method described in any one of Embodiments 1-18.
  • a method for assessing the presence and/or progression of liver cancer, comprising determining, in the sample to be tested, the presence and/or content of the modification state of any one or more DNA regions selected from Table 3, or complementary regions thereof, or fragments of the above.
  • a method for assessing the presence and/or progression of liver cancer, comprising determining, in the sample to be tested, the presence and/or content of the modification state of a DNA region within 1 kbp upstream or downstream of any one of SEQ ID NO: 139 to 340, or a complementary region thereof, or a fragment of the above.
  • a method for assessing the presence and/or progression of liver cancer, comprising determining, in the sample to be tested, the presence and/or content of the modification state of a DNA region within 1 kbp upstream or downstream of any one or more genes selected from Table 4, or a fragment thereof.
  • nucleic acid comprises cell-free nucleic acid.
  • after conversion, a base with the modification state remains substantially unchanged, whereas a base without the modification state is changed to a different base or is cleaved.
  • the deamination reagent comprises bisulfite or an analog thereof.
  • the method for determining the presence and/or content of a modification state comprises determining the presence and/or content of a DNA region or a fragment thereof having the modification state.
  • a nucleic acid comprising a sequence capable of binding to any one or more DNA regions selected from Table 3, or its complementary region, or the above-mentioned transformed region, or the above-mentioned fragment.
  • a nucleic acid comprising a sequence capable of binding to a DNA region within 1 kbp upstream or downstream of any one of SEQ ID NO: 139 to 340, or a complementary region thereof, or a converted region of the above, or a fragment of the above.
  • a nucleic acid comprising a sequence capable of binding to a DNA region within 1 kbp upstream or downstream of any one or more genes selected from Table 4, or a complementary region thereof, or a converted region of the above, or a fragment of the above.
  • a kit comprising the nucleic acid of any one of embodiments 19-21.
  • nucleic acid according to any one of embodiments 19-21 and/or the kit according to embodiment 22 in the preparation of disease detection products.
  • nucleic acid according to any one of embodiments 19-21, and/or the kit according to embodiment 22, for the preparation of a substance for assessing the presence and/or progression of esophageal cancer.
  • nucleic acid according to any one of embodiments 19 to 21, and/or the kit according to embodiment 22, for preparing a substance for determining the modification state of the DNA region or fragment thereof.
  • a method for preparing a nucleic acid, comprising designing, based on the modification state of any one or more DNA regions selected from Table 3, or complementary regions thereof, or converted regions of the above, or fragments of the above, a nucleic acid capable of binding to the DNA region, or its complementary region, or the converted region, or the fragment.
  • a method for preparing a nucleic acid, comprising designing, based on the modification state of a DNA region within 1 kbp upstream or downstream of any one of SEQ ID NO: 139 to 340, or a complementary region thereof, or a converted region of the above, or a fragment of the above, a nucleic acid capable of binding to the DNA region, or its complementary region, or the converted region, or the fragment.
  • a method for preparing a nucleic acid, comprising designing, based on the modification state of a DNA region within 1 kbp upstream or downstream of any one or more genes selected from Table 4, or a complementary region thereof, or a converted region of the above, or a fragment of the above, a nucleic acid capable of binding to the DNA region, or its complementary region, or the converted region, or the fragment.
  • the DNA region includes any one or more DNA regions selected from Table 3, or its complementary region, or the above-mentioned transformed region, or the sequence of the above-mentioned fragment.
  • the DNA region includes a sequence selected from the DNA region within 1 kbp upstream or downstream shown in any one of SEQ ID NO: 139 to 340, or its complementary region, or the above-mentioned transformed region, or the sequence of the above-mentioned fragment.
  • the DNA region includes a DNA region selected from the DNA region within 1 kbp upstream or downstream of any one or more genes in Table 4, or its complementary region, or the above-mentioned transformed region, or the sequence of the above-mentioned fragment.
  • a storage medium recording a program capable of executing the method described in any one of embodiments 1-18.
  • An apparatus comprising the storage medium of embodiment 33, and optionally further comprising a processor coupled to the storage medium, the processor configured to execute the program to implement the method described in any one of Embodiments 1-18.
  • Example 1 Colorectal cancer sample processing and methylation marker screening
  • cfDNA circulating cell-free DNA
  • plasma aliquots were thawed and processed immediately using the QIAamp circulating nucleic acid extraction kit (Qiagen 55114) according to the manufacturer's instructions.
  • the extracted cfDNA concentration was quantified using Qubit 3.0.
  • Sodium bisulfite conversion of cytosine bases was performed using a bisulfite conversion kit (ThermoFisher, MECOV50). According to the manufacturer's instructions, 20 ng of genomic DNA or ctDNA was converted and purified for downstream applications.
  • the conversion is performed using enzymatic methods, preferably deaminase treatment, or using non-enzymatic methods, preferably bisulfite or bisulfate treatment, more preferably treatment with calcium bisulfite, sodium bisulfite, potassium bisulfite, ammonium bisulfite, sodium bisulfate, potassium bisulfate, or ammonium bisulfate.
  • the library was constructed using the MethylTitan (CN201910515830) method.
  • the MethylTitan method is as follows.
  • the bisulfite-converted DNA is dephosphorylated and then ligated to a universal Illumina sequencing adapter carrying a molecular tag (UMI).
  • UMI molecular tag
  • the converted DNA is subjected to a semi-targeted PCR reaction to amplify the required target regions.
  • sample-specific barcodes and full-length Illumina sequencing adapters are added to the target DNA molecules through a PCR reaction.
  • the final library is then quantified using the KAPA library quantification kit for Illumina (KK4844) and sequenced on an Illumina sequencer.
  • the MethylTitan library construction method can effectively enrich the required target fragments from a small amount of DNA, especially cfDNA, while preserving the methylation status of the original DNA. Because adjacent CpG methylated cytosines are analyzed together (a given target may contain several to dozens of CpGs, depending on the region), the entire methylation pattern of a particular region can serve as a unique signature, rather than comparing the status of individual bases.
  • the reference genome data used in this example comes from the UCSC database (UCSC: hg19, http://hgdownload.soe.ucsc.edu/goldenPath/hg19/bigZips/hg19.fa.gz).
  • the nucleotide numbering of the sites involved in the present invention corresponds to the nucleotide position numbering of hg19.
  • MHF (methylation haplotype ratio) is calculated as $\mathrm{MHF}_{i,h} = N_{i,h} / N_{i}$, where:
  • $i$ represents the target methylation interval
  • $h$ represents the target methylation haplotype
  • $N_{i}$ represents the number of reads located in the target methylation interval
  • $N_{i,h}$ represents the number of reads containing the target methylation haplotype.
  • $N_{C,i}$ is the number of reads whose base at CpG site $i$ is C (that is, the number of reads methylated at this site)
  • $N_{T,i}$ is the number of reads whose base at CpG site $i$ is T (that is, the number of reads unmethylated at this site)
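The read-count quantities defined above can be sketched in a few lines. This is an illustration under stated assumptions: each read is represented as a string of per-CpG calls over one interval (`C` methylated, `T` unmethylated), every read covers all CpGs of the interval, a read "contains" a haplotype when its pattern equals it, and the per-site fraction `N_C / (N_C + N_T)` is an assumed way of combining the two counts (the surviving text defines the counts but not the formula). The function names are hypothetical.

```python
from collections import Counter


def mhf(read_patterns, haplotype):
    """Haplotype ratio: reads carrying `haplotype` / reads covering the interval."""
    n_i = len(read_patterns)
    n_ih = sum(1 for pattern in read_patterns if pattern == haplotype)
    return n_ih / n_i if n_i else float("nan")


def site_methylation_fraction(read_patterns, site):
    """Assumed per-site level: N_C / (N_C + N_T) at the given CpG index."""
    counts = Counter(pattern[site] for pattern in read_patterns)
    n_c, n_t = counts["C"], counts["T"]
    return n_c / (n_c + n_t) if (n_c + n_t) else float("nan")


# Four hypothetical reads over an interval with three CpG sites.
reads = ["CCC", "CCC", "CTC", "TTT"]
fully_methylated_ratio = mhf(reads, "CCC")
first_site_level = site_methylation_fraction(reads, 0)
```

Here two of four reads are fully methylated, while three of four reads are methylated at the first CpG, which shows how the haplotype view and the single-site view can differ for the same reads.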
  • for missing values in the data matrix, the KNN algorithm is used to impute the missing data.
  • the training set is used to train the KNN imputer, and then the training set matrix and the test set matrix are each imputed with it.
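A minimal pure-Python sketch of this imputation scheme follows. It is a stand-in written for illustration, not the implementation used in the application: the distance (root mean squared difference over features observed in both rows), the choice of `k`, and the function names are all assumptions. The key point it shows is that donors always come from the training matrix, for both the training and the test rows.

```python
import math


def _distance(a, b):
    """Root mean squared difference over features observed in both rows."""
    shared = [(x - y) ** 2 for x, y in zip(a, b) if x is not None and y is not None]
    return math.sqrt(sum(shared) / len(shared)) if shared else math.inf


def knn_impute(train_rows, rows, k=2):
    """Fill None entries in `rows` with the mean of the k nearest training rows
    that have the missing feature observed."""
    filled = []
    for row in rows:
        new_row = list(row)
        for j, value in enumerate(row):
            if value is not None:
                continue
            donors = [(_distance(row, t), t[j]) for t in train_rows if t[j] is not None]
            donors = [d for d in donors if d[0] != math.inf]
            donors.sort(key=lambda d: d[0])
            nearest = [v for _, v in donors[:k]]
            if nearest:  # leave the gap if no usable donor exists
                new_row[j] = sum(nearest) / len(nearest)
        filled.append(new_row)
    return filled


train = [[0.1, 0.2], [0.2, 0.4], [0.9, 0.8], [0.15, None]]
test = [[None, 0.8]]

train_filled = knn_impute(train, train)  # training matrix imputed from itself
test_filled = knn_impute(train, test)    # test matrix imputed from training rows only
```

Imputing the test matrix from training rows only keeps information from the test samples out of the fitted imputer.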
  • Example 2 Targeted methylation sequencing to screen colorectal cancer-specific methylation sites
  • the inventor screened out 47 methylation markers from a large number of candidate regions. Their genomic locations and associated genes are shown in Table 1.
  • A methylation marker-associated gene is the gene whose transcription start site (TSS) is closest to the methylation marker and within 100 kb of it.
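The nearest-TSS rule can be sketched as below. The input format (a dictionary of TSS positions on the marker's chromosome), the gene names, and the convention that a TSS inside the marker has distance 0 are assumptions for illustration.

```python
def associated_gene(marker_start, marker_end, tss_by_gene, max_distance=100_000):
    """Return the gene with the closest TSS within `max_distance`, else None.

    `tss_by_gene` maps gene name -> TSS position on the marker's chromosome
    (hypothetical input format).
    """
    def distance(tss):
        if marker_start <= tss <= marker_end:
            return 0
        return min(abs(tss - marker_start), abs(tss - marker_end))

    best = min(tss_by_gene.items(), key=lambda item: distance(item[1]), default=None)
    if best is None or distance(best[1]) > max_distance:
        return None
    return best[0]


# Hypothetical marker and TSS positions.
nearest = associated_gene(1_000_000, 1_000_400, {"GENE_A": 1_050_000, "GENE_B": 900_500})
too_far = associated_gene(1_000_000, 1_000_400, {"GENE_C": 1_200_000})
```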
  • the sequences shown in SEQ ID NO: 1-47 were selected as the methylation markers used in the examples.
  • the methylation levels of all CpG sites of each methylation marker can be obtained by the MethylTitan methylation sequencing method.
  • the average methylation level of all CpG sites in each region, the methylation level of a single CpG site, and the combination of CpG site methylation haplotypes within the region can be used as colorectal cancer markers.
  • Figure 2 Box plot shows the distribution of methylation levels of 47 methylation markers in colorectal cancer and non-colorectal cancer in the training set.
  • Figure 3 Box plot shows the distribution of methylation levels of 47 methylation markers in colorectal cancer and non-colorectal cancer in the test set.
  • the distribution of average methylation levels within the methylation marker regions differs significantly between colorectal cancer and colorectal cancer-free cfDNA samples, giving good discrimination.
  • the P value in Table 7 is the Mann Whitney U Test P value, and the methylation level represents the median methylation level of the cfDNA samples in this group.
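For reference, the test named above can be sketched in pure Python. The source only names the Mann-Whitney U test; the two-sided normal approximation with tie correction (and no continuity correction) used here is an assumption, and an exact test is preferable for very small groups. The sample values are hypothetical.

```python
import math


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test via the tie-corrected normal approximation.
    Returns (U statistic for x, approximate p-value)."""
    n1, n2 = len(x), len(y)
    pooled = sorted((v, g) for g, group in enumerate((x, y)) for v in group)
    rank_sum_x = 0.0
    tie_term = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2  # mean of the ranks i+1 .. j shared by the tie group
        t = j - i
        tie_term += t ** 3 - t
        rank_sum_x += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    u1 = rank_sum_x - n1 * (n1 + 1) / 2
    n = n1 + n2
    variance = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if variance == 0:
        return u1, 1.0
    z = (u1 - n1 * n2 / 2) / math.sqrt(variance)
    return u1, math.erfc(abs(z) / math.sqrt(2))


cancer_levels = [0.8, 0.9, 0.7, 0.95]   # hypothetical methylation levels
control_levels = [0.1, 0.2, 0.3, 0.15]
u_stat, p_value = mann_whitney_u(cancer_levels, control_levels)
```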
  • the statistical results in Table 7 also show that the methylation levels of the 47 methylation markers in this application differ significantly between colorectal cancer and non-colorectal cancer samples (P < 0.001), so they are good methylation markers for colorectal cancer.
  • using the methylation level data of a single marker, the model is trained on the training set data of Example 1, and the test set samples are used to verify the performance of the model.
  • model = LogisticRegression().
  • the formula of the model is as follows, where x is the methylation level value of the sample's target marker, w is the coefficient of each marker, b is the intercept, and y is the model prediction score: $y = \dfrac{1}{1 + e^{-(w x + b)}}$
  • model.fit(TrainData, TrainPheno), where TrainData is the data of the target methylation site in the training set samples and TrainPheno is the phenotype of the training set samples (1 for colorectal cancer, 0 for no colorectal cancer); the relevant threshold of the model is determined from the training set samples.
  • TestPred = model.predict_proba(TestData), where TestData is the data of the target methylation site in the test set samples and TestPred is the model prediction score. The prediction score is compared with the above threshold to determine whether the sample is colorectal cancer.
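The fit-then-score workflow described above can be sketched without any external library. The code below is a pure-Python stand-in for the `LogisticRegression` calls named in the text, not the author's implementation: it uses plain, unregularized gradient descent, and `fit_logistic`, `predict_score`, the learning rate, the epoch count, and the sample values are all assumptions for illustration.

```python
import math


def predict_score(x, w, b):
    """Logistic score y = 1 / (1 + exp(-(w.x + b))) for one sample."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(rows, labels, lr=0.5, epochs=2000):
    """Fit w and b by batch gradient descent on the logistic loss."""
    w = [0.0] * len(rows[0])
    b = 0.0
    n = len(rows)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for x, y in zip(rows, labels):
            err = predict_score(x, w, b) - y
            for j, xj in enumerate(x):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


# Hypothetical single-marker training data: methylation level and phenotype (1 = cancer).
train_data = [[0.05], [0.10], [0.08], [0.60], [0.75], [0.55]]
train_pheno = [0, 0, 0, 1, 1, 1]

w, b = fit_logistic(train_data, train_pheno)
test_pred = [predict_score(x, w, b) for x in [[0.70], [0.05]]]
```

Each entry of `test_pred` is then compared with the threshold chosen on the training set to call a sample positive or negative.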
  • Each single methylation marker in this application can be used as a colorectal cancer marker.
  • Logistic regression modeling is used, and a threshold is set according to the training set. A score greater than the threshold is predicted as colorectal cancer; otherwise the sample is predicted as non-colorectal cancer.
  • both the training set and the test set achieve good accuracy, specificity and sensitivity, and other machine learning models can achieve similar results.
  • This example uses the methylation levels of all 47 methylation markers to construct a logistic regression machine learning model ALLMODEL, which can accurately distinguish colorectal cancer and non-colorectal cancer samples in the data.
  • the specific steps are basically the same as in Example 2, except that the data of all 47 target methylation markers combined (SEQ ID NO: 1-47) are input into the model.
  • the distribution of model prediction scores in the training set and test set is shown in Figure 4.
  • the ROC curve is shown in Figure 5.
  • the AUC for distinguishing colorectal cancer and colorectal cancer-free samples reached 0.965.
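As a reminder of what the AUC measures, the sketch below computes it directly as the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one, with ties counted as one half. The function name and the example scores are hypothetical.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2]  # hypothetical model prediction scores
labels = [1, 1, 1, 0, 0, 0]              # 1 = colorectal cancer, 0 = no colorectal cancer
auc = roc_auc(scores, labels)
```

An AUC of 1.0 means every cancer sample outscores every non-cancer sample, while 0.5 corresponds to no discrimination.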
  • the threshold was set to 0.441: a score greater than this value predicts colorectal cancer, otherwise no colorectal cancer is predicted.
  • the training set accuracy is 0.894
  • the training set specificity is 0.932
  • the training set sensitivity is 0.859
  • the test set accuracy is 0.892.
  • the test set specificity is 0.914, and the test set sensitivity is 0.867. This model distinguishes colorectal cancer samples from colorectal cancer-free samples well.
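The three reported metrics follow from applying the threshold to the prediction scores. A minimal sketch, assuming a score strictly greater than the threshold is called positive; the function name and the example scores are hypothetical, and the 0.441 cutoff is the value quoted above.

```python
def classification_metrics(scores, labels, threshold):
    """Accuracy, sensitivity and specificity when score > threshold means positive."""
    tp = fp = tn = fn = 0
    for score, label in zip(scores, labels):
        predicted = 1 if score > threshold else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1
        elif predicted == 0 and label == 0:
            tn += 1
        else:
            fn += 1
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),  # share of cancer samples called positive
        "specificity": tn / (tn + fp),  # share of non-cancer samples called negative
    }


scores = [0.9, 0.5, 0.3, 0.6, 0.2, 0.1]  # hypothetical prediction scores
labels = [1, 1, 1, 0, 0, 0]
metrics = classification_metrics(scores, labels, threshold=0.441)
```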
  • this example selects a total of 9 methylation markers, SEQ ID NO: 4, SEQ ID NO: 11, SEQ ID NO: 15, SEQ ID NO: 18, SEQ ID NO: 19, SEQ ID NO: 30, SEQ ID NO: 34, SEQ ID NO: 37 and SEQ ID NO: 41, from all 47 methylation markers, and uses their methylation levels to construct the logistic regression machine learning model SUBMODEL1.
  • the method of constructing the machine learning model is also consistent with Example 3, but the relevant samples only use the data of the above 9 markers in this example.
  • the model scores of the model in the training set and test set are shown in Figure 6.
  • the ROC curve of the model is shown in Figure 7. It can be seen that, in both the training set and the test set, the scores of colorectal cancer samples and colorectal cancer-free samples are significantly different.
  • the AUC for distinguishing adenocarcinoma and colorectal cancer-free samples in the training set of this model reached 0.921. In the test set, the AUC for distinguishing colorectal cancer and colorectal cancer-free samples reached 0.917.
  • the threshold was set to 0.502.
  • the training set accuracy is 0.854
  • the training set specificity is 0.822
  • the training set sensitivity is 0.885
  • the test set accuracy is 0.800
  • the test set specificity is 0.800
  • the test set sensitivity is 0.800, which illustrates the good performance of the combined model.
  • this example selects a total of 6 methylation markers, SEQ ID NO: 1, SEQ ID NO: 21, SEQ ID NO: 29, SEQ ID NO: 36, SEQ ID NO: 44 and SEQ ID NO: 47, from all 47 methylation markers, and uses their methylation levels to construct the logistic regression machine learning model SUBMODEL2.
  • the method of constructing the machine learning model is also consistent with Example 3, but the relevant samples only use the data of the above 6 markers in this example.
  • the model scores of the model in the training set and test set are shown in Figure 8.
  • the ROC curve of this model is shown in Figure 9. It can be seen that the scores of colorectal cancer and colorectal cancer-free samples differ significantly in both the training set and the test set.
  • the AUC of the model in the training set for distinguishing colorectal cancer and colorectal cancer-free samples reached 0.916.
  • in the test set, the AUC for distinguishing colorectal cancer and colorectal cancer-free samples reached 0.879, and the threshold was set to 0.392.
  • if the score is greater than this value, colorectal cancer is predicted; otherwise, no colorectal cancer is predicted.
  • the accuracy of the training set is 0.841
  • the specificity of the training set is 0.841.
  • the sensitivity of the training set is 0.877
  • the sensitivity of the training set is 0.822
  • the accuracy of the test set is 0.785
  • the specificity of the test set is 0.714
  • the sensitivity of the test set is 0.867, which illustrates the good performance of the combined model.
  • this example selects a total of 7 methylation markers, SEQ ID NO: 6, SEQ ID NO: 10, SEQ ID NO: 13, SEQ ID NO: 14, SEQ ID NO: 22, SEQ ID NO: 28 and SEQ ID NO: 43, from all 47 methylation markers, and uses their methylation levels to construct the logistic regression machine learning model SUBMODEL3.
  • the method of constructing the machine learning model is also consistent with Example 3, but the relevant samples only use the data of the above 7 markers in this example.
  • the model scores of the model in the training set and test set are shown in Figure 10.
  • the ROC of the model The curve is shown in Figure 11. It can be seen that in the training set and test set of this model, the scores of colorectal cancer and colorectal cancer-free samples are significantly different from the scores of other cancer types.
  • the AUC of distinguishing adenocarcinoma and colorectal cancer-free samples in the training set of this model has reached 0.911. In the test set, the AUC for distinguishing colorectal cancer and non-colorectal cancer samples reached 0.932.
  • the threshold was set to 0.507.
  • the training set accuracy is 0.848, the training set specificity is 0.973, the training set sensitivity is 0.731, the test set accuracy is 0.815, the test set specificity is 0.971, and the test set sensitivity is 0.633, which illustrates the good performance of the combined model.
  • Example 8 Gastric cancer sample processing and methylation marker screening
  • Example 2 The sample processing, strategy and data preprocessing process are the same as in Example 1. After calculating the MHF to extract the methylation information, the methylation haplotype data matrix is performed:
  • the training set is further divided into 3 parts and 3-fold cross validation is performed. Markers are screened based on the average AUC of the 3-fold cross validation.
  • the marker obtained in step 3 is based on the Logistic Regression model, uses the training data for model training, and verifies the model effect in the test data.
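The screening described above (3-fold cross validation on the training set, ranking by average AUC) can be sketched as follows. This is only an illustration under stated assumptions: the cutoff `min_auc`, the function name `screen_markers` and the synthetic data are hypothetical, and the source does not state which cutoff or fold assignment was used.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score


def screen_markers(X, y, min_auc=0.6, n_splits=3, seed=0):
    """Return (column index, mean AUC) for markers whose mean 3-fold AUC reaches min_auc."""
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    kept = []
    for j in range(X.shape[1]):
        # One single-marker logistic regression per candidate region.
        aucs = cross_val_score(
            LogisticRegression(), X[:, [j]], y, cv=cv, scoring="roc_auc"
        )
        mean_auc = float(aucs.mean())
        if mean_auc >= min_auc:
            kept.append((j, mean_auc))
    return kept


# Synthetic training data (assumption): column 0 separates the groups, column 1 is noise.
rng = np.random.default_rng(0)
y = np.array([0] * 30 + [1] * 30)
X = np.column_stack([
    y * 2.0 + rng.normal(0.0, 0.5, 60),
    rng.normal(0.0, 1.0, 60),
])
selected = screen_markers(X, y, min_auc=0.8)
```

Stratified folds keep the cancer to non-cancer ratio similar in each part, which is a design choice of this sketch rather than something stated in the text.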
  • the specific methylation markers selected for gastric cancer are as follows: SEQ ID NO: 48, located within MPC1 or upstream or downstream of the gene; SEQ ID NO: 49, located within GALNT18 or upstream or downstream of the gene; SEQ ID NO: 50, located within TIMP2 or upstream or downstream of the gene; SEQ ID NO: 51, located within IRF4 or upstream or downstream of the gene; SEQ ID NO: 52, located within CACNA1C or upstream or downstream of the gene; SEQ ID NO: 53, located within HOXD4 or upstream or downstream of the gene; SEQ ID NO: 54, located within TBX20 or upstream or downstream of the gene; SEQ ID NO: 55, located within NXPH1 or upstream or downstream of the gene; SEQ ID NO: 56, located within CYP26B1 or upstream or downstream of the gene; SEQ ID NO: 57, located within PITX1 or upstream or downstream of the gene; SEQ ID NO: 58, located within VAX1 or upstream or downstream of the gene; SEQ ID NO: 59, located within LHX5 or upstream or downstream of the gene; SEQ ID NO: 60, located within ARC or upstream or downstream of the gene; SEQ ID NO: 61, located within LZTS1 or upstream or downstream of the gene;
  • the methylation level of the methylation marker region increases or decreases in the cfDNA of gastric cancer patients (see Table 10).
  • the obtained sequences of the 48 methylation markers are SEQ ID NO: 48-95.
  • the methylation levels of all CpG sites of each methylation marker can be obtained by MethylTitan methylation sequencing.
  • the methylation levels calculated by MHF in each region can be used as gastric cancer markers.
  • the methylation levels within the methylation marker region in the gastric cancer and non-gastric cancer populations in the test set are shown in Table 10.
  • as shown in Table 10, the distribution of methylation levels in the methylation marker regions differs significantly (P < 0.01) between people with gastric cancer and those without gastric cancer, so the regions discriminate well and are good methylation markers for gastric cancer.
  • Sequence preprocessing for each target region, calculate the ratio value of each MHF (Methylated Haplotype Fraction) methylation haplotype in the region.
  • model = LogisticRegression().
  • x is the methylation level value of the sample target marker
  • w are the coefficients of different markers
  • b is the intercept value
  • y is the model prediction score:
  • TestPred = model.predict_proba(TestData)[:,1], where TestData is the data of the target methylation sites in the test set samples and TestPred is the model prediction score. Using this prediction score and the above threshold, determine whether the sample is gastric cancer.
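The modelling steps above can be sketched with scikit-learn, whose `LogisticRegression` and `predict_proba` the text names. The sketch also checks that the prediction score equals the standard logistic function of w·x + b, using the symbols x, w, b and y defined above. The data are synthetic and the array shapes are assumptions made for illustration only.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)

# Synthetic stand-in for the training matrix (assumption): 40 samples, 3 markers.
n = 40
TrainPheno = np.array([0] * (n // 2) + [1] * (n // 2))  # 0 = no cancer, 1 = cancer
TrainData = rng.normal(0.2, 0.05, (n, 3)) + 0.15 * TrainPheno[:, None]
TestData = rng.normal(0.2, 0.05, (5, 3))

model = LogisticRegression()
model.fit(TrainData, TrainPheno)

# Column 1 of predict_proba is the probability of the positive (cancer) class.
TestPred = model.predict_proba(TestData)[:, 1]

# The same score written out from the fitted coefficients w and intercept b:
# y = 1 / (1 + exp(-(w . x + b)))
w = model.coef_[0]
b = model.intercept_[0]
manual = 1.0 / (1.0 + np.exp(-(TestData @ w + b)))
```

The names `TrainData` and `TrainPheno` are hypothetical; only `model`, `TestData` and `TestPred` appear in the text.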
  • the performance of the single target marker logistic regression models in this embodiment is shown in Table 11. As can be seen from Table 11, every target marker reaches an AUC of more than 0.5 in both the test set and the training set, and all of them are good gastric cancer markers.
  • This embodiment uses the methylation levels of all 48 gastric cancer target markers to construct a logistic regression machine learning model that accurately distinguishes samples with gastric cancer from samples without gastric cancer in the data.
  • the specific steps are basically the same as in Example 2, except that the data of the combination of all 48 target markers (SEQ ID NO: 48-95) are input into the model.
  • the distribution of model prediction scores in the training set and test set is shown in Figure 13.
  • the ROC curve is shown in Figure 14.
  • the AUC for distinguishing gastric cancer and non-gastric cancer samples reached 0.922, which can better distinguish between gastric cancer and non-gastric cancer samples.
  • the threshold is set to 0.53
  • a value greater than this value is predicted to be gastric cancer, and a value less than this value is predicted to be no gastric cancer.
  • the specificity in the training set is 95%, and the sensitivity in the test set reaches 73%, which illustrates the good performance of the combined model.
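Each example reports a threshold together with a fixed training-set specificity and a test-set sensitivity. One plausible reading, which is an assumption because the source does not say how the thresholds were chosen, is that the cutoff is picked on the training scores of cancer-free samples to reach the target specificity and then applied unchanged to the test set. A sketch with made-up scores and hypothetical helper names:

```python
def threshold_for_specificity(neg_scores, target=0.95):
    """Smallest observed cutoff at which at least `target` of negatives score <= cutoff."""
    ordered = sorted(neg_scores)
    for cut in ordered:
        specificity = sum(1 for s in ordered if s <= cut) / len(ordered)
        if specificity >= target:
            return cut
    return ordered[-1]


def sensitivity(pos_scores, threshold):
    """Fraction of positive samples whose score is strictly greater than the threshold."""
    return sum(1 for s in pos_scores if s > threshold) / len(pos_scores)


# Illustrative values only (assumption): 20 training negatives scored 0.01 to 0.20.
train_negative_scores = [i / 100 for i in range(1, 21)]
cut = threshold_for_specificity(train_negative_scores, target=0.95)

# Four hypothetical test-set cancer samples.
test_positive_scores = [0.10, 0.25, 0.30, 0.60]
sens = sensitivity(test_positive_scores, cut)
```

Fixing the specificity on the training set and reporting sensitivity on the test set keeps the cutoff independent of the test data.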
  • this example selects SEQ ID NO: 50, 55, 60, 62, 64, 66, 69, 72, 76, 78, 84, 85, 87, 88, 89, 90, 92, 94 and 95, a total of 19 target markers, from all 48 methylation markers and uses their methylation levels to construct a logistic regression machine learning model.
  • the method of constructing the machine learning model is also consistent with Example 9, but the relevant samples only use the data of the above 19 target markers.
  • the model scores of the model in the training set and the test set are shown in Figure 15, and the ROC curve of the model is shown in Figure 16 . It can be seen that there is a significant difference in the scores of gastric cancer and non-gastric cancer samples in the training set and test set of this model.
  • the AUC of the test set of this model reached 0.919, which illustrates the good performance of this combined model.
  • the threshold is set to 0.54
  • a value greater than this value is predicted to be gastric cancer
  • a value less than this value is predicted to be no gastric cancer.
  • the specificity in the training set is 95%, and the sensitivity in the test set reaches 78%, which illustrates the good performance of the combined model.
  • from the methylation levels of all 48 methylation markers, this example selected SEQ ID NO: 49, 53, 54, 55, 59, 62, 66, 72, 75, 79, 80, 83, 84, 87, 89, 90, 91, 93 and 95, a total of 19 target markers, to construct a logistic regression machine learning model.
  • the method of constructing the machine learning model is also consistent with Example 2, but the relevant samples only use the data of the above 19 target markers.
  • the model scores of the model in the training set and the test set are shown in Figure 17, and the ROC curve of the model is shown in Figure 18 . It can be seen that there is a significant difference in the scores of gastric cancer and non-gastric cancer samples in the training set and test set of this model.
  • the AUC of the test set of this model reached 0.913, which illustrates the good performance of this combined model.
  • the threshold is set to 0.49, a value greater than this value is predicted to be gastric cancer, and a value less than this value is predicted to be no gastric cancer.
  • the specificity in the training set is 95%, and the sensitivity in the test set reaches 65%, which illustrates the good performance of the combined model.
  • this example selects SEQ ID NO:50, SEQ ID NO:60, SEQ ID NO:61, SEQ ID NO:67, SEQ ID NO:69, SEQ ID NO:75, SEQ ID NO:77 and SEQ ID NO:84, a total of 8 target markers, from all 48 methylation markers and uses their methylation levels to construct a logistic regression machine learning model.
  • the method of constructing the machine learning model is also consistent with Example 9, but the relevant samples only use the data of the above eight target markers.
  • the model scores of the model in the training set and the test set are shown in Figure 19, and the ROC curve of the model is shown in Figure 20 . It can be seen that there is a significant difference in the scores of gastric cancer and non-gastric cancer samples in the training set and test set of this model.
  • the AUC of the test set of this model reached 0.872, which illustrates the good performance of this combined model.
  • the threshold is set to 0.46, a value greater than this value is predicted to be gastric cancer, and a value less than this value is predicted to be no gastric cancer.
  • the specificity in the training set is 95%, and the sensitivity in the test set reaches 56%, which illustrates the good performance of the combined model.
  • this example selects SEQ ID NO:50, SEQ ID NO:60, SEQ ID NO:74, SEQ ID NO:77 and SEQ ID NO:82, a total of 5 target markers, from all 48 methylation markers and uses their methylation levels to construct a logistic regression machine learning model.
  • the method of constructing the machine learning model is also consistent with Example 9, but the relevant samples only use the data of the above five target markers.
  • the model scores of the model in the training set and the test set are shown in Figure 21, and the model ROC curve is shown in Figure 22. It can be seen that there is a significant difference in the scores of gastric cancer and non-gastric cancer samples in the training set and test set of this model.
  • the AUC of the test set of this model reached 0.856, which illustrates the good performance of this combined model.
  • the threshold is set to 0.52. A value greater than this value is predicted to be gastric cancer, and a value less than this value is predicted to be no gastric cancer.
  • the specificity in the training set is 95%, and the sensitivity in the test set reaches 48%, which illustrates the good performance of the combined model.
  • Example 15 Esophageal cancer sample processing and methylation marker screening
  • Example 2 The sample processing, strategy and data preprocessing process are the same as those in Example 1. After the methylation haplotype data matrix, the characteristic methylation haplotypes are found according to the training set sample grouping:
  • pvlaue = logist_model.pvalues. The methylation marker corresponding to the MHF with the smallest p-value constitutes the candidate methylation haplotype.
  • repeat step 3 ten times to traverse all the data, calculating the AUC of the test data each time, and then calculate the average AUC over the 10 repetitions. If the AUC of the training data increases, the candidate methylation haplotype is retained as a characteristic methylation marker; otherwise it is discarded. The methylation markers obtained are annotated with genes using the GREAT tool (great.stanford.edu/great/public-3.0.0/html/index.php) (see Table 13).
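The retain-or-discard rule described above resembles greedy forward selection: a candidate is kept only if adding it raises the average cross-validated AUC. The sketch below is one way to express that idea under stated assumptions. The candidate order is taken as given (in the text it comes from p-values), the 10 repetitions are modelled as 10-fold cross validation, and `forward_select` and the synthetic data are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score


def forward_select(X, y, candidate_order, n_splits=10, seed=0):
    """Keep a candidate column only if it raises the mean cross-validated AUC."""
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    kept, best_auc = [], 0.5  # 0.5 is the AUC of a chance-level classifier
    for j in candidate_order:
        trial = kept + [j]
        auc = float(
            cross_val_score(
                LogisticRegression(), X[:, trial], y, cv=cv, scoring="roc_auc"
            ).mean()
        )
        if auc > best_auc:  # retained as a characteristic marker
            kept, best_auc = trial, auc
        # otherwise the candidate is discarded
    return kept, best_auc


# Synthetic data (assumption): column 0 is informative, column 1 is noise.
rng = np.random.default_rng(0)
y = np.array([0] * 50 + [1] * 50)
X = np.column_stack([
    y * 2.0 + rng.normal(0.0, 0.5, 100),
    rng.normal(0.0, 1.0, 100),
])
kept, best_auc = forward_select(X, y, candidate_order=[0, 1])
```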
  • the target genes in the methylation markers were gene annotated using the GREAT tool (great.stanford.edu/great/public-3.0.0/html/3.0.0/html/index.php).
  • the marker region is associated with adjacent genes, and the region is annotated with adjacent genes.
  • the association proceeds in two steps: first, the regulatory domain of each gene is determined; then every gene whose regulatory domain covers the region is associated with that region.
  • SKI (+2024) can represent a marker located 2024 bp downstream of the transcription start site (TSS) of the SKI gene
  • EPS8L3 (-28150) can represent a marker located 28150 bp upstream of the transcription start site (TSS) of the EPS8L3 gene.
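The gene annotation notation explained above, a gene symbol followed by a signed distance to the TSS, can be parsed mechanically. The following sketch assumes the exact "GENE (+N)" or "GENE (-N)" layout of the two examples in the text; the function name `parse_great_annotation` and the regular expression are illustrative, not part of the GREAT tool.

```python
import re

# Gene symbol, then a signed integer distance in parentheses, e.g. "SKI (+2024)".
_ANNOTATION = re.compile(r"^\s*([A-Za-z0-9_.-]+)\s*\(([+-])(\d+)\)\s*$")


def parse_great_annotation(text):
    """Split 'GENE (+N)' into (gene, direction relative to the TSS, distance in bp)."""
    match = _ANNOTATION.match(text)
    if match is None:
        raise ValueError(f"unrecognised annotation: {text!r}")
    gene, sign, distance = match.group(1), match.group(2), int(match.group(3))
    # '+' means downstream of the TSS, '-' means upstream, as described in the text.
    direction = "downstream" if sign == "+" else "upstream"
    return gene, direction, distance


ski = parse_great_annotation("SKI (+2024)")
eps8l3 = parse_great_annotation("EPS8L3 (-28150)")
```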
  • the methylation level of the methylation marker region increases or decreases in cfDNA of esophageal cancer patients (see Table 14).
  • the obtained sequences of the 43 methylation markers are SEQ ID NO: 96-138.
  • the methylation levels of all CpG sites of each methylation marker can be obtained by MethylTitan methylation sequencing.
  • the average methylation level of all CpG sites in each region, as well as the methylation level of a single CpG site, can be used as a marker for esophageal cancer.
  • the average methylation levels within the methylation marker region in the esophageal cancer and non-esophageal cancer populations in the test set are shown in Table 14.
  • as shown in Table 14, the distribution of average methylation levels in the methylation marker regions differs significantly (P < 0.01) between people with esophageal cancer and those without esophageal cancer, so the regions discriminate well and are good methylation markers for esophageal cancer.
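The text above uses two quantities per marker region: the methylation level of a single CpG site and the average over all CpG sites in the region. A minimal sketch follows, assuming that a site's level is the fraction of reads covering it that are methylated (the source does not spell out this definition here) and using made-up read counts.

```python
def site_methylation_levels(counts):
    """counts: list of (methylated_reads, total_reads) per CpG site; uncovered sites are skipped."""
    return [methylated / total for methylated, total in counts if total > 0]


def region_average_methylation(counts):
    """Average methylation level over all covered CpG sites in a region."""
    levels = site_methylation_levels(counts)
    return sum(levels) / len(levels) if levels else float("nan")


# Illustrative counts only (assumption): four CpG sites, the last one has no coverage.
cpg_counts = [(8, 10), (5, 10), (2, 10), (0, 0)]
levels = site_methylation_levels(cpg_counts)
average = region_average_methylation(cpg_counts)
```

Skipping uncovered sites avoids a division by zero; whether such sites should instead invalidate the region is a design decision the source does not address.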
  • Sequence preprocessing for each target region, calculate the ratio value of each MHF (Methylated Haplotype Fraction) methylation haplotype in the region.
  • model = LogisticRegression().
  • x is the methylation level value of the sample target marker
  • w are the coefficients of different markers
  • b is the intercept value
  • y is the model prediction score:
  • TestPred = model.predict_proba(TestData)[:,1], where TestData is the data of the target methylation sites in the test set samples and TestPred is the model prediction score. Using this prediction score and the above threshold, determine whether the sample is esophageal cancer.
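The sequence preprocessing step above computes a ratio for each methylation haplotype in a target region. The sketch below assumes that the Methylated Haplotype Fraction of a haplotype is the number of reads carrying that haplotype divided by the number of reads covering the region; this definition, the read encoding and the function name are assumptions, since this section does not give the formula.

```python
from collections import Counter


def methylated_haplotype_fractions(reads):
    """reads: one string per read, one character per CpG site in the region.

    'C' marks a methylated CpG and 'T' an unmethylated one (bisulfite-style
    encoding, assumed here). Returns {haplotype: fraction of reads}.
    """
    counts = Counter(reads)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {haplotype: n / total for haplotype, n in counts.items()}


# Illustrative reads only (assumption): eight reads over a region with three CpG sites.
reads = ["CCC", "CCC", "CCT", "TTT", "TTT", "TTT", "CCC", "TTT"]
mhf = methylated_haplotype_fractions(reads)
```

Each haplotype fraction then becomes one feature column for the logistic regression model.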
  • This example uses the methylation levels of all 43 methylation markers to construct a logistic regression machine learning model, which can accurately distinguish esophageal cancer and non-esophageal cancer samples in the data.
  • the specific steps are basically the same as in Example 16, except that the data input model of all 43 target methylation marker combinations (SEQ ID No: 96-138) is used.
  • the distribution of model prediction scores in the training set and test set is shown in Figure 23.
  • the ROC curve is shown in Figure 24.
  • the AUC for distinguishing between esophageal cancer and non-esophageal cancer samples reached 0.935.
  • the specificity of the training set was 95%
  • the sensitivity of the test set reached 84.3%.
  • the threshold was set to 0.383: a score greater than this value is predicted to be esophageal cancer, otherwise the sample is predicted to have no esophageal cancer. This shows that the methylation markers of this application distinguish well between samples with esophageal cancer and samples without esophageal cancer.
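The AUC values quoted in these examples summarise how well the scores rank cancer samples above non-cancer samples. As a reminder of what the number means, the sketch below computes AUC directly as the fraction of (cancer, non-cancer) pairs in which the cancer sample scores higher, counting ties as half. The scores are made-up illustration values.

```python
def roc_auc(pos_scores, neg_scores):
    """AUC as the probability that a random positive outranks a random negative."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos_scores) * len(neg_scores))


# Illustrative scores only (assumption).
cancer_scores = [0.9, 0.7, 0.4]
non_cancer_scores = [0.5, 0.3, 0.1]
auc = roc_auc(cancer_scores, non_cancer_scores)
```

This pairwise form is quadratic in the number of samples, which is fine for illustration; library routines use a rank-based equivalent.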
  • this example randomly selected SEQ ID Nos: 100, 103, 109, 110, 113, 120, 121, 125, 128, 130, 132, 133, 134, 135, 137 and 138, a total of 16 methylation markers, to construct a logistic regression machine learning model.
  • the method of constructing the machine learning model is also consistent with Example 16, but the relevant samples only use the data of the above 16 markers in this example.
  • the model scores of the model in the training set and test set are shown in Figure 25.
  • the ROC curve of this model is shown in Figure 26. In both the training set and the test set of this model, the scores of esophageal cancer samples differ significantly from those of non-esophageal cancer samples.
  • the AUC of this model in the test set reached 0.920. With the threshold set to 0.431, a score greater than this value is predicted to be esophageal cancer, and a score less than this value is predicted to be no esophageal cancer.
  • the specificity in the training set is 95%, and the sensitivity in the test set reaches 75.8%, which illustrates that a model built from a combination of multiple markers selected from the methylation markers of this application performs well.
  • this example randomly selected SEQ ID Nos: 102, 107, 108, 110, 112, 120, 121, 123, 124, 125, 130, 131, 132, 133, 134, 135 and 137, a total of 17 methylation markers, to construct a logistic regression machine learning model.
  • the method of constructing the machine learning model is also consistent with Example 16, but the relevant samples only use the data of the above 17 markers in this example.
  • the model scores of the model in the training set and test set are shown in Figure 27.
  • the ROC curve of the model is shown in Figure 28. In both the training set and the test set of this model, the scores of esophageal cancer samples differ significantly from those of non-esophageal cancer samples.
  • the AUC of this model in the test set reached 0.916. With the threshold set to 0.431, a score greater than this value is predicted to be esophageal cancer, and a score less than this value is predicted to be no esophageal cancer.
  • the specificity in the training set is 95%, and the sensitivity in the test set reaches 59.4%, which illustrates that a model built from a combination of multiple markers selected from the methylation markers of this application performs well.
  • This application obtained 43 significantly different methylated nucleic acid fragments based on the methylation levels of related genes in plasma cfDNA. Based on a single methylated nucleic acid fragment marker, or a marker panel composed of multiple methylated nucleic acid fragments, an esophageal cancer risk prediction model can be built that identifies esophageal cancer effectively, with high sensitivity and specificity, and is suitable for esophageal cancer screening and diagnosis.
  • This example uses the methylation levels of 27 methylation markers to construct a logistic regression machine learning model, which can accurately distinguish esophageal cancer and non-esophageal cancer samples in the data.
  • the specific steps are basically the same as in Example 16, except that the data of the combination of 27 target methylation markers (SEQ ID No: 98, 100, 102, 103, 107, 108, 109, 111, 112, 113, 114, 116, 117, 121, 123, 124, 125, 127, 128, 130, 131, 133, 134, 135, 136, 137 and 138) are input into the model.
  • the distribution of model prediction scores in the training set and test set is shown in Figure 29.
  • the ROC curve is shown in Figure 30.
  • the AUC for distinguishing esophageal cancer and non-esophageal cancer samples reached 0.930.
  • the specificity of the training set was 95%
  • the sensitivity of the test set reached 57.6%.
  • the threshold was set to 0.425: a score greater than this value is predicted to be esophageal cancer, otherwise the sample is predicted to have no esophageal cancer. This shows that the methylation markers of this application distinguish well between samples with esophageal cancer and samples without esophageal cancer.
  • this example selected SEQ ID Nos: 102, 109, 116, 117, 127, 134 and 135, a total of 7 methylation markers, from the 27 methylation markers in Example 20 and used their methylation levels to construct a logistic regression machine learning model.
  • the method of constructing the machine learning model is also consistent with Example 16, but the relevant samples only use the data of the above 7 markers in this example.
  • the model scores of the model in the training set and test set are shown in Figure 31.
  • the ROC curve of this model is shown in Figure 32. In both the training set and the test set of this model, the scores of esophageal cancer samples differ significantly from those of non-esophageal cancer samples.
  • the AUC of this model in the test set reached 0.900. With the threshold set to 0.50, a score greater than this value is predicted to be esophageal cancer, and a score less than this value is predicted to be no esophageal cancer.
  • the specificity in the training set is 95%, and the sensitivity in the test set reaches 57.6%, which illustrates that a model built from a combination of multiple markers selected from the methylation markers of this application performs well.
  • this example selected SEQ ID Nos: 121, 125, 130, 133, 134, 135 and 136, a total of 7 methylation markers, from the 27 methylation markers in Example 20 and used their methylation levels to construct a logistic regression machine learning model.
  • the method of constructing the machine learning model is also consistent with Example 16, but the relevant samples only use the data of the above 7 markers in this example.
  • the model scores of the model in the training set and the test set are shown in Figure 33, and the ROC curve of the model is shown in Figure 34. It can be seen that in the training set and test set of this model, the scores of esophageal cancer and non-esophageal cancer samples are significantly different from the scores of other cancer types.
  • the AUC of this model in the test set reached 0.890. With the threshold set to 0.594, a score greater than this value is predicted to be esophageal cancer, and a score less than this value is predicted to be no esophageal cancer.
  • the specificity in the training set is 95%, and the sensitivity in the test set reaches 65.6%, which illustrates that a model built from a combination of multiple markers selected from the methylation markers of this application performs well.
  • This example uses the methylation levels of 23 methylation markers to construct a logistic regression machine learning model, which can accurately distinguish esophageal cancer and non-esophageal cancer samples in the data.
  • the specific steps are basically the same as in Example 16, except that the data of the combination of all 23 target methylation markers (SEQ ID No: 96, 97, 99, 101, 104, 105, 106, 110, 115, 118, 119, 120, 121, 122, 125, 126, 129, 130, 132, 133, 134, 135 and 137) are input into the model.
  • the distribution of model prediction scores in the training set and test set is shown in Figure 35.
  • the ROC curve is shown in Figure 36.
  • the AUC for distinguishing between esophageal cancer and non-esophageal cancer samples reached 0.934.
  • the training set specificity was 95%
  • the test set sensitivity reached 64%. The threshold was set to 0.41: a score greater than this value is predicted to be esophageal cancer, otherwise the sample is predicted to have no esophageal cancer. This shows that the methylation markers of this application distinguish well between samples with esophageal cancer and samples without esophageal cancer.
  • this example selects SEQ ID Nos: 96, 97, 99, 104, 105, 106, 110, 118, 120, 122, 125, 126, 129, 130, 132, 133 and 135, a total of 17 methylation markers, from the methylation markers in Example 23 and uses their methylation levels to construct a logistic regression machine learning model.
  • the method of constructing the machine learning model is also consistent with Example 16, but the relevant samples only use the data of the above 17 markers in this example.
  • the model scores of the model in the training set and test set are shown in Figure 37.
  • the ROC curve of this model is shown in Figure 38. In both the training set and the test set of this model, the scores of esophageal cancer samples differ significantly from those of non-esophageal cancer samples.
  • the AUC of this model in the test set reached 0.900. With the threshold set to 0.508, a score greater than this value is predicted to be esophageal cancer, and a score less than this value is predicted to be no esophageal cancer.
  • the specificity in the training set is 95%, and the sensitivity in the test set reaches 56.3%, which illustrates that a model built from a combination of multiple markers selected from the methylation markers of this application performs well.
  • this example selects SEQ ID Nos: 96, 97, 99, 105, 110, 118, 119, 120, 121, 122, 129, 130, 134, 135 and 137, a total of 15 methylation markers, to construct a logistic regression machine learning model.
  • the method of constructing the machine learning model is also consistent with Example 16, but the relevant samples only use the data of the above 15 markers in this example.
  • the model scores of the model in the training set and test set are shown in Figure 39.
  • the ROC curve of this model is shown in Figure 40. In both the training set and the test set of this model, the scores of esophageal cancer samples differ significantly from those of non-esophageal cancer samples.
  • the AUC of this model in the test set reached 0.906. With the threshold set to 0.511, a score greater than this value is predicted to be esophageal cancer, and a score less than this value is predicted to be no esophageal cancer.
  • the specificity in the training set is 95%, and the sensitivity in the test set reaches 59.4%, which illustrates that a model built from a combination of multiple markers selected from the methylation markers of this application performs well.
  • Example 26 Liver cancer sample processing and methylation marker screening
  • liver cancer blood samples A total of 276 liver cancer blood samples and 393 liver cancer-free blood samples were collected. All enrolled patients signed informed consent forms. The sample information is shown in Table 16.
  • Example 2 The sample processing, strategy and data preprocessing process are the same as those in Example 1. After the methylation haplotype data matrix, the characteristic methylation haplotypes are found according to the training set sample grouping:
  • pvlaue = logist_model.pvalues. The methylation marker corresponding to the MHF with the smallest p-value constitutes the candidate methylation haplotype.
  • repeat step 3 ten times to traverse all the data, calculating the AUC of the test data each time, and then calculate the average AUC over the 10 repetitions. If the AUC of the training data increases, the candidate methylation haplotype is retained as a characteristic methylation marker; otherwise it is discarded. The methylation markers obtained are annotated with genes using the GREAT tool (great.stanford.edu/great/public-3.0.0/html/index.php) (see Table 17).
  • the target genes in the methylation markers were gene annotated using the GREAT tool (great.stanford.edu/great/public-3.0.0/html/3.0.0/html/index.php).
  • the marker region is associated with adjacent genes, and the region is annotated with adjacent genes.
  • the association proceeds in two steps: first, the regulatory domain of each gene is determined; then every gene whose regulatory domain covers the region is associated with that region.
  • SKI (+2024) can represent a marker located 2024 bp downstream of the transcription start site (TSS) of the SKI gene
  • EPS8L3 (-28150) can represent a marker located 28150 bp upstream of the transcription start site (TSS) of the EPS8L3 gene.
  • the methylation level of the methylation marker region increases or decreases in the cfDNA of liver cancer patients (see Table 18).
  • the obtained sequences of 202 methylation markers are as SEQ ID NO: 139-340.
  • the methylation levels of all CpG sites of each methylation marker can be obtained by MethylTitan methylation sequencing.
  • the average methylation level of all CpG sites in each region, as well as the methylation level of a single CpG site, can be used as liver cancer markers.
  • the average methylation levels within the methylation marker regions in the test set for people with liver cancer and those without liver cancer are shown in Table 18.
  • as shown in Table 18, the distribution of average methylation levels in the methylation marker regions differs significantly (P < 0.01) between people with liver cancer and those without liver cancer, so the regions discriminate well and are good methylation markers for liver cancer.
  • Sequence preprocessing for each target region, calculate the ratio value of each MHF (Methylated Haplotype Fraction) methylation haplotype in the region.
  • model = LogisticRegression().
  • x is the methylation level value of the sample target marker
  • w are the coefficients of different markers
  • b is the intercept value
  • y is the model prediction score:
  • TestPred = model.predict_proba(TestData)[:,1], where TestData is the data of the target methylation sites in the test set samples and TestPred is the model prediction score. Using this prediction score and the above threshold, determine whether the sample is liver cancer.
  • This example uses the methylation levels of all 202 liver cancer methylation markers to construct a logistic regression machine learning model, which can accurately distinguish liver cancer and non-liver cancer samples in the data.
  • the specific steps are basically the same as Example 27, except that the data input model of all 202 target methylation marker combinations (SEQ ID No: 139-340) is used.
  • the distribution of model prediction scores in the training set and test set is shown in Figure 41.
  • the ROC curve is shown in Figure 42.
  • the AUC for distinguishing liver cancer and non-liver cancer samples reached 0.986.
  • the specificity of the training set was 99%, the sensitivity of the test set reached 91%.
  • the threshold was set to 0.58: a score greater than this value is predicted to be liver cancer, otherwise the sample is predicted to have no liver cancer. The model therefore distinguishes well between liver cancer and non-liver cancer samples.
  • this example randomly selected SEQ ID Nos: 176, 183, 187, 195, 196, 209, 210, 214, 220, 225, 227, 228, 241, 245, 246, 269, 270, 286, 293, 299, 301, 302, 326, 329 and 337, a total of 25 methylation markers, to construct a logistic regression machine learning model.
  • the method of constructing the machine learning model is also consistent with Example 27, but the relevant samples only use the data of the above 25 markers in this example.
  • the model scores of the model in the training set and test set are shown in Figure 43.
  • the ROC curve of this model is shown in Figure 44. In both the training set and the test set of this model, the scores of liver cancer samples differ significantly from those of non-liver cancer samples.
  • the AUC of this model in the test set reached 0.938. With the threshold set to 0.673, a score greater than this value is predicted to be liver cancer, and a score less than this value is predicted to be no liver cancer.
  • the specificity in the training set is 99%, and the sensitivity in the test set reaches 76%, which shows that the combined model performs well.
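Building a model on a subset of markers, as in the example above, amounts to selecting the matching columns of the full marker matrix before fitting. The sketch below uses the 25 SEQ ID numbers and the 0.673 threshold quoted in the text; everything else (the random matrix, the labels, the variable names) is synthetic and assumed for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# All liver cancer markers, SEQ ID NO: 139-340 (202 markers), one column each.
ALL_IDS = list(range(139, 341))
# The 25 markers listed in the example above.
SUBSET_IDS = [176, 183, 187, 195, 196, 209, 210, 214, 220, 225, 227, 228, 241,
              245, 246, 269, 270, 286, 293, 299, 301, 302, 326, 329, 337]

column_of = {seq_id: i for i, seq_id in enumerate(ALL_IDS)}
subset_columns = [column_of[seq_id] for seq_id in SUBSET_IDS]

# Synthetic methylation matrix and labels (assumption): 60 samples.
rng = np.random.default_rng(2)
labels = np.array([0] * 30 + [1] * 30)
levels = rng.random((60, len(ALL_IDS)))

# Fit on the 25 selected columns only and score the same samples.
model = LogisticRegression().fit(levels[:, subset_columns], labels)
scores = model.predict_proba(levels[:, subset_columns])[:, 1]
predicted_liver_cancer = scores > 0.673
```

Because the matrix here is random noise, the resulting scores carry no diagnostic meaning; the sketch only shows the column selection and fitting mechanics.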
  • this example selected SEQ ID No: 139, 140, 143, 144, 164, 165, 175, 176, 178, 183, 184, 190, 192, 194, 195, 199, 203, 204, 206, 208, 210, 213, 215, 216, 218, 220, 224, 234, 235, 237, 253, 265, 266, 267, 269, 270, 271, 272, 281, 286, 301, 306, 314, 315, 317, 320, 321, 322, 323, 333, 336 and 338, a total of 52 methylation markers, from all 202 methylation markers and used their methylation levels to construct a logistic regression machine learning model.
  • the method of constructing the machine learning model is also consistent with Example 27, but the relevant samples only use the data of the above 52 markers in this example.
  • the model scores of the model in the training set and test set are shown in Figure 45.
  • the ROC curve of this model is shown in Figure 46. In both the training set and the test set of this model, the scores of liver cancer samples differ significantly from those of non-liver cancer samples.
  • the AUC of this model in the test set reached 0.959. With the threshold set to 0.58, a score greater than this value is predicted to be liver cancer, and a score less than this value is predicted to be no liver cancer. The specificity in the training set is 99%, and the sensitivity in the test set reaches 71%, which illustrates the good performance of the combined model.
  • This application obtained 202 methylated nucleic acid fragments with significant differences based on the methylation levels of relevant genes in plasma cfDNA.
  • a liver cancer risk prediction model built on these markers can identify liver cancer effectively, with high sensitivity and specificity, and is suitable for liver cancer screening and diagnosis.

Abstract

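A method for identifying cancer, and use of methylation markers and detection reagents therefor in the preparation of a kit for diagnosing cancer in a subject.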

Description

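Methylation Markers for Identifying Cancer and Uses Thereof
Technical Field
The present application relates to the field of biomedicine, and specifically to a method for the early diagnosis and screening of cancer.
Background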
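According to the latest global cancer burden data for 2020 released by the International Agency for Research on Cancer (IARC) of the World Health Organization, there were 19.29 million new cancer cases worldwide in 2020, of which 4.57 million were in China. Malignant tumors have become the leading threat to people's lives and health, and "early detection, early diagnosis and early treatment" is currently recognized as the most effective means of fighting cancer.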
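Colorectal cancer ranks third worldwide in both incidence and cancer mortality, and in recent years its incidence has risen steadily with changes in diet and other aspects of lifestyle, seriously endangering human health. The cure rate of colorectal cancer is closely related to the stage of the cancer: the five-year survival rate of patients with stage I and stage II colorectal cancer reaches 80%, whereas it falls to 50% for stage III patients and is only 8% for stage IV patients. Unfortunately, most patients have no obvious symptoms in the early stage of colorectal cancer and are already at an intermediate or advanced stage when they seek medical attention, missing the best window for treatment. Regular screening of at-risk populations is therefore of great importance for improving treatment outcomes and saving patients' lives. In clinical practice, colonoscopy is the gold standard for diagnosing colorectal cancer, but it is difficult to perform, requires lengthy preparation, is painful for the patient and is expensive, and is therefore unsuitable for large-scale screening. Several non-invasive tests are currently available for colorectal cancer screening, such as the stool-based fecal immunochemical test (FIT). These methods are convenient and fast, but their detection performance is poor and easily affected by diet and other factors, and both their sensitivity and their specificity are relatively low.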
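Gastric cancer is the second most common type of cancer worldwide, and almost two-thirds of cases occur in developing countries. According to available data, gastric cancer ranks fourth in incidence among cancers in men and seventh among cancers in women. Gastric cancer has become a serious threat to human health. Finding convenient and effective methods for its early diagnosis is crucial for reducing the mortality it causes and improving survival. Tumor markers are an important means of examination that can, simply and economically, provide effective evidence for clinical diagnosis and treatment while reducing screening costs for patients. Blood is the preferred source of candidate tumor markers for gastric cancer screening: blood-based biomarkers provide an overview of the patient's whole body, including the primary tumor, metastatic disease, the immune response and the peritumoral stroma. Common blood markers for gastric cancer include CEA, CA19-9 and CA72-4. All of these markers have low sensitivity, with detection rates of only about 50%. Poor specificity is another major shortcoming. For example, serum CA19-9 levels are elevated in a variety of adenocarcinomas (including pancreatic cancer, hepatobiliary duct cancer and gastric cancer), and CEA is elevated in a variety of cancers and even in non-cancerous diseases. Because of their low sensitivity and poor specificity, the use of these blood markers is rather limited in clinical practice, especially in the early screening of gastric cancer.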
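Esophageal cancer is one of the most common malignant tumors worldwide; its high incidence and high mortality make it a serious threat to human health. Early esophageal cancer has no obvious symptoms and there is no specific diagnostic method, so most patients are already at an intermediate or advanced stage when diagnosed. Tumor markers are also an important means of examination for esophageal cancer. Previous studies have mainly examined differences in a single serum marker, such as miR-138, between esophageal cancer patients and normal controls, but its sensitivity and specificity have not met expectations. Some studies have tested combinations of serum markers, for example the joint detection of several small RNAs; even so, the improvement in sensitivity and specificity has been limited. In addition, although a large amount of data on the efficacy of several serum biomarkers has accumulated over recent decades, guidelines and standard early-detection protocols for applying them to esophageal cancer patients are lacking. Circulating tumor DNA (ctDNA) molecules originate from apoptotic or necrotic tumor cells and carry tumor-specific DNA methylation marks from early malignant tumors; in recent years they have been studied as promising new targets for developing non-invasive early screening tools for a variety of cancers. However, most of these studies have not yielded effective results.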
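In China, liver cancer is a cancer that seriously threatens health. Its onset is insidious; once clinical symptoms appear, the disease is often already at an intermediate or advanced stage, the opportunity for curative treatment has been lost, and the prognosis is extremely poor. The earlier a liver cancer patient is diagnosed, therefore, the better the treatment outcome and the higher the survival rate. Common detection methods at present include the following. Alpha-fetoprotein (α-FP) measurement: an immunological assay of an embryonic antigen; it is currently one of the most specific methods for diagnosing hepatocellular carcinoma and is relatively specific for it. In the absence of other evidence of liver cancer, hepatocellular carcinoma can be diagnosed when α-FP is positive by counter-immunoelectrophoresis or remains quantitatively above 500 ng/ml for more than one month, provided that pregnancy, active liver disease, gonadal embryonal tumors and the like can be excluded. Blood enzyme tests: serum γ-glutamyl transpeptidase, alkaline phosphatase and isoenzymes of lactate dehydrogenase may be higher than normal in liver cancer patients, but because they lack specificity they are mostly used as auxiliary diagnostic aids.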
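Therefore, finding convenient and effective methods for diagnosing cancer or monitoring prognosis and recurrence is crucial for reducing cancer mortality and improving survival.
Summary of the Invention
The present application provides methylation markers for the early, non-invasive identification of cancer (for example, colorectal cancer, gastric cancer, esophageal cancer and/or liver cancer) and uses thereof. Based on the methylation levels in plasma of the biomarker panels of the present application, patients with cancer (for example, colorectal cancer, gastric cancer, esophageal cancer and/or liver cancer) can be identified conveniently, accurately and efficiently, which provides a new method for the early diagnosis of cancer (for example, colorectal cancer, gastric cancer, esophageal cancer and/or liver cancer). The detection process of the present application is non-invasive and highly safe, and lends itself to large-scale clinical use. The present application needs to measure the methylation level of only a few genes, or even a single gene, to distinguish benign from malignant, which markedly reduces the target detection region, broadens the scope of application of the technique and allows more samples to be covered. The methylation markers, detection methods and/or kits of the present application are characterized by high sensitivity and specificity in applications such as the early diagnosis of cancer and recurrence monitoring.
The present application measures the methylation levels of DNA methylation markers in patient samples and uses the measured methylation data to compute a prediction score from a diagnostic model in order to distinguish cancer patients from non-cancer patients, achieving early cancer diagnosis with higher accuracy and lower cost during early screening.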
一方面,本申请提供了一种结直肠癌甲基化标志物,其是分离的来自哺乳动物的核酸分子,所述核酸分子的序列包括:(1)SEQ ID NO:1-47中任一种或多种(例如至少6个、至少7个、至少8个或至少9个)或全部所示的序列或其互补序列或变体,所述变体是与相应序列具有至少70%序列同一性的变体,并且所述变体中的甲基化位点未发生突变,或(2)(1)的经处理的序列,所述处理使未甲基化的胞嘧啶转化为与鸟嘌呤结合能力低于胞嘧啶的碱基。
在一个或多个实施方案中,所述(1)选自以下任一组:
(1.1)以下序列中任一种或多种或全部:SEQ ID NO:4或其互补序列或变体、SEQ ID NO:11或其互补序列或变体、SEQ ID NO:15或其互补序列或变体、SEQ ID NO:18或其互补序列或变体、SEQ ID NO:19或其互补序列或变体、SEQ ID NO:30或其互补序列或变体、SEQ ID NO:34或其互补序列或变体、SEQ ID NO:37或其互补序列或变体、SEQ ID NO:41或其互补序列或变体,任选还包括SEQ ID NO:1-47中其余序列的任一种或多种或其互补序列或变体,
(1.2)以下序列中任一种或多种或全部:SEQ ID NO:1或其互补序列或变体、SEQ ID NO:21或其互补序列或变体、SEQ ID NO:29或其互补序列或变体、SEQ ID NO:36或其互补序列或变体、SEQ ID NO:44或其互补序列或变体、SEQ ID NO:47或其互补序列或变体,任选还包括SEQ ID NO:1-47中其余序列的任一种或多种或其互补序列或变体,
(1.3)以下序列中任一种或多种或全部:SEQ ID NO:6或其互补序列或变体、SEQ ID NO:10或其互补序列或变体、SEQ ID NO:13或其互补序列或变体、SEQ ID NO:14或其互补序列或变体、SEQ ID NO:22或其互补序列或变体、SEQ ID NO:28或其互补序列或变体、SEQ ID NO:43或其互补序列或变体,任选还包括SEQ ID NO:1-47中其余序列的任一种或多种或其互补序列或变体。
在一个或多个实施方案中,所述甲基化位点是连续的CpG。
在一个或多个实施方案中,所述甲基化标志物可以是所述序列区域中任意一个或者多个CpG位点。
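In one or more embodiments, the nucleic acid molecule is used as an internal standard or control for detecting the DNA methylation level of the corresponding sequence in a sample.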
另一方面,本申请提供了检测DNA甲基化的试剂,用于筛查结直肠癌风险、诊断结直肠癌、评估结直肠癌预后,所述试剂包含检测对象的样品中标志物的甲基化水平的试剂,所述标志物是DNA序列以及该DNA序列的上游5kb和下游5kb、或其片段、或其中一个或多个CpG二核苷酸,所述DNA序列包括以下基因序列中的一种或多种或全部:(p)TTLL10、ST6GALNAC5、KCNA3、CACNA1E、TRAPPC12、UBE2F、ZIC4、ZNF595、EVC2、HMX1、PITX2、POU4F2、IRX4、IRX1、CRHBP、KCNMB1、KCNQ5、TBX20、ACTR3C、ACTR3B、VIPR2、SOX17、MOS、PREX2、GDF6、OSR2、BARX1、SORCS3、VAX1、DPYSL4、UTF1、B3GAT1、HOXC13、CUX2、GLT1D1、ITGBL1、SKOR1、TM6SF1、LRRK1、FOXL1、MYO15B、DNM2、ZNF536、YTHDF1、SIM2。
在一个或多个实施方案中,所述DNA序列包括选自CACNA1E、PITX2、CRHBP、TBX20、SORCS3、B3GAT1、GLT1D1和LRRK1的一种或多种或全部,任选还包括(p)中的其他基因序列中的一种或多种或全部。
在一个或多个实施方案中,所述DNA序列包括选自TTLL10、ACTR3B、BARX1、CUX2、DNM2和SIM2的一种或多种或全部,任选还包括(p)中的其他基因序列中的一种或多种或全部。
在一个或多个实施方案中,所述DNA序列包括选自UBE2F、HMX1、IRX4、IRX1、VIPR2、OSR2和MYO15B的一种或多种或全部,任选还包括(p)中的其他基因序列中的一种或多种或全部。
在一个或多个实施方案中,所述标志物包含至少3个CpG二核苷酸。
在一个或多个实施方案中,所述DNA序列包括DNA正义链或反义链。
在一个或多个实施方案中,所述的片段长度为1-1000bp,优选1-700bp。在一个或多个实施方案中,所述片段是基因序列的启动子区域或其部分。在一个或多个实施方案中,所述片段包含至少1个,优选至少3个CpG二核苷酸。优选地,所述标志物具有本申请所述的核酸分子的序列。
在一个或多个实施方案中,所述试剂是与所述标志物或其经转化的序列杂交的引物分子。所述引物分子能扩增出所述标志物或其经转化的变体。在一个或多个实施方案中,所述引物序列为甲基化特异的或非特异的。所述引物分子至少9bp。
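In one or more embodiments, the reagent is a probe molecule that hybridizes to the marker or its converted sequence. In one or more embodiments, the probe further contains a detectable substance. In one or more embodiments, the detectable substance is a fluorescent reporter group at the 5' end and a quencher group labeled at the 3' end. In one or more embodiments, the fluorescent reporter group is selected from Cy5, FAM and VIC. The probe molecule is at least 12 bp long.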
在一个或多个实施方案中,所述样品来自哺乳动物,优选人。
另一方面,本申请提供了记载有DNA序列或其片段和/或其甲基化信息的介质,所述DNA序列包括:
(i)以下基因序列中的一种或多种或全部:(p)TTLL10、ST6GALNAC5、KCNA3、CACNA1E、TRAPPC12、UBE2F、ZIC4、ZNF595、EVC2、HMX1、PITX2、POU4F2、IRX4、IRX1、CRHBP、KCNMB1、KCNQ5、TBX20、ACTR3C、ACTR3B、VIPR2、SOX17、MOS、PREX2、GDF6、OSR2、BARX1、SORCS3、VAX1、DPYSL4、UTF1、B3GAT1、HOXC13、CUX2、GLT1D1、ITGBL1、SKOR1、TM6SF1、LRRK1、FOXL1、MYO15B、DNM2、ZNF536、YTHDF1和SIM2,
或(ii)(i)的经处理的序列,所述处理使未甲基化的胞嘧啶转化为与鸟嘌呤结合能力低于胞嘧啶的碱基。
在一个或多个实施方案中,所述DNA序列包括选自CACNA1E、PITX2、CRHBP、TBX20、SORCS3、B3GAT1、GLT1D1和LRRK1的一种或多种或全部,任选还包括(p)中的其他基因序列中的一种或多种或全部。
在一个或多个实施方案中,所述DNA序列包括选自TTLL10、ACTR3B、BARX1、CUX2、DNM2和SIM2的一种或多种或全部,任选还包括(p)中的其他基因序列中的一种或多种或全部。
在一个或多个实施方案中,所述DNA序列包括选自UBE2F、HMX1、IRX4、IRX1、VIPR2、OSR2和MYO15B的一种或多种或全部,任选还包括(p)中的其他基因序列中的一种或多种或全部。
在一个或多个实施方案中,所述介质用于与基因甲基化测序数据比对以确定含所述序列或其片段的核酸分子的存在、含量和/或甲基化水平。
在一个或多个实施方案中,所述标志物包含至少3个CpG二核苷酸。
在一个或多个实施方案中,所述DNA序列包括DNA正义链或反义链。
在一个或多个实施方案中,所述的片段长度为1-1000bp,优选1-700bp。在一个或多个实施方案中,所述片段是基因序列的启动子区域或其部分。在一个或多个实施方案中,所述片段包含至少1个,优选至少3个CpG二核苷酸。优选地,所述标志物具有本申请所述的核酸分子SEQ ID NO:1-47中任一项所示的序列。
在一个或多个实施方案中,所述介质是印有所述DNA序列或其片段和/或其甲基化信息的载体,包括卡片,例如纸质、塑料、金属、玻璃卡片。
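In one or more embodiments, the medium is a computer-readable medium storing the sequence and/or its methylation information together with a computer program which, when executed by a processor, implements the following step: comparing the methylation sequencing data of a sample with the sequence or information, thereby obtaining the presence, amount and/or methylation level of nucleic acid molecules containing the sequence in the sample. The presence, amount and/or methylation level of the nucleic acid molecules containing the sequence is used for screening for colorectal cancer risk, diagnosing colorectal cancer and assessing colorectal cancer prognosis.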
另一方面,本申请还提供了以下(a)和任选的(b)在制备用于筛查结直肠癌风险、诊断结直肠癌、评估结直肠癌预后的试剂盒中的用途,
(a)用于确定对象的样品中标志物的甲基化水平的试剂或装置,所述标志物是DNA序列以及该DNA序列的上游5kb和下游5kb、或其片段、或其中一个或多个CpG二核苷酸,
(b)所述标志物或其经处理的核酸分子,所述处理使未甲基化的胞嘧啶转化为与鸟嘌呤结合能力低于胞嘧啶的碱基,
其中,所述DNA序列包括以下基因序列中的一种或多种或全部:(p)TTLL10、ST6GALNAC5、KCNA3、CACNA1E、TRAPPC12、UBE2F、ZIC4、ZNF595、EVC2、HMX1、PITX2、POU4F2、IRX4、IRX1、CRHBP、KCNMB1、KCNQ5、TBX20、ACTR3C、ACTR3B、VIPR2、SOX17、MOS、PREX2、GDF6、OSR2、BARX1、SORCS3、VAX1、DPYSL4、UTF1、B3GAT1、HOXC13、CUX2、GLT1D1、ITGBL1、SKOR1、TM6SF1、LRRK1、FOXL1、MYO15B、DNM2、ZNF536、YTHDF1和SIM2。
在一个或多个实施方案中,所述DNA序列包括选自CACNA1E、PITX2、CRHBP、TBX20、SORCS3、B3GAT1、GLT1D1和LRRK1的一种或多种或全部,任选还包括(p)中的其他基因序列中的一种或多种或全部。
在一个或多个实施方案中,所述DNA序列包括选自TTLL10、ACTR3B、BARX1、CUX2、DNM2和SIM2的一种或多种或全部,任选还包括(p)中的其他基因序列中的一种或多种或全部。
在一个或多个实施方案中,所述DNA序列包括选自UBE2F、HMX1、IRX4、IRX1、VIPR2、OSR2和MYO15B的一种或多种或全部,任选还包括(p)中的其他基因序列中的一种或多种或全部。
在一个或多个实施方案中,所述标志物包含至少3个CpG二核苷酸。
在一个或多个实施方案中,所述DNA序列包括DNA正义链或反义链。
在一个或多个实施方案中,所述片段长度为1-1000bp,优选1-700bp。在一个或多个实施方案中,所述片段是基因序列的启动子区域或其部分。在一个或多个实施方案中,所述片段包含至少1个,优选至少3个CpG二核苷酸。优选地,所述标志物具有本申请所述的核酸分子SEQ ID NO:1-47中任一项所示的序列。
在一个或多个实施方案中,(b)所述核酸分子是包含SEQ ID NO:1-47中任一项所示序列的核酸分子。
在一个或多个实施方案中,所述试剂包含引物分子和/或探针分子。
在一个或多个实施方案中,所述试剂包含与所述标志物或其经转化的序列杂交的引物分子。所述引物分子能扩增出所述DNA序列或其片段或它们的经转化的变体。在一个或多个实施方案中,所述引物序列为甲基化特异的或非特异的。所述引物分子至少9bp。
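In one or more embodiments, the reagent is a probe molecule that hybridizes to the marker or its converted sequence. In one or more embodiments, the probe further contains a detectable substance. In one or more embodiments, the detectable substance is a fluorescent reporter group at the 5' end and a quencher group labeled at the 3' end. In one or more embodiments, the fluorescent reporter group is selected from Cy5, FAM and VIC. The probe molecule is at least 12 bp long.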
在一个或多个实施方案中,所述试剂包含本文任一实施方案所述的介质。
在一个或多个实施方案中,所述试剂盒是非侵入性诊断试剂盒。
在一个或多个实施方案中,所述对象是哺乳动物,优选人。
在一个或多个实施方案中,所述样品来自哺乳动物的组织、细胞或体液,例如肠组织样本、血液、血清或血浆。所述哺乳动物优选为人。在一个或多个实施方案中,所述样品包括基因组DNA。优选地,所述样品是血液。
在一个或多个实施方案中,所述DNA序列是:相应标志物在基因组中的序列、或其经转化的序列、或其经甲基化敏感型限制性内切酶处理的序列,所述转化使未甲基化的胞嘧啶转化为与鸟嘌呤结合能力低于胞嘧啶的碱基。所述转化使用酶促方法进行,优选脱氨酶处理,或所述转化使用非酶促方法进行,优选用亚硫酸氢盐、酸式亚硫酸盐或焦亚硫酸盐或其组合处理。
In one or more embodiments, the kit further includes PCR reaction reagents. Preferably, the PCR reaction reagents include DNA polymerase, PCR buffer, dNTPs and Mg2+.
在一个或多个实施方案中,所述试剂盒还包括检测DNA甲基化的其他试剂,所述其他试剂是选自以下方法的一个或多个中所用的试剂:基于重亚硫酸盐转化的PCR(例如甲基化特异性PCR)、DNA测序(如亚硫酸氢盐测序、全基因组甲基化测序、简化甲基化测序)、甲基化敏感的限制性内切酶分析法、荧光定量法、甲基化敏感性高分辨率熔解曲线法、基于芯片的甲基化图谱分析和质谱(例如飞行质谱)。优选地,所述其他试剂选自以下一种或多种:重亚硫酸盐、亚硫酸氢盐、酸式亚硫酸盐或焦亚硫酸盐或其衍生物,甲基化敏感或不敏感的限制性内切酶,酶切缓冲液,荧光染料,荧光淬灭剂,荧光报告剂,外切核酸酶,碱性磷酸酶,内标和对照物。
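In one or more embodiments, the PCR reaction mixture contains Taq DNA polymerase, PCR buffer, dNTPs, KCl, MgCl2 and (NH4)2SO4. Preferably, the Taq DNA polymerase is a hot-start Taq DNA polymerase. Preferably, the final Mg2+ concentration is 1.0-10.0 mM.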
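As a worked illustration of the stated final Mg2+ range (1.0-10.0 mM), the volume of MgCl2 stock to pipette follows from the dilution relation C1·V1 = C2·V2. The sketch below is hypothetical throughout: the function name, the 25 mM stock and the 25 µL reaction volume are assumptions chosen for the example, and it further assumes that no other component of the mix contributes Mg2+.

```python
def stock_volume_ul(final_mm: float, reaction_ul: float, stock_mm: float) -> float:
    """Volume of MgCl2 stock (µL) that yields the requested final Mg2+ concentration.

    Uses C1 * V1 = C2 * V2 and assumes the stock is the only Mg2+ source.
    """
    if not 1.0 <= final_mm <= 10.0:
        raise ValueError("final Mg2+ concentration is outside the 1.0-10.0 mM range given in the text")
    if stock_mm <= final_mm:
        raise ValueError("stock must be more concentrated than the final concentration")
    return final_mm * reaction_ul / stock_mm


# Hypothetical set-up: 25 µL reaction, 25 mM MgCl2 stock, 2.5 mM final Mg2+.
vol = stock_volume_ul(final_mm=2.5, reaction_ul=25.0, stock_mm=25.0)
print(vol)
```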
在一个或多个实施方案中,所述筛查结直肠癌风险、诊断结直肠癌、评估结直肠癌预后包括:比较标记物的甲基化水平和相应的参考水平,并根据评分筛查结直肠癌风险、诊断结直肠癌、评估结直肠癌预后。
在一个或多个实施方案中,所述比较包括:直接比较标记物的甲基化水平和参考水平,或者通过计算得出评分并比较标记物的甲基化水平的评分和相应的参考评分。优选地,所述计算通过构建逻辑回归模型进行。
另一方面,本申请还提供了一种用于筛查结直肠癌风险、诊断结直肠癌或评估结直肠癌预后的方法,包括:
(1)检测对象的样品中标志物的甲基化水平,所述标志物是DNA序列以及该DNA序列的上游5kb和下游5kb、或其片段、或其中一个或多个CpG二核苷酸,所述DNA序列包括以下基因序列中的一个或多个或全部:(p)TTLL10、ST6GALNAC5、KCNA3、CACNA1E、TRAPPC12、UBE2F、ZIC4、ZNF595、EVC2、HMX1、PITX2、POU4F2、IRX4、IRX1、CRHBP、KCNMB1、KCNQ5、TBX20、ACTR3C、ACTR3B、VIPR2、SOX17、MOS、PREX2、GDF6、OSR2、BARX1、SORCS3、VAX1、DPYSL4、UTF1、B3GAT1、HOXC13、CUX2、GLT1D1、ITGBL1、SKOR1、TM6SF1、LRRK1、FOXL1、MYO15B、DNM2、ZNF536、YTHDF1、SIM2,
(2)比较步骤(1)中标记物的甲基化水平和相应的参考水平,
(3)根据比较结果筛查结直肠癌风险、诊断结直肠癌或评估结直肠癌预后。
在一个或多个实施方案中,所述DNA序列包括选自CACNA1E、PITX2、CRHBP、TBX20、SORCS3、B3GAT1、GLT1D1和LRRK1的一种或多种或全部,任选还包括(p)中的其他基因序列中的一种或多种或全部。
在一个或多个实施方案中,所述DNA序列包括选自TTLL10、ACTR3B、BARX1、CUX2、DNM2和SIM2的一种或多种或全部,任选还包括(p)中的其他基因序列中的一种或多种或全部。
在一个或多个实施方案中,所述DNA序列包括选自UBE2F、HMX1、IRX4、IRX1、VIPR2、OSR2和MYO15B的一种或多种或全部,任选还包括(p)中的其他基因序列中的一种或多种或全部。
在一个或多个实施方案中,所述标志物包含至少3个CpG二核苷酸。
在一个或多个实施方案中,所述DNA序列包括DNA正义链或反义链。
在一个或多个实施方案中,所述片段长度为1-1000bp,优选1-700bp。在一个或多个实施方案中,所述片段是基因的启动子区域。在一个或多个实施方案中,所述片段包含至少1个,优选至少3个CpG二核苷酸。优选地,所述标志物具有本发明第一方面所述的核酸分子的序列。
在一个或多个实施方案中,所述方法在步骤(1)之前还包含从对象获取含有DNA的生物样品的步骤,例如DNA抽提和/或质检。
在一个或多个实施方案中,步骤(1)包括使用本申请所述的引物分子、探针分子和/或介质,和任选的本申请所述的核酸分子,进行所述检测。
在一个或多个实施方案中,所述检测包括但不限于:基于重亚硫酸盐转化的PCR、DNA测序、甲基化敏感的限制性内切酶分析法、荧光定量法、甲基化敏感性高分辨率熔解曲线法、基于芯片的甲基化图谱分析、质谱。
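In one or more embodiments, the detection is DNA sequencing. In one or more embodiments, the sequencing depth of the DNA sequencing is at least 10X, preferably 20X, more preferably 30X.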
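One common way to act on a minimum depth in downstream analysis is to discard CpG sites covered by too few reads before computing methylation levels. The sketch below assumes that reading: the text states an overall sequencing depth, so applying the 10X figure as a per-site coverage filter is an assumption, and the data layout, function name and read counts are all made up for illustration.

```python
from typing import Dict, Tuple

# (chromosome, position) -> (methylated read count, total read count); values are invented.
Counts = Dict[Tuple[str, int], Tuple[int, int]]


def filter_by_depth(counts: Counts, min_depth: int = 10) -> Counts:
    """Keep only CpG sites covered by at least min_depth reads."""
    return {site: (m, n) for site, (m, n) in counts.items() if n >= min_depth}


counts: Counts = {
    ("chr1", 1000): (8, 32),   # 32 reads: kept
    ("chr1", 1010): (1, 6),    # 6 reads: dropped at 10X
    ("chr1", 1024): (10, 10),  # exactly 10 reads: kept
}
kept = filter_by_depth(counts, min_depth=10)
print(sorted(kept))
```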
在一个或多个实施方案中,所述样品来自哺乳动物的组织、细胞、体液,例如肠组织样本、血液、血清或血浆。所述哺乳动物优选为人。优选地,所述样品是血液。
在一个或多个实施方案中,所述样品包括基因组DNA。
在一个或多个实施方案中,所述DNA序列是:相应标志物在基因组中的序列、或其经转化的序列、或其经甲基化敏感型限制性内切酶处理的序列,所述转化使其中未甲基化的胞嘧啶转化为不与鸟嘌呤结合的碱基。所述转化使用酶促方法进行,优选脱氨酶处理,或所述转化使用非酶促方法进行,优选用亚硫酸氢盐、酸式亚硫酸盐或焦亚硫酸盐或其组合处理。
在一个或多个实施方案中,步骤(2)中的比较包括:直接比较步骤(1)中标记物的甲基化水平和参考水平,或者通过计算得出评分并比较标记物的甲基化水平的评分和相应的参考评分。优选地,所述评分通过逻辑回归模型进行计算。
在一个或多个实施方案中,步骤(3)包括:当标记物的甲基化水平大于参考水平,或者甲基化水平的评分大于参考评分,则所述对象有形成结直肠癌的风险、患有结直肠癌或结直肠癌预后不良。
另一方面,本申请还提供了筛查结直肠癌风险、诊断结直肠癌或评估结直肠癌预后的试剂盒,包含:
(a)用于确定对象的样品中标志物的甲基化水平的试剂或装置,所述标志物是DNA序列以及该DNA序列的上游5kb和下游5kb、或其片段、或其中一个或多个CpG二核苷酸,和
任选的(b)所述标志物或其经处理的核酸分子,所述处理使未甲基化的胞嘧啶转化为与鸟嘌呤结合能力低于胞嘧啶的碱基,
其中,所述DNA序列包括以下基因序列中的一种或多种或全部:(p)TTLL10、ST6GALNAC5、KCNA3、CACNA1E、TRAPPC12、UBE2F、ZIC4、ZNF595、EVC2、HMX1、PITX2、POU4F2、IRX4、IRX1、CRHBP、KCNMB1、KCNQ5、TBX20、ACTR3C、ACTR3B、VIPR2、SOX17、MOS、PREX2、GDF6、OSR2、BARX1、SORCS3、VAX1、DPYSL4、UTF1、B3GAT1、HOXC13、CUX2、GLT1D1、ITGBL1、SKOR1、TM6SF1、LRRK1、FOXL1、MYO15B、DNM2、ZNF536、YTHDF1、SIM2。
在一个或多个实施方案中,所述DNA序列包括选自CACNA1E、PITX2、CRHBP、TBX20、SORCS3、B3GAT1、GLT1D1和LRRK1的一种或多种或全部,任选还包括(p)中的其他基因序列中的一种或多种或全部。
在一个或多个实施方案中,所述DNA序列包括选自TTLL10、ACTR3B、BARX1、CUX2、DNM2和SIM2的一种或多种或全部,任选还包括(p)中的其他基因序列中的一种或多种或全部。
在一个或多个实施方案中,所述DNA序列包括选自UBE2F、HMX1、IRX4、IRX1、VIPR2、OSR2和MYO15B的一种或多种或全部,任选还包括(p)中的其他基因序列中的一种或多种或全部。
在一个或多个实施方案中,所述标志物包含至少3个CpG二核苷酸。
在一个或多个实施方案中,所述DNA序列包括DNA正义链或反义链。
在一个或多个实施方案中,所述片段长度为1-1000bp,优选1-700bp。在一个或多个实施方案中,所述片段是基因的启动子区域。在一个或多个实施方案中,所述片段包含至少1个,优选至少3个CpG二核苷酸。优选地,所述标志物包含本申请所述的核酸分子的序列。
在一个或多个实施方案中,所述试剂盒适用于本申请任一实施方案所述的用途。
在一个或多个实施方案中,所述核酸分子是本申请所述的核酸分子。
在一个或多个实施方案中,所述试剂包含引物分子和/或探针分子。
在一个或多个实施方案中,所述试剂包含与所述DNA序列或其片段或它们的经转化的序列杂交的引物分子。所述引物分子能扩增出所述DNA序列或其片段或它们的经转化的变体。在一个或多个实施方案中,所述引物序列为甲基化特异的或非特异的。所述引物分子至少9bp。
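In one or more embodiments, the reagent is a probe molecule that hybridizes to the DNA sequence, a fragment thereof, or a converted sequence of either. In one or more embodiments, the probe further contains a detectable substance. In one or more embodiments, the detectable substance is a fluorescent reporter group at the 5' end and a quencher group labeled at the 3' end. In one or more embodiments, the fluorescent reporter group is selected from Cy5, FAM and VIC. The probe molecule is at least 12 bp long.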
在一个或多个实施方案中,所述试剂包含本申请任一实施方案所述的介质。
在一个或多个实施方案中,所述试剂盒是非侵入性诊断试剂盒。
在一个或多个实施方案中,所述对象是哺乳动物,优选人。
在一个或多个实施方案中,所述样品来自哺乳动物的组织、细胞或体液,例如肠组织样本、血液、血清或血浆。所述哺乳动物优选为人。所述样品包括基因组DNA。优选地,所述样品是血液。
在一个或多个实施方案中,所述DNA序列是:相应标志物在基因组中的序列、或其经转化的序列、或其经甲基化敏感型限制性内切酶处理的序列,所述转化使未甲基化的胞嘧啶转化为与鸟嘌呤结合能力低于胞嘧啶的碱基。所述转化使用酶促方法进行,优选脱氨酶处理,或所述转化使用非酶促方法进行,优选用亚硫酸氢盐、酸式亚硫酸盐或焦亚硫酸盐或其组合处理。
In one or more embodiments, the kit further includes PCR reaction reagents. Preferably, the PCR reaction reagents include DNA polymerase, PCR buffer, dNTPs and Mg2+.
在一个或多个实施方案中,所述试剂盒还包括检测DNA甲基化的试剂,所述试剂是选自以下方法的一个或多个中所用的试剂:基于重亚硫酸盐转化的PCR(例如甲基化特异性PCR)、DNA测序(如亚硫酸氢盐测序、全基因组甲基化测序、简化甲基化测序)、甲基化敏感的限制性内切酶分析法、荧光定量法、甲基化敏感性高分辨率熔解曲线法、基于芯片的甲基化图谱分析、质谱(例如飞行质谱)。优选地,所述试剂选自以下一种或多种:重亚硫酸盐及其衍生物、甲基化敏感或不敏感的限制性内切酶、酶切缓冲液、荧光染料、荧光淬灭剂、荧光报告剂、外切核酸酶、碱性磷酸酶、内标、对照物。
另一方面,本申请还提供了一种用于筛查结直肠癌风险、诊断结直肠癌或评估结直肠癌预后的装置,所述装置包括存储器、处理器以及存储在存储器上并可在处理器上运行的计算机程序,其特征在于,所述处理器执行所述程序时实现以下步骤:
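(1) obtaining the methylation level of a marker in a sample from a subject, wherein the marker is a DNA sequence together with the 5 kb upstream and 5 kb downstream of the DNA sequence, or a fragment thereof, or one or more CpG dinucleotides therein, and the DNA sequence includes one or more or all of the following gene sequences: (p) TTLL10, ST6GALNAC5, KCNA3, CACNA1E, TRAPPC12, UBE2F, ZIC4, ZNF595, EVC2, HMX1, PITX2, POU4F2, IRX4, IRX1, CRHBP, KCNMB1, KCNQ5, TBX20, ACTR3C, ACTR3B, VIPR2, SOX17, MOS, PREX2, GDF6, OSR2, BARX1, SORCS3, VAX1, DPYSL4, UTF1, B3GAT1, HOXC13, CUX2, GLT1D1, ITGBL1, SKOR1, TM6SF1, LRRK1, FOXL1, MYO15B, DNM2, ZNF536, YTHDF1, SIM2,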
(2)比较步骤(1)中标记物的甲基化水平和相应的参考水平,
(3)根据比较结果筛查结直肠癌风险、诊断结直肠癌或评估结直肠癌预后。
在一个或多个实施方案中,所述DNA序列包括以下基因序列:
在一个或多个实施方案中,所述DNA序列包括选自CACNA1E、PITX2、CRHBP、TBX20、SORCS3、B3GAT1、GLT1D1和LRRK1的一种或多种或全部,任选还包括(p)中的其他基因序列中的一种或多种或全部。
在一个或多个实施方案中,所述DNA序列包括选自TTLL10、ACTR3B、BARX1、CUX2、DNM2和SIM2的一种或多种或全部,任选还包括(p)中的其他基因序列中的一种或多种或全部。
在一个或多个实施方案中,所述DNA序列包括选自UBE2F、HMX1、IRX4、IRX1、VIPR2、OSR2和MYO15B的一种或多种或全部,任选还包括(p)中的其他基因序列中的一种或多种或全部。
在一个或多个实施方案中,所述标志物包含至少3个CpG二核苷酸。
在一个或多个实施方案中,所述DNA序列包括DNA正义链或反义链。
在一个或多个实施方案中,所述片段长度为1-1000bp,优选1-700bp。在一个或多个实施方案中,所述片段是基因的启动子区域。在一个或多个实施方案中,所述片段包含至少1个,优选至少3个CpG二核苷酸。优选地,所述标志物具有本申请所述的核酸分子的序列。
在一个或多个实施方案中,步骤(1)之前还包含获取DNA的步骤,例如DNA抽提和/或质检。
在一个或多个实施方案中,步骤(1)包括使用本申请所述的引物分子、探针分子和/或介质,和任选的本申请所述的核酸分子,检测样品中所述序列的甲基化水平。在一个或多个实施方案中,所述检测包括但不限于:基于重亚硫酸盐转化的PCR、DNA测序(如亚硫酸氢盐测序、全基因组甲基化测序、简化甲基化测序)、甲基化敏感的限制性内切酶分析法、荧光定量法、甲基化敏感性高分辨率熔解曲线法、基于芯片的甲基化图谱分析、质谱(例如飞行质谱)。在一个或多个实施方案中,所述检测是DNA测序。优选地,所述DNA测序的测序深度至少10X,优选20X,更优选30X。
在一个或多个实施方案中,所述样品来自哺乳动物的组织、细胞或体液,例如肠组织样本、血液、血清或血浆。所述哺乳动物优选为人。在一个或多个实施方案中,所述样品包括基因组DNA。优选地,所述样品是血液。
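In one or more embodiments, the DNA sequence is: the sequence of the corresponding marker in the genome, or its converted sequence, or its sequence after treatment with a methylation-sensitive restriction endonuclease, wherein the conversion converts unmethylated cytosine into a base that does not bind guanine. The conversion is performed enzymatically, preferably by deaminase treatment, or non-enzymatically, preferably by treatment with bisulfite, acid sulfite or metabisulfite, or a combination thereof.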
在一个或多个实施方案中,步骤(2)中的比较包括:直接比较步骤(1)中标记物的甲基化水平和参考水平,或者通过计算得出评分并比较标记物的甲基化水平的评分和相应的参考评分。优选地,所述评分通过逻辑回归模型进行计算。
在一个或多个实施方案中,步骤(3)包括:当标记物的甲基化水平大于参考水平,或者甲基化水平的评分大于参考评分,则所述对象有形成结直肠癌的风险、患有结直肠癌或结直肠癌预后不良。
另一方面,本申请提供检测一个或多个目标标志物的至少一个CpG二核苷酸的甲基化状态或水平的试剂在制备诊断胃癌的检测试剂或诊断试剂盒中的应用,以及用于确定一个或多个目标标志物的至少一个CpG二核苷酸的甲基化状态或水平的装置在制备诊断胃癌的诊断试剂盒中的应用;其中,所述一个或多个目标标志物选自以下序列(1)-(48)中的任意1、2、3、4、5、6、7、8、9、10、11、12、13、14、15、16、17、18、19、20、21、22、23、24、25、26、27、28、29、30、31、32、33、34、35、36、37、38、39、40、41、42、43、44、45、46、47条或全部48条序列:
(1)含chr6:166970625:166970825(SEQ ID NO:48)及其上游5kb以内和/或下游5kb以内的序列;
(2)含chr11:11600237:11600617(SEQ ID NO:49)及其上游5kb以内和/或下游5kb以内的序列;
(3)含chr17:76929754:76929954(SEQ ID NO:50)及其上游5kb以内和/或下游5kb以内的序列;
(4)含chr6:391738:391938(SEQ ID NO:51)及其上游5kb以内和/或下游5kb以内的序列;
(5)含chr12:2282090:2282290(SEQ ID NO:52)及其上游5kb以内和/或下游5kb以内的序列;
(6)含chr2:177030134:177030449(SEQ ID NO:53)及其上游5kb以内和/或下游5kb以内的序列;
(7)含chr7:35301095:35301411(SEQ ID NO:54)及其上游5kb以内和/或下游5kb以内的序列;
(8)含chr7:8482114:8482413(SEQ ID NO:55)及其上游5kb以内和/或下游5kb以内的序列;
(9)含chr2:72371208:72371433(SEQ ID NO:56)及其上游5kb以内和/或下游5kb以内的序列;
(10)含chr5:134364359:134364559(SEQ ID NO:57)及其上游5kb以内和/或下游5kb以内的序列;
(11)含chr10:118892523:118892723(SEQ ID NO:58)及其上游5kb以内和/或下游5kb以内的序列;
(12)含chr12:113901298:113901498(SEQ ID NO:59)及其上游5kb以内和/或下游5kb以内的序列;
(13)含chr8:143613755:143613955(SEQ ID NO:60)及其上游5kb以内和/或下游5kb以内的序列;
(14)含chr8:20375580:20375780(SEQ ID NO:61)及其上游5kb以内和/或下游5kb以内的序列;
(15)含chr7:107499318:107499518(SEQ ID NO:62)及其上游5kb以内和/或下游5kb以内的序列;
(16)含chr6:1378941:1379141(SEQ ID NO:63)及其上游5kb以内和/或下游5kb以内的序列;
(17)含chr15:34786976:34787337(SEQ ID NO:64)及其上游5kb以内和/或下游5kb以内的序列;
(18)含chr1:156405314:156405514(SEQ ID NO:65)及其上游5kb以内和/或下游5kb以内的序列;
(19)含chr8:10588811:10589173(SEQ ID NO:66)及其上游5kb以内和/或下游5kb以内的序列;
(20)含chr4:85418610:85418919(SEQ ID NO:67)及其上游5kb以内和/或下游5kb以内的序列;
(21)含chr5:140871317:140871517(SEQ ID NO:68)及其上游5kb以内和/或下游5kb以内的序列;
(22)含chr5:92906255:92906617(SEQ ID NO:69)及其上游5kb以内和/或下游5kb以内的序列;
(23)含chr14:57265398:57265598(SEQ ID NO:70)及其上游5kb以内和/或下游5kb以内的序列;
(24)含chr19:19650947:19651147(SEQ ID NO:71)及其上游5kb以内和/或下游5kb以内的序列;
(25)含chr11:20618486:20618686(SEQ ID NO:72)及其上游5kb以内和/或下游5kb以内的序列;
(26)含chr7:73407894:73408161(SEQ ID NO:73)及其上游5kb以内和/或下游5kb以内的序列;
(27)含chr16:82660460:82660774(SEQ ID NO:74)及其上游5kb以内和/或下游5kb以内的序列;
(28)含chr13:24844736:24844936(SEQ ID NO:75)及其上游5kb以内和/或下游5kb以内的序列;
(29)含chr20:55500358:55500677(SEQ ID NO:76)及其上游5kb以内和/或下游5kb以内的序列;
(30)含chr10:123923943:123924143(SEQ ID NO:77)及其上游5kb以内和/或下游5kb以内的序列;
(31)含chr20:59827678:59827907(SEQ ID NO:78)及其上游5kb以内和/或下游5kb以内的序列;
(32)含chr20:62330559:62330808(SEQ ID NO:79)及其上游5kb以内和/或下游5kb以内的序列;
(33)含chr19:13209774:13209974(SEQ ID NO:80)及其上游5kb以内和/或下游5kb以内的序列;
(34)含chr16:2085778:2086156(SEQ ID NO:81)及其上游5kb以内和/或下游5kb以内的序列;
(35)含chr6:108488634:108488917(SEQ ID NO:82)及其上游5kb以内和/或下游5kb以内的序列;
(36)含chr12:115124911:115125191(SEQ ID NO:83)及其上游5kb以内和/或下游5kb以内的序列;
(37)含chr10:124896740:124897020(SEQ ID NO:84)及其上游5kb以内和/或下游5kb以内的序列;
(38)含chr14:55243006:55243206(SEQ ID NO:85)及其上游5kb以内和/或下游5kb以内的序列;
(39)含chr13:36729096:36729334(SEQ ID NO:86)及其上游5kb以内和/或下游5kb以内的序列;
(40)含chr2:10444997:10445197(SEQ ID NO:87)及其上游5kb以内和/或下游5kb以内的序列;
(41)含chr9:2157701:2157901(SEQ ID NO:88)及其上游5kb以内和/或下游5kb以内的序列;
(42)含chr12:57529619:57529819(SEQ ID NO:89)及其上游5kb以内和/或下游5kb以内的序列;
(43)含chr1:119527250:119527450(SEQ ID NO:90)及其上游5kb以内和/或下游5kb以内的序列;
(44)含chr1:119532788:119532988(SEQ ID NO:91)及其上游5kb以内和/或下游5kb以内的序列;
(45)含chr15:96909441:96909641(SEQ ID NO:92)及其上游5kb以内和/或下游5kb以内的序列;
(46)含chr1:146551463:146551747(SEQ ID NO:93)及其上游5kb以内和/或下游5kb以内的序列;
(47)含chr17:35293755:35293955(SEQ ID NO:94)或其上下游各5kb以内的序列;和
(48)含chr17:59482763:59482963(SEQ ID NO:95)或其上下游各5kb以内的序列。
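The target regions listed above are written as chromosome:start:end, each optionally extended by up to 5 kb on either side. A minimal sketch of parsing such a string and computing the widest flanked window follows. The class and function names are hypothetical, and the sketch makes no assumption about the genome build or about whether the coordinates are 0-based or 1-based, neither of which is stated in this list.

```python
from typing import NamedTuple


class Region(NamedTuple):
    chrom: str
    start: int
    end: int


def parse_region(text: str) -> Region:
    """Parse a 'chrom:start:end' string as written in the marker list."""
    chrom, start, end = text.split(":")
    return Region(chrom, int(start), int(end))


def with_flanks(region: Region, flank: int = 5000) -> Region:
    """Extend the region by `flank` bases on both sides, never going below 0."""
    return Region(region.chrom, max(0, region.start - flank), region.end + flank)


core = parse_region("chr6:166970625:166970825")  # item (1), SEQ ID NO: 48
window = with_flanks(core)                       # core region plus 5 kb on each side
print(core, window)
```

A marker "within 5 kb" may be any sub-interval of `window` that contains at least one CpG dinucleotide; the sketch only computes the outer bounds.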
在一个或多个实施方案中,所述一个或多个胃癌目标标志物包括所述第(3)、(8)、(13)、(15)、(17)、(19)、(22)、(25)、(29)、(31)、(37)、(38)、(40)、(41)、(42)、(43)、(45)、(47)和(48)项所述的序列。
在一个或多个实施方案中,所述一个或多个胃癌目标标志物包括所述第(2)、(6)、(7)、(8)、(12)、(15)、(19)、(25)、(28)、(32)、(33)、(36)、(37)、(40)、(42)、(43)、(44)、(46)和(48)项所述的序列。
在一个或多个实施方案中,所述一个或多个胃癌目标标志物包括所述第(3)、(13)、(14)、(20)、(22)、(28)、(30)和(36)项所述的序列;或
In one or more embodiments, the one or more gastric cancer target markers include the sequences described in items (3), (13), (27), (30) and (35).
在一个或多个实施方案中,所述一个或多个胃癌目标标志物包括所述第(7)、(14)、(22)、(26)、(35)、(38)、(40)、(43)、(47)和(48)项所述的序列。
在一个或多个实施方案中,所述一个或多个胃癌目标标志物选自所述第(7)、(14)、(22)、(26)、(35)、(38)、(40)、(43)、(47)和(48)项中任意1、2、3、4、5、6、7、8或9项所述的序列。
在一个或多个实施方案中,所述胃癌目标标志物包括第(40)项所述序列,以及第(1)-(39)和(41)-(48)中的任意一条或多条序列。
在一个或多个实施方案中,所述胃癌目标标志物包括第(47)项所述序列,以及第(1)-(46)和(48)中的任意一条或多条序列。
在一个或多个实施方案中,所述胃癌目标标志物包括第(43)项所述序列,以及第(1)-(42)和(44)-(48)中的任意一条或多条序列。
在一个或多个实施方案中,所述胃癌目标标志物包括第(26)项所述序列,以及第(1)-(25)和(27)-(48)中的任意一条或多条序列。
在一个或多个实施方案中,所述胃癌目标标志物包括第(35)项所述序列,以及第(1)-(34)和(36)-(48)中的任意一条或多条序列。
在一个或多个实施方案中,所述胃癌目标标志物包括第(14)项所述序列,以及第(1)-(13)和(15)-(48)中的任意一条或多条序列。
在一个或多个实施方案中,所述胃癌目标标志物包括第(38)项所述序列,以及第(1)-(37)和(39)-(48)中的任意一条或多条序列。
在一个或多个实施方案中,所述胃癌目标标志物包括第(22)项所述序列,以及第(1)-(21)和(23)-(48)中的任意一条或多条序列。
在一个或多个实施方案中,所述胃癌目标标志物包括第(7)项所述序列,以及第(1)-(6)和(8)-(48)中的任意一条或多条序列。
在一个或多个实施方案中,所述胃癌目标标志物包括第(48)项所述序列,以及第(1)-(47)中的任意一条或多条序列。
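In one or more embodiments, the gastric cancer target marker includes the sequence within 1 kb, preferably within 500 bp, more preferably within 300 bp, more preferably within 100 bp, upstream of the start site of any one of SEQ ID NO: 48-95, and/or the sequence within 1 kb, preferably within 500 bp, preferably within 300 bp, preferably within 100 bp, downstream of its end site; preferably, the target marker is a gene sequence that contains any one of SEQ ID NO: 48-95 and is no more than 400 bp long.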
在一个或多个实施方案中,所述第(1)到第(48)项所述的序列分别是SEQ ID NO:48-95所示的序列。
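In another aspect, the present application provides use of a reagent for detecting the methylation status or level of at least one CpG dinucleotide of one or more target markers in the preparation of a detection reagent or diagnostic kit for diagnosing esophageal cancer, and use of a device for determining the methylation status or level of at least one CpG dinucleotide of one or more target markers in the preparation of a diagnostic kit for diagnosing esophageal cancer; wherein the one or more esophageal cancer target markers are selected from any 1 to 42, or all 43, of the sequences SEQ ID NO: 96-138, together with the sequences within 5 kb upstream and/or within 5 kb downstream thereof. In one or more embodiments, the target marker includes the sequence within 1 kb, preferably within 500 bp, more preferably within 300 bp, more preferably within 100 bp, upstream of the start site of any one of SEQ ID NO: 96-138, and/or the sequence within 1 kb, preferably within 500 bp, preferably within 300 bp, preferably within 100 bp, downstream of its end site; preferably, the target marker is a gene sequence that contains any one of SEQ ID NO: 96-138 and is no more than 400 bp long.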
另一方面,本申请提供了一种评估食管癌的存在和/或进展的方法,包含确定待测样本中选自下表1的染色体范围编号1至43的DNA区域、或其互补区域、或上述的片段的修饰状态的存在和/或含量:
表1


另一方面,本申请提供了一种评估食管癌的存在和/或进展的方法,包含确定待测样本中选自SEQ ID NO:96至138中任一项所示上游或下游5k bp以内的DNA区域、或其互补区域、或上述的片段的修饰状态的存在和/或含量。
另一方面,本申请提供了一种评估食管癌的存在和/或进展的方法,包含确定待测样本中选自SEQ ID NO:105上游或下游5k bp以内的区域以及下表2基因编号为1至76的基因所在的DNA区域、或其片段的修饰状态的存在和/或含量。
表2

另一方面,本申请提供了一种核酸,所述核酸包含能够结合选自本申请上表1的染色体范围编号1至43的DNA区域、或其互补区域、或上述的片段的修饰状态的存在和/或含量。
另一方面,本申请提供了一种核酸,所述核酸包含能够结合选自SEQ ID NO:96至138中任一项所示上游或下游5k bp以内的DNA区域、或其互补区域、或上述的片段的修饰状态的存在和/或含量。
另一方面,本申请提供了一种核酸,所述核酸包含能够结合选自SEQ ID NO:105上游或下游5k bp以内的区域以及本申请上表2的基因编号为1至76的基因所在的DNA区域、或其片段的修饰状态的存在和/或含量。
另一方面,本申请提供了一种试剂盒,包含本申请所述的核酸。
另一方面,本申请提供了一种制备核酸的方法,包含根据选自本申请上述的染色体范围编号1至43的DNA区域、或其互补区域、或上述的片段的修饰状态,设计能够结合所述DNA区域、或其互补区域、或上述的转化而来的区域、或上述的片段的核酸。
另一方面,本申请提供了一种制备核酸的方法,包含根据选自SEQ ID NO:96至138中任一项所示上游或下游5k bp以内的DNA区域、或其互补区域、或上述的片段的修饰状态,设计能够结合所述DNA区域、或其互补区域、或上述的转化而来的区域、或上述的片段的核酸。
另一方面,本申请提供了一种制备核酸的方法,包含根据选自SEQ ID NO:105上游或下游5k bp以内的区域以及本申请上表2的基因编号为1至76的基因所在的DNA区域、或其片段的修饰状态,设计能够结合所述DNA区域、或其互补区域、或上述的转化而来的区域、或上述的片段的核酸。
另一方面,本申请提供了用于确定DNA区域修饰状态的核酸、核酸组和/或试剂盒,在制备用于评估食管癌的存在和/或进展的物质中的应用,所述用于确定的DNA区域包含选自本申请上表1的染色体范围编号1至43的DNA区域、或其互补区域、或上述的片段的修饰状态,设计能够结合所述DNA区域、或其互补区域、或上述的转化而来的区域、或上述的片段的序列。
另一方面,本申请提供了用于确定DNA区域修饰状态的核酸、核酸组和/或试剂盒,在制备用于评估食管癌的存在和/或进展的物质中的应用,所述用于确定的DNA区域包含选自SEQ ID NO:96至138中任一项所示上游或下游5k bp以内的DNA区域、或其互补区域、或上述的片段的修饰状态,设计能够结合所述DNA区域、或其互补区域、或上述的转化而来的区域、或上述的片段的序列。
另一方面,本申请提供了用于确定DNA区域修饰状态的核酸、核酸组和/或试剂盒,在制备用于评估食管癌的存在和/或进展的物质中的应用,所述用于确定的DNA区域包含选自SEQ ID NO:105上游或下游5k bp以内的区域以及本申请上表2的基因编号为1至76的基因所在的DNA区域、或其片段的修饰状态,设计能够结合所述DNA区域、或其互补区域、或上述的转化而来的区域、或上述的片段的序列。
In another aspect, the present application provides use of a reagent for detecting the methylation status or level of at least one CpG dinucleotide of one or more target markers in the preparation of a detection reagent or diagnostic kit for diagnosing liver cancer, and use of a device for determining the methylation status or level of at least one CpG dinucleotide of one or more target markers in the preparation of a diagnostic kit for diagnosing liver cancer; wherein the one or more liver cancer target markers are selected from any 1 to 201, or all 202, of the sequences SEQ ID NO: 139-340, together with the sequences within 5 kb upstream and/or within 5 kb downstream thereof. In one or more embodiments, the target marker includes the sequence within 1 kb, preferably within 500 bp, more preferably within 300 bp, more preferably within 100 bp, upstream of the start site of any one of SEQ ID NO: 139-340, and/or the sequence within 1 kb, preferably within 500 bp, preferably within 300 bp, preferably within 100 bp, downstream of its end site; preferably, the target marker is a gene sequence that contains any one of SEQ ID NO: 139-340 and is no more than 400 bp long.
另一方面,本申请提供了一种评估肝癌的存在和/或进展的方法,包含确定待测样本中选自下表3染色体范围编号44至245上游或下游5k bp以内的DNA区域、或其互补区域、或上述的片段的修饰状态的存在和/或含量。
表3







本申请提供了一种评估肝癌的存在和/或进展的方法,包含确定待测样本中选自SEQ ID NO:139至340中任一项所示上游或下游5k bp以内的DNA区域、或其互补区域、或上述的片段的修饰状态的存在和/或含量。
本申请提供了一种评估肝癌的存在和/或进展的方法,包含确定待测样本中选自表4基因编号为77至354的基因所在上游或下游5k bp以内的DNA区域、或其片段的修饰状态的存在和/或含量。
表4





本申请提供了一种核酸,所述核酸包含能够结合选自上表3的染色体范围编号44至245上游或下游5k bp以内的DNA区域、或其互补区域、或上述的片段的修饰状态的存在和/或含量。
本申请提供了一种核酸,所述核酸包含能够结合选自SEQ ID NO:139至340中任一项所示上游或下游5k bp以内的DNA区域、或其互补区域、或上述的片段的修饰状态的存在和/或含量。
本申请提供了一种核酸,所述核酸包含能够结合选自上表4基因编号为77至354的基因 所在上游或下游5k bp以内的DNA区域、或其片段的修饰状态的存在和/或含量。
本申请提供了一种试剂盒,包含本申请所述的核酸。
本申请提供了一种制备核酸的方法,包含根据选自上表3的染色体范围编号44至245上游或下游5k bp以内的DNA区域、或其互补区域、或上述的片段的修饰状态,设计能够结合所述DNA区域、或其互补区域、或上述的转化而来的区域、或上述的片段的核酸。
本申请提供了一种制备核酸的方法,包含根据选自SEQ ID NO:139至340中任一项所示上游或下游5k bp以内的DNA区域、或其互补区域、或上述的片段的修饰状态,设计能够结合所述DNA区域、或其互补区域、或上述的转化而来的区域、或上述的片段的核酸。
本申请提供了一种制备核酸的方法,包含根据选自上表4基因编号为77至354的基因所在上游或下游5k bp以内的DNA区域、或其片段的修饰状态,设计能够结合所述DNA区域、或其互补区域、或上述的转化而来的区域、或上述的片段的核酸。
本申请提供了用于确定DNA区域修饰状态的核酸、核酸组和/或试剂盒,在制备用于评估肝癌的存在和/或进展的物质中的应用,所述用于确定上游或下游5k bp以内的DNA区域包含选自上表3的染色体范围编号44至245上游或下游5k bp以内的DNA区域、或其互补区域、或上述的片段的修饰状态,设计能够结合所述DNA区域、或其互补区域、或上述的转化而来的区域、或上述的片段的序列。
本申请提供了用于确定DNA区域修饰状态的核酸、核酸组和/或试剂盒,在制备用于评估肝癌的存在和/或进展的物质中的应用,所述用于确定上游或下游5k bp以内的DNA区域包含选自SEQ ID NO:139至340中任一项所示上游或下游5k bp以内的DNA区域、或其互补区域、或上述的片段的修饰状态,设计能够结合所述DNA区域、或其互补区域、或上述的转化而来的区域、或上述的片段的序列。
本申请提供了用于确定DNA区域修饰状态的核酸、核酸组和/或试剂盒,在制备用于评估肝癌的存在和/或进展的物质中的应用,所述用于确定上游或下游5k bp以内的DNA区域包含选自本申请基因编号为77至354的基因所在上游或下游5k bp以内的DNA区域、或其片段的修饰状态,设计能够结合所述DNA区域、或其互补区域、或上述的转化而来的区域、或上述的片段的序列。
在一个或多个实施方案中,所述试剂包括引物和/或探针分子;优选地,所述引物分子相同于、互补于或在严谨条件下杂交于所述一个或多个目标标志物并包含至少9个连续的核苷酸,所述探针分子与所述一个或多个目标标志物的扩增产物在严谨条件下杂交。
在一个或多个实施方案中,所述试剂为实施基因组简化甲基化测序技术所需的试剂。
另一方面,本申请还提供用于检测一个或多个目标标志物的至少一个CpG二核苷酸的甲基化状态或甲基化水平以诊断癌症的诊断试剂或诊断试剂盒,其包含用于检测一个或多个目标标志物的至少一个CpG二核苷酸的甲基化状态或水平的试剂;其中,所述一个或多个目标标志物如上述所述。
在一个或多个实施方案中,所述诊断试剂或诊断试剂盒包括引物和/或探针分子;优选地,所述引物分子相同于、互补于或在严谨条件下杂交于所述一个或多个目标标志物并包含至少9个连续的核苷酸,所述探针分子与所述一个或多个目标标志物的扩增产物在严谨条件下杂交。
在一个或多个实施方案中,所述诊断试剂或诊断试剂盒还包括检测内参基因ACTB的引物分子和/或探针分子。
在一个或多个实施方案中,所述诊断试剂或诊断试剂盒还包括选自以下的一种或多种物质:PCR缓冲液、聚合酶、dNTP、限制性内切酶、酶切缓冲液、荧光染料、荧光淬灭剂、荧光报告剂、外切核酸酶、碱性磷酸酶、内标、对照物、KCl、MgCl2和(NH4)2SO4
在一个或多个实施方案中,所述试剂还包括下述一个或多个方法中所用的试剂:基于重亚硫酸盐转化的PCR、DNA测序、甲基化敏感的限制性内切酶分析法、荧光定量法、甲基化敏感性高分辨率熔解曲线法、基于芯片的甲基化图谱分析和质谱。
在一个或多个实施方案中,所述试剂选自以下一种或多种:重亚硫酸盐及其衍生物、荧光染料、荧光淬灭剂、荧光报告剂、内标和对照物。
另一方面,本申请还提供区分基因组DNA至少一个靶区域内甲基化和未甲基化CpG二核苷酸的至少一种试剂或成组试剂在制备用于检测和/或分类个体中癌症的方法的试剂盒中的用途,其中所述方法包括使从所述个体生物样品中分离的基因组DNA与所述至少一种试剂或成组试剂接触,其中所述靶区域相同于、等同于或互补于一个或多个目标标志物的至少16个连续核苷酸的序列,其中所述连续核苷酸包含至少一个CpG二核苷酸序列,由此至少部分地提供对癌症的检测和/或分类,其中,所述一个或多个目标标志物如上述所述。
另一方面,本申请还提供将5位未甲基化的胞嘧啶碱基转化为尿嘧啶或在杂交性能方面可检测地不同于胞嘧啶的其它碱基的一种或多种试剂、扩增酶以及至少一种包含至少9个连续核苷酸的引物在制备用于检测和/或分类个体中胃癌的方法的试剂盒中的用途,其中所述方法包括:
a)从所述个体生物样品分离基因组DNA;
b)用所述一种或多种试剂处理a)的所述基因组DNA或其片段;
c)使所述经处理的基因组DNA或其经处理的片段与所述扩增酶和所述至少一种引物接触,所述引物相同于、互补于或在严谨条件下杂交于一个或多个目标标志物,其中所述经处理的基因组DNA或其片段被扩增以产生至少一种扩增产物或不被扩增;以及
d)基于所述扩增物是否存在或其性质,确定所述一个或多个目标标志物的至少一个CpG二核苷酸的甲基化状态或水平,或者反映所述一个或多个目标标志物的多个CpG二核苷酸平均甲基化状态或水平的均值或值,由此至少部分地检测和/或分类个体中的癌症;
其中,所述一个或多个目标标志物如上述所述。
在一个或多个实施方案中,其中步骤b)中,使用选自亚硫酸氢盐、酸式亚硫酸盐、焦亚硫酸盐及其组合的试剂处理所述基因组DNA或其片段。
在一个或多个实施方案中,其中c)中,通过使用耐热DNA聚合酶作为所述扩增酶、使用缺乏5’-3’外切酶活性的聚合酶、使用聚合酶链式反应和/或产生带有可检测标记的扩增产物进行核酸分子的接触或扩增。
在一个或多个实施方案中,其中c)中的接触或扩增包括使用甲基化特异的引物。
另一方面,本申请还提供一种或多种甲基化敏感限制酶和扩增酶以及至少一种包含至少9个连续核苷酸的引物在制备用于检测和/或分类个体中癌症的方法的试剂盒中的用途,其中,所述引物相同于、互补于或在严谨条件下杂交于一个或多个目标标志物;所述方法包括:
a)从所述个体生物样品分离基因组DNA;
b)以所述一种或多种甲基化敏感限制酶消化a)所述的基因组DNA或其片段,使所得消化产物与所述扩增酶和所述至少一种引物接触;和
c)基于所述扩增物是否存在或其性质,确定所述一个或多个目标标志物的至少一个CpG二核苷酸的甲基化状态或水平,由此至少部分地检测和/或分类个体中的癌症;
其中,所述一个或多个目标标志物如上述所述。
在一个或多个实施方案中,通过杂交至少一种核酸或肽核酸来确定扩增产物的存在与否,所述至少一种核酸或肽核酸等同于或互补于选自所述一个或多个目标标志物的序列的至少16碱基长片段。
另一方面,本申请还提供一种在个体中检测和/或分类个体中癌症的方法,所述方法包括如下步骤:
a)从所述个体生物样品分离基因组DNA;
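b) b1) treating the genomic DNA of a) or a fragment thereof with one or more reagents capable of converting cytosine bases unmethylated at position 5 into uracil or another base that is detectably different from cytosine in hybridization properties; or b2) digesting the genomic DNA of a) or a fragment thereof with one or more methylation-sensitive restriction enzymes,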
c)使b)所得处理产物或消化产物与扩增酶和至少一种包含至少9个连续核苷酸的引物接触,所述引物相同于、互补于或在严谨条件下杂交于一个或多个目标标志物,其中所述处理产物或消化产物被扩增以产生至少一种扩增产物或不被扩增;和
d)基于所述扩增物是否存在或其性质,确定所述一个或多个目标标志物的至少一个CpG二核苷酸的甲基化状态或水平,或者反映所述一个或多个目标标志物的多个CpG二核苷酸平均甲基化状态或水平的均值或值,由此至少部分地检测和/或分类个体中的癌症;
其中,所述一个或多个目标标志物如上述所述。
在一个或多个实施方案中,其中步骤b1)中,使用选自亚硫酸氢盐、酸式亚硫酸盐、焦亚硫酸盐及其组合的试剂处理所述基因组DNA或其片段。
在一个或多个实施方案中,其中c)中,通过使用耐热DNA聚合酶作为所述扩增酶、使用缺乏5’-3’外切酶活性的聚合酶、使用聚合酶链式反应和/或产生带有可检测标记的扩增产物来进行核酸分子的接触或扩增。
在一个或多个实施方案中,其中c)中的接触或扩增包括使用甲基化特异的引物。
在一个或多个实施方案中,通过杂交至少一种核酸或肽核酸来确定扩增产物的存在与否,所述至少一种核酸或肽核酸相同于、等同于或互补于选自所述一个或多个目标标志物的序列的至少16碱基长片段。
另一方面,本申请还提供衍生自一个或多个目标标志物的经处理的核酸在制备用于诊断癌症的试剂盒中的用途,其中所述处理适合于将所述一个或多个目标标志物的至少一个未甲基化的胞嘧啶碱基转化至尿嘧啶或在杂交上可检测地不同于胞嘧啶的其它碱基,所述一个或多个目标标志物如上述所述。
另一方面,本申请还提供用于检测并诊断个体癌症的装置,所述装置包括存储器、处理器以及存储在存储器上并可在处理器上运行的计算机程序,所述处理器执行所述程序时实现以下步骤:(1)获取样品中一个或多个目标标志物的至少一个CpG二核苷酸的甲基化水平或甲基化状态,和(2)根据(1)的甲基化水平或甲基化状态判读癌症;其中,所述一个或多个目标标志物如上述所述。
另一方面,本申请提供了一种储存介质,其记载可以运行本申请所述的方法的程序。
另一方面,本申请提供了一种设备,其包含本申请所述的储存介质,以及任选地还包含耦接至所述储存介质的处理器,所述处理器被配置为基于存储在所述储存介质中的程序执行以实现本申请所述的方法。
本领域技术人员能够从下文的详细描述中容易地洞察到本申请的其它方面和优势。下文的详细描述中仅显示和描述了本申请的示例性实施方式。如本领域技术人员将认识到的,本申请的内容使得本领域技术人员能够对所公开的具体实施方式进行改动而不脱离本申请所涉及发明的精神和范围。相应地,本申请的附图和说明书中的描述仅仅是示例性的,而非为限制性的。
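Brief Description of the Drawings
The specific features of the invention to which this application relates are set forth in the appended claims. The features and advantages of the invention can be better understood by reference to the exemplary embodiments described in detail below and to the drawings. The drawings are briefly described as follows: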
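Figure 1 shows the workflow for screening colorectal cancer methylation markers.
Figure 2 shows the distribution of methylation levels in colorectal cancer and non-colorectal cancer samples in the training set.
Figure 3 shows the distribution of methylation levels in colorectal cancer and non-colorectal cancer samples in the test set.
Figure 4 shows the distribution of prediction scores of the colorectal cancer ALLMODEL.
Figure 5 shows the ROC curve of the colorectal cancer ALLMODEL.
Figure 6 shows the distribution of prediction scores of the colorectal cancer SUBMODEL1.
Figure 7 shows the ROC curve of the colorectal cancer SUBMODEL1.
Figure 8 shows the distribution of prediction scores of the colorectal cancer SUBMODEL2.
Figure 9 shows the ROC curve of the colorectal cancer SUBMODEL2.
Figure 10 shows the distribution of prediction scores of the colorectal cancer SUBMODEL3.
Figure 11 shows the ROC curve of the colorectal cancer SUBMODEL3.
Figure 12 shows the flowchart for evaluating the performance of individual gastric cancer methylation markers.
Figure 13 shows the distribution of model prediction scores, in training-set and test-set samples, of the model built from all gastric cancer target markers.
Figure 14 shows the ROC curves, in training-set and test-set samples, of the model built from all gastric cancer target markers for diagnosing gastric cancer.
Figure 15 shows the distribution of model prediction scores, in training-set and test-set samples, of a model built from gastric cancer markers.
Figure 16 shows the ROC curves, in training-set and test-set samples, of a model built from gastric cancer markers for diagnosing gastric cancer.
Figure 17 shows the distribution of model prediction scores, in training-set and test-set samples, of a model built from gastric cancer markers.
Figure 18 shows the ROC curves, in training-set and test-set samples, of a model built from gastric cancer markers for diagnosing gastric cancer.
Figure 19 shows the distribution of model prediction scores, in training-set and test-set samples, of a model built from gastric cancer markers.
Figure 20 shows the ROC curves, in training-set and test-set samples, of a model built from gastric cancer markers for diagnosing gastric cancer.
Figure 21 shows the distribution of model prediction scores, in training-set and test-set samples, of a model built from gastric cancer markers.
Figure 22 shows the ROC curves, in training-set and test-set samples, of a model built from gastric cancer markers for diagnosing gastric cancer.
Figure 23 shows the ROC curve of the prediction model for diagnosing esophageal cancer.
Figure 24 shows the distribution of prediction scores of the esophageal cancer prediction model in each group.
Figure 25 shows the ROC curve of the prediction model combining 16 esophageal cancer methylation markers for diagnosing esophageal cancer.
Figure 26 shows the distribution of prediction scores in each group of the prediction model combining 16 esophageal cancer methylation markers.
Figure 27 shows the ROC curve of the prediction model combining 16 esophageal cancer methylation markers for diagnosing esophageal cancer.
Figure 28 shows the distribution of prediction scores in each group of the prediction model combining 16 esophageal cancer methylation markers.
Figure 29 shows the ROC curve of the prediction model for diagnosing esophageal cancer.
Figure 30 shows the distribution of prediction scores of the esophageal cancer prediction model in each group.
Figure 31 shows the ROC curve of the prediction model combining 7 esophageal cancer methylation markers for diagnosing esophageal cancer.
Figure 32 shows the distribution of prediction scores in each group of the prediction model combining 7 esophageal cancer methylation markers.
Figure 33 shows the ROC curve of the prediction model combining 7 esophageal cancer methylation markers for diagnosing esophageal cancer.
Figure 34 shows the distribution of prediction scores in each group of the prediction model combining 7 esophageal cancer methylation markers.
Figure 35 shows the ROC curve of the prediction model for diagnosing esophageal cancer.
Figure 36 shows the distribution of prediction scores of the esophageal cancer prediction model in each group.
Figure 37 shows the ROC curve of the prediction model combining 17 esophageal cancer methylation markers for diagnosing esophageal cancer.
Figure 38 shows the distribution of prediction scores in each group of the prediction model combining 17 esophageal cancer methylation markers.
Figure 39 shows the ROC curve of the prediction model combining 15 esophageal cancer methylation markers for diagnosing esophageal cancer.
Figure 40 shows the distribution of prediction scores in each group of the prediction model combining 15 esophageal cancer methylation markers.
Figure 41 shows the ROC curve of the prediction model for diagnosing liver cancer.
Figure 42 shows the distribution of prediction scores of the liver cancer prediction model in each group.
Figure 43 shows the ROC curve of the prediction model combining 25 liver cancer methylation markers for diagnosing liver cancer.
Figure 44 shows the distribution of prediction scores in each group of the prediction model combining 25 liver cancer methylation markers.
Figure 45 shows the ROC curve of the prediction model combining 52 liver cancer methylation markers for diagnosing liver cancer.
Figure 46 shows the distribution of prediction scores in each group of the prediction model combining 52 liver cancer methylation markers.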
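Detailed Description of Embodiments
The embodiments of the invention of the present application are described below by way of specific examples; those skilled in the art can readily understand other advantages and effects of the invention from the disclosure of this specification.
Definitions
It should be noted that, in the specification and claims of the present application, the singular forms "a", "an" and "the" include plural forms unless the context indicates otherwise. Thus, for example, "a reagent" includes a plurality of reagents.
In the specification and claims of the present application, unless otherwise stated, the terms "comprise", "include" and "contain" mean containing the listed values, steps or components without excluding other values, steps or components.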
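As used in the present application, the term "methylation marker" refers to a nucleic acid or gene region of interest, or a methylation site, whose methylation level, or whose score from a computational model based on methylation level, is indicative of cancer status. As used herein, the term "target marker" refers to a nucleic acid or gene region of interest whose methylation level indicates whether a subject has cancer. The term "methylation marker" or "target marker" should be understood to include all transcript variants thereof and all promoters and regulatory elements thereof. As those skilled in the art will appreciate, certain genes are known to exhibit allelic variation or single nucleotide polymorphisms ("SNPs") between individuals. SNPs include insertions and deletions of simple repeat sequences of various lengths (for example, dinucleotide and trinucleotide repeats). Accordingly, the present application should be understood to extend to all forms of the markers/genes arising from any other mutation, polymorphism or allelic variation. In addition, it should be understood that the term "methylation marker" includes both the sense strand sequence and the antisense strand sequence of the marker or gene.
The term "methylation marker" or "target marker" as used in the present application is interpreted broadly to include both 1) the original marker (in a particular methylation state) found in a biological sample or genomic DNA and 2) its treated sequence (for example, the corresponding region after bisulfite conversion or after treatment with a methylation-sensitive restriction endonuclease (MSRE)). The corresponding region after bisulfite conversion differs from the target marker in the genomic sequence in that one or more unmethylated cytosine residues are converted into uracil bases, thymine bases or other bases that differ from cytosine in hybridization behavior. The corresponding region after MSRE treatment differs from the target marker in the genomic sequence in that the sequence is cleaved at one or more MSRE cleavage sites. The methylation markers or target markers of the present application also include the corresponding regions obtained after non-enzymatic conversion (for example, bisulfite conversion) and after enzymatic conversion (for example, MSRE conversion).
In some embodiments, the target markers of the present application also include various variants of the genes above. Variants include nucleic acid sequences from the same region that have at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% sequence identity to the genes or regions described herein (that is, have one or more deletions, insertions, substitutions, inverted sequences and the like). The present disclosure should therefore be understood to extend to such variants that achieve the same result, notwithstanding the minor genetic variation in actual nucleic acid sequence between individuals.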
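As used herein, the term "percent (%) sequence identity" refers to the percentage of amino acid (or nucleic acid) residues in a candidate sequence that are identical to the amino acid (or nucleic acid) residues in a reference sequence after the sequences are aligned, with gaps introduced if necessary to maximize the number of identical amino acids (or nucleic acids). In other words, the percent (%) sequence identity of an amino acid sequence (or nucleic acid sequence) can be calculated by dividing the number of amino acid residues (or bases) identical to the reference sequence by the total number of amino acid residues (or bases) in the candidate sequence or the reference sequence, whichever is shorter. Conservative substitutions of amino acid residues may or may not be considered identical residues. Percent amino acid (or nucleic acid) sequence identity can be determined, for example, using publicly available tools such as BLASTN and BLASTp (available on the website of the U.S. National Center for Biotechnology Information (NCBI); see also Altschul S.F. et al., J. Mol. Biol., 215:403-410 (1990); Stephen F. et al., Nucleic Acids Res., 25:3389-3402 (1997)), ClustalW2 (available on the website of the European Bioinformatics Institute; see also Higgins D.G. et al., Methods in Enzymology, 266:383-402 (1996); Larkin M.A. et al., Bioinformatics (Oxford, England), 23(21):2947-8 (2007)), and ALIGN or Megalign (DNASTAR) software. Those skilled in the art may use the default parameters provided by these tools or may customize parameters suitable for the alignment, for example by selecting an appropriate algorithm.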
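The calculation described above (identical positions divided by the length of the shorter sequence) can be sketched for two sequences that have already been aligned. This is an illustration only: the function name and example sequences are hypothetical, gaps are assumed to be written as "-", and the alignment itself would come from a tool such as those named in the text.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity of two pre-aligned sequences (gaps written as '-').

    Identical, non-gap columns are divided by the ungapped length of the
    shorter sequence, following the definition in the text.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    a, b = aligned_a.upper(), aligned_b.upper()
    matches = sum(1 for x, y in zip(a, b) if x == y and x != "-")
    shorter = min(len(a.replace("-", "")), len(b.replace("-", "")))
    if shorter == 0:
        raise ValueError("sequences must contain at least one residue")
    return 100.0 * matches / shorter


# Hypothetical alignment: one mismatch and one gap over ten columns.
pid = percent_identity("ACGTACGT-A", "ACGTTCGTCA")
print(round(pid, 2))
```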
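The methylation markers or target markers of the present application also include the corresponding regions, covering 5 kb upstream of the start site and 5 kb downstream of the end site of the genes above, after non-enzymatic conversion (for example, bisulfite conversion) or after enzymatic treatment (for example, treatment with a methylation-sensitive restriction enzyme).
The "methylation level" described in the present application refers to the methylation level of the CpG site concerned, or the average methylation level of several or all CpG sites in the sequence concerned. In exemplary embodiments of the present invention, the methylation level of a site generally refers to the percentage of methylated C at that site; if all C at the CpG site are unmethylated, its methylation level is zero. The methylation level may also be the result of other types of calculation, which is within the knowledge of those skilled in the art. Moreover, an increase or decrease in the methylation level of a sequence does not mean that the methylation level of every CpG site in the region increases or decreases. The process of converting the results of a DNA methylation detection method (for example, reduced-representation methylation sequencing) into methylation levels is known in the art. For example, the methylation levels of the CpG sites detected in the promoter region of each gene are averaged, and the average is taken as the DNA methylation level of that gene's promoter region. In some embodiments, methylation levels are obtained by the MethylTitan (CN201910515830, 鹍远) methylation sequencing method. Methylation levels may be normalized.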
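The two quantities defined above, the per-site methylation level (percentage of methylated C) and the region-level average over CpG sites, can be sketched as follows. The function names and read counts are hypothetical, and taking the unweighted mean of per-site levels is one reasonable reading of "average methylation level"; the text notes that other calculations are possible.

```python
from typing import List, Tuple


def site_level(methylated_reads: int, total_reads: int) -> float:
    """Methylation level of one CpG site: percentage of reads showing a methylated C."""
    if total_reads <= 0:
        raise ValueError("a site needs at least one read")
    return 100.0 * methylated_reads / total_reads


def region_level(sites: List[Tuple[int, int]]) -> float:
    """Unweighted mean of per-site levels across the CpG sites of a region."""
    levels = [site_level(m, n) for m, n in sites]
    return sum(levels) / len(levels)


# Hypothetical promoter with three CpG sites: (methylated reads, total reads).
promoter_sites = [(0, 20), (15, 20), (30, 40)]
print([site_level(m, n) for m, n in promoter_sites], region_level(promoter_sites))
```

The first site illustrates the statement in the text that a site whose C are all unmethylated has a methylation level of zero.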
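The "methylation information" described in the present application includes characteristic information relating to the cytosines in a sequence that may be methylated. Such cytosines are usually the C in CpG. These characteristics include, but are not limited to: whether any cytosine (C) residue in the sequence is methylated; the position of one or more methylation sites (such as CpG dinucleotides) and/or their methylation level; the methylation level of any particular region of the nucleic acid; the frequency or percentage of methylated C; the relative concentration, absolute concentration or pattern of methylated or unmethylated C; the methylation haplotype ratio (MHL); the average methylation level (AMF); and allelic differences in methylation due, for example, to differences in the origin of the alleles. For example, if one or more cytosine (C) residues in a nucleic acid sequence are methylated, the sequence may be described as "hypermethylated" or as having "increased methylation", whereas if one or more cytosine (C) residues in a DNA sequence are unmethylated, the sequence may be described as "hypomethylated" or as having "decreased methylation".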
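Mathematical analysis can be performed on the methylation levels of the genes measured to obtain a score. The term "methylation score" denotes a value obtained by computing on methylation levels with a mathematical method (for example, a mathematical model). For a tested sample, when the score is greater than the threshold, the result is judged positive, that is, cancer, a risk of developing cancer, or a poor cancer prognosis; otherwise it is negative. Conventional methods of mathematical analysis and the process of determining the threshold are known in the art; exemplary methods are mathematical models, including but not limited to regression models, support vector machines and random forests. For example, for differentially methylated markers, a support vector machine (SVM) is built on the training-set samples; the model is used to compute the accuracy, sensitivity and specificity of the test results and the area under the receiver operating characteristic (ROC) curve (AUC), and prediction scores are computed for the test-set samples. As another example, a logistic regression model is built on the methylation levels of differentially methylated markers; the model is used to compute the accuracy, sensitivity and specificity of the test results and the area under the ROC curve (AUC), and prediction scores are computed for the test-set samples.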
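A minimal sketch of the scoring and evaluation steps just described: a logistic regression turns marker methylation levels into a score between 0 and 1, and the AUC summarizes how well the scores separate cancer from non-cancer samples. The weights, intercept and sample values below are invented for illustration and are not fitted coefficients from this application; the AUC is computed by the pairwise ranking definition rather than with a library call.

```python
import math
from typing import List, Sequence


def logistic_score(levels: Sequence[float], weights: Sequence[float], intercept: float) -> float:
    """Logistic regression score: sigmoid of the weighted sum of methylation levels."""
    z = intercept + sum(w * x for w, x in zip(weights, levels))
    return 1.0 / (1.0 + math.exp(-z))


def auc(scores: List[float], labels: List[int]) -> float:
    """AUC as the fraction of (cancer, non-cancer) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need both classes to compute an AUC")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical two-marker model and four samples (levels given as fractions).
weights, intercept = [4.0, 2.0], -3.0
samples = [[0.9, 0.8], [0.5, 0.4], [0.1, 0.2], [0.6, 0.3]]
labels = [1, 1, 0, 0]  # 1 = cancer, 0 = non-cancer

scores = [logistic_score(s, weights, intercept) for s in samples]
model_auc = auc(scores, labels)
print([round(s, 3) for s in scores], model_auc)
```

In practice the coefficients would be fitted on the training set and the threshold chosen there, with the test set used only for evaluation, as the text describes.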
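The term "subject" or "individual" as used in the present application includes humans and non-human animals. Non-human animals include all vertebrates, for example mammals and non-mammals. In some embodiments, the subject is a human.
In the present application, the term "gene" includes the coding and non-coding sequences of the gene concerned in the genome. Non-coding sequences include introns, promoters, and regulatory elements or sequences.
Molecular diagnosis in the present invention covers not only the early diagnosis of cancer (for example, colorectal cancer, gastric cancer, esophageal cancer and/or liver cancer) but also its late-stage diagnosis, as well as cancer screening, risk assessment, prognosis and disease identification. Early diagnosis refers to the possibility of detecting cancer before it develops and/or metastasizes, preferably before morphological changes in tissues or cells can be observed.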
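As used in the present application, the term "variant" or "mutant" refers to a polynucleotide whose nucleic acid sequence is changed relative to a reference sequence by the insertion, deletion or substitution of one or more nucleotides while retaining its ability to hybridize with other nucleic acids. A mutant according to any embodiment of the present application includes a nucleotide sequence that has at least 70%, preferably at least 80%, preferably at least 85%, preferably at least 90%, preferably at least 95%, preferably at least 97% sequence identity to the reference sequence and retains the biological activity of the reference sequence. The sequence identity between two aligned sequences can be calculated using, for example, NCBI BLASTn. Mutants also include nucleotide sequences that have one or more mutations (insertions, deletions or substitutions) in the nucleotide sequence of the reference sequence while still retaining the biological activity of the reference sequence. "More" mutations usually means no more than 1-10, for example 1-8, 1-5 or 1-3. A substitution may be between a purine nucleotide and a pyrimidine nucleotide, or between purine nucleotides or between pyrimidine nucleotides. Substitutions are preferably conservative. For example, it is known in the art that conservative substitution with nucleotides of similar properties usually does not change the stability or function of a polynucleotide. Conservative substitutions include, for example, exchanges between purine nucleotides (A and G) and exchanges between pyrimidine nucleotides (T or U and C). Therefore, replacing one or several positions in a polynucleotide of the present invention with residues of the same class will not substantially affect its activity. In addition, the methylation sites (for example, consecutive CG) in the variants of the present invention are not mutated. That is, the method of the present invention detects the methylation of the methylatable sites in the corresponding sequence, and bases at non-methylatable sites may be mutated. Typically, a methylation site is a consecutive CpG dinucleotide.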
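The two conditions placed on a variant here, a minimum sequence identity (at least 70%) and unmutated CpG sites, can be checked together. The sketch below is deliberately simplified and hypothetical: it handles only substitution variants of the same length as the reference (no insertions or deletions, which would need an alignment first), and the function names and sequences are invented.

```python
from typing import List


def cpg_positions(seq: str) -> List[int]:
    """0-based positions of the C in every CG dinucleotide."""
    s = seq.upper()
    return [i for i in range(len(s) - 1) if s[i:i + 2] == "CG"]


def keeps_cpg_sites(reference: str, variant: str) -> bool:
    """True if every CpG of the reference is still a CpG at the same position in the variant."""
    v = variant.upper()
    return all(v[i:i + 2] == "CG" for i in cpg_positions(reference))


def is_acceptable_variant(reference: str, variant: str, min_identity: float = 70.0) -> bool:
    """Substitution-only check: identity threshold met and all reference CpG sites preserved."""
    if len(reference) != len(variant):
        raise ValueError("this sketch only handles equal-length (substitution) variants")
    same = sum(a == b for a, b in zip(reference.upper(), variant.upper()))
    identity = 100.0 * same / len(reference)
    return identity >= min_identity and keeps_cpg_sites(reference, variant)


reference = "ATCGGACGTA"    # CpG sites at positions 2 and 6
variant_ok = "ATCGGTCGTA"   # A->T at a non-CpG position
variant_bad = "ATTGGACGTA"  # C->T destroys the CpG at position 2
print(is_acceptable_variant(reference, variant_ok), is_acceptable_variant(reference, variant_bad))
```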
如本申请所述,DNA或RNA的碱基可发生转化。本申请所述“转化”、“胞嘧啶转化”或“CT转化”是利用非酶促或酶促方法处理DNA,将未修饰的胞嘧啶碱基(cytosine,C)转化为与鸟嘌呤结合能力低于胞嘧啶的碱基(例如尿嘧啶碱基(uracil,U))的过程。本领域周知进行胞嘧啶转化的非酶促或酶促方法。示例性地,非酶促方法包括使用转化试剂例如亚硫酸氢盐、酸式亚硫酸盐或焦亚硫酸盐处理,例如亚硫酸氢钙、亚硫酸氢钠、亚硫酸氢钾、亚硫酸氢铵、亚硫酸氢镁、亚硫酸氢铝、亚硫酸氢根离子、重硫酸钠、重硫酸钾和重硫酸铵,及其任意组合。示例性地,酶促方法包括脱氨酶处理。经转化的DNA任选经纯化。适用于本申请的DNA纯化方法本领域周知。
本发明中的“诊断”,除了结直肠癌的早期诊断,还包括结直肠癌晚期诊断,且也包括结直肠癌筛选、风险评估、预后、疾病识别。早期诊断指的是在转移之前发现癌症的可能性,优选在可观察到组织或者细胞的形态学变化之前。
发明详述
I标志物
a)结直肠癌甲基化标志物
发明人经过研究,从大量基因中筛选出47个基因,发现这些基因(例如启动子区域)的甲基化水平与结直肠癌的性质有关:TTLL10、ST6GALNAC5、KCNA3、CACNA1E、TRAPPC12、UBE2F、ZIC4、ZNF595、EVC2、HMX1、PITX2、POU4F2、IRX4、IRX1、CRHBP、KCNMB1、KCNQ5、TBX20、ACTR3C、ACTR3B、VIPR2、SOX17、MOS、PREX2、GDF6、OSR2、BARX1、SORCS3、VAX1、DPYSL4、UTF1、B3GAT1、HOXC13、CUX2、GLT1D1、ITGBL1、SKOR1、TM6SF1、LRRK1、FOXL1、MYO15B、DNM2、ZNF536、YTHDF1、SIM2。本发明提供了对样品(特别是血液)的上述基因进行甲基化检测,基于其甲基化水平利用数学模型分辨结直肠癌,实现结直肠癌非侵入性精准诊断的目的。
因此,本申请中,结直肠癌的甲基化标志物包括DNA序列以及该DNA序列的上游5kb和下游5kb、或其片段、或其中一个或多个CpG二核苷酸,所述DNA序列包括上述基因序列中的1、2、3、4、5、6、7、8、9、10、11、12、13、14、15、16、17、18、19、20、21、22、23、24、25、26、27、28、29、30、31、32、33、34、35、36、37、38、39、40、41、42、43、44、45、46或47种,例如至少6个、至少7个、至少8个或至少9个。在一个或多个实施方案中,所述DNA序列包括选自CACNA1E、PITX2、CRHBP、TBX20、SORCS3、B3GAT1、GLT1D1和LRRK1的1、2、3、4、5、6、7或8种,任选还包括(p)中的其他基因序列中的一种或多种或全部。在一个或多个实施方案中,所述DNA序列包括选自TTLL10、ACTR3B、 BARX1、CUX2、DNM2和SIM2的1、2、3、4、5或6种,任选还包括(p)中的其他基因序列中的一种或多种或全部。在一个或多个实施方案中,所述DNA序列包括选自UBE2F、HMX1、IRX4、IRX1、VIPR2、OSR2和MYO15B的1、2、3、4、5、6或7种,任选还包括(p)中的其他基因序列中的一种或多种或全部。本发明提供这些标志物及其检测试剂在筛查结直肠癌风险、诊断结直肠癌、评估结直肠癌预后中的用途和方法。本申请中所使用的术语“结直肠癌”具有本领域通常的含义,包括存在于结肠、直肠和/或阑尾的肿瘤。
在一个或多个实施方案中,结直肠癌的性质与上述基因的片段的甲基化有关。这样的片段可以来自一种或多种所述基因序列。所述片段的长度为1bp-1kb,优选1bp-700bp;所述片段包含相应基因的染色体区域中的一个或多个甲基化位点。所述片段例如是上述基因的启动子区域。通常,转录起始位点(Transcription Start Sites,TSS)上游1k bp、下游200bp的DNA序列界定为启动子区。如果一个基因有多个转录本(即有多个启动子区),则可选择其中任意启动子区。在一些实施方案中,检测的片段含有至少3个CpG二核苷酸。因此,进一步地,结直肠癌的性质与表5所示的各基因的SEQ ID NO:1-47所示的片段的甲基化水平相关。
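The promoter definition above (1 kb upstream and 200 bp downstream of the TSS) can be expressed as a coordinate calculation. The strand handling in this sketch is an assumption: "upstream" is taken relative to the direction of transcription, so the window is mirrored for minus-strand genes. The TSS positions used are made up for illustration.

```python
def promoter_region(chrom: str, tss: int, strand: str,
                    upstream: int = 1000, downstream: int = 200):
    """Return (chrom, start, end) of the promoter window around a TSS."""
    if strand == "+":
        start, end = tss - upstream, tss + downstream
    elif strand == "-":
        # Upstream of a minus-strand gene lies at higher coordinates
        start, end = tss - downstream, tss + upstream
    else:
        raise ValueError("strand must be '+' or '-'")
    return chrom, max(start, 1), end

print(promoter_region("chr1", 10000, "+"))
print(promoter_region("chr1", 10000, "-"))
```

When a gene has several transcripts, the same function can be applied to whichever TSS is chosen, as the text allows.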
本申请所述“结直肠癌相关序列”包括上述47个基因中任意、其上游或下游20kb以内(优选5kb以内)的序列、或它们的片段、或上述47个序列(SEQ ID NO:1-47)或其互补序列的任意组合。在公共数据库(例如NCBI网站)中可以获得上述基因在Hg19基因组中的序列,以及各基因上游或下游20kb的序列。
上述基因在人染色体中的位置如下表5所示,其中碱基编号对应于参考基因组HG19:
表5甲基化标志物基因及位置

b)胃癌甲基化标志物
本申请所述的胃癌甲基化标志物选自下组基因序列(Hg19坐标)中的任意1、2、3、4、5、6、7、8、9、10、11、12、13、14、15、16、17、18、19、20、21、22、23、24、25、26、27、28、29、30、31、32、33、34、35、36、37、38、39、40、41、42、43、44、45、46、47个或全部48个:
含chr6:166970625:166970825(SEQ ID NO:48)及其上游5kb以内和/或下游5kb以内的序列;
a sequence comprising chr11:11600237:11600617 (SEQ ID NO:49) and the sequence within 5 kb upstream and/or within 5 kb downstream thereof;
含chr17:76929754:76929954(SEQ ID NO:50)及其上游5kb以内和/或下游5kb以内的序列;
含chr6:391738:391938(SEQ ID NO:51)及其上游5kb以内和/或下游5kb以内的序列;
含chr12:2282090:2282290(SEQ ID NO:52)及其上游5kb以内和/或下游5kb以内的序列;
含chr2:177030134:177030449(SEQ ID NO:53)及其上游5kb以内和/或下游5kb以内的序列;
含chr7:35301095:35301411(SEQ ID NO:54)及其上游5kb以内和/或下游5kb以内的序列;
含chr7:8482114:8482413(SEQ ID NO:55)及其上游5kb以内和/或下游5kb以内的序列;
含chr2:72371208:72371433(SEQ ID NO:56)及其上游5kb以内和/或下游5kb以内的序列;
含chr5:134364359:134364559(SEQ ID NO:57)及其上游5kb以内和/或下游5kb以内的序列;
含chr10:118892523:118892723(SEQ ID NO:58)及其上游5kb以内和/或下游5kb以内的序列;
含chr12:113901298:113901498(SEQ ID NO:59)及其上游5kb以内和/或下游5kb以内的序列;
含chr8:143613755:143613955(SEQ ID NO:60)及其上游5kb以内和/或下游5kb以内的序列;
含chr8:20375580:20375780(SEQ ID NO:61)及其上游5kb以内和/或下游5kb以内的序列;
含chr7:107499318:107499518(SEQ ID NO:62)及其上游5kb以内和/或下游5kb以内的序列;
含chr6:1378941:1379141(SEQ ID NO:63)及其上游5kb以内和/或下游5kb以内的序列;
含chr15:34786976:34787337(SEQ ID NO:64)及其上游5kb以内和/或下游5kb以内的序列;
含chr1:156405314:156405514(SEQ ID NO:65)及其上游5kb以内和/或下游5kb以内的序列;
a sequence comprising chr8:10588811:10589173 (SEQ ID NO:66) and the sequence within 5 kb upstream and/or within 5 kb downstream thereof;
含chr4:85418610:85418919(SEQ ID NO:67)及其上游5kb以内和/或下游5kb以内的序列;
含chr5:140871317:140871517(SEQ ID NO:68)及其上游5kb以内和/或下游5kb以内的序列;
含chr5:92906255:92906617(SEQ ID NO:69)及其上游5kb以内和/或下游5kb以内的序列;
含chr14:57265398:57265598(SEQ ID NO:70)及其上游5kb以内和/或下游5kb以内的序列;
含chr19:19650947:19651147(SEQ ID NO:71)及其上游5kb以内和/或下游5kb以内的序列;
含chr11:20618486:20618686(SEQ ID NO:72)及其上游5kb以内和/或下游5kb以内的序列;
含chr7:73407894:73408161(SEQ ID NO:73)及其上游5kb以内和/或下游5kb以内的序列;
含chr16:82660460:82660774(SEQ ID NO:74)及其上游5kb以内和/或下游5kb以内的序列;
含chr13:24844736:24844936(SEQ ID NO:75)及其上游5kb以内和/或下游5kb以内的序列;
含chr20:55500358:55500677(SEQ ID NO:76)及其上游5kb以内和/或下游5kb以内的序列;
含chr10:123923943:123924143(SEQ ID NO:77)及其上游5kb以内和/或下游5kb以内的序列;
含chr20:59827678:59827907(SEQ ID NO:78)及其上游5kb以内和/或下游5kb以内的序列;
含chr20:62330559:62330808(SEQ ID NO:79)及其上游5kb以内和/或下游5kb以内的序列;
含chr19:13209774:13209974(SEQ ID NO:80)及其上游5kb以内和/或下游5kb以内的序列;
含chr16:2085778:2086156(SEQ ID NO:81)及其上游5kb以内和/或下游5kb以内的序列;
含chr6:108488634:108488917(SEQ ID NO:82)及其上游5kb以内和/或下游5kb以内的序列;
a sequence comprising chr12:115124911:115125191 (SEQ ID NO:83) and the sequence within 5 kb upstream and/or within 5 kb downstream thereof;
含chr10:124896740:124897020(SEQ ID NO:84)及其上游5kb以内和/或下游5kb以内的序列;
含chr14:55243006:55243206(SEQ ID NO:85)及其上游5kb以内和/或下游5kb以内的序列;
含chr13:36729096:36729334(SEQ ID NO:86)及其上游5kb以内和/或下游5kb以内的序列;
含chr2:10444997:10445197(SEQ ID NO:87)及其上游5kb以内和/或下游5kb以内的序列;
含chr9:2157701:2157901(SEQ ID NO:88)及其上游5kb以内和/或下游5kb以内的序列;
含chr12:57529619:57529819(SEQ ID NO:89)及其上游5kb以内和/或下游5kb以内的序列;
含chr1:119527250:119527450(SEQ ID NO:90)及其上游5kb以内和/或下游5kb以内的序列;
含chr1:119532788:119532988(SEQ ID NO:91)及其上游5kb以内和/或下游5kb以内的序列;
含chr15:96909441:96909641(SEQ ID NO:92)及其上游5kb以内和/或下游5kb以内的序列;
含chr1:146551463:146551747(SEQ ID NO:93)及其上游5kb以内和/或下游5kb以内的序列;
含chr17:35293755:35293955(SEQ ID NO:94)或其上下游各5kb以内的序列;和
含chr17:59482763:59482963(SEQ ID NO:95)或其上下游各5kb以内的序列。
在一些实施方案中,本申请所述的一个或多个胃癌甲基化标志物包括:含chr17:76929754:76929954(SEQ ID NO:50)及其上游5kb以内和/或下游5kb以内的序列;含chr7:8482114:8482413(SEQ ID NO:55)及其上游5kb以内和/或下游5kb以内的序列;含chr8:143613755:143613955(SEQ ID NO:60)及其上游5kb以内和/或下游5kb以内的序列;含chr7:107499318:107499518(SEQ ID NO:62)及其上游5kb以内和/或下游5kb以内的序列;含chr15:34786976:34787337(SEQ ID NO:64)及其上游5kb以内和/或下游5kb以内的序列;含chr8:10588811:10589173(SEQ ID NO:66)及其上游5kb以内和/或下游5kb以内的序列;含chr5:92906255:92906617(SEQ ID NO:69)及其上游5kb以内和/或下游5kb以内的序列;含chr11:20618486:20618686(SEQ ID NO:72)及其上游5kb以内和/或下游5kb以内的序列;含chr20:55500358:55500677(SEQ ID NO:76)及其上游5kb以内和/或下游5kb以内的序列;含chr20:59827678:59827907(SEQ ID NO:78)及其上游5kb以内和/或下游5kb以内的序列;含chr10:124896740: 124897020(SEQ ID NO:84)及其上游5kb以内和/或下游5kb以内的序列;含chr14:55243006:55243206(SEQ ID NO:85)及其上游5kb以内和/或下游5kb以内的序列;含chr2:10444997:10445197(SEQ ID NO:87)及其上游5kb以内和/或下游5kb以内的序列;含chr9:2157701:2157901(SEQ ID NO:88)及其上游5kb以内和/或下游5kb以内的序列;含chr12:57529619:57529819(SEQ ID NO:89)及其上游5kb以内和/或下游5kb以内的序列;含chr1:119527250:119527450(SEQ ID NO:90)及其上游5kb以内和/或下游5kb以内的序列;含chr15:96909441:96909641(SEQ ID NO:92)及其上游5kb以内和/或下游5kb以内的序列;含chr17:35293755:35293955(SEQ ID NO:94)或其上下游各5kb以内的序列;和含chr17:59482763:59482963(SEQ ID NO:95)或其上下游各5kb以内的序列。
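In some embodiments, the one or more gastric cancer methylation markers described in this application include: a sequence comprising chr11:11600237:11600617 (SEQ ID NO:49) and the sequence within 5 kb upstream and/or within 5 kb downstream thereof; a sequence comprising chr2:177030134:177030449 (SEQ ID NO:53) and the sequence within 5 kb upstream and/or within 5 kb downstream thereof; a sequence comprising chr7:35301095:35301411 (SEQ ID NO:54) and the sequence within 5 kb upstream and/or within 5 kb downstream thereof; a sequence comprising chr7:8482114:8482413 (SEQ ID NO:55) and the sequence within 5 kb upstream and/or within 5 kb downstream thereof; a sequence comprising chr12:113901298:113901498 (SEQ ID NO:59) and the sequence within 5 kb upstream and/or within 5 kb downstream thereof; a sequence comprising chr7:107499318:107499518 (SEQ ID NO:62) and the sequence within 5 kb upstream and/or within 5 kb downstream thereof; a sequence comprising chr8:10588811:10589173 (SEQ ID NO:66) and the sequence within 5 kb upstream and/or within 5 kb downstream thereof; a sequence comprising chr11:20618486:20618686 (SEQ ID NO:72) and the sequence within 5 kb upstream and/or within 5 kb downstream thereof; a sequence comprising chr13:24844736:24844936 (SEQ ID NO:75) and the sequence within 5 kb upstream and/or within 5 kb downstream thereof; a sequence comprising chr20:62330559:62330808 (SEQ ID NO:79) and the sequence within 5 kb upstream and/or within 5 kb downstream thereof; a sequence comprising chr19:13209774:13209974 (SEQ ID NO:80) and the sequence within 5 kb upstream and/or within 5 kb downstream thereof; a sequence comprising chr12:115124911:115125191 (SEQ ID NO:83) and the sequence within 5 kb upstream and/or within 5 kb downstream thereof; a sequence comprising chr10:124896740:124897020 (SEQ ID NO:84) and the sequence within 5 kb upstream and/or within 5 kb downstream thereof; a sequence comprising chr2:10444997:10445197 (SEQ ID NO:87) and the sequence within 5 kb upstream and/or within 5 kb downstream thereof; a sequence comprising chr12:57529619:57529819 (SEQ ID NO:89) and the sequence within 5 kb upstream and/or within 5 kb downstream thereof; a sequence comprising chr1:119527250:119527450 (SEQ ID NO:90) and the sequence within 5 kb upstream and/or within 5 kb downstream thereof; a sequence comprising chr1:119532788:119532988 (SEQ ID NO:91) and the sequence within 5 kb upstream and/or within 5 kb downstream thereof; a sequence comprising chr1:146551463:146551747 (SEQ ID NO:93) and the sequence within 5 kb upstream and/or within 5 kb downstream thereof; and a sequence comprising chr17:59482763:59482963 (SEQ ID NO:95) or the sequence within 5 kb upstream and downstream thereof.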
在一些实施方案中,本申请所述的一个或多个胃癌甲基化标志物包括:含chr17:76929754:76929954(SEQ ID NO:50)及其上游5kb以内和/或下游5kb以内的序列;含chr8:143613755:143613955(SEQ ID NO:60)及其上游5kb以内和/或下游5kb以内的序列;含chr8:20375580:20375780(SEQ ID NO:61)及其上游5kb以内和/或下游5kb以内的序列;含chr4:85418610:85418919(SEQ ID NO:67)及其上游5kb以内和/或下游5kb以内的序列;含chr5:92906255: 92906617(SEQ ID NO:69)及其上游5kb以内和/或下游5kb以内的序列;含chr13:24844736:24844936(SEQ ID NO:75)及其上游5kb以内和/或下游5kb以内的序列;含chr10:123923943:123924143(SEQ ID NO:77)及其上游5kb以内和/或下游5kb以内的序列;和含chr12:115124911:115125191(SEQ ID NO:83)及其上游5kb以内和/或下游5kb以内的序列。
在一些实施方案中,本申请所述的一个或多个胃癌甲基化标志物包括:含chr17:76929754:76929954(SEQ ID NO:50)及其上游5kb以内和/或下游5kb以内的序列;含chr8:143613755:143613955(SEQ ID NO:60)及其上游5kb以内和/或下游5kb以内的序列;含chr16:82660460:82660774(SEQ ID NO:74)及其上游5kb以内和/或下游5kb以内的序列;含chr10:123923943:123924143(SEQ ID NO:77)及其上游5kb以内和/或下游5kb以内的序列;和含chr6:108488634:108488917(SEQ ID NO:82)及其上游5kb以内和/或下游5kb以内的序列。
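In some embodiments, the Hg19 coordinate regions of the one or more gastric cancer methylation markers described in this application are selected from any one of the following sequences or any combination of several of them: a sequence comprising chr7:35301095:35301411 (SEQ ID NO:54) and the sequence within 5 kb upstream and/or within 5 kb downstream thereof; a sequence comprising chr8:20375580:20375780 (SEQ ID NO:61) and the sequence within 5 kb upstream and/or within 5 kb downstream thereof; a sequence comprising chr5:92906255:92906617 (SEQ ID NO:69) and the sequence within 5 kb upstream and/or within 5 kb downstream thereof; a sequence comprising chr7:73407894:73408161 (SEQ ID NO:73) and the sequence within 5 kb upstream and/or within 5 kb downstream thereof; a sequence comprising chr6:108488634:108488917 (SEQ ID NO:82) and the sequence within 5 kb upstream and/or within 5 kb downstream thereof; a sequence comprising chr14:55243006:55243206 (SEQ ID NO:85) and the sequence within 5 kb upstream and/or within 5 kb downstream thereof; a sequence comprising chr2:10444997:10445197 (SEQ ID NO:87) and the sequence within 5 kb upstream and/or within 5 kb downstream thereof; a sequence comprising chr1:119527250:119527450 (SEQ ID NO:90) and the sequence within 5 kb upstream and/or within 5 kb downstream thereof; a sequence comprising chr17:35293755:35293955 (SEQ ID NO:94) or the sequence within 5 kb upstream and downstream thereof; and a sequence comprising chr17:59482763:59482963 (SEQ ID NO:95) or the sequence within 5 kb upstream and downstream thereof.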
所述染色体坐标与2009年2月发布的人类基因组数据库Hg19版本一致(在本申请中称为“Hg19坐标”)。在一些实施方案中,本申请所述的胃癌甲基化标志物包括上述SEQ ID NO:48-95各序列各起始位点的上游3kb以内、优选2kb以内、更优选1kb以内、更优选500bp以内、更优选300bp以内、更优选100bp以内的序列和/或各末端位点的下游3kb以内、优选2kb以内、优选1kb以内、优选500bp以内、优选300bp以内、优选100bp以内的序列。在一些实施方案中,本申请所述的胃癌甲基化标志物是含有上述SEQ ID NO:48-95任一序列且长度为1000bp以内、优选600bp以内、更优选400bp以内的基因序列。
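The flanked marker regions described above can be derived from the listed Hg19 coordinates by simple arithmetic. The sketch below parses the "chr:start:end" notation used in this section and widens a region by a chosen flank (5 kb, 3 kb, 1 kb, and so on). It assumes 1-based inclusive coordinates and does not clip at the chromosome end; both are simplifying assumptions, and the function names are hypothetical.

```python
def parse_region(text: str):
    """Parse 'chr6:166970625:166970825' into (chrom, start, end)."""
    chrom, start, end = text.split(":")
    return chrom, int(start), int(end)

def with_flanks(region, flank: int = 5000):
    """Extend a region by `flank` bases on both sides, never below position 1."""
    chrom, start, end = region
    return chrom, max(start - flank, 1), end + flank

core = parse_region("chr6:166970625:166970825")  # SEQ ID NO:48 in the list above
print(with_flanks(core))             # 5 kb on each side
print(with_flanks(core, flank=500))  # narrower 500 bp window
```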
在一些实施方案中,本申请所述的胃癌甲基化标志物选自SEQ ID NO:48-95中的任意1、2、3、4、5、6、7、8、9、10、11、12、13、14、15、16、17、18、19、20、21、22、23、24、25、26、27、28、29、30、31、32、33、34、35、36、37、38、39、40、41、42、43、44、45、46、47条或全部48条序列。
在一些实施方案中,本申请所述的胃癌甲基化标志物包括:SEQ ID NO:50、SEQ ID NO:55、 SEQ ID NO:60、SEQ ID NO:62、SEQ ID NO:64、SEQ ID NO:66、SEQ ID NO:69、SEQ ID NO:72、SEQ ID NO:76、SEQ ID NO:78、SEQ ID NO:84、SEQ ID NO:85、SEQ ID NO:87、SEQ ID NO:88、SEQ ID NO:89、SEQ ID NO:90、SEQ ID NO:92、SEQ ID NO:94和SEQ ID NO:95。
在一些实施方案中,本申请所述的胃癌甲基化标志物包括:SEQ ID NO:49、SEQ ID NO:53、SEQ ID NO:54、SEQ ID NO:55、SEQ ID NO:59、SEQ ID NO:62、SEQ ID NO:66、SEQ ID NO:74、SEQ ID NO:75、SEQ ID NO:79、SEQ ID NO:80、SEQ ID NO:83、SEQ ID NO:84、SEQ ID NO:87、SEQ ID NO:89、SEQ ID NO:90、SEQ ID NO:91、SEQ ID NO:93和SEQ ID NO:95。
在一些实施方案中,本申请所述的胃癌甲基化标志物包括:SEQ ID NO:50、SEQ ID NO:60、SEQ ID NO:61、SEQ ID NO:67、SEQ ID NO:69、SEQ ID NO:75、SEQ ID NO:77和SEQ ID NO:83。
在一些实施方案中,本申请所述的胃癌甲基化标志物包括:SEQ ID NO:50、SEQ ID NO:60、SEQ ID NO:74、SEQ ID NO:77和SEQ ID NO:82。
在一些实施方案中,本申请所述的胃癌甲基化标志物包括以下序列中的任意一条或任意多条的组合:SEQ ID NO:54、SEQ ID NO:61、SEQ ID NO:69、SEQ ID NO:73、SEQ ID NO:82、SEQ ID NO:85、SEQ ID NO:87、SEQ ID NO:90、SEQ ID NO:94和SEQ ID NO:95。
In some embodiments, the gastric cancer methylation markers described in this application include SEQ ID NO:87 and any one or more of SEQ ID NO:1-39 and 41-48.
在一些实施方案中,本申请所述的胃癌甲基化标志物包括SEQ ID NO:94,以及SEQ ID NO:1-46和48中的任意一条或多条。
在一些实施方案中,本申请所述的胃癌甲基化标志物包括SEQ ID NO:90,以及SEQ ID NO:1-42和44-48中的任意一条或多条。
在一些实施方案中,本申请所述的胃癌甲基化标志物包括SEQ ID NO:73,以及SEQ ID NO:1-25和27-48中的任意一条或多条。
在一些实施方案中,本申请所述的胃癌甲基化标志物包括SEQ ID NO:82,以及SEQ ID NO:1-34和36-48中的任意一条或多条。
在一些实施方案中,本申请所述的胃癌甲基化标志物包括SEQ ID NO:61,以及SEQ ID NO:1-13和15-48中的任意一条或多条。
在一些实施方案中,本申请所述的胃癌甲基化标志物包括SEQ ID NO:85,以及SEQ ID NO:1-37和39-48中的任意一条或多条。
在一些实施方案中,本申请所述的胃癌甲基化标志物包括SEQ ID NO:67,以及SEQ ID NO:1-21和23-48中的任意一条或多条。
在一些实施方案中,本申请所述的胃癌甲基化标志物包括SEQ ID NO:54,以及SEQ ID NO:1-6和8-48中的任意一条或多条。
在一些实施方案中,本申请所述的胃癌甲基化标志物包括SEQ ID NO:95,以及SEQ ID NO:1-47中的任意一条或多条。
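The specific nucleotide sequences at the above Hg19 coordinates, as well as the 5 kb upstream of each start site and the 5 kb downstream of each end site of every region, can be obtained from public databases (for example, the UCSC Genome Browser, Ensembl, and the NCBI website).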
c)食管癌甲基化标志物
本申请提供了一种评估食管癌的存在和/或进展的方法,包含确定待测样本中选自本申请的染色体范围编号1至43的DNA区域、或其互补区域、或上述的片段的修饰状态的存在和/或含量。例如,本申请的方法确定待测样本中的1个或更多个选自本申请的染色体范围编号1至43的DNA区域。例如,本申请的方法确定待测样本中的1个、2个、3个、4个、5个、6个、7个、8个、9个、10个、11个、12个、13个、14个、15个、16个、17个、18个、19个、20个、21个、22个、23个、24个、25个、26个、27个、28个、29个、30个、31个、32个、33个、34个、35个、36个、37个、38个、39个、40个、41个、42个、或43个选自本申请的染色体范围编号1至43的DNA区域。上述染色体范围编号1至43的DNA区域中的一个或多个可以是食管癌的甲基化标志物。
本申请提供了一种评估食管癌的存在和/或进展的方法,包含确定待测样本中选自SEQ ID NO:96至138中任一项所示上游或下游5k bp以内的DNA区域、或其互补区域、或上述的片段的修饰状态的存在和/或含量。例如,本申请的方法确定待测样本中的1个、2个、3个、4个、5个、6个、7个、8个、9个、10个、11个、12个、13个、14个、15个、16个、17个、18个、19个、20个、21个、22个、23个、24个、25个、26个、27个、28个、29个、30个、31个、32个、33个、34个、35个、36个、37个、38个、39个、40个、41个、42个、或43个选自SEQ ID NO:96至138中任一项所示上游或下游5k bp以内的DNA区域。上述一个或多个选自SEQ ID NO:96至138中任一项所示上游或下游5k bp以内的DNA区域或片段可以是食管癌的甲基化标志物。
本申请提供了一种评估食管癌的存在和/或进展的方法,包含确定待测样本中选自SEQ ID NO:105上游或下游5k bp以内的区域以及本申请基因编号为1至76的基因所在的DNA区域、或其片段的修饰状态的存在和/或含量。例如,本申请的方法确定待测样本中选自SEQ ID NO:105上游或下游5k bp以内的区域以及本申请基因编号为1至76的基因的1个、2个、3个、4个、5个、6个、7个、8个、9个、10个、11个、12个、13个、14个、15个、16个、17个、18个、19个、20个、21个、22个、23个、24个、25个、26个、27个、28个、 29个、30个、31个、32个、33个、34个、35个、36个、37个、38个、39个、40个、41个、42个、43个、44个、45个、46个、47个、48个、49个、50个、51个、52个、53个、54个、55个、56个、57个、58个、59个、60个、61个、62个、63个、64个、65个、66个、67个、68个、69个、70个、71个、72个、73个、74个、75个、或76个基因所在的DNA区域、或其片段的修饰状态的存在和/或含量。上述选自SEQ ID NO:105上游或下游5k bp以内的区域以及本申请基因编号为1至76的基因所在的DNA区域或片段可以是食管癌的甲基化标志物。
本申请提供了一种评估食管癌的存在和/或进展的方法,包含确定待测样本中选自本申请的染色体范围编号3、5、7、8、12、13、14、16、17、18、19、21、22、26、28、29、30、32、33、35、36、38、39、40、41、42和43的DNA区域、或其互补区域、或上述的片段的修饰状态的存在和/或含量。例如,本申请的方法确定待测样本中的1个或更多个选自本申请的染色体范围编号1至27的DNA区域。例如,本申请的方法确定待测样本中的1个、2个、3个、4个、5个、6个、7个、8个、9个、10个、11个、12个、13个、14个、15个、16个、17个、18个、19个、20个、21个、22个、23个、24个、25个、26个、或27个选自本申请的染色体范围编号1至27的DNA区域。
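This application provides a method for assessing the presence and/or progression of esophageal cancer, comprising determining, in a sample to be tested, the presence and/or amount of the modification status of a DNA region within 5 kbp upstream or downstream of any one of SEQ ID NO:98, 100, 102, 103, 107, 108, 109, 111, 112, 113, 114, 116, 117, 121, 123, 124, 125, 127, 128, 130, 131, 133, 134, 135, 136, 137, and 138, or a complementary region thereof, or a fragment of the foregoing. For example, the method of this application determines 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, or 27 DNA regions in the sample, each within 5 kbp upstream or downstream of any one of the SEQ ID NOs listed above.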
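This application provides a method for assessing the presence and/or progression of esophageal cancer, comprising determining, in a sample to be tested, the presence and/or amount of the modification status of the DNA region in which a gene selected from the following group is located, or a fragment thereof: IRF2BP2, LZTS1, DMRTA2, CLVS1, CSNK2A3, HOXD12, FAM109A, HOXD13, FKBP4, TOMM20, AVPR1A, ELAVL4, CARKD, GALNT18, DLL4, CUX2, NR2F2, CACNA1C, ZSCAN10, CARS2, MTSS1L, VPS18, TIMP2, IL32, TBCD, VAC14, RAB3D, LGALS3BP, HOXD10, ZNF750, HOXD1, HOXD11, FAM150B, HOXD4, TNFRSF6B, TMEM18, ETV5, ARFRP1, BDH1, DGKG, PCDHGC5, DIAPH1, MPC1, RPS6KA2, ELN, WBSCR28, SOX7, and CHD7. For example, the method of this application determines the presence and/or amount of the modification status of the DNA regions, or fragments thereof, in which 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, or 48 genes selected from the above genes are located.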
本申请提供了一种评估食管癌的存在和/或进展的方法,包含确定待测样本中选自本申请的染色体范围编号1、2、4、6、9、10、11、15、20、23、24、25、26、27、30、31、34、35、37、38、39、40、和42的DNA区域、或其互补区域、或上述的片段的修饰状态的存在和/或含量。例如,本申请的方法确定待测样本中的1个或更多个选自本申请的染色体范围编号1至23的DNA区域。例如,本申请的方法确定待测样本中的1个、2个、3个、4个、5个、6个、7个、8个、9个、10个、11个、12个、13个、14个、15个、16个、17个、18个、19个、20个、21个、22个、或23个选自本申请的染色体范围编号1至23的DNA区域或片段。
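This application provides a method for assessing the presence and/or progression of esophageal cancer, comprising determining, in a sample to be tested, the presence and/or amount of the modification status of a DNA region within 5 kbp upstream or downstream of any one of SEQ ID NO:96, 97, 99, 101, 104, 105, 106, 110, 115, 118, 119, 120, 121, 122, 125, 126, 129, 130, 132, 133, 134, 135, and 137, or a complementary region thereof, or a fragment of the foregoing. For example, the method of this application determines 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, or 23 DNA regions or fragments in the sample, each within 5 kbp upstream or downstream of any one of the SEQ ID NOs listed above.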
本申请提供了一种评估食管癌的存在和/或进展的方法,包含确定待测样本中选自SEQ ID NO:111上游或下游5k bp以内的区域以及CSF1、DNM2、EPS8L3、RAB3D、RER1、RPL18、SKI、DBP、ARHGEF16、HOXD1、PRDM16、HOXD4、RNF207、PDCD1、ICMT、EP300、TBX5、RBX1、TBX3、ETV5、CHFR、DGKG、ZNF605、SLC2A9、DIO3、DRD5、ENSG00000269375、PCDHGC5、CTU2、DIAPH1、RNF166、MPC1、MBP、RPS6KA2、ZNF236、ELN、ICAM5、WBSCR28、ZGLP1、和LZTS1所在的DNA区域、或其片段的修饰状态的存在和/或含量。例如,本申请的方法确定待测样本中选自SEQ ID NO:111上游或下游5k bp以内的区域以及本申请提供的上述的基因的1个、2个、3个、4个、5个、6个、7个、8个、9个、10个、11个、12个、13个、14个、15个、16个、17个、18个、19个、20个、21个、22个、23个、24个、25个、26个、27个、28个、29个、30个、31个、32个、33个、34 个、35个、36个、37个、38个、39个、或40个基因所在的DNA区域、或其片段的修饰状态的存在和/或含量。
d)肝癌甲基化标志物
本申请提供了一种评估肝癌的存在和/或进展的方法,包含确定待测样本中选自本申请的染色体范围编号44至245(SEQ ID NO:139-340)上游或下游5k bp以内的DNA区域、或其互补区域、或上述的片段的修饰状态的存在和/或含量。例如,本申请的方法确定待测样本中的1个或更多个选自本申请的染色体范围编号44至245(SEQ ID NO:139-340)上游或下游5k bp以内的DNA区域。例如,本申请的方法确定待测样本中的1个、2个、3个、4个、5个、6个、7个、8个、9个、10个、11个、12个、13个、14个、15个、16个、17个、18个、19个、20个、21个、22个、23个、24个、25个、26个、27个、28个、29个、30个、31个、32个、33个、34个、35个、36个、37个、38个、39个、40个、50个、60个、70个、80个、90个、100个、150个、200个、或202个选自本申请的染色体范围编号44至245(SEQ ID NO:139-340)上游或下游5k bp以内的DNA区域。染色体范围编号44至245(SEQ ID NO:139-340)上游或下游5k bp以内的DNA区域、或其互补区域或片段可以是肝癌甲基化标志物。
本申请提供了一种评估肝癌的存在和/或进展的方法,包含确定待测样本中选自SEQ ID NO:139至340中任一项所示上游或下游5k bp以内的DNA区域、或其互补区域、或上述的片段的修饰状态的存在和/或含量。例如,本申请的方法确定待测样本中的1个、2个、3个、4个、5个、6个、7个、8个、9个、10个、11个、12个、13个、14个、15个、16个、17个、18个、19个、20个、21个、22个、23个、24个、25个、26个、27个、28个、29个、30个、31个、32个、33个、34个、35个、36个、37个、38个、39个、40个、50个、60个、70个、80个、90个、100个、150个、200个、或202个选自SEQ ID NO:139至340中任一项所示上游或下游5k bp以内的DNA区域。选自SEQ ID NO:139至340中任一项所示上游或下游5k bp以内的DNA区域或其互补区域或片段可以是肝癌甲基化标志物。
本申请提供了一种评估肝癌的存在和/或进展的方法,包含确定待测样本中选自本申请基因编号为77至354的基因所在上游或下游5k bp以内的DNA区域、或其片段的修饰状态的存在和/或含量。例如,本申请的方法确定待测样本中选自本申请基因编号为77至354的基因的1个、2个、3个、4个、5个、6个、7个、8个、9个、10个、11个、12个、13个、14个、15个、16个、17个、18个、19个、20个、21个、22个、23个、24个、25个、26个、27个、28个、29个、30个、31个、32个、33个、34个、35个、36个、37个、38个、 39个、40个、41个、42个、43个、44个、45个、46个、47个、48个、49个、50个、51个、52个、53个、54个、55个、56个、57个、58个、59个、60个、61个、62个、63个、64个、65个、66个、67个、68个、69个、70个、80个、90个、100个、150个、200个、201个、或202个基因所在上游或下游5k bp以内的DNA区域、或其片段的修饰状态的存在和/或含量。
II样本来源及制备
在本申请中,所述标志物可以来自任何感兴趣的个体的生物样品。本文所用的术语“个体”包括人类和非人类的动物。非人类动物包括所有脊椎动物,例如哺乳动物和非哺乳动物。“个体”也可以是家畜,例如牛、猪、绵羊、家禽和马;或啮齿动物,例如大鼠、小鼠;或非人类灵长类动物,例如猿、猴、恒河猴;或家养的动物,例如狗或猫。在一些实施方式中,个体是人类或非人类灵长类动物。在一些实施方式中,个体是人类。在本申请中,“个体”、“对象”和“受试者”可互换使用。
应理解,上述第I部分“标志物”给出的序列为人的序列。当涉及非人动物的序列时,可采用现有技术容易地确定上述基因在非人动物基因组中的对应位置和对应序列。
本文所用的术语“样品”、“样本”、“待测样本”或“生物样品”是指获自或衍生自个体的生物组合物,其包含基于物理、生化、化学和/或生理特征待表征或待识别的细胞和/或其他分子实体(例如DNA)。生物样品包括但不限于通过本领域技术人员已知的任何方法获得的个体的细胞、组织、器官和/或生物体液。在一些实施方式中,所述待测样本或生物样品选自下组:组织学切片、组织活检、石蜡包埋的组织、体液、手术切除样本、分离的血细胞、分离自血液的细胞,及其任意组合。在一些实施方式中,所述体液选自下组:全血、血清、血浆,及其任意组合。选择最适合的样品将取决于情境的性质。在一些实施方式中,所述待测样本或生物样品为个体的全血。在一些实施方式中,所述待测样本或生物样品为个体的血浆。本领域技术人员知道从全血制备血浆的各种方法。例如,在一些实施方式中,血浆通过将来自个体的全血离心一次、两次、三次、四次、五次或更多次来获得。在一些实施方式中,所述待测样本或生物样品是胃癌活检物。
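The DNA to be tested can be isolated from the biological sample. The DNA can be isolated and purified from a biological sample using various methods known in the art, and commercially available kits may be used for isolation and purification. For example, DNA is isolated from cells and tissues by lysing the starting material under strongly denaturing and reducing conditions, partly with protein-degrading enzymes, purifying the nucleic acid fraction obtained by phenol/chloroform extraction, and recovering the nucleic acid from the aqueous phase by dialysis or ethanol precipitation (see, for example, Sambrook, J., Fritsch, E.F. and Maniatis, T., Molecular Cloning, CSH, 1989). As another example, many reagent systems are now available that are particularly suitable for purifying DNA fragments from agarose gels, isolating plasmid DNA from bacterial lysates, and isolating longer-chain nucleic acids (genomic DNA, total cellular RNA) from blood, tissue, or cell cultures. Many of these commercially available purification systems are based on the well-known principle of binding nucleic acids to a mineral support in the presence of solutions of different chaotropic salts. In these systems, suspensions of finely ground glass powder, diatomaceous earth, or silica gel are used as the support material. Some other methods for isolating and purifying DNA from biological samples are described in, for example, US7888006B2 and EP1626085A1. The choice between methods will be influenced by several factors, including time, cost, and the amount of DNA required.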
在一些实施方式中,待测样本或生物样品中包含的DNA包括基因组DNA。本文所用的术语“基因组DNA”是指包含细胞或生物体的完整基因组及其片段或部分的DNA。基因组DNA是来源于个体的大段DNA(例如长于大约10、20、30、40、50、60、70、80、90、100、200或300kb),并且可以具有天然修饰,例如DNA甲基化。
在一些实施方式中,待测样本或生物样品中包含的DNA包括细胞DNA。本文所用的术语“细胞DNA”是指存在于细胞内的DNA,或从体内细胞中获取DNA并在体外分离、或以其他方式在体外操作,只要该DNA未从体内细胞中移除。
在一些实施方式中,待测样本或生物样品中包含的DNA包括细胞外游离DNA。本文所用的术语“细胞外游离DNA”是指在体内的细胞外存在的DNA片段。该术语也可以被用于指代获取自体内的细胞外来源并在体外分离、或操作的DNA片段。细胞外游离DNA中的DNA片段通常具有约100到200bp的长度,推测与被包裹于核小体的DNA片段的长度有关。细胞外游离DNA(cfDNA)包括例如细胞外游离胎儿DNA和循环肿瘤DNA。细胞外游离胎儿DNA在孕妇的体内(例如血液)中循环,代表胎儿基因组,而循环肿瘤DNA在癌症患者的体内(例如血液)中循环。在一些实施方式中,细胞外游离DNA可基本上不含个体的细胞DNA。例如,所述细胞外游离DNA可包含小于约1,000ng/mL、小于约100ng/mL、小于约10ng/mL、小于约1ng/mL的细胞DNA。
可以通过使用本领域已知的常规技术来制备细胞外游离DNA。例如,可以通过以约200-20,000g、约200-10,000g、约200-5,000g、约300-4000g等的速度离心血液样品约3-30分钟、约3-15分钟、约3-10分钟、约3-5分钟来获得血液样品的细胞外游离DNA。例如,在一些实施方式中,可以通过将个体的血浆或血清离心一、二、三、四、五次或更多次来获得血液样本的细胞外游离DNA。在一些实施方式中,为了从包含可溶性DNA的无细胞组分中分离细胞及其片段,可以通过微滤来获得所述生物样品。通常来说,微滤可以通过使用过滤器来进行,例如,0.1微米~0.45微米的膜过滤器,诸如0.22微米的膜过滤器。
在一些实施方式中,使用商购的DNA提取产品从全血、血清或血浆中提取细胞外游离DNA用于分析。这种提取方法据称对循环DNA的回收率高(>50%),某些产品(例如Qiagen生产的QIAamp Circulating Nucleic Acid Kit)据称可提取小尺寸的DNA片段。所使用的典型样品量为1-5mL血清或血浆。
在一些实施方式中,细胞外游离DNA包括循环肿瘤DNA。循环肿瘤DNA(“ctDNA”)是与细胞无关的体液(例如血液、尿液、唾液、痰、粪便、胸膜液、脑脊液等)中肿瘤来源的片段化DNA。通常,ctDNA高度片段化,平均长度约为150个碱基对。ctDNA通常包括体液(例如血浆)中细胞外游离DNA的极小部分,例如ctDNA可能构成血浆DNA的不到约10%。通常,该百分比小于约1%,例如小于约0.5%或小于约0.01%。另外,血浆DNA的总量通常非常低,例如约10ng/mL血浆。ctDNA的数量因人而异,并且取决于肿瘤的类型、位置,对于癌性肿瘤,则取决于癌症的阶段。但是,ctDNA通常在体液中非常罕见,只能通过极其敏感和特异性的技术进行检测。检测ctDNA可能有助于检测和诊断肿瘤、指导肿瘤特异性治疗、监测治疗以及监测癌症的缓解。
III碱基转化
DNA甲基化是(例如,通过DNA甲基转移酶的作用)将甲基添加到DNA分子上(例如,添加至DNA分子的一个或多个胞嘧啶碱基)的生物学过程。在哺乳动物中,DNA甲基化出现于胞嘧啶-磷酸-鸟嘌呤(CpG)二核苷酸(即“CpG位点”)的5’位置,当其出现在基因的启动子或第一个外显子中的5’-CpG-3’二核苷酸中时,会导致基因的表观遗传失活。已充分证明了DNA甲基化在调节基因表达、肿瘤发生、以及其他遗传和表观遗传疾病中起重要作用。
如本文所用,术语“甲基化的胞嘧啶残基”是指胞嘧啶残基的衍生物,其中一个甲基连接至胞嘧啶环的碳原子上(例如C5)。术语“未甲基化的胞嘧啶残基”是指未衍生化的胞嘧啶残基,其中与“甲基化的胞嘧啶残基”相反,在胞嘧啶环的碳原子(例如C5)上没有甲基连接。其内的胞嘧啶残基被甲基化的CpG位点就是甲基化的CpG位点,而其内的胞嘧啶残基未被甲基化的CpG位点是未甲基化的CpG位点。
如本文所述,DNA或RNA的碱基之间可发生转化。本文所述“转化”、“胞嘧啶转化”或“CT转化”是利用非酶促或酶促方法处理DNA,将未修饰的胞嘧啶碱基(cytosine,C)转化为不与鸟嘌呤(G)结合的碱基(例如尿嘧啶碱基(uracil,U))的过程。一些试剂能够区分DNA中的未甲基化和甲基化的CpG位点,从而获得经处理的DNA。该试剂可以选择性地作用于未甲基化的胞嘧啶残基,但不能显著地作用于甲基化的胞嘧啶残基。或者该试剂可以选择性地作用于甲基化的胞嘧啶残基,而不显著地作用于未甲基化的胞嘧啶残基。例如,一些试剂可以选择性地将未甲基化的胞嘧啶残基转化为尿嘧啶、胸腺嘧啶或杂交上与胞嘧啶 不同的另一碱基,而甲基化的胞嘧啶残基依然处于未转化状态;又例如,一些试剂可以选择性地切割甲基化的残基,或者选择性地切割未甲基化的残基。由此,原始DNA以取决于是否被甲基化的方式转化为经处理的DNA,从而可以通过其杂交行为将经处理的DNA与原始DNA区分开。
如本文所用,“经处理的DNA”、“经处理的序列”、“经处理的片段”是指已经用能够区分DNA、核酸序列、基因片段中的未甲基化和甲基化的CpG位点的试剂处理后的DNA、核酸序列、基因片段。
更具体而言,可采用非酶促或酶促方法进行胞嘧啶转化。示例性地,非酶促方法包括亚硫酸氢盐或重硫酸盐处理。在一些实施方式中,非酶促方法所用的试剂包括亚硫酸氢盐试剂。如本文所用,术语“亚硫酸氢盐试剂”是指,例如本申请所公开的可用于区分甲基化和未甲基化的CpG二核苷酸序列的包括亚硫酸氢盐、亚硫酸氢根离子或其任意组合的试剂。在本申请中,用亚硫酸氢盐试剂处理DNA也被描述为“亚硫酸氢盐反应”或“亚硫酸氢盐处理”,指的是转化未甲基化的胞嘧啶残基的反应,特别是在亚硫酸氢根离子存在的情况下,核酸中未甲基化的胞嘧啶残基被转化为尿嘧啶碱基、胸腺嘧啶碱基或在杂交行为上与胞嘧啶不同的其他碱基,而其中甲基化的胞嘧啶残基未被显著地转化。换言之,亚硫酸氢盐处理可用于区分甲基化的CpG二核苷酸和未甲基化的CpG二核苷酸。Frommer,M.,et al.,Proc Natl Acad Sci USA 89(1992)1827-31和Grigg,G.,Clark,S.,Bioessays 16(1994)431-6中详细描述了用于检测甲基化的胞嘧啶残基的亚硫酸氢盐反应。亚硫酸氢盐反应包括脱氨基步骤和脱磺酸基步骤(参见Grigg and Clark,同上)。“甲基化的胞嘧啶残基未被显著地转化”这一陈述,不排除非常小的百分比(例如,小于0.1%、小于0.2%、小于0.3%、小于0.4%、小于0.5%、小于0.6%、小于0.7%、小于0.8%、小于0.9%、小于1%、小于2%、小于3%、小于4%、小于5%、小于6%、小于7%、小于8%、小于9%、小于10%、小于11%、小于12%、小于13%、小于14%、小于15%、小于16%、小于17%、小于18%、小于19%、小于20%)的甲基化的胞嘧啶残基被转化为尿嘧啶、胸腺嘧啶或在杂交行为上与胞嘧啶不同的其他碱基,尽管其意在仅仅转化未甲基化的胞嘧啶残基。
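The effect of bisulfite treatment on a sequence can be sketched in silico. The function below models an idealized, complete conversion of one strand: every unmethylated C is written as T (the uracil produced by deamination is read as T after amplification), while methylated C is left unchanged. The complete-conversion assumption and the example sequence are illustrative only; real reactions convert a small fraction of methylated C as well, as the paragraph above notes.

```python
def bisulfite_convert(seq: str, methylated_positions=()) -> str:
    """Idealized bisulfite conversion of one DNA strand.

    methylated_positions holds 0-based indexes of methylated cytosines,
    which are protected from conversion.
    """
    protected = set(methylated_positions)
    return "".join(
        "T" if base == "C" and i not in protected else base
        for i, base in enumerate(seq.upper())
    )

# Hypothetical fragment with CpG sites at indexes 1 and 5; only index 1 is methylated
print(bisulfite_convert("ACGTCCGA", methylated_positions={1}))
print(bisulfite_convert("ACGTCCGA"))
```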
在例如参考Frommer M.,et al.(同上)或Grigg and Clark(同上)的情况下(它们公开了亚硫酸氢盐处理的基本参数),本领域技术人员知道如何进行亚硫酸氢盐处理,特别是脱氨基步骤和脱磺酸基步骤。孵育时间和温度对脱氨基效率的影响、以及影响DNA降解的参数都已公开。
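In some embodiments, the bisulfite reagent is selected from the group consisting of ammonium bisulfite, sodium bisulfite, potassium bisulfite, calcium bisulfite, magnesium bisulfite, aluminum bisulfite, bisulfite ion, and any combination thereof. In some embodiments, the bisulfite reagent is sodium bisulfite. In some embodiments, the bisulfite reagent is commercially available, for example, MethylCode™ Bisulfite Conversion Kit, EpiMark™ Bisulfite Conversion Kit, EpiJET™ Bisulfite Conversion Kit, EZ DNA Methylation-Gold™ Kit, and the like. In some embodiments, the bisulfite reaction is carried out according to the kit's instructions for use.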
示例性的酶促方法包括脱氨酶处理,以及使用试剂选择性地切割未甲基化的残基但不切割甲基化的残基,或者选择性地切割甲基化的残基但不切割未甲基化的残基。优选地,所述试剂是甲基化敏感限制酶(MSRE)。
术语“甲基化敏感限制酶”是指根据其识别位点的甲基化状态而选择性地消化核酸的酶。对于当识别位点未被甲基化或半甲基化时才特异剪切的限制酶来说,当识别位点被甲基化时,不会发生剪切,或以显著降低的效率剪切。对于当识别位点被甲基化时才特异剪切的限制酶来说,当识别位点未被甲基化时,不会发生剪切,或以显著降低的效率剪切。在一些实施方式中,甲基化敏感限制酶的识别序列含有CG二核苷酸(例如cgcg或cccggg)。在一些实施方式中,当该CG二核苷酸中的胞嘧啶在C5碳原子处被甲基化时,甲基化敏感限制酶不进行剪切。
示例性的MSRE选自下组:HpaII酶、SalI酶、酶、ScrFI酶、BbeI酶、NotI酶、SmaI酶、XmaI酶、MboI酶、BstBI酶、ClaI酶、MluI酶、NaeI酶、NarI酶、PvuI酶、SacII酶、HhaI酶及其任意组合。
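How a methylation-sensitive restriction enzyme discriminates sites can be sketched for HpaII, which recognizes CCGG and is blocked when the internal cytosine (the C of the CG dinucleotide) is methylated. The sketch only reports which recognition sites would be cut; it does not model cut positions or partial digestion, and the example sequence is hypothetical.

```python
import re

def hpaii_cut_sites(seq: str, methylated_positions=()):
    """Return 0-based start indexes of CCGG sites that HpaII would cut.

    A site is protected when the internal C (index start + 1) is methylated.
    """
    methylated = set(methylated_positions)
    sites = []
    # The lookahead finds every CCGG start, including overlapping ones
    for match in re.finditer(r"(?=CCGG)", seq.upper()):
        start = match.start()
        if start + 1 not in methylated:
            sites.append(start)
    return sites

fragment = "AACCGGTTCCGGAA"  # CCGG at indexes 2 and 8
print(hpaii_cut_sites(fragment))                            # both sites unmethylated
print(hpaii_cut_sites(fragment, methylated_positions={3}))  # first site protected
```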
使用本领域已知的方法,使用能区分目标区域内的甲基化的CpG二核苷酸和未甲基化的CpG二核苷酸的甲基化敏感限制酶或包含甲基化敏感限制酶的一系列限制酶试剂来确定甲基化,例如但不限于,差异性甲基化杂交(“DMH”)。
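In some embodiments, the DNA in the biological sample may be cleaved before treatment with the methylation-sensitive restriction enzyme. Such methods are known in the art and may include both physical and enzymatic means. Particularly preferred is the use of one or more restriction enzymes that are insensitive to methylation and whose recognition sites are AT-rich and contain no CG dinucleotide. Using such enzymes preserves the CpG sites and CpG-rich regions in the DNA fragments. In some embodiments, such restriction enzymes are selected from MseI, BfaI, Csp6I, Tru1I, Tru9I, MaeI, XspI, and any combination thereof.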
经转化的DNA任选地经纯化。适用于本文的DNA纯化方法本领域周知。
IV定量分析
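The methylation status or methylation level of at least one CpG dinucleotide in any 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, or 19, or any 20 or more, 21 or more, 22 or more, 23 or more, 24 or more, 25 or more, 26 or more, 27 or more, 28 or more, 29 or more, 30 or more, 31 or more, 32 or more, 33 or more, 34 or more, 35 or more, 36 or more, 37 or more, 38 or more, 39 or more, 40 or more, 41 or more, 42 or more, 43 or more, 44 or more, 45 or more, or 46 or more, or all 47 of the target markers described in subsection a) of Part I herein can be detected to identify whether a subject has colorectal cancer.
The methylation status or methylation level of at least one CpG dinucleotide in any 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, or 19, or any 20 or more, 21 or more, 22 or more, 23 or more, 24 or more, 25 or more, 26 or more, 27 or more, 28 or more, 29 or more, 30 or more, 31 or more, 32 or more, 33 or more, 34 or more, 35 or more, 36 or more, 37 or more, 38 or more, 39 or more, 40 or more, 41 or more, 42 or more, 43 or more, 44 or more, 45 or more, 46 or more, or 47 or more, or all 48 of the target markers described in subsection b) of Part I herein can be detected to identify whether a subject has gastric cancer.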
可检测本文第I部分第c)小结中所述任意1种、任意2种、任意3种、任意4种、任意5种、任意6种、任意7种、任意8种、任意9种、任意10种、任意11种、任意12种、任意13种、任意14种、任意15种、任意16种、任意17种、任意18种、任意19种、任意20种以上、任意21种以上、任意22种以上、任意23种以上、任意24种以上、任意25种以上、任意26种以上、任意27种以上、任意28种以上、任意29种以上、任意30种以上、任意31种以上、任意32种以上、任意33种以上、任意34种以上、任意35种以上、任意36种以上、任意37种以上、任意38种以上、任意39种以上、任意40种以上、任意41种以上、任意42种以上或全部43种所述目标标志物中的至少一个CpG二核苷酸的甲基化状态或甲基化水平,用以鉴别对象是否患有食管癌。
可检测本文第I部分第d)小结中所述任意1种、任意2种、任意3种、任意4种、任意5种、任意6种、任意7种、任意8种、任意9种、任意10种、任意11种、任意12种、任意13种、任意14种、任意15种、任意16种、任意17种、任意18种、任意19种、任意20种以上、任意21种以上、任意22种以上、任意23种以上、任意24种以上、任意25种以上、任意26种以上、任意27种以上、任意28种以上、任意29种以上、任意30种以上、任意31种以上、任意32种以上、任意33种以上、任意34种以上、任意35种以上、任意36种以上、任意37种以上、任意38种以上、任意39种以上、任意40种以上、任意41种以上、任意42种以上、任意43种以上、任意44种以上、任意45种以上、任意46种以上、任意47种、任意48种、任意49种、任意50种、任意51种、任意52种、任意53种、任意54种、任意55种、任意56种、任意57种、任意58种、任意59种、任意60种、任意65种、任意70种、任意75种、任意80种、任意85种、任意90种、任意95种以上、任意100种以上、任意105种、任意110种、任意120种、任意130种、任意140种、任意145种、任 意150种、任意155种以上、任意160种、任意170种、任意180种、任意190种、任意200种以上或全部202种所述目标标志物中的至少一个CpG二核苷酸的甲基化状态或甲基化水平,用以鉴别对象是否患有肝癌。
本申请所述的检测试剂和诊断试剂盒可用于所述甲基化状态或甲基化水平的检测。
本文中,“甲基化状态”是指一种或多种甲基化核苷酸碱基在核酸分子中的存在或不存在。例如,含有甲基化胞嘧啶的核酸分子被认为是甲基化的(例如核酸分子的甲基化状态是甲基化的)。不含有任何甲基化核苷酸的核酸分子被认为是未甲基化的。在一些实施方案中,如果核酸在特定基因座(例如特定单一CpG二核苷酸的基因座)或基因座特定组合处不是甲基化的,则核酸可表征为“未甲基化”,即使它在相同基因或分子的其他基因座处为甲基化的,也如此。
因此,甲基化状态描述了核酸(例如基因组序列或本文所述的目标标志物、DNA区域或其片段)的甲基化的状态。另外,甲基化状态是指在特定基因组基因座处的核酸区段与甲基化相关的特征。此类特征包括但不限于此DNA序列内的任何胞嘧啶(C)残基是否为甲基化的、一个或多个甲基化C残基的位置、贯穿核酸的任何特定区域的甲基化C的频率或百分比以及由于例如等位基因起点的差异而导致的甲基化等位基因差异。“甲基化状态”是指在生物样品中贯穿核酸的任何特定区域的甲基化C或未甲基化C的相对浓度、绝对浓度或模式。例如,如果核酸序列内的一个或多个胞嘧啶(C)残基是甲基化的,则其可称为“超甲基化”或具有“增加的甲基化”,而如果DNA序列内的一个或多个胞嘧啶(C)残基是未甲基化的,则其可称为“去甲基化”或具有“减少的甲基化”。同样地,如果核酸序列内的一个或多个胞嘧啶(C)残基与另一个核酸序列(例如来自不同区域或来自不同个体等)相比是甲基化的,则该序列被认为与其他核酸序列相比是超甲基化的或具有增加的甲基化。或者,如果DNA序列内的一个或多个胞嘧啶(C)残基与另一个核酸序列(例如来自不同区域或来自不同个体等)相比是未甲基化的,则该序列被认为与其他核酸序列相比是去甲基化的或具有减少的甲基化。
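The methylation level represents the proportion (or percentage, fraction, ratio, or degree) of one or more sites that are in the methylated state. The methylation level of a region (or a group of sites) is the mean of the methylation levels of all sites in the region (or group). Therefore, an increase or decrease in the methylation level of a region does not mean that the methylation level of every methylation site in the region increases or decreases. The process of converting the results of DNA methylation detection methods (for example, reduced representation methylation sequencing) into methylation levels is known in the art. The methylation level can be determined, for example, by quantifying the amount of intact DNA present after restriction digestion with a methylation-sensitive restriction enzyme. In this example, taking an enzyme that cleaves only when its recognition site is methylated, if quantitative PCR is used to quantify a specific sequence in the DNA, an amount of template DNA approximately equal to that of the mock-treated control indicates that the sequence is not highly methylated, whereas an amount of template substantially lower than that in the mock-treated sample indicates the presence of methylated DNA in the sequence. Therefore, the methylation level, as in the example above, can be used as a quantitative indicator of methylation status. This is particularly useful when the methylation level of a sequence in a sample needs to be compared with a threshold level.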
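The qPCR comparison between a digested sample and its mock-treated control can be turned into a rough methylation estimate. The sketch below assumes ideal amplification (template doubles each cycle) and complete digestion. How the intact fraction maps to methylation depends on whether the enzyme used cuts unmethylated or methylated recognition sites, so that is an explicit parameter. The Ct values and function names are hypothetical.

```python
def intact_fraction(ct_digested: float, ct_mock: float, efficiency: float = 2.0) -> float:
    """Fraction of template surviving digestion, from the Ct shift versus the mock control."""
    return min(1.0, efficiency ** (ct_mock - ct_digested))

def methylation_level_from_qpcr(ct_digested: float, ct_mock: float, enzyme_cuts: str) -> float:
    """Estimate the methylated fraction of a target sequence.

    enzyme_cuts: 'unmethylated' if the enzyme cleaves unmethylated sites
    (intact DNA is methylated), or 'methylated' if it cleaves methylated
    sites (intact DNA is unmethylated).
    """
    intact = intact_fraction(ct_digested, ct_mock)
    if enzyme_cuts == "unmethylated":
        return intact
    if enzyme_cuts == "methylated":
        return 1.0 - intact
    raise ValueError("enzyme_cuts must be 'unmethylated' or 'methylated'")

# A 2-cycle delay after digestion means about a quarter of the template is intact
print(methylation_level_from_qpcr(27.0, 25.0, enzyme_cuts="methylated"))
print(methylation_level_from_qpcr(27.0, 25.0, enzyme_cuts="unmethylated"))
```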
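The methylation level/status of one or more CpG dinucleotide sequences within a DNA sequence (for example, a target marker) can be determined by various analytical methods known in the art, preferably quantitative methods. Exemplary analytical methods include: polymerase chain reaction, including real-time polymerase chain reaction, digital polymerase chain reaction, and bisulfite-conversion-based PCR (for example, methylation-specific PCR (MSP)); nucleic acid sequencing; whole-genome methylation sequencing (WGBS); reduced representation methylation sequencing (RRBS); mass-based separation (for example, electrophoresis, mass spectrometry); target capture (for example, hybridization, microarray); methylation-sensitive restriction endonuclease analysis; methylation-sensitive high-resolution melting curve analysis; chip-based methylation profiling; mass spectrometry; and fluorescence quantification. Herein, detection includes detecting either strand at a gene or site.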
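In some embodiments, quantitative analysis is performed by real-time PCR. Non-limiting examples of real-time PCR include HeavyMethyl™ PCR as described by Cottrell et al., Nucl. Acids Res. 32:e10, 2003; MethyLight™ PCR as described by Eads et al., Cancer Res. 59:2302-2306, 1999; and Headloop PCR as described by Rand et al., Nucl. Acids Res. 33:e127, 2005.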
如本文所用,术语“HeavyMethylTM PCR”是指本领域公认的一种实时PCR技术,其中一个或多个不可延伸性核酸(例如,寡核苷酸)封闭物以甲基化特异性方式与亚硫酸氢盐处理的核酸结合(即,封闭物在中等至高等严谨条件下与未突变的DNA特异性结合)。使用一种或多种引物进行扩增反应,所述引物可以任选地是甲基化特异性的,但旁侧分布一个或多个封闭物。在未甲基化的核酸(即突变的DNA)存在的情况下,封闭物结合并且无PCR产物产生。使用基本上像例如Holland et al.,Proc.Natl.Acad.Sci.USA,88:7276-7280,1991所述的TaqManTM分析方法,样品中核酸的甲基化水平得以确定。
如本文所用,术语“MethyLightTM PCR”是指基于本领域公认的一种基于荧光的实时PCR技术,其中采用了称为TaqManTM探针的双标记荧光寡核苷酸探针,并且被设计为可同位于正向和反向扩增引物之间的富含CpG的序列杂交。所述的TaqManTM探针包含一个荧光“报告因子部分”和“淬灭剂部分”共价结合到与TaqManTM寡核苷酸的核苷酸相连的接头部分(例如,亚磷酰胺)。在PCR扩增过程中,与富含CpG的序列杂交的TaqManTM探针被Taq聚合酶的5’核酸酶活性切割,从而在PCR反应过程中产生以实时方式检测的信号。在该方法中,可以将分子信标用作可检测的探针,并且该系统不依赖于所使用的DNA聚合酶的5’-3’核酸外切酶活性(参见Mhlanga and Malmberg,Methods 25:463-471,2001)。
如本文所用,术语“Headloop PCR”是指本领域公认的一种实时PCR,其选择性地扩增目标核酸,但是通过将3’茎环延伸形成不能进一步提供扩增模板的发卡结构来抑制非扩增目标变体的扩增。
在一些实施方式中,所述实时PCR是多重实时PCR。如本文所用,术语“多重”可指,通过使用一个以上的标志物,每个标志物具有至少一个不同的检测特征,例如荧光特征(例如,激发波长、发射波长、发射强度、FWHM(半峰高处的全宽度)或荧光寿命)或独特的 核酸或蛋白序列特征,可以同时对多个标志物(例如多个核酸序列)的存在和/或量进行测定的分析或其他分析方法。
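In some embodiments, quantitative analysis is performed by nucleic acid sequencing. Exemplary methods of nucleic acid sequencing are known in the art; see, for example, Frommer et al., Proc. Natl. Acad. Sci. USA 89:1827-1831, 1992; Clark et al., Nucl. Acids Res. 22:2990-2997, 1994. For example, comparing the sequence obtained from a sample not treated with bisulfite, or the known nucleotide sequence of the target region, with the sequence obtained from a bisulfite-treated sample helps identify methylated cytosines in a DNA sequence. Compared with the untreated sample, any thymine residue detected at a cytosine position in the bisulfite-treated sample can be regarded as a mutation caused by bisulfite treatment, that is, the cytosine at that position was unmethylated; a cytosine that is retained after treatment indicates a methylated cytosine at that position.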
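The comparison between a reference sequence and a bisulfite-treated read can be sketched as a per-CpG call: a C retained in the read marks a methylated cytosine, and a T in place of a reference C marks an unmethylated (converted) cytosine. The sketch assumes the read is already aligned to the reference without gaps and covers the same strand; the sequences are hypothetical.

```python
def call_cpg_methylation(reference: str, bisulfite_read: str) -> dict:
    """Map each CpG start index in the reference to a methylation call."""
    ref, read = reference.upper(), bisulfite_read.upper()
    if len(ref) != len(read):
        raise ValueError("this sketch expects a gap-free, equal-length alignment")
    calls = {}
    for i in range(len(ref) - 1):
        if ref[i:i + 2] == "CG":
            if read[i] == "C":
                calls[i] = "methylated"    # protected from conversion
            elif read[i] == "T":
                calls[i] = "unmethylated"  # converted by bisulfite
            else:
                calls[i] = "unknown"       # sequencing error or variant
    return calls

print(call_cpg_methylation("ACGTCCGA", "ACGTTTGA"))
```

Averaging such calls over many reads at one site gives the per-site methylation level used elsewhere in this description.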
用于测序DNA的方法是本领域已知的,包括例如双脱氧链终止法或Maxam-Gilbert法(参见Sambrook et al.,Molecular Cloning,A Laboratory Manual(2nd Ed.,CSHP,New York 1989))、焦磷酸测序(参见Uhlmann et al.,Electrophoresis,23:4072-4079,2002)、固相焦磷酸测序(参见Landegren et al.,Genome Res.,8(8):769-776,1998)、固相微测序(参见例如,Southern et al.,Genomics,13:1008-1017,1992)、采用FRET的微测序(参见例如,Chen and Kwok,Nucleic Acids Res.25:347-353,1997)、连接法测序或超深度测序(参见Marguiles et al.,Nature 437(7057):376-80(2005))。
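In some embodiments, quantitative analysis is performed by mass-based separation (for example, electrophoresis, mass spectrometry). For example, the presence of methylated cytosine residues can be detected by combined bisulfite restriction analysis (COBRA), essentially as described in Xiong and Laird, Nucl. Acids Res., 25:2532-2534, 1997. This method exploits the differences in restriction enzyme recognition sites between methylated and unmethylated nucleic acids after treatment with a compound that selectively mutates unmethylated cytosine residues (for example, bisulfite). For example, the restriction endonuclease TaqI cleaves the sequence TCGA; after bisulfite treatment of unmethylated nucleic acid, this sequence becomes TTGA and is therefore not cleaved. The digested and/or undigested nucleic acid is then detected using detection means known in the art, such as electrophoresis and/or mass spectrometry. As another example, after treatment with a compound that selectively mutates unmethylated cytosine residues, different techniques based on differences in nucleotide sequence and/or secondary structure are used to detect nucleic acid differences in the amplification products, such as methylation-specific single-strand conformation analysis (MS-SSCA) (Bianco et al., Hum. Mutat., 14:289-293, 1999), methylation-specific denaturing gradient gel electrophoresis (MS-DGGE) (Abrams and Stanton, Methods Enzymol., 212:71-74, 1992), and methylation-specific denaturing high-performance liquid chromatography (MS-DHPLC) (Deng et al., Chin. J. Cancer Res., 12:171-191, 2000).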
在一些实施方式中,通过靶标捕获(例如杂交、微阵列)进行定量分析。通过杂交的合适的检测方法是本领域已知的,例如Southern、斑点印迹、狭缝印迹或其他核酸杂交方式(Kawai et al.,Mol.Cell.Biol.14:7421-7427,1994;Gonzalgo et al.,Cancer Res.57:594-599,1997)。在一些实施方式中,用于杂交分析的探针被可检测地标记。在一些实施方式中,用于杂交分析的基于核酸的探针是未标记的。这种未标记的探针可以固定在固体载体如微阵列上,并且可以与被可检测地标记的目标核酸分子杂交。微阵列的一个实例是甲基化特异性微阵列,其 可用于区分具有转化的胞嘧啶残基的序列和具有未转化的胞嘧啶残基的序列(参见Adorjan et al.,Nucl.Acids Res.,30:e21,2002)。基于杂交的分析还可被用于用甲基化敏感的限制酶处理后的核酸。又例如,可通过寡核苷酸探针确定DNA序列内CpG二核苷酸序列的甲基化状态,所述寡核苷酸探针与PCR扩增引物同时与亚硫酸氢盐处理的DNA杂交(其中所述引物可以是甲基化特异性引物或标准引物)。
在一些实施方式中,定量分析在检测试剂的存在下进行。如本文所用,术语“检测试剂”是在定量分析步骤中用于检测核酸的存在、不存在或量的试剂。本领域已知的各种检测试剂在本申请中都可使用。在一些实施方式中,检测试剂选自下组:荧光探针、嵌入染料、生色团标记的探针、放射性同位素标记的探针和生物素标记的探针。
在一些实施方式中,定量分析包含使用定量引物对和DNA聚合酶对经处理的DNA进行扩增。如本文所用,术语“定量引物对”是指在定量分析步骤中使用的一个或多个引物对。优选地,所述定量引物对能够与所述经处理的DNA的至少9个连续核苷酸在严谨条件下、中等严谨条件下或高度严谨条件下杂交。
在一些实施方式中,所述定量分析包括基于经处理的DNA中多个CpG二核苷酸、TpG二核苷酸或CpA二核苷酸的存在或水平,确定一个或多个目标标志物的甲基化水平。在一些实施方式中,所述定量分析包括基于经处理的DNA中一个或多个CpG二核苷酸的存在或水平来确定胞嘧啶残基的甲基化水平。在一些实施方式中,所述定量分析包括基于所述经处理的DNA中一个或多个TpG二核苷酸的存在或水平来确定胞嘧啶残基的甲基化水平。在一些实施方式中,所述定量分析包括基于所述经处理的DNA中CpA二核苷酸的存在来确定胞嘧啶残基的甲基化水平。
在一些实施方式中,定量分析步骤是通过将经处理的DNA产物分为多个组分来进行的。在一些实施方式中,对多个组分进行多个不同的定量分析测试,其中在多个组分之一中定量分析所述经处理的DNA产物(如果存在于所述组分中的话)的不同组合。在一些实施方式中,定量分析每个组分中的对照标志物。
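In some embodiments, the methylation level of each target marker is quantified separately from the pre-amplified DNA by MSP (see Herman et al., Methylation-specific PCR: a novel PCR assay for methylation status of CpG islands. Proc Natl Acad Sci USA. 1996 September 3;93(18):9821-6 and United States Patent No. 6,265,171). For example, by using one or more primers that hybridize specifically to the unconverted sequence under moderate and/or high stringency conditions, an amplification product is produced only when the template contains methylated cytosine at the CpG site.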
在一些实施方式中,所述定量引物对被设计为扩增所述经处理的DNA产物中的至少一部分,即定量分析被设计为巢式PCR。巢式PCR是PCR的一种改进,旨在提高灵敏度和特异性。巢式PCR涉及使用两个引物组和两个连续的PCR反应。进行第一轮扩增以产生第一扩增子,并使用一个引物对进行第二轮扩增,其中一个或两个引物与由初始引物对界定的区域内的位点退火,即第二个引物对被认为是“嵌套”在第一对引物中。以这种方式,不包含 正确内部序列的来自第一次PCR反应的背景扩增产物在第二次PCR反应中不再被进一步扩增。
通常,PCR的反应液包含Taq DNA聚合酶、PCR缓冲液、引物、探针、dNTPs、Mg2+。优选地,Taq DNA聚合酶为热启动Taq DNA聚合酶。示例性地,Mg2+终浓度为1.0-20.0mM;各引物浓度为100-500nM;各探针浓度为100-500nM。示例性的PCR反应条件为,95℃预变性5min;95℃变性15s,60℃退火延伸60s,50个循环。
在一些实施方案中,本申请的方法包括预扩增步骤。对目标标志物进行预扩增的目的之一是增加经处理的DNA中的目标标志物的数量。如本文所用,术语“扩增”大体上指任何能够导致分子或一组相关分子的拷贝数增加的过程。当“扩增”被用于多核苷酸分子时,是指通常从少量多核苷酸开始产生多拷贝的多核苷酸分子或多核苷酸分子的一部分的多份拷贝,其中被扩增的物质(扩增子,PCR扩增子)通常是可被检测到的。多核苷酸的扩增涵盖多个化学和酶促过程。扩增的形式包括通过聚合酶链式反应(逆转录PCR、PCR)、链置换扩增(SDA)反应、转录介导扩增(TMA)反应、基于核酸序列的扩增(NASBA)反应或连接酶链反应(LCR),从一个或几个拷贝的模板RNA或DNA分子生成多个DNA拷贝。
可用预扩增引物预扩增经处理的DNA中的所述标志物。如本文所用,术语“引物”是指这样的单链寡核苷酸,其能够在合适的条件(例如缓冲液和温度)下,在四种不同的三磷酸核苷和用于聚合的试剂(例如DNA聚合酶)的存在下,作为模板指导的DNA合成的起始点。在任何给定的情况下,引物的长度取决于例如引物的预期用途,并且通常在15至30个核苷酸的范围内。短的引物分子通常需要较低的温度才能与模板形成足够稳定的杂交复合物。引物不必反映模板的确切序列,但必须足够互补以能与该模板杂交。引物位点是模板上与引物杂交的区域。引物对是一组引物,其包括与待扩增的序列的5’末端杂交的5’正向引物和与待扩增的序列的3’末端的互补链杂交的3’反向引物。本领域技术人员可以基于本领域的公知常识根据待扩增的标志物设计引物(参见,例如PCR Primer:A Laboratory Manual,Cold Spring Harbor Laboratories,NY,1995)。此外,一些用于设计在各种各样分析中使用的最佳探针和/或引物的软件包是公开的,例如可从美国马萨诸塞州剑桥市的基因组研究中心(the Center for Genome Research,Cambridge,Mass.,USA)获得的Primer 3。显然,在设计探针或引物时其潜在用途也应考虑在内。例如,设计用于本发明目的的引物可以包括至少一个CpG位点,或者从该引物获得的扩增产物可以包括至少一个CpG位点。用于设计检测DNA甲基化状态的引物的工具也是本领域已知的,例如MethPrimer(Li LC and Dahiya R.MethPrimer:designing primers for methylation PCRs.Bioinformatics.2002 Nov;18(11):1427-31)。在本申请中,通过将预扩增引物作为引物池,经处理的DNA中的任何目标标志物(目标标志物的每至少一部分或目标标志物的一个亚区域)均可以被预扩增。
如本文所用,术语“互补”是指核苷酸或核酸之间的杂交或碱基配对,例如,双链DNA分子的两条链之间,或待测序或扩增的单链核酸上的引物结合位点和寡核苷酸引物之间。互 补核苷酸通常是A和T(或A和U),或C和G。当一条链的核苷酸以最佳的方式对齐、并比较、并有适当的核苷酸插入或缺失后,与另一链的至少约80%(通常至少约90%至95%,更优选地为约98%至100%)的核苷酸配对,两条单链RNA或DNA分子就被称为是互补的。或者,当RNA链或DNA链在选择性杂交条件下与其互补序列杂交时,互补存在。通常,当在至少14至25个核苷酸的一段上具有至少约65%(优选至少约75%、更优选至少约90%)的互补性时,将发生选择性杂交。参见M.Kanehisa,Nucleic Acids Res.12:203(1984),作为参考并入本文。
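The percentage thresholds in this definition (for example, about 80% paired nucleotides) can be illustrated with a gap-free calculation. The sketch below compares two strands written 5' to 3', pairs them antiparallel, and reports the percentage of Watson-Crick pairs. It is a simplification: the definition above allows insertions and deletions in the alignment, which this sketch does not model, and the sequences are hypothetical.

```python
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}

def percent_complementarity(strand_a: str, strand_b: str) -> float:
    """Percentage of positions forming A-T or C-G pairs in an antiparallel, gap-free duplex."""
    a = strand_a.upper()
    b = strand_b.upper()[::-1]  # reverse strand_b so that it runs antiparallel to strand_a
    if len(a) != len(b) or not a:
        raise ValueError("this sketch expects two non-empty strands of equal length")
    paired = sum(1 for x, y in zip(a, b) if PAIRS.get(x) == y)
    return 100.0 * paired / len(a)

print(percent_complementarity("ATGC", "GCAT"))  # fully complementary
print(percent_complementarity("ATGC", "GCAA"))  # one mismatch
```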
在一些实施方式中,预扩增引物池包含至少一个甲基化特异性引物对。在一些实施方式中,预扩增引物池包含多个甲基化特异性引物对。在一些实施方式中,预扩增步骤通过甲基化特异性PCR(“MSP”)进行,甲基化特异性PCR是使用甲基化特异性引物的PCR。Herman et al.,(同上)中已描述了该技术(即MSP)。
如本文所用,术语“甲基化特异性引物对”是指经特异性设计以识别CpG位点以利用甲基化的差异来扩增经处理的DNA中的特定目标标志物的引物对。引物仅作用于具有特定甲基化状态或没有特定甲基化状态的分子。例如,引物可以是寡核苷酸,在严谨条件、中等严谨条件或高度严谨条件下,其可以以甲基化特异性方式与具有甲基化的特定CpG位点特异性杂交,但不能与没有甲基化的特定CpG位点杂交。因此,引物将特异性扩增在特定CpG位点具有甲基化的目标标志物。又例如,引物可以是寡核苷酸,在严谨条件、中等严谨条件或高度严谨条件下,其可以以甲基化特异性的方式与未甲基化的特定的CpG位点特异性杂交,但是不能与甲基化的特定的CpG位点杂交。因此,引物将特异性扩增在特定CpG位点没有甲基化的目标标志物。因此,在本申请中,对在经处理的DNA内的至少一个目标标志物的预扩增中使用甲基化特异性引物,可以区分甲基化的和未甲基化的CpG位点。本申请的甲基化特异性引物对包含至少一个与亚硫酸氢盐处理的CpG二核苷酸杂交的引物。因此,所述特异性针对甲基化DNA的引物的序列包含至少一个CpG二核苷酸,并且所述特异性针对未甲基化DNA的引物的序列在CpG的C位置上包含“T”,和/或在CpG中G位置上包含“A”。
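The sequence rule stated at the end of this paragraph (a primer for methylated DNA keeps the C of each CpG, a primer for unmethylated DNA carries T at that position) can be sketched for forward primers. The sketch derives both primer variants from an unconverted target sequence by assuming complete conversion of every non-CpG cytosine. It ignores primer length, melting temperature, and reverse-primer design, and the target sequence is hypothetical.

```python
def msp_forward_primers(target: str):
    """Return (methylated-specific, unmethylated-specific) forward primer sequences.

    Non-CpG cytosines are written as T in both primers, since they are
    assumed to be fully converted by bisulfite. CpG cytosines stay C in
    the M primer and become T in the U primer.
    """
    t = target.upper()
    m_primer, u_primer = [], []
    for i, base in enumerate(t):
        if base == "C":
            in_cpg = i + 1 < len(t) and t[i + 1] == "G"
            m_primer.append("C" if in_cpg else "T")
            u_primer.append("T")
        else:
            m_primer.append(base)
            u_primer.append(base)
    return "".join(m_primer), "".join(u_primer)

m, u = msp_forward_primers("ACGTCCGA")
print(m, u)
```

The two primers differ only at CpG positions, which is what lets amplification report the methylation state of those sites.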
甲基化特异性引物对通常包含正向引物和反向引物,所述引物均包含寡核苷酸序列,所述寡核苷酸序列与所述目标标志物之一(或目标标志物的亚区域)的至少9个连续核苷酸在严谨条件下、中等严谨条件下或高度严谨条件下杂交,其中所述目标标志物之一(或目标标志物的亚区域)的至少9个连续核苷酸包含至少一个(例如1、2、3、4、5、6、7、8、9、10或更多个)CpG位点。
如本文所用,术语“杂交”可以指其中两条单链多核苷酸非共价形式结合以形成稳定的双链多核苷酸的过程。在一个方面,所得的双链多核苷酸可以是“杂交物”或“双链”。“杂交条件”中的盐浓度通常约小于1M,经常小于约500mM并且可以小于约200mM。“杂交缓冲液”包括缓冲盐溶液,例如5%SSPE,或本领域已知的其他此类缓冲液。杂交温度可以低至5℃,但是通常高于22℃,并且更为通常地高于约30℃,并且通常超过37℃。杂交通常在 严谨条件下进行,即在该条件下序列将与其目标序列杂交但不与其他非互补序列杂交。严谨条件取决于序列,且在不同情况下有所不同。例如,更长的片段可能需要比短片段更高的杂交温度才能进行特异性杂交。由于其他因素可能会影响杂交的严谨性,包括碱基组成和互补链的长度,有机溶剂的存在以及碱基错配的程度,因此参数组合比单独使用任何一个参数的绝对测量更为重要。通常严谨条件被选定为比特定序列在特定的离子强度和pH下的解链温度(Tm)低约5℃。Tm可以是双链核酸分子群体中的一半被分离成单链的温度。用于计算核酸的Tm的几个方程式是本领域众所周知的。如标准参考文献所示,当核酸在1M NaCl水溶液中时,可以通过公式Tm=81.5+0.41(%G+C)计算出简单估算的Tm值(参见例如Anderson and Young,Quantitative Filter Hybridization,in Nucleic Acid Hybridization(1985))。其他参考文献(例如Allawi and SantaLucia,Jr.,Biochemistry,36:10581-94(1997))包括替代的计算方法,其计算Tm时将结构和环境以及序列特征等考虑在内。
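The simple Tm estimate quoted above can be worked through numerically. The following is a minimal sketch, assuming the sequence is given as a plain A/C/G/T string; the function names `gc_percent`, `simple_tm` and `stringent_temperature` are hypothetical and do not come from the application.

```python
def gc_percent(seq: str) -> float:
    """Percentage of G and C bases in a DNA sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


def simple_tm(seq: str) -> float:
    """Simple estimate Tm = 81.5 + 0.41(%G+C), as quoted for 1 M NaCl."""
    return 81.5 + 0.41 * gc_percent(seq)


def stringent_temperature(seq: str) -> float:
    """Stringent conditions are typically chosen about 5 degrees C below Tm."""
    return simple_tm(seq) - 5.0


# A sequence with 50% GC content, used only as an illustration.
example_tm = simple_tm("ATGC")
```

The estimate ignores fragment length, mismatches and organic solvents, which is why the text notes that the combination of parameters matters more than any single value.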
通常,杂交物的稳定性是关于离子浓度和温度的函数。通常,杂交反应在较低严谨条件下进行,然后在具有不同但较高严谨性的洗涤液中洗涤。示例性的严谨条件包括pH约7.0至约8.3、温度至少25℃、钠离子(或其他盐)浓度为至少0.01M至不超过1M。例如,5x SSPE(750mM NaCl,50mM磷酸钠,5mM EDTA,pH 7.4)和约30℃的温度适合于等位基因特异性杂交,尽管合适的温度取决于杂交区域的长度和/或GC含量。在一个方面,确定错配百分比的“杂交严谨性”可以如下:1)高度严谨性:0.1x SSPE,0.1%SDS,65℃;2)中等严谨性(也称为中度严谨性):0.2x SSPE,0.1%SDS,50℃;3)低严谨性:1.0x SSPE,0.1%SDS,50℃。应当理解,使用替代的缓冲剂、盐和温度可以达到相同的严谨性。例如,中等严谨杂交可以是指允许核酸分子(例如探针)结合互补核酸分子的条件。杂交的核酸分子通常具有至少60%的同一性,包括例如至少70%、75%、80%、85%、90%或95%的同一性。中等严谨条件可以是与下述条件达到同等效果的条件:42℃,50%甲酰胺,5x Denhardt溶液,5x SSPE,0.2%SDS杂交,然后用42℃,0.2x SSPE,0.2%SDS进行洗涤。高度严谨条件可以通过如下条件提供,例如,42℃,50%甲酰胺,5x Denhardt溶液,5x SSPE,0.2%SDS杂交,然后65℃,0.1x SSPE和0.1%SDS中洗涤。低严谨性杂交可以是与下述条件达到同等效果的条件:22℃,10%甲酰胺,5x Denhardt溶液,6x SSPE,0.2%SDS杂交,然后在1x SSPE,0.2%SDS中于37℃洗涤。Denhardt的溶液包含1%聚蔗糖,1%聚乙烯吡咯烷酮和1%牛血清白蛋白(BSA)。20x SSPE(氯化钠,磷酸钠,EDTA)包含3M氯化钠、0.2M磷酸钠和0.025M EDTA。其他合适的中等严谨性和高度严谨性杂交缓冲液和条件是本领域技术人员众所周知的,并且描述于例如Sambrook et al.,Molecular Cloning:A Laboratory Manual,2nd ed.,Cold Spring Harbor Press,Plainview,N.Y.(1989)和Ausubel et al.,Short Protocols in Molecular Biology,4th ed.,John Wiley&Sons(1999)。
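In some embodiments, the pre-amplification primer pool further comprises a control primer pair for amplifying a control marker. Typically, a control marker is a nucleic acid with known characteristics (e.g., known sequence, known copy number per cell) that is used for comparison with an experimental target (e.g., a nucleic acid of unknown concentration). The control may be endogenous, preferably an invariant gene, against which the experimental or target nucleic acids in the assay can be normalized. Such a control normalizes for sample-to-sample variation that may arise in, for example, sample processing and assay efficiency, and allows accurate sample-to-sample data comparison and quantitative analysis of amplification efficiency and bias.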
在一些实施方案中,本申请采用RRBS技术检测感兴趣目标标志物的CpG位点的甲基化水平,然后计算该标志物的甲基化单倍型比值(MHF),将其作为该标志物的DNA甲基化水平。MHF的计算可如本申请所述进行。
V对象是否患有癌症的鉴定
本申请的发明人发现,本文所述的一个或多个目标标志物的甲基化水平可用于确定癌症。在一个或多个实施方案中,可检测样品中本文所述目标标志物中CpG位点的甲基化水平,然后计算该目标标志物的甲基化单倍型比值(MHF),将其作为该标志物的DNA甲基化水平。
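Herein, the MHF may be calculated by the following formula:
$$\mathrm{MHF}_{i,h} = \frac{N_{i,h}}{N_i}$$
where $i$ denotes the target methylation interval, $h$ denotes the target methylation haplotype, $N_i$ denotes the number of reads located in the target methylation interval, and $N_{i,h}$ denotes the number of reads containing the target methylation haplotype.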
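The MHF definition can be illustrated with a short calculation. This is a sketch under stated assumptions: each read covering the interval is assumed to have been reduced to a string of per-CpG calls ("C" for methylated, "T" for unmethylated), and a read is counted for a haplotype when its string equals that haplotype. The function name `mhf` and the toy reads are hypothetical.

```python
from collections import Counter


def mhf(read_patterns):
    """Return {haplotype: N_ih / N_i} for one target interval.

    read_patterns: one CpG-state string per read covering the interval,
    e.g. "CCT" for methylated, methylated, unmethylated.
    """
    n_i = len(read_patterns)          # N_i: reads in the interval
    if n_i == 0:
        return {}
    counts = Counter(read_patterns)   # N_ih for each observed haplotype h
    return {haplotype: n / n_i for haplotype, n in counts.items()}


# Four toy reads over an interval with three CpG sites.
fractions = mhf(["CCC", "CCC", "CTC", "TTT"])
```

Every haplotype observed in the interval gets its own value, matching the requirement that the calculation be repeated for each methylation haplotype in a target region.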
也可以计算平均甲基化水平(AMF),对于每个目标区域计算区域内甲基化的平均水平。公式如下:
其中m为该目标中总的CpG位点数,i为区间内每个CpG位点,NC,i为该CpG位点碱基为C的reads数(即该位点发生甲基化的reads数),NT,i为该CpG位点碱基为T的reads数(即该位点未甲基化的测序reads数)。
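The AMF expression itself is not reproduced in the text above. One form consistent with the stated definitions, offered as an assumption rather than as the applicants' exact expression, pools the methylated and total read counts over the $m$ CpG sites of the target:

```latex
\mathrm{AMF} = \frac{\sum_{i=1}^{m} N_{C,i}}{\sum_{i=1}^{m} \left( N_{C,i} + N_{T,i} \right)}
```

A per-site average, $\frac{1}{m}\sum_{i} N_{C,i}/(N_{C,i}+N_{T,i})$, would also fit the definitions; the two agree when every site has the same coverage.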
可使用python(V3.9.7)中的sklearn(V1.0.1)包中的逻辑回归模型计算每个目标标志物或多个目标标志物的模型预测分值y:model=LogisticRegression(),该模型的公式如下,其中x为样本目标标志物的甲基化水平值(MHF),w为不同标志物的系数,b为截距值,T表示转置:
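The model formula is not reproduced in the text above. For a binary logistic regression with the symbols defined there, the standard form (which is also what scikit-learn's `LogisticRegression` evaluates for the positive class) is:

```latex
y = \frac{1}{1 + e^{-\left( w^{T} x + b \right)}}
```

Here $x$ is the vector of marker methylation levels (MHF) for one sample, $w$ the coefficient vector, $b$ the intercept, and $y$ a score between 0 and 1.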
本文分别以每个标志物在训练集样本中的DNA甲基化水平构建训练集,以训练集的约登指数界定的阈值作为癌症预测阈值,分别获得了本文所述各个标志物的癌症预测阈值,每一个标志物的癌症预测阈值可见本文表8、表11、表15和表19。
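A threshold defined by the Youden index can be sketched as follows. This is an illustrative sketch, not the applicants' code: it assumes labels of 1 (cancer) and 0 (non-cancer), treats a score strictly above the threshold as positive, and searches only the observed scores as candidate thresholds. The function name `youden_threshold` and the toy data are hypothetical.

```python
def youden_threshold(scores, labels):
    """Return (threshold, J) maximising J = sensitivity + specificity - 1."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes are required")
    best_t, best_j = None, -1.0
    for t in sorted(set(scores)):
        # A sample is called positive when its score exceeds the threshold.
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s > t)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s <= t)
        j = tp / n_pos + tn / n_neg - 1.0
        if j > best_j:
            best_t, best_j = t, j
    return best_t, best_j


# Toy training scores: three non-cancer samples, then three cancer samples.
threshold, j_value = youden_threshold(
    [0.1, 0.2, 0.35, 0.6, 0.8, 0.9], [0, 0, 0, 1, 1, 1]
)
```

The threshold is fixed on the training set and then applied unchanged to new samples.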
在一些实施方案中,以本文所述的单个目标标志物的甲基化水平为判断依据,根据上述公式计算得到每个样本中该目标标志物的MHF,并通过训练的模型得到该目标标志物的预测分值,若该值高于表8中所示的该目标标志物的阈值,则判断为患有结直肠癌,或存在患有 结直肠癌的风险。
在一些实施方案中,以本文所述的单个目标标志物的甲基化水平为判断依据,根据上述公式计算得到每个样本中该目标标志物的MHF,并通过训练的模型得到该目标标志物的预测分值,若该值高于表11中所示的该目标标志物的阈值,则判断为患有胃癌,或存在患有胃癌的风险。
在一些实施方案中,以本文所述的单个目标标志物的甲基化水平为判断依据,根据上述公式计算得到每个样本中该目标标志物的MHF,并通过训练的模型得到该目标标志物的预测分值,若该值高于表15中所示的该目标标志物的阈值,则判断为患有食管癌,或存在患有食管癌的风险。
在一些实施方案中,以本文所述的单个目标标志物的甲基化水平为判断依据,根据上述公式计算得到每个样本中该目标标志物的MHF,并通过训练的模型得到该目标标志物的预测分值,若该值高于表19中所示的该目标标志物的阈值,则判断为患有肝癌,或存在患有肝癌的风险。
应理解,当使用两个及以上目标标志物时,每个样本都可由检测得到的各目标标志物中的CpG位点的甲基化水平计算获得各自的MHF。在训练集的样本中,使用所有样本得到的所述的两个及以上的目标标记物的MHF进行训练,得到上述预测模型公式的参数。对于待测样本,通过将计算得到该样本的MHF带入到由训练集确定的预测模型的公式中,获得预测模型分值y,并将该y与以由训练集中所述两个及以上目标标记物获得的约登指数界定的阈值相比,其中,高于该阈值则判断为患有癌症,或存在患有癌症的风险。
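Scoring a test sample with a trained multi-marker model reduces to evaluating the logistic function and comparing the result with the training-set threshold. The sketch below assumes the coefficients and intercept have already been estimated; the numeric values are invented for illustration and do not correspond to any marker in the application.

```python
import math


def predict_score(mhf_values, weights, intercept):
    """Logistic score y = 1 / (1 + exp(-(w.x + b))) for one sample."""
    if len(mhf_values) != len(weights):
        raise ValueError("one weight per marker is required")
    z = sum(w * x for w, x in zip(weights, mhf_values)) + intercept
    return 1.0 / (1.0 + math.exp(-z))


def is_positive(score, threshold):
    """A score above the threshold is read as cancer or cancer risk."""
    return score > threshold


# Hypothetical two-marker model and one sample's MHF values.
weights = [4.0, 2.5]
intercept = -1.0
score = predict_score([0.30, 0.10], weights, intercept)
call = is_positive(score, threshold=0.5)
```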
除上述比较之外,本领域技术人员还可以基于各种因素,例如年龄、性别、病史、家族史、症状等,来确定个体患有癌症的风险。
VI组合物和试剂盒
本发明提供一种用于癌症(例如,结直肠癌、胃癌、食管癌和/或肝癌)鉴别的甲基化检测或诊断试剂盒和诊断试剂或诊断组合物,所述试剂盒和组合物包括用于检测本文所述的一个或多个目标标志物的至少一个CpG二核苷酸的甲基化状态或水平的试剂。根据待检测的目标标志物,试剂盒和组合物中可含有引物和/或探针分子。优选地,引物包括能够与所述待检测的目标标志物或其目标区域在严谨条件下、中等严谨条件下或高度严谨条件下杂交的引物对。引物还可包括检测内参如ACTB的引物。
在一些实施方式中,所述引物被包装在单一容器内或被包装在独立容器内。在一些实施方式中,所述试剂盒进一步包含一个或多个封闭寡核苷酸。
在一些实施方式中,所述试剂盒和组合物进一步包含检测试剂。在一些实施方式中,所述检测试剂选自下组:荧光探针,嵌入染料、生色团标记的探针,放射性同位素标记的探针和生物素标记的探针。
在一些实施方式中,所述试剂盒还可包含DNA聚合酶和/或适合存放从个体获取的生物样品的容器。在一些实施方式中,所述试剂盒进一步含使用说明书和/或对试剂盒检测结果的解释。
在一些实施方式中,所述试剂盒和组合物还可包括用于酶促法或非酶促法进行转化的试剂。在优选的实施方案中,所示试剂盒还包括亚硫酸氢盐试剂或甲基化敏感限制酶(MSRE)。在一些实施方式中,所述亚硫酸氢盐试剂选自下组:亚硫酸氢铵、亚硫酸氢钠、亚硫酸氢钾、亚硫酸氢钙、亚硫酸氢镁、亚硫酸氢铝、亚硫酸氢根离子,及其任意组合。在一些实施方式中,亚硫酸氢盐试剂是亚硫酸氢钠。在一些实施方式中,所述MSRE选自下组:HpaII酶、SalI酶、酶、ScrFI酶、BbeI酶、NotI酶、SmaI酶、XmaI酶、MboI酶、BstBI酶、ClaI酶、MluI酶、NaeI酶、NarI酶、PvuI酶、SacII酶、HhaI酶及其任意组合。
所述试剂盒和组合物还可包括经转化的阳性标准品,其中未甲基化的胞嘧啶转化为不与鸟嘌呤结合的碱基。所述阳性标准品可以是完全甲基化的。
所述试剂盒和组合物还可包括PCR反应试剂。优选地,所述PCR反应试剂包括Taq DNA聚合酶、PCR缓冲液(buffer)、dNTPs、Mg2+
在一些实施方式中,所述试剂盒和组合物还包含可用于进行CpG位置特异性甲基化分析的标准试剂,其中所述分析包括以下一种或多种技术:MS-SNuPE、MSP、MethyLightTM、HeavyMethylTM、COBRA和核酸测序。
在一些实施方式中,所述试剂盒和组合物可包含选自下组的额外的试剂:缓冲液(例如限制酶、PCR、保存或洗涤缓冲液)、DNA回收试剂或试剂盒(例如沉淀、超滤、亲和柱)和DNA回收组件等。
本申请的试剂盒可进一步包含在DNA富集领域中已知的以下组分的一种或几种:蛋白组分,所述蛋白选择性地结合甲基化的DNA;三链形成核酸组分,一个或多个接头,任选地在合适的溶液中;用于进行连接的物质或溶液,例如连接酶、缓冲液;用于进行柱层析的物质或溶液;用于进行免疫学为基础的富集(例如免疫沉淀)的物质或溶液;用于进行核酸扩增的物质或溶液,例如PCR;一种染料或几种染料,若适用于偶联剂,若适用于溶液中;用于进行杂交的物质或溶液;和/或用于进行洗涤步骤的物质或溶液。
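In some other embodiments, the composition of the present application contains an isolated nucleic acid molecule selected from one or more of the following: the sequences shown in any one of SEQ ID NO:1-47.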
在其他一些实施方案中,本申请的组合物含有分离的核酸分子,所述分离的核酸分子选自以下的一种或多种:SEQ ID NO:48-95中任一项所示。
在其他一些实施方案中,本申请的组合物含有分离的核酸分子,所述分离的核酸分子选自以下的一种或多种:SEQ ID NO:96-138中任一项所示。
在其他一些实施方案中,本申请的组合物含有分离的核酸分子,所述分离的核酸分子选自以下的一种或多种:SEQ ID NO:139-340中任一项所示。
本申请还包括记载有本文所述分离的核酸分子的序列和任选的其甲基化信息的介质,所述介质用于与基因甲基化测序数据比对以确定所述核酸分子的存在、含量和/或甲基化水平。优选地,所述介质是印有所述序列和任选的其甲基化信息的卡片,例如纸质、塑料、金属、玻璃卡片。优选地,所述介质是存储有所述序列和任选的其甲基化信息和计算机程序的计算机可读介质,当所述计算机程序被处理器执行时,实现下述步骤:将样品的甲基化测序数据与所述序列比较,从而获得所述样品中含所述序列的核酸分子的存在、含量和/或甲基化水平。
本申请还包括一种用于鉴别癌症(例如,结直肠癌、胃癌、食管癌和/或肝癌)的装置,所述装置包括存储器、处理器以及存储在存储器上并可在处理器上运行的计算机程序,所述处理器执行所述程序时实现以下步骤:(1)获取样品中选自以下一种或多种本文所述的目标标志物或其目标区域的甲基化水平,(2)根据(1)的甲基化水平判读是否为癌症(例如,结直肠癌、胃癌、食管癌和/或肝癌)。优选地,所述获取步骤采用本申请第IV部分所述的任意一种方法进行;优选地,所述判读采取本申请第V部分所述的任意一种方法进行。
VII用途
本申请还提供本申请所述的分离的核酸分子做为检测靶标在癌症(例如,结直肠癌、胃癌、食管癌和/或肝癌)诊断中的应用。
与现有的分子诊断癌症(例如,结直肠癌、胃癌、食管癌和/或肝癌)技术相比,本申请提供的甲基化标志物和技术方案有效地解决了目前诊断技术敏感性低的问题,有助于癌症(例如,结直肠癌、胃癌、食管癌和/或肝癌)的早诊早治,以提高治愈率。
基于发明人的发现,本发明提供一种用于筛查癌症(例如,结直肠癌、胃癌、食管癌和/或肝癌)风险、诊断癌症(例如,结直肠癌、胃癌、食管癌和/或肝癌)、评估癌症(例如,结直肠癌、胃癌、食管癌和/或肝癌)预后的方法,包括:(1)检测对象的样品中本申请所述癌症(例如,结直肠癌、胃癌、食管癌和/或肝癌)相关序列(一个或多个标志物)的甲基化水平,例如通过测序;(2)比较步骤(1)中标记物的甲基化水平和相应的参考水平,(3)根据比较结果筛查癌症(例如,结直肠癌、胃癌、食管癌和/或肝癌)风险、诊断癌症(例如,结直肠癌、胃癌、食管癌和/或肝癌)或评估癌症(例如,结直肠癌、胃癌、食管癌和/或肝癌)预后。通常,所述方法在步骤(1)之前还包括:样品DNA的抽提、质检、和/或将DNA上未甲基化的胞嘧啶转化为不与鸟嘌呤结合的碱基。
步骤(1)的检测可以是任何适用于检测基因组DNA甲基化的检测方法。在具体实施方案中,步骤(1)包括:用转化试剂处理基因组DNA,使未甲基化的胞嘧啶转化为与鸟嘌呤结合能力低于胞嘧啶的碱基(例如尿嘧啶);使用引物进行PCR扩增,所述引物适用于扩增本申请所述癌症(例如,结直肠癌、胃癌、食管癌和/或肝癌)相关序列的经转化的序列;通过扩增产物的有或无、或者序列鉴定(例如基于探针的PCR检测鉴定或DNA测序鉴定)确 定至少一个CpG的甲基化水平。或者步骤(1)还可包括:用甲基化敏感的限制性内切酶处理基因组DNA;使用引物进行PCR扩增,所述引物适用于扩增具有本申请所述结直肠癌相关序列中含有至少一个CpG二核苷酸的序列;通过扩增产物的含量确定至少一个CpG的甲基化水平。
在一些实施方案中,步骤(2)中的比较包括:直接比较步骤(1)中标记物的甲基化水平和参考水平,或者通过计算得出评分并比较标记物的甲基化水平的评分和相应的参考评分。优选地,所述评分通过逻辑回归模型进行计算。在一些实施方案中,步骤(3)包括:当标记物的甲基化水平大于参考水平,或者甲基化水平的评分大于参考评分,则所述对象有形成癌症(例如,结直肠癌、胃癌、食管癌和/或肝癌)的风险、患有癌症(例如,结直肠癌、胃癌、食管癌和/或肝癌)或癌症(例如,结直肠癌、胃癌、食管癌和/或肝癌)预后不良。
本申请中,参考水平或参考评分是可作为诊断或筛查依据的参照甲基化水平或评分。这样的水平或评分可以通过基于癌症(例如,结直肠癌、胃癌、食管癌和/或肝癌)或风险对象的样品与健康对象、无癌症(例如,结直肠癌、胃癌、食管癌和/或肝癌)或风险的对象的样品之间的比较来获得。此外,参考水平或参考评分也可以是健康对象、无癌症(例如,结直肠癌、胃癌、食管癌和/或肝癌)或风险的对象的水平或评分。参考水平或参考评分可以源自一个对象或至少两个对象的群。本领域技术人员可以根据期望的灵敏度和特异性来选择参考水平。
实施方案A
另一方面,本申请提供了如下实施方案:
1.检测一个或多个目标标志物的至少一个CpG二核苷酸的甲基化状态或水平的试剂在制备诊断胃癌的检测试剂或诊断试剂盒中的应用,以及用于确定一个或多个目标标志物的至少一个CpG二核苷酸的甲基化状态或水平的装置在制备诊断胃癌的诊断试剂盒中的应用;其中,所述一个或多个目标标志物选自以下序列(1)-(48)中的任意1、2、3、4、5、6、7、8、9、10、11、12、13、14、15、16、17、18、19、20、21、22、23、24、25、26、27、28、29、30、31、32、33、34、35、36、37、38、39、40、41、42、43、44、45、46、47条或全部48条序列:
(1) a sequence comprising chr6:166970625:166970825 (SEQ ID NO:48) and the sequence within 5 kb upstream and/or within 5 kb downstream thereof;
(2)含chr11:11600237:11600617(SEQ ID NO:49)及其上游5kb以内和/或下游5kb以内的序列;
(3)含chr17:76929754:76929954(SEQ ID NO:50)及其上游5kb以内和/或下游5kb 以内的序列;
(4)含chr6:391738:391938(SEQ ID NO:51)及其上游5kb以内和/或下游5kb以内的序列;
(5)含chr12:2282090:2282290(SEQ ID NO:52)及其上游5kb以内和/或下游5kb以内的序列;
(6)含chr2:177030134:177030449(SEQ ID NO:53)及其上游5kb以内和/或下游5kb以内的序列;
(7)含chr7:35301095:35301411(SEQ ID NO:54)及其上游5kb以内和/或下游5kb以内的序列;
(8)含chr7:8482114:8482413(SEQ ID NO:55)及其上游5kb以内和/或下游5kb以内的序列;
(9)含chr2:72371208:72371433(SEQ ID NO:56)及其上游5kb以内和/或下游5kb以内的序列;
(10)含chr5:134364359:134364559(SEQ ID NO:57)及其上游5kb以内和/或下游5kb以内的序列;
(11)含chr10:118892523:118892723(SEQ ID NO:58)及其上游5kb以内和/或下游5kb以内的序列;
(12)含chr12:113901298:113901498(SEQ ID NO:59)及其上游5kb以内和/或下游5kb以内的序列;
(13)含chr8:143613755:143613955(SEQ ID NO:60)及其上游5kb以内和/或下游5kb以内的序列;
(14)含chr8:20375580:20375780(SEQ ID NO:61)及其上游5kb以内和/或下游5kb以内的序列;
(15)含chr7:107499318:107499518(SEQ ID NO:62)及其上游5kb以内和/或下游5kb以内的序列;
(16)含chr6:1378941:1379141(SEQ ID NO:63)及其上游5kb以内和/或下游5kb以内的序列;
(17)含chr15:34786976:34787337(SEQ ID NO:64)及其上游5kb以内和/或下游5kb以内的序列;
(18)含chr1:156405314:156405514(SEQ ID NO:65)及其上游5kb以内和/或下游 5kb以内的序列;
(19)含chr8:10588811:10589173(SEQ ID NO:66)及其上游5kb以内和/或下游5kb以内的序列;
(20)含chr4:85418610:85418919(SEQ ID NO:67)及其上游5kb以内和/或下游5kb以内的序列;
(21)含chr5:140871317:140871517(SEQ ID NO:68)及其上游5kb以内和/或下游5kb以内的序列;
(22)含chr5:92906255:92906617(SEQ ID NO:69)及其上游5kb以内和/或下游5kb以内的序列;
(23)含chr14:57265398:57265598(SEQ ID NO:70)及其上游5kb以内和/或下游5kb以内的序列;
(24)含chr19:19650947:19651147(SEQ ID NO:71)及其上游5kb以内和/或下游5kb以内的序列;
(25)含chr11:20618486:20618686(SEQ ID NO:72)及其上游5kb以内和/或下游5kb以内的序列;
(26)含chr7:73407894:73408161(SEQ ID NO:73)及其上游5kb以内和/或下游5kb以内的序列;
(27)含chr16:82660460:82660774(SEQ ID NO:74)及其上游5kb以内和/或下游5kb以内的序列;
(28)含chr13:24844736:24844936(SEQ ID NO:75)及其上游5kb以内和/或下游5kb以内的序列;
(29)含chr20:55500358:55500677(SEQ ID NO:76)及其上游5kb以内和/或下游5kb以内的序列;
(30)含chr10:123923943:123924143(SEQ ID NO:77)及其上游5kb以内和/或下游5kb以内的序列;
(31)含chr20:59827678:59827907(SEQ ID NO:78)及其上游5kb以内和/或下游5kb以内的序列;
(32)含chr20:62330559:62330808(SEQ ID NO:79)及其上游5kb以内和/或下游5kb以内的序列;
(33)含chr19:13209774:13209974(SEQ ID NO:80)及其上游5kb以内和/或下游 5kb以内的序列;
(34)含chr16:2085778:2086156(SEQ ID NO:81)及其上游5kb以内和/或下游5kb以内的序列;
(35)含chr6:108488634:108488917(SEQ ID NO:82)及其上游5kb以内和/或下游5kb以内的序列;
(36)含chr12:115124911:115125191(SEQ ID NO:83)及其上游5kb以内和/或下游5kb以内的序列;
(37)含chr10:124896740:124897020(SEQ ID NO:84)及其上游5kb以内和/或下游5kb以内的序列;
(38)含chr14:55243006:55243206(SEQ ID NO:85)及其上游5kb以内和/或下游5kb以内的序列;
(39)含chr13:36729096:36729334(SEQ ID NO:86)及其上游5kb以内和/或下游5kb以内的序列;
(40)含chr2:10444997:10445197(SEQ ID NO:87)及其上游5kb以内和/或下游5kb以内的序列;
(41)含chr9:2157701:2157901(SEQ ID NO:88)及其上游5kb以内和/或下游5kb以内的序列;
(42)含chr12:57529619:57529819(SEQ ID NO:89)及其上游5kb以内和/或下游5kb以内的序列;
(43)含chr1:119527250:119527450(SEQ ID NO:90)及其上游5kb以内和/或下游5kb以内的序列;
(44)含chr1:119532788:119532988(SEQ ID NO:91)及其上游5kb以内和/或下游5kb以内的序列;
(45)含chr15:96909441:96909641(SEQ ID NO:92)及其上游5kb以内和/或下游5kb以内的序列;
(46)含chr1:146551463:146551747(SEQ ID NO:93)及其上游5kb以内和/或下游5kb以内的序列;
(47)含chr17:35293755:35293955(SEQ ID NO:94)或其上下游各5kb以内的序列;和
(48)含chr17:59482763:59482963(SEQ ID NO:95)或其上下游各5kb以内的序列。
2.如实施方式1所述的应用,其特征在于,
所述一个或多个目标标志物包括所述第(3)、(8)、(13)、(15)、(17)、(19)、(22)、(25)、(29)、(31)、(37)、(38)、(40)、(41)、(42)、(43)、(45)、(47)和(48)项所述的序列;或
所述一个或多个目标标志物包括所述第(2)、(6)、(7)、(8)、(12)、(15)、(19)、(25)、(28)、(32)、(33)、(36)、(37)、(40)、(42)、(43)、(44)、(46)和(48)项所述的序列;或
所述一个或多个目标标志物包括所述第(3)、(13)、(14)、(20)、(22)、(28)、(30)和(36)项所述的序列;或
所述一个或多个目标标志物包括所述第(3)、(13)、(27)、(30)和(35)项所述的序列;或
所述一个或多个目标标志物包括所述第(7)、(14)、(22)、(26)、(35)、(38)、(40)、(43)、(47)和(48)项所述的序列。
3.如实施方式1所述的应用,其特征在于,所述一个或多个目标标志物选自所述第(7)、(14)、(22)、(26)、(35)、(38)、(40)、(43)、(47)和(48)项中任意1、2、3、4、5、6、7、8或9项所述的序列。
4.如实施方式1所述的应用,其特征在于,
所述目标标志物包括第(40)项所述序列,以及第(1)-(39)和(41)-(48)中的任意一条或多条序列;或
所述目标标志物包括第(47)项所述序列,以及第(1)-(46)和(48)中的任意一条或多条序列;或
所述目标标志物包括第(43)项所述序列,以及第(1)-(42)和(44)-(48)中的任意一条或多条序列;或
所述目标标志物包括第(26)项所述序列,以及第(1)-(25)和(27)-(48)中的任意一条或多条序列;或
所述目标标志物包括第(35)项所述序列,以及第(1)-(34)和(36)-(48)中的任意一条或多条序列;或
所述目标标志物包括第(14)项所述序列,以及第(1)-(13)和(15)-(48)中的任意一条或多条序列;或
所述目标标志物包括第(38)项所述序列,以及第(1)-(37)和(39)-(48)中的任 意一条或多条序列;或
所述目标标志物包括第(22)项所述序列,以及第(1)-(21)和(23)-(48)中的任意一条或多条序列;或
所述目标标志物包括第(7)项所述序列,以及第(1)-(6)和(8)-(48)中的任意一条或多条序列;或
所述目标标志物包括第(48)项所述序列,以及第(1)-(47)中的任意一条或多条序列。
5.如实施方式1-4中任一项所述的应用,其特征在于,所述目标标志物包括所述SEQ ID NO:48-95各序列各起始位点的上游1kb以内、优选500bp以内、更优选300bp以内、更优选100bp以内的序列和/或各末端位点的下游1kb以内、优选500bp以内、优选300bp以内、优选100bp以内的序列;优选地,所述目标标志物是含有所述SEQ ID NO:48-95任一序列且长度为400bp以内的基因序列。
6.如实施方式1-4中任一项所述的应用,其特征在于,所述第(1)到第(48)项所述的序列分别是SEQ ID NO:48-95所示的序列。
7.如实施方式1-6中任一项所述的应用,其特征在于,所述试剂包括引物和/或探针分子;
优选地,所述引物分子相同于、互补于或在严谨条件下杂交于所述一个或多个目标标志物并包含至少9个连续的核苷酸,所述探针分子与所述一个或多个目标标志物的扩增产物在严谨条件下杂交。
8.如实施方式1-6中任一项所述的应用,其特征在于,所述试剂为实施基因组简化甲基化测序技术所需的试剂。
9.一种用于检测一个或多个目标标志物的至少一个CpG二核苷酸的甲基化状态或甲基化水平以诊断胃癌的诊断试剂或诊断试剂盒,其包含用于检测一个或多个目标标志物的至少一个CpG二核苷酸的甲基化状态或水平的试剂;其中,所述一个或多个目标标志物如实施方式1-6中任一项所述。
10.如实施方式9所述的诊断试剂或诊断试剂盒,其特征在于,所述诊断试剂或诊断试剂盒包括引物和/或探针分子;优选地,所述引物分子相同于、互补于或在严谨条件下杂交于所述一个或多个目标标志物并包含至少9个连续的核苷酸,所述探针分子与所述一个或多个目标标志物的扩增产物在严谨条件下杂交;
任选地,所述诊断试剂或诊断试剂盒还包括检测内参基因ACTB的引物分子和/或探针分子。
11.如实施方式9所述的诊断试剂或诊断试剂盒,其特征在于,所述诊断试剂或诊断试剂盒还包括选自以下的一种或多种物质:PCR缓冲液、聚合酶、dNTP、限制性内切酶、酶切缓冲液、荧光染料、荧光淬灭剂、荧光报告剂、外切核酸酶、碱性磷酸酶、内标、对照物、KCl、MgCl2和(NH4)2SO4
12.如实施方式9所述的诊断试剂或诊断试剂盒,其特征在于,所述试剂还包括下述一个或多个方法中所用的试剂:基于重亚硫酸盐转化的PCR、DNA测序、甲基化敏感的限制性内切酶分析法、荧光定量法、甲基化敏感性高分辨率熔解曲线法、基于芯片的甲基化图谱分析和质谱。
13.如实施方式12所述的诊断试剂或诊断试剂盒,其特征在于,所述试剂选自以下一种或多种:重亚硫酸盐及其衍生物、荧光染料、荧光淬灭剂、荧光报告剂、内标和对照物。
14.区分基因组DNA至少一个靶区域内甲基化和未甲基化CpG二核苷酸的至少一种试剂或成组试剂在制备用于检测和/或分类个体中胃癌的方法的试剂盒中的用途,其中所述方法包括使从所述个体生物样品中分离的基因组DNA与所述至少一种试剂或成组试剂接触,其中所述靶区域等同于或互补于一个或多个目标标志物的至少16连续核苷酸的序列,其中所述连续核苷酸包含至少一个CpG二核苷酸序列,由此至少部分地提供对胃癌的检测和/或分类,其中,所述一个或多个目标标志物如实施方式1-6中任一项所述。
15.将5位未甲基化的胞嘧啶碱基转化为尿嘧啶或在杂交性能方面可检测地不同于胞嘧啶的其它碱基的一种或多种试剂、扩增酶以及至少一种包含至少9个连续核苷酸的引物在制备用于检测和/或分类个体中胃癌的方法的试剂盒中的用途,其中所述方法包括:
a)从所述个体生物样品分离基因组DNA;
b)用所述一种或多种试剂处理a)的所述基因组DNA或其片段;
c)使所述经处理的基因组DNA或其经处理的片段与所述扩增酶和所述至少一种引物接触,所述引物相同于、互补于或在严谨条件下杂交于一个或多个目标标志物,其中所述经处理的基因组DNA或其片段被扩增以产生至少一种扩增产物或不被扩增;以及
d)基于所述扩增物是否存在或其性质,确定所述一个或多个目标标志物的至少一个CpG二核苷酸的甲基化状态或水平,或者反映所述一个或多个目标标志物的多个CpG二核苷酸平均甲基化状态或水平的均值或值,由此至少部分地检测和/或分类个体中的胃癌;
其中,所述一个或多个目标标志物如实施方式1-6中任一项所述。
16.如实施方式15所述的用途,其中步骤b)中,使用选自亚硫酸氢盐、酸式亚硫酸盐、焦亚硫酸盐及其组合的试剂处理所述基因组DNA或其片段。
17.如实施方式16所述的用途,其中c)中,通过使用耐热DNA聚合酶作为所述扩增酶、使用缺乏5’-3’外切酶活性的聚合酶、使用聚合酶链式反应和/或产生带有可检测标记的扩增产物进行核酸分子的接触或扩增。
18.如实施方式15所述的用途,其中c)中的接触或扩增包括使用甲基化特异的引物。
19.一种或多种甲基化敏感限制酶和扩增酶以及至少一种包含至少9个连续核苷酸的引物在制备用于检测和/或分类个体中胃癌的方法的试剂盒中的用途,其中,所述引物相同于、互补于或在严谨条件下杂交于一个或多个目标标志物;所述方法包括:
a)从所述个体生物样品分离基因组DNA;
b)以所述一种或多种甲基化敏感限制酶消化a)所述的基因组DNA或其片段,使所得消化产物与所述扩增酶和所述至少一种引物接触;和
c)基于所述扩增物是否存在或其性质,确定所述一个或多个目标标志物的至少一个CpG二核苷酸的甲基化状态或水平,由此至少部分地检测和/或分类个体中的胃癌;
其中,所述一个或多个目标标志物如实施方式1-6中任一项所述。
20.如实施方式19所述的用途,其特征在于,通过杂交至少一种核酸或肽核酸来确定扩增产物的存在与否,所述至少一种核酸或肽核酸等同于或互补于选自所述一个或多个目标标志物的序列的至少16碱基长片段。
21.衍生自一个或多个目标标志物的经处理的核酸在制备用于诊断胃癌的试剂盒中的用途,其中所述处理适合于将所述一个或多个目标标志物的至少一个未甲基化的胞嘧啶碱基转化至尿嘧啶或在杂交上可检测地不同于胞嘧啶的其它碱基,所述一个或多个目标标志物如实施方式1-6中任一项所述。
22.用于检测并诊断个体胃癌的装置,所述装置包括存储器、处理器以及存储在存储器上并可在处理器上运行的计算机程序,所述处理器执行所述程序时实现以下步骤:(1)获取样品中一个或多个目标标志物的至少一个CpG二核苷酸的甲基化水平或甲基化状态,和(2)根据(1)的甲基化水平或甲基化状态判读胃癌;
其中,所述一个或多个目标标志物如实施方式1-6中任一项所述。
实施方案B
另一方面,本申请提供了以下实施方案:
1.一种评估食管癌的存在和/或进展的方法,包含确定待测样本中选自表1中任意一种或多种DNA区域、或其互补区域、或上述的片段的修饰状态的存在和/或含量。
2.一种评估食管癌的存在和/或进展的方法,包含确定待测样本中选自SEQ ID NO:96至138中任一项所示上游或下游5k bp以内的DNA区域、或其互补区域、或上述的片段的修饰状态的存在和/或含量。
3.一种评估食管癌的存在和/或进展的方法,包含确定待测样本中选自SEQ ID NO:105上游或下游5k bp以内的区域以及表2中任意一种或多种基因所在的DNA区域、或其片段的修饰状态的存在和/或含量。
4.如实施方式1-3中任一项所述的方法,所述方法还包含获取待测样本中的核酸。
5.如实施方式4所述的方法,所述核酸包含无细胞游离核酸。
6.如实施方式1-5中任一项所述的方法,所述待测样本包含组织、细胞和/或体液。
7.如实施方式1-6中任一项所述的方法,所述待测样本包含血浆。
8.如实施方式1-7中任一项所述的方法,所述方法还包含转化所述DNA区域或其片段。
9.如实施方式8所述的方法,具有所述修饰状态的碱基以及不具有所述修饰状态的所述碱基,在转化后分别形成不同的物质。
10.如实施方式1-9中任一项所述的方法,具有所述修饰状态的碱基在转化后基本不发生改变,且不具有所述修饰状态的所述碱基在转化后改变为与所述碱基不同的其它碱基、或在转化后被剪切。
11.如实施方式9-10中任一项所述的方法,所述碱基包含胞嘧啶。
12.如实施方式1-11中任一项所述的方法,所述修饰状态包含甲基化修饰。
13.如实施方式10-12中任一项所述的方法,所述其它碱基包含尿嘧啶。
14.如实施方式8-13中任一项所述的方法,所述转化包含通过脱氨基试剂和/或甲基化敏感限制酶转化。
15.如实施方式14所述的方法,所述脱氨基试剂包含亚硫酸氢盐或其类似物。
16.如实施方式1-15中任一项所述的方法,所述确定修饰状态的存在和/或含量的方法包含,确定具有所述修饰状态的DNA区域或其片段的存在和/或含量。
17.如实施方式1-16中任一项所述的方法,通过测序方法检测具有所述修饰状态的DNA区域或其片段的存在和/或含量。
18.如实施方式1-17中任一项所述的方法,通过确认所述DNA区域或其片段的修饰状态的存在和/或所述DNA区域或其片段相对于参考水平具有更高的修饰状态的含量,确定肿瘤的存在和/或进展。
19.一种核酸,所述核酸包含能够结合选自表1中任意一种或多种DNA区域、或其互补区 域、或上述的转化而来的区域、或上述的片段的序列。
20.一种核酸,所述核酸包含能够结合选自SEQ ID NO:96至138中任一项所示上游或下游5k bp以内的DNA区域、或其互补区域、或上述的转化而来的区域、或上述的片段的序列。
21.一种核酸,所述核酸包含能够结合选自SEQ ID NO:105上游或下游5k bp以内的区域以及表2中任意一种或多种基因所在的DNA区域、或其互补区域、或上述的转化而来的区域、或上述的片段的序列。
22.一种试剂盒,包含如实施方式19-21中任一项所述的核酸。
23.如实施方式19-21中任一项所述的核酸、和/或实施方式22所述的试剂盒,在制备疾病检测产品中的应用。
24.如实施方式19-21中任一项所述的核酸、和/或实施方式22所述的试剂盒,在制备评估食管癌的存在和/或进展的物质中的应用。
25.如实施方式19-21中任一项所述的核酸、和/或实施方式22所述的试剂盒,在制备确定所述DNA区域或其片段的修饰状态的物质中的应用。
26.一种制备核酸的方法,包含根据选自表1中任意一种或多种DNA区域、或其互补区域、或上述的转化而来的区域、或上述的片段的修饰状态,设计能够结合所述DNA区域、或其互补区域、或上述的转化而来的区域、或上述的片段的核酸。
27.一种制备核酸的方法,包含根据选自SEQ ID NO:96至138中任一项所示上游或下游5k bp以内的DNA区域、或其互补区域、或上述的转化而来的区域、或上述的片段的修饰状态,设计能够结合所述DNA区域、或其互补区域、或上述的转化而来的区域、或上述的片段的核酸。
28.一种制备核酸的方法,包含根据选自SEQ ID NO:105上游或下游5k bp以内的区域以及表2中任意一种或多种基因所在的DNA区域、或其互补区域、或上述的转化而来的区域、或上述的片段的修饰状态,设计能够结合所述DNA区域、或其互补区域、或上述的转化而来的区域、或上述的片段的核酸。
29.用于确定DNA区域修饰状态的核酸、核酸组和/或试剂盒,在制备用于评估食管癌的存在和/或进展的物质中的应用,所述用于确定的DNA区域包含选自表1中任意一种或多种DNA区域、或其互补区域、或上述的转化而来的区域、或上述的片段的序列。
30.用于确定DNA区域修饰状态的核酸、核酸组和/或试剂盒,在制备用于评估食管癌的存在和/或进展的物质中的应用,所述用于确定的DNA区域包含选自SEQ ID NO:96至 138中任一项所示上游或下游5k bp以内的DNA区域、或其互补区域、或上述的转化而来的区域、或上述的片段的序列。
31.用于确定DNA区域修饰状态的核酸、核酸组和/或试剂盒,在制备用于评估食管癌的存在和/或进展的物质中的应用,所述用于确定的DNA区域包含选自SEQ ID NO:105上游或下游5k bp以内的区域以及表2中任意一种或多种基因所在的DNA区域、或其互补区域、或上述的转化而来的区域、或上述的片段的序列。
32.如实施方式29-31中任一项所述的应用,所述修饰状态包含甲基化修饰。
33.一种储存介质,其记载可以运行实施方式1-18中任一项所述的方法的程序。
34.一种设备,其包含实施方式33所述的储存介质,以及任选地还包含耦接至所述储存介质的处理器,所述处理器被配置为基于存储在所述储存介质中的程序执行以实现实施方式1-18中任一项所述的方法。
实施方案C
另一方面,本申请提供了以下实施方案:
1.一种评估肝癌的存在和/或进展的方法,包含确定待测样本中选自表3中任意一种或多种DNA区域、或其互补区域、或上述的片段的修饰状态的存在和/或含量。
2.一种评估肝癌的存在和/或进展的方法,包含确定待测样本中选自SEQ ID NO:139至340中任一项所示上游或下游1k bp以内的DNA区域、或其互补区域、或上述的片段的修饰状态的存在和/或含量。
3.一种评估肝癌的存在和/或进展的方法,包含确定待测样本中选自表4中任意一种或多种基因所在上游或下游1k bp以内的DNA区域、或其片段的修饰状态的存在和/或含量。
4.如实施方式1-3中任一项所述的方法,所述方法还包含获取待测样本中的核酸。
5.如实施方式4所述的方法,所述核酸包含无细胞游离核酸。
6.如实施方式1-5中任一项所述的方法,所述待测样本包含组织、细胞和/或体液。
7.如实施方式1-6中任一项所述的方法,所述待测样本包含血浆。
8.如实施方式1-7中任一项所述的方法,所述方法还包含转化所述DNA区域或其片段。
9.如实施方式8所述的方法,具有所述修饰状态的碱基以及不具有所述修饰状态的所述碱基,在转化后分别形成不同的物质。
10.如实施方式1-9中任一项所述的方法,具有所述修饰状态的碱基在转化后基本不发生改变,且不具有所述修饰状态的所述碱基在转化后改变为与所述碱基不同的其它碱基、或在转化后被剪切。
11.如实施方式9-10中任一项所述的方法,所述碱基包含胞嘧啶。
12.如实施方式1-11中任一项所述的方法,所述修饰状态包含甲基化修饰。
13.如实施方式10-12中任一项所述的方法,所述其它碱基包含尿嘧啶。
14.如实施方式8-13中任一项所述的方法,所述转化包含通过脱氨基试剂和/或甲基化敏感限制酶转化。
15.如实施方式14所述的方法,所述脱氨基试剂包含亚硫酸氢盐或其类似物。
16.如实施方式1-15中任一项所述的方法,所述确定修饰状态的存在和/或含量的方法包含,确定具有所述修饰状态的DNA区域或其片段的存在和/或含量。
17.如实施方式1-16中任一项所述的方法,通过测序方法检测具有所述修饰状态的DNA区域或其片段的存在和/或含量。
18.如实施方式1-17中任一项所述的方法,通过确认所述DNA区域或其片段的修饰状态的存在和/或所述DNA区域或其片段相对于参考水平具有更高的修饰状态的含量,确定肿瘤的存在和/或进展。
19.一种核酸,所述核酸包含能够结合选自表3中任意一种或多种DNA区域、或其互补区域、或上述的转化而来的区域、或上述的片段的序列。
20.一种核酸,所述核酸包含能够结合选自SEQ ID NO:139至340中任一项所示上游或下游1k bp以内的DNA区域、或其互补区域、或上述的转化而来的区域、或上述的片段的序列。
21.一种核酸,所述核酸包含能够结合选自表4中任意一种或多种基因所在上游或下游1kbp以内的DNA区域、或其互补区域、或上述的转化而来的区域、或上述的片段的序列。
22.一种试剂盒,包含如实施方式19-21中任一项所述的核酸。
23.如实施方式19-21中任一项所述的核酸、和/或实施方式22所述的试剂盒,在制备疾病检测产品中的应用。
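24. Use of the nucleic acid of any one of embodiments 19-21, and/or the kit of embodiment 22, in the preparation of a substance for assessing the presence and/or progression of liver cancer.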
25.如实施方式19-21中任一项所述的核酸、和/或实施方式22所述的试剂盒,在制备确定所述DNA区域或其片段的修饰状态的物质中的应用。
26.一种制备核酸的方法,包含根据选自表3中任意一种或多种DNA区域、或其互补区域、或上述的转化而来的区域、或上述的片段的修饰状态,设计能够结合所述DNA区 域、或其互补区域、或上述的转化而来的区域、或上述的片段的核酸。
27.一种制备核酸的方法,包含根据选自SEQ ID NO:139至340中任一项所示上游或下游1k bp以内的DNA区域、或其互补区域、或上述的转化而来的区域、或上述的片段的修饰状态,设计能够结合所述DNA区域、或其互补区域、或上述的转化而来的区域、或上述的片段的核酸。
28.一种制备核酸的方法,包含根据选自表4中任意一种或多种基因所在上游或下游1k bp以内的DNA区域、或其互补区域、或上述的转化而来的区域、或上述的片段的修饰状态,设计能够结合所述DNA区域、或其互补区域、或上述的转化而来的区域、或上述的片段的核酸。
29.用于确定DNA区域修饰状态的核酸、核酸组和/或试剂盒,在制备用于评估肝癌的存在和/或进展的物质中的应用,所述用于确定上游或下游1k bp以内的DNA区域包含选自表3中任意一种或多种DNA区域、或其互补区域、或上述的转化而来的区域、或上述的片段的序列。
30.用于确定DNA区域修饰状态的核酸、核酸组和/或试剂盒,在制备用于评估肝癌的存在和/或进展的物质中的应用,所述用于确定上游或下游1k bp以内的DNA区域包含选自SEQ ID NO:139至340中任一项所示上游或下游1k bp以内的DNA区域、或其互补区域、或上述的转化而来的区域、或上述的片段的序列。
31.用于确定DNA区域修饰状态的核酸、核酸组和/或试剂盒,在制备用于评估肝癌的存在和/或进展的物质中的应用,所述用于确定上游或下游1k bp以内的DNA区域包含选自表4中任意一种或多种基因所在上游或下游1k bp以内的DNA区域、或其互补区域、或上述的转化而来的区域、或上述的片段的序列。
32.如实施方式29-31中任一项所述的应用,所述修饰状态包含甲基化修饰。
33.一种储存介质,其记载可以运行实施方式1-18中任一项所述的方法的程序。
34.一种设备,其包含实施方式33所述的储存介质,以及任选地还包含耦接至所述储存介质的处理器,所述处理器被配置为基于存储在所述储存介质中的程序执行以实现实施方式1-18中任一项所述的方法。
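Without wishing to be bound by any theory, the examples below are intended only to illustrate the methylation markers, methods, uses and the like of the present application, and are not intended to limit the scope of the invention of the present application.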
实施例
实施例1结直肠癌样本处理及甲基化标志物筛选
实验样本
收集了总计108个结直肠癌血液样本、108个年龄性别匹配的无结直肠癌血液样本,所有入组患者签署知情同意书,样本信息见表6:
表6
实验方法
1.样本cfDNA提取
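All blood samples were collected in Streck tubes. To isolate plasma, the blood samples were first centrifuged at 1600 g for 10 min at 4°C, with the centrifuge set to soft-brake mode so as not to disturb the buffy coat layer. The supernatant was then transferred to a new 1.5 ml conical tube and centrifuged at 16000 g for 10 min at 4°C. The supernatant was again transferred to a new 1.5 ml conical tube and stored at -80°C.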
为了提取循环游离DNA(cfDNA),根据制造商的说明,将血浆等分解冻并立即使用QIAamp循环核酸提取试剂盒(Qiagen 55114)进行处理。提取的cfDNA浓度用qubit3.0定量。
2.亚硫酸氢盐转化与文库制备
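Cytosine bases were converted with sodium bisulfite using a bisulfite conversion kit (ThermoFisher, MECOV50). Following the manufacturer's instructions, 20 ng of genomic DNA or ctDNA was converted and purified for downstream applications.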
样品DNA的抽提、质检、和将DNA上未甲基化的胞嘧啶转化为不与鸟嘌呤结合的碱基。 在一个或多个实施方案中,所述转化使用酶促方法进行,优选脱氨酶处理,或所述转化使用非酶促方法进行,优选用亚硫酸氢盐或重硫酸盐处理,更优选使用亚硫酸氢钙、亚硫酸氢钠、亚硫酸氢钾、亚硫酸氢铵、重硫酸钠、重硫酸钾和重硫酸铵处理。
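Libraries were constructed using the MethylTitan method (CN201910515830), as follows. The bisulfite-converted DNA is dephosphorylated and then ligated to a universal Illumina sequencing adapter carrying a unique molecular identifier (UMI). After second-strand synthesis and purification, the converted DNA undergoes a semi-targeted PCR reaction to amplify the desired target regions. After another purification, sample-specific barcodes and full-length Illumina sequencing adapters are added to the target DNA molecules by PCR. The final library is quantified using the KAPA library quantification kit for Illumina (KK4844) and sequenced on an Illumina sequencer. The MethylTitan library construction method effectively enriches the desired target fragments from small amounts of DNA, especially cfDNA, and at the same time preserves the methylation state of the original DNA well. Finally, by analyzing adjacent CpG methylated cytosines (a given target may contain from a few to several dozen CpGs, depending on the region), the overall methylation pattern of a specific region can serve as a unique marker, rather than the status of individual bases being compared.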
3.测序及数据预处理
1)使用Illumina Hiseq 2500测序仪进行双端测序,测序量为每个样本25~35M;使用Trim_galore v 0.6.0、cutadapt v2.1软件对Illumina Hiseq 2500测序仪下机的双端150bp测序数据进行去接头处理。在Read 1的3’端去除接头序列为“AGATCGGAAGAGCACACGTCTGAACTCCAGTC(SEQ ID NO:341)”,在Read 2的3’端去除接头序列“AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT(SEQ ID NO:342)”,并去除两端测序质量值低于20的碱基。如果5’端有3bp的接头序列则去掉整条read。去接头后短于30个碱基的read也被去掉。
2)使用Pear v0.9.6软件合并双端序列为单端序列。合并至少重叠20个碱基的两端reads,如果合并之后的reads短于30个碱基则舍弃。
4.测序数据比对
本实施例使用的参考基因组数据来自UCSC数据库(UCSC:hg19,http://hgdownload.soe.ucsc.edu/goldenPath/hg19/bigZips/hg19.fa.gz)。
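1) First, Bismark software is used to convert hg19 from cytosine to thymine (CT) and from guanine to adenine (GA), respectively, and Bowtie2 software is used to build an index for each converted genome.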
2)将预处理的数据同样进行CT和GA转化。
3)使用Bowtie2软件分别将转化后的序列比对到转化后的HG19参考基因组,最短种子序列长度20,种子序列不允许错配。
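The two in-silico conversions used before alignment can be sketched in a few lines. This illustrates the idea only and is not Bismark's implementation: every C becomes T in the forward-strand view, and every G becomes A in the reverse-strand view, so converted reads can be matched against converted reference sequences. The function names are hypothetical.

```python
def c_to_t(seq: str) -> str:
    """Fully C-to-T converted version of a sequence (forward-strand view)."""
    return seq.upper().replace("C", "T")


def g_to_a(seq: str) -> str:
    """Fully G-to-A converted version of a sequence (reverse-strand view)."""
    return seq.upper().replace("G", "A")


# A toy reference fragment containing two CpG sites.
reference = "ACGTCG"
ct_reference = c_to_t(reference)
ga_reference = g_to_a(reference)
```

Methylation is called afterwards by comparing the original, unconverted read with the original reference at each aligned cytosine.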
5.提取甲基化信息
对于每个目标区域hg19的CpG位点,根据上述比对结果,获取每个位点对应的甲基化水平。本发明涉及到的位点的核苷酸编号对应于hg19的核苷酸位置编号。
1)甲基化单倍型比例(MHF)的计算,对于每个目标区域hg19的CpG位点,根据上述比对结果,获取reads中每个位点对应的碱基序列,C表示该位点发生甲基化,T表示该位点未甲基化状态。本申请中位点的核苷酸编号对应于HG19的核苷酸位置编号。一个目标甲基化区域可能有多个甲基化haplotype,对于目标区域内的每一个甲基化haplotype都需要进行该值的计算,MHF的计算公式示例如下:
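$$\mathrm{MHF}_{i,h} = \frac{N_{i,h}}{N_i}$$
where $i$ denotes the target methylation interval, $h$ denotes the target methylation haplotype, $N_i$ denotes the number of reads located in the target methylation interval, and $N_{i,h}$ denotes the number of reads containing the target methylation haplotype.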
2)平均甲基化水平(AMF)的计算,对于每个目标区域计算区域内甲基化的平均水平。公式如下:
其中m为该目标中总的CpG位点数,i为区间内每个CpG位点,NC,i为该CpG位点碱基为C的reads数(即该位点发生甲基化的reads数),NT,i为该CpG位点碱基为T的reads数(即该位点未甲基化的测序reads数)
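The AMF of a region can be computed directly from per-site read counts. Because the formula is not reproduced in the text, the sketch below assumes the pooled form, i.e. the sum of methylated (C) reads over all CpG sites divided by the sum of all informative (C plus T) reads; this choice and the function name `amf` are assumptions.

```python
def amf(site_counts):
    """Pooled average methylation over the CpG sites of one region.

    site_counts: list of (n_c, n_t) pairs, one per CpG site, where n_c is
    the number of reads with C (methylated) and n_t the number with T.
    Returns None when the region has no informative reads.
    """
    methylated = sum(n_c for n_c, _ in site_counts)
    total = sum(n_c + n_t for n_c, n_t in site_counts)
    if total == 0:
        return None
    return methylated / total


# Three CpG sites with 10 informative reads each.
region_amf = amf([(8, 2), (5, 5), (2, 8)])
```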
6.甲基化单倍型数据矩阵
1)将训练集和测试集的各个样本的甲基化单倍型比例(MHF)和平均甲基化水平(AMF)数据分别合并成数据矩阵,对每个深度低于200的位点做缺失值处理。
2)去除缺失值比例高于10%的位点。
3)对于数据矩阵的缺失值,利用KNN算法进行缺失数据插补。首先使用训练集利用KNN算法训练插补器,然后分别对训练集矩阵和测试集矩阵进行插补。
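The first two matrix-cleaning steps can be sketched as follows, assuming the data are held as nested lists with one row per sample and one column per site, and using `None` as the missing-value marker. The function names are hypothetical. The subsequent KNN imputation step is not shown; it could be done, for example, with scikit-learn's `KNNImputer` fitted on the training matrix only.

```python
def mask_low_depth(values, depths, min_depth=200):
    """Set entries whose sequencing depth is below min_depth to None."""
    return [
        [v if d >= min_depth else None for v, d in zip(value_row, depth_row)]
        for value_row, depth_row in zip(values, depths)
    ]


def drop_sparse_sites(matrix, max_missing=0.10):
    """Remove columns whose missing fraction exceeds max_missing.

    Returns (kept_column_indices, filtered_matrix).
    """
    n_samples = len(matrix)
    n_sites = len(matrix[0]) if matrix else 0
    keep = [
        j for j in range(n_sites)
        if sum(row[j] is None for row in matrix) / n_samples <= max_missing
    ]
    return keep, [[row[j] for j in keep] for row in matrix]


# Two samples, two sites; the second site is under-covered in sample 1.
masked = mask_low_depth([[0.1, 0.2], [0.3, 0.4]], [[250, 150], [300, 220]])
kept, filtered = drop_sparse_sites(masked)
```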
7.根据特征矩阵筛选甲基化标志物(图1)
1)对训练集随机分成3折,取其中2份作为训练集构建逻辑回归模型,其中1份作为验证数据,对验证数据进行预测。重复5次后,计算每个目标区域验证集平均AUC。对每个目标区域筛选AUC最大的特征作为该区域的代表特征,并按照AUC从大到小排序。
2)将训练集随机分成5份做5折交叉验证,重复10次,进行增量特征筛选。具体过程为:留出训练集中的一份数据作为验证数据,其余训练集数据作为训练数据。按照上述顺序依次将每个区域的代表特征加入特征组合,使用4份训练数据构建逻辑回归模型,对验证数 据进行预测。重复10次后计算验证数据平均AUC。
3)如果训练数据的AUC增加则保留该甲基化标志物,否则则去掉,循环过后将得到的特征组合作为甲基化标志物组合,使用所有训练集数据训练新的模型,并使用测试集数据进行验证。
实施例2甲基化靶向测序筛选结直肠癌特异性的甲基化位点
发明人从大量候选区域中筛选出47个甲基化标志物,其基因组位置和关联基因如表1所示,甲基化标志物基因组位置指该甲基化标志物在UCSC(https://genome.ucsc.edu/cgi-bin/hgTracks?db=hg19)HG19基因组位置。甲基化标志物关联基因指TSS距离甲基化标志物100Kb内,并且距离最近的基因。
选择SEQ ID NO:1-47所示的序列作为实施例中使用的甲基化标志物,每个甲基化标志物的所有CpG位点的甲基化水平都可以通过MethylTitan甲基化测序的方法获得。每个区域中所有CpG位点甲基化水平的均值,单个CpG位点的甲基化水平,以及区域内CpG位点甲基化单倍型组合都可以作为结直肠癌的标志物。
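The box plots in Figure 2 show the distribution of methylation levels of the 47 methylation markers in colorectal cancer and non-colorectal cancer samples of the training set, and the box plots in Figure 3 show the same for the test set. As can be seen from Figures 2 and 3, the average methylation level within the methylation marker regions is distributed significantly differently between cfDNA samples with and without colorectal cancer, giving good discrimination.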
表7中P值为Mann Whitney U Test P value,甲基化水平表示该组cfDNA样品甲基化水平中位数。表7的统计结果也显示本申请的47个甲基化标志物,甲基化水平在结直肠癌和非结直肠癌样本间具有显著性的差异(P<0.001),是良好的结直肠癌甲基化标志物。
表7在训练集和测试集中甲基化标志物在结直肠癌中的甲基化水平

实施例3单个甲基化标志物判别结直肠癌是否存在的性能
为了验证单个甲基化标志物的区分结直肠癌和无结直肠癌的性能,使用单个marker的甲 基化水平数据在实施例1训练集数据中训练模型,并使用测试集样本对模型的性能进行验证。
使用python(V3.9.7)中的sklearn(V1.0.1)包中的逻辑回归模型:model=LogisticRegression(),该模型的公式如下,其中x为样本目标marker的甲基化水平值,w为不同marker的系数,b为截距值,y为模型预测分值:
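The model is trained on the training-set samples: model.fit(TrainData,TrainPheno), where TrainData is the data of the target methylation sites in the training-set samples and TrainPheno is the phenotype of the training-set samples (1 for colorectal cancer, 0 for no colorectal cancer), and the relevant threshold of the model is determined from the training-set samples.
The model is tested on the test-set samples: TestPred=model.predict_proba(TestData)[:,1], where TestData is the data of the target methylation sites in the test-set samples and TestPred is the model prediction score; this score is compared with the above threshold to judge whether a sample is colorectal cancer.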
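The training and prediction calls described above can be put together in a runnable sketch. The data here are synthetic and invented for illustration (random values standing in for one marker's methylation level), and the variable names follow Python convention rather than the names in the text. `predict_proba` returns one column per class, so column 1 is taken as the score for the cancer class.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic single-marker data: 20 non-cancer (0) and 20 cancer (1) samples.
train_pheno = np.array([0] * 20 + [1] * 20)
train_data = np.concatenate(
    [rng.normal(0.05, 0.02, 20), rng.normal(0.30, 0.05, 20)]
).reshape(-1, 1)

# Two synthetic test samples: one low and one high methylation level.
test_data = np.array([[0.04], [0.33]])

model = LogisticRegression()
model.fit(train_data, train_pheno)

# Probability of class 1 (cancer) for each test sample.
test_pred = model.predict_proba(test_data)[:, 1]
```

With default settings scikit-learn applies L2 regularisation, so scores on such small-valued features stay close to 0.5; only their ordering and the training-set threshold matter for the call.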
本实施例中单个甲基化标志物逻辑回归模型的效果见表8,从该表中可看出,所有的甲基化标志物的不论在测试集和训练集都可以达到0.75以上的AUC,都是较好的结直肠癌标志物。
本申请中单个甲基化标志物均可作为结直肠癌标志物,采用逻辑回归建模,根据训练集设置阈值,大于阈值则预测为结直肠癌,反之则预测为非结直肠癌,训练集和测试集都能达到很好的准确性,特异性和灵敏性,采用其它机器学习模型也可达到相似效果。
表8单个甲基化标志物逻辑回归模型的表现

实施例4结直肠癌所有目标甲基化标志物的预测结果
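This example uses the methylation levels of all 47 methylation markers to build a logistic regression machine learning model, ALLMODEL, which accurately distinguishes colorectal cancer samples from non-colorectal cancer samples in the data. The specific steps are essentially the same as in Example 3, except that the data of the combination of all 47 target methylation markers (SEQ ID NO:1-47) are input into the model.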
训练集和测试集中模型预测分值分布见图4。ROC曲线见图5,在训练集中结直肠癌和无结直肠癌样本区分的AUC达到了0.965,测试集中,结直肠癌和无结直肠癌样本区分的AUC达到了0.965,设置阈值为0.441,大于该值预测为结直肠癌,反之则预测为无结直肠癌,在该阈值下,训练集准确性为0.894,训练集特异性为0.932,训练集敏感性为0.859,测试集准确性为0.892,测试集特异性为0.914,测试集敏感性为0.867,该模型可以较好地从样本中区分出结直肠癌和无结直肠癌样本。
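Accuracy, specificity and sensitivity at a fixed threshold follow from the confusion-matrix counts. The sketch below uses invented scores, not the study data, and assumes both classes are present; a score above the threshold is read as a positive call, as in the text. The function name is hypothetical.

```python
def classification_metrics(scores, labels, threshold):
    """Accuracy, sensitivity and specificity at a given score threshold."""
    tp = fp = tn = fn = 0
    for score, label in zip(scores, labels):
        predicted = 1 if score > threshold else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1
        elif predicted == 0 and label == 0:
            tn += 1
        else:
            fn += 1
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),   # fraction of cancers detected
        "specificity": tn / (tn + fp),   # fraction of non-cancers cleared
    }


# Six toy samples evaluated at the threshold quoted in this example.
metrics = classification_metrics(
    [0.9, 0.6, 0.3, 0.5, 0.2, 0.1], [1, 1, 1, 0, 0, 0], threshold=0.441
)
```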
实施例5结直肠癌9个甲基化标志物的预测结果
为了验证相关标志物组合的效果,本实施例从所有的47个甲基化标志物的甲基化水平中挑选SEQ ID NO:4,SEQ ID NO:11,SEQ ID NO:15,SEQ ID NO:18,SEQ ID NO:19,SEQ ID NO:30,SEQ ID NO:34,SEQ ID NO:37,SEQ ID NO:41共9个甲基化标志物构建了逻辑回归的机器学习模型SUBMODEL1。
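The machine learning model was built in the same way as in Example 3, but only the data of the above 9 markers were used for the samples. The model scores in the training and test sets are shown in Figure 6, and the ROC curve of the model is shown in Figure 7. The scores of colorectal cancer and non-colorectal cancer samples differ significantly in both the training set and the test set. The AUC for distinguishing colorectal cancer from non-colorectal cancer samples reached 0.921 in the training set and 0.917 in the test set. With the threshold set to 0.502, a score above this value is predicted as colorectal cancer and otherwise as no colorectal cancer; at this threshold, the training-set accuracy is 0.854, specificity 0.822 and sensitivity 0.885, and the test-set accuracy is 0.800, specificity 0.800 and sensitivity 0.800, indicating good performance of this combination model.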
实施例6结直肠癌6个甲基化标志物的预测结果
为了验证相关标志物组合的效果,本实施例从所有的47个甲基化标志物的甲基化水平中挑选SEQ ID NO:1,SEQ ID NO:21,SEQ ID NO:29,SEQ ID NO:36,SEQ ID NO:44,SEQ ID NO:47共6个甲基化标志物构建了逻辑回归的机器学习模型SUBMODEL2。
机器学习模型构建的方法也同实施例3一致,但相关样本只使用了该实施例中的以上6个标志物的数据,该模型在训练集和测试集中的模型得分见图8,该模型ROC曲线见图9。可看出该模型在训练集和测试集中,结直肠癌和无结直肠癌样本分值具有显著差异,该模型训练集中结直肠癌和无结直肠癌样本区分的AUC达到了0.916,测试集中,结直肠癌和无结 直肠癌样本区分的AUC达到了0.879,设置阈值为0.392,大于该值预测为结直肠癌,反之则预测为无结直肠癌,在该阈值下,训练集准确性为0.841,训练集特异性为0.877,训练集敏感性为0.822,测试集准确性为0.785,测试集特异性为0.714,测试集敏感性为0.867,说明了该组合模型良好的性能。
实施例7结直肠癌7个甲基化标志物的预测结果
为了验证相关标志物组合的效果,本实施例从所有的47个甲基化标志物的甲基化水平中挑选SEQ ID NO:6,SEQ ID NO:10,SEQ ID NO:13,SEQ ID NO:14,SEQ ID NO:22,SEQ ID NO:28,SEQ ID NO:43共7个甲基化标志物构建了逻辑回归的机器学习模型SUBMODEL3。
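The machine learning model was built in the same way as in Example 3, but only the data of the above 7 markers were used for the samples. The model scores in the training and test sets are shown in Figure 10, and the ROC curve of the model is shown in Figure 11. The scores of colorectal cancer and non-colorectal cancer samples differ significantly in both the training set and the test set. The AUC for distinguishing colorectal cancer from non-colorectal cancer samples reached 0.911 in the training set and 0.932 in the test set. With the threshold set to 0.507, a score above this value is predicted as colorectal cancer and otherwise as no colorectal cancer; at this threshold, the training-set accuracy is 0.848, specificity 0.973 and sensitivity 0.731, and the test-set accuracy is 0.815, specificity 0.971 and sensitivity 0.633, indicating good performance of this combination model.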
实施例8胃癌样本处理及甲基化标志物筛选
收集了总计206个胃癌患者,以及393个正常人,所有入组患者签署知情同意书。将这些样本按照一定的比例分为训练集和测试集,其中训练集用于下述机器学习模型的构建,测试集用于模型的性能测试,样本信息见下表9。
表9

样本处理、策略及数据预处理过程同实施例1,计算MHF提取甲基化信息后,进行甲基化单倍型数据矩阵:
1)将训练集和测试集的各个样本的甲基化单倍型数据分别合并成数据矩阵,对每个深度低于100的位点做缺失值处理。
2)去除缺失值比例高于10%的位点。
3)对于数据矩阵的缺失值,利用KNN算法进行缺失数据插补。
然后根据训练集样本分组发现特征甲基化单倍型:
1)将数据集按年龄匹配随机分成三份。
2)留出数据集中的一份数据作为测试数据,其余数据作为训练数据。
3)训练集内部进一步分成3分,进行3折交叉验证。基于3折交叉验证的平均AUC,筛选marker。
4)步骤3中得到的marker,基于Logistic Regression模型,使用训练数据进行模型训练,并在测试数据中进行模型效果的验证。
5)将得到的甲基化标志物用Great进行基因注释。
The gastric cancer-specific methylation markers identified by the screening are listed below; each is located within the named gene or upstream or downstream of it: SEQ ID NO:48 (MPC1); SEQ ID NO:49 (GALNT18); SEQ ID NO:50 (TIMP2); SEQ ID NO:51 (IRF4); SEQ ID NO:52 (CACNA1C); SEQ ID NO:53 (HOXD4); SEQ ID NO:54 (TBX20); SEQ ID NO:55 (NXPH1); SEQ ID NO:56 (CYP26B1); SEQ ID NO:57 (PITX1); SEQ ID NO:58 (VAX1); SEQ ID NO:59 (LHX5); SEQ ID NO:60 (ARC); SEQ ID NO:61 (LZTS1); SEQ ID NO:62 (DLD); SEQ ID NO:63 (FOXF2); SEQ ID NO:64 (GOLGA8A); SEQ ID NO:65 (C1orf61); SEQ ID NO:66 (SOX7); SEQ ID NO:67 (NKX6-1); SEQ ID NO:68 (PCDHGC5); SEQ ID NO:69 (NR2F1); SEQ ID NO:70 (OTX2); SEQ ID NO:71 (CILP2); SEQ ID NO:72 (SLC6A5); SEQ ID NO:73 (ELN); SEQ ID NO:74 (CDH13); SEQ ID NO:75 (C1QTNF9); SEQ ID NO:76 (TFAP2C); SEQ ID NO:77 (TACC2); SEQ ID NO:78 (CDH4); SEQ ID NO:79 (TNFRSF6B); SEQ ID NO:80 (LYL1); SEQ ID NO:81 (SLC9A3R2); SEQ ID NO:82 (NR2E1); SEQ ID NO:83 (TBX3); SEQ ID NO:84 (HMX3); SEQ ID NO:85 (GCH1); SEQ ID NO:86 (DCLK1); SEQ ID NO:87 (HPCAL1); SEQ ID NO:88 (SMARCA2); SEQ ID NO:89 (LRP1); SEQ ID NO:90 (TBX15); SEQ ID NO:91 (TBX15); SEQ ID NO:92 (NR2F2); SEQ ID NO:93 (PRKAB2); SEQ ID NO:94 (LHX1); SEQ ID NO:95 (TBX2).
甲基化标志物区域的甲基化水平在胃癌患者cfDNA中上升或下降(如表10)。得到的48个甲基化标志物的序列如SEQ ID NO:48-95。每个甲基化标志物的所有CpG位点的甲基化水平都可以通过MethylTitan甲基化测序的方法获得。每个区域中通过MHF计算得到的甲基化水平都可以作为胃癌的标志物。
表10在训练集和测试集中甲基化标志物在胃癌中的甲基化水平


测试集中胃癌与非胃癌人群的甲基化标志物区域内的甲基化水平如表10所示。从表10中可以看出,甲基化标志物区域内的甲基化水平在胃癌与无胃癌人群中的分布显著不同,具备良好的区分效果,具有显著性的差异(P<0.01),是良好的胃癌甲基化标志物。
实施例9单个甲基化标志物判别胃癌是否存在的性能
为了验证单个甲基化标志物区分对象是否患有胃癌的性能,使用单个marker的甲基化水平数据在实施例8训练集数据中训练模型,并使用测试集样本对模型的性能进行验证,具体步骤如下(图12):
1.序列预处理,针对每一个目标区域,计算该区域内的每一个MHF(Methylated Haplotype Fraction)甲基化单倍型比值数值。
2.使用python(V3.9.7)中的sklearn(V1.0.1)包中的逻辑回归模型:model=LogisticRegression(),该模型的公式如下,其中x为样本目标marker的甲基化水平值,w为不同marker的系数,b为截距值,y为模型预测分值:
3.使用训练集的样本进行训练:model.fit(Traindata,TrainPheno),其中TrainData是训练集样本中目标甲基化位点的数据,TrainPheno是训练集样本的性状(胃癌为1,非胃癌为0),并根据训练集的样本确定模型的相关阈值。
4.使用测试集的样本进行测试:TestPred=model.predict_proba(TestData),其中TestData为测试集样本中目标甲基化位点的数据,TestPred为模型预测分值,使用该预测分值并根据上述阈值对样本是否是胃癌进行判断。
5.统计模型的AUC指标。
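The AUC in the last step can be computed without plotting a ROC curve, using its rank interpretation: the probability that a randomly chosen cancer sample scores higher than a randomly chosen non-cancer sample, with ties counted as one half. This is a minimal sketch with invented scores; the function name `auc` is hypothetical, and a library routine such as scikit-learn's `roc_auc_score` would normally be used instead.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of scores."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes are required")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0      # cancer sample ranked above non-cancer
            elif p == n:
                wins += 0.5      # ties count as half
    return wins / (len(positives) * len(negatives))


# Four toy samples: three of the four cancer/non-cancer pairs are ordered correctly.
example_auc = auc([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0])
```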
本实施例中单个目标标志物逻辑回归模型的效果见表11。从表11中可看出,所有的目标标志物不论在测试集和训练集都可以达到0.5以上的AUC,都是较好的胃癌标志物。
表11单个marker逻辑回归模型的表现

实施例10胃癌所有甲基化标志物的预测结果
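This example uses the methylation levels of all 48 gastric cancer target markers to build a logistic regression machine learning model that accurately distinguishes, in the data, samples with gastric cancer from those without. The specific steps are essentially the same as in Example 9, except that the data of the combination of all 48 target markers (SEQ ID NO:48-95) are input into the model.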
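The distribution of model prediction scores in the training and test sets is shown in Figure 13, and the ROC curve in Figure 14. In the test set, the AUC for distinguishing gastric cancer from non-gastric cancer samples reached 0.922, so the model separates the two groups well. With the threshold set to 0.53, a score above this value is predicted as gastric cancer and a score below it as no gastric cancer; at a training-set specificity of 95%, the test-set sensitivity reached 73%, indicating good performance of this combination model.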
实施例11胃癌19个甲基化标志物的预测结果
为了验证相关标志物组合的效果,本实施例从所有的48个甲基化标志物的甲基化水平中挑选SEQ ID NO:50、55、60、62、64、66、69、72、76、78、84、85、87、88、89、90、92、94和95共19个目标标志物构建了逻辑回归的机器学习模型。
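The machine learning model was built in the same way as in Example 9, but only the data of the above 19 target markers were used for the samples. The model scores in the training and test sets are shown in Figure 15, and the ROC curve in Figure 16. The scores of gastric cancer and non-gastric cancer samples differ significantly in both the training and test sets, and the test-set AUC of the model reached 0.919. With the threshold set to 0.54, a score above this value is predicted as gastric cancer and a score below it as no gastric cancer; at a training-set specificity of 95%, the test-set sensitivity reached 78%, indicating good performance of this combination model.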
实施例12胃癌19个甲基化标志物的预测结果
为了验证相关标志物组合的效果,本实施例从所有的48个甲基化标志物的甲基化水平中 挑选SEQ ID NO:49、53、54、55、59、62、66、72、75、79、80、83、84、87、89、90、91、93和95共19个目标标志物构建了逻辑回归的机器学习模型。
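The machine learning model was built in the same way as in Example 9, but only the data of the above 19 target markers were used for the samples. The model scores in the training and test sets are shown in Figure 17, and the ROC curve in Figure 18. The scores of gastric cancer and non-gastric cancer samples differ significantly in both the training and test sets, and the test-set AUC of the model reached 0.913. With the threshold set to 0.49, a score above this value is predicted as gastric cancer and a score below it as no gastric cancer; at a training-set specificity of 95%, the test-set sensitivity reached 65%, indicating good performance of this combination model.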
实施例13胃癌8个甲基化标志物的预测结果
为了验证相关标志物组合的效果,本实施例从所有的48个甲基化标志物的甲基化水平中挑选SEQ ID NO:50、SEQ ID NO:60、SEQ ID NO:61、SEQ ID NO:67、SEQ ID NO:69、SEQ ID NO:75、SEQ ID NO:77、SEQ ID NO:84共8个目标标志物构建了逻辑回归的机器学习模型。
机器学习模型构建的方法也同实施例9一致,但相关样本只使用了上述8个目标标志物的数据,该模型在训练集和测试集中的模型得分见图19,该模型ROC曲线见图20。可看出该模型在训练集和测试集中,胃癌和无胃癌样本分值具有显著差异,该模型测试集AUC达到了0.872,说明了该组合模型良好的性能。阈值设成0.46时,大于该值预测为胃癌,小于该值预测为无胃癌,在训练集中特异性为95%,测试集敏感性达到了56%,说明了该组合模型良好的性能。
实施例14胃癌5个甲基化标志物的预测结果
为了验证相关标志物组合的效果,本实施例从所有的48个甲基化标志物的甲基化水平中挑选SEQ ID NO:50、SEQ ID NO:60、SEQ ID NO:74、SEQ ID NO:77、SEQ ID NO:82共5个目标标志物构建了逻辑回归的机器学习模型。
机器学习模型构建的方法也同实施例9一致,但相关样本只使用了上述5个目标标志物的数据,该模型在训练集和测试集中的模型得分见图21,模型ROC曲线见图22。看出该模型在训练集和测试集中,胃癌和无胃癌样本分值具有显著差异,该模型测试集AUC达到了0.856,说明了该组合模型良好的性能。阈值设成0.52,大于该值预测为胃癌,小于该值预测为无胃癌,在训练集中特异性为95%,测试集敏感性达到了48%,说明了该组合模型良好的性能。
实施例15食管癌样本处理及甲基化标志物筛选
收集了总计162个食管癌血液样本、393个无食管癌血液样本,所有入组患者签署知情同意书,样本信息见表12。
表12
样本处理、策略及数据预处理过程同实施例1,甲基化单倍型数据矩阵后,根据训练集样本分组发现特征甲基化单倍型:
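1) Logistic regression analysis of each methylation haplotype against phenotype is performed to build a logistic regression model. Specifically, for each target region, every MHF (Methylated Haplotype Fraction) value within the region is calculated, and the statsmodels package (0.12.0) for Python (v3.6.9) is used to build the logistic regression model and calculate the logistic regression coefficients, with the commands:
import statsmodels.api as sm
logist_model=sm.Logit(Y,sm.add_constant(X)).fit()
pvalue=logist_model.pvalues
Here X is the methylation haplotype value of each sample, Y is the class label of each sample, and pvalue is the significance test value of the logistic regression. For each amplified target region, the methylation marker corresponding to the MHF with the smallest regression p-value is selected, and these markers form the candidate methylation haplotypes.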
2)将训练集随机分成十份做十倍交叉验证增量特征筛选。
3)留出训练集中的一份数据作为测试数据,其余训练集数据作为训练数据。每个区域的 候选甲基化单倍型按照回归系数显著性进行从大到小排序,每次加入一个甲基化单倍型,使用9份训练数据构建多项式内核的SVM模型,对测试数据进行预测。
4)步骤3重复10次将所有数据遍历一遍,每次计算测试数据的AUC,重复10次之后计算10次的平均AUC。如果训练数据的AUC增加则保留该候选甲基化单倍型作为特征甲基化标志物,否则舍弃,将得到的甲基化标志物使用GREAT工具(great.stanford.edu/great/public-3.0.0/html/index.php)进行基因注释(如表13)。
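The target genes of the methylation markers were annotated using the GREAT tool (great.stanford.edu/great/public-3.0.0/html/index.php). In a GREAT analysis, marker regions are associated with neighboring genes and annotated with them. The association has two steps: first, the regulatory domain of each gene is determined; then each region is associated with every gene whose regulatory domain overlaps it. For example, SKI(+2024) may denote a marker 2024 bp downstream of the transcription start site (TSS) of the SKI gene, and EPS8L3(-28150) may denote a marker 28150 bp upstream of the TSS of the EPS8L3 gene.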
5)取训练集中不同特征数量情况下的平均AUC中位数对应的特征组合作为最终确定的甲基化标志物(表13)。
表13


甲基化标志物区域的甲基化水平在食管癌患者cfDNA中上升或下降(如表14)。得到的43个甲基化标志物的序列如SEQ ID NO:96-138。每个甲基化标志物的所有CpG位点的甲基化水平都可以通过MethylTitan甲基化测序的方法获得。每个区域中所有CpG位点甲基化水平的均值,以及单个CpG位点的甲基化水平都可以作为食管癌的标志物。
表14在训练集和测试集中甲基化标志物在食管癌中的甲基化水平




测试集中食管癌与无食管癌人群的甲基化标志物区域内的平均甲基化水平如表14所示。从表14中可以看出,甲基化标志物区域内的平均甲基化水平在食管癌与无食管癌人群中的分布显著不同,具备良好的区分效果,具有显著性的差异(P<0.01),是良好的食管癌甲基化标志物。
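Example 16: Performance of single methylation markers in determining the presence of esophageal cancer
To verify how well a single methylation marker distinguishes esophageal cancer from no esophageal cancer, a model was trained on the methylation level data of a single marker in the training set of Example 15, and its performance was validated on the test-set samples. The steps are as follows (Figure 12):
1. Sequence preprocessing: for each target region, the MHF (Methylated Haplotype Fraction) value of every methylated haplotype in the region is calculated.
2. The logistic regression model from the sklearn package (V1.0.1) in Python (V3.9.7) is used: model=LogisticRegression(). The model formula is as follows, where x is the methylation level value of the target marker in the sample, w is the coefficient of each marker, b is the intercept, and y is the model prediction score: y = 1 / (1 + exp(-(w·x + b)))
3. The model is trained on the training-set samples: model.fit(TrainData,TrainPheno), where TrainData is the data of the target methylation sites in the training-set samples and TrainPheno is the phenotype of the training-set samples (1 for esophageal cancer, 0 for no esophageal cancer). The model threshold is determined from the training-set samples.
4. The model is tested on the test-set samples: TestPred=model.predict_proba(TestData)[:,1], where TestData is the data of the target methylation sites in the test-set samples and TestPred is the model prediction score. This score is compared with the above threshold to decide whether a sample is esophageal cancer.
5. The AUC of the model is calculated.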
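Steps 2 to 5 can be tried end to end with the short sketch below. The data are simulated, and the rule for picking the threshold (the 95th percentile of training-control scores, matching the 95% training specificity quoted in the later examples) is an assumption; the application does not spell out how the threshold is derived. The variable names TrainData, TrainPheno, TestData and TestPred follow the text; everything else is hypothetical.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(1)


def simulate(n_controls, n_cases):
    """Simulated single-marker methylation values (hypothetical data)."""
    pheno = np.array([0] * n_controls + [1] * n_cases)
    level = rng.normal(loc=0.2 + 0.3 * pheno, scale=0.1).reshape(-1, 1)
    return level, pheno


TrainData, TrainPheno = simulate(200, 100)
TestData, TestPheno = simulate(100, 50)

model = LogisticRegression()
model.fit(TrainData, TrainPheno)

# Assumed threshold rule: 95% of training controls score at or below it.
train_scores = model.predict_proba(TrainData)[:, 1]
threshold = np.quantile(train_scores[TrainPheno == 0], 0.95)

TestPred = model.predict_proba(TestData)[:, 1]
test_auc = roc_auc_score(TestPheno, TestPred)
test_sensitivity = float(np.mean(TestPred[TestPheno == 1] > threshold))

# The predicted score is the logistic function of w*x + b.
linear = TestData @ model.coef_.T + model.intercept_
manual = (1.0 / (1.0 + np.exp(-linear))).ravel()
print(round(test_auc, 3), round(test_sensitivity, 3))
```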
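The performance of the single-marker logistic regression models in this example is shown in Table 15. As the table shows, every marker reaches an AUC above 0.55 in both the test set and the training set, so all of them are good esophageal cancer markers.
Table 15: Performance of single-marker logistic regression models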

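Example 17: Prediction results using all esophageal cancer methylation markers
In this example, a logistic regression machine learning model was built from the methylation levels of all 43 methylation markers to accurately distinguish esophageal cancer samples from non-esophageal-cancer samples in the data. The procedure is essentially the same as in Example 16, except that the data of all 43 target methylation markers (SEQ ID NO:96-138) were input to the model.
The distribution of model prediction scores in the training and test sets is shown in Figure 23 and the ROC curve in Figure 24. In the test set, the AUC for distinguishing esophageal cancer from non-esophageal-cancer samples reaches 0.935. At 95% specificity in the training set, the test-set sensitivity reaches 84.3%, with the threshold set to 0.383: scores above it are predicted as esophageal cancer, and all others as no esophageal cancer. This shows that the methylation markers of the present application distinguish esophageal cancer samples from non-esophageal-cancer samples well.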
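The decision rule stated here is simple enough to write down directly. The sketch below applies the 0.383 threshold from this example to a handful of made-up scores and labels (hypothetical values, not data from the application) and derives sensitivity and specificity from the resulting calls.

```python
def classify(score, threshold=0.383):
    """Return 1 (predicted esophageal cancer) if the score exceeds the threshold."""
    return 1 if score > threshold else 0


def sensitivity_specificity(scores, labels, threshold=0.383):
    """Sensitivity and specificity of the thresholded calls."""
    preds = [classify(s, threshold) for s in scores]
    tp = sum(p == 1 and t == 1 for p, t in zip(preds, labels))
    fn = sum(p == 0 and t == 1 for p, t in zip(preds, labels))
    tn = sum(p == 0 and t == 0 for p, t in zip(preds, labels))
    fp = sum(p == 1 and t == 0 for p, t in zip(preds, labels))
    return tp / (tp + fn), tn / (tn + fp)


# Made-up scores: three cancer samples (label 1) and three controls (label 0).
scores = [0.9, 0.5, 0.2, 0.1, 0.4, 0.3]
labels = [1, 1, 1, 0, 0, 0]
sens, spec = sensitivity_specificity(scores, labels)
```

With these toy values two of three cases and two of three controls are called correctly, so both figures equal 2/3.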
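Example 18: Prediction results using 16 esophageal cancer methylation markers
To verify the performance of marker combinations, in this example 16 methylation markers, SEQ ID NO:100, 103, 109, 110, 113, 120, 121, 125, 128, 130, 132, 133, 134, 135, 137 and 138, were arbitrarily selected from all 43 methylation markers, and their methylation levels were used to build a logistic regression machine learning model.
The machine learning model was built as in Example 16, except that only the data of the above 16 markers were used. Model scores in the training and test sets are shown in Figure 25 and the ROC curve in Figure 26. In both the training and test sets, the scores of esophageal cancer samples and non-esophageal-cancer samples differ significantly, and the test-set AUC reaches 0.920. With the threshold set to 0.431 (scores above it are predicted as esophageal cancer, scores below it as no esophageal cancer), specificity in the training set is 95% and sensitivity in the test set reaches 75.8%. This shows that a model built on a combination of several markers arbitrarily selected from the methylation markers of the present application performs well.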
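Restricting the model input to a marker subset amounts to selecting columns of the sample-by-marker matrix. A minimal sketch follows, assuming the columns are ordered by SEQ ID NO from 96 to 138; the helper name and the toy matrix are hypothetical.

```python
ALL_SEQ_IDS = list(range(96, 139))  # SEQ ID NO:96-138, 43 markers
PANEL_16 = [100, 103, 109, 110, 113, 120, 121, 125,
            128, 130, 132, 133, 134, 135, 137, 138]


def subset_columns(matrix, all_ids, panel_ids):
    """Keep only the columns of `matrix` whose marker ID is in `panel_ids`."""
    idx = [all_ids.index(seq_id) for seq_id in panel_ids]
    return [[row[j] for j in idx] for row in matrix]


# Toy matrix with two samples; each cell simply holds its own SEQ ID NO so
# that the selection is easy to check by eye.
toy = [[float(seq_id) for seq_id in ALL_SEQ_IDS] for _ in range(2)]
panel_matrix = subset_columns(toy, ALL_SEQ_IDS, PANEL_16)
```

The reduced matrix would then be passed to the same fit and predict calls as in Example 16.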
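Example 19: Prediction results using 17 esophageal cancer methylation markers
To verify the performance of marker combinations, in this example 17 methylation markers, SEQ ID NO:102, 107, 108, 110, 112, 120, 121, 123, 124, 125, 130, 131, 132, 133, 134, 135 and 137, were arbitrarily selected from all 43 methylation markers, and their methylation levels were used to build a logistic regression machine learning model.
The machine learning model was built as in Example 16, except that only the data of the above 17 markers were used. Model scores in the training and test sets are shown in Figure 27 and the ROC curve in Figure 28. In both the training and test sets, the scores of esophageal cancer samples and non-esophageal-cancer samples differ significantly, and the test-set AUC reaches 0.916. With the threshold set to 0.431 (scores above it are predicted as esophageal cancer, scores below it as no esophageal cancer), specificity in the training set is 95% and sensitivity in the test set reaches 59.4%. This shows that a model built on a combination of several markers arbitrarily selected from the methylation markers of the present application performs well.
From the methylation levels of relevant genes in plasma cfDNA, the present application obtained 43 methylated nucleic acid fragments with marked differences. An esophageal cancer risk prediction model built on a single such fragment, or on a marker panel composed of several of them, can effectively identify esophageal cancer with high sensitivity and specificity, and is suitable for esophageal cancer screening and diagnosis.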
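Example 20: Prediction results using 27 esophageal cancer methylation markers
In this example, a logistic regression machine learning model was built from the methylation levels of 27 methylation markers to accurately distinguish esophageal cancer samples from non-esophageal-cancer samples in the data. The procedure is essentially the same as in Example 16, except that the data of 27 target methylation markers (SEQ ID NO:98, 100, 102, 103, 107, 108, 109, 111, 112, 113, 114, 116, 117, 121, 123, 124, 125, 127, 128, 130, 131, 133, 134, 135, 136, 137 and 138) were input to the model.
The distribution of model prediction scores in the training and test sets is shown in Figure 29 and the ROC curve in Figure 30. In the test set, the AUC for distinguishing esophageal cancer from non-esophageal-cancer samples reaches 0.930. At 95% specificity in the training set, the test-set sensitivity reaches 57.6%, with the threshold set to 0.425: scores above it are predicted as esophageal cancer, and all others as no esophageal cancer. This shows that the methylation markers of the present application distinguish esophageal cancer samples from non-esophageal-cancer samples well.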
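Example 21: Prediction results using 7 esophageal cancer methylation markers
To verify the performance of marker combinations, in this example 7 methylation markers, SEQ ID NO:102, 109, 116, 117, 127, 134 and 135, were selected from the 27 methylation markers of Example 20, and their methylation levels were used to build a logistic regression machine learning model.
The machine learning model was built as in Example 16, except that only the data of the above 7 markers were used. Model scores in the training and test sets are shown in Figure 31 and the ROC curve in Figure 32. In both the training and test sets, the scores of esophageal cancer samples and non-esophageal-cancer samples differ significantly, and the test-set AUC reaches 0.900. With the threshold set to 0.50 (scores above it are predicted as esophageal cancer, scores below it as no esophageal cancer), specificity in the training set is 95% and sensitivity in the test set reaches 57.6%. This shows that a model built on a combination of several markers arbitrarily selected from the methylation markers of the present application performs well.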
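Example 22: Prediction results using 7 esophageal cancer methylation markers
To verify the performance of marker combinations, in this example 7 methylation markers, SEQ ID NO:121, 125, 130, 133, 134, 135 and 136, were selected from the 27 methylation markers of Example 20, and their methylation levels were used to build a logistic regression machine learning model.
The machine learning model was built as in Example 16, except that only the data of the above 7 markers were used. Model scores in the training and test sets are shown in Figure 33 and the ROC curve in Figure 34. In both the training and test sets, the scores of esophageal cancer samples and non-esophageal-cancer samples differ significantly, and the test-set AUC reaches 0.890. With the threshold set to 0.594 (scores above it are predicted as esophageal cancer, scores below it as no esophageal cancer), specificity in the training set is 95% and sensitivity in the test set reaches 65.6%. This shows that a model built on a combination of several markers arbitrarily selected from the methylation markers of the present application performs well.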
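Example 23: Prediction results using 23 esophageal cancer methylation markers
In this example, a logistic regression machine learning model was built from the methylation levels of 23 methylation markers to accurately distinguish esophageal cancer samples from non-esophageal-cancer samples in the data. The procedure is essentially the same as in Example 16, except that the data of all 23 target methylation markers (SEQ ID NO:96, 97, 99, 101, 104, 105, 106, 110, 115, 118, 119, 120, 121, 122, 125, 126, 129, 130, 132, 133, 134, 135 and 137) were input to the model.
The distribution of model prediction scores in the training and test sets is shown in Figure 35 and the ROC curve in Figure 36. In the test set, the AUC for distinguishing esophageal cancer from non-esophageal-cancer samples reaches 0.934. At 95% specificity in the training set, the test-set sensitivity reaches 64%, with the threshold set to 0.41: scores above it are predicted as esophageal cancer, and all others as no esophageal cancer. This shows that the methylation markers of the present application distinguish esophageal cancer samples from non-esophageal-cancer samples well.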
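Example 24: Prediction results using 17 esophageal cancer methylation markers
To verify the performance of marker combinations, in this example 17 methylation markers, SEQ ID NO:96, 97, 99, 104, 105, 106, 110, 118, 120, 122, 125, 126, 129, 130, 132, 133 and 135, were selected from the methylation markers of Example 23, and their methylation levels were used to build a logistic regression machine learning model.
The machine learning model was built as in Example 16, except that only the data of the above 17 markers were used. Model scores in the training and test sets are shown in Figure 37 and the ROC curve in Figure 38. In both the training and test sets, the scores of esophageal cancer samples and non-esophageal-cancer samples differ significantly, and the test-set AUC reaches 0.900. With the threshold set to 0.508 (scores above it are predicted as esophageal cancer, scores below it as no esophageal cancer), specificity in the training set is 95% and sensitivity in the test set reaches 56.3%. This shows that a model built on a combination of several markers arbitrarily selected from the methylation markers of the present application performs well.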
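Example 25: Prediction results using 15 esophageal cancer methylation markers
To verify the performance of marker combinations, in this example 15 methylation markers, SEQ ID NO:96, 97, 99, 105, 110, 118, 119, 120, 121, 122, 129, 130, 134, 135 and 137, were selected from the methylation markers of Example 23, and their methylation levels were used to build a logistic regression machine learning model.
The machine learning model was built as in Example 16, except that only the data of the above 15 markers were used. Model scores in the training and test sets are shown in Figure 39 and the ROC curve in Figure 40. In both the training and test sets, the scores of esophageal cancer samples and non-esophageal-cancer samples differ significantly, and the test-set AUC reaches 0.906. With the threshold set to 0.511 (scores above it are predicted as esophageal cancer, scores below it as no esophageal cancer), specificity in the training set is 95% and sensitivity in the test set reaches 59.4%. This shows that a model built on a combination of several markers arbitrarily selected from the methylation markers of the present application performs well.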
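Example 26: Liver cancer sample processing and methylation marker screening
A total of 276 liver cancer blood samples and 393 non-liver-cancer blood samples were collected. All enrolled patients signed informed consent forms. Sample information is given in Table 16.
Table 16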
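Sample processing, strategy and data preprocessing were the same as in Example 1. After the methylated haplotype data matrix was obtained, characteristic methylated haplotypes were identified from the training-set sample groups as follows:
1) A logistic regression of phenotype on each methylated haplotype is performed. Specifically, for each target region, the MHF (Methylated Haplotype Fraction) value of every methylated haplotype in the region is calculated, and a logistic regression model is built with the statsmodels package (0.12.0) in Python (v3.6.9) to compute the logistic regression coefficients. Commands:
import statsmodels.api as sm
logist_model=sm.Logit(Y,sm.add_constant(X)).fit()
pvalue=logist_model.pvalues
where X is the methylated haplotype value of each sample, Y is the class label of each sample, and pvalue is the significance test value of the logistic regression. For each amplified target region, the methylation marker corresponding to the MHF with the smallest regression coefficient is selected; these make up the candidate methylated haplotypes.
2) The training set is randomly split into ten parts for ten-fold cross-validated incremental feature selection.
3) One part of the training set is held out as test data and the remaining nine parts serve as training data. The candidate methylated haplotypes of each region are sorted in descending order of regression-coefficient significance and added one at a time; each time, an SVM model with a polynomial kernel is built on the nine training parts and used to predict the test data.
4) Step 3 is repeated 10 times so that every part is used once as test data. The AUC of the test data is computed each time, and the 10 AUCs are averaged. If the AUC of the training data increases, the candidate methylated haplotype is kept as a characteristic methylation marker; otherwise it is discarded. The resulting methylation markers are annotated with genes using the GREAT tool (great.stanford.edu/great/public-3.0.0/html/index.php) (see Table 17).
The target genes of the methylation markers are annotated with the GREAT tool (great.stanford.edu/great/public-3.0.0/html/3.0.0/html/index.php). In a GREAT analysis, each marker region is associated with its neighboring genes and annotated with them. The association has two steps: first, the regulatory domain of each gene is determined; then every gene whose regulatory domain covers the region is associated with that region. For example, SKI(+2024) denotes a marker 2024 bp downstream of the transcription start site (TSS) of the SKI gene, and EPS8L3(-28150) denotes a marker 28150 bp upstream of the TSS of the EPS8L3 gene.
5) The feature combination corresponding to the median of the mean AUCs obtained with different numbers of features in the training set is taken as the final set of methylation markers (Table 17).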
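The gene annotations produced in this way follow the pattern GENE(signed distance), where a plus sign means downstream of the TSS and a minus sign means upstream. A small parser for such labels might look like the sketch below; the regular expression, the class name and the label GENE_X(+150) are hypothetical and serve only as illustration.

```python
import re
from typing import NamedTuple


class GreatAnnotation(NamedTuple):
    gene: str
    distance_bp: int
    direction: str  # "upstream" or "downstream" of the TSS


_LABEL = re.compile(r"^\s*([A-Za-z0-9_.\-]+)\s*\(([+-])(\d+)\)\s*$")


def parse_great_label(label):
    """Split a label such as 'EPS8L3(-28150)' into gene, distance, direction."""
    match = _LABEL.match(label)
    if match is None:
        raise ValueError(f"not a GENE(+/-distance) label: {label!r}")
    gene, sign, distance = match.groups()
    direction = "downstream" if sign == "+" else "upstream"
    return GreatAnnotation(gene, int(distance), direction)


eps = parse_great_label("EPS8L3(-28150)")
other = parse_great_label("GENE_X(+150)")  # hypothetical label
```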
Table 17





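The methylation levels of the marker regions are increased or decreased in the cfDNA of liver cancer patients (see Table 18). The sequences of the 202 methylation markers obtained are shown in SEQ ID NO:139-340. The methylation level of every CpG site in each marker can be obtained by MethylTitan methylation sequencing. Both the mean methylation level of all CpG sites in a region and the methylation level of a single CpG site can serve as liver cancer markers.
Table 18: Methylation levels of the methylation markers in liver cancer in the training and test sets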


















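The mean methylation levels within the marker regions for the liver cancer and non-liver-cancer groups in the test set are shown in Table 18. As Table 18 shows, the distributions of the mean methylation level within the marker regions differ significantly between the two groups (P<0.01), so the markers discriminate well and are good liver cancer methylation markers.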
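Example 27: Performance of single methylation markers in determining the presence of liver cancer
To verify how well a single methylation marker distinguishes liver cancer from no liver cancer, a model was trained on the methylation level data of a single marker in the training set of Example 26, and its performance was validated on the test-set samples. The steps are as follows (Figure 12):
1. Sequence preprocessing: for each target region, the MHF (Methylated Haplotype Fraction) value of every methylated haplotype in the region is calculated.
2. The logistic regression model from the sklearn package (V1.0.1) in Python (V3.9.7) is used: model=LogisticRegression(). The model formula is as follows, where x is the methylation level value of the target marker in the sample, w is the coefficient of each marker, b is the intercept, and y is the model prediction score: y = 1 / (1 + exp(-(w·x + b)))
3. The model is trained on the training-set samples: model.fit(TrainData,TrainPheno), where TrainData is the data of the target methylation sites in the training-set samples and TrainPheno is the phenotype of the training-set samples (1 for liver cancer, 0 for no liver cancer). The model threshold is determined from the training-set samples.
4. The model is tested on the test-set samples: TestPred=model.predict_proba(TestData)[:,1], where TestData is the data of the target methylation sites in the test-set samples and TestPred is the model prediction score. This score is compared with the above threshold to decide whether a sample is liver cancer.
5. The AUC of the model is calculated.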
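For step 1, the sketch below shows one common way to compute a methylated haplotype fraction: the number of reads carrying a given haplotype divided by all reads covering the region. The exact definition used by the applicants is given in Example 1, outside this section, so both the formula and the read encoding ('1' methylated, '0' unmethylated, one character per CpG site) should be read as assumptions.

```python
from collections import Counter


def methylated_haplotype_fractions(reads):
    """Fraction of reads carrying each haplotype among all reads in a region."""
    counts = Counter(reads)
    total = len(reads)
    return {haplotype: n / total for haplotype, n in counts.items()}


# Eight hypothetical reads covering a region with three CpG sites.
reads = ["111", "111", "110", "000", "000", "000", "000", "111"]
mhf = methylated_haplotype_fractions(reads)
```

Here the fully methylated haplotype "111" has a fraction of 3/8, "110" has 1/8, and "000" has 4/8.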
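The performance of the single-marker logistic regression models in this example is shown in Table 19. As the table shows, every marker reaches an AUC above 0.55 in both the test set and the training set, so all of them are good liver cancer markers.
Table 19: Performance of single-marker logistic regression models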







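Example 28: Prediction results using all liver cancer methylation markers
In this example, a logistic regression machine learning model was built from the methylation levels of all 202 liver cancer methylation markers to accurately distinguish liver cancer samples from non-liver-cancer samples in the data. The procedure is essentially the same as in Example 27, except that the data of all 202 target methylation markers (SEQ ID NO:139-340) were input to the model.
The distribution of model prediction scores in the training and test sets is shown in Figure 41 and the ROC curve in Figure 42. In the test set, the AUC for distinguishing liver cancer from non-liver-cancer samples reaches 0.986. At 99% specificity in the training set, the test-set sensitivity reaches 91%, with the threshold set to 0.58: scores above it are predicted as liver cancer, and all others as no liver cancer. Liver cancer samples can thus be distinguished well from non-liver-cancer samples.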
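The two summary figures used here, a threshold fixed at a given training specificity and the AUC, can be computed without any library. The sketch below is one possible reading: the threshold is the smallest training-control score such that at least the requested share of controls score at or below it, and the AUC is the fraction of case-control pairs in which the case scores higher (ties count one half). All scores are invented for the example.

```python
import math


def threshold_at_specificity(control_scores, specificity):
    """Smallest control score with at least `specificity` of controls <= it."""
    ordered = sorted(control_scores)
    # Number of controls that must fall at or below the threshold; the small
    # offset guards against floating-point noise in the product.
    k = math.ceil(specificity * len(ordered) - 1e-9)
    return ordered[max(k, 1) - 1]


def auc_by_pairs(case_scores, control_scores):
    """AUC as the share of case-control pairs ranked correctly."""
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


train_controls = [i / 100 for i in range(100)]  # 0.00, 0.01, ..., 0.99
threshold = threshold_at_specificity(train_controls, 0.99)
false_positives = sum(score > threshold for score in train_controls)

auc = auc_by_pairs([0.9, 0.8, 0.4], [0.1, 0.5, 0.3])
```

With these toy controls the threshold is 0.98 and exactly one of 100 controls exceeds it, which corresponds to 99% specificity.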
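Example 29: Prediction results using 25 liver cancer methylation markers
To verify the performance of marker combinations, in this example 25 methylation markers, SEQ ID NO:176, 183, 187, 195, 196, 209, 210, 214, 220, 225, 227, 228, 241, 245, 246, 269, 270, 286, 293, 299, 301, 302, 326, 329 and 337, were arbitrarily selected from all 202 methylation markers, and their methylation levels were used to build a logistic regression machine learning model.
The machine learning model was built as in Example 27, except that only the data of the above 25 markers were used. Model scores in the training and test sets are shown in Figure 43 and the ROC curve in Figure 44. In both the training and test sets, the scores of liver cancer samples and non-liver-cancer samples differ significantly, and the test-set AUC reaches 0.938. With the threshold set to 0.673 (scores above it are predicted as liver cancer, scores below it as no liver cancer), specificity in the training set is 99% and sensitivity in the test set reaches 76%, indicating that this marker combination performs well.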
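Example 30: Prediction results using 52 liver cancer methylation markers
To verify the performance of marker combinations, in this example 52 methylation markers, SEQ ID NO:139, 140, 143, 144, 164, 165, 175, 176, 178, 183, 184, 190, 192, 194, 195, 199, 203, 204, 206, 208, 210, 213, 215, 216, 218, 220, 224, 234, 235, 237, 253, 265, 266, 267, 269, 270, 271, 272, 281, 286, 301, 306, 314, 315, 317, 320, 321, 322, 323, 333, 336 and 338, were selected from all 202 methylation markers, and their methylation levels were used to build a logistic regression machine learning model.
The machine learning model was built as in Example 27, except that only the data of the above 52 markers were used. Model scores in the training and test sets are shown in Figure 45 and the ROC curve in Figure 46. In both the training and test sets, the scores of liver cancer samples and non-liver-cancer samples differ significantly, and the test-set AUC reaches 0.959. With the threshold set to 0.58 (scores above it are predicted as liver cancer, scores below it as no liver cancer), specificity in the training set is 99% and sensitivity in the test set reaches 71%, indicating that this marker combination performs well.
From the methylation levels of relevant genes in plasma cfDNA, the present application obtained 202 methylated nucleic acid fragments with marked differences. A liver cancer risk prediction model built on a single such fragment, or on a marker panel composed of several of them, can effectively identify liver cancer with high sensitivity and specificity, and is suitable for liver cancer screening and diagnosis.
The foregoing detailed description is provided by way of explanation and example and is not intended to limit the scope of the appended claims. Many variations of the embodiments set out in this application will be apparent to those of ordinary skill in the art and remain within the scope of the appended claims and their equivalents.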

Claims (10)

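  1. A colorectal cancer methylation marker, which is an isolated nucleic acid molecule from a mammal, the sequence of the nucleic acid molecule comprising: (1) any one or more or all of the sequences shown in SEQ ID NO:1-47, or a complementary sequence or variant thereof, the variant having at least 70% sequence identity to the corresponding sequence and having no mutation at the methylation sites, or (2) a treated sequence of (1), the treatment converting unmethylated cytosine into a base whose ability to bind guanine is lower than that of cytosine,
    preferably, item (1) is selected from any one of the following groups:
    (1.1) any one or more or all of the following sequences: SEQ ID NO:4, SEQ ID NO:11, SEQ ID NO:15, SEQ ID NO:18, SEQ ID NO:19, SEQ ID NO:30, SEQ ID NO:34, SEQ ID NO:37, SEQ ID NO:41, or complementary sequences or variants thereof, optionally further including any one or more of the remaining sequences of SEQ ID NO:1-47 or complementary sequences or variants thereof,
    (1.2) any one or more or all of the following sequences: SEQ ID NO:1, SEQ ID NO:21, SEQ ID NO:29, SEQ ID NO:36, SEQ ID NO:44, SEQ ID NO:47, or complementary sequences or variants thereof, optionally further including any one or more of the remaining sequences of SEQ ID NO:1-47,
    (1.3) any one or more or all of the following sequences: SEQ ID NO:6, SEQ ID NO:10, SEQ ID NO:13, SEQ ID NO:14, SEQ ID NO:22, SEQ ID NO:28, SEQ ID NO:43, or complementary sequences or variants thereof, optionally further including any one or more of the remaining sequences of SEQ ID NO:1-47 or complementary sequences or variants thereof.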
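  2. A reagent for screening for colorectal cancer risk, diagnosing colorectal cancer, or assessing colorectal cancer prognosis by detecting DNA methylation, the reagent comprising a reagent for detecting the methylation level of a marker in a sample to be tested, the marker being a DNA sequence together with the 5 kb upstream and 5 kb downstream of the DNA sequence, or a fragment thereof, or one or more CpG dinucleotides therein, the DNA sequence comprising one or more or all of the following gene sequences: (p) TTLL10, ST6GALNAC5, KCNA3, CACNA1E, TRAPPC12, UBE2F, ZIC4, ZNF595, EVC2, HMX1, PITX2, POU4F2, IRX4, IRX1, CRHBP, KCNMB1, KCNQ5, TBX20, ACTR3C, ACTR3B, VIPR2, SOX17, MOS, PREX2, GDF6, OSR2, BARX1, SORCS3, VAX1, DPYSL4, UTF1, B3GAT1, HOXC13, CUX2, GLT1D1, ITGBL1, SKOR1, TM6SF1, LRRK1, FOXL1, MYO15B, DNM2, ZNF536, YTHDF1, SIM2,
    preferably,
    the DNA sequence comprises one or more or all selected from CACNA1E, PITX2, CRHBP, TBX20, SORCS3, B3GAT1, GLT1D1 and LRRK1, optionally further including one or more or all of the other gene sequences in (p); or
    the DNA sequence comprises one or more or all selected from TTLL10, ACTR3B, BARX1, CUX2, DNM2 and SIM2, optionally further including one or more or all of the other gene sequences in (p); or
    the DNA sequence comprises one or more or all selected from UBE2F, HMX1, IRX4, IRX1, VIPR2, OSR2 and MYO15B, optionally further including one or more or all of the other gene sequences in (p),
    more preferably, the reagent has one or more features selected from the following:
    the marker comprises at least 3 CpG dinucleotides,
    the fragment is 1-1000 bp, preferably 1-700 bp, in length,
    the fragment is the promoter region of the gene sequence or a part thereof,
    the reagent comprises a primer molecule that hybridizes with the marker or its converted sequence,
    the reagent comprises a probe molecule that hybridizes with the marker or its converted sequence,
    the sample is from a mammal.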
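  3. A medium recording a DNA sequence or a fragment thereof and/or its methylation information, the DNA sequence comprising:
    (i) one or more or all of the following gene sequences: (p) TTLL10, ST6GALNAC5, KCNA3, CACNA1E, TRAPPC12, UBE2F, ZIC4, ZNF595, EVC2, HMX1, PITX2, POU4F2, IRX4, IRX1, CRHBP, KCNMB1, KCNQ5, TBX20, ACTR3C, ACTR3B, VIPR2, SOX17, MOS, PREX2, GDF6, OSR2, BARX1, SORCS3, VAX1, DPYSL4, UTF1, B3GAT1, HOXC13, CUX2, GLT1D1, ITGBL1, SKOR1, TM6SF1, LRRK1, FOXL1, MYO15B, DNM2, ZNF536, YTHDF1, SIM2,
    or (ii) a treated sequence of (i), the treatment converting unmethylated cytosine into a base whose ability to bind guanine is lower than that of cytosine,
    preferably,
    the DNA sequence comprises one or more or all selected from CACNA1E, PITX2, CRHBP, TBX20, SORCS3, B3GAT1, GLT1D1 and LRRK1, optionally further including one or more or all of the other gene sequences in (p); or
    the DNA sequence comprises one or more or all selected from TTLL10, ACTR3B, BARX1, CUX2, DNM2 and SIM2, optionally further including one or more or all of the other gene sequences in (p); or
    the DNA sequence comprises one or more or all selected from UBE2F, HMX1, IRX4, IRX1, VIPR2, OSR2 and MYO15B, optionally further including one or more or all of the other gene sequences in (p),
    more preferably, the medium has one or more features selected from the following:
    the marker comprises at least 3 CpG dinucleotides,
    the fragment is 1-1000 bp, preferably 1-700 bp, in length,
    the fragment is the promoter region of the gene sequence or a part thereof,
    the medium is a carrier printed with the DNA sequence or fragment thereof and/or its methylation information, including a card, for example a paper, plastic, metal or glass card,
    the medium is a computer-readable medium storing the sequence and/or its methylation information and a computer program which, when executed by a processor, implements the following steps: comparing methylation sequencing data of a sample with the sequence or information to obtain the presence, content and/or methylation level of nucleic acid molecules containing the sequence in the sample, and on that basis screening for colorectal cancer risk, diagnosing colorectal cancer, or assessing colorectal cancer prognosis.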
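  4. Use of the following (a) and optionally (b) in the preparation of a kit for screening for colorectal cancer risk, diagnosing colorectal cancer, or assessing colorectal cancer prognosis,
    (a) a reagent or device for determining the methylation level of a marker in a sample from a subject, the marker being a DNA sequence together with the 5 kb upstream and 5 kb downstream of the DNA sequence, or a fragment thereof, or one or more CpG dinucleotides therein,
    (b) the marker or a treated nucleic acid molecule thereof, the treatment converting unmethylated cytosine into a base whose ability to bind guanine is lower than that of cytosine,
    wherein the DNA sequence comprises one or more or all of the following gene sequences: (p) TTLL10, ST6GALNAC5, KCNA3, CACNA1E, TRAPPC12, UBE2F, ZIC4, ZNF595, EVC2, HMX1, PITX2, POU4F2, IRX4, IRX1, CRHBP, KCNMB1, KCNQ5, TBX20, ACTR3C, ACTR3B, VIPR2, SOX17, MOS, PREX2, GDF6, OSR2, BARX1, SORCS3, VAX1, DPYSL4, UTF1, B3GAT1, HOXC13, CUX2, GLT1D1, ITGBL1, SKOR1, TM6SF1, LRRK1, FOXL1, MYO15B, DNM2, ZNF536, YTHDF1, SIM2.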
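  5. The use according to claim 4, characterized in that
    the DNA sequence comprises one or more or all selected from CACNA1E, PITX2, CRHBP, TBX20, SORCS3, B3GAT1, GLT1D1 and LRRK1, optionally further including one or more or all of the other gene sequences in (p), or
    the DNA sequence comprises one or more or all selected from TTLL10, ACTR3B, BARX1, CUX2, DNM2 and SIM2, optionally further including one or more or all of the other gene sequences in (p), or
    the DNA sequence comprises one or more or all selected from UBE2F, HMX1, IRX4, IRX1, VIPR2, OSR2 and MYO15B, optionally further including one or more or all of the other gene sequences in (p).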
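  6. The use according to claim 4 or 5, characterized in that the reagent comprises a primer molecule that hybridizes with the marker or its converted sequence, and/or the reagent comprises a probe molecule that hybridizes with the marker or its converted sequence,
    preferably, the use further has one or more features selected from the following:
    the marker comprises at least 3 CpG dinucleotides,
    the fragment is 1-1000 bp, preferably 1-700 bp, in length,
    the fragment is the promoter region of the gene sequence or a part thereof,
    the device comprises the medium of claim 3,
    the subject is a mammal,
    the sample is from a tissue, cells or a body fluid of a mammal, preferably blood,
    the DNA sequence is: the sequence of the corresponding marker in the genome, or its converted sequence, or its sequence after treatment with a methylation-sensitive restriction endonuclease, the conversion converting unmethylated cytosine into a base whose ability to bind guanine is lower than that of cytosine,
    the kit further comprises PCR reagents,
    the kit further comprises other reagents for detecting DNA methylation, the other reagents being reagents used in one or more methods selected from: bisulfite-conversion-based PCR, DNA sequencing, methylation-sensitive restriction endonuclease analysis, fluorescence quantification, methylation-sensitive high-resolution melting curve analysis, chip-based methylation profiling, and mass spectrometry; preferably, the other reagents are selected from one or more of: bisulfite, hydrogen sulfite, acid sulfite or metabisulfite or derivatives thereof, methylation-sensitive or methylation-insensitive restriction endonucleases, digestion buffer, fluorescent dyes, fluorescence quenchers, fluorescent reporters, exonucleases, alkaline phosphatase, internal standards and controls;
    preferably, the screening for colorectal cancer risk, diagnosing colorectal cancer, or assessing colorectal cancer prognosis comprises: comparing the methylation level of the marker with the corresponding reference level, and screening for colorectal cancer risk, diagnosing colorectal cancer, or assessing colorectal cancer prognosis according to the score.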
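  7. A method for screening for colorectal cancer risk, diagnosing colorectal cancer, or assessing colorectal cancer prognosis, comprising:
    (1) detecting the methylation level of a marker in a sample from a subject, the marker being a DNA sequence together with the 5 kb upstream and 5 kb downstream of the DNA sequence, or a fragment thereof, or one or more CpG dinucleotides therein, the DNA sequence comprising one or more or all of the following gene sequences: (p) TTLL10, ST6GALNAC5, KCNA3, CACNA1E, TRAPPC12, UBE2F, ZIC4, ZNF595, EVC2, HMX1, PITX2, POU4F2, IRX4, IRX1, CRHBP, KCNMB1, KCNQ5, TBX20, ACTR3C, ACTR3B, VIPR2, SOX17, MOS, PREX2, GDF6, OSR2, BARX1, SORCS3, VAX1, DPYSL4, UTF1, B3GAT1, HOXC13, CUX2, GLT1D1, ITGBL1, SKOR1, TM6SF1, LRRK1, FOXL1, MYO15B, DNM2, ZNF536, YTHDF1, SIM2,
    (2) comparing the methylation level of the marker in step (1) with the corresponding reference level,
    (3) screening for colorectal cancer risk, diagnosing colorectal cancer, or assessing colorectal cancer prognosis according to the result of the comparison,
    preferably,
    the DNA sequence comprises one or more or all selected from CACNA1E, PITX2, CRHBP, TBX20, SORCS3, B3GAT1, GLT1D1 and LRRK1, optionally further including one or more or all of the other gene sequences in (p), or
    the DNA sequence comprises one or more or all selected from TTLL10, ACTR3B, BARX1, CUX2, DNM2 and SIM2, optionally further including one or more or all of the other gene sequences in (p), or
    the DNA sequence comprises one or more or all selected from UBE2F, HMX1, IRX4, IRX1, VIPR2, OSR2 and MYO15B, optionally further including one or more or all of the other gene sequences in (p),
    more preferably, the method has one or more features selected from the following:
    the marker comprises at least 3 CpG dinucleotides,
    the fragment is 1-1000 bp, preferably 1-700 bp, in length,
    the fragment is the promoter region of the gene sequence or a part thereof,
    preferably, the method further comprises, before step (1), a step of obtaining a DNA-containing biological sample from the subject; preferably, step (1) comprises performing the detection using a primer molecule, a probe molecule and/or a medium,
    the comparison in step (2) comprises: directly comparing the methylation level of the marker in step (1) with the reference level, or calculating a score and comparing the score of the methylation level of the marker with the corresponding reference score; preferably, the score is calculated with a logistic regression model,
    step (3) comprises: when the methylation level of the marker is greater than the reference level, or the score of the methylation level is greater than the reference score, the subject is at risk of developing colorectal cancer, has colorectal cancer, or has a poor colorectal cancer prognosis,
    preferably, the detection comprises: bisulfite-conversion-based PCR, DNA sequencing, methylation-sensitive restriction endonuclease analysis, fluorescence quantification, methylation-sensitive high-resolution melting curve analysis, chip-based methylation profiling, or mass spectrometry,
    the sample is from a tissue, cells or a body fluid of a mammal, preferably blood,
    the DNA sequence is: the sequence of the corresponding marker in the genome, or its converted sequence, or its sequence after treatment with a methylation-sensitive restriction endonuclease, the conversion converting the unmethylated cytosine therein into a base that does not bind guanine.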
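  8. A kit for screening for colorectal cancer risk, diagnosing colorectal cancer, or assessing colorectal cancer prognosis, comprising:
    (a) a reagent or device for determining the methylation level of a marker in a sample from a subject, the marker being a DNA sequence together with the 5 kb upstream and 5 kb downstream of the DNA sequence, or a fragment thereof, or one or more CpG dinucleotides therein, and
    optionally (b) the marker or a treated nucleic acid molecule thereof, the treatment converting unmethylated cytosine into a base whose ability to bind guanine is lower than that of cytosine,
    wherein the DNA sequence comprises one or more or all of the following gene sequences: (p) TTLL10, ST6GALNAC5, KCNA3, CACNA1E, TRAPPC12, UBE2F, ZIC4, ZNF595, EVC2, HMX1, PITX2, POU4F2, IRX4, IRX1, CRHBP, KCNMB1, KCNQ5, TBX20, ACTR3C, ACTR3B, VIPR2, SOX17, MOS, PREX2, GDF6, OSR2, BARX1, SORCS3, VAX1, DPYSL4, UTF1, B3GAT1, HOXC13, CUX2, GLT1D1, ITGBL1, SKOR1, TM6SF1, LRRK1, FOXL1, MYO15B, DNM2, ZNF536, YTHDF1, SIM2,
    preferably,
    the DNA sequence comprises one or more or all selected from CACNA1E, PITX2, CRHBP, TBX20, SORCS3, B3GAT1, GLT1D1 and LRRK1, optionally further including one or more or all of the other gene sequences in (p), or
    the DNA sequence comprises one or more or all selected from TTLL10, ACTR3B, BARX1, CUX2, DNM2 and SIM2, optionally further including one or more or all of the other gene sequences in (p), or
    the DNA sequence comprises one or more or all selected from UBE2F, HMX1, IRX4, IRX1, VIPR2, OSR2 and MYO15B, optionally further including one or more or all of the other gene sequences in (p),
    more preferably, the method has one or more features selected from the following:
    the marker comprises at least 3 CpG dinucleotides,
    the fragment is 1-1000 bp, preferably 1-700 bp, in length,
    the fragment is the promoter region of the gene sequence or a part thereof,
    the reagent comprises a primer molecule that hybridizes with the marker or its converted sequence,
    the reagent comprises a probe molecule that hybridizes with the marker or its converted sequence,
    the device comprises the medium of claim 3,
    the subject is a mammal,
    the sample is from a tissue, cells or a body fluid of a mammal, preferably blood,
    the DNA sequence is: the sequence of the corresponding marker in the genome, or its converted sequence, or its sequence after treatment with a methylation-sensitive restriction endonuclease, the conversion converting unmethylated cytosine into a base whose ability to bind guanine is lower than that of cytosine.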
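  9. The kit according to claim 8, characterized in that
    the kit further comprises PCR reagents, or
    the kit further comprises other reagents for detecting DNA methylation, the other reagents being reagents used in one or more methods selected from: bisulfite-conversion-based PCR, DNA sequencing, methylation-sensitive restriction endonuclease analysis, fluorescence quantification, methylation-sensitive high-resolution melting curve analysis, chip-based methylation profiling, and mass spectrometry,
    preferably, the other reagents for detecting DNA methylation are selected from one or more of: bisulfite, hydrogen sulfite, acid sulfite or metabisulfite or derivatives thereof, methylation-sensitive or methylation-insensitive restriction endonucleases, digestion buffer, fluorescent dyes, fluorescence quenchers, fluorescent reporters, exonucleases, alkaline phosphatase, internal standards and controls.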
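  10. A device for screening for colorectal cancer risk, diagnosing colorectal cancer, or assessing colorectal cancer prognosis, the device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor, when executing the program, implements the following steps:
    (1) obtaining the methylation level of a marker in a sample from a subject, the marker being a DNA sequence together with the 5 kb upstream and 5 kb downstream of the DNA sequence, or a fragment thereof, or one or more CpG dinucleotides therein, the DNA sequence comprising one or more or all of the following gene sequences: (p) TTLL10, ST6GALNAC5, KCNA3, CACNA1E, TRAPPC12, UBE2F, ZIC4, ZNF595, EVC2, HMX1, PITX2, POU4F2, IRX4, IRX1, CRHBP, KCNMB1, KCNQ5, TBX20, ACTR3C, ACTR3B, VIPR2, SOX17, MOS, PREX2, GDF6, OSR2, BARX1, SORCS3, VAX1, DPYSL4, UTF1, B3GAT1, HOXC13, CUX2, GLT1D1, ITGBL1, SKOR1, TM6SF1, LRRK1, FOXL1, MYO15B, DNM2, ZNF536, YTHDF1, SIM2,
    (2) comparing the methylation level of the marker in step (1) with the corresponding reference level,
    (3) screening for colorectal cancer risk, diagnosing colorectal cancer, or assessing colorectal cancer prognosis according to the result of the comparison,
    preferably,
    the DNA sequence comprises one or more or all selected from CACNA1E, PITX2, CRHBP, TBX20, SORCS3, B3GAT1, GLT1D1 and LRRK1, optionally further including one or more or all of the other gene sequences in (p), or
    the DNA sequence comprises one or more or all selected from TTLL10, ACTR3B, BARX1, CUX2, DNM2 and SIM2, optionally further including one or more or all of the other gene sequences in (p), or
    the DNA sequence comprises one or more or all selected from UBE2F, HMX1, IRX4, IRX1, VIPR2, OSR2 and MYO15B, optionally further including one or more or all of the other gene sequences in (p),
    more preferably, the method has one or more features selected from the following:
    the marker comprises at least 3 CpG dinucleotides,
    the fragment is 1-1000 bp, preferably 1-700 bp, in length,
    the fragment is the promoter region of the gene sequence or a part thereof,
    the reagent comprises a primer molecule that hybridizes with the marker or its converted sequence,
    the reagent comprises a probe molecule that hybridizes with the marker or its converted sequence,
    the device comprises the medium of claim 3,
    the subject is a mammal,
    the sample is from a tissue, cells or a body fluid of a mammal, preferably blood,
    the DNA sequence is: the sequence of the corresponding marker in the genome, or its converted sequence, or its sequence after treatment with a methylation-sensitive restriction endonuclease, the conversion converting unmethylated cytosine into a base whose ability to bind guanine is lower than that of cytosine,
    preferably, the method further comprises, before step (1), a step of obtaining a DNA-containing biological sample from the subject; preferably, step (1) comprises performing the detection using the primer molecule, probe molecule and/or medium,
    the comparison in step (2) comprises: directly comparing the methylation level of the marker in step (1) with the reference level, or calculating a score and comparing the score of the methylation level of the marker with the corresponding reference score; preferably, the score is calculated with a logistic regression model,
    step (3) comprises: when the methylation level of the marker is greater than the reference level, or the score of the methylation level is greater than the reference score, the subject is at risk of developing colorectal cancer, has colorectal cancer, or has a poor colorectal cancer prognosis,
    the detection comprises: bisulfite-conversion-based PCR, DNA sequencing, methylation-sensitive restriction endonuclease analysis, fluorescence quantification, methylation-sensitive high-resolution melting curve analysis, chip-based methylation profiling, or mass spectrometry.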
PCT/CN2023/118675 2022-09-16 2023-09-14 Methylation markers for identifying cancer and uses thereof WO2024056008A1 (zh)

Applications Claiming Priority (8)

Application Number Priority Date Filing Date Title
CN202211129987.8 2022-09-16
CN202211129987.8A CN117821585A (zh) 2022-09-16 2022-09-16 Markers for early diagnosis of colorectal cancer and uses thereof
CN202211190564.7 2022-09-28
CN202211190564.7A CN117778568A (zh) 2022-09-28 2022-09-28 Markers for identifying gastric cancer and uses thereof
CNPCT/CN2022/124503 2022-10-11
CN2022124503 2022-10-11
CN2022126559 2022-10-21
CNPCT/CN2022/126559 2022-10-21

Publications (1)

Publication Number Publication Date
WO2024056008A1 true WO2024056008A1 (zh) 2024-03-21

Family

ID=90274284

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/118675 WO2024056008A1 (zh) 2022-09-16 2023-09-14 Methylation markers for identifying cancer and uses thereof

Country Status (1)

Country Link
WO (1) WO2024056008A1 (zh)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101688239A (zh) * 2007-02-12 2010-03-31 约翰·霍普金斯大学 Early detection and prognosis of colon cancer
CN103314114A (zh) * 2010-09-13 2013-09-18 临床基因组学股份有限公司 Epigenetic markers of colorectal cancer and diagnostic methods using them
WO2021202351A1 (en) * 2020-03-31 2021-10-07 Freenome Holdings, Inc. Methods and systems for detecting colorectal cancer via nucleic acid methylation analysis
CN113493835A (zh) * 2020-03-20 2021-10-12 上海鹍远健康科技有限公司 Method and kit for screening for colorectal neoplasms by detecting the methylation status of the bcan gene region
CN114127313A (zh) * 2019-05-31 2022-03-01 通用诊断公司 Detection of colorectal cancer
CN114207153A (zh) * 2020-03-20 2022-03-18 上海鹍远健康科技有限公司 Methods and kits for screening colorectal neoplasm
CN114375340A (zh) * 2019-05-31 2022-04-19 通用诊断公司 Detection of colorectal cancer
CN114908159A (zh) * 2021-02-09 2022-08-16 复旦大学附属中山医院 Method and kit for screening, risk assessment and prognosis of advanced colorectal adenoma

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23864740

Country of ref document: EP

Kind code of ref document: A1