CN115491421A - Pancreatic cancer diagnosis related DNA methylation marker and application thereof - Google Patents

Publication number
CN115491421A
Authority
CN
China
Prior art keywords
seq
methylation
dna sequence
fragment
dna
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110679281.8A
Other languages
Chinese (zh)
Inventor
苏志熙
何其晔
马成城
徐敏杰
谢可辉
杨世方
马建华
刘琪
刘蕊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Fuyuan Biotechnology Co ltd
Original Assignee
Shanghai Fuyuan Biotechnology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Fuyuan Biotechnology Co ltd
Priority to CN202110679281.8A (CN115491421A)
Priority to PCT/CN2022/099311 (WO2022262831A1)
Priority to KR1020247001904A (KR20240021975A)
Priority to CA3222729A (CA3222729A1)
Priority to US18/571,373 (US20240141442A1)
Priority to CN202280042761.6A (CN117500942A)
Priority to AU2022292704A (AU2022292704A1)
Priority to EP22824304.4A (EP4372103A1)
Publication of CN115491421A

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6876 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
    • C12Q1/6883 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material
    • C12Q1/6886 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material for cancer
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q2600/00 Oligonucleotides characterized by their use
    • C12Q2600/154 Methylation markers
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q2600/00 Oligonucleotides characterized by their use
    • C12Q2600/166 Oligonucleotides used as internal standards, controls or normalisation probes

Landscapes

  • Chemical & Material Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Organic Chemistry (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Engineering & Computer Science (AREA)
  • Immunology (AREA)
  • Pathology (AREA)
  • Analytical Chemistry (AREA)
  • Zoology (AREA)
  • Genetics & Genomics (AREA)
  • Wood Science & Technology (AREA)
  • Physics & Mathematics (AREA)
  • Biotechnology (AREA)
  • Microbiology (AREA)
  • Molecular Biology (AREA)
  • Hospice & Palliative Care (AREA)
  • Biophysics (AREA)
  • Oncology (AREA)
  • Biochemistry (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The invention relates to a DNA methylation marker related to pancreatic cancer diagnosis and application thereof. Specifically, the invention provides methylation markers for the non-invasive early identification of pancreatic cancer, and the use of reagents for detecting them in preparing a kit for diagnosing pancreatic cancer in a subject. The markers were identified by analyzing methylation sequencing data measured from blood samples of pancreatic cancer patients and pancreatic cancer-free subjects. Pancreatic cancer can be effectively identified based on this group of methylation markers with relatively high sensitivity and specificity, which provides a new method for the early identification of pancreatic cancer. The detection process is non-invasive and safe.

Description

Pancreatic cancer diagnosis related DNA methylation marker and application thereof
Technical Field
The invention belongs to the technical field of molecular biomedicine, and particularly relates to a pancreatic cancer related methylation marker and application thereof, which are used for early identification of pancreatic cancer.
Background
Pancreatic cancer, such as pancreatic ductal adenocarcinoma (PDAC), is one of the most fatal diseases in the world. The 5-year relative survival rate is 9%, and for patients with distant metastases it falls to only 3%. One major reason for the high mortality is that methods for early detection of PDAC remain limited, although early detection is critical for PDAC patients who are to undergo surgical resection. Currently, carbohydrate antigen 19-9 (CA19-9) is the most common clinical serum biomarker used to assist in the detection of PDAC, and it can achieve 79-90% sensitivity and 75-90% specificity in symptomatic patients prior to resection. However, several large population studies have shown that CA19-9 is ineffective for detecting PDAC in the asymptomatic population because of its low positive predictive value, which essentially precludes its use for early screening of PDAC (Chang et al., 2006; Homma & Tsuchiya, 1991; Kim et al., 2004; Satake, Takeuchi, Homma, & Ozaki, 1994). Endoscopic ultrasound-guided fine needle aspiration (EUS-FNA) is another common method for obtaining a pathological diagnosis without open surgery, but it is invasive and requires clear visual evidence, which usually means that the PDAC has already progressed. During tumorigenesis and progression, the DNA methylation patterns and levels of the genomic DNA of malignant cells change profoundly. Some tumor-specific DNA methylation changes have been shown to occur early in tumorigenesis and may be a "driver" of tumorigenesis.
Circulating tumor DNA (ctDNA) molecules, derived from apoptotic or necrotic tumor cells, carry tumor-specific DNA methylation markers from early malignant tumors, and have recently been investigated as promising new targets for developing noninvasive early screening tools for a variety of cancers. However, most of these studies have not achieved effective results. Studies have shown that the proportion of ctDNA in plasma DNA of patients with early stage tumors is very small (Abbosh et al, 2017), and therefore identifying stable and consistent pancreatic cancer tumor-specific markers from plasma DNA is a great challenge.
Disclosure of Invention
The invention provides a method that detects the methylation levels of a plurality of genes in a sample and uses the differing methylation levels of those genes to identify pancreatic cancer, thereby achieving non-invasive diagnosis of pancreatic cancer with higher accuracy and lower cost.
Specifically, the present invention provides in a first aspect an isolated nucleic acid molecule from a mammal, which is a methylation marker of a gene associated with pancreatic cancer, the sequence of which comprises (1) one or more or all of the sequences selected from SEQ ID NO: 1 to SEQ ID NO: 56, or variants having at least 70% identity thereto, (2) a complementary sequence of (1), or (3) a treated sequence of (1) or (2), wherein the treatment converts unmethylated cytosines to bases that have a lower binding capacity to guanine than cytosine.
In one or more embodiments, the methylation sites are consecutive CpGs.
In one or more embodiments, the methylation marker can be any one or more CpG sites in the sequence region.
In one or more embodiments, the nucleic acid molecule is used as an internal standard or control for detecting the level of DNA methylation of the corresponding sequence in a sample.
In one or more embodiments, the pancreatic cancer is pancreatic ductal adenocarcinoma.
In a second aspect, the invention provides a reagent for detecting DNA methylation, the reagent comprising a reagent for detecting the methylation level of a DNA sequence or fragment thereof, or the methylation state or level of one or more CpG dinucleotides in the DNA sequence or fragment thereof, in a sample from a subject, the DNA sequence being selected from one or more (e.g. at least 7) or all of the following gene sequences, or sequences within 20kb upstream or downstream thereof: DMRTA2, FOXD3, TBX15, BCAN, TRIM58, SIX3, VAX2, EMX1, LBX2, TLX2, POU3F3, TBR1, EVX2, HOXD12, HOXD8, HOXD4, TOPAZ1, SHOX2, DRD5, RPL9, HOPX, SFRP2, IRX4, TBX18, OLIG3, ULBP1, HOXA13, TBX20, IKZF1, INSIG1, SOX7, EBF2, MOS, MKX, KCNA6, SYT10, AGAP2, TBX3, CCNA1, ZIC2, CLEC14A, OTX2, C14orf39, BNC1, AHSP, ZFHX3, LHX1, TIMP2, ZNF750, SIM2.
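The phrase "within 20kb upstream or downstream" can be read as widening a gene interval by 20,000 bp on each side. The following is a minimal sketch of that reading, not the applicant's implementation: the `Region` type, the `with_flanks` helper and the example coordinates are all hypothetical and are not taken from the patent.

```python
from typing import NamedTuple


class Region(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive


def with_flanks(gene: Region, flank_bp: int = 20_000) -> Region:
    """Extend a gene interval by flank_bp on both sides, clamping the start at 0."""
    return Region(gene.chrom, max(0, gene.start - flank_bp), gene.end + flank_bp)


# Hypothetical coordinates, for illustration only (not real gene positions).
example_gene = Region("chr2", 50_000, 58_000)
window = with_flanks(example_gene)
print(window)
```

A candidate CpG site would then count as belonging to the marker region if its position falls inside `window`.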
In one or more embodiments, the DNA sequence comprises a sense strand or an antisense strand of DNA.
In one or more embodiments, the fragment is 1-1000bp in length, preferably 1-700bp in length.
In one or more embodiments, the fragment comprises at least one CpG dinucleotide.
In one or more embodiments, the DNA sequence is selected from one or more (e.g., at least 7) or all of the following sequences or complements thereof: SEQ ID NO: 1 to SEQ ID NO: 56, or variants having at least 70% identity thereto.
In one or more embodiments, the agent is a primer molecule that hybridizes to the DNA sequence or fragment thereof. The primer molecule can amplify the DNA sequence or a fragment thereof. In one or more embodiments, the primer sequence is methylation specific or non-specific. The primer molecule is at least 9bp.
In one or more embodiments, the agent is a probe molecule that hybridizes to the DNA sequence or fragment thereof. In one or more embodiments, the probe further comprises a detectable substance. In one or more embodiments, the detectable substance is a 5' fluorescent reporter group and a 3' quencher group. In one or more embodiments, the fluorescent reporter group is selected from Cy5, FAM, and VIC. Preferably, the sequence of the probe comprises an MGB (minor groove binder) or LNA (locked nucleic acid). The probe molecule is at least 12 bp in length.
In one or more embodiments, the agent comprises a nucleic acid molecule as described in the first aspect herein.
In one or more embodiments, the sample is from a mammal, preferably a human.
In a third aspect, the invention provides a medium bearing DNA sequences or fragments thereof and/or methylation information thereof, wherein the DNA sequences are (i) selected from one or more (e.g. at least 7) or all of the following gene sequences, or sequences within 20kb upstream or downstream thereof: DMRTA2, FOXD3, TBX15, BCAN, TRIM58, SIX3, VAX2, EMX1, LBX2, TLX2, POU3F3, TBR1, EVX2, HOXD12, HOXD8, HOXD4, TOPAZ1, SHOX2, DRD5, RPL9, HOPX, SFRP2, IRX4, TBX18, OLIG3, ULBP1, HOXA13, TBX20, IKZF1, INSIG1, SOX7, EBF2, MOS, MKX, KCNA6, SYT10, AGAP2, TBX3, CCNA1, ZIC2, CLEC14A, OTX2, C14orf39, BNC1, AHSP, ZFHX3, LHX1, TIMP2, ZNF750, SIM2, or (ii) a treated sequence of (i), the treatment converting unmethylated cytosine to a base that has a lower binding capacity to guanine than cytosine.
In one or more embodiments, the DNA sequence is (i) a gene sequence selected from any one of the following groups, or a sequence within 20kb upstream or downstream thereof: (1) LBX2, TBR1, EVX2, SFRP2, SYT10, CCNA1, ZFHX3; (2) TRIM58, HOXD4, INSIG1, SYT10, CCNA1, ZIC2, CLEC14A; (3) EMX1, POU3F3, TOPAZ1, ZIC2, OTX2, AHSP, TIMP2; (4) EMX1, EVX2, RPL9, SFRP2, HOXA13, SYT10, CLEC14A; (5) TBX15, EMX1, LBX2, OLIG3, SYT10, AGAP2, TBX3; (6) TRIM58, VAX2, EMX1, HOXD4, ZIC2, CLEC14A, LHX1; (7) POU3F3, HOXD8, RPL9, TBX18, SYT10, TBX3, CLEC14A; (8) TRIM58, EMX1, TLX2, EVX2, HOXD4, IRX4; (9) SIX3, POU3F3, TOPAZ1, RPL9, SFRP2, CLEC14A, BNC1; (10) DMRTA2, HOXD4, IRX4, INSIG1, MOS, CLEC14A, or (ii) a treated sequence of (i), the treatment converting unmethylated cytosine to a base that has a lower binding capacity to guanine than cytosine.
In one or more embodiments, the medium is used to align with gene methylation sequencing data to determine the presence, amount, and/or level of methylation of a nucleic acid molecule comprising the sequence or fragment.
In one or more embodiments, the DNA sequence comprises a DNA sense strand or an antisense strand.
In one or more embodiments, the fragment is 1-1000bp in length, preferably 1-700bp in length.
In one or more embodiments, the fragment comprises at least one CpG dinucleotide.
In one or more embodiments, the DNA sequence is selected from one or more (e.g., at least 7) or all of the following sequences or complements thereof: SEQ ID NO: 1 to SEQ ID NO: 56, or variants having at least 70% identity thereto.
In one or more embodiments, the methylation information includes information related to cytosines in the sequence of the nucleic acid molecule that are likely to be methylated. Preferably, the cytosine that is likely to be methylated is a C in CpG. In one or more embodiments, the methylation information is the location of a methylation site (e.g., a CpG dinucleotide) of the nucleic acid molecule.
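Where the methylation information is the location of CpG dinucleotides, those locations can be derived directly from the sequence. The sketch below is illustrative only: `cpg_positions` is a hypothetical helper name and the fragment is a made-up string, not one of the SEQ ID NO sequences.

```python
from typing import List


def cpg_positions(seq: str) -> List[int]:
    """Return the 0-based index of the C in every CpG dinucleotide of seq."""
    seq = seq.upper()
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] == "CG"]


# Made-up fragment, for illustration only.
fragment = "ACGTTCGGACCGA"
print(cpg_positions(fragment))  # [1, 5, 10]
```

A fragment satisfies the "at least one CpG dinucleotide" condition when the returned list is non-empty.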
In one or more embodiments, the medium is a support, including cards, such as paper, plastic, metal, glass cards, printed with the DNA sequence or fragment thereof and/or methylation information thereof.
In one or more embodiments, the medium is a computer readable medium having stored thereon the sequence and/or methylation information thereof and a computer program which, when executed by a processor, performs the steps of: comparing methylation sequencing data of the sample to the sequence, thereby obtaining the presence, amount and/or level of methylation of nucleic acid molecules comprising the sequence in the sample. The presence, amount and/or level of methylation of nucleic acid molecules comprising said sequences are useful for diagnosing pancreatic cancer.
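The comparison step performed by such a program can be sketched as follows. This is a simplified illustration under stated assumptions, not the program described in the application: reads are assumed to be already converted (unmethylated C read as T), already aligned without gaps to the forward strand of the reference, and supplied as hypothetical `(offset, sequence)` pairs. The function name `methylation_levels` is likewise an assumption.

```python
from typing import Dict, List, Tuple


def methylation_levels(reference: str, reads: List[Tuple[int, str]]) -> Dict[int, float]:
    """Per-CpG methylation level: methylated calls / (methylated + unmethylated calls).

    At each reference CpG, a read base of C is counted as methylated and a
    read base of T as unmethylated; any other base is ignored.
    """
    ref = reference.upper()
    cpg_sites = [i for i in range(len(ref) - 1) if ref[i:i + 2] == "CG"]
    methylated = {pos: 0 for pos in cpg_sites}
    covered = {pos: 0 for pos in cpg_sites}
    for offset, read in reads:
        read = read.upper()
        for pos in cpg_sites:
            idx = pos - offset  # position of this CpG inside the read
            if 0 <= idx < len(read):
                if read[idx] == "C":
                    methylated[pos] += 1
                    covered[pos] += 1
                elif read[idx] == "T":
                    covered[pos] += 1
    # Sites with no informative coverage are omitted from the result.
    return {pos: methylated[pos] / covered[pos] for pos in cpg_sites if covered[pos] > 0}


# Toy data, for illustration only.
levels = methylation_levels("ACGTACGT", [(0, "ACGTATGT"), (0, "ATGTATGT"), (4, "ACGT")])
print(levels)
```

Real pipelines also handle the reverse strand, indels and base quality, which this sketch deliberately leaves out.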
In another aspect, the present invention also provides the use of (a) and/or (b) in the preparation of a kit for diagnosing pancreatic cancer in a subject,
(a) Reagents or devices for determining the methylation level of a DNA sequence or fragment thereof or the methylation state or level of one or more CpG dinucleotides in said DNA sequence or fragment thereof in a sample of a subject,
(b) A treated nucleic acid molecule of said DNA sequence or fragment thereof, said treatment converting unmethylated cytosine to a base having a lower binding capacity for guanine than cytosine,
wherein the DNA sequence is selected from one or more (e.g., at least 7) or all of the following gene sequences, or sequences within 20kb upstream or downstream thereof: DMRTA2, FOXD3, TBX15, BCAN, TRIM58, SIX3, VAX2, EMX1, LBX2, TLX2, POU3F3, TBR1, EVX2, HOXD12, HOXD8, HOXD4, TOPAZ1, SHOX2, DRD5, RPL9, HOPX, SFRP2, IRX4, TBX18, OLIG3, ULBP1, HOXA13, TBX20, IKZF1, INSIG1, SOX7, EBF2, MOS, MKX, KCNA6, SYT10, AGAP2, TBX3, CCNA1, ZIC2, CLEC14A, OTX2, C14orf39, BNC1, AHSP, ZFHX3, LHX1, TIMP2, ZNF750, SIM2.
In one or more embodiments, the DNA sequence comprises a gene sequence selected from any one of the following groups: (1) LBX2, TBR1, EVX2, SFRP2, SYT10, CCNA1, ZFHX3; (2) TRIM58, HOXD4, INSIG1, SYT10, CCNA1, ZIC2, CLEC14A; (3) EMX1, POU3F3, TOPAZ1, ZIC2, OTX2, AHSP, TIMP2; (4) EMX1, EVX2, RPL9, SFRP2, HOXA13, SYT10, CLEC14A; (5) TBX15, EMX1, LBX2, OLIG3, SYT10, AGAP2, TBX3; (6) TRIM58, VAX2, EMX1, HOXD4, ZIC2, CLEC14A, LHX1; (7) POU3F3, HOXD8, RPL9, TBX18, SYT10, TBX3, CLEC14A; (8) TRIM58, EMX1, TLX2, EVX2, HOXD4, IRX4; (9) SIX3, POU3F3, TOPAZ1, RPL9, SFRP2, CLEC14A, BNC1; (10) DMRTA2, HOXD4, IRX4, INSIG1, MOS, CLEC14A.
In one or more embodiments, the DNA sequence comprises a sense strand or an antisense strand of DNA.
In one or more embodiments, the fragment is 1-1000bp in length, preferably 1-700bp in length.
In one or more embodiments, the fragment comprises at least one CpG dinucleotide.
In one or more embodiments, the DNA sequence is selected from one or more (e.g., at least 7) or all of the following sequences or complements thereof: SEQ ID NO: 1 to SEQ ID NO: 56, or variants having at least 70% identity thereto.
In one or more embodiments, the DNA sequence comprises a sequence selected from any one of the following groups: (1) SEQ ID NO: 9, 13, 14, 26, 40, 43 and 52; (2) SEQ ID NO: 5, 18, 34, 40, 43, 45 and 46; (3) SEQ ID NO: 8, 11, 20, 44, 48, 51 and 54; (4) SEQ ID NO: 8, 14, 24, 26, 31, 40 and 46; (5) SEQ ID NO: 3, 8, 9, 29, 40, 41 and 42; (6) SEQ ID NO: 5, 7, 8, 19, 44, 47 and 53; (7) SEQ ID NO: 12, 17, 24, 28, 40, 42 and 47; (8) SEQ ID NO: 5, 8, 10, 14, 18, 19 and 27; (9) SEQ ID NO: 6, 12, 20, 24, 26, 47 and 50; (10) SEQ ID NO: 1, 19, 27, 34, 37, 46 and 47.
In one or more embodiments, the nucleic acid molecule is a nucleic acid molecule as described herein in the first aspect.
In one or more embodiments, the reagent comprises a primer molecule and/or a probe molecule.
In one or more embodiments, the agent comprises a primer molecule that hybridizes to the DNA sequence or fragment thereof. The primer molecule can amplify the DNA sequence or a fragment thereof. In one or more embodiments, the primer sequence is methylation specific or non-specific. The primer molecule is at least 9bp.
In one or more embodiments, the agent is a probe molecule that hybridizes to the DNA sequence or fragment thereof. In one or more embodiments, the probe further comprises a detectable substance. In one or more embodiments, the detectable substance is a 5' fluorescent reporter group and a 3' quencher group. In one or more embodiments, the fluorescent reporter group is selected from Cy5, FAM, and VIC. Preferably, the sequence of the probe comprises an MGB (minor groove binder) or LNA (locked nucleic acid). The probe molecule is at least 12 bp in length.
In one or more embodiments, the reagent comprises a medium as described in any embodiment herein.
In one or more embodiments, the kit is a non-invasive diagnostic kit.
In one or more embodiments, the subject is a mammal, preferably a human.
In one or more embodiments, the sample is from a tissue, cell, or bodily fluid of a mammal, such as pancreatic tissue or blood. In one or more embodiments, the sample is pancreatic cancer tissue, preferably a fine needle biopsy. In one or more embodiments, the sample is plasma.
In one or more embodiments, the sample comprises genomic DNA or cfDNA.
In one or more embodiments, the DNA sequence is converted, wherein unmethylated cytosines are converted to bases that have a lower binding capacity to guanine than cytosine. The conversion is carried out using an enzymatic method, preferably deaminase treatment, or a non-enzymatic method, preferably treatment with bisulfite or metabisulfite, or a combination thereof.
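The effect of such a conversion on a single strand can be sketched in silico. In bisulfite treatment, unmethylated cytosine is deaminated to uracil, which pairs with adenine rather than guanine and is read as thymine after amplification, while methylated cytosine is left unchanged. The helper name `bisulfite_convert` and the example sequence below are hypothetical and serve only as an illustration.

```python
from typing import Iterable


def bisulfite_convert(seq: str, methylated_positions: Iterable[int]) -> str:
    """Simulate conversion of one strand: unmethylated C becomes T, methylated C stays C."""
    protected = set(methylated_positions)
    converted = []
    for i, base in enumerate(seq.upper()):
        if base == "C" and i not in protected:
            converted.append("T")  # unmethylated C -> U, read as T after PCR
        else:
            converted.append(base)
    return "".join(converted)


# Made-up sequence; only the C at index 1 is treated as methylated.
print(bisulfite_convert("ACGTCCGA", methylated_positions=[1]))  # ACGTTTGA
```

Comparing the converted sequence with the original reference is what allows methylated and unmethylated cytosines to be distinguished downstream.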
In one or more embodiments, the DNA sequence is treated with a methylation sensitive restriction endonuclease.
In one or more embodiments, the kit further comprises PCR reaction reagents. Preferably, the PCR reaction reagents include DNA polymerase, PCR buffer, dNTPs, and Mg2+.
In one or more embodiments, the kit further comprises additional reagents for detecting DNA methylation, the additional reagents being reagents used in one or more methods selected from the group consisting of: bisulfite conversion based PCR (e.g., methylation specific PCR), DNA sequencing (e.g., bisulfite sequencing, whole genome methylation sequencing, reduced-representation methylation sequencing), methylation sensitive restriction enzyme analysis, fluorometry, methylation sensitive high resolution melting curve analysis, chip-based methylation profile analysis, and mass spectrometry (e.g., time-of-flight mass spectrometry). Preferably, the additional reagent is selected from one or more of: bisulfite or metabisulfite (pyrosulfite) or derivatives thereof, restriction endonucleases sensitive or insensitive to methylation, enzyme digestion buffer, fluorescent dyes, fluorescence quenchers, fluorescence reporters, exonucleases, alkaline phosphatase, internal standards, and controls.
In one or more embodiments, the reaction solution for PCR comprises Taq DNA polymerase, PCR buffer, dNTPs, KCl, MgCl2 and (NH4)2SO4. Preferably, the Taq DNA polymerase is a hot start Taq DNA polymerase. Preferably, the final Mg2+ concentration is 1.0-10.0 mM.
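As a worked example of reaching a final Mg2+ concentration within the stated 1.0-10.0 mM range, the usual dilution relation C1·V1 = C2·V2 gives the volume of stock to add. The stock concentration and reaction volume below are assumptions chosen for illustration and do not come from the application; `stock_volume_ul` is a hypothetical helper.

```python
def stock_volume_ul(final_mm: float, reaction_ul: float, stock_mm: float) -> float:
    """Volume of Mg2+ stock (in microlitres) needed, from C1*V1 = C2*V2."""
    if not 1.0 <= final_mm <= 10.0:
        raise ValueError("final Mg2+ concentration outside the 1.0-10.0 mM range")
    if stock_mm <= final_mm:
        raise ValueError("stock must be more concentrated than the final concentration")
    return final_mm * reaction_ul / stock_mm


# Assumed values: 25 mM MgCl2 stock, 25 uL reaction, 2.0 mM final Mg2+.
print(stock_volume_ul(final_mm=2.0, reaction_ul=25.0, stock_mm=25.0))  # 2.0
```

The calculation ignores any Mg2+ already present in the PCR buffer, which would need to be subtracted in practice.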
In one or more embodiments, the diagnosing comprises: a score is derived, either by comparison with a control sample or by calculation, and pancreatic cancer is diagnosed based on the score. In one or more embodiments, the calculation is calculated by constructing a support vector machine model.
In yet another aspect, the present invention provides a method for pancreatic cancer screening, comprising:
(1) Detecting the methylation level of a DNA sequence or a fragment thereof or the methylation state or level of one or more CpG dinucleotides in said DNA sequence or fragment thereof in a sample of a subject, said DNA sequence being selected from one or more or all of the following gene sequences: DMRTA2, FOXD3, TBX15, BCAN, TRIM58, SIX3, VAX2, EMX1, LBX2, TLX2, POU3F3, TBR1, EVX2, HOXD12, HOXD8, HOXD4, TOPAZ1, SHOX2, DRD5, RPL9, HOPX, SFRP2, IRX4, TBX18, OLIG3, ULBP1, HOXA13, TBX20, IKZF1, INSIG1, SOX7, EBF2, MOS, MKX, KCNA6, SYT10, AGAP2, TBX3, CCNA1, ZIC2, CLEC14A, OTX2, C14orf39, BNC1, AHSP, ZFHX3, LHX1, TIMP2, ZNF750, SIM2,
(2) Comparing with a control sample, or performing a calculation, to obtain a score,
(3) Diagnosing pancreatic cancer according to the score.
In one or more embodiments, the DNA sequence comprises a sense strand or an antisense strand of DNA.
In one or more embodiments, the fragment is 1-1000bp in length, preferably 1-700bp in length.
In one or more embodiments, the fragment comprises at least one CpG dinucleotide.
In one or more embodiments, the method further comprises DNA extraction and/or quality control prior to step (1).
In one or more embodiments, the DNA sequence is selected from one or more or all of the following sequences or their complements: SEQ ID NO: 1 to SEQ ID NO: 56, or variants having at least 70% identity thereto.
In one or more embodiments, step (1) comprises performing the detecting using a nucleic acid molecule, primer molecule, probe molecule and/or medium as described herein.
In one or more embodiments, the detection includes, but is not limited to: bisulfite conversion based PCR (e.g., methylation specific PCR), DNA sequencing (e.g., bisulfite sequencing, whole genome methylation sequencing, reduced-representation methylation sequencing), methylation sensitive restriction enzyme analysis, fluorometry, methylation sensitive high resolution melting curve analysis, chip-based methylation profile analysis, and mass spectrometry (e.g., time-of-flight mass spectrometry).
In one or more embodiments, the detecting is DNA sequencing. In one or more embodiments, the DNA sequencing has a sequencing depth greater than or equal to 5M, preferably 7M, 11M, 13M, or 15M.
In one or more embodiments, the sample is from a tissue, cell, or bodily fluid of a mammal, such as pancreatic tissue or blood. The mammal is preferably a human. In one or more embodiments, the sample is pancreatic tumor tissue, preferably a fine needle biopsy. In one or more embodiments, the sample is plasma.
In one or more embodiments, the sample comprises genomic DNA or cfDNA.
In one or more embodiments, the DNA sequence is converted, wherein unmethylated cytosines are converted to bases that have a lower binding capacity to guanine than cytosine. The conversion is carried out using an enzymatic method, preferably deaminase treatment, or a non-enzymatic method, preferably treatment with bisulfite or metabisulfite, or a combination thereof.
In one or more embodiments, the DNA sequence is treated with a methylation sensitive restriction endonuclease.
In one or more embodiments, the score in step (2) is calculated by constructing a support vector machine model.
In one or more embodiments, step (3) comprises: when the methylation level of the subject sample is altered compared to the control sample and the methylation level meets a threshold, the subject is identified as having pancreatic cancer.
In one or more embodiments, step (3) comprises: when the score meets a threshold, the subject is identified as having pancreatic cancer.
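The scoring and thresholding steps can be sketched as follows, assuming for simplicity a linear support vector machine that has already been fitted, so that the score is the decision value w·x + b. The weights, bias, threshold and the direction of the comparison are hypothetical; the application does not disclose them in this passage, and a non-linear kernel would need a different scoring function.

```python
from typing import Sequence


def svm_score(levels: Sequence[float], weights: Sequence[float], bias: float) -> float:
    """Decision value of a fitted linear SVM: dot(weights, levels) + bias."""
    if len(levels) != len(weights):
        raise ValueError("one weight is required per marker")
    return sum(w * x for w, x in zip(weights, levels)) + bias


def classify(score: float, threshold: float = 0.0) -> str:
    """Call the sample positive when the score reaches the threshold (assumed direction)."""
    return "positive" if score >= threshold else "negative"


# Hypothetical 7-marker panel: methylation levels, weights and bias are made up.
sample_levels = [0.8, 0.9, 0.7, 0.6, 0.9, 0.8, 0.9]
panel_weights = [1.0] * 7
score = svm_score(sample_levels, panel_weights, bias=-3.5)
print(classify(score))
```

In practice the threshold would be chosen on training data to balance sensitivity and specificity.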
In another aspect, the present invention provides a kit for identifying pancreatic cancer, comprising:
(a) Reagents or devices for determining the methylation level of a DNA sequence or fragment thereof or the methylation state or level of one or more CpG dinucleotides in said DNA sequence or fragment thereof in a sample of a subject, and
optionally (b) a treated nucleic acid molecule of said DNA sequence or fragment thereof, said treatment converting unmethylated cytosine to a base having less ability to bind guanine than cytosine,
wherein the DNA sequence is selected from one or more (e.g., at least 7) or all of the following gene sequences, or sequences within 20kb upstream or downstream thereof: DMRTA2, FOXD3, TBX15, BCAN, TRIM58, SIX3, VAX2, EMX1, LBX2, TLX2, POU3F3, TBR1, EVX2, HOXD12, HOXD8, HOXD4, TOPAZ1, SHOX2, DRD5, RPL9, HOPX, SFRP2, IRX4, TBX18, OLIG3, ULBP1, HOXA13, TBX20, IKZF1, INSIG1, SOX7, EBF2, MOS, MKX, KCNA6, SYT10, AGAP2, TBX3, CCNA1, ZIC2, CLEC14A, OTX2, C14orf39, BNC1, AHSP, ZFHX3, LHX1, TIMP2, ZNF750, SIM2.
In one or more embodiments, the DNA sequence is selected from one or more (e.g., at least 7) or all of the following sequences or complements thereof: SEQ ID NO: 1 to SEQ ID NO: 56, or variants having at least 70% identity thereto.
In one or more embodiments, the kit is suitable for use as described in any of the embodiments herein.
In one or more embodiments, the nucleic acid molecule is a nucleic acid molecule as described in the first aspect herein.
In one or more embodiments, the reagent comprises a primer molecule and/or a probe molecule.
In one or more embodiments, the reagent comprises a primer molecule that hybridizes to the DNA sequence or fragment thereof. The primer molecule can amplify the DNA sequence or fragment thereof. In one or more embodiments, the primer sequence is methylation-specific or non-specific. The primer molecule is at least 9 bp in length.
In one or more embodiments, the reagent is a probe molecule that hybridizes to the DNA sequence or fragment thereof. In one or more embodiments, the probe further comprises a detectable substance. In one or more embodiments, the detectable substance is a 5' fluorescent reporter and a 3' quencher. In one or more embodiments, the fluorescent reporter is selected from Cy5, FAM, and VIC. Preferably, the probe comprises an MGB (minor groove binder) or LNA (locked nucleic acid). The probe molecule is at least 12 bp in length.
In one or more embodiments, the reagent comprises a medium as described in any embodiment herein.
In one or more embodiments, the kit is a non-invasive diagnostic kit.
In one or more embodiments, the subject is a mammal, preferably a human.
In one or more embodiments, the sample is from a tissue, cell, or bodily fluid of a mammal, such as pancreatic tissue or blood. In one or more embodiments, the sample is pancreatic cancer tissue, preferably a fine needle biopsy. In one or more embodiments, the sample is plasma.
In one or more embodiments, the sample comprises genomic DNA or cfDNA.
In one or more embodiments, the DNA sequence is converted, wherein unmethylated cytosines are converted to bases that have less ability to bind guanine than cytosine. The conversion is carried out enzymatically, preferably by deaminase treatment, or non-enzymatically, preferably by treatment with bisulfite or metabisulfite or a combination thereof.
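The effect of the conversion described above can be illustrated in silico. The following is a minimal sketch, assuming the usual bisulfite readout in which an unmethylated cytosine is read as thymine after conversion and amplification, while a methylated cytosine is protected and still read as cytosine. The function name `bisulfite_convert`, the 0-based position convention, and the example sequence are hypothetical and are not taken from the present disclosure.

```python
def bisulfite_convert(seq, methylated_positions=frozenset()):
    """Return the top strand as it would read after conversion and PCR.

    Assumption: unmethylated C is read as T (C -> U -> T), whereas a C
    whose 0-based index is in `methylated_positions` remains C.
    """
    converted = []
    for index, base in enumerate(seq.upper()):
        if base == "C" and index not in methylated_positions:
            converted.append("T")  # unmethylated cytosine is converted
        else:
            converted.append(base)  # methylated C and all other bases are kept
    return "".join(converted)


# Hypothetical 8-base template with CpG cytosines at indices 1 and 5.
template = "ACGTCCGA"
unmethylated_read = bisulfite_convert(template)
methylated_read = bisulfite_convert(template, {1, 5})
```

Comparing the two reads shows why methylation-specific primers or probes can discriminate the states: the converted sequences differ exactly at the CpG cytosines.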
In one or more embodiments, the DNA sequence is treated with a methylation sensitive restriction endonuclease.
In one or more embodiments, the kit further comprises PCR reaction reagents. Preferably, the PCR reaction reagents comprise DNA polymerase, PCR buffer, dNTPs, and Mg²⁺.
In one or more embodiments, the kit further comprises reagents for detecting DNA methylation, the reagents being those used in one or more methods selected from: bisulfite conversion-based PCR (e.g., methylation-specific PCR), DNA sequencing (e.g., bisulfite sequencing, whole-genome methylation sequencing, reduced-representation methylation sequencing), methylation-sensitive restriction enzyme analysis, fluorometry, methylation-sensitive high-resolution melting curve analysis, chip-based methylation profiling, and mass spectrometry (e.g., time-of-flight mass spectrometry). Preferably, the reagent is selected from one or more of: bisulfite and derivatives thereof, methylation-sensitive or methylation-insensitive restriction enzymes, digestion buffer, fluorescent dye, fluorescence quencher, fluorescent reporter, exonuclease, alkaline phosphatase, internal standard, and reference substance.
In another aspect, the present invention provides an apparatus for diagnosing pancreatic cancer, the apparatus comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor executes the program to perform the steps of:
(1) Obtaining the methylation level of a DNA sequence or fragment thereof, or the methylation status or level of one or more CpGs in said DNA sequence or fragment thereof, in a sample from a subject, said DNA sequence being selected from one or more or all of the following gene sequences: DMRTA2, FOXD3, TBX15, BCAN, TRIM58, SIX3, VAX2, EMX1, LBX2, TLX2, POU3F3, TBR1, EVX2, HOXD12, HOXD8, HOXD4, TOPAZ1, SHOX2, DRD5, RPL9, HOPX, SFRP2, IRX4, TBX18, OLIG3, ULBP1, HOXA13, TBX20, IKZF1, INSIG1, SOX7, EBF2, MOS, MKX, KCNA6, SYT10, AGAP2, TBX3, CCNA1, ZIC2, CLEC14A, OTX2, C14orf39, BNC1, AHSP, ZFHX3, LHX1, TIMP2, ZNF750, SIM2,
(2) Comparing with a control sample, or calculating a score, and
(3) Diagnosing pancreatic cancer according to the score.
In one or more embodiments, step (1) is preceded by a step of obtaining DNA, such as DNA extraction and/or quality control.
In one or more embodiments, the DNA sequence is selected from one or more or all of the following sequences or complements thereof: SEQ ID NO:1, SEQ ID NO:2, SEQ ID NO:3, SEQ ID NO:4, SEQ ID NO:5, SEQ ID NO:6, SEQ ID NO:7, SEQ ID NO:8, SEQ ID NO:9, SEQ ID NO:10, SEQ ID NO:11, SEQ ID NO:12, SEQ ID NO:13, SEQ ID NO:14, SEQ ID NO:15, SEQ ID NO:16, SEQ ID NO:17, SEQ ID NO:18, SEQ ID NO:19, SEQ ID NO:20, SEQ ID NO:21, SEQ ID NO:22, SEQ ID NO:23, SEQ ID NO:24, SEQ ID NO:25, SEQ ID NO:26, SEQ ID NO:27, SEQ ID NO:28, SEQ ID NO:29, SEQ ID NO:30, SEQ ID NO:31, SEQ ID NO:32, SEQ ID NO:33, SEQ ID NO:34, SEQ ID NO:35, SEQ ID NO:36, SEQ ID NO:37, SEQ ID NO:38, SEQ ID NO:39, SEQ ID NO:40, SEQ ID NO:41, SEQ ID NO:42, SEQ ID NO:43, SEQ ID NO:44, SEQ ID NO:45, SEQ ID NO:46, SEQ ID NO:47, SEQ ID NO:48, SEQ ID NO:49, SEQ ID NO:50, SEQ ID NO:51, SEQ ID NO:52, SEQ ID NO:53, SEQ ID NO:54, SEQ ID NO:55, SEQ ID NO:56, or a variant having at least 70% identity thereto.
In one or more embodiments, step (1) comprises detecting the methylation level of the sequence in the sample using a nucleic acid molecule, primer molecule, probe molecule and/or medium described herein. In one or more embodiments, the detection includes, but is not limited to: bisulfite conversion-based PCR (e.g., methylation-specific PCR), DNA sequencing (e.g., bisulfite sequencing, whole-genome methylation sequencing, reduced-representation methylation sequencing), methylation-sensitive restriction enzyme analysis, fluorometry, methylation-sensitive high-resolution melting curve analysis, chip-based methylation profiling, and mass spectrometry (e.g., time-of-flight mass spectrometry). In one or more embodiments, the detection is DNA sequencing. Preferably, the sequencing depth of the DNA sequencing is greater than or equal to 5M, preferably 7M, 11M, 13M, or 15M.
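When the detection is sequencing-based, a methylation level is commonly summarized from read counts. The following is a minimal sketch under that assumption: the per-site level is taken as methylated reads divided by total reads covering the CpG, and the level of a sequence is taken as the mean over its covered CpGs. This convention, the function names, and the example counts are illustrative assumptions; the passage above does not fix a particular formula.

```python
def site_methylation_level(methylated_reads, total_reads):
    """Fraction of reads supporting methylation at one CpG, or None if uncovered."""
    if total_reads == 0:
        return None
    return methylated_reads / total_reads


def region_methylation_level(site_counts):
    """Mean of per-site levels over the covered CpGs of one region.

    `site_counts` is a list of (methylated_reads, total_reads) pairs,
    one pair per CpG in the region. Uncovered sites are skipped.
    """
    levels = [site_methylation_level(m, t) for m, t in site_counts]
    covered = [level for level in levels if level is not None]
    if not covered:
        return None
    return sum(covered) / len(covered)


# Hypothetical counts for three CpGs; the third has no coverage.
example_counts = [(8, 10), (3, 10), (0, 0)]
example_level = region_methylation_level(example_counts)
```

Skipping uncovered sites rather than counting them as zero avoids biasing the regional level downward in low-depth samples such as plasma cfDNA.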
In one or more embodiments, the sample is from a tissue, cell, or bodily fluid of a mammal, such as pancreatic tissue or blood. The mammal is preferably a human. In one or more embodiments, the sample is pancreatic tumor tissue, preferably a fine needle biopsy. In one or more embodiments, the sample is plasma.
In one or more embodiments, the sample comprises genomic DNA or cfDNA.
In one or more embodiments, the sequence is converted, wherein unmethylated cytosines are converted to bases that do not bind guanine. The conversion is carried out enzymatically, preferably by deaminase treatment, or non-enzymatically, preferably by treatment with bisulfite or metabisulfite or a combination thereof.
In one or more embodiments, the DNA sequence is treated with a methylation sensitive restriction endonuclease.
In one or more embodiments, the score in step (2) is calculated by constructing a support vector machine model.
In one or more embodiments, step (3) comprises: when the methylation level of the subject sample is altered compared to the control sample and meets a threshold, identifying the subject as having pancreatic cancer.
In one or more embodiments, step (3) comprises: when the score meets a threshold, the subject is identified as having pancreatic cancer.
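Steps (2) and (3) can be sketched as a scoring function followed by a threshold test. The example below assumes a linear support vector machine whose decision margin is a weighted sum of methylation levels plus a bias, mapped to a value between 0 and 1 by a logistic function. The weights, bias, threshold of 0.5, the logistic mapping, and the reading of "meets a threshold" as "greater than or equal to" are all hypothetical choices made for illustration; a real model would be trained on labeled samples.

```python
import math


def svm_score(levels, weights, bias):
    """Map a linear SVM margin to a score in (0, 1).

    `levels` and `weights` are dicts keyed by marker name. The logistic
    squashing is an illustrative assumption, not a disclosed step.
    """
    margin = sum(weights[name] * levels[name] for name in weights) + bias
    return 1.0 / (1.0 + math.exp(-margin))


def meets_threshold(score, threshold=0.5):
    """Return True when the score is at or above the (hypothetical) threshold."""
    return score >= threshold


# Hypothetical two-marker model and one hypothetical sample.
weights = {"SEQ ID NO:9": 2.0, "SEQ ID NO:13": 1.0}
bias = -1.0
sample_levels = {"SEQ ID NO:9": 0.8, "SEQ ID NO:13": 0.4}

sample_score = svm_score(sample_levels, weights, bias)
sample_positive = meets_threshold(sample_score)
```

In practice the threshold would be chosen from the ROC curve of a training set to balance sensitivity and specificity.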
Drawings
FIG. 1 is a flow chart of the technical solution according to an embodiment of the present invention.
FIG. 2 is the ROC curve of the pancreatic cancer prediction model (Model CN) for diagnosing pancreatic cancer in the test group.
FIG. 3 is the distribution of the prediction scores of the pancreatic cancer prediction model (Model CN) in each group.
FIG. 4 shows the methylation levels of the 56 sequences SEQ ID NOs: 1-56 in the training group.
FIG. 5 shows the methylation levels of the 56 sequences SEQ ID NOs: 1-56 in the test group.
FIG. 6 shows the classification ROC curves obtained using CA19-9 alone, the SVM model (Model CN) constructed in Example 2 alone, and the model constructed in Example 2 in combination with CA19-9.
FIG. 7 shows the distribution of classification prediction scores obtained using CA19-9 alone, the SVM model (Model CN) constructed in Example 2 alone, and the model constructed in Example 2 in combination with CA19-9.
FIG. 8 is the ROC curve of the SVM model (Model CN) constructed in Example 2 in samples judged negative by the tumor marker CA19-9 (CA19-9 measurements less than 37).
FIG. 9 is the ROC curve of the seven-marker combination model of SEQ ID NOs: 9, 14, 13, 26, 40, 43, 52.
FIG. 10 is the ROC curve of the seven-marker combination model of SEQ ID NOs: 5, 18, 34, 40, 43, 45, 46.
FIG. 11 is the ROC curve of the seven-marker combination model of SEQ ID NOs: 11, 8, 20, 44, 48, 51, 54.
FIG. 12 is the ROC curve of the seven-marker combination model of SEQ ID NOs: 14, 8, 26, 24, 31, 40, 46.
FIG. 13 is the ROC curve of the seven-marker combination model of SEQ ID NOs: 3, 9, 8, 29, 42, 40, 41.
FIG. 14 is the ROC curve of the seven-marker combination model of SEQ ID NOs: 5, 8, 19, 7, 44, 47, 53.
FIG. 15 is the ROC curve of the seven-marker combination model of SEQ ID NOs: 12, 17, 24, 28, 40, 42, 47.
FIG. 16 is the ROC curve of the seven-marker combination model of SEQ ID NOs: 5, 18, 14, 10, 8, 19, 27.
FIG. 17 is the ROC curve of the seven-marker combination model of SEQ ID NOs: 6, 12, 20, 26, 24, 47, 50.
FIG. 18 is the ROC curve of the seven-marker combination model of SEQ ID NOs: 1, 19, 27, 34, 37, 46, 47.
Detailed Description
The present invention explores the relationship between methylation and pancreatic cancer. It aims to improve the accuracy of non-invasive diagnosis of pancreatic cancer by using the methylation levels of related genes as markers for identifying pancreatic cancer.
The inventors have found that a property of pancreatic cancer is associated with the methylation level of 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, or 50 genes, or sequences within 20 kb upstream or downstream thereof, selected from the following: DMRTA2, FOXD3, TBX15, BCAN, TRIM58, SIX3, VAX2, EMX1, LBX2, TLX2, POU3F3, TBR1, EVX2, HOXD12, HOXD8, HOXD4, TOPAZ1, SHOX2, DRD5, RPL9, HOPX, SFRP2, IRX4, TBX18, OLIG3, ULBP1, HOXA13, TBX20, IKZF1, INSIG1, SOX7, EBF2, MOS, MKX, KCNA6, SYT10, AGAP2, TBX3, CCNA1, ZIC2, CLEC14A, OTX2, C14orf39, BNC1, AHSP, ZFHX3, LHX1, TIMP2, ZNF750, SIM2. In one or more embodiments, the property of pancreatic cancer is associated with the methylation level of genes selected from any one of the following groups: (1) LBX2, TBR1, EVX2, SFRP2, SYT10, CCNA1, ZFHX3; (2) TRIM58, HOXD4, INSIG1, SYT10, CCNA1, ZIC2, CLEC14A; (3) EMX1, POU3F3, TOPAZ1, ZIC2, OTX2, AHSP, TIMP2; (4) EMX1, EVX2, RPL9, SFRP2, HOXA13, SYT10, CLEC14A; (5) TBX15, EMX1, LBX2, OLIG3, SYT10, AGAP2, TBX3; (6) TRIM58, VAX2, EMX1, HOXD4, ZIC2, CLEC14A, LHX1; (7) POU3F3, HOXD8, RPL9, TBX18, SYT10, TBX3, CLEC14A; (8) TRIM58, EMX1, TLX2, EVX2, HOXD4, IRX4; (9) SIX3, POU3F3, TOPAZ1, RPL9, SFRP2, CLEC14A, BNC1; (10) DMRTA2, HOXD4, IRX4, INSIG1, MOS, CLEC14A. The invention provides a nucleic acid molecule containing one or more CpGs of the above genes or fragments thereof.
Herein, the term "gene" includes both the coding and non-coding sequences of the gene in question on the genome. The non-coding sequences include introns, promoters, regulatory elements or sequences, and the like.
Further, the property of pancreatic cancer is associated with the methylation level of any 1 segment, any 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, or 55 segments, or all 56 segments selected from the following: SEQ ID NO:1 in the DMRTA2 gene region, SEQ ID NO:2 in the FOXD3 gene region, SEQ ID NO:3 in the TBX15 gene region, SEQ ID NO:4 in the BCAN gene region, SEQ ID NO:5 in the TRIM58 gene region, SEQ ID NO:6 in the SIX3 gene region, SEQ ID NO:7 in the VAX2 gene region, SEQ ID NO:8 in the EMX1 gene region, SEQ ID NO:9 in the LBX2 gene region, SEQ ID NO:10 in the TLX2 gene region, SEQ ID NO:11 and SEQ ID NO:12 in the POU3F3 gene region, SEQ ID NO:13 in the TBR1 gene region, SEQ ID NO:14 and SEQ ID NO:15 in the EVX2 gene region, SEQ ID NO:16 in the HOXD12 gene region, SEQ ID NO:17 in the HOXD8 gene region, SEQ ID NO:18 and SEQ ID NO:19 in the HOXD4 gene region, SEQ ID NO:20 in the TOPAZ1 gene region, SEQ ID NO:21 in the SHOX2 gene region, SEQ ID NO:22 in the DRD5 gene region, SEQ ID NO:23 and SEQ ID NO:24 in the RPL9 gene region, SEQ ID NO:25 in the HOPX gene region, SEQ ID NO:26 in the SFRP2 gene region, SEQ ID NO:27 in the IRX4 gene region, SEQ ID NO:28 in the TBX18 gene region, SEQ ID NO:29 in the OLIG3 gene region, SEQ ID NO:30 in the ULBP1 gene region, SEQ ID NO:31 in the HOXA13 gene region, SEQ ID NO:32 in the TBX20 gene region, SEQ ID NO:33 in the IKZF1 gene region, SEQ ID NO:34 in the INSIG1 gene region, SEQ ID NO:35 in the SOX7 gene region, SEQ ID NO:36 in the EBF2 gene region, SEQ ID NO:37 in the MOS gene region, SEQ ID NO:38 in the MKX gene region, SEQ ID NO:39 in the KCNA6 gene region, SEQ ID NO:40 in the SYT10 gene region, SEQ ID NO:41 in the AGAP2 gene region, SEQ ID NO:42 in the TBX3 gene region, SEQ ID NO:43 in the CCNA1 gene region, SEQ ID NO:44 and SEQ ID NO:45 in the ZIC2 gene region, SEQ ID NO:46 and SEQ ID NO:47 in the CLEC14A gene region, SEQ ID NO:48 in the OTX2 gene region, SEQ ID NO:49 in the C14orf39 gene region, SEQ ID NO:50 in the BNC1 gene region, SEQ ID NO:51 in the AHSP gene region, SEQ ID NO:52 in the ZFHX3 gene region, SEQ ID NO:53 in the LHX1 gene region, SEQ ID NO:54 in the TIMP2 gene region, SEQ ID NO:55 in the ZNF750 gene region, and SEQ ID NO:56 in the SIM2 gene region.
In certain embodiments, the property of pancreatic cancer is associated with the methylation level of sequences selected from any one of the following groups, or complements thereof: (1) SEQ ID NO:9, SEQ ID NO:13, SEQ ID NO:14, SEQ ID NO:26, SEQ ID NO:40, SEQ ID NO:43, SEQ ID NO:52; (2) SEQ ID NO:5, SEQ ID NO:18, SEQ ID NO:34, SEQ ID NO:40, SEQ ID NO:43, SEQ ID NO:45, SEQ ID NO:46; (3) SEQ ID NO:8, SEQ ID NO:11, SEQ ID NO:20, SEQ ID NO:44, SEQ ID NO:48, SEQ ID NO:51, SEQ ID NO:54; (4) SEQ ID NO:8, SEQ ID NO:14, SEQ ID NO:24, SEQ ID NO:26, SEQ ID NO:31, SEQ ID NO:40, SEQ ID NO:46; (5) SEQ ID NO:3, SEQ ID NO:8, SEQ ID NO:9, SEQ ID NO:29, SEQ ID NO:40, SEQ ID NO:41, SEQ ID NO:42; (6) SEQ ID NO:5, SEQ ID NO:7, SEQ ID NO:8, SEQ ID NO:19, SEQ ID NO:44, SEQ ID NO:47, SEQ ID NO:53; (7) SEQ ID NO:12, SEQ ID NO:17, SEQ ID NO:24, SEQ ID NO:28, SEQ ID NO:40, SEQ ID NO:42, SEQ ID NO:47; (8) SEQ ID NO:5, SEQ ID NO:8, SEQ ID NO:10, SEQ ID NO:14, SEQ ID NO:18, SEQ ID NO:19, SEQ ID NO:27; (9) SEQ ID NO:6, SEQ ID NO:12, SEQ ID NO:20, SEQ ID NO:24, SEQ ID NO:26, SEQ ID NO:47, SEQ ID NO:50; (10) SEQ ID NO:1, SEQ ID NO:19, SEQ ID NO:27, SEQ ID NO:34, SEQ ID NO:37, SEQ ID NO:46, SEQ ID NO:47.
The "pancreatic cancer-associated sequence" as used herein includes the above 50 genes, sequences within 20kb upstream or downstream thereof, the above 56 sequences (SEQ ID NOS: 1-56), or their complements.
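The phrase "within 20 kb upstream or downstream" defines a genomic window around each gene. A minimal sketch of that window test follows, assuming 1-based inclusive coordinates on a single chromosome. The function names and the gene coordinates used in the example are hypothetical and do not correspond to any gene named herein.

```python
FLANK_BP = 20_000  # 20 kb on each side of the gene


def associated_window(gene_start, gene_end, flank=FLANK_BP):
    """Return (start, end) of the gene body extended by `flank` on both sides.

    Coordinates are assumed to be 1-based and inclusive; the start is
    clamped so that it never falls below position 1.
    """
    return max(1, gene_start - flank), gene_end + flank


def in_associated_window(position, gene_start, gene_end, flank=FLANK_BP):
    """True if `position` lies in the gene or within `flank` bp of it."""
    window_start, window_end = associated_window(gene_start, gene_end, flank)
    return window_start <= position <= window_end


# Hypothetical gene spanning positions 100,000 to 105,000.
example_window = associated_window(100_000, 105_000)
inside = in_associated_window(124_999, 100_000, 105_000)
outside = in_associated_window(125_001, 100_000, 105_000)
```

The same test applies regardless of strand, because the window extends symmetrically in both directions.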
The positions of the above 56 sequences in the human chromosome are as follows: SEQ ID nos. 1-50887bps, 12-2492414541697-SEQ ID NOs. Herein, the base number of each sequence and methylation site corresponds to the reference genome HG19.
In one or more embodiments, the nucleic acid molecule described herein is a fragment of one or more genes selected from DMRTA2, FOXD3, TBX15, BCAN, TRIM58, SIX3, VAX2, EMX1, LBX2, TLX2, POU3F3, TBR1, EVX2, HOXD12, HOXD8, HOXD4, TOPAZ1, SHOX2, DRD5, RPL9, HOPX, SFRP2, IRX4, TBX18, OLIG3, ULBP1, HOXA13, TBX20, IKZF1, INSIG1, SOX7, EBF2, MOS, MKX, KCNA6, SYT10, AGAP2, TBX3, CCNA1, ZIC2, CLEC14A, OTX2, C14orf39, BNC1, AHSP, ZFHX3, LHX1, TIMP2, ZNF750, SIM2; the length of the fragment is 1 bp to 1 kb, preferably 1 bp to 700 bp; the fragment comprises one or more methylation sites in the chromosomal region of the corresponding gene. Methylation sites in the genes described herein, or fragments thereof, include, but are not limited to: 50884514,50884531,50884533,50884541,50884544,50884547,50884550,50884552,50884566,50884582,50884586,50884589,50884591,50884598,50884606,50884610,50884612,50884615,50884621,50884633,50884646,50884649,50884658,50884662,50884673,50884682,50884691,50884699,50884702,50884724,50884732,50884735,50884742,50884751,50884754,50884774,50884777,50884780,50884783,50884786,50884789,50884792,50884795,50884798,50884801,50884804,50884807,50884809,50884820,50884822,50884825,50884849,50884852,50884868,50884871,50884885,50884889,50884902,50884924,50884939,50884942,50884945,50884948,50884975,50884980,50884983,50884999,50885001,63788628,63788660,63788672,63788685,63788689,63788703,63788706,63788709,63788721,63788741,63788744,63788747,63788753,63788759,63788768,63788776,63788785,63788789,63788795,63788804,63788816,63788822,63788825,63788828,63788849,63788852,63788861,63788870,63788872,63788878,63788881,63788889,63788897,63788902,63788906,63788917,63788920,63788933,63788947,63788983,63788987,63788993,63788999,63789004,63789011,63789014,63789020,63789022,63789025,63789031,63789035,63789047,63789056,63789059,63789068,63789071,63789073,63789077,63789080,63789083,63789092,63789094,63789101,63789106,63789109,63789124,119522172,119522188,119522190,119522233,119522239,119522313,119522368,
119522386,119522393,119522409,119522425,119522427,119522436,119522440,119522444,119522446,119522449,119522451,119522456,119522459,119522464,119522469,119522474,119522486,119522488,119522500,119522502,119522516,119522529,119522537,119522548,119522550,119522559,119522563,119522566,119522571,119522577,119522579,119522582,119522594,119522599,119522607,119522615,119522621,119522629,119522631,119522637,119522665,119522673,156611713,156611720,156611733,156611737,156611749,156611752,156611761,156611767,156611784,156611791,156611797,156611802,156611811,156611813,156611819,156611830,156611836,156611842,156611851,156611862,156611890,156611893,156611902,156611905,156611915,156611926,156611945,156611949,156611951,156611960,156611963,156611994,156612002,156612015,156612024,156612034,156612042,156612044,156612079,156612087,156612090,156612094,156612097,156612105,156612140,156612147,156612166,156612188,156612191,156612204,156612209,248020399,248020410,248020436,248020447,248020450,248020453,248020470,248020495,248020497,248020507,248020512,248020516,248020520,248020526,248020536,248020543,248020559,248020562,248020566,248020573,248020579,248020581,248020589,248020591,248020598,248020625,248020632,248020641,248020671,248020680,248020688,248020692,248020695,248020697,248020704,248020707,248020713,248020721,248020729,248020741,248020748,248020756,248020765,248020775,248020791,248020795,248020798,248020812,248020814,248020821,248020826,248020828,248020831,248020836,248020838,248020840,248020845,248020848,248020861,248020869,248020878,248020883,248020886,248020902,248020905,248020908,248020914,248020925,248020930,248020934,248020937,248020940,248020953,248020956,248020975 of chr1 chromosome; 
45028802,45028816,45028832,45028839,45028956,45028961,45028965,45028973,45029004,45029017,45029035,45029046,45029057,45029060,45029063,45029065,45029071,45029106,45029112,45029117,45029128,45029146,45029176,45029179,45029184,45029189,45029192,45029195,45029218,45029226,45029228,45029231,45029235,45029263,45029273,45029285,45029288,45029295,45029307,45029317,45029353,45029357,71115760,71115787,71115789,71115837,71115928,71115936,71115948,71115962,71115968,71115978,71115981,71115983,71115985,71115987,71115994,71116000,71116022,71116024,71116030,71116036,71116047,71116054,71116067,71116096,71116101,71116103,71116107,71116117,71116119,71116130,71116137,71116141,71116152,71116154,71116158,71116174,71116188,71116190,71116194,71116203,71116215,71116226,71116233,71116242,71116257,71116259,71116261,71116268,71116271,73147340,73147350,73147364,73147369,73147382,73147405,73147408,73147432,73147438,73147444,73147481,73147491,73147493,73147523,73147529,73147537,73147559,73147571,73147582,73147584,73147592,73147595,73147598,73147607,73147613,73147620,73147623,73147631,73147644,73147668,73147673,73147678,73147687,73147690,73147693,73147695,73147710,73147720,73147738,73147755,73147767,73147771,73147789,73147798,73147803,73147811,73147814,73147816,73147822,73147825,73147827,73147829,74726438,74726440,74726449,74726478,74726480,74726482,74726484,74726493,74726495,74726524,74726526,74726533,74726536,74726539,74726548,74726554,74726569,74726572,74726585,74726597,74726599,74726616,74726633,74726642,74726649,74726651,74726656,74726668,74726672,74726682,74726687,74726695,74726700,74726710,74726716,74726734,74726746,74726760,74726766,74726772,74726784,74726791,74726809,74726828,74726833,74726835,74726861,74726892,74726894,74726908,74742879,74742882,74742891,74742913,74742922,74742925,74742942,74742950,74742953,74742967,74742981,74742984,74742996,74743004,74743006,74743009,74743011,74743015,74743021,74743035,74743056,74743059,74743061,74743064,74743068,74743073,74743082,74743084,74743101,74
743108,74743111,74743119,74743121,74743127,74743131,74743137,74743139,74743141,74743146,74743172,74743174,74743182,74743186,74743191,74743195,74743198,74743207,74743231,74743234,74743241,74743243,74743268,74743295,74743301,74743306,74743318,74743321,74743325,74743329,74743333,74743336,74743343,74743346,74743352,74743357,105480130,105480161,105480179,105480198,105480207,105480210,105480212,105480226,105480254,105480258,105480272,105480291,105480337,105480360,105480377,105480383,105480387,105480390,105480407,105480409,105480412,105480424,105480426,105480429,105480433,105480438,105480461,105480464,105480475,105480481,105480488,105480490,105480503,105480546,105480556,105480571,105480577,105480581,105480604,105480621,105480623,105480630,105480634,105480637,162280237,162280239,162280242,162280245,162280249,162280257,162280263,162280289,162280293,162280297,162280306,162280309,162280314,162280317,162280327,162280331,162280341,162280351,162280362,162280368,162280393,162280396,162280398,162280402,162280405,162280407,162280409,162280417,162280420,162280438,162280447,162280459,162280462,162280466,162280470,162280473,162280479,162280483,162280486,162280489,162280492,162280498,162280519,162280534,162280539,162280548,162280561,162280570,162280575,162280585,162280598,162280604,162280611,162280614,162280618,162280623,162280627,162280633,162280641,162280647,162280657,162280673,162280681,162280693,162280708,162280728,176945102,176945119,176945122,176945132,176945134,176945137,176945141,176945144,176945147,176945150,176945159,176945165,176945170,176945177,176945179,176945186,176945188,176945198,176945200,176945213,176945215,176945218,176945222,176945224,176945250,176945270,176945274,176945288,176945296,176945298,176945316,176945329,176945336,176945339,176945345,176945347,176945351,176945354,176945356,176945372,176945374,176945378,176945381,176945384,176945387,176945392,176945398,176945402,176945417,176945422,176945426,176945452,176945458,176945462,176945464,176945468,176945497,17694550
7,176945526,176945532,176945547,176945550,176945570,176945580,176945582,176945585,176945604,176945609,176945647,176945679,176945695,176945732,176945747,176945750,176945761,176945770,176945789,176945791,176945795,176964640,176964642,176964663,176964665,176964667,176964670,176964672,176964685,176964690,176964694,176964703,176964709,176964711,176964720,176964724,176964736,176964739,176964747,176964769,176964778,176964805,176964811,176964834,176964838,176964843,176964847,176964863,176964865,176964869,176964875,176964879,176964886,176964892,176964930,176964946,176964959,176964966,176964969,176964978,176965003,176965021,176965035,176965062,176965065,176965069,176965085,176965099,176965102,176965109,176965125,176965130,176965140,176965186,176965196,176994516,176994525,176994528,176994531,176994537,176994546,176994557,176994559,176994568,176994570,176994583,176994586,176994623,176994637,176994654,176994661,176994665,176994682,176994688,176994728,176994738,176994747,176994750,176994753,176994764,176994768,176994773,176994778,176994780,176994783,176994793,176994801,176994804,176994807,176994809,176994811,176994822,176994830,176994832,176994837,176994839,176994848,176994851,176994853,176994859,176994864,176994867,176994871,176994880,176994890,176994905,176994909,176994911,176994931,176994934,176994936,176994938,176994942,176994944,176994948,176994952,176994961,176994964,176994971,176994974,176994980,176994983,176994986,176994996,176995011,176995013,177017050,177017079,177017124,177017173,177017179,177017182,177017193,177017211,177017223,177017225,177017227,177017237,177017239,177017246,177017251,177017253,177017267,177017270,177017276,177017296,177017300,177017331,177017352,177017368,177017374,177017378,177017389,177017446,177017449,177017452,177017463,177017483,177017488,177024359,177024367,177024415,177024502,177024514,177024528,177024531,177024540,177024548,177024550,177024558,177024582,177024605,177024616,177024619,177024634,177024642,177024655,177024698,177024709,17702471
4,177024723,177024725,177024748,177024756,177024769,177024771,177024776,177024783,177024800,177024836,177024838,177024856,177024861 of chr2 chromosome; 44063356,44063391,44063404,44063411,44063417,44063423,44063450,44063516,44063541,44063544,44063559,44063565,44063567,44063574,44063586,44063593,44063602,44063606,44063620,44063633,44063638,44063643,44063649,44063657,44063660,44063662,44063682,44063686,44063719,44063745,44063756,44063768,44063779,44063807,44063821,44063832,44063836,44063858,44063877,157812071,157812085,157812092,157812117,157812131,157812152,157812170,157812173,157812175,157812184,157812206,157812212,157812226,157812256,157812259,157812275,157812277,157812287,157812294,157812296,157812302,157812305,157812307,157812312,157812319,157812321,157812329,157812331,157812334,157812354,157812358,157812369,157812380,157812383,157812385,157812404,157812411,157812414,157812420,157812437,157812442,157812457,157812468,157812470,157812475,157812498,157812542,157812548 of chr3 chromosome; 
9783036,9783050,9783059,9783075,9783080,9783097,9783105,9783112,9783120,9783126,9783142,9783144,9783153,9783160,9783166,9783185,9783192,9783196,9783198,9783206,9783213,9783218,9783220,9783233,9783244,9783246,9783252,9783271,9783275,9783277,9783304,9783322,9783327,9783342,9783348,9783354,9783358,9783361,9783363,9783376,9783398,9783409,9783425,9783427,9783442,9783449,9783467,9783492,9783494,9783496,9783501,9783508,9783511,39448284,39448302,39448320,39448323,39448340,39448343,39448347,39448365,39448422,39448432,39448453,39448464,39448473,39448478,39448481,39448503,39448516,39448524,39448528,39448549,39448551,39448557,39448562,39448568,39448575,39448577,39448586,39448593,39448613,39448625,39448629,39448633,39448647,39448653,39448662,39448665,39448670,39448683,39448695,39448697,39448729,39448732,39448748,39448757,39448759,39448767,39448773,39448796,39448800,39448809,39448811,39448836,39448845,39448857,39448864,39448869,39448874,57521138,57521209,57521237,57521297,57521304,57521310,57521336,57521348,57521377,57521397,57521411,57521419,57521426,57521442,57521449,57521486,57521506,57521518,57521537,57521545,57521581,57521603,57521622,57521631,57521652,57521657,57521665,57521680,57521687,57521701,57521716,57521725,57521733,154709378,154709414,154709425,154709441,154709492,154709513,154709522,154709540,154709557,154709561,154709576,154709591,154709597,154709607,154709612,154709617,154709633,154709640,154709663,154709675,154709684,154709690,154709697,154709721,154709745,154709756,154709759,154709789,154709812,154709828,154709834 of chr4 chromosome; 1876139,1876168,1876200,1876208,1876213,1876215,1876286,1876290,1876298,1876308,1876311,1876337,1876339,1876347,1876354,1876368,1876372,1876374,1876386,1876395,1876397,1876399,1876403,1876420,1876424,1876432,1876436,1876449,1876456,1876459,1876463,1876483,1876498,1876525,1876527,1876557,1876563,1876570,1876576,1876605,1876630,1876634,1876638 of chr5 chromosome; 
85476921,85476930,85476974,85477014,85477032,85477035,85477070,85477083,85477106,85477124,85477151,85477153,85477166,85477175,85477186,85477217,85477228,85477230,85477236,85477245,85477249,85477251,85477253,85477261,85477283,137814512,137814516,137814523,137814548,137814558,137814561,137814564,137814567,137814620,137814636,137814638,137814642,137814645,137814654,137814666,137814679,137814689,137814695,137814707,137814710,137814717,137814723,137814728,137814744,137814746,137814749,137814768,137814776,137814786,137814788,137814792,137814794,137814803,137814807,137814818,137814824,137814837,137814860,137814920,137814935,137814952,137814957,137814960,137814969,137814971,137814986,137814988,137814995,137815016,137815024,137815030,137815034,137815036,137815040,150285620,150285634,150285641,150285652,150285659,150285661,150285670,150285677,150285688,150285695,150285697,150285706,150285713,150285715,150285724,150285731,150285733,150285742,150285760,150285767,150285769,150285775,150285778,150285788,150285813,150285815,150285826,150285829,150285844,150285860,150285887,150285890,150285892,150285901,150285908,150285910,150285926,150285928,150285937,150285944,150285956,150285963,150285966,150285974,150285981,150285983,150285992,150285999,150286001,150286010,150286017,150286019,150286028,150286035,150286038,150286046,150286055,150286063,150286073,150286082,150286089,150286091 of chr6 chromosome; 
27244531,27244533,27244537,27244555,27244564,27244578,27244603,27244609,27244612,27244619,27244621,27244627,27244631,27244657,27244673,27244702,27244704,27244714,27244723,27244755,27244772,27244780,27244787,27244789,27244798,27244800,27244810,27244833,27244856,27244869,27244874,27244881,27244885,27244887,27244892,27244897,27244907,27244911,27244917,27244920,27244931,27244948,27244951,27244980,27244982,27244986,27245014,27245018,35293441,35293451,35293470,35293479,35293482,35293488,35293492,35293497,35293502,35293506,35293514,35293531,35293537,35293543,35293588,35293590,35293621,35293652,35293656,35293658,35293670,35293676,35293685,35293687,35293690,35293692,35293700,35293717,35293721,35293731,35293747,35293750,35293753,35293759,35293767,35293780,35293783,35293790,35293796,35293809,35293812,35293815,35293821,35293827,35293829,35293834,35293838,35293840,35293847,35293849,35293860,35293863,35293867,35293869,35293879,35293884,35293892,35293940,50343545,50343548,50343552,50343555,50343562,50343566,50343572,50343574,50343577,50343579,50343587,50343603,50343605,50343608,50343611,50343624,50343628,50343630,50343635,50343637,50343639,50343648,50343651,50343654,50343656,50343659,50343663,50343669,50343672,50343674,50343678,50343682,50343693,50343696,50343699,50343702,50343714,50343719,50343725,50343728,50343731,50343736,50343739,50343758,50343765,50343768,50343770,50343785,50343789,50343791,50343805,50343813,50343822,50343824,50343826,50343829,50343831,50343833,50343838,50343847,50343850,50343853,50343858,50343864,50343869,50343872,50343883,50343890,50343897,50343907,50343909,50343914,50343926,50343934,50343939,50343946,50343950,50343959,50343961,50343963,50343969,50343974,50343980,50343990,50344001,50344007,50344011,50344028,50344041,155167320,155167333,155167340,155167343,155167345,155167347,155167350,155167357,155167379,155167382,155167394,155167401,155167423,155167430,155167467,155167478,155167480,155167486,155167499,155167505,155167507,155167511,155167513,155167516,15516
7518,155167528,155167543,155167552,155167555,155167560,155167562,155167568,155167570,155167578,155167602,155167608,155167611,155167617,155167662,155167702,155167707,155167716,155167718,155167739,155167750,155167753,155167757,155167759,155167771,155167773,155167791,155167801,155167803,155167805,155167813,155167819,155167821,155167827 of chr7 chromosome; 10588729,10588742,10588820,10588833,10588841,10588851,10588857,10588865,10588867,10588883,10588888,10588895,10588938,10588942,10588946,10588948,10588951,10588959,10588992,10589003,10589007,10589009,10589016,10589034,10589060,10589062,10589076,10589079,10589093,10589152,10589193,10589206,10589241,25907660,25907702,25907709,25907724,25907747,25907752,25907754,25907757,25907769,25907796,25907800,25907814,25907818,25907821,25907824,25907838,25907848,25907866,25907874,25907880,25907884,25907893,25907898,25907900,25907902,25907906,25907918,25907947,25907976,25908055,25908057,25908064,25908071,25908098,25908101,57069480,57069544,57069569,57069606,57069631,57069648,57069688,57069698,57069709,57069712,57069722,57069735,57069739,57069755,57069764,57069773,57069775,57069784,57069786,57069791,57069793,57069800,57069812,57069816,57069823,57069825,57069827,57069839,57069842,57069847,57069851,57069853,57069884,57069889,57069894,57069907,57069914,57069919,57069931,57069940,57069948,57069958,57069968,57069973,57069978,57070013,57070035,57070038,57070042,57070046,57070066,57070079,57070087,57070091,57070126,57070143 of chr8 chromosome; 
28034412,28034415,28034418,28034442,28034444,28034467,28034469,28034494,28034501,28034505,28034545,28034556,28034559,28034568,28034582,28034591,28034596,28034599,28034605,28034616,28034619,28034622,28034624,28034645,28034651,28034654,28034658,28034669,28034682,28034687,28034697,28034711,28034714,28034727,28034729,28034739,28034741,28034751,28034757,28034760,28034763,28034768,28034787,28034790,28034792,28034794,28034797,28034801,28034816,28034843,28034853,28034856,28034867,28034871,28034873,28034882,28034888,28034892,28034907 of the chr10 chromosome; 4918962,4918966,4918968,4918975,4918982,4919001,4919056,4919065,4919079,4919081,4919086,4919095,4919097,4919118,4919124,4919138,4919145,4919147,4919164,4919170,4919173,4919184,4919191,4919199,4919215,4919230,4919236,4919239,4919242,4919253,4919260,4919281,4919293,4919300,4919303,4919309,4919327,4919331,4919351,4919358,4919376,4919386,4919395,4919401,4919408,4919421,4919424,4919430,4919438,4919453,4919465,4919469,4919475,4919486,33592615,33592629,33592635,33592642,33592659,33592661,33592663,33592674,33592681,33592683,33592692,33592704,33592707,33592709,33592711,33592715,33592720,33592725,33592727,33592744,33592774,33592798,33592803,33592811,33592831,33592848,33592859,33592862,33592865,33592867,33592875,33592882,33592885,33592887,33592891,33592905,33592908,33592913,33592915,33592923,33592931,33592933,33592953,33592955,33592977,33592981,33592986,33592989,33592998,33593004,33593017,33593035,33593049,33593090,33593093,58131100,58131102,58131111,58131133,58131154,58131168,58131175,58131181,58131224,58131242,58131261,58131277,58131300,58131303,58131306,58131309,58131312,58131318,58131321,58131331,58131345,58131348,58131384,58131390,58131404,58131412,58131414,58131426,58131429,58131445,58131453,58131475,58131478,58131487,58131503,58131510,58131523,58131546,58131549,58131553,58131557,58131564,58131571,58131576,58131586,58131605,58131608,58131624,58131642,115124768,115124773,115124782,115124811,115124838,115124853,115124871,115124
874,115124894,115124904,115124924,115124930,115124933,115124935,115124946,115124970,115124973,115124981,115124999,115125013,115125034,115125053,115125060,115125098,115125107,115125114,115125121,115125131,115125141,115125151,115125177,115125192,115125225,115125305,115125335 of chr12 chromosome; 37005452,37005489,37005501,37005520,37005551,37005553,37005557,37005562,37005566,37005570,37005582,37005596,37005608,37005629,37005633,37005635,37005673,37005678,37005686,37005694,37005704,37005706,37005721,37005732,37005738,37005741,37005745,37005773,37005778,37005794,37005801,37005805,37005814,37005816,37005821,37005833,37005835,37005844,37005855,37005857,37005878,37005881,37005883,37005892,37005899,37005909,37005924,37005929,37005934,37005939,37005941,100649486,100649489,100649519,100649538,100649567,100649569,100649577,100649584,100649601,100649603,100649605,100649623,100649625,100649628,100649648,100649671,100649673,100649686,100649689,100649691,100649701,100649705,100649715,100649718,100649721,100649725,100649731,100649734,100649738,100649740,100649745,100649763,100649769,100649777,100649785,100649792,100649800,100649847,100649886,100649912,100649915,100649917,100649941,100649945,100649949,100649965,100649975,100649982,100650005 of chr13 chromosome; 
38724435,38724459,38724473,38724486,38724507,38724511,38724527,38724531,38724534,38724540,38724544,38724546,38724565,38724578,38724586,38724597,38724624,38724627,38724646,38724648,38724650,38724669,38724675,38724680,38724682,38724685,38724726,38724732,38724734,38724746,38724765,38724771,38724780,38724796,38724798,38724806,38724808,38724810,38724821,38724847,38724852,38724858,38724864,38724867,38724873,38724896,38724906,38724929,38724935,38724945,38724978,38724995,38725003,38725005,38725014,38725016,38725023,38725026,38725030,38725034,38725038,38725048,38725058,38725077,38725081,38725088,38725101,57275669,57275674,57275677,57275681,57275683,57275687,57275690,57275706,57275725,57275749,57275752,57275761,57275768,57275772,57275778,57275785,57275821,57275823,57275827,57275829,57275831,57275835,57275852,57275874,57275876,57275885,57275896,57275908,57275912,57275914,57275924,57275956,57275967,57275969,57275971,57275981,57275988,57275993,57275995,57276000,57276031,57276035,57276039,57276057,57276066,57276073,57276090,60952394,60952398,60952405,60952418,60952421,60952425,60952464,60952468,60952482,60952500,60952503,60952505,60952517,60952522,60952544,60952550,60952554,60952593,60952599,60952615,60952618,60952634,60952658,60952683,60952687,60952730,60952738,60952755,60952762,60952781,60952791,60952799,60952827,60952829,60952836,60952839,60952841,60952848,60952855,60952857,60952870,60952876,60952878,60952887,60952896,60952898,60952908,60952919,60952921,60952931 of chr14 chromosome; 
83952068,83952081,83952084,83952087,83952095,83952105,83952108,83952114,83952125,83952135,83952140,83952156,83952160,83952162,83952175,83952178,83952181,83952184,83952188,83952200,83952206,83952209,83952214,83952220,83952225,83952229,83952236,83952238,83952242,83952266,83952285,83952291,83952298,83952309,83952314,83952317,83952345,83952352,83952358,83952360,83952367,83952406,83952411,83952414,83952418,83952420,83952425,83952430,83952453,83952464,83952472,83952486,83952496,83952498,83952500,83952506,83952508,83952527,83952553,83952559,83952566,83952570,83952582,83952592 of chr15 chromosome; 31579976,31580071,31580078,31580081,31580089,31580100,31580110,31580117,31580138,31580150,31580153,31580159,31580165,31580220,31580246,31580254,31580269,31580287,31580296,31580299,31580309,31580311,31580316,31580343,31580424,31580496,31580524,31580560,73096786,73096842,73096889,73096894,73096903,73096914,73096923,73096929,73096934,73096943,73096948,73096966,73096970,73096979,73097000,73097015,73097017,73097019,73097028,73097037,73097045,73097057,73097060,73097066,73097069,73097078,73097080,73097082,73097084,73097108,73097114,73097142,73097156,73097183,73097260,73097267,73097284,73097296,73097301,73097329,73097357,73097364,73097377,73097381,73097387,73097470 of chr16 chromosome; 
35299698,35299703,35299710,35299719,35299729,35299731,35299741,35299746,35299776,35299813,35299816,35299822,35299837,35299850,35299877,35299885,35299913,35299915,35299926,35299928,35299933,35299935,35299944,35299946,35299963,35299966,35299972,35299974,35299990,35299996,35299999,35300006,35300010,35300020,35300027,35300036,35300039,35300044,35300059,35300068,35300074,35300086,35300097,35300109,35300115,35300146,35300151,35300163,35300167,35300172,35300196,35300202,35300214,35300217,35300221,76929645,76929709,76929713,76929742,76929769,76929829,76929873,76929926,76929982,76930043,76930095,76930148,76930169,80846623,80846652,80846683,80846709,80846717,80846730,80846745,80846763,80846794,80846860,80846867,80846886,80846960,80846965,80847079,80847092,80847115,80847128,80847137,80847153,80847158,80847209 of chr17 chromosome; 38081248,38081253,38081300,38081303,38081306,38081321,38081327,38081333,38081341,38081344,38081352,38081354,38081356,38081363,38081394,38081396,38081407,38081421,38081430,38081443,38081454,38081461,38081478,38081480,38081492,38081497,38081499,38081502,38081514,38081517,38081520,38081537,38081557,38081563,38081566,38081577,38081583,38081586,38081606,38081625,38081642,38081665,38081695,38081707,38081719,38081725,38081732 of chr21 chromosome. The base number of the methylation site corresponds to the reference genome HG19.
In one or more embodiments, the nucleic acid molecule is 1 bp to 1000 bp, 1 bp to 900 bp, 1 bp to 800 bp, or 1 bp to 700 bp in length. The length of the nucleic acid molecule can fall within a range between any of the above endpoints.
Herein, methods of detecting DNA methylation are well known in the art, such as bisulfite-conversion-based PCR (e.g., methylation-specific PCR (MSP)), DNA sequencing, whole-genome methylation sequencing, simplified methylation sequencing, methylation-sensitive restriction enzyme analysis, fluorometry, methylation-sensitive high-resolution melting curve analysis, chip-based methylation profile analysis, and mass spectrometry.
Thus, the present invention relates to a reagent for detecting DNA methylation. Reagents used in the above-described methods for detecting DNA methylation are well known in the art. In detection methods involving DNA amplification, the reagents for detecting DNA methylation include primers. The primer sequences are methylation-specific or non-specific. The sequence of the primer may include a non-methylation-specific blocking sequence (Blocker). Blocking sequences may enhance the specificity of methylation detection. The reagent for detecting DNA methylation may further comprise a probe. Typically, the probe is labeled at the 5' end with a fluorescent reporter group and at the 3' end with a quencher group. Illustratively, the sequence of the probe comprises MGB (minor groove binder) or LNA (locked nucleic acid). MGB and LNA are used to increase Tm, increase the specificity of analysis, and increase the flexibility of probe design.
As used herein, a "primer" refers to a nucleic acid molecule with a specific nucleotide sequence that directs synthesis at the initiation of nucleotide polymerization. Primers are typically two artificially synthesized oligonucleotides: one is complementary to one DNA template strand at one end of the target region, and the other is complementary to the other DNA template strand at the other end of the target region; each serves as the starting point for nucleotide polymerization. Primers are usually at least 9 bp long. Primers designed in vitro are widely used in polymerase chain reaction (PCR), qPCR, sequencing, probe synthesis, and the like. Typically, primers are designed to amplify products of 1-2000 bp, 10-1000 bp, 30-900 bp, 40-800 bp, or 50-700 bp, or of at least 150 bp, at least 140 bp, at least 130 bp, or at least 120 bp in length.
The term "variant" or "mutant" as used herein refers to a polynucleotide whose nucleic acid sequence has been altered by insertion, deletion, or substitution of one or more nucleotides compared to a reference sequence, while retaining the ability to hybridize to other nucleic acids. A mutant according to any of the embodiments herein comprises a nucleotide sequence having at least 70%, preferably at least 80%, preferably at least 85%, preferably at least 90%, preferably at least 95%, preferably at least 97% sequence identity to a reference sequence and retaining the biological activity of the reference sequence. Sequence identity between two aligned sequences can be calculated using, for example, BLASTn from NCBI. Mutants also include nucleotide sequences that have one or more mutations (insertions, deletions, or substitutions) relative to the reference sequence while still retaining the biological activity of the reference sequence. "A plurality of mutations" typically refers to 1-10 mutations, such as 1-8, 1-5, or 1-3. The substitution may be between a purine nucleotide and a pyrimidine nucleotide, between purine nucleotides, or between pyrimidine nucleotides. The substitution is preferably a conservative substitution. For example, in the art, conservative substitution with nucleotides of similar properties typically does not alter the stability and function of the polynucleotide. Conservative substitutions are, for example, exchanges between purine nucleotides (A and G) or exchanges between pyrimidine nucleotides (T or U and C). Thus, substitution of one or more sites in the polynucleotides of the invention with residues of the same class will not substantially affect their activity. Furthermore, the methylation sites (e.g., contiguous CG) in the variants of the invention are not mutated.
That is, the method of the present invention detects methylation of a methylatable site in the corresponding sequence, and a mutation may occur in the base of a non-methylatable site. Typically, the methylation sites are consecutive CpG dinucleotides.
As described herein, transformation can occur between bases of DNA or RNA. "transformation", "cytosine transformation" or "CT transformation" as used herein is a process of converting an unmodified cytosine base (C) to a base having a lower binding ability to guanine than cytosine, for example, an uracil base (U), by treating DNA using a non-enzymatic or enzymatic method. Non-enzymatic or enzymatic methods of performing cytosine conversion are well known in the art. Exemplary non-enzymatic methods include treatment with a conversion reagent such as bisulfite, or metabisulfite, for example, calcium bisulfite, sodium bisulfite, potassium bisulfite, ammonium bisulfite, and the like. Illustratively, the enzymatic method includes a deaminase treatment. The transformed DNA is optionally purified. DNA purification methods suitable for use herein are well known in the art.
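The effect of cytosine conversion on a sequence can be illustrated in silico. The following is a minimal sketch, not a description of any wet-lab protocol: the function name, the toy template, and the choice of which cytosine is methylated are all hypothetical and serve only to show that unmodified C is read out as U while a methylated C is left unchanged.

```python
def convert_unmethylated_cytosines(seq, methylated_positions=()):
    """Return seq with every C turned into U, except at methylated 0-based indices."""
    protected = set(methylated_positions)
    return "".join(
        "U" if base == "C" and i not in protected else base
        for i, base in enumerate(seq.upper())
    )


# Hypothetical template: A0 C1 G2 T3 C4 C5 G6 A7
template = "ACGTCCGA"
# Assumption for illustration: only the cytosine at index 1 is methylated.
converted = convert_unmethylated_cytosines(template, methylated_positions=[1])
print(converted)
```

After such a conversion, methylated and unmethylated cytosines differ at the sequence level, which is what the primers, probes, or sequencing described herein detect.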
The present invention also provides a methylation detection kit for diagnosing pancreatic cancer, which comprises the primers and/or probes described herein, for detecting the methylation level of pancreatic cancer-associated sequences discovered by the inventors. The kit may further comprise a nucleic acid molecule as described herein, in particular according to the first aspect, as an internal standard or positive control.
As used herein, "hybridization" refers primarily to the pairing of nucleic acid sequences under stringent conditions. Exemplary stringent conditions are hybridization and washing of the membrane at 65 ℃ in a solution of 0.1× SSPE (or 0.1× SSC) and 0.1% SDS.
In addition to the primers, probes, nucleic acid molecules, the kit contains other reagents required for detecting DNA methylation. Illustratively, other reagents for detecting DNA methylation may comprise one or more of: bisulfite and its derivatives, PCR buffer, polymerase, dNTP, primer, probe, restriction endonuclease sensitive or insensitive to methylation, enzyme digestion buffer, fluorescent dye, fluorescence quencher, fluorescence reporter, exonuclease, alkaline phosphatase, internal standard, and reference substance.
The kit can also include a converted positive standard in which unmethylated cytosines are converted to bases that do not bind guanine. The positive standard may be fully methylated. The kit may further comprise PCR reaction reagents. Preferably, the PCR reaction reagents comprise Taq DNA polymerase, PCR buffer, dNTPs, and Mg2+.
The present invention also provides a method for pancreatic cancer screening, comprising: (1) Detecting the methylation level of a pancreatic cancer-associated sequence described herein in a sample from a subject; (2) Comparing to a control sample, or calculating a score; and (3) identifying pancreatic cancer in the subject based on the score. Typically, the method further comprises, prior to step (1): extraction of sample DNA, quality inspection, and/or conversion of unmethylated cytosines on DNA to bases that do not bind guanine.
In a specific embodiment, step (1) comprises: treating genomic DNA or cfDNA with a conversion reagent to convert unmethylated cytosines to bases with less binding capacity to guanine than cytosines (e.g., uracil); performing PCR amplification using primers suitable for amplifying transformed sequences of pancreatic cancer-associated sequences described herein; the methylation status or level of at least one CpG is determined by the presence or absence of an amplification product, or sequence identification (e.g., probe-based PCR detection identification or DNA sequencing identification).
Alternatively, step (1) may comprise: treating genomic DNA or cfDNA with a methylation-sensitive restriction enzyme; performing PCR amplification using primers suitable for amplifying sequences having at least one CpG of the pancreatic cancer-associated sequences described herein; and determining the methylation status or level of at least one CpG by the presence or absence of an amplification product.
As used herein, "methylation level" covers the relationship among the methylation states of any number of CpGs at any positions in the sequence concerned. The relationship can be the addition or subtraction of methylation state parameters (e.g., 0 or 1) or the result of a mathematical algorithm (e.g., a mean, percentage, fraction, proportion, degree, or calculation using a mathematical model), including but not limited to a methylation level metric, a methylation haplotype ratio, or a methylation haplotype burden. The term "methylation state" indicates the methylation of a particular CpG site, and typically is methylated or unmethylated (e.g., a methylation state parameter of 0 or 1).
In one or more embodiments, the methylation level of a subject sample is increased or decreased compared with a control sample. Pancreatic cancer is identified when the methylation marker level meets a certain threshold. Alternatively, the methylation levels of the genes tested can be analyzed mathematically to obtain a score. For a test sample, the result is judged positive (pancreatic cancer) when the score is greater than the threshold, and negative (no pancreatic cancer) otherwise. Conventional methods of mathematical analysis and of determining thresholds are known in the art. An exemplary method is a mathematical model: for the differential methylation markers, a support vector machine (SVM) model is constructed from the two groups of samples; the accuracy, sensitivity, and specificity of its predictions and the area under the receiver operating characteristic (ROC) curve (AUC) are computed; and the prediction scores of the test-set samples are tallied.
In a preferred embodiment, the model training process is as follows: first, differential methylation sections are obtained from the methylation level of each site and a differential methylation region matrix is constructed (for example, a methylation data matrix can be constructed with samtools from the methylation level data of single CpG dinucleotide positions in the HG19 genome); then SVM model training is carried out.
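The evaluation quantities named above (sensitivity, specificity, and AUC) can be computed from model scores as in the following sketch. The scores and labels are made-up illustrative values, the function names are hypothetical, and the threshold 0.895 is the value stated elsewhere in this document; a score above the threshold is called positive.

```python
def confusion_rates(scores, labels, threshold):
    """Sensitivity and specificity when score > threshold is called positive."""
    tp = sum(s > threshold and y == 1 for s, y in zip(scores, labels))
    fn = sum(s <= threshold and y == 1 for s, y in zip(scores, labels))
    tn = sum(s <= threshold and y == 0 for s, y in zip(scores, labels))
    fp = sum(s > threshold and y == 0 for s, y in zip(scores, labels))
    return tp / (tp + fn), tn / (tn + fp)


def roc_auc(scores, labels):
    """AUC as the fraction of positive/negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical scores; label 1 = pancreatic cancer, 0 = non-pancreatic cancer.
scores = [0.95, 0.91, 0.60, 0.97, 0.20, 0.10, 0.92, 0.35]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

sens, spec = confusion_rates(scores, labels, threshold=0.895)
auc = roc_auc(scores, labels)
print(sens, spec, auc)
```

The pairwise definition of AUC is used here because it needs no plotting or sorting logic; it is equivalent to the area under the ROC curve.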
An exemplary SVM model training process is as follows:
a) Construct the training model. The sklearn software package (0.23.1) for Python (v3.6.9) was used to construct the model for cross-validation training; command line: model = SVR().
b) The SVM model was trained by inputting the data matrix using the sklearn software package (0.23.1), where x_train represents the training-set data matrix and y_train represents the phenotype information of the training set.
Generally, in constructing the model, the pancreatic cancer type may be encoded as 1 and the non-pancreatic cancer type as 0. In the present invention, the threshold is set to 0.895 using the sklearn software package (0.23.1) for Python (v3.6.9). The constructed model ultimately uses 0.895 as the cutoff to distinguish whether a sample is pancreatic cancer.
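Steps a) and b) and the 0.895 cutoff can be sketched as follows. This is an illustration under stated assumptions, not the inventors' script: the text gives only `model = SVR()` and the names x_train and y_train, so the `fit` and `predict` calls are the standard scikit-learn usage assumed here, and the data are synthetic numbers generated for the example rather than methylation measurements.

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)

# Synthetic stand-in for the data matrix: 40 samples x 5 marker features.
y_train = np.array([1] * 20 + [0] * 20)   # 1 = pancreatic cancer, 0 = non-pancreatic cancer
x_train = rng.normal(loc=0.2, scale=0.05, size=(40, 5))
x_train[y_train == 1] += 0.3              # shift the cancer samples so the groups separate

model = SVR()                  # step a): default settings, as in the text
model.fit(x_train, y_train)    # step b): assumed standard scikit-learn training call

scores = model.predict(x_train)
THRESHOLD = 0.895              # cutoff stated in the text
calls = (scores > THRESHOLD).astype(int)   # 1 = called pancreatic cancer
print(scores[:3], calls[:3])
```

Because SVR is a regression model, it outputs a continuous score per sample; the binary call comes only from comparing that score with the cutoff.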
Herein, the sample is from a mammal, preferably a human. The sample may be from any organ (e.g., pancreas), tissue (e.g., epithelial tissue, connective tissue, muscle tissue, and neural tissue), cell (e.g., pancreatic cancer biopsy), or bodily fluid (e.g., blood, plasma, serum, interstitial fluid, urine). In general, any sample is suitable as long as it contains genomic DNA or cfDNA. cfDNA, known as circulating free DNA or cell-free DNA, consists of degraded DNA fragments released into plasma. Illustratively, the sample is a pancreatic cancer biopsy, preferably a fine needle biopsy. Alternatively, the sample is plasma or cfDNA.
Also disclosed herein are methods of obtaining methylation haplotype ratios associated with pancreatic cancer. Taking methylation data obtained by targeted methylation sequencing (MethylTitan) as an example, the process of screening and testing marker sites is as follows: obtain the raw paired-end sequencing reads; merge the reads into single-end reads; remove adapters from the reads; align to the human genome to form a BAM file; extract the CpG methylation status of each read with samtools to form a haplotype file; count the methylation haplotype proportions at the C sites to form a meth file; calculate the MHF (Methylated Haplotype Fraction); filter sites by a coverage of 200 to form a meth.matrix file; filter out sites whose NA fraction exceeds 0.1; divide the samples into a training set and a test set in advance; construct a logistic regression model for each haplotype in the training set and obtain the regression P value of each methylation haplotype fraction; in each amplified region, select the methylation haplotype with the most significant P value to represent the methylation level of that region; model with a support vector machine; generate the training-set result (ROC plot); and validate by using the model to predict the test set.
Specifically, the method for obtaining methylation haplotypes related to pancreatic cancer comprises the following steps: (1) obtaining plasma from subjects with or without pancreatic cancer, extracting cfDNA, and performing library construction and sequencing by the MethylTitan method to obtain sequencing reads; (2) preprocessing the sequencing data, including adapter removal and merging of the data generated by the sequencer; and (3) aligning the preprocessed sequencing data to the HG19 human reference genome and determining the position of each fragment. The data of step (2) can be derived from paired-end 150 bp sequencing on an Illumina platform. Adapter removal in step (2) removes the sequencing adapters at the 5' and 3' ends of both reads of each pair and then removes low-quality bases. Merging in step (2) combines the paired-end reads to restore the original library fragment, so that the sequenced fragments can be aligned well and positioned accurately. Illustratively, the sequencing library is around 180 bp in length, so paired-end 150 bp reads can completely cover the whole library fragment. Step (3) comprises: (a) performing CT and GA conversion on the HG19 reference genome separately to construct two converted reference genomes, and building an alignment index for each; (b) performing CT and GA conversion on the merged sequencing reads in the same way; and (c) aligning the converted reads to the corresponding converted reference genomes and finally combining the alignment results to determine the position of the sequencing data in the reference genome.
In addition, the method for obtaining methylation values associated with pancreatic cancer further comprises: (4) calculating the MHF; (5) constructing a methylation haplotype MHF data matrix; and (6) constructing a logistic regression model for each methylation haplotype according to the sample groupings. Step (4) comprises obtaining the methylation haplotype status and the sequencing depth at each position of the HG19 reference genome from the alignment result obtained in step (3). Step (5) comprises merging the methylation haplotype status and sequencing depth data into a data matrix, in which each data point with a depth of less than 200 is treated as a missing value, and missing values are filled using the K-nearest-neighbor (KNN) method. Step (6) comprises using logistic regression on each position in the matrix to screen for haplotypes whose regression coefficients differ significantly between the two groups.
As used herein, "plurality" refers to any integer. Preferably, the "plurality" in "one or more" can be any integer greater than or equal to 2, including 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 30, 40, 50, 60, or more.
The beneficial effects of the invention are as follows:
Pancreatic cancer can be effectively identified based on the methylated nucleic acid fragment markers. Based on high-throughput methylation sequencing of plasma cfDNA, the invention provides a diagnostic model of the relationship between cfDNA methylation markers and pancreatic cancer; the model offers non-invasive, safe, and convenient detection with high throughput and high specificity. Based on the optimal sequencing quantity obtained by the invention, the detection cost can be effectively controlled while good detection performance is obtained.
Examples
The present invention will be described in further detail with reference to the following drawings and specific examples. In the following examples, experimental methods for which no specific conditions are given were generally carried out under conventional conditions.
Example 1: Screening of differential methylation sites for pancreatic cancer by targeted methylation sequencing
The inventors collected a total of 94 pancreatic cancer blood samples and 80 non-pancreatic cancer blood samples. All enrolled patients signed informed consent. Sample information is shown in the table below.
Figure BDA0003122215630000331
Methylation sequencing data of plasma DNA was obtained by the method of MethylTitan, and methylation classification markers were identified. The process is as follows:
1. Extraction of plasma cfDNA samples
2 ml of whole blood was collected from each patient in a Streck blood collection tube, and plasma was separated by centrifugation promptly (within 3 days). After transport to the laboratory, cfDNA was extracted with the QIAGEN QIAamp Circulating Nucleic Acid Kit according to the manufacturer's instructions.
2. Sequencing and data preprocessing
1) The library was paired-end sequenced using the Illumina Nextseq 500 sequencer.
2) Pear (v0.6.0) software was used to merge the paired-end 150 bp reads of the same fragment, generated on an Illumina Hiseq X10/Nextseq 500/Novaseq sequencer, into one sequence, with a minimum overlap length of 20 bp and a minimum merged length of 30 bp.
3) Adapters were removed from the merged sequencing data using Trim_galore v0.6.0 and cutadapt v1.8.1. The adapter sequence "AGATCGGAAGAGCAC" was removed at the 5' end of the sequence, and bases with sequencing quality values below 20 were removed from both ends.
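The trimming in step 3) can be pictured with the simplified sketch below. It is not how Trim_galore or cutadapt work internally (those tools use error-tolerant adapter matching and a different quality-trimming algorithm); it only removes an exact adapter prefix and then strips bases with quality below 20 from both ends. The function name and the toy read are hypothetical.

```python
ADAPTER = "AGATCGGAAGAGCAC"   # adapter sequence quoted in the text
MIN_QUALITY = 20              # quality cutoff quoted in the text


def trim_read(seq, quals, adapter=ADAPTER, min_quality=MIN_QUALITY):
    """Strip an exact adapter prefix, then drop low-quality bases from both ends."""
    if seq.startswith(adapter):
        seq, quals = seq[len(adapter):], quals[len(adapter):]
    start, end = 0, len(seq)
    while start < end and quals[start] < min_quality:
        start += 1
    while end > start and quals[end - 1] < min_quality:
        end -= 1
    return seq[start:end], quals[start:end]


# Hypothetical read: adapter followed by six bases, with poor quality at both ends.
read = ADAPTER + "TTGCGA"
quals = [30] * len(ADAPTER) + [10, 35, 36, 37, 38, 12]

trimmed_seq, trimmed_quals = trim_read(read, quals)
print(trimmed_seq, trimmed_quals)
```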
3. Sequencing data alignment
The reference genome data used herein were from the UCSC database (UCSC: HG19, http://hgdownload.soe.ucsc.edu/goldenPath/hg19/bigZips/hg19.fa.gz).
1) HG19 was first subjected to cytosine-to-thymine (CT) and guanine-to-adenine (GA) conversion, respectively, using Bismark software, and an index was built for each converted genome using Bowtie2 software.
2) The pre-processed data were also subjected to CT and GA transformation.
3) The transformed sequences were aligned to the transformed HG19 reference genome, respectively, using Bowtie2 software, with a minimum seed sequence length of 20, the seed sequences not allowing for mismatches.
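The idea behind aligning converted reads to converted genomes can be shown on a toy sequence. In this sketch, which is only an illustration, an exact string search stands in for the Bowtie2 alignment, the reference and read are hypothetical, and only the CT-converted strand is searched. After both sequences are fully C-to-T converted they match regardless of methylation, and the methylation call is then made by comparing the original read with the original reference.

```python
def ct_convert(seq):
    """Replace every C with T (full in silico conversion, forward-strand view)."""
    return seq.upper().replace("C", "T")


def ga_convert(seq):
    """Replace every G with A (the reverse-strand counterpart of CT conversion)."""
    return seq.upper().replace("G", "A")


reference = "ACGTTCGGA"   # hypothetical reference: cytosines at indices 1 and 5
read = "ATGTTCGGA"        # hypothetical read: C at index 1 read as T, C at index 5 kept

ct_ref, ga_ref = ct_convert(reference), ga_convert(reference)
ct_read = ct_convert(read)

# Stand-in for the alignment step: exact search in the CT-converted reference.
position = ct_ref.find(ct_read)

# A reference C that is still C in the read is called methylated (True).
calls = {
    position + i: read[i] == "C"
    for i, ref_base in enumerate(reference[position:position + len(read)])
    if ref_base == "C"
}
print(position, calls)
```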
4. Calculation of MHF
For the CpG sites of each target region in HG19, the methylation level of each site was obtained from the alignment results. The nucleotide numbering of the sites herein corresponds to the nucleotide position numbering of HG19. A target methylation region may have multiple methylation haplotypes, and this value needs to be calculated for each methylation haplotype in the target region. An example of the MHF calculation formula is as follows:
Figure BDA0003122215630000341
where i denotes the target methylation interval, h denotes the target methylation haplotype, N_i denotes the number of reads located in the target methylation interval, and N_{i,h} denotes the number of reads containing the target methylation haplotype.
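Read together, these symbol definitions imply that the MHF of haplotype h in interval i is the ratio N_{i,h} / N_i; this reading is inferred from the definitions, since the formula itself is given only as a figure. The sketch below computes that ratio under a simplifying assumption: each read is represented by a single haplotype string covering all CpGs of the interval (1 = methylated, 0 = unmethylated). The function name and read data are hypothetical.

```python
from collections import Counter


def methylated_haplotype_fractions(read_haplotypes):
    """Map each haplotype h observed in one interval i to N_{i,h} / N_i."""
    n_i = len(read_haplotypes)            # N_i: reads located in the interval
    counts = Counter(read_haplotypes)     # N_{i,h} for every observed haplotype h
    return {h: n_ih / n_i for h, n_ih in counts.items()}


# Hypothetical reads covering one interval with three CpG sites.
reads = ["111", "111", "110", "000", "111", "000", "000", "000"]

mhf = methylated_haplotype_fractions(reads)
print(mhf)
```

With this definition the fractions of all haplotypes observed in one interval sum to 1.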
5. Matrix of methylated data
1) The methylation sequencing data of the samples in the training set and in the test set were each merged into a data matrix, and every site with a depth of less than 200 was treated as a missing value.
2) Sites with a missing-value proportion higher than 10% were removed.
3) The remaining missing values in the data matrix were imputed using the KNN algorithm.
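Steps 1) and 2) of the matrix construction can be sketched in plain Python as follows. The function names, site names, and numbers are hypothetical. The KNN imputation of step 3) is not implemented here; the text does not name an implementation, and one possible option would be the `KNNImputer` class of scikit-learn.

```python
MIN_DEPTH = 200               # depth cutoff from the text
MAX_MISSING_FRACTION = 0.10   # sites above this missing fraction are removed


def mask_low_depth(values, depths, min_depth=MIN_DEPTH):
    """Set entries whose sequencing depth is below min_depth to None (missing)."""
    return [
        [v if d >= min_depth else None for v, d in zip(v_row, d_row)]
        for v_row, d_row in zip(values, depths)
    ]


def drop_sparse_sites(matrix, site_names, max_missing=MAX_MISSING_FRACTION):
    """Keep only the columns (sites) whose missing fraction is at most max_missing."""
    n_samples = len(matrix)
    keep = [
        j for j in range(len(site_names))
        if sum(row[j] is None for row in matrix) / n_samples <= max_missing
    ]
    return [[row[j] for j in keep] for row in matrix], [site_names[j] for j in keep]


# Hypothetical matrix: 10 samples (rows) x 3 sites (columns).
values = [[0.5, 0.4, 0.3] for _ in range(10)]
depths = [[300, 300, 300] for _ in range(10)]
depths[0][1] = 150   # siteB: 1 of 10 samples below depth 200 (10% missing, kept)
depths[0][2] = 100   # siteC: 2 of 10 samples below depth 200 (20% missing, removed)
depths[1][2] = 50

masked = mask_low_depth(values, depths)
filtered, kept_sites = drop_sparse_sites(masked, ["siteA", "siteB", "siteC"])
print(kept_sites, filtered[0])
```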
6. Discovery of characteristic methylated segments from training set sample grouping
1) A logistic regression model of the phenotype was constructed for each methylation section, and for each amplified target region the methylation section with the most significant regression coefficient was selected as a candidate methylation section.
2) The training set was randomly divided into ten parts for ten-fold cross-validated incremental feature screening.
3) The candidate methylation sections of the regions were ranked in descending order of regression-coefficient significance; one methylation section was added at a time, and the test data were predicted.
4) Step 3) was carried out on each of the 10 data splits generated in 2), and the final AUC was the average of the 10 runs. A candidate methylation section was retained as a characteristic methylation section if the AUC on the training data increased; otherwise it was discarded.
5) The feature combination corresponding to the median of the average AUC across the different feature numbers in the training set was taken as the final combination of characteristic methylation sections.
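The incremental screening in steps 3) and 4) follows a greedy forward-selection pattern, sketched below. The sketch is an illustration only: `fold_auc` is a hypothetical callback that would train and evaluate a model on one cross-validation fold, the toy scoring function and region names are invented, and the median-based final choice of step 5) is not shown.

```python
from statistics import mean


def incremental_selection(ranked_features, fold_auc, n_folds=10):
    """Add ranked features one at a time; keep one only if the mean fold AUC rises."""
    selected, best_auc = [], 0.0
    for feature in ranked_features:
        trial = selected + [feature]
        auc = mean(fold_auc(trial, fold) for fold in range(n_folds))
        if auc > best_auc:
            selected, best_auc = trial, auc
    return selected, best_auc


# Hypothetical per-feature contributions standing in for real model training.
GAIN = {"region_A": 0.30, "region_B": 0.10, "region_C": 0.0, "region_D": 0.05}


def toy_fold_auc(features, fold):
    """Toy AUC: 0.5 plus the summed gains, capped at 1.0 (the fold index is ignored)."""
    return min(1.0, 0.5 + sum(GAIN[f] for f in features))


ranked = ["region_A", "region_B", "region_C", "region_D"]   # most significant first
selected, auc = incremental_selection(ranked, toy_fold_auc)
print(selected, auc)
```

In this toy run region_C is discarded because adding it does not raise the averaged AUC, which mirrors the retain-or-discard rule of step 4).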
The distribution of the screened characteristic methylated nucleic acid sequences is as follows (gene region, then SEQ ID NO): DMRTA2 SEQ ID NO:1, FOXD3 SEQ ID NO:2, TBX15 SEQ ID NO:3, BCAN SEQ ID NO:4, TRIM58 SEQ ID NO:5, SIX3 SEQ ID NO:6, VAX2 SEQ ID NO:7, EMX1 SEQ ID NO:8, LBX2 SEQ ID NO:9, TLX2 SEQ ID NO:10, POU3F3 SEQ ID NO:11, SEQ ID NO:12, TBR1 SEQ ID NO:13, EVX2 SEQ ID NO:14, SEQ ID NO:15, HOXD12 SEQ ID NO:16, HOXD8 SEQ ID NO:17, HOXD4 SEQ ID NO:18, SEQ ID NO:19, TOPAZ1 SEQ ID NO:20, SHOX2 SEQ ID NO:21, DRD5 SEQ ID NO:22, RPL9 SEQ ID NO:23, SEQ ID NO:24, HOPX SEQ ID NO:25, SFRP2 SEQ ID NO:26, IRX4 SEQ ID NO:27, TBX18 SEQ ID NO:28, OLIG3 SEQ ID NO:29, ULBP1 SEQ ID NO:30, HOXA13 SEQ ID NO:31, TBX20 SEQ ID NO:32, IKZF1 SEQ ID NO:33, INSIG1 SEQ ID NO:34, SOX7 SEQ ID NO:35, EBF2 SEQ ID NO:36, MOS SEQ ID NO:37, MKX SEQ ID NO:38, KCNA6 SEQ ID NO:39, SYT10 SEQ ID NO:40, AGAP2 SEQ ID NO:41, TBX3 SEQ ID NO:42, CCNA1 SEQ ID NO:43, ZIC2 SEQ ID NO:44, SEQ ID NO:45, CLEC14A SEQ ID NO:46, SEQ ID NO:47, OTX2 SEQ ID NO:48, C14orf39 SEQ ID NO:49, BNC1 SEQ ID NO:50, AHSP SEQ ID NO:51, ZFHX3 SEQ ID NO:52, LHX1 SEQ ID NO:53, TIMP2 SEQ ID NO:54, ZNF750 SEQ ID NO:55, SIM2 SEQ ID NO:56. The levels of the above methylation markers were increased or decreased in cfDNA of pancreatic cancer patients (Table 1). The sequences of the 56 marker regions are shown in SEQ ID NO:1-56. The methylation levels of all CpG sites in each marker region can be obtained by MethylTitan sequencing. Both the mean methylation level of all CpG sites in a region and the methylation level of a single CpG site can be used as a marker for diagnosing pancreatic cancer.
Table 1: mean level of methylation markers in the training set
Figure BDA0003122215630000351
Figure BDA0003122215630000361
Figure BDA0003122215630000371
The methylation levels of the methylation markers in the pancreatic cancer and non-pancreatic cancer populations of the test set are shown in Table 2. As the table shows, the distribution of the selected methylation markers differs significantly between people with pancreatic cancer and people without pancreatic cancer, giving a good discriminating effect.
Table 2: methylation level of methylation marker in test set
Figure BDA0003122215630000372
Figure BDA0003122215630000381
Table 3 lists the correlation (Pearson correlation coefficient) between the methylation level of 10 random CpG sites, or combinations of sites, in each selected marker and the methylation level of the whole marker, together with the corresponding significance p value. The methylation level of a single CpG site or of a combination of multiple CpG sites in a marker is significantly correlated (p < 0.05) with the methylation level of the whole region, and the correlation coefficients are all above 0.8, indicating strong or very strong correlation. A single CpG site or a combination of multiple CpG sites in a marker therefore has the same good discrimination effect as the whole marker.
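The comparison described above, between one CpG site and the mean of the whole marker region, can be sketched as follows. This is only an illustration under stated assumptions: the methylation values are invented, and `region_mean`, `pearson`, and the variable names are hypothetical helpers, not part of the described method.

```python
from math import sqrt

def region_mean(cpg_levels):
    """Mean methylation level over all CpG sites of one marker region."""
    return sum(cpg_levels) / len(cpg_levels)

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)

# Hypothetical data: each row is one sample, each column one CpG site
# of the same marker region (values are made up for illustration).
samples = [
    [0.10, 0.12, 0.08],
    [0.40, 0.45, 0.35],
    [0.70, 0.65, 0.75],
    [0.20, 0.25, 0.15],
]

region_levels = [region_mean(row) for row in samples]  # whole-marker level
site_levels = [row[1] for row in samples]              # one single CpG site

r = pearson(site_levels, region_levels)
print(f"Pearson r between the single site and the region mean: {r:.3f}")
```

A coefficient above 0.8 on real data would correspond to the "strong correlation" criterion used for Table 3; the significance p value is not computed in this sketch.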
Table 3: correlation of methylation levels at random CpG sites or at combinations of sites in 56 markers with methylation levels of the entire marker
Figure BDA0003122215630000391
Figure BDA0003122215630000401
Figure BDA0003122215630000411
Figure BDA0003122215630000421
Figure BDA0003122215630000431
Figure BDA0003122215630000441
Figure BDA0003122215630000451
Figure BDA0003122215630000461
Figure BDA0003122215630000471
Figure BDA0003122215630000481
Figure BDA0003122215630000491
Figure BDA0003122215630000501
Figure BDA0003122215630000511
Example 2: Predictive performance of individual methylation markers
To verify how well individual methylation markers discriminate between subjects with and without pancreatic cancer, the methylation level of each marker was used on its own to assess its predictive performance.
First, the methylation level of each of the 56 methylation markers was used separately for training on the training set samples to determine the threshold for distinguishing pancreatic cancer, along with the sensitivity and specificity. This threshold was then used to calculate the sensitivity and specificity on the test set samples. The results are shown in Table 4 below; a single marker can also achieve good discrimination performance.
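The single-marker procedure (fix a threshold on the training set, then reuse it unchanged on the test set) might look like the sketch below. The text does not say how the threshold was chosen, so maximising sensitivity + specificity - 1 is an assumption, as is the direction of the marker (higher methylation in pancreatic cancer). All data and function names are hypothetical.

```python
def sens_spec(levels, labels, threshold):
    """Sensitivity and specificity when 'level > threshold' is called cancer."""
    tp = sum(1 for v, y in zip(levels, labels) if y == 1 and v > threshold)
    fn = sum(1 for v, y in zip(levels, labels) if y == 1 and v <= threshold)
    tn = sum(1 for v, y in zip(levels, labels) if y == 0 and v <= threshold)
    fp = sum(1 for v, y in zip(levels, labels) if y == 0 and v > threshold)
    return tp / (tp + fn), tn / (tn + fp)

def fit_threshold(levels, labels):
    """Assumed rule: pick the observed level that maximises Youden's J."""
    best_threshold, best_j = None, -1.0
    for candidate in sorted(set(levels)):
        sensitivity, specificity = sens_spec(levels, labels, candidate)
        j = sensitivity + specificity - 1
        if j > best_j:
            best_threshold, best_j = candidate, j
    return best_threshold

# Hypothetical methylation levels of one marker (1 = cancer, 0 = non-cancer).
train_levels = [0.05, 0.08, 0.10, 0.30, 0.12, 0.35, 0.40, 0.50]
train_labels = [0, 0, 0, 0, 1, 1, 1, 1]
test_levels = [0.06, 0.09, 0.20, 0.11, 0.45, 0.38]
test_labels = [0, 0, 0, 1, 1, 1]

threshold = fit_threshold(train_levels, train_labels)
test_sensitivity, test_specificity = sens_spec(test_levels, test_labels, threshold)
print(threshold, test_sensitivity, test_specificity)
```

For a marker whose methylation decreases in cancer, the comparison in `sens_spec` would need to be reversed.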
Table 4: predictive performance of the 56 methylation markers
Figure BDA0003122215630000512
Figure BDA0003122215630000521
Figure BDA0003122215630000531
Figure BDA0003122215630000541
Example 3: predictive models for all marker combinations
To validate the potential ability to differentiate pancreatic cancer using methylated nucleic acid fragment markers, a support vector machine disease classification model was constructed based on the 56 methylated nucleic acid fragment markers in the training set, and the classification performance of this marker group was validated in the test set. The samples were divided proportionally into a training set of 117 samples (samples 1-117) and a test set of 57 samples (samples 118-174).
Using the discovered methylation markers, a support vector machine model was constructed in the training set for the two groups of samples.
1) The samples were pre-divided into 2 parts, with 1 part for training the model and 1 part for model testing.
2) SVM model training was performed using methylation marker levels in the training set. The specific training process is as follows:
a) The sklearn software package (0.23.1) of Python (v3.6.9) was used to construct the model in cross-validation training mode, command line: model = SVR().
b) The SVM model was constructed with the sklearn software package (0.23.1) by inputting the methylation value matrix, where x_train represents the methylation value matrix of the training set and y_train represents the phenotype information of the training set.
In constructing the model, the pancreatic cancer sample type was coded as 1 and the non-pancreatic cancer sample type was coded as 0, and the score threshold was set to 0.895 as the default in the model construction procedure using the sklearn software package (0.23.1). The constructed model finally distinguishes whether a sample has pancreatic cancer by taking 0.895 as the score threshold. The model prediction scores for the training set samples are shown in Table 5.
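Steps a) and b) and the 0.895 score threshold can be sketched as below. This is a minimal illustration rather than the pipeline actually used: the methylation matrix is synthetic, the `fit` call is an assumption (the text gives only `model = SVR()`), and `SCORE_THRESHOLD`, `train_scores`, and `train_calls` are hypothetical names.

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)

# Synthetic stand-in for the methylation value matrix:
# rows are samples, columns are the 56 marker regions.
n_per_class, n_markers = 20, 56
x_cancer = rng.normal(0.6, 0.1, size=(n_per_class, n_markers))
x_non_cancer = rng.normal(0.3, 0.1, size=(n_per_class, n_markers))
x_train = np.vstack([x_cancer, x_non_cancer])

# Phenotype coding from the text: pancreatic cancer = 1, non-cancer = 0.
y_train = np.array([1] * n_per_class + [0] * n_per_class)

# Step a): construct the model; step b): fit it on the training matrix.
# The fit call is assumed, it is not quoted in the text.
model = SVR()
model.fit(x_train, y_train)

# Continuous prediction scores, then a binary call at the stated threshold.
SCORE_THRESHOLD = 0.895
train_scores = model.predict(x_train)
train_calls = (train_scores > SCORE_THRESHOLD).astype(int)

print(train_scores[:3], train_calls[:3])
```

Because SVR is a regressor, its output is a continuous score rather than a class label, which is why a separate score threshold is needed to call a sample.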
Table 5: model prediction scores for training sets
Figure BDA0003122215630000542
Figure BDA0003122215630000551
Figure BDA0003122215630000561
Based on the methylated nucleic acid fragment marker group of the present invention, the SVM model established in this example was used to make predictions on the test set. The test set was predicted using the prediction function, which outputs the prediction result (disease probability; the default score threshold is 0.895, and a subject scoring above 0.895 is considered malignant). The test set comprised 57 samples (samples 118-174), and the calculation was as follows:
command line:
test_pred=model.predict(test_df)
where test_pred represents the prediction score obtained for a test set sample from the SVM prediction model constructed in this example, model represents the SVM prediction model constructed in this example, and test_df represents the test set data.
The prediction scores for the test set are shown in Table 6, the ROC curve in FIG. 2, and the prediction score distribution in FIG. 3. The overall AUC of the test set is 0.911. In the training set, the sensitivity of the model reaches 71.4% at a specificity of 90.7%; in the test set, the sensitivity reaches 83.9% at a specificity of 88.5%. The SVM model built from the selected variables therefore discriminates well.
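The AUC, sensitivity, and specificity reported above are all derived from the prediction scores and the true labels. The following sketch shows one way to compute them, assuming made-up scores; `auc` and `sens_spec` are hypothetical helper names, and the pairwise AUC formula is a standard equivalent of the area under the ROC curve, not necessarily the routine used for FIG. 2.

```python
def auc(scores, labels):
    """AUC as the fraction of (cancer, non-cancer) pairs ranked correctly.

    Ties between a cancer and a non-cancer score count as half a pair.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))

def sens_spec(scores, labels, threshold):
    """Sensitivity and specificity when 'score > threshold' is called cancer."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s > threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s <= threshold)
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    return tp / n_pos, tn / n_neg

# Hypothetical prediction scores (1 = pancreatic cancer, 0 = non-cancer).
test_pred = [0.95, 0.91, 0.80, 0.99, 0.20, 0.40, 0.90, 0.10]
test_labels = [1, 1, 1, 1, 0, 0, 0, 0]

test_auc = auc(test_pred, test_labels)
sensitivity, specificity = sens_spec(test_pred, test_labels, threshold=0.895)
print(test_auc, sensitivity, specificity)
```

The AUC does not depend on the threshold, whereas sensitivity and specificity are tied to the chosen cut-off (0.895 in this example).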
FIG. 4 and FIG. 5 show the distribution of the 56 methylated nucleic acid fragment markers in the training set and the test set, respectively. The difference in this marker group between the plasma of pancreatic cancer patients and the plasma of subjects without pancreatic cancer is stable.
Table 6: model predictive scores for test set samples
Figure BDA0003122215630000562
Figure BDA0003122215630000571
Example 4: tumor marker predictive comparison
Based on the methylation marker group of the present invention, the SVM model established in Example 3 was used to make predictions on the test set, and pancreatic cancer prediction was then performed in combination with the CA19-9 marker. A total of 130 samples were used (Table 7). The calculation is as follows:
command line:
from sklearn.preprocessing import RobustScaler
from sklearn.linear_model import LogisticRegression
combine_scalar=RobustScaler().fit(combine_train_df)
scaled_combine_train_df=combine_scalar.transform(combine_train_df)
scaled_combine_test_df=combine_scalar.transform(combine_test_df)
combine_model=LogisticRegression().fit(scaled_combine_train_df,train_ca19_pheno)
where combine_train_df represents the training set data matrix that combines the sample prediction scores obtained from the SVM prediction model constructed in Example 3 with CA19-9, scaled_combine_train_df represents the normalized training set data matrix, scaled_combine_test_df represents the normalized test set data matrix, and combine_model represents the logistic regression model fitted using the normalized training set data matrix.
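A self-contained version of the combination step is sketched below so that the flow from scaling to fitting to scoring can be run end to end. The two-column layout (SVM score, CA19-9 measurement), the synthetic values, the helper `make_set`, and the use of `predict_proba` to obtain a combined score are all assumptions made for illustration; the text itself only gives the scaling and fitting commands.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import RobustScaler

rng = np.random.default_rng(1)

def make_set(n_per_class):
    """Synthetic (SVM score, CA19-9) matrix; cancer rows first, then non-cancer."""
    labels = np.array([1] * n_per_class + [0] * n_per_class)
    svm_score = np.concatenate([
        rng.normal(0.9, 0.15, n_per_class),
        rng.normal(0.4, 0.15, n_per_class),
    ])
    ca19_9 = np.concatenate([
        rng.lognormal(5.0, 1.0, n_per_class),
        rng.lognormal(2.5, 0.8, n_per_class),
    ])
    return np.column_stack([svm_score, ca19_9]), labels

combine_train_df, train_ca19_pheno = make_set(30)
combine_test_df, test_ca19_pheno = make_set(15)

# Fit the scaler on the training set only, then apply it to both sets,
# so no test-set information leaks into the scaling parameters.
combine_scalar = RobustScaler().fit(combine_train_df)
scaled_combine_train_df = combine_scalar.transform(combine_train_df)
scaled_combine_test_df = combine_scalar.transform(combine_test_df)

combine_model = LogisticRegression().fit(scaled_combine_train_df, train_ca19_pheno)

# Assumed scoring step: probability of the class coded 1 (pancreatic cancer).
combine_test_pred = combine_model.predict_proba(scaled_combine_test_df)[:, 1]
print(combine_test_pred[:3])
```

RobustScaler centres each column on its median and scales by the interquartile range, which limits the influence of very large CA19-9 measurements on the fitted model.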
The prediction scores of the samples are shown in Table 7, the ROC curve in FIG. 6, and the distribution of the prediction scores in FIG. 7. The overall AUC of the test set is 0.935. The figure shows that the established logistic regression models all discriminate well.
FIG. 7 shows the distribution of classification prediction scores obtained using CA19-9 alone, the SVM model constructed in Example 3 alone, and the model of Example 3 combined with CA19-9, respectively; the present method is more stable in identifying pancreatic cancer.
Table 7: CA19-9 predictive score and model-merged CA19-9 predictive score
Figure BDA0003122215630000572
Figure BDA0003122215630000581
Figure BDA0003122215630000591
Figure BDA0003122215630000601
Example 5: performance of classification prediction model in traditional marker negative samples
Based on the methylation marker group of the present invention, the SVM model established in Example 3 was tested on samples judged negative by the conventional tumor marker CA19-9 (CA19-9 measurement < 37).
The CA19-9 measurements and model prediction scores of the relevant samples are shown in Table 8, and the ROC curve is shown in FIG. 8. Again using 0.895 as the score threshold, the AUC in the test set reached 0.885. The SVM model constructed in Example 3 can therefore still achieve good results for patients who cannot be identified by CA19-9.
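The selection of CA19-9-negative samples and the check of how many cancers the SVM score still identifies can be sketched as follows. The cut-off of 37 and the score threshold of 0.895 come from the text; the records and variable names are hypothetical.

```python
CA19_9_CUTOFF = 37       # a measurement below this counts as CA19-9 negative
SCORE_THRESHOLD = 0.895  # SVM score threshold stated in the text

# Hypothetical records: (CA19-9 measurement, SVM prediction score, label),
# with label 1 = pancreatic cancer and 0 = non-cancer.
records = [
    (12.0, 0.97, 1),
    (30.5, 0.60, 1),
    (250.0, 0.99, 1),
    (8.0, 0.20, 0),
    (36.9, 0.91, 0),
    (45.0, 0.30, 0),
    (20.0, 0.92, 1),
]

# Keep only the samples that CA19-9 alone would call negative.
ca19_9_negative = [r for r in records if r[0] < CA19_9_CUTOFF]

# Cancers in that subset are missed by CA19-9; count how many the SVM
# score identifies at the stated threshold.
missed_by_ca19_9 = [r for r in ca19_9_negative if r[2] == 1]
rescued_by_model = [r for r in missed_by_ca19_9 if r[1] > SCORE_THRESHOLD]

print(len(ca19_9_negative), len(missed_by_ca19_9), len(rescued_by_model))
```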
Table 8: CA19-9 measurements and predictive scores for SVM models
Figure BDA0003122215630000602
Figure BDA0003122215630000611
Example 6: model construction and performance evaluation of 7 marker combinations SEQ ID NO 9,14,13,26,40,43,52
To verify the predictive performance of different marker combinations, the 7 markers SEQ ID NO:9,14,13,26,40,43,52 were selected from the group of 56 methylation markers of the present invention for model construction and performance testing. The samples were divided into a training set of 117 samples (samples 1-117) and a test set of 57 samples (samples 118-174).
Using these 7 methylation markers, a support vector machine model was constructed in the training set for the two groups of samples:
1. The samples were pre-divided into 2 parts, 1 part for training the model and 1 part for testing the model.
2. SVM model training was performed using the methylation marker levels in the training set. The specific training process is as follows:
a) The sklearn software package (0.23.1) of Python (v3.6.9) was used to construct the model in cross-validation training mode, command line: model = SVR().
b) The SVM model was constructed with the sklearn software package (0.23.1) by inputting the methylation value matrix, where x_train represents the methylation value matrix of the training set and y_train represents the phenotype information of the training set.
3. Testing with the test set data: the model was applied to the test set, command line: test_pred = model.predict(test_df), where test_pred represents the prediction score obtained for a test set sample from the SVM prediction model constructed in this example, model represents the SVM prediction model constructed in this example, and test_df represents the test set data.
The ROC curve of the 7-marker combination model is shown in FIG. 9. The AUC of the model on the test set is 0.881, and at a specificity of 0.846 on the test set the sensitivity reaches 0.774 (Table 9), so the model discriminates well between pancreatic cancer patients and healthy people.
Table 9: performance of the 7 marker combination model
Group         AUC     Sensitivity  Specificity  Threshold
Training set  0.8586  0.7302       0.8519       0.5786
Test set      0.8809  0.7742       0.8462       0.5786
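Restricting the model to a 7-marker combination amounts to selecting 7 columns of the 56-marker matrix before training. The sketch below assumes that column k of the matrix holds the marker with SEQ ID NO k+1 and uses synthetic data; the column mapping, the `fit` call, and the names `x_train_all` and `x_test_all` are assumptions, not details given in the text.

```python
import numpy as np
from sklearn.svm import SVR

# Marker combination of this example, given as SEQ ID NO values.
selected_seq_ids = [9, 14, 13, 26, 40, 43, 52]
# Assumed layout: column index = SEQ ID NO - 1.
columns = [seq_id - 1 for seq_id in selected_seq_ids]

rng = np.random.default_rng(2)

def synthetic_matrix(n_per_class):
    """Synthetic 56-marker matrix; cancer rows first, then non-cancer rows."""
    cancer = rng.normal(0.6, 0.1, size=(n_per_class, 56))
    non_cancer = rng.normal(0.3, 0.1, size=(n_per_class, 56))
    labels = np.array([1] * n_per_class + [0] * n_per_class)
    return np.vstack([cancer, non_cancer]), labels

x_train_all, y_train = synthetic_matrix(20)
x_test_all, y_test = synthetic_matrix(10)

# Keep only the 7 selected marker columns for training and testing.
x_train = x_train_all[:, columns]
test_df = x_test_all[:, columns]

model = SVR()
model.fit(x_train, y_train)  # assumed fit call
test_pred = model.predict(test_df)
print(test_pred[:3])
```

The same sketch applies to the other 7-marker combinations by changing `selected_seq_ids`.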
Example 7: model construction and performance evaluation of 7 marker combinations SEQ ID NO 5,18,34,40,43,45,46
To verify the predictive performance of different marker combinations, the 7 markers SEQ ID NO:5,18,34,40,43,45,46 were selected from the group of 56 methylation markers of the present invention for model construction and performance testing. The samples were divided into a training set of 117 samples (samples 1-117) and a test set of 57 samples (samples 118-174).
Using these 7 methylation markers, a support vector machine model was constructed in the training set for the two groups of samples:
1. The samples were pre-divided into 2 parts, 1 part for training the model and 1 part for testing the model.
2. SVM model training was performed using the methylation marker levels in the training set. The specific training process is as follows:
a) The sklearn software package (0.23.1) of Python (v3.6.9) was used to construct the model in cross-validation training mode, command line: model = SVR().
b) The SVM model was constructed with the sklearn software package (0.23.1) by inputting the methylation value matrix, model.
3. Testing with the test set data: the model was applied to the test set, command line: test_pred = model.predict(test_df), where test_pred represents the prediction score obtained for a test set sample from the SVM prediction model constructed in this example, model represents the SVM prediction model constructed in this example, and test_df represents the test set data.
The ROC curve of the 7-marker combination model is shown in FIG. 10. The AUC of the model on the test set is 0.881, and at a specificity of 0.692 on the test set the sensitivity reaches 0.839 (Table 10), so the model discriminates well between pancreatic cancer patients and healthy people.
Table 10: performance of the 7 marker combination model
Group         AUC     Sensitivity  Specificity  Threshold
Training set  0.8898  0.8095       0.8519       0.4179
Test set      0.8809  0.8387       0.6923       0.4179
Example 8: model construction and performance evaluation of 7 marker combinations SEQ ID NO 8,11,20,44,48,51,54
To verify the predictive performance of different marker combinations, the 7 markers SEQ ID NO:8,11,20,44,48,51,54 were selected from the group of 56 methylation markers of the present invention for model construction and performance testing. The samples were divided into a training set of 117 samples (samples 1-117) and a test set of 57 samples (samples 118-174).
Using these 7 methylation markers, a support vector machine model was constructed in the training set for the two groups of samples:
1. The samples were pre-divided into 2 parts, 1 part for training the model and 1 part for testing the model.
2. SVM model training was performed using the methylation marker levels in the training set. The specific training process is as follows:
a) The sklearn software package (0.23.1) of Python (v3.6.9) was used to construct the model in cross-validation training mode, command line: model = SVR().
b) The SVM model was constructed with the sklearn software package (0.23.1) by inputting the methylation value matrix, where x_train represents the methylation value matrix of the training set and y_train represents the phenotype information of the training set.
3. Testing with the test set data: the model was applied to the test set, command line: test_pred = model.predict(test_df), where test_pred represents the prediction score obtained for a test set sample from the SVM prediction model constructed in this example, model represents the SVM prediction model constructed in this example, and test_df represents the test set data.
The ROC curve of the 7-marker combination model is shown in FIG. 11. The AUC of the model on the test set is 0.880, and at a specificity of 0.769 on the test set the sensitivity reaches 0.839 (Table 11), so the model discriminates well between pancreatic cancer patients and healthy people.
Table 11: performance of the 7 marker combination model
Group         AUC     Sensitivity  Specificity  Threshold
Training set  0.8812  0.7143       0.8519       0.4434
Test set      0.8797  0.8387       0.7692       0.4434
Example 9: model construction and performance evaluation of 7 marker combinations SEQ ID NO 8,14,26,24,31,40,46
To verify the predictive performance of different marker combinations, the 7 markers SEQ ID NO:8,14,26,24,31,40,46 were selected from the group of 56 methylation markers of the present invention for model construction and performance testing. The samples were divided into a training set of 117 samples (samples 1-117) and a test set of 57 samples (samples 118-174).
Using these 7 methylation markers, a support vector machine model was constructed in the training set for the two groups of samples:
1. The samples were pre-divided into 2 parts, 1 part for training the model and 1 part for testing the model.
2. SVM model training was performed using the methylation marker levels in the training set. The specific training process is as follows:
a) The sklearn software package (0.23.1) of Python (v3.6.9) was used to construct the model in cross-validation training mode, command line: model = SVR().
b) The SVM model was constructed with the sklearn software package (0.23.1) by inputting the methylation value matrix, where x_train represents the methylation value matrix of the training set and y_train represents the phenotype information of the training set.
3. Testing with the test set data: the model was applied to the test set, command line: test_pred = model.predict(test_df), where test_pred represents the prediction score obtained for a test set sample from the SVM prediction model constructed in this example, model represents the SVM prediction model constructed in this example, and test_df represents the test set data.
The ROC curve of the 7-marker combination model is shown in FIG. 12. The AUC of the model on the test set is 0.871, and at a specificity of 0.885 on the test set the sensitivity reaches 0.710 (Table 12), so the model discriminates well between pancreatic cancer patients and healthy people.
Table 12: performance of the 7 marker combination model
Group         AUC     Sensitivity  Specificity  Threshold
Training set  0.8745  0.6984       0.8519       0.5380
Test set      0.8710  0.7097       0.8846       0.5380
Example 10: model construction and performance evaluation of 7 marker combinations SEQ ID NO 3,9,8,29,42,40,41
To verify the predictive performance of different marker combinations, the 7 markers SEQ ID NO:3,9,8,29,42,40,41 were selected from the group of 56 methylation markers of the present invention for model construction and performance testing. The samples were divided into a training set of 117 samples (samples 1-117) and a test set of 57 samples (samples 118-174).
Using these 7 methylation markers, a support vector machine model was constructed in the training set for the two groups of samples:
1. The samples were pre-divided into 2 parts, 1 part for training the model and 1 part for testing the model.
2. SVM model training was performed using the methylation marker levels in the training set. The specific training process is as follows:
a) The sklearn software package (0.23.1) of Python (v3.6.9) was used to construct the model in cross-validation training mode, command line: model = SVR().
b) The SVM model was constructed with the sklearn software package (0.23.1) by inputting the methylation value matrix, where x_train represents the methylation value matrix of the training set and y_train represents the phenotype information of the training set.
3. Testing with the test set data: the model was applied to the test set, command line: test_pred = model.predict(test_df), where test_pred represents the prediction score obtained for a test set sample from the SVM prediction model constructed in this example, model represents the SVM prediction model constructed in this example, and test_df represents the test set data.
The ROC curve of the 7-marker combination model is shown in FIG. 13. The AUC of the model on the test set is 0.866, and at a specificity of 0.538 on the test set the sensitivity reaches 0.903 (Table 13), so the model discriminates well between pancreatic cancer patients and healthy people.
Table 13: performance of the 7 marker combination model
Group         AUC     Sensitivity  Specificity  Threshold
Training set  0.8930  0.8413       0.8519       0.4014
Test set      0.8660  0.9032       0.5385       0.4014
Example 11: model construction and performance evaluation of 7 marker combinations SEQ ID NO 5,8,19,7,44,47,53
To verify the predictive performance of different marker combinations, the 7 markers SEQ ID NO:5,8,19,7,44,47,53 were selected from the group of 56 methylation markers of the present invention for model construction and performance testing. The samples were divided into a training set of 117 samples (samples 1-117) and a test set of 57 samples (samples 118-174).
Using these 7 methylation markers, a support vector machine model was constructed in the training set for the two groups of samples:
1. The samples were pre-divided into 2 parts, 1 part for training the model and 1 part for testing the model.
2. SVM model training was performed using the methylation marker levels in the training set. The specific training process is as follows:
a) The sklearn software package (0.23.1) of Python (v3.6.9) was used to construct the model in cross-validation training mode, command line: model = SVR().
b) The SVM model was constructed with the sklearn software package (0.23.1) by inputting the methylation value matrix, where x_train represents the methylation value matrix of the training set and y_train represents the phenotype information of the training set.
3. Testing with the test set data: the model was applied to the test set, command line: test_pred = model.predict(test_df), where test_pred represents the prediction score obtained for a test set sample from the SVM prediction model constructed in this example, model represents the SVM prediction model constructed in this example, and test_df represents the test set data.
The ROC curve of the 7-marker combination model is shown in FIG. 14. The AUC of the model on the test set is 0.864, and at a specificity of 0.577 on the test set the sensitivity reaches 0.774 (Table 14), so the model discriminates well between pancreatic cancer patients and healthy people.
Table 14: performance of the 7 marker combination model
Group         AUC     Sensitivity  Specificity  Threshold
Training set  0.8704  0.6984       0.8519       0.4803
Test set      0.8635  0.7742       0.5769       0.4803
Example 12: model construction and performance evaluation of 7 marker combinations SEQ ID NO 12,17,24,28,40,42,47
To verify the predictive performance of different marker combinations, the 7 markers SEQ ID NO:12,17,24,28,40,42,47 were selected from the group of 56 methylation markers of the present invention for model construction and performance testing. The samples were divided into a training set of 117 samples (samples 1-117) and a test set of 57 samples (samples 118-174).
Using these 7 methylation markers, a support vector machine model was constructed in the training set for the two groups of samples:
1. The samples were pre-divided into 2 parts, 1 part for training the model and 1 part for testing the model.
2. SVM model training was performed using the methylation marker levels in the training set. The specific training process is as follows:
a) The sklearn software package (0.23.1) of Python (v3.6.9) was used to construct the model in cross-validation training mode, command line: model = SVR().
b) The SVM model was constructed with the sklearn software package (0.23.1) by inputting the methylation value matrix, where x_train represents the methylation value matrix of the training set and y_train represents the phenotype information of the training set.
3. Testing with the test set data: the model was applied to the test set, command line: test_pred = model.predict(test_df), where test_pred represents the prediction score obtained for a test set sample from the SVM prediction model constructed in this example, model represents the SVM prediction model constructed in this example, and test_df represents the test set data.
The ROC curve of the 7-marker combination model is shown in FIG. 15. The AUC of the model on the test set is 0.862, and at a specificity of 0.731 on the test set the sensitivity reaches 0.871 (Table 15), so the model discriminates well between pancreatic cancer patients and healthy people.
Table 15: performance of the 7 marker combination model
Group         AUC     Sensitivity  Specificity  Threshold
Training set  0.8859  0.8571       0.8519       0.4514
Test set      0.8623  0.8710       0.7308       0.4514
Example 13: model construction and performance evaluation of 7 marker combinations SEQ ID NO 5,18,14,10,8,19,27
To verify the predictive performance of different marker combinations, the 7 markers SEQ ID NO:5,18,14,10,8,19,27 were selected from the group of 56 methylation markers of the present invention for model construction and performance testing. The samples were divided into a training set of 117 samples (samples 1-117) and a test set of 57 samples (samples 118-174).
Using these 7 methylation markers, a support vector machine model was constructed in the training set for the two groups of samples:
1. The samples were pre-divided into 2 parts, 1 part for training the model and 1 part for testing the model.
2. SVM model training was performed using the methylation marker levels in the training set. The specific training process is as follows:
a) The sklearn software package (0.23.1) of Python (v3.6.9) was used to construct the model in cross-validation training mode, command line: model = SVR().
b) The SVM model was constructed with the sklearn software package (0.23.1) by inputting the methylation value matrix, model.
3. Testing with the test set data: the model was applied to the test set, command line: test_pred = model.predict(test_df), where test_pred represents the prediction score obtained for a test set sample from the SVM prediction model constructed in this example, model represents the SVM prediction model constructed in this example, and test_df represents the test set data.
The ROC curve of the 7-marker combination model is shown in FIG. 16. The AUC of the model on the test set is 0.859, and at a specificity of 0.615 on the test set the sensitivity reaches 0.839 (Table 16), so the model discriminates well between pancreatic cancer patients and healthy people.
Table 16: performance of the 7 marker combination model
Group         AUC     Sensitivity  Specificity  Threshold
Training set  0.8510  0.6667       0.8519       0.4124
Test set      0.8586  0.8387       0.6154       0.4124
Example 14: model construction and performance evaluation of 7 marker combinations SEQ ID NO 6,12,20,26,24,47,50
To verify the predictive performance of different marker combinations, the 7 markers SEQ ID NO:6,12,20,26,24,47,50 were selected from the group of 56 methylation markers of the present invention for model construction and performance testing. The samples were divided into a training set of 117 samples (samples 1-117) and a test set of 57 samples (samples 118-174).
Using these 7 methylation markers, a support vector machine model was constructed in the training set for the two groups of samples:
1. The samples were pre-divided into 2 parts, 1 part for training the model and 1 part for testing the model.
2. SVM model training was performed using the methylation marker levels in the training set. The specific training process is as follows:
a) The sklearn software package (0.23.1) of Python (v3.6.9) was used to construct the model in cross-validation training mode, command line: model = SVR().
b) The SVM model was constructed with the sklearn software package (0.23.1) by inputting the methylation value matrix, model.
3. Testing with the test set data: the model was applied to the test set, command line: test_pred = model.predict(test_df), where test_pred represents the prediction score obtained for a test set sample from the SVM prediction model constructed in this example, model represents the SVM prediction model constructed in this example, and test_df represents the test set data.
The ROC curve of the 7-marker combination model is shown in FIG. 17. The AUC of the model on the test set is 0.857, and at a specificity of 0.846 on the test set the sensitivity reaches 0.774 (Table 17), so the model discriminates well between pancreatic cancer patients and healthy people.
Table 17: performance of the 7 marker combination model
Group         AUC     Sensitivity  Specificity  Threshold
Training set  0.8695  0.6984       0.8519       0.5177
Test set      0.8573  0.7742       0.8462       0.5177
Example 15: model construction and performance evaluation of 7 marker combinations SEQ ID NO 1,19,27,34,37,46,47
To verify the predictive performance of different marker combinations, the 7 markers SEQ ID NO:1,19,27,34,37,46,47 were selected from the group of 56 methylation markers of the present invention for model construction and performance testing. The samples were divided into a training set of 117 samples (samples 1-117) and a test set of 57 samples (samples 118-174).
Using these 7 methylation markers, a support vector machine model was constructed in the training set for the two groups of samples:
1. The samples were pre-divided into 2 parts, 1 part for training the model and 1 part for testing the model.
2. SVM model training was performed using the methylation marker levels in the training set. The specific training process is as follows:
a) The sklearn software package (0.23.1) of Python (v3.6.9) was used to construct the model in cross-validation training mode, command line: model = SVR().
b) The SVM model was constructed with the sklearn software package (0.23.1) by inputting the methylation value matrix, where x_train represents the methylation value matrix of the training set and y_train represents the phenotype information of the training set.
3. Testing with the test set data: the model was applied to the test set, command line: test_pred = model.predict(test_df), where test_pred represents the prediction score obtained for a test set sample from the SVM prediction model constructed in this example, model represents the SVM prediction model constructed in this example, and test_df represents the test set data.
The ROC curve of the 7-marker combination model is shown in FIG. 18. The AUC of the model on the test set is 0.856, and at a specificity of 0.808 on the test set the sensitivity reaches 0.742 (Table 18), so the model discriminates well between pancreatic cancer patients and healthy people.
Table 18: performance of the 7 marker combination model
Group         AUC     Sensitivity  Specificity  Threshold
Training set  0.8492  0.6508       0.8519       0.5503
Test set      0.8561  0.7419       0.8077       0.5503
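The test-set rows of Tables 9 to 18 can be compared side by side with a few lines of code. The values below are copied from those tables; ranking by Youden's J (sensitivity + specificity - 1) is an illustrative choice made here, not a criterion used in the text, and the variable names are hypothetical.

```python
# Test-set results per example: (AUC, sensitivity, specificity),
# copied from the test-set rows of Tables 9-18.
test_results = {
    6: (0.8809, 0.7742, 0.8462),
    7: (0.8809, 0.8387, 0.6923),
    8: (0.8797, 0.8387, 0.7692),
    9: (0.8710, 0.7097, 0.8846),
    10: (0.8660, 0.9032, 0.5385),
    11: (0.8635, 0.7742, 0.5769),
    12: (0.8623, 0.8710, 0.7308),
    13: (0.8586, 0.8387, 0.6154),
    14: (0.8573, 0.7742, 0.8462),
    15: (0.8561, 0.7419, 0.8077),
}

# Youden's J summarises sensitivity and specificity at the chosen threshold.
youden_j = {ex: sens + spec - 1 for ex, (_, sens, spec) in test_results.items()}

# max() returns the first example encountered when several share the top value.
best_by_j = max(youden_j, key=youden_j.get)
worst_by_j = min(youden_j, key=youden_j.get)
auc_range = (
    min(auc for auc, _, _ in test_results.values()),
    max(auc for auc, _, _ in test_results.values()),
)

for example, j in sorted(youden_j.items(), key=lambda item: -item[1]):
    print(f"Example {example}: AUC={test_results[example][0]:.4f}, J={j:.4f}")
```

The comparison illustrates that the test-set AUC varies little across the ten combinations, while the sensitivity and specificity at each model's threshold vary considerably.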
This study examined the difference between the plasma of subjects without pancreatic cancer and the plasma of the pancreatic cancer population in terms of the methylation levels of related genes in plasma cfDNA, and screened out 56 methylated nucleic acid fragments with significant differences. Based on this group of methylated nucleic acid fragment markers, a pancreatic cancer risk prediction model was established using a support vector machine. The model can effectively identify pancreatic cancer, has high sensitivity and specificity, and is suitable for screening and diagnosing pancreatic cancer.
Sequence listing
<110> Shanghai Kun Yuanbiotech GmbH
<120> pancreatic cancer diagnosis related DNA methylation marker and application thereof
<130> 207241
<160> 56
<170> SIPOSequenceListing 1.0
<210> 1
<211> 501
<212> DNA
<213> Homo sapiens
<400> 1
agtagggcgc catgaaggcc agaccgcggc tgtgcgccgc cgccgcggag taggccaggc 60
gcagggggct gaggccgagc ggcgcgccca gcgggtaggc gcccgcgtcg gcaccgaagt 120
gactggcgtt gggctgcagc ggcgagaagg ccgagcggct gctcagcgag cccagcgccc 180
caggcgccat ggcgccggcc agcaagggtc tgtggtgcgg aggtgcggcg ggccccgcct 240
gcagcggcgc aggcagccca ggccccccgg cggcggcggc ggcggcggcg gcggcgtcga 300
cgcggctggg ccacgcgtcg tctgcagctg ctgcagcacc cacggcggcc ttatctgggg 360
gcgccgcagg gcccaggccg gccgccaggc ccccacggtg gtggttcagc acctgctcga 420
tggcctgcac cacgtcgccg ccgcagccct gcaacaccag ctccaggacg cctcgccggt 480
ggcctgggaa cacgcgtgtc a 501
<210> 2
<211> 542
<212> DNA
<213> Homo sapiens
<400> 2
ccctgccccc catctttcgg gggcactcaa accctcttcc cctgagctcc gtggcagccc 60
ccgaacaccc tcatcgcccg ctgccccctc cccgccgccg ctaccaaccc cgaggaggga 120
tgaccctctc cggcggcggc agcgccagcg acatgtccgg ccagacggtg ctgacggccg 180
aggacgtgga catcgatgtg gtgggcgagg gcgacgacgg gctggaagag aaggacagcg 240
acgcaggttg cgatagcccc gcggggccgc cggagctgcg cctggacgag gcggacgagg 300
tgcccccggc ggcaccccat cacggacagc ctcagccgcc ccaccagcag cccctgacat 360
tgcccaagga ggcggccgga gccggggccg gaccgggggg cgacgtgggc gcgccggagg 420
cggacggctg caagggcggt gttggcggcg aggagggcgg cgcgagcggc ggcgggcctg 480
gcgcgggcag cggttcggcg ggaggcctgg ccccgagcaa gcccaagaac agcctagtga 540
ag 542
<210> 3
<211> 577
<212> DNA
<213> Homo sapiens
<400> 3
atttgttctg cctgatgaaa gcaaaagctc gaactcccct cagggcgcga ggtgtgagac 60
ccttgggttc catttgcatt tctggtttgt cgttggcggg ttcctgattt gtttttgttt 120
tgttttggtc tgttctgttt tttggggggt gtctttcacc agggccttcc cggttagccc 180
agggtcccca catttctcca ggatgtaatt agagctaaga acagccgcca tccctcaggg 240
ttccgggtcc cgggtttcca gggtcccggg tttccaaggc cccgcgataa ccccgggcgc 300
acgcggcgcg atgcggcgag gcgaggcgag gcggtggggc cagcgcggag ccccaggcgc 360
gagaacagga actcgggctg gcacaccgag gcctcgcagc caagccgcgc ctgacccgtt 420
cgccgttccg gccccgcggc gcctccaagg ccgggccgag gggccgaggg gccgagggcg 480
ggcagacgcg gccacggcct aattctgact tctgaaggtc accgaaactg cgctgttttt 540
ccagagatgg gttgaagaga agagatgcaa tcccagt 577
<210> 4
<211> 501
<212> DNA
<213> Homo sapiens
<400> 4
atccgctgaa cgatgtccta cttcgctcgt ccttgctctc gccgctgctg ccggagccga 60
agcagagaag gcagcgggtc ccgtgaccgt cccgagagcc ccgcgctccc gaccaggggg 120
cgggggcggc cccggggagg gcggggcagg ggcgggggga agaaaggggg ttttgtgctg 180
cgccgggagg gccggcgccc tcttccgaat gtcctgcggc cccagcctct cctcacgctc 240
gcgcagtctc cgccgcagtc tcagctgcag ctgcaggact gagccgtgca cccggaggag 300
acccccggag gaggcgacaa acttcgcagt gccgcgaccc aaccccagcc ctgggtaggt 360
gagtgcctcc gcagccccgc cgcccgccgt ggggtcgggg acagggagaa gggagtgcct 420
gcctggtctg cgccccccgc ctgtcagccc ttgcctcgag gctctggggc acccaactcg 480
tcgactcctg acaccgcagc g 501
<210> 5
<211> 589
<212> DNA
<213> Homo sapiens
<400> 5
cttttcaccg ggtgtggctc gtctgagctc ttgaactgaa gccagcggac accacccgtc 60
ggcgcctgct ttcctggggc gtgggctcct ccccctgtgc agaccgcgag gggagacggt 120
gcgggcggcc gggagcgcag ccctccggga ggcgggtcat ggcctgggcg ccgcccgggg 180
agcggctgcg cgaggatgcg cggtgcccgg tgtgcctgga tttcctgcag gagccggtca 240
gcgtggactg cggccacagc ttctgcctca ggtgcatctc cgagttctgc gagaagtcgg 300
acggcgcgca gggcggcgtc tacgcctgtc cgcagtgccg gggccccttc cggccctcgg 360
gctttcgccc caaccggcag ctggcgggcc tggtggagag cgtgcggcgg ctggggttgg 420
gcgcggggcc cggggcgcgg cgatgcgcgc ggcacggcga ggacctgagc cgcttctgcg 480
aggaggacga ggcggcgctg tgctgggtgt gcgacgccgg ccccgagcac aggacgcacc 540
gcacggcgcc gctgcaggag gccgccggca gctaccaggt gaggcgccc 589
<210> 6
<211> 583
<212> DNA
<213> Homo sapiens
<400> 6
atccaccgtc acactctctc cgagcagcca gctccccgct taacggggaa attgaagcag 60
acagcctttg tctaaacact tcttttgccc agaatatctt aattttccta tttgaatgtt 120
taataaggtt tggggtgcag cagcttcctt ttaattgtga cggtgcggcc gcttgggcgt 180
gatcccttgg ctggggctgc agggggcccg tcctccaggg gcgcagaggg aaggaccagc 240
gtttccaagc cgggctctgg ccgccggcgc gagagcgagg ccaaggtctg ggggcagttc 300
agggggaccc cgaagtcggg acggcccaga aacgctttgc ccacagccac cgccctttcc 360
tttgtgagtt tccccaaagc cgtcggtgcg acccggcgcc gactctcctc ctcttctccc 420
tgcgagggcc cgcgccgccc gggcccagtc ctgggggata gatccctcgg ggcccaacgg 480
ctgggccacc gccggtctcc ggccactgct gcgaggacag gcgctgccta actaatttct 540
cctctaaggg ggctgtgcgt gcgtctcctt cccaactgat gtc 583
<210> 7
<211> 542
<212> DNA
<213> Homo sapiens
<400> 7
cagaaggtgt cacactctgg gatctgttcc gcagggaggc cacaggtgcc aggagacgcg 60
ggagagactg gctcctgcca agaacatttc tttgcattgt tcagtgcggt tttttatttt 120
tatttttgac tgtttgtttg cctaagagat gacttccctt gccagaaaaa aaaaagtgtt 180
gtaaaaataa aaggaaacgg gattacgatg taaaagacga atagataaac ccggttccgc 240
agatctgcgg cgcgcgcgcc tggcgacctc ggatacattc attgaagttg ccgcgcactc 300
gtacccgggt tcacctcgcc cctcgctcat tcctcccgac caaggcccat ggtcagaggt 360
gtcctcgccc cgcggccgtc agagggcgcg gcctacactc gaatgccggc cgagccctcc 420
acgcgctcgg aacttgggct tcccggtgca gcctccccgc gatcgcaatg cccgctgcct 480
ttcccgagcc cagtccggaa cccgcctctc tcggggacct tgacctcgcg cggacctcgt 540
cg 542
<210> 8
<211> 501
<212> DNA
<213> Homo sapiens
<400> 8
cagctccggc tctgagcgtc tccagtcagg cgaggcggat aaatccttcg caaaaccctc 60
ttggaaattg ccgccgcttc ctgagccatc agtcccagcg ggtacgttat cgagtagcac 120
aaacagttgg atttttccct caagaaccga gtctggacgc ggagatggag ccaagtgtgg 180
ctgcattttc ggacccggaa atccgttggg cactgaagga cttttcgaac cctgtagcgc 240
tgttgcttcg cggtccatcg tcgccgctgc agacggatgc gctccccggc ggctctacgc 300
cctccagtcc cggccaggcc tctgggctgg gagccgagcc gtctcgggcc ctccggcgcc 360
gcgttttcta gagaaccggg tctcagcgat gctcatttca gccccgtctt aatgcaacaa 420
acgaaacccc acacgaacga aaaggaacat gtctgcgctc tctgcgcagc gcttgggcgg 480
cgcggtcccg gcgcgcgggg a 501
<210> 9
<211> 522
<212> DNA
<213> Homo sapiens
<400> 9
aggaggagag aggtgaggaa aaggctaagt cagagtccgc gaccttgccg gctctatacc 60
ttcagagggc tgcagagcgc gcgcgtcaag tccgcggaaa gttttactag tcagctcctc 120
cagcgcgcac agcggcgacg ttggacccgg acccgactct ggaagctgcg gcgcagaggg 180
tgctcggggg accatgcgcg gggctaggat gtctgcgatg cttaagagtg tccggggtgt 240
tcggggctcg cgtcccgagt tcatggtcgg ccgggctggg gcggtccggc tgtccgttgc 300
gctaggctcc gcaaacgcct gggccccagt gctcggctcc caatccgggc ccccagcctc 360
ggacccgccc ccggctctgg gcccgagtcc cgtgtgcccc tcctcctgcg cccccacctc 420
tccaccccgg gccgcggtgg atctggagct cctagatgtc cggggagggt atttctacag 480
gctggggcag gcgcggagag cagaagccga ccaaactacc ca 522
<210> 10
<211> 501
<212> DNA
<213> Homo sapiens
<400> 10
caggtgctgg agttggagcg gcgcttcctg cgccagaagt acctggcctc tgcggagagg 60
gcggcgctgg ccaaggcctt gcgcatgacc gacgcacagg tcaaaacgtg gttccagaac 120
cgacgcacca agtggcggtg aggcgcggcg cgggcgaggg cggactgggg ttcccgagca 180
gggcctggtg agaagcgacg cggcgggcgc cccgctgacc ccgcgtctcc ctcccttagg 240
cgccagacgg cggaggagcg cgaggccgag cggcaccgcg cgggccggct gctcctgcat 300
ctgcagcagg acgcgttgcc acggccgctg cggccgccgc tgcccccgga ccctctctgc 360
ctgcacaact cgtcgctctt cgcgctgcag aacctgcagc cctgggccga ggacaacaaa 420
gtggcttcag tgtccgggct cgcctcggtg gtgtgagcga cgcccgtccg atcggcgtgg 480
agcgccgggc ccggagcggt g 501
<210> 11
<211> 501
<212> DNA
<213> Homo sapiens
<400> 11
cggactgtgg cccagcccac agaccagggc ccgaaattga ggtggggggc gtactctgtt 60
tgtcttcccg aaggatgcgg cgcgtggaag gagatgcgct gacttgttcc aacccataac 120
ctttcgctcg ggtccccatg tgcgggcaga agaagtcaga gcggaacagc ctagtgcact 180
ggcagggctc attgtctggg aagacaccga ggtctaggca gctgggactg cggagtggag 240
gcaaggccgg aggcggccgg cggctttgtg gaagtttcgc gccgccaggc cctgcgcgcc 300
gcacggggcg gtggagttct tgggcagccc ccggcgcttg gcccacgcct ccgcttcccg 360
cgtgtgggaa actcgagcac cctacaggca ccagggtaaa ctgcctgtgc ctggcccggt 420
gagggtcgct cccccaggcc ccgtctccgc ccgaggactg caggcctagg cctgcgggga 480
gatcctgaga ccgcggtgtg c 501
<210> 12
<211> 503
<212> DNA
<213> Homo sapiens
<400> 12
ggcccgaaat tgaggtgggg ggcgtactct gtttgtcttc ccgaaggatg cggcgcgtgg 60
aaggagatgc gctgacttgt tccaacccat aacctttcgc tcgggtcccc atgtgcgggc 120
agaagaagtc agagcggaac agcctagtgc actggcaggg ctcattgtct gggaagacac 180
cgaggtctag gcagctggga ctgcggagtg gaggcaaggc cggaggcggc cggcggcttt 240
gtggaagttt cgcgccgcca ggccctgcgc gccgcacggg gcggtggagt tcttgggcag 300
cccccggcgc ttggcccacg cctccgcttc ccgcgtgtgg gaaactcgag caccctacag 360
gcaccagggt aaactgcctg tgcctggccc ggtgagggtc gctcccccag gccccgtctc 420
cgcccgagga ctgcaggcct aggcctgcgg ggagatcctg agaccgcggt gtgcgggcgc 480
cggcagcagg gcaaggcagg gac 503
<210> 13
<211> 504
<212> DNA
<213> Homo sapiens
<400> 13
cttacgcggc ggcgggcgtg aaggcgctgc cgctgcaggc tgcaggctgc actggccgcc 60
cgctcggcta ctacgccgac ccgtcgggct ggggcgcccg cagtcccccg cagtactgcg 120
gcaccaagtc gggctcggtg ctgccctgct ggcccaacag cgccgcggcc gccgcgcgca 180
tggccggcgc caatccctac ctgggcgagg aggccgaggg cctggccgcc gagcgctcgc 240
cgctgccgcc cggcgccgcc gaggacgcca agcccaagga cctgtccgat tccagctgga 300
tcgagacgcc ctcctcgatc aagtccatcg actccagcga ctcggggatt tacgagcagg 360
ccaagcggag gcggatctcg ccggccgaca cgcccgtgtc cgagagttcg tccccgctca 420
agagcgaggt gctggcccag cgggactgcg agaagaactg cgccaaggac attagcggct 480
actatggctt ctactcgcac agct 504
<210> 14
<211> 507
<212> DNA
<213> Homo sapiens
<400> 14
tgaggcacga gcagggtgca gagccgccgc tggggggcgc gccggccgcc gccgccgagg 60
aggccgcagc cgctgcggct gccgcggctg ccgcggcaga ggccgcgctg ttgagccccg 120
cggcggccgc gggagcctgg tagagaccag ggtggcggaa gctacacagc agctccggcc 180
gagagtaggg gtgcgagagg gcgcggaagg tgtccagtgg ccggatggaa gtagcgaagg 240
gcgacgaagc cgcggccgcc gcgcctgagg ctgcagccgc ggccgccgcc gccgtgacgc 300
ccacgtgcgg gtagtagtgc agcggcacgt gcgagtggaa ggggtagggc aggcttccgg 360
tggcggccgc gtgcgtcatc atgtaggtgt agaagctggg gtcggctggg tgcggccagg 420
acatggccag gcgctgccgc ttgtccttca tgcgccggtt ctggaaccac acctgcgggg 480
agagacgcgc cgcagcctgg gttaggg 507
<210> 15
<211> 501
<212> DNA
<213> Homo sapiens
<400> 15
tggaagtagc gaagggcgac gaagccgcgg ccgccgcgcc tgaggctgca gccgcggccg 60
ccgccgccgt gacgcccacg tgcgggtagt agtgcagcgg cacgtgcgag tggaaggggt 120
agggcaggct tccggtggcg gccgcgtgcg tcatcatgta ggtgtagaag ctggggtcgg 180
ctgggtgcgg ccaggacatg gccaggcgct gccgcttgtc cttcatgcgc cggttctgga 240
accacacctg cggggagaga cgcgccgcag cctgggttag ggagcgcccc gtgttcccag 300
ctcctgtccc aggacctctg ccccttccgg acctctgaat ggcttggtct acttctctcc 360
gaccaagccc aaccccgagt accctgtggt ctcccagctg ggaaagtgtg gacggcagtg 420
tgtggaccgc cgtgggcaca ccgtcctcaa cgaagagggt cctctccccc gcgtccggct 480
gctgctgctc ctcaggcttt t 501
<210> 16
<211> 581
<212> DNA
<213> Homo sapiens
<400> 16
ggccagttgg ccgcgcttcc ccctatctcc tacccgcgcg gcgcgctgcc ctgggccgcc 60
acgcccgcct cctgcgcccc cgcgcagcct gcgggcgcca ctgccttcgg cggcttctcg 120
cagccctacc tggctggctc cgggcctctc ggcctgcagc ccccaacagc caaagacgga 180
cccgaagagc aggctaagtt ctatgcgccc gaagcggccg ctgggccaga ggagcgcggt 240
cgtacccggc cgtccttcgc ccccgagtct agcctggctc ctgcagtggc tgctctcaaa 300
gcggccaagt atgactacgc tggtgtgggt cgtgccacgc cgggctccac gaccctgctc 360
cagggggctc cctgcgcccc tggcttcaag gacgacacca agggcccgct caacttgaac 420
atgacagtgc aggcggcggg cgttgcctct tgcctgcgac cttcactgcc cgacggtaaa 480
cggtgcccat gctccccggg ccggtttggg ccgggatggg aggtggggtt caagggagag 540
tgtaagggga ggtgaaccgc ctgggggcgg gcaatagaca g 581
<210> 17
<211> 501
<212> DNA
<213> Homo sapiens
<400> 17
gtcggcagcc tcggcggcgg gggcgagatt ggcgggaggg gggcgcgggg ggggcgcggt 60
aagaggtggc ggcgggcaga gggtgttttt tttcttttcc ctccagagcc ggggtttgta 120
aaccgaggcc agagtgtccc cgtgggccga gcgcactttt ttcttgtccg ggtgcgctca 180
gtcactggtg cctgagagga aacagtggag gcagcggggc aggtcgcctg gggcgtcggc 240
gattatattg cggccgagcc ggggcgcgcc gggaaaggcc gggagggcgg cggcgcgcgg 300
gggctgggcg aggccccgcg acccgcgagg gaggcggcgc gaagccgagg cggcgggcgc 360
aagagccggg catgagcgcc cagtagctga gcgcccgcgg ctgcctggcc tcagaagcga 420
cgcgcgagcg cgggcgggcg gcagcagcga cgtagcccgg cggtcccggc ggcgagagca 480
gccgccccac aggcccccgc g 501
<210> 18
<211> 515
<212> DNA
<213> Homo sapiens
<400> 18
gggtggggat gggggggtgg gggaggactc cattttcaga gcagggggaa ggctgtggag 60
gagcggggga tttccaaaat gcttgagggt tccggacctg gtggtgggcc cagaagaagg 120
agcacatttg gggatcccgc aagcctgggg tatgtgggtg tgtttgagga ggtgggtggg 180
agtgagcgtg tgcgccgggg agagggcggg agggaggaag caagcgagct tgggagcgcg 240
cggggagggc cgcgggcctc ggggcgcgcc aggaagtgag cggcggaggc gaggggccta 300
actagtggcc gggcgctgac ctgcctgtcc tgtctgtttt gtctcgcagt gaaccccaac 360
tacaccggtg gggaacccaa gcggtcccga acggcctaca cccggcagca agtcctagaa 420
ctggaaaaag aatttcattt taacaggtat ctgacaaggc gccgtcggat tgaaatcgct 480
cacaccctgt gtctgtcgga gcgccagatc aagat 515
<210> 19
<211> 512
<212> DNA
<213> Homo sapiens
<400> 19
ctggcgctgg cacgcttaat tcttttttcc cacattgcag aatcattccc accagccact 60
cggagagtgg tgggaatctg tcttggttta atatttctaa aatataagtt tcattgtccc 120
ccaggttagc ccagccagga ctcattgcgc agtcctcctc gccttcctgg aggcgccgca 180
ggaagcggga agtcgcggct tggcggttgc tgggcctgtg ggatctgcgg gtcctgccca 240
gacctggagt cgcacagatc acggcgggca gtggctcagc gcctaggcgg ctccaggcct 300
cgaaggacca ggttggggtg ctcagggatc agagagggga ggtcgctctg ggtccgggtc 360
gcctgctacg cgccttttct gtctcagaag tggcggtgac tcggctgctg agtccgcgga 420
acgagccacg gaatggtggt ggtggcgggg ttttctgagg tgactggcca gagctgagag 480
tcgcggcttc cacctttggg ccggagcggg tc 512
<210> 20
<211> 558
<212> DNA
<213> Homo sapiens
<400> 20
tgggattgat ttttggcccc cgctgcagca agttgggggc tggtgaggag tgtagcggtg 60
actgggggcg gagtgcggac tcgcatccgc tgtaccagga gcccactgcc acctcgggat 120
ttttttttta acttggaatt tccatatgac aaaaaagaaa gaggtttctc ctcaatctaa 180
cggagccatt aacatctatt aataacgccg acagggtaag taacggagcc gcgctcctcg 240
gggtggtcac cgggctgcgt ggtcctcggc cggcctcctg catccgctgc ccctgtgcgc 300
tccgggccgg atgcgcaagg gcggcgcggg gaccaagcct ggctgccggc cgcctactcc 360
tccccttccc taaggtaagg ggtcgttttc acactcacca gagctcctgc gggctgagct 420
cgccccctcc cccgacttct ttgcggggca ttttctcttg ctggtgtatt acgtgtcatt 480
tctcacgggg cattgccggc cgcttttctg caactgtcct ttcggatttg gtgatctggt 540
ccggcacaga ggctctcc 558
<210> 21
<211> 548
<212> DNA
<213> Homo sapiens
<400> 21
gagagcaggc cttgcgggag tctggacccg aagggcgaga ctccacaggg ccaaggaaag 60
cggcctctgt cctccgttag tcttggggga gcagacgcaa gaggaggcaa gggcgccgcg 120
agctccccgg atgcactggt cccacaggcc gtgcccgagt ggagcactgc gaatggggcc 180
aagaaatttt ggcctttctc gccggacctg gctgcctccg cgggcctctc cgcctaccgc 240
gctcccgccg cggcccgact cccgcgggtc tccgcgccga acccacctgg ctcctatcgc 300
acgggacatt cccgacccac ccacgccgcg tcactgagcc tctgtaccga tacccggcgc 360
ctccgccagc agggcctgga cgcaccgcct cctttgacct cgggcttccc ccgcgctccg 420
ctgcttgggg cagactggcc ccgagaggga gccaccatct cccctgctcc agggtctcca 480
gggtccgaac ccgtgttggg atctgggtta ggattagggt ttggagcttg gagcctgcct 540
gttaggac 548
<210> 22
<211> 503
<212> DNA
<213> Homo sapiens
<400> 22
ctccagggat gcgccaagca cccttcggtt ttcccgggga gaattttccc cggcccgggg 60
actagggtct ggcgctgggg cgcccctcgg acctgcggga tcgcccctac actctggcgc 120
gctgagggcg gtgagcgagg gcgccaaggc acaggtgggg cgggagtcga gcgcggaggc 180
tcggggggcg ggacgcgggg cctgggagcg gccagggacc gcggcagcgc ctcagtgcca 240
gcctggcgcc cgcgactgcc tgccccagcc cctcagtggc ggcttgctct cttctctcgc 300
tccgaaccag acacagccgc tgccgctgcc gtccggcgcg ctacagactc ccgagaacag 360
ccctggctgt cagcgagcac cagccgcttc ctgtccccat cgcggagact ggaggggcgc 420
accacggcca tggagccaga ggcgcttcag gaggcaagag aagtccccgc gcgctccgca 480
gcccggcgca gctcatggtg agc 503
<210> 23
<211> 501
<212> DNA
<213> Homo sapiens
<400> 23
gacccacgcc cacctaggcc tccccgagcc tctgttgcat gccgacgggt ggctgaaccc 60
atcgacggcc gaggccttcc aggcctacgc tgggctgtgc ttccaggagc tgggggacct 120
ggtgaagctc tggatcacca tcaacgagcc taaccggcta agtgacatct acaaccgctc 180
tggcaacgac acctacgggg cggcgcacaa cctgctggtg gcccacgccc tggcctggcg 240
cctctacgac cggcagttca ggccctcaca gcgcggggcc gtgtcgctgt cgctgcacgc 300
ggactgggcg gaacccgcca acccctatgc tgactcgcac tggagggcgg ccgagcgctt 360
cctgcagttc gagatcgcct ggttcgccga gccgctcttc aagaccgggg actaccccgc 420
ggccatgagg gaatacattg cctccaagca ccgacggggg ctttccagct cggccctgcc 480
gcgcctcacc gaggccgaaa g 501
<210> 24
<211> 553
<212> DNA
<213> Homo sapiens
<400> 24
tggctgaacc catcgacggc cgaggccttc caggcctacg ctgggctgtg cttccaggag 60
ctgggggacc tggtgaagct ctggatcacc atcaacgagc ctaaccggct aagtgacatc 120
tacaaccgct ctggcaacga cacctacggg gcggcgcaca acctgctggt ggcccacgcc 180
ctggcctggc gcctctacga ccggcagttc aggccctcac agcgcggggc cgtgtcgctg 240
tcgctgcacg cggactgggc ggaacccgcc aacccctatg ctgactcgca ctggagggcg 300
gccgagcgct tcctgcagtt cgagatcgcc tggttcgccg agccgctctt caagaccggg 360
gactaccccg cggccatgag ggaatacatt gcctccaagc accgacgggg gctttccagc 420
tcggccctgc cgcgcctcac cgaggccgaa aggaggctgc tcaagggcac ggtcgacttc 480
tgcgcgctca accacttcac cactaggttc gtgatgcacg agcagctggc cggcagccgc 540
tacgactcgg aca 553
<210> 25
<211> 610
<212> DNA
<213> Homo sapiens
<400> 25
aaaagagaag tcggagttta gacagggttt taaaagtcag ctaaaggctc ccacattgca 60
cctgtggtta acaaccacag gccgtgttgc attctttacc tggcactttt cgggataata 120
caggagcatt taaaaaatag ataagtcaat gaatgcactt agggggacat cggctgccgc 180
tgccgtcagc tgaaatgtta gctatctacc gtcttataaa acgccaggaa aaacctctaa 240
accttagagc cggggaattt tttaaaaaat cggaaccaaa tctccgtggc ttcgtgcagc 300
gtgagttctg cagctcgggg gacgctgcag tgtgatgtgg tggagagagc atgcttcacc 360
gctcctgcca tcctgacagc gccctccctc ccggcctcag cctcctggtt cgccaaaccg 420
gaggactgaa tttatggcta gctggtctct ggggcgcctt ccagctctga cattcccgcc 480
tagaatagat cttcccgaag gtttcgcaga cagaccagag gggaccgagc cgggaaggcg 540
agacagggac aggcgagaga cgctgctccc aactcgcaga gggagaaagc gtgtatcccg 600
ggctgccggg 610
<210> 26
<211> 506
<212> DNA
<213> Homo sapiens
<400> 26
gaacttctgc ccttcccgct actggcaccc caagcaggga tgcactggga tgcgtggcag 60
gggcgggatc tcctgggagc gtctcagccc agcagggagt ggggaagcaa gagggaaggc 120
ttaccttcct cggtggctgg caggaggtgg tcgctgctag cgagggggat gcaaaggtcg 180
ttgtcctggg ggaaacggtc gcactcaagc atgtcgggcc aggggaagcc gaaggcggac 240
atgaccgggg cgcagcggtc cttcacctgc acgcagagcg agtggcatgg ctggatggtc 300
tcgtctaggt catcgaggca gacgggggcg aagagcgagc acaggaactt cttggtgtcc 360
gggtggcact gcttcatgac cagcgggatc caagcgccgg cctgctccag cacctccttc 420
atggtctcgt ggcccagcag gttgggcagc cgcatgttct ggtattcgat gccgtggcac 480
agctgcaggt tggcagggat gggctt 506
<210> 27
<211> 510
<212> DNA
<213> Homo sapiens
<400> 27
attcgagttc ttttgccctt ttcagtctaa gacgtgggct ttctgcaaag cctccccctg 60
ccagcgagct ctcggagcgc ggagccttta gaaattgagg ggtttactgt caaaatgaaa 120
atttcacttc aaattacctt ggctgatgct cgctcgccag gccgggggct cccgccgcag 180
ccttttgaca ggcacatgag ccgcgagctt ccgaacctcg ataatatcat ctcgagcgcg 240
aaagtcaata cggtgacagc gcgcggccgg atacaatcca attacgctcg gctgcccggg 300
cgctcctggg gctcggggtc cggcggccga gggtccccct cagggcccgg tccaggccct 360
gtcgccaggg ttcagggcag gccccaccac gcgggggact ttggtggccc aggggtcccc 420
acgaggccgc agtccgggtc cgcccagccc caggctccta gaggaaagcc gagcctagtg 480
agtccctcca aggccgcccg cccgcaagac 510
<210> 28
<211> 501
<212> DNA
<213> Homo sapiens
<400> 28
acttgcgtta agttcggctc aggctactgg attgggcagg accagctaac ccaggtcccg 60
aggggcagtg tgtcacagac tgcagcccac tccaacctcg gctcctggag aaggggcgtc 120
gaatctctct tgggcatggg agggaaagac attccgagtt ggctgggcgg agtggcagcc 180
ttgagagtga cgagtgacag caaagcctcg tcctagcaag gccttttacc aacagcgcgg 240
catgcccttt cgaggagagc gccaggccct cgcactttgc aagtcaagag agcaaagaaa 300
gcggggacag ggcgcgtaat cgcaatgtcc ggtcgcgcgt gtgcacgtgt ctgtgtttgc 360
atgtgtgcgt gagcatgtgc acctgctcaa gtgtaaatgt gtctgttggc agttggggtc 420
taagtacctg agaatgtgtg tcttctgttg ctttaggaga ttaaaatgtc ttttcccagt 480
attgagctac attgaggaaa c 501
<210> 29
<211> 555
<212> DNA
<213> Homo sapiens
<400> 29
aagtccttgg actcggccga cagccgggcc atgttggctg tggagagagc ggacaggtgc 60
ggcggcggcg gcatctggca gatggtgcag gggcagggca gaccagccca gtgctggaag 120
ccgctgccca gctgcagcgc gggcggcgtg gagggcgcct tgagtagcga gtggggaggc 180
cggatggtgc cgatggcggg aagtgaggcg gcggacagcg gtgacgaggc gttgccagat 240
gagagcgcgc cgcccaagat ggggtgcacc gggtgcacgg agttggccgc gtgcgcgggg 300
tggccggccg agtggcccac ggtcccgcag tgaaaggccg agtggtggcc cccatagatc 360
tcgccaacca gcctcttcat ctcctccagg gagctggtga gcatgaggat gtagtttctg 420
gcgagcagga gtgtggcgat cttggagagc ttgcgcaccg acggcccatg cgcgtagggc 480
atgacttcgc gcagcccgtc catggctagg ttcaggtcgt gcatccgctt gcgttcgcgt 540
ccgttgatct tcagc 555
<210> 30
<211> 501
<212> DNA
<213> Homo sapiens
<400> 30
tcaggccagg aggtttctgg aaggaccggt gctgtctccc cgaacatcgt ggtctccccg 60
aacatcgcgg cctctccgaa catcgccctc tctccgagca acgcgatctc cccgaacatc 120
gcggtctccc cgaaaatcgc gatctccccg aacattgcca tctcaccgaa catcgcgatc 180
tcgccgaaca tgcccggctg aaggcactca gttcccctcc gcggctcctt tccgccgggt 240
ctgattcctg cggctgctgc ttgccccgca ggccaggagg cttctggtag caccggcgcg 300
atgcccccga acatcgcgtt ctaccccaac atcgcgatcc ctccgaacat cgtgatcccc 360
cccgaacatc gccgtccccc cgagtaacgc ggtctccccg aacatcgcgg tccccccgaa 420
catcgcggta cccccgaaca tcgccgtctc cccgtacatt gcgatccccc gaaacattgc 480
gatctccccg aacatcgcga t 501
<210> 31
<211> 516
<212> DNA
<213> Homo sapiens
<400> 31
ttggccagcc gcgcccggac tcctcagagc tggcgcaaac tccgtcctcc aaaactcggc 60
tctgggaggc ctaagtgact ccgaagccgg cggcagccgc ggcagcggcc gtggtggtgg 120
aagagctctt ttccccgaca gtgccactga tcgctcttca ctggagctgg aaacagcctt 180
cgcggaaagg accggagcat gcgttagaag cagagggagc ttggtgaagg gctcggctgg 240
aaggaggaaa cgccttctcg cagtgcgcgg ccagcccgcg ggggacaccg gcttgctgga 300
ctgcaggggc ccgtgccacc caggaagtga cctgcgggtc actcagccgg ggcgctgggc 360
gagcgcggga cggcccggag aattccgtgc ggctgcgacg ggaaaaggac gaggggtctc 420
tgtacccgac gctgccactg gcccaaagga attttacccg cgagcgccca ccccacccta 480
gcttgatgct tacgcccgca acaaaacagg aaacca 516
<210> 32
<211> 516
<212> DNA
<213> Homo sapiens
<400> 32
agacttcgaa ggcagccgga gaggagaggg cccaccgagc actacggcgg gtgcgcacgc 60
cccggggcgc tcggcaggac gacagtctgc acagcccgaa ggcggaaacg agcatcaact 120
gcacaaagtc ctggggtcct ggagcatccc ctccgcgtcc ttcctccctc tggggctggg 180
gacagccggg atgtcccagg ctgaggtggc caccagccga gcgcggctgc taggacgctg 240
gcgtggggag cgcggcgcgg aactacggac agtgagccct ggcgctcgct gccctgcgcc 300
ttaatttgct ggcggcggcg atcccggagg cccgcagcca gtcagcgccg tctcacgtca 360
ccgcttcctg attccgccgc cgggggcggg gccgcgggcc gggcgcggag ggcgcgccca 420
gggtgcggcg cccgcgtggc ctgtcgcccc ggctgttcgg taccccagca caggttcagg 480
gaaaagggtg ccaccactag gctgacgcag cagcca 516
<210> 33
<211> 501
<212> DNA
<213> Homo sapiens
<400> 33
agcgccggcc gccgcatccc gtgcggggcc gcggcgcgat gctgcgctgg aatgaggaag 60
cgcggcggcg aggggagggc ccgggcgcgg tgcgcgcggg ggtggcggcg gcgcgccgag 120
cgggcccggc gcgggcgagc gggctgcagc cggcggcggc gccagcaggt acggcccgca 180
cccgccgccg ccccggcggc ctttgggggc tgagccggag cccggcgcga ttgcaaagtt 240
ttcgtgcgcg gcccctctgg cccggagttg cggctgagac gcgcgccgcg cgagccgggg 300
gactcggcga cggggcgggg acgggacgac gcaccctctc cgtgtcccgc tctgcgccct 360
tctgcgcgcc ccgctccctg taccggagca gcgatccggg aggcggccga gaggtgcgcg 420
cggggccgag ccggctgcgg ggcaggtcga gcagggaccg ccagcgtgcg tcaccccaaa 480
gtttgcgggg tggcagggcg c 501
<210> 34
<211> 517
<212> DNA
<213> Homo sapiens
<400> 34
agagctcccg gagggcttgg ccggccaccg ccgcgcggcg ctgctcgggg actgctactt 60
tgcaaggcgg cggctgcccc tgcggggttc gggttgcagg gtcaagtgtc acgtcctccg 120
caatctccaa tattcctgta atgtatttaa atggacgaat tcattacgcg gggccgtgtg 180
aatggggcga ggccgcgagc gcggcgcgat cagtagcgcc cactaacagt tcgttctgca 240
cggcggagcg cgagaccgcg gacccacgga agccccctca atggtgtttg cgtcctcgcc 300
gccaccggct tggtagggtc ctttagggaa ggaggaagag ttcaggcacc cggacagatc 360
ctaatggtct ttctgatttt tctttccctt cggtccgctt tccccgcgac ctcctccacc 420
ctcagtccgc ctttcaaacg tcgtccgcgg ggatggctgc gcgatggaga aattggtctc 480
gtccagagac gcgcgcacag ccgtccccgc gcacacg 517
<210> 35
<211> 562
<212> DNA
<213> Homo sapiens
<400> 35
tctcagccac ctgattgatt tctcctctca ctccacccgc acccagtctc cgggtccagg 60
cctccagctc cctcacttct ggctcttctc accctgaatt ttctccttat attttttctt 120
tcttcctccg attggcagtc ccgcttctcc gagtggagtc gctcccgccc tctcgcgtcc 180
ccccctggct gcgctgcgac ctgcgaactc ccccagtttc cctcatctgc acaccctggt 240
gtagaccgac cgtgcgcgcc gggcccacgt gcagcctggg gactgcaggc tgggagctca 300
cggccatctc tcggccgcgc tcaccgcagc tcccctgtca cccggccccc tgtgaggagc 360
tctgttcccg cgctctcata taagcgccgg cacacagtag gcgctcaagg cctgcagaat 420
gagtgagcaa atatagctca gacacctact gaatgaaagt cggcaggttt gactagatcc 480
tggaatttaa aatttactga gcgccaccca tgtgcggggc tccacagagg tgatcctgga 540
aggaggcagc gttgtggggg tg 562
<210> 36
<211> 503
<212> DNA
<213> Homo sapiens
<400> 36
ctcagtgata accgaagagc tactctgaaa tgcccccctt ttcctggtgg tgcccgccag 60
ccggcagggg aaagcccgag ggacctccca gctccttccc ggatcgcggc ggaggtgtga 120
gcgatgtgtt gattattcat atttttaccg agcgcatact ctgctgcggc cggcgccgcc 180
acatttcaca cgtacactga cgtacccaca tgcacaagcg ctcactcggc cccgcacgca 240
agcagcgccc cgcgcgcccg gggccctcct cggataaggg aggggtgaca aaagtctccc 300
gctcactgct gcctacccac ccccaacccg gctgcctttt cctccaggcc cccacaaaca 360
cccttggctt tcagatccaa ctttcttcct cataatatac tagtcaccgc gactcccgcc 420
tcccggattt gaggatgggg gagactttgg cggcgggggt cagctgcaaa tatggcacca 480
tctagaattt cattccattt agc 503
<210> 37
<211> 701
<212> DNA
<213> Homo sapiens
<400> 37
agtattagca tagagaatcc agtaatgtgt cgacaacaag cagatagttc ccaaaatgcc 60
aacctgtttc aacaaagatg aaaacaccaa taaacgaaaa gtagaaaaac ctatgtggac 120
gcatcaacag atgctgaaaa ggcatttcct agaagtcggc agccaaactt ggtaattctt 180
gcgtgtgata aaggcagccg tctgttctgc tcagaagggg tttcctaaca ggaggggccg 240
aatgcaggcg tcacatccac gccgccccag gtcgtacacc taggccgtcc gggctgtccc 300
agagccgcag gccccgcatc atccgcgtcc ttagcgcggg gcgcggagcc cgcagccagg 360
tgcggccgag acccgcgcgc cagggaaagc ggcgcagcgg acgcgggaga aggctggtgg 420
gtacaggttg cctccgggcc ggagcgccca tgcagggcga gctgcgctcc gcacaaaatt 480
gcggtggggg cgccagaccg ccttgctccg cccctgagcg gggcgccccg gcccaacccc 540
ctgagggagg agggtccagg tgccgcagac tcttagcccc tggcccggcg tccgcccggc 600
aggttctggc actcctcgtt ggtaagcccc gttatttcgt gcgcagtgtt tacagaatat 660
aaagttcttc aggaaacgat gttataggag aaacgcctgg a 701
<210> 38
<211> 505
<212> DNA
<213> Homo sapiens
<400> 38
tcccccaacg ccggcgaata attttaaagc aaaggaggcg cggccaggtg ggctcccaag 60
ctccgcgcag acccttgggc cagccttggc cgctacccga gcgcctctcc accagacctt 120
ggagggaagt tgggggaagg gcgggagagc accggcgccc agggcgcagg ggccagagcg 180
agcctggcgt tccgccgcag ccggctgaga ctcggcgacg cgggggctgt acctgtggct 240
gcggggccga cggccggctg cagggcggct ggctctcccg cctcgagact aggcgcactc 300
ccatccccgc cgcatgttct ccacgcgggc tccagcgcgc tcaccaccgc caccgccgtc 360
gtctcggctt tatttaccca gcccggcgcg cgccgcccgg gaacaggaat agcgaggcct 420
tctcatgttt cctgactgcc ggtcccagcc ggcgaacatc ctgcgggcgc ggtatccacg 480
ttcccgggcg ggtggagagg aagcg 505
<210> 39
<211> 549
<212> DNA
<213> Homo sapiens
<400> 39
atcccagtaa gctctagcac ccgggcgcgg gtaacgggaa gcgcagaacc aaatccccag 60
cgcccaggtc acctccccag acccagcctt gcagggacca gggctttagg gctcacggac 120
ccaacggcca ggtcagaccg cgaaccggga ggagcgcggg ccccacccta aagagggcgc 180
agccgggagc tggggagcgg gtgccgcgct ccagagattg tgtcgtgggc gccgtcctag 240
tggcggggag cgcacctccg agggggcatg agatcggaga aatcccttac gctggcggcg 300
ccgggggagg tccgtgggcc ggagggagag caacaggatg cgggagactt cccggaggcc 360
ggcgggggcg ggggctgctg tagtagcgag cggctggtga tcaatatctc cgggctgcgc 420
tttgagacac aattgcgcac cctgtcgctg tttccggaca cgctgctcgg agaccctggc 480
cggcgagtcc gcttcttcga ccccctgagg aacgagtact tcttcgaccg caaccggccc 540
agcttcgac 549
<210> 40
<211> 506
<212> DNA
<213> Homo sapiens
<400> 40
ccacgttggc cccatggcgg gagcggaggg cgtaggggaa ggagaggcgc gcgaggaggc 60
tgcggctgcc gcgaggtttg cgccaactct cccgccgcgc gagcgagccg aggcgcgctg 120
gaactagaga cccggcatgg agtgctgagg ggagggggga gccgtaaaaa agccaaagca 180
agccctcgac tcgcaagcac gcccccctcc tctccccagc gcactggtgt ttctggcggg 240
tgcctggcgg cgacgcgtcc aatcgcagcc cggcgcgggc gctaggtgac aggcggcgga 300
gcgcgcagac ccggctcccc gcgtcctctg aagaagggac tcgcgaggga gggagggagg 360
gagggcgggc ggcccggcgc ccctgccgag gccggggatg ctcatcgttg cccagagttg 420
gcccgaggag ccctctccgt tttcccaata cttttccctg catcagtgca gccatccccg 480
ccgcctttgt ctctccaact tttcca 506
<210> 41
<211> 560
<212> DNA
<213> Homo sapiens
<400> 41
ggctgcgcgg aagcagcggt gacagcagtg gctggactcg gagttggtgg gagggttagc 60
ggaggaggag agccggcagg cggtcccgga tgcaagtcac tgttgtccaa ggtcttactc 120
ttgcctttcc gaggggacaa cttccctcgg gctccagccc cagccccgac cccaccagag 180
gtcgaagctg tagagccccc tcccccggcg gcggcggcgg tggcggcggc agagaccgaa 240
gctccagtcc cggcgctgct ctttgacccc ttgaccctgg gcttgccctc gctttcgggc 300
catgacaggc ggctacccgc gcccttgccc ccgccggctt tggctccact cgtggtcacg 360
gtcttgcaag gcttgggagc cggcggagga ggcgccacct tgagcctccg gctgccggtg 420
ccagggtgcg gagaggatga gccagggatg ccgccgcccg cccggccttc gggctccggg 480
ccgccccagc tcgggctgct gagcaggggg cgccgggagg aggtgggggc gcccccaggc 540
ttggggtcgg ggctcagtcc 560
<210> 42
<211> 586
<212> DNA
<213> Homo sapiens
<400> 42
gggttcgaat cgaaaatgtc gacatcttgc taatggtctg caaacttccg ccaattatga 60
ctgacctccc agactcggcc ccaggaggct cgtattaggc agggaggccg ccgtaattct 120
gggatcaaaa gcgggaaggt gcgaactcct ctttgtctct gcgtgcccgg cgcgcccccc 180
tcccggtggg tgataaaccc actctggcgc cggccatgcg ctgggtgatt aatttgcgaa 240
caaacaaaag cggcctggtg gccactgcat tcgggttaaa cattggccag cgtgttccga 300
aggcttgtgc tgggcctggc ctccaggaga acccacgagg ccagcgctcc ccggaccccg 360
gcattaggcg ccagctgccg gctatctgcg gtctttttct ctctgcagac ccctcgatcc 420
tctttccttc ggtctcacac tcaacaaaag acagactaga gacgttgaaa gagcctgccc 480
ttcaacagag tcccagaaaa gggtgactta aggggaggag aagggagaaa gagggcagat 540
tccgggtcag aaaagaccca gataatttct ggcgtctctg aaatat 586
<210> 43
<211> 501
<212> DNA
<213> Homo sapiens
<400> 43
ataaagttcg attatttcac ctggcttgtc agtcacctat gcaggcgtct gagcccccgg 60
gtttccagga gccccccgta taaggacccc agggactcct ctccccacgc ggccgggccg 120
cccgcccggc ccccagcccg gagagctgcc accgaccccc tcaacgtccc aagccccagc 180
tctgtcgccc gcgttccttc ctcttcctgg gccacaatct tggctttccc gggccggctt 240
cacgcagttg cgcaggagcc cgcgggggaa gacctctcgt ggggacctcg agcacgacgt 300
gcgaccctaa atccccacat ctcctctgcc gcctcgcagg ccacatgcac cgggagccgg 360
gcggggcagg cgcggcccgc aaggaccccc gcgatggaga cgcaacactg ccgcgactgc 420
acttggggca gccccgccgc gtcccagccg cctcccggca ggaagcgtag gtgtgtgagc 480
cgacccggag cgagccgcgc c 501
<210> 44
<211> 528
<212> DNA
<213> Homo sapiens
<400> 44
agactccctc cttgggaacg tcgaactctc tctgccttgg ggagtggggc tcgataaagg 60
gtacctaggt cgcaccctgg caggggagca ctagagggcc gcgaggtccc gggtttcgcc 120
atcctgagac ccccgcgcgg atggcccagg aggggcgcgg cggccctgag tcaaggtggg 180
cgggggcagg tgcttccctc caccgcgttg tcctatgccg gcgcggtccc caccgcccga 240
cctagcccgg cgccggccga gcacggcggc cgcgcttcgc actccttcct cccaccgggt 300
ccgcaggccc ggcttcacga ttcccgggcc ctcgggcatg tgagggactt gagtgaatgc 360
agctccctca actcactccc gcaaaaccac agccaagagg gccttaagtc agagaacccg 420
gcctaggagc ctcccctaga gcctcggcgc gggccccttc cccttcccca catcggtcgg 480
ccgagggagc ctagagccgg tgggagacgg gcagcggcct ctcctgat 528
<210> 45
<211> 515
<212> DNA
<213> Homo sapiens
<400> 45
ggggctcgat aaagggtacc taggtcgcac cctggcaggg gagcactaga gggccgcgag 60
gtcccgggtt tcgccatcct gagacccccg cgcggatggc ccaggagggg cgcggcggcc 120
ctgagtcaag gtgggcgggg gcaggtgctt ccctccaccg cgttgtccta tgccggcgcg 180
gtccccaccg cccgacctag cccggcgccg gccgagcacg gcggccgcgc ttcgcactcc 240
ttcctcccac cgggtccgca ggcccggctt cacgattccc gggccctcgg gcatgtgagg 300
gacttgagtg aatgcagctc cctcaactca ctcccgcaaa accacagcca agagggcctt 360
aagtcagaga acccggccta ggagcctccc ctagagcctc ggcgcgggcc ccttcccctt 420
ccccacatcg gtcggccgag ggagcctaga gccggtggga gacgggcagc ggcctctcct 480
gatcctttcc tgcggtcata caagttccta gggtg 515
<210> 46
<211> 517
<212> DNA
<213> Homo sapiens
<400> 46
gcaaagcctc ccaagtcgtc taggcagtta gggagctctg cgcatttgcc agcacggagg 60
tacctcccgg ggcagggaca caacacatcg cccgagagtt tgtcccagcg agcgccgatt 120
tcgtccgcga tgcaagtaac tgagatcggg agctgtcccc ggcagagcgc actcacctcg 180
gtcccaggtg gactgaagtc cagagcggcg ctgtgcagct ggaagggcgc gcgatagctc 240
aagttagagg cggccccggg gcgcggcgca ggacacaaga cctcaaactg gtacttgcac 300
aggtagccgt tggcgcgcag gtggcatcgc atctccttcc agcctgcggg ctcgacccca 360
ccggtggcct ggagtaccgc gcatctccgc gcggtgcagg agcgttgggg ctcctccacc 420
cactgcagcg tgtcgctttc gagaccgccg gggtcggagg acagccagga gaaaccccgc 480
aaaggctcgt tctccagggt gcagtgggaa cgcctgc 517
<210> 47
<211> 507
<212> DNA
<213> Homo sapiens
<400> 47
ccaggtggac tgaagtccag agcggcgctg tgcagctgga agggcgcgcg atagctcaag 60
ttagaggcgg ccccggggcg cggcgcagga cacaagacct caaactggta cttgcacagg 120
tagccgttgg cgcgcaggtg gcatcgcatc tccttccagc ctgcgggctc gaccccaccg 180
gtggcctgga gtaccgcgca tctccgcgcg gtgcaggagc gttggggctc ctccacccac 240
tgcagcgtgt cgctttcgag accgccgggg tcggaggaca gccaggagaa accccgcaaa 300
ggctcgttct ccagggtgca gtgggaacgc ctgcgctcca gtgcgaccca gaacagcagg 360
tctttggagc cccctccggg ccctgggcct gcccgcagga gcgcgagcac agcgcgcagc 420
tcggcgcccg cacgcacggt gctgagcgcc ccacctcgca ggatgcaggc ctcctcggcc 480
gcctgccgct tcatggtagc gtggtgc 507
<210> 48
<211> 517
<212> DNA
<213> Homo sapiens
<400> 48
agacttcttg ggagtttgca gagcgacccg tcgcccgcgc ccggcgctgg cagggacctt 60
cggatggttc ttactgggcc gatccatggc acaggctggg cctcggcgaa cccctcggcc 120
cccgcccggc cccgagccac gacacctcat tgtcctggag cctgggaagg gggtgcgcga 180
gcgcgcgggc gagccctgcc tctccccgcc agagaacagc tgaggggccg cggtcccagc 240
gggaggattc cggtccctgg cccggccgcg gccttgggcg gagcaggggc cactagctgc 300
cacttctgcc cgccccaggt gcgcgcggag ggctacgtgg ggcgggccgc gacccggcaa 360
agtcatgttg aaaaaacact cttcacgttc gctcggcctg gtgaccaggg tcggggacca 420
cgacaaccgg gggttgggag gctgcgtaat tacaacccag ggtggtttgg attttggggg 480
gtggtggata tttaaaaaca aaaaggagat ctggaag 517
<210> 49
<211> 550
<212> DNA
<213> Homo sapiens
<400> 49
aaccacagcc cgtgcgcctc ccgcagtggg agttcgccgg ccgactccca ccctcacagc 60
ctcctgtcct ggcttcccct cgcccgaggc tgcaacaccg catccccccc atcccccgcc 120
gcgccctcag cctcgggccg caccaaccca ggggataagg cgactccggt cgctctgagg 180
ggcagggcca gccagccccc tcccacccac gcacacgctc cccctcagag ccgccggccc 240
agagaaaaac cgccacatgc agctcccttc cacacgcacc taaacagctc ctctggaccc 300
gaacgcccac accctccctc cctggggtcc caaactccac tcaggacgcc acagcggatc 360
ctaactacaa acggtccccg gagccctggg ctggactcgc tcagccccgc ccccacgccc 420
ctggtaccag ccctgagaga ccccgcggag cacgccgcgg gagccgcaga tcgcgctgaa 480
gagcagcgag atcgcgctct ggacgagacc tgcgcggctg caaccgctcc ttcttcgcgg 540
gtggaagcgc 550
<210> 50
<211> 537
<212> DNA
<213> Homo sapiens
<400> 50
gaggaactcc ggcaaagcca ggcggcggcg gggctccggg tctgggcggc ggctccggag 60
gagcagcggg agaccccgca gcggcctcct ccttctccgc ccgcggcccc cagcctcgcc 120
gccgccgccc ggctcccagc acggaaccga cggggcgctc ccgagacggg cgagccacgc 180
gctcgcaggt cccaaggcca ggctgggcgg gactgttaag ggagctcgaa gtcgggggcc 240
gggggcttcc cgtcccggcg cttcccatgc aaacccctga aggaagcggc aggcgcagcc 300
gcgggctccg cagcccaggc ccacttcctg tcactccagg aaaacctcgg agcggcggac 360
gcggctcggc ccggcttcca gcccagagcc caagcgcctt agccccgtcc cagcgctttc 420
tgaaagacgg gccacctcgc gcggagccgc gacaaggact ccagggtccg cagtgaagct 480
ggtcaaatct gccccgcaca cggtcaacgc tcggtctgtg tcccggaagc tttcgga 537
<210> 51
<211> 592
<212> DNA
<213> Homo sapiens
<400> 51
ggatttcgat gaaatggtcc ctgaagttgt gctccttctg ggactccatc cttcagctct 60
ccaatctcaa cagcctgtat tctgttggga tggggtaaga ccggtgagcg acggtcaaac 120
gtctgtccca cgtggtaagg cgggaaccgc tgctgcctgt actgggggcg atactggggg 180
cggcgcagcc gattccgggc cccagagaac tgcctatcag tggtaggggg gtcaaatcct 240
tcactgctgc cgctcccttc ttcctcctcc tcccagcgta atcccgggga gggccatggc 300
gccttccata gtagccacgt ctgtaacggc gccaatcagc gcgtaacgac tgccctccac 360
aggaactcca ccccggccag tcacactggc tgcttctgca cccttctctc cttaaaccac 420
atcaaactct acagtttctc catctcctac actgcgcaga tatttctgtg ggttattctt 480
cttgatggca gtctgatgta taaatagatc ttctttggtg tcatttcgat ttataaatcc 540
atatccattt ctgacgttga accatttgac agtgccaagg actttggtgg cg 592
<210> 52
<211> 701
<212> DNA
<213> Homo sapiens
<400> 52
tctattgtat gtacgtgttg cagtcctttc atttgccaca acatatggat tccataaatg 60
cagacatgcc gaagtgcatc tgtctgggta gttaacatga tctaaacatc cctcttcgtt 120
ccgctaactc cggctcttct tcgggctcct cggcagcgct ccgggccagc cggcccgtgc 180
cccaggcttg cagcgcccgg cagcctcgtc cttttgtggt ctctgcacgg gatccaaggt 240
gccgcgcgga ggaggcgggc tgctcgcagt gccggggtca gaggcgccgc caccggcggc 300
ctctgcgcgc gcggggagga aagggttaag ctgcccgagc ccggggaagg ggctgctctc 360
atcctggagc gaggtgcagc caccggcagc tgtgatttag gggtcaagtc cgagatcacc 420
tttctcctgc ctctggaaat ggcagaagat gagataggga gggagaaact agagagtggc 480
agccaggcgc agcacgtggg ctccatccat ccgacacccc catcgccccg gtccactccc 540
tgacccccag acaaatcgga cagttccctt ttctggtaga gatgcggggt gcgcttcttc 600
tgagcgtccg gaatcgctcc atccaaggct ctgccctaag gttaagccac tgtgccctga 660
gcctcaacca ccagatctca aaagtttgct ctcaatgcgc c 701
<210> 53
<211> 531
<212> DNA
<213> Homo sapiens
<400> 53
aggtcggggc gggcttcgtt ggaagcgggt ggcagcgcgg gggggcacgc ctcgctctct 60
gtaagccact ggagagttgg ggcgagtagg gagaaggctg ggagtaaatc aaggggaggc 120
ggcgagaccg aggacccaat tcacggccct gaataacggg ggtagctggt aaggggcagc 180
tcccgggctt gcgcccagcc tcctccctgc acccaggccc gcgagggctc cccgcgatcc 240
gcgagttccc cgcgtggcct tcctcagccc gccgaggtcg cgtcttccct ccctttcggt 300
cccgccggcc cccggccggg ccctgacgtc ctgcgccctc cccgccgctc cgcagattac 360
cagagcgagt actacgggcc cgggggcaac tacgacttct tcccgcaagg ccccccgtcc 420
tcgcaggccc agacaccagt ggacctaccc ttcgtgccgt catctgggcc gtccgggacg 480
cccctgggtg gcctggagca cccgctgccg ggccaccacc cgtcgagcga g 531
<210> 54
<211> 554
<212> DNA
<213> Homo sapiens
<400> 54
tcactggctc tacagactgc cacgggtaaa cagcttagac cagatgactc aggctgaaag 60
catatgaacc cttcctgcag ggaggccgcc cggatgcaac agtggtttca cccctgagcc 120
gggcagcctt ggcagacctt gcttcacgtg ggctgaaatg gcagtgtctc tcctctttgt 180
ggccaggttt tgcctcctct ttgactcgga agcatctcct atcctgcaag gacagtttga 240
gcagggcccc cgggccctcc ttccaagagg cttctgcagc tgtggacccc caagagttta 300
tgccgctgag ctctgctgtc tctccccacc tgctccccac ctgtctgccc ccacacctgc 360
gactctggct ctcctggatc ctcctgtaga ctggttccta taagcacaag gaggaacatg 420
cgagatgctg ggattggatg ctctgggcct ggggctggtg tttcctcatg cccgggctat 480
ttccttttgg ccctgggcat gcagtcatgt gcttcctttc atgggcgggt tggggaccag 540
ggccagcgag caga 554
<210> 55
<211> 594
<212> DNA
<213> Homo sapiens
<400> 55
ggtgcccgtc tgtgtgtgct cctcccagca gccatcgctc aaccttgctc tcaggaagcc 60
cccaggcgag tgttggcagg aatcctgcca ggcgggaggt cgctcctcca gagcgtggtc 120
cctgaagccg ccagcctccc tggcctcgcc ccttgctggt ggtgtgtgtg gtgtggccgt 180
gggtgcactt tgctgggtct tcctgggaca ctgaagtctc ctgtgtctcc agccctgaga 240
actcggagcc cgggtgcttt tgggaaggac ggggcaccag ctggtgacac atgggaaggg 300
aggtgtggtt gtcaccttgc ccaggtaacc tgctctgcct ggtcggtgcg cctaaggggg 360
gcagggtgtt tggggaggac atgagaggcc tcctggaagc acttcatcct gttgaagttc 420
acattttgac cttttcagca gcccttgctc tgggcctgtg cccggccctg ggactcggcc 480
tggagagcct attgacaccg tgccatgggt gcgggcaggg cgccctccct ggagggcggc 540
acgtggtgcc agttggtgac catgagctgc ctcactcctg aggaagagtg ttcg 594
<210> 56
<211> 506
<212> DNA
<213> Homo sapiens
<400> 56
ccgctgcggg gatttctccc ccagcctttt ctttttaaca gagggcaaag gggcgacggc 60
gagagcacag atggcggctg cggagccggg gaggcggcgg ggagacgcgc gggactcgtg 120
gggagggctg gcagggtgca ggggttccgc gtgacctgcc cggctcccag gcatcgggct 180
gggcgctgca gtttaccgat ttgctttcgt ccctcgtcca ggtttaggag acgcgtgggg 240
acagccgagc cgcgccgggc ccctggacgg cgtcgccaag gagctgggat cgcacttgct 300
gcaggtagag cggcctcgcc gggggaggag cgcagccgcc gcaggctccc ttcccacccc 360
gccaccccag cctccaggcg tcccttcccc aggagcgcca ggcagatcca gaggctgccg 420
ggggctgggg atggggtggt ccccactgcg gagggatgga cgcttagcat gtcggatgcg 480
gcctgcggcc aaccctaccc taaccc 506

Claims (10)

1. An isolated mammalian-derived nucleic acid molecule that is a methylation marker of a pancreatic cancer-associated gene, the sequence of the nucleic acid molecule comprising (1) one or more or all of the sequences selected from: SEQ ID NO:1, SEQ ID NO:2, SEQ ID NO:3, SEQ ID NO:4, SEQ ID NO:5, SEQ ID NO:6, SEQ ID NO:7, SEQ ID NO:8, SEQ ID NO:9, SEQ ID NO:10, SEQ ID NO:11, SEQ ID NO:12, SEQ ID NO:13, SEQ ID NO:14, SEQ ID NO:15, SEQ ID NO:16, SEQ ID NO:17, SEQ ID NO:18, SEQ ID NO:19, SEQ ID NO:20, SEQ ID NO:21, SEQ ID NO:22, SEQ ID NO:23, SEQ ID NO:24, SEQ ID NO:25, SEQ ID NO:26, SEQ ID NO:27, SEQ ID NO:28, SEQ ID NO:29, SEQ ID NO:30, SEQ ID NO:31, SEQ ID NO:32, SEQ ID NO:33, SEQ ID NO:34, SEQ ID NO:35, SEQ ID NO:36, SEQ ID NO:37, SEQ ID NO:38, SEQ ID NO:39, SEQ ID NO:40, SEQ ID NO:41, SEQ ID NO:42, SEQ ID NO:43, SEQ ID NO:44, SEQ ID NO:45, SEQ ID NO:46, SEQ ID NO:47, SEQ ID NO:48, SEQ ID NO:49, SEQ ID NO:50, SEQ ID NO:51, SEQ ID NO:52, SEQ ID NO:53, SEQ ID NO:54, SEQ ID NO:55, SEQ ID NO:56, (2) a complementary sequence of (1), or (3) a treated sequence of (1) or (2), wherein the treatment converts unmethylated cytosine to a base that has a lower binding capacity to guanine than cytosine,
preferably, the nucleic acid molecule is used as an internal standard or control for detecting the level of DNA methylation of the corresponding sequence in a sample.
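As a minimal sketch of the treatment recited in claim 1, the following Python function models the conversion in silico. The claim does not name a reagent; a bisulfite-style conversion, in which unmethylated cytosine becomes uracil, is assumed here purely for illustration, and the function name and the toy methylation call are hypothetical. The input is the first ten bases of SEQ ID NO:45.

```python
def convert_unmethylated_c(seq, methylated_positions=frozenset()):
    """Return seq with every unmethylated cytosine replaced by uracil ('u').

    seq is a lowercase DNA string; methylated_positions holds the 0-based
    indices of cytosines assumed to be methylated (and therefore protected).
    """
    out = []
    for i, base in enumerate(seq):
        if base == "c" and i not in methylated_positions:
            out.append("u")   # unmethylated C is converted
        else:
            out.append(base)  # methylated C and all other bases are kept
    return "".join(out)


fragment = "ggggctcgat"  # first 10 bases of SEQ ID NO:45; CpG cytosine at index 6

print(convert_unmethylated_c(fragment, {6}))  # ggggutcgat
print(convert_unmethylated_c(fragment))       # ggggutugat
```

After amplification uracil is read as thymine, so the two outputs differ at index 6 (c versus t), which is the difference a methylation-specific primer, probe or sequencing read would pick up.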
2. A reagent for detecting DNA methylation, said reagent comprising a reagent for detecting the methylation level of a DNA sequence or fragment thereof, or the methylation status or level of one or more CpG dinucleotides in said DNA sequence or fragment thereof, in a sample from a subject, said DNA sequence being selected from one or more or all of the following gene sequences, or sequences within 20kb upstream or downstream thereof: DMRTA2, FOXD3, TBX15, BCAN, TRIM58, SIX3, VAX2, EMX1, LBX2, TLX2, POU3F3, TBR1, EVX2, HOXD12, HOXD8, HOXD4, TOPAZ1, SHOX2, DRD5, RPL9, HOPX, SFRP2, IRX4, TBX18, OLIG3, ULBP1, HOXA13, TBX20, IKZF1, INSIG1, SOX7, EBF2, MOS, MKX, KCNA6, SYT10, AGAP2, TBX3, CCNA1, ZIC2, CLEC14A, OTX2, C14orf39, BNC1, AHSP, ZFHX3, LHX1, TIMP2, ZNF750, SIM2,
preferably,
the DNA sequence is selected from one or more or all of the following sequences or the complementary sequences thereof: SEQ ID NO:1, SEQ ID NO:2, SEQ ID NO:3, SEQ ID NO:4, SEQ ID NO:5, SEQ ID NO:6, SEQ ID NO:7, SEQ ID NO:8, SEQ ID NO:9, SEQ ID NO:10, SEQ ID NO:11, SEQ ID NO:12, SEQ ID NO:13, SEQ ID NO:14, SEQ ID NO:15, SEQ ID NO:16, SEQ ID NO:17, SEQ ID NO:18, SEQ ID NO:19, SEQ ID NO:20, SEQ ID NO:21, SEQ ID NO:22, SEQ ID NO:23, SEQ ID NO:24, SEQ ID NO:25, SEQ ID NO:26, SEQ ID NO:27, SEQ ID NO:28, SEQ ID NO:29, SEQ ID NO:30, SEQ ID NO:31, SEQ ID NO:32, SEQ ID NO:33, SEQ ID NO:34, SEQ ID NO:35, SEQ ID NO:36, SEQ ID NO:37, SEQ ID NO:38, SEQ ID NO:39, SEQ ID NO:40, SEQ ID NO:41, SEQ ID NO:42, SEQ ID NO:43, SEQ ID NO:44, SEQ ID NO:45, SEQ ID NO:46, SEQ ID NO:47, SEQ ID NO:48, SEQ ID NO:49, SEQ ID NO:50, SEQ ID NO:51, SEQ ID NO:52, SEQ ID NO:53, SEQ ID NO:54, SEQ ID NO:55, SEQ ID NO:56, or a variant having at least 70% identity thereto, and/or
The reagent is a primer molecule that hybridizes to said DNA sequence or a fragment thereof and is capable of amplifying said DNA sequence or a fragment thereof, and/or
The reagent is a probe molecule that hybridizes to said DNA sequence or a fragment thereof.
3. A medium carrying a DNA sequence or a fragment thereof and/or methylation information thereof, wherein the DNA sequence is (i) one or more or all of the following gene sequences, or a sequence within 20kb upstream or downstream thereof: DMRTA2, FOXD3, TBX15, BCAN, TRIM58, SIX3, VAX2, EMX1, LBX2, TLX2, POU3F3, TBR1, EVX2, HOXD12, HOXD8, HOXD4, TOPAZ1, SHOX2, DRD5, RPL9, HOPX, SFRP2, IRX4, TBX18, OLIG3, ULBP1, HOXA13, TBX20, IKZF1, INSIG1, SOX7, EBF2, MOS, MKX, KCNA6, SYT10, AGAP2, TBX3, CCNA1, ZIC2, CLEC14A, OTX2, C14orf39, BNC1, AHSP, ZFHX3, LHX1, TIMP2, ZNF750, SIM2, or (ii) a treated sequence of (i), wherein the treatment converts unmethylated cytosine to a base having a lower binding capacity to guanine than cytosine,
preferably,
the medium is used for alignment with gene methylation sequencing data to determine the presence, amount and/or methylation level of a nucleic acid molecule comprising the sequence or fragment thereof, and/or
The DNA sequence comprises a sense strand or an antisense strand of DNA, and/or
The length of the fragment is 1-1000bp, and/or
The DNA sequence is selected from one or more or all of the following sequences or the complementary sequences thereof: SEQ ID NO:1, SEQ ID NO:2, SEQ ID NO:3, SEQ ID NO:4, SEQ ID NO:5, SEQ ID NO:6, SEQ ID NO:7, SEQ ID NO:8, SEQ ID NO:9, SEQ ID NO:10, SEQ ID NO:11, SEQ ID NO:12, SEQ ID NO:13, SEQ ID NO:14, SEQ ID NO:15, SEQ ID NO:16, SEQ ID NO:17, SEQ ID NO:18, SEQ ID NO:19, SEQ ID NO:20, SEQ ID NO:21, SEQ ID NO:22, SEQ ID NO:23, SEQ ID NO:24, SEQ ID NO:25, SEQ ID NO:26, SEQ ID NO:27, SEQ ID NO:28, SEQ ID NO:29, SEQ ID NO:30, SEQ ID NO:31, SEQ ID NO:32, SEQ ID NO:33, SEQ ID NO:34, SEQ ID NO:35, SEQ ID NO:36, SEQ ID NO:37, SEQ ID NO:38, SEQ ID NO:39, SEQ ID NO:40, SEQ ID NO:41, SEQ ID NO:42, SEQ ID NO:43, SEQ ID NO:44, SEQ ID NO:45, SEQ ID NO:46, SEQ ID NO:47, SEQ ID NO:48, SEQ ID NO:49, SEQ ID NO:50, SEQ ID NO:51, SEQ ID NO:52, SEQ ID NO:53, SEQ ID NO:54, SEQ ID NO:55, SEQ ID NO:56, or a variant having at least 70% identity thereto,
more preferably,
the medium is a carrier printed with the DNA sequence or its fragment and/or its methylation information, and/or
The medium is a computer-readable medium having stored thereon the sequence or a fragment thereof and/or methylation information thereof and a computer program which, when executed by a processor, performs the steps of: comparing the methylation sequencing data of the sample with the sequence or fragment thereof, thereby obtaining the presence, amount and/or methylation level of a nucleic acid molecule comprising the sequence or fragment thereof in the sample, wherein the presence, amount and/or methylation level is used for diagnosing pancreatic cancer.
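The comparison step recited for the computer-readable medium can be sketched as follows. This is an illustration under stated assumptions, not the program of the application: reads are assumed to be bisulfite-converted, already aligned to the forward strand of the reference fragment at a known 0-based offset, and free of sequencing errors. The function names and the toy reads are hypothetical.

```python
def cpg_positions(reference):
    """0-based indices of the cytosine in every CpG dinucleotide."""
    return [i for i in range(len(reference) - 1) if reference[i:i + 2] == "cg"]


def methylation_levels(reference, aligned_reads):
    """Per-CpG methylation level = C calls / (C calls + T calls).

    aligned_reads is a list of (read_sequence, offset) pairs. CpG sites
    with no covering read are omitted from the result.
    """
    counts = {pos: [0, 0] for pos in cpg_positions(reference)}  # [C, T]
    for read, offset in aligned_reads:
        for pos in counts:
            idx = pos - offset
            if 0 <= idx < len(read):
                if read[idx] == "c":
                    counts[pos][0] += 1  # cytosine retained: methylated
                elif read[idx] == "t":
                    counts[pos][1] += 1  # read as thymine: unmethylated
    return {pos: c / (c + t) for pos, (c, t) in counts.items() if c + t > 0}


reference = "ggggctcgat"   # first 10 bases of SEQ ID NO:45
reads = [
    ("ggggttcgat", 0),     # CpG at index 6 methylated
    ("ggggtttgat", 0),     # CpG at index 6 unmethylated
    ("ttcgat", 4),         # partial read, CpG methylated
]
levels = methylation_levels(reference, reads)
print(levels)  # {6: 0.6666666666666666}
```

A per-region methylation level could then be taken as the mean of the per-CpG values, although the claim leaves the exact summary statistic open.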
4. Use of the following items (a) and/or (b) for the preparation of a kit for diagnosing pancreatic cancer in a subject,
(a) Reagents or devices for determining the methylation level of a DNA sequence or fragment thereof or the methylation state or level of one or more CpG dinucleotides in said DNA sequence or fragment thereof in a sample of a subject,
(b) A treated nucleic acid molecule of said DNA sequence or fragment thereof, said treatment converting unmethylated cytosine to a base having a lower binding capacity for guanine than cytosine,
wherein the DNA sequence is selected from one or more or all of the following gene sequences, or sequences within 20kb upstream or downstream thereof: DMRTA2, FOXD3, TBX15, BCAN, TRIM58, SIX3, VAX2, EMX1, LBX2, TLX2, POU3F3, TBR1, EVX2, HOXD12, HOXD8, HOXD4, TOPAZ1, SHOX2, DRD5, RPL9, HOPX, SFRP2, IRX4, TBX18, OLIG3, ULBP1, HOXA13, TBX20, IKZF1, INSIG1, SOX7, EBF2, MOS, MKX, KCNA6, SYT10, AGAP2, TBX3, CCNA1, ZIC2, CLEC14A, OTX2, C14orf39, BNC1, AHSP, ZFHX3, LHX1, TIMP2, ZNF750, SIM2,
preferably, the fragment is 1-1000bp in length.
5. The use according to claim 4, wherein the DNA sequence is selected from one or more or all of the following sequences or the complementary sequences thereof: SEQ ID NO:1, SEQ ID NO:2, SEQ ID NO:3, SEQ ID NO:4, SEQ ID NO:5, SEQ ID NO:6, SEQ ID NO:7, SEQ ID NO:8, SEQ ID NO:9, SEQ ID NO:10, SEQ ID NO:11, SEQ ID NO:12, SEQ ID NO:13, SEQ ID NO:14, SEQ ID NO:15, SEQ ID NO:16, SEQ ID NO:17, SEQ ID NO:18, SEQ ID NO:19, SEQ ID NO:20, SEQ ID NO:21, SEQ ID NO:22, SEQ ID NO:23, SEQ ID NO:24, SEQ ID NO:25, SEQ ID NO:26, SEQ ID NO:27, SEQ ID NO:28, SEQ ID NO:29, SEQ ID NO:30, SEQ ID NO:31, SEQ ID NO:32, SEQ ID NO:33, SEQ ID NO:34, SEQ ID NO:35, SEQ ID NO:36, SEQ ID NO:37, SEQ ID NO:38, SEQ ID NO:39, SEQ ID NO:40, SEQ ID NO:41, SEQ ID NO:42, SEQ ID NO:43, SEQ ID NO:44, SEQ ID NO:45, SEQ ID NO:46, SEQ ID NO:47, SEQ ID NO:48, SEQ ID NO:49, SEQ ID NO:50, SEQ ID NO:51, SEQ ID NO:52, SEQ ID NO:53, SEQ ID NO:54, SEQ ID NO:55, SEQ ID NO:56, or a variant having at least 70% identity thereto.
6. The use according to claim 4 or 5,
the reagent comprises a primer molecule that hybridizes to said DNA sequence or fragment thereof, and/or
The reagent comprises a probe molecule that hybridizes to said DNA sequence or fragment thereof, and/or
The reagent comprises the medium of claim 3.
7. The use according to claim 4 or 5,
the sample is derived from a tissue, cell or body fluid of a mammal, e.g. from pancreatic tissue or blood, and/or
The sample comprises genomic DNA or cfDNA, and/or
The DNA sequence has been converted such that unmethylated cytosine is converted to a base having a lower binding capacity to guanine than cytosine, and/or
The DNA sequence is treated with a methylation sensitive restriction endonuclease.
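One way to picture the methylation-sensitive digestion option in claim 7 is sketched below. The claim does not name an enzyme; HpaII (recognition site CCGG, blocked when the internal cytosine is methylated) is used here only as a familiar example, and the function names and the methylation call are hypothetical. The fragment is bases 61 to 70 of SEQ ID NO:45.

```python
SITE = "ccgg"  # HpaII recognition sequence (illustrative enzyme choice)


def find_sites(seq, site=SITE):
    """Return 0-based start indices of every occurrence of site in seq."""
    width = len(site)
    return [i for i in range(len(seq) - width + 1) if seq[i:i + width] == site]


def protected_sites(seq, methylated_positions, site=SITE):
    """Sites left uncut because the internal cytosine (second base) is methylated."""
    return [i for i in find_sites(seq, site) if i + 1 in methylated_positions]


fragment = "gtcccgggtt"                  # bases 61-70 of SEQ ID NO:45
print(find_sites(fragment))              # [3]
print(protected_sites(fragment, {4}))    # [3]  methylated, stays intact
print(protected_sites(fragment, set()))  # []   unmethylated, would be cut
```

Because only unmethylated sites are cut, the amount of intact fragment remaining after digestion reflects the methylation level at those sites.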
8. The use according to claim 4 or 5, wherein said diagnosis comprises: comparing with a control sample, or calculating a score and diagnosing pancreatic cancer according to the score; preferably, the score is calculated by constructing a support vector machine model.
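The scoring step in claim 8 can be illustrated with a small, dependency-free linear support vector machine. This is a sketch under stated assumptions, not the model of the application: the claim does not specify the kernel, the input features or the training procedure, so a linear kernel trained by batch subgradient descent on the regularised hinge loss is assumed, and the methylation levels below are made-up toy values for two hypothetical marker regions.

```python
def train_linear_svm(samples, labels, lam=0.01, lr=0.1, epochs=2000):
    """Fit w, b by batch subgradient descent on the L2-regularised hinge loss."""
    n, dim = len(samples), len(samples[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w = [lam * wi for wi in w]  # gradient of the regulariser
        grad_b = 0.0
        for x, y in zip(samples, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:  # sample violates the margin
                for j in range(dim):
                    grad_w[j] -= y * x[j] / n
                grad_b -= y / n
        w = [wi - lr * gi for wi, gi in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def score(w, b, x):
    """Signed decision value; positive means the 'cancer' side of the hyperplane."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


# Toy methylation levels for two hypothetical marker regions (not patent data).
samples = [(0.8, 0.7), (0.9, 0.6), (0.7, 0.9), (0.1, 0.2), (0.2, 0.1), (0.15, 0.05)]
labels = [1, 1, 1, -1, -1, -1]  # +1 = cancer, -1 = control

w, b = train_linear_svm(samples, labels)
print(score(w, b, (0.85, 0.8)) > 0)  # True
print(score(w, b, (0.05, 0.1)) > 0)  # False
```

In practice the decision value would be compared against a threshold chosen on validation data; the zero threshold used here is only the default for a separable toy set.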
9. A kit for identifying pancreatic cancer, comprising:
(a) A reagent or device for determining the methylation level of a DNA sequence or fragment thereof or the methylation state or level of one or more CpG dinucleotides in said DNA sequence or fragment thereof in a sample from a subject, and
optionally (b) a treated nucleic acid molecule of said DNA sequence or fragment thereof, said treatment converting unmethylated cytosine to a base having less ability to bind guanine than cytosine,
wherein the DNA sequence is selected from one or more (e.g., at least 7) or all of the following gene sequences, or sequences within 20kb upstream or downstream thereof: DMRTA2, FOXD3, TBX15, BCAN, TRIM58, SIX3, VAX2, EMX1, LBX2, TLX2, POU3F3, TBR1, EVX2, HOXD12, HOXD8, HOXD4, TOPAZ1, SHOX2, DRD5, RPL9, HOPX, SFRP2, IRX4, TBX18, OLIG3, ULBP1, HOXA13, TBX20, IKZF1, INSIG1, SOX7, EBF2, MOS, MKX, KCNA6, SYT10, AGAP2, TBX3, CCNA1, ZIC2, CLEC14A, OTX2, C14orf39, BNC1, AHSP, ZFHX3, LHX1, TIMP2, ZNF750, SIM2,
preferably,
the DNA sequence is selected from one or more or all of the following sequences or the complementary sequences thereof: SEQ ID NO:1, SEQ ID NO:2, SEQ ID NO:3, SEQ ID NO:4, SEQ ID NO:5, SEQ ID NO:6, SEQ ID NO:7, SEQ ID NO:8, SEQ ID NO:9, SEQ ID NO:10, SEQ ID NO:11, SEQ ID NO:12, SEQ ID NO:13, SEQ ID NO:14, SEQ ID NO:15, SEQ ID NO:16, SEQ ID NO:17, SEQ ID NO:18, SEQ ID NO:19, SEQ ID NO:20, SEQ ID NO:21, SEQ ID NO:22, SEQ ID NO:23, SEQ ID NO:24, SEQ ID NO:25, SEQ ID NO:26, SEQ ID NO:27, SEQ ID NO:28, SEQ ID NO:29, SEQ ID NO:30, SEQ ID NO:31, SEQ ID NO:32, SEQ ID NO:33, SEQ ID NO:34, SEQ ID NO:35, SEQ ID NO:36, SEQ ID NO:37, SEQ ID NO:38, SEQ ID NO:39, SEQ ID NO:40, SEQ ID NO:41, SEQ ID NO:42, SEQ ID NO:43, SEQ ID NO:44, SEQ ID NO:45, SEQ ID NO:46, SEQ ID NO:47, SEQ ID NO:48, SEQ ID NO:49, SEQ ID NO:50, SEQ ID NO:51, SEQ ID NO:52, SEQ ID NO:53, SEQ ID NO:54, SEQ ID NO:55, SEQ ID NO:56, or a variant having at least 70% identity thereto, and/or
The kit is suitable for use according to any one of claims 6 to 8, and/or
The reagent comprises a primer molecule that hybridizes to said DNA sequence or fragment thereof, and/or
The reagent comprises a probe molecule that hybridizes to said DNA sequence or a fragment thereof, and/or
The reagent comprises the medium of claim 3, and/or
The sample is derived from a tissue, cell or body fluid of a mammal, e.g. from pancreatic tissue or blood, and/or
The DNA sequence has been converted such that unmethylated cytosine is converted to a base having a lower binding capacity to guanine than cytosine, and/or
The DNA sequence is treated with a methylation sensitive restriction endonuclease.
10. An apparatus for diagnosing pancreatic cancer, the apparatus comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor when executing the program implements the steps of:
(1) Obtaining the methylation level of a DNA sequence or fragment thereof or the methylation state or level of one or more CpG dinucleotides in said DNA sequence or fragment thereof in a sample from a subject, said DNA sequence being selected from one or more or all of the following gene sequences: DMRTA2, FOXD3, TBX15, BCAN, TRIM58, SIX3, VAX2, EMX1, LBX2, TLX2, POU3F3, TBR1, EVX2, HOXD12, HOXD8, HOXD4, TOPAZ1, SHOX2, DRD5, RPL9, HOPX, SFRP2, IRX4, TBX18, OLIG3, ULBP1, HOXA13, TBX20, IKZF1, INSIG1, SOX7, EBF2, MOS, MKX, KCNA6, SYT10, AGAP2, TBX3, CCNA1, ZIC2, CLEC14A, OTX2, C14orf39, BNC1, AHSP, ZFHX3, LHX1, TIMP2, ZNF750, SIM2,
(2) Comparing with a control sample, or calculating a score, and
(3) Diagnosing the pancreatic cancer based on the score,
preferably,
the DNA sequence is selected from one or more or all of the following sequences or the complementary sequences thereof: SEQ ID NO:1, SEQ ID NO:2, SEQ ID NO:3, SEQ ID NO:4, SEQ ID NO:5, SEQ ID NO:6, SEQ ID NO:7, SEQ ID NO:8, SEQ ID NO:9, SEQ ID NO:10, SEQ ID NO:11, SEQ ID NO:12, SEQ ID NO:13, SEQ ID NO:14, SEQ ID NO:15, SEQ ID NO:16, SEQ ID NO:17, SEQ ID NO:18, SEQ ID NO:19, SEQ ID NO:20, SEQ ID NO:21, SEQ ID NO:22, SEQ ID NO:23, SEQ ID NO:24, SEQ ID NO:25, SEQ ID NO:26, SEQ ID NO:27, SEQ ID NO:28, SEQ ID NO:29, SEQ ID NO:30, SEQ ID NO:31, SEQ ID NO:32, SEQ ID NO:33, SEQ ID NO:34, SEQ ID NO:35, SEQ ID NO:36, SEQ ID NO:37, SEQ ID NO:38, SEQ ID NO:39, SEQ ID NO:40, SEQ ID NO:41, SEQ ID NO:42, SEQ ID NO:43, SEQ ID NO:44, SEQ ID NO:45, SEQ ID NO:46, SEQ ID NO:47, SEQ ID NO:48, SEQ ID NO:49, SEQ ID NO:50, SEQ ID NO:51, SEQ ID NO:52, SEQ ID NO:53, SEQ ID NO:54, SEQ ID NO:55, SEQ ID NO:56, or a variant having at least 70% identity thereto, and/or
Step (1) comprises detecting the methylation level of said sequence in a sample by means of a nucleic acid molecule according to claim 1 and/or a reagent according to claim 2 and/or a medium according to claim 3, and/or
The sample comprises genomic DNA or cfDNA, and/or
The sequence has been converted such that unmethylated cytosine is converted to a base having a lower binding capacity to guanine than cytosine, and/or
The DNA sequence is treated with a methylation-sensitive restriction endonuclease, and/or
The score in step (2) is calculated by constructing a support vector machine model.

Priority Applications (8)

Application Number Priority Date Filing Date Title
CN202110679281.8A CN115491421A (en) 2021-06-18 2021-06-18 Pancreatic cancer diagnosis related DNA methylation marker and application thereof
PCT/CN2022/099311 WO2022262831A1 (en) 2021-06-18 2022-06-17 Substance and method for tumor assessment
KR1020247001904A KR20240021975A (en) 2021-06-18 2022-06-17 Materials and Methods for Tumor Evaluation
CA3222729A CA3222729A1 (en) 2021-06-18 2022-06-17 Substance and method for tumor assessment
US18/571,373 US20240141442A1 (en) 2021-06-18 2022-06-17 Substance and method for tumor assessment
CN202280042761.6A CN117500942A (en) 2021-06-18 2022-06-17 Substances and methods for assessing tumors
AU2022292704A AU2022292704A1 (en) 2021-06-18 2022-06-17 Substance and method for tumor assessment
EP22824304.4A EP4372103A1 (en) 2021-06-18 2022-06-17 Substance and method for tumor assessment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110679281.8A CN115491421A (en) 2021-06-18 2021-06-18 Pancreatic cancer diagnosis related DNA methylation marker and application thereof

Publications (1)

Publication Number Publication Date
CN115491421A true CN115491421A (en) 2022-12-20

Family

ID=84465293

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110679281.8A Pending CN115491421A (en) 2021-06-18 2021-06-18 Pancreatic cancer diagnosis related DNA methylation marker and application thereof

Country Status (1)

Country Link
CN (1) CN115491421A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117344014A (en) * 2023-07-19 2024-01-05 上海交通大学医学院附属瑞金医院 Pancreatic cancer early diagnosis kit, method and device thereof

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: No. 6, Yikang Road, High-tech Industrial Development Zone, Yangzhou City, Jiangsu Province, 225012

Applicant after: Jiangsu Huayuan Biotechnology Co.,Ltd.

Address before: 201318 6th floor, building 1, Lane 500, Furonghua Road, Pudong New Area, Shanghai

Applicant before: Shanghai Fuyuan Biotechnology Co.,Ltd.