CN117385027A - Lung cancer specific methylation marker and application thereof in diagnosis of lung cancer
- Publication number
- CN117385027A; CN202210787412.9A
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
- C12Q1/6876—Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
- C12Q1/6883—Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material
- C12Q1/6886—Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material for cancer
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B25/00—ICT specially adapted for hybridisation; ICT specially adapted for gene or protein expression
- G16B25/20—Polymerase chain reaction [PCR]; Primer or probe design; Probe optimisation
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B30/00—ICT specially adapted for sequence analysis involving nucleotides or amino acids
- G16B30/10—Sequence alignment; Homology search
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B40/00—ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
- G16B40/10—Signal processing, e.g. from mass spectrometry [MS] or from PCR
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q2600/00—Oligonucleotides characterized by their use
- C12Q2600/154—Methylation markers
Abstract
Provided are a lung cancer specific methylation marker and its application in diagnosing lung cancer. The invention relates to the use of a reagent or component in the preparation of a kit or device for distinguishing lung cancer patients from non-lung cancer patients, or for tracing lung cancer as the tissue of origin during pan-cancer screening, wherein the reagent or component detects the methylation level of a lung cancer tissue-specific methylation marker, for example a marker associated with the gene ARHGEF16, such as SEQ ID NOs: 1-48. The markers are used to trace lung cancer as the tissue of origin during early pan-cancer screening, so that lung cancer can be better distinguished from other cancers.
Description
Technical Field
The invention belongs to the field of molecular-assisted diagnosis, and in particular relates to a lung cancer tissue-specific methylation marker and its application in diagnosing lung cancer.
Background
Lung cancer causes the highest cancer mortality worldwide. Although the combined use of surgery, chemotherapy, targeted therapy, and immunotherapy has significantly improved lung cancer survival, the prognosis of lung cancer patients remains relatively poor compared with other cancers. The main reason is that most lung cancers are diagnosed at a late stage, which reflects the lack of widespread early screening for lung cancer.
Cancer screening detects early cancer-related signals in high-risk populations, so that patients can be identified while the disease is still at an early stage, when surgical resection can be curative. Screening can therefore greatly reduce cancer mortality. About 85% of lung cancers are non-small cell lung cancers (NSCLC). The five-year survival rate of patients with early-stage carcinoma in situ is as high as 55.6%, but metastasis occurs readily in the middle and late stages, and the five-year survival rate after metastasis is only 4.5%. Early-stage NSCLC is usually asymptomatic, and more than 80% of NSCLC patients are already in the middle or late stage at diagnosis, with lymph node spread or distant metastasis and correspondingly lower survival (Weichert W et al., 2014). From 1990 to 2015, overall cancer mortality in the United States fell by 25%, with a decline of up to 45% for lung cancer in men. An important part of the reason for this reduction in cancer mortality is the widespread use of cancer screening techniques (Byers T et al., 2016).
Traditional cancer screening methods include endoscopy, imaging (CT, MRI, etc.), and tumor markers (such as alpha-fetoprotein, used clinically to assist in diagnosing primary liver cancer; carcinoembryonic antigen, a broad-spectrum tumor marker; and the cytokeratin 19 fragment CYFRA 21-1, a tumor marker for lung cancer), but these methods have limitations. For example, the most widely used early screening method for lung cancer in clinical practice is low-dose CT (LDCT). Although LDCT can detect early-stage NSCLC to some extent, its specificity is low, and confirming a positive result requires long follow-up, repeated re-examination, or other diagnostic means, which can significantly increase patient suffering and waste medical resources through overdiagnosis. Existing tumor markers generally perform poorly, can serve only as clinical references, and are difficult to apply to large-scale screening.
In recent years, liquid biopsy has become a hot research topic. It is based on cell-free DNA released by tumor cells into plasma (ctDNA). Compared with traditional methods, liquid biopsy offers convenient and non-invasive sampling, enables early pan-cancer screening, and can overcome tumor heterogeneity, and it has been widely applied. ctDNA reflects cancer information in several ways, such as mutations, fragment length distribution, and methylation. Among these, ctDNA methylation has become a focus of research and development for high-performance early cancer screening products, and there are already numerous applications of ctDNA methylation in early screening. One example is PanSeer, a pan-cancer methylation screening test that reaches 88% sensitivity at 96% specificity across 5 cancer types (gastric cancer, esophageal cancer, liver cancer, colorectal cancer, lung cancer) and can detect cancer up to 4 years earlier than traditional methods (Xingdong Chen et al., 2020).
Cancer screening, especially early pan-cancer screening, requires not only predicting whether a cancer signal is present but also tracing the tissue of origin of positive samples. Cancers arising at different sites in the human body have different methylation characteristics (Kundaje A et al., 2015), and these characteristics can be used for tissue-of-origin tracing. However, discovering tissue-specific methylation markers requires extensive methylation sequencing data from multiple cancer types together with stringent screening and validation procedures, which makes it a challenging task. There is a need in the art for tissue-specific methylation markers for lung cancer.
Disclosure of Invention
In view of the current lack of tissue-specific methylation markers for lung cancer in the art, the present inventors screened a large volume of next-generation sequencing (NGS) cfDNA methylation-targeted sequencing data covering 7 cancer types (colorectal cancer, liver cancer, lung cancer, gastric cancer, esophageal cancer, pancreatic cancer, breast cancer). The inventors used the methylation markers obtained from this screening to construct and validate a machine learning model, which is used to trace lung cancer as the tissue of origin during early pan-cancer screening, so that lung cancer can be better distinguished from other cancers.
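To illustrate how a model built on such markers might separate lung cancer samples from samples of other cancers, the following is a minimal sketch only. It assumes logistic regression as the model type, which this passage does not specify, and it runs on synthetic methylation levels. The function names, the number of markers, and the data are all hypothetical and do not come from the invention.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(X, y, lr=0.5, epochs=500):
    """Fit weights and bias by batch gradient descent on the log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(w, b, x):
    """Predicted probability that a sample is lung cancer (label 1)."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def make_samples(rng, n, mean, n_markers=5):
    # Synthetic per-marker methylation levels, clipped to [0, 1].
    return [
        [min(1.0, max(0.0, rng.gauss(mean, 0.1))) for _ in range(n_markers)]
        for _ in range(n)
    ]


# Hypothetical data: lung samples are hypermethylated at the markers.
rng = random.Random(0)
X = make_samples(rng, 40, 0.7) + make_samples(rng, 40, 0.3)
y = [1] * 40 + [0] * 40

# Hold out every fourth sample as a test set; train on the rest.
train_idx = [i for i in range(len(X)) if i % 4 != 0]
test_idx = [i for i in range(len(X)) if i % 4 == 0]
w, b = train_logistic([X[i] for i in train_idx], [y[i] for i in train_idx])

correct = sum((predict_proba(w, b, X[i]) >= 0.5) == bool(y[i]) for i in test_idx)
accuracy = correct / len(test_idx)
print(f"held-out accuracy: {accuracy:.2f}")
```

Keeping a held-out test set separate from the training set mirrors the construct-then-validate workflow described above. Real cfDNA data would be far noisier than this synthetic example.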
In one aspect, the invention provides the use of a reagent or component in the preparation of a kit or device for (1) distinguishing between a lung cancer patient and a non-lung cancer patient, (2) diagnosing or aiding in the diagnosis of lung cancer; or (3) tissue traceability to lung cancer during a pan-cancer screening procedure, wherein the reagent or module comprises a reagent or module that detects the methylation level of a lung cancer tissue-specific methylation marker in the genomic DNA of the sample, said methylation marker being the region or locus thereof that is the gene that is 2.2kb upstream and 2.2kb downstream in the chromosome in which it is located: gene ARHGEF16; located in gene CASZ1; gene MAP3K6; gene TRIM58; gene ARHGEF33; gene PSD4; gene HOXD4; gene SLC12A8; gene DGKG; a gene TERT; gene NR2F1; the gene PCDHGC5; gene KCNMB1; gene FOXC1; gene HIST1H4F; gene TYW; the gene LRRC4; gene DGKI; gene PDLIM2; gene RHOBTB2; gene TMEM75; the gene OPLAH; gene NR5A1; gene SPAG6; gene WAPAL; gene BTBD16; gene DPYSL4; gene TTC40; gene ADAM8; gene SLC22a11; gene CPT1A; gene B4GALNT1; the gene FBRSL1; gene XPO4; gene TFDP1; gene GCH1; gene TMEM179; the gene ITPKA; gene SOX8; gene SLC9A3R2; gene SEPT-9; gene MBP; gene NFATC1; gene DNM2; gene RASAL3; gene TAF4; gene NTSR1; gene SLC17A9; or a complementary sequence or variant of either gene, provided that the methylation site in the variant is not mutated. In one embodiment, the length of the site is 120bp to 500bp, preferably 200bp to 480bp.
In one embodiment, the cancer or carcinoma other than lung cancer includes colorectal cancer, liver cancer, gastric cancer, esophageal cancer, pancreatic cancer, and/or breast cancer.
In one embodiment, the methylation marker comprises a nucleotide sequence set forth in any one or more of the following, or a complement or variant sequence thereof: SEQ ID NOS: 1-48.
In one embodiment, the reagent or component comprises a reagent or component used in one or more of the following methods of detecting methylation: bisulfite conversion-based PCR, DNA sequencing, methylation-sensitive restriction enzyme analysis, fluorescence quantification, methylation-sensitive high resolution melting curve, and chip-based methylation profile analysis and mass spectrometry.
In one embodiment, the reagent or component comprises primers and/or probes for detecting a methylation marker, and/or the sample is a cell, tissue, fine needle biopsy and/or plasma; preferably, the sample genomic DNA is free DNA in plasma.
In another aspect, the invention provides a method of constructing a predictive model for distinguishing lung cancer from other non-lung cancer cancers, comprising:
(1) Obtaining methylation levels of methylation markers in genomic DNA of lung cancer samples and non-lung cancer samples as a training set; the methylation marker is selected from the following regions or sites thereof, each region being the following gene together with the 2.2kb upstream region and the 2.2kb downstream region on the chromosome where the gene is located: gene ARHGEF16; gene CASZ1; gene MAP3K6; gene TRIM58; gene ARHGEF33; gene PSD4; gene HOXD4; gene SLC12A8; gene DGKG; gene TERT; gene NR2F1; gene PCDHGC5; gene KCNMB1; gene FOXC1; gene HIST1H4F; gene TYW; gene LRRC4; gene DGKI; gene PDLIM2; gene RHOBTB2; gene TMEM75; gene OPLAH; gene NR5A1; gene SPAG6; gene WAPAL; gene BTBD16; gene DPYSL4; gene TTC40; gene ADAM8; gene SLC22A11; gene CPT1A; gene B4GALNT1; gene FBRSL1; gene XPO4; gene TFDP1; gene GCH1; gene TMEM179; gene ITPKA; gene SOX8; gene SLC9A3R2; gene SEPT-9; gene MBP; gene NFATC1; gene DNM2; gene RASAL3; gene TAF4; gene NTSR1; gene SLC17A9; or a complementary sequence or variant of any one of the genes, provided that the methylation site in the variant is not mutated; and
(2) Constructing a logistic regression machine learning model using the methylation level data of the methylation markers.
In one embodiment, the length of the site is 120bp to 500bp, preferably 200bp to 480bp. In one embodiment, the non-lung cancer is colorectal cancer, liver cancer, gastric cancer, esophageal cancer, pancreatic cancer, and/or breast cancer.
In one embodiment, the methylation marker comprises a nucleotide sequence set forth in any one or more of the following, or a complement or variant sequence thereof: SEQ ID NOS: 1-48.
In one embodiment, the sample is a cell, tissue, fine needle biopsy, or plasma. In one embodiment, the genomic DNA is free DNA in plasma.
In one embodiment, step (1) comprises obtaining methylation sequencing data of the sample DNA.
In one embodiment, step (2) includes building a logistic regression model to obtain model prediction scores, training the model using the methylation levels of the obtained methylation markers as a training set, and determining the relevant threshold of the model from the samples of the training set. For example, the logistic regression model in the sklearn (v1.0.1) package for Python (v3.9.7) may be used. The formula of the model is as follows, where x is the methylation level value of the methylation marker in the sample, w is the coefficient of the methylation marker, b is the intercept value, and y is the model prediction value:
$$y = \frac{1}{1 + e^{-(w \cdot x + b)}}$$
The methylation levels of the obtained methylation markers can be used as a training set for training: AllModel.fit(TrainData, TrainPheno), where TrainData is the training set data and TrainPheno is the class label of each training set sample, with lung cancer coded as 1 and other cancer types coded as 0. The relevant threshold of the model may be determined from the samples of the training set.
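The training step can be made concrete with a small sketch. The code below is a dependency-free stand-in for a logistic regression fit, written only to illustrate the logistic function and the 1/0 labelling; the toy methylation values, the learning rate, the number of iterations, and the helper names `predict_score` and `fit_logistic` are illustrative assumptions and are not data or code from this application.

```python
import math

def predict_score(x, w, b):
    """Logistic score y = 1 / (1 + exp(-(w.x + b))) for one sample."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(train_data, train_pheno, lr=0.5, epochs=2000):
    """Fit coefficients w and intercept b by batch gradient descent on the log-loss."""
    n_features = len(train_data[0])
    n = len(train_data)
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, label in zip(train_data, train_pheno):
            err = predict_score(x, w, b) - label
            for j in range(n_features):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

# Hypothetical methylation levels (rows: samples, columns: two markers).
TrainData = [[0.80, 0.70], [0.75, 0.90], [0.90, 0.65],
             [0.10, 0.20], [0.15, 0.05], [0.20, 0.30]]
TrainPheno = [1, 1, 1, 0, 0, 0]  # 1 = lung cancer, 0 = other cancer type

w, b = fit_logistic(TrainData, TrainPheno)
train_scores = [predict_score(x, w, b) for x in TrainData]
```

With scikit-learn installed, the corresponding calls would be `LogisticRegression().fit(TrainData, TrainPheno)` followed by `predict_proba(...)[:, 1]`; note that scikit-learn applies L2 regularization by default, so its coefficients would differ from this unregularized sketch.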
In another aspect, a predictive model of lung cancer constructed according to the methods of the invention is provided.
In another aspect, there is provided an apparatus for diagnosing lung cancer comprising a memory and a processor for processing instructions stored in the memory, the instructions performing the method according to the present invention to construct a predictive model of lung cancer. The methylation level of the methylation marker in the genomic DNA of the sample to be tested is used as a test set to obtain a model prediction score, and the score is compared with the threshold to judge whether the sample is lung cancer: a score greater than the threshold is predicted as lung cancer; otherwise, the sample is predicted as another cancer type. The methylation level of a methylation marker in genomic DNA of a sample to be tested can be used as a test set: TestPred = AllModel.predict_proba(TestData)[:, 1], where TestData is the test set data and TestPred is the model prediction score.
In another aspect, a kit or device for detecting lung cancer tissue-specific methylation markers is provided, comprising reagents or components for detecting the status and/or level of one or more lung cancer tissue-specific methylation markers in genomic DNA from a sample, the lung cancer tissue-specific methylation markers being the following regions or sites thereof, each region being the following gene together with the 2.2kb upstream region and the 2.2kb downstream region on the chromosome where the gene is located: gene ARHGEF16; gene CASZ1; gene MAP3K6; gene TRIM58; gene ARHGEF33; gene PSD4; gene HOXD4; gene SLC12A8; gene DGKG; gene TERT; gene NR2F1; gene PCDHGC5; gene KCNMB1; gene FOXC1; gene HIST1H4F; gene TYW; gene LRRC4; gene DGKI; gene PDLIM2; gene RHOBTB2; gene TMEM75; gene OPLAH; gene NR5A1; gene SPAG6; gene WAPAL; gene BTBD16; gene DPYSL4; gene TTC40; gene ADAM8; gene SLC22A11; gene CPT1A; gene B4GALNT1; gene FBRSL1; gene XPO4; gene TFDP1; gene GCH1; gene TMEM179; gene ITPKA; gene SOX8; gene SLC9A3R2; gene SEPT-9; gene MBP; gene NFATC1; gene DNM2; gene RASAL3; gene TAF4; gene NTSR1; gene SLC17A9; or a complementary sequence or variant of any one of the genes, provided that the methylation site in the variant is not mutated. In one embodiment, the length of the site is 120bp to 500bp, preferably 200bp to 480bp.
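The decision rule (a score above the threshold is called lung cancer, otherwise another cancer type) can be sketched as follows. The text does not state how the threshold is derived from the training set, so the choice of maximizing Youden's J (sensitivity + specificity - 1) is an assumption made purely for illustration, as are the scores and the function names.

```python
def choose_threshold(scores, labels):
    """Pick the training-set cutoff that maximizes Youden's J (an assumed criterion)."""
    pos = sum(labels)
    neg = len(labels) - pos
    best_t, best_j = None, -1.0
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s > t)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s <= t)
        j = tp / pos + tn / neg - 1.0
        if j > best_j:
            best_t, best_j = t, j
    return best_t

def call_tissue(score, threshold):
    """Scores greater than the threshold are called lung cancer, others another cancer type."""
    return "lung cancer" if score > threshold else "other cancer"

# Hypothetical training-set scores and labels (1 = lung cancer, 0 = other cancer type).
train_scores = [0.9, 0.8, 0.7, 0.3, 0.2, 0.6]
train_labels = [1, 1, 1, 0, 0, 0]
threshold = choose_threshold(train_scores, train_labels)

# Hypothetical test-set model prediction scores.
TestPred = [0.75, 0.40]
calls = [call_tissue(s, threshold) for s in TestPred]
```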
In one embodiment, the methylation marker comprises a nucleotide sequence set forth in any one or more of the following, or a complement or variant sequence thereof: SEQ ID NOS: 1-48.
In one embodiment, the sample is a cell, tissue, fine needle biopsy, or plasma. In one embodiment, the nucleic acid is free DNA in plasma.
In one embodiment, the reagent or component comprises a reagent or component used in one or more of the following methods: bisulfite conversion-based PCR, DNA sequencing, methylation-sensitive restriction enzyme analysis, fluorescence quantification, methylation-sensitive high resolution melting curve, and chip-based methylation profile analysis and mass spectrometry.
In one embodiment, the reagent comprises an oligonucleotide for detecting a methylation marker. In one embodiment, the oligonucleotide is a primer and/or probe.
In one embodiment, the primer is a primer that detects the methylation level/state of a site using methylation sequencing, or a PCR primer for amplifying one or more methylation sites.
In one embodiment, the reagent comprises bisulfite and derivatives thereof, PCR buffers, polymerase, dNTPs, primers, probes, methylation sensitive or insensitive restriction enzymes, cleavage buffers, fluorescent dyes, fluorescence quenchers, fluorescent reporters, exonucleases, alkaline phosphatase, internal standards, and/or controls that are the aforementioned specific methylation markers from normal subjects or cancer patients other than lung cancer. In one embodiment, the non-lung cancer is colorectal cancer, liver cancer, gastric cancer, esophageal cancer, pancreatic cancer, and/or breast cancer.
The present invention provides isolated nucleic acids that are one or more specific methylation markers. In one embodiment, the isolated nucleic acid is a lung cancer tissue-specific methylation marker. In one embodiment, the lung cancer tissue-specific methylation marker is one of the following regions or a site thereof, each region being the following gene together with the 2.2kb upstream region and the 2.2kb downstream region on the chromosome where the gene is located: gene ARHGEF16; gene CASZ1; gene MAP3K6; gene TRIM58; gene ARHGEF33; gene PSD4; gene HOXD4; gene SLC12A8; gene DGKG; gene TERT; gene NR2F1; gene PCDHGC5; gene KCNMB1; gene FOXC1; gene HIST1H4F; gene TYW; gene LRRC4; gene DGKI; gene PDLIM2; gene RHOBTB2; gene TMEM75; gene OPLAH; gene NR5A1; gene SPAG6; gene WAPAL; gene BTBD16; gene DPYSL4; gene TTC40; gene ADAM8; gene SLC22A11; gene CPT1A; gene B4GALNT1; gene FBRSL1; gene XPO4; gene TFDP1; gene GCH1; gene TMEM179; gene ITPKA; gene SOX8; gene SLC9A3R2; gene SEPT-9; gene MBP; gene NFATC1; gene DNM2; gene RASAL3; gene TAF4; gene NTSR1; gene SLC17A9; or a complementary sequence or variant of any one of the genes, provided that the methylation site in the variant is not mutated. In one embodiment, the length of the site is 120bp to 500bp, preferably 200bp to 480bp. In one embodiment, the methylation marker comprises a nucleotide sequence set forth in any one or more of the following, or a complement or variant sequence thereof: SEQ ID NOS: 1-48. In one embodiment, the isolated nucleic acid is isolated from a sample. In one embodiment, the sample is a cell, tissue, fine needle biopsy, or plasma. In one embodiment, the isolated nucleic acid is obtained from a lung cancer patient. For example, the isolated nucleic acid is obtained from free DNA in plasma.
In embodiments of aspects of the invention, the variant comprises a sequence having at least 70% identity to the sequence of any one of the genes. For example, a variant comprises a sequence that is at least 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical to the sequence of any one of the genes.
In embodiments of aspects of the invention, the region is the gene together with the 2.2kb upstream region and the 2.2kb downstream region on the chromosome where the gene is located. In one embodiment, the upstream region is the 2.1kb, 2kb, 1.9kb, 1.8kb, 1.7kb, 1.6kb, 1.5kb, 1.4kb, 1.3kb, 1.2kb, 1.1kb, 1kb, 900bp, 800bp, 700bp, 600bp, 500bp, 400bp, 300bp, 200bp, 100bp, 90bp, 80bp, 70bp, 60bp, 50bp, 40bp, 30bp, 20bp, 10bp or 5bp region upstream of the gene. The downstream region is the 2.1kb, 2kb, 1.9kb, 1.8kb, 1.7kb, 1.6kb, 1.5kb, 1.4kb, 1.3kb, 1.2kb, 1.1kb, 1kb, 900bp, 800bp, 700bp, 600bp, 500bp, 400bp, 300bp, 200bp, 100bp, 90bp, 80bp, 70bp, 60bp, 50bp, 40bp, 30bp, 20bp, 10bp or 5bp region downstream of the gene.
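As an illustration of how such a region could be derived from a gene annotation, the sketch below extends a gene by its upstream and downstream flanks. It assumes 1-based inclusive forward-strand coordinates and that "upstream" means the 5' side of the gene on its own strand; the function name and the coordinates are hypothetical.

```python
def marker_region(gene_start, gene_end, strand, upstream=2200, downstream=2200):
    """Return (start, end) on the forward strand for the gene plus its flanks."""
    if strand == "+":
        start, end = gene_start - upstream, gene_end + downstream
    elif strand == "-":
        # On the reverse strand the 5' (upstream) side has the higher coordinate.
        start, end = gene_start - downstream, gene_end + upstream
    else:
        raise ValueError("strand must be '+' or '-'")
    return max(1, start), end  # clamp at the start of the chromosome

# Hypothetical gene spanning positions 10,000 to 15,000.
region_plus = marker_region(10000, 15000, "+")
region_minus = marker_region(10000, 15000, "-", upstream=2200, downstream=500)
```

With symmetric 2.2kb flanks the strand does not change the result; it matters only when the shorter, asymmetric flanks listed above are used.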
In embodiments of aspects of the invention, the length of the sites may vary. In one embodiment, the length of the site may be 120bp to 500bp, preferably 200bp to 480bp. In one embodiment, the length of the site may be 130bp, 140bp, 150bp, 160bp, 170bp, 180bp, 190bp, 200bp, 210bp, 220bp, 230bp, 240bp, 250bp, 260bp, 270bp, 280bp, 290bp, 300bp, 310bp, 320bp, 330bp, 340bp, 350bp, 360bp, 370bp, 380bp, 390bp, 400bp, 410bp, 420bp, 430bp, 440bp, 450bp, 460bp, 470bp, 480bp, 490bp or 500bp.
In embodiments of aspects of the invention, a variant is a variant sequence having at least 70%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identity to a nucleotide sequence set forth in any one or more of the above.
In one aspect, the invention provides a method of (1) distinguishing a lung cancer patient from a non-lung cancer patient, (2) diagnosing or aiding in diagnosing lung cancer, or (3) tracing the tissue of origin to lung cancer during a pan-cancer screening procedure, comprising determining the methylation level of one or more methylation markers described herein in the genomic DNA of the sample. In one embodiment, the method is performed using the lung cancer prediction model of the present invention.
Advantages of the invention include:
1. the invention provides novel lung cancer tissue-specific methylation markers, which can be used to trace the tissue of origin to lung cancer during early pan-cancer screening, so as to better distinguish lung cancer;
2. the method is based on circulating tumor DNA (ctDNA) released by tumor cells into plasma, is non-invasive, and enables early screening of lung cancer;
3. the lung cancer tissue-specific methylation markers can detect lung cancer with high sensitivity and specificity.
Drawings
Fig. 1: Methylation levels of the selected lung cancer tissue-specific methylation markers in the training set.
Fig. 2: Methylation levels of the selected lung cancer tissue-specific methylation markers in the test set.
Fig. 3: Methylation level of lung cancer tissue-specific methylation marker SEQ ID NO: 1 in each cancer type of the training set.
Fig. 4: Methylation level of lung cancer tissue-specific methylation marker SEQ ID NO: 1 in each cancer type of the test set.
Fig. 5: Distribution of model scores for lung cancer and other cancer types in the training and test sets, using all lung cancer tissue-specific methylation markers.
Fig. 6: ROC curves for all lung cancer tissue-specific methylation markers in the training and test sets.
Fig. 7: Model scores for lung cancer tissue-specific methylation marker combination 1.
Fig. 8: ROC curve of the model for lung cancer tissue-specific methylation marker combination 1.
Fig. 9: Model scores for lung cancer tissue-specific methylation marker combination 2.
Fig. 10: ROC curve of the model for lung cancer tissue-specific methylation marker combination 2.
Detailed Description
The inventors screened lung cancer tissue-specific methylation markers from a large amount of NGS methylation sequencing data covering 7 cancer types. The markers achieve good tissue tracing performance in the related validation data and provide important technical support for tracing the tissue of origin to lung cancer during early pan-cancer screening.
Machine learning modeling is the process of finding the most appropriate representation of the input data features so that a specific problem, such as a classification problem, can be solved. The modeled data discriminate better than any individual input feature. The best model and the classification performance of each marker in the model are presented herein; the discrimination achieved by modeling with any combination of features lies between that of the best model and that of a single feature. As shown herein, each individual marker has discriminating power, and the results of classification with randomly selected markers are also shown in the examples of this patent. Thus, the present application protects any one of the markers, any combination of the markers, and the models constructed from them.
The inventors found that lung cancer is associated with the methylation level of the following gene regions or the regions upstream and downstream thereof: gene ARHGEF16; gene CASZ1; gene MAP3K6; gene TRIM58; gene ARHGEF33; gene PSD4; gene HOXD4; gene SLC12A8; gene DGKG; gene TERT; gene NR2F1; gene PCDHGC5; gene KCNMB1; gene FOXC1; gene HIST1H4F; gene TYW; gene LRRC4; gene DGKI; gene PDLIM2; gene RHOBTB2; gene TMEM75; gene OPLAH; gene NR5A1; gene SPAG6; gene WAPAL; gene BTBD16; gene DPYSL4; gene TTC40; gene ADAM8; gene SLC22A11; gene CPT1A; gene B4GALNT1; gene FBRSL1; gene XPO4; gene TFDP1; gene GCH1; gene TMEM179; gene ITPKA; gene SOX8; gene SLC9A3R2; gene SEPT-9; gene MBP; gene NFATC1; gene DNM2; gene RASAL3; gene TAF4; gene NTSR1; gene SLC17A9.
DNA methylation is an epigenetic mechanism and a common epigenetic modification of eukaryotic genomes that can alter gene expression without altering the DNA sequence. DNA methylation refers to the covalent attachment of a methyl group to carbon 5 of the cytosine in a genomic CpG dinucleotide under the action of a DNA methyltransferase. DNA methylation plays an important role in cell proliferation, differentiation, and development, is closely related to the occurrence and development of tumors, and is involved in transcriptional repression, chromatin structure regulation, X chromosome inactivation, genomic imprinting, and the like. Abnormal DNA methylation can contribute to tumor development and progression by affecting chromatin structure and the expression of oncogenes and tumor suppressor genes.
As used herein, "primer" refers to a nucleic acid molecule with a specific nucleotide sequence that guides synthesis at the initiation of nucleotide polymerization. Primers are typically two synthetic oligonucleotide sequences, one complementary to one strand of the DNA template at one end of the target region and the other complementary to the other strand of the DNA template at the other end of the target region, which serve as starting points for nucleotide polymerization. Primers designed in vitro are widely used in polymerase chain reaction (PCR), qPCR, sequencing, probe synthesis, etc. Typically, the primers are designed to amplify a product of 50-150bp, 60-140bp, 70-130bp, or 80-120bp in length. The primers contained in the reagents herein may be genome sequencing primers, such as whole genome sequencing primers or sequencing primers directed to a region of the genome, or PCR primers for amplifying a specific region or PCR primers for amplifying one or more methylation sites in a region. The primer may be a whole genome sequencing primer, which may yield a number of amplification products that contain the region or that contain the region after assembly. Based on the whole genome sequencing results, the methylation status of each methylation site (CpG) in the region is obtained, thereby giving the methylation level of the entire region. The primer is complementary or substantially complementary to the gene or region of interest.
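The relationship between a primer pair and its amplification product can be illustrated with a short sketch: the forward primer matches the template strand as written, the reverse primer is complementary to it, and the product spans from the start of the forward primer to the end of the reverse primer site. The sequences and function names below are invented for illustration and are not primers from this application.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]

def amplicon_length(template, forward_primer, reverse_primer):
    """Length of the product bounded by the two primers, or None if either is absent."""
    start = template.find(forward_primer)
    if start == -1:
        return None
    site = reverse_complement(reverse_primer)  # where the reverse primer anneals
    end = template.find(site, start + len(forward_primer))
    if end == -1:
        return None
    return end + len(site) - start

# Hypothetical template assembled from a forward site, an insert, and a reverse site.
forward = "GACCATGC"
insert = "AAGGCTTAGCCGTACG"
reverse_site = "ATCGGATT"
template = "TT" + forward + insert + reverse_site + "CAAGG"
reverse = reverse_complement(reverse_site)

product_bp = amplicon_length(template, forward, reverse)
```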
As used herein, the term "variant" refers to a polynucleotide whose nucleic acid sequence is changed by insertion, deletion, or substitution of one or more nucleotides as compared to a reference sequence, while retaining its ability to hybridize to other nucleic acids. Variants of any of the embodiments herein include nucleotide sequences that have at least 70%, preferably at least 80%, preferably at least 85%, preferably at least 90%, preferably at least 95%, preferably at least 97% sequence identity to a reference sequence or reference gene and retain the methylation site of the reference sequence or reference gene. Sequence identity between two aligned sequences can be calculated using, e.g., NCBI BLASTn. Variants also include nucleotide sequences that have one or more mutations (insertions, deletions, or substitutions) in the nucleotide sequence of the reference sequence, while still retaining the methylation site of the reference sequence. A plurality of mutations generally refers to 1-10 mutations, such as 1-8, 1-5, or 1-3. The substitution may be between a purine nucleotide and a pyrimidine nucleotide, or may be between two purine nucleotides or between two pyrimidine nucleotides. The substitution is preferably a conservative substitution. For example, it is known in the art that conservative substitutions with nucleotides of similar properties generally do not alter the stability and function of the polynucleotide. Conservative substitutions include exchanges between purine nucleotides (A and G) and exchanges between pyrimidine nucleotides (T or U and C). Thus, substitution of one or several sites in a polynucleotide of the invention with residues of the same class will not substantially affect its activity. Furthermore, the methylation sites described herein contained in the variants of the invention are not mutated.
That is, the method of the present invention detects methylation at methylation sites in the corresponding sequence, and mutations may occur at bases other than those sites.
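The condition that a variant keeps its methylation sites can be checked programmatically. The sketch below handles only the simplest case, a substitution-only variant of the same length as the reference, which is an assumption made to keep the example short; variants with insertions or deletions would first need an alignment. The sequences are invented.

```python
def cpg_positions(seq):
    """0-based positions of the C in every CpG dinucleotide."""
    seq = seq.upper()
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] == "CG"]

def retains_methylation_sites(reference, variant):
    """True if every CpG of the reference is still a CpG at the same position."""
    if len(reference) != len(variant):
        raise ValueError("this sketch only handles substitution-only variants")
    variant = variant.upper()
    return all(variant[i:i + 2] == "CG" for i in cpg_positions(reference))

reference = "ATCGGACGTTAC"      # CpGs at positions 2 and 6
ok_variant = "ATCGAACGTTGC"     # substitutions outside the CpGs
bad_variant = "ATTGGACGTTAC"    # C>T substitution destroys the first CpG
```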
As used herein, the term "biological sample" or "sample" generally refers to a sample obtained or derived from a biological source of interest (e.g., a tissue or organism or cell culture). In some embodiments, the organism from which the sample is derived is an animal or a human, preferably a human. In some embodiments, the sample is or includes biological tissue or fluid. In some embodiments, the biological sample may be or include a cell, tissue, or body fluid. In some embodiments, the biological sample may be or include blood, blood cells, cell-free DNA, free floating nucleic acid, ascites, biopsy, surgical samples, cell-containing body fluids, sputum, saliva, stool, urine, cerebrospinal fluid, peritoneal fluid, pleural fluid, lymph, gynecological fluid, secretions, excretions, skin swabs, vaginal swabs, oral swabs, nasal swabs, lavage fluids such as catheter or bronchoalveolar lavage, aspirates, swabs, and the like. In some embodiments, the biological sample is or includes cells obtained from a single subject or from multiple subjects. The sample may be a "primary sample" obtained directly from a biological source, or may be a "treated sample".
As used herein, the term "cancer" is used to refer to a disease or disorder in which cells exhibit abnormal, uncontrolled and/or autonomous growth such that they exhibit an abnormally elevated proliferation rate and/or abnormal growth phenotype. In the present invention, the cancer of interest may be lung cancer.
As used herein, the term "diagnosis" refers to determining, quantitatively and/or qualitatively, the probability that a subject has or is at risk of developing cancer. For example, in the diagnosis of cancer, the diagnosis may include a determination of the risk, type, stage, malignancy, etc. of the cancer.
As used herein, the term "marker" is consistent with its use in the art and refers to an entity whose presence, level or form is associated with a particular biological event or state of interest, and thus is considered to be a "marker" of that event or state. One of skill in the art will recognize that in the context of a methylation marker, the methylation marker can be or include a locus (e.g., one or more methylation loci) and/or a state of a locus (e.g., a state of one or more methylation loci). The marker may be or include a marker of a particular disease, or may be a marker of a quantitative probability of a particular disease developing, or relapsing in a subject. Methylation markers of the invention may be markers for the prediction, prognosis and/or diagnosis of lung cancer.
As used herein, "DNA region" or "region" refers to any contiguous portion of a larger DNA molecule. In this context, a DNA region refers to a gene of interest and the regions upstream and downstream thereof. "Upstream" of a gene or region refers to the region located 5' of the gene or region. "Downstream" of a gene or region refers to the region located 3' of the gene or region.
As used herein, the term "identity" refers to the overall relatedness between nucleic acid molecules (e.g., DNA molecules and/or RNA molecules). Methods for calculating the percent identity between two provided sequences are known in the art. For example, the percent identity of two nucleic acids can be calculated as follows: alignment of the two sequences for optimal comparison purposes (e.g., gaps may be introduced in one or both of the first and second sequences for optimal alignment, and non-identical sequences may be omitted for comparison purposes); then comparing the nucleotides at the corresponding positions; when a position in a first sequence is occupied by the same residue (e.g., nucleotide or amino acid) as the corresponding position in a second sequence, then the molecules are identical at that position. The percent identity between two sequences is a function of the number of identical positions shared by the sequences (considering the number of gaps introduced for optimal alignment and the length of each gap). Comparison of sequences and determination of percent identity between two sequences may be accomplished using a computational algorithm such as BLAST (basic local alignment search tool).
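The identity calculation described above can be sketched for two sequences that have already been aligned. The example counts identical columns and divides by the alignment length, gaps included; this denominator is one common convention and is assumed here, since tools differ on how gaps are counted. The sequences are invented.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent of alignment columns holding the same residue; '-' marks a gap."""
    if len(aligned_a) != len(aligned_b) or not aligned_a:
        raise ValueError("aligned sequences must be non-empty and of equal length")
    matches = sum(
        1 for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if a == b and a != "-"
    )
    return 100.0 * matches / len(aligned_a)

# Hypothetical pairwise alignment with one mismatch and one gap.
seq_a = "ACGTACGTAC"
seq_b = "ACGTTCGT-C"
identity = percent_identity(seq_a, seq_b)
```

A variant would meet the 70% criterion stated earlier if `identity >= 70.0` for its alignment to the reference.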
As used herein, the term "methylation" includes (i) methylation of cytosine at position C5; (ii) methylation of cytosine at position N4; (iii) methylation of adenine at position N6; and (iv) other types of nucleotide methylation. Methylated nucleotides can be referred to as "methylated nucleotides" or "methylated nucleotide bases". In certain embodiments, methylation as described herein specifically refers to methylation of cytosine residues. In some cases, methylation refers to methylation of cytosine residues present in CpG sites.
As used herein, the term "methylation analysis" refers to any technique that can be used to determine the methylation state or level of a methylation site.
As used herein, the term "methylation marker" refers to a marker of at least one methylation site and/or the methylation state of at least one methylation site (e.g., a hypermethylation site). In particular, the methylation marker is characterized by a methylation state of one or more nucleic acid sites that changes between a first state and a second state (e.g., between a cancerous state and a non-cancerous state).
As used herein, "methylation state" refers to the number, frequency, or pattern of methylation sites within a methylation locus. Thus, the change in methylation state between the first state and the second state may be or include an increase in the number, frequency or pattern of methylation sites, or may be or include a decrease in the number, frequency or pattern of methylation sites. In each case, the change in methylation state is a change in methylation value.
As used herein, the term "methylation value" refers to a numerical representation of methylation status, for example, in the form of a number representing the frequency or ratio of methylation of a methylated locus. In some cases, the methylation value can be generated by a method comprising quantifying the amount of intact nucleic acid present in the sample after restriction digestion of the sample with a methylation-dependent restriction endonuclease. In some cases, the methylation value can be generated by a method comprising comparing amplification profiles of samples after bisulfite reaction. In some cases, methylation values can be generated by comparing the sequences of bisulfite-treated and untreated nucleic acids. In some cases, the methylation value is, or is based on, a quantitative PCR result. Herein, methylation level represents the proportion of one or more sites in a methylated state. The methylation level of a region (or group of sites) is the average of the methylation levels of all sites in the region (or all sites in the group). Thus, an increase or decrease in the methylation level of a region does not indicate an increase or decrease in the methylation level of all methylation sites in the region. The process of converting the results obtained by methods for detecting DNA methylation (e.g., reduced representation bisulfite sequencing) to methylation levels is known in the art. For example, the methylation level of CpG sites can be obtained using the software Bismark (v0.17.0).
Methods of detecting DNA methylation are known in the art and include, but are not limited to, bisulfite conversion-based PCR (e.g., methylation-specific PCR (MSP)), DNA sequencing (e.g., bisulfite sequencing (BS), whole-genome bisulfite sequencing (WGBS), reduced representation bisulfite sequencing (RRBS)), methylation-sensitive restriction enzyme analysis (Methylation-Sensitive Dependent Restriction Enzymes), fluorescent quantitation, methylation-sensitive high-resolution melting (MS-HRM), chip-based methylation profile analysis or mass spectrometry (e.g., time-of-flight mass spectrometry), and massively parallel sequencing techniques (e.g., next-generation sequencing techniques), e.g., sequencing by synthesis, real-time (e.g., single molecule) sequencing, bead emulsion sequencing, nanopore sequencing, and the like. In one or more embodiments, detecting includes detecting either strand at a gene or site. DNA methylation can also be detected using reduced representation bisulfite sequencing (RRBS). RRBS is a technique that uses restriction enzymes to cleave the genome, treats the fragments with bisulfite, and sequences the CpG-rich regions of the genome. For example, reagents used for RRBS include: plasma nucleic acid purification kit, ligase, bisulfite and derivatives thereof, dNTPs, polymerase, primers, nuclease-free water and/or magnetic beads, etc.
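The definition of a region's methylation level as the average over its sites can be written out directly. In the sketch below each site is represented by a (methylated reads, total reads) pair; the read counts and function names are invented for illustration. The second example shows a region whose level rises even though one of its sites falls, as the paragraph above notes.

```python
def site_level(methylated_reads, total_reads):
    """Proportion of reads supporting methylation at one CpG site."""
    if total_reads <= 0:
        raise ValueError("site has no coverage")
    return methylated_reads / total_reads

def region_level(site_counts):
    """Methylation level of a region: mean of the levels of all its sites."""
    levels = [site_level(m, t) for m, t in site_counts]
    return sum(levels) / len(levels)

# Hypothetical (methylated, total) read counts for three CpG sites.
sample_a = [(2, 10), (5, 10), (8, 10)]   # site levels 0.2, 0.5, 0.8
sample_b = [(6, 10), (9, 10), (6, 10)]   # site levels 0.6, 0.9, 0.6

level_a = region_level(sample_a)
level_b = region_level(sample_b)
```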
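The principle behind the bisulfite-based methods listed above is that bisulfite converts unmethylated cytosine to uracil, which is read as T after amplification, while methylated cytosine is protected and still reads as C. The sketch below simulates this on a single strand and then recovers the methylated positions by comparing treated and untreated sequences; the sequence and function names are invented, and complete conversion is assumed.

```python
def bisulfite_convert(seq, methylated_positions):
    """Unmethylated C reads as T after conversion; methylated C stays C."""
    methylated = set(methylated_positions)
    return "".join(
        "T" if base == "C" and i not in methylated else base
        for i, base in enumerate(seq.upper())
    )

def methylated_cytosines(untreated, treated):
    """Positions where a C in the untreated sequence is still a C after treatment."""
    return [
        i for i, (u, t) in enumerate(zip(untreated.upper(), treated.upper()))
        if u == "C" and t == "C"
    ]

untreated = "ACGTCCGA"                        # cytosines at positions 1, 4 and 5
treated = bisulfite_convert(untreated, [1])   # only position 1 is methylated
calls = methylated_cytosines(untreated, treated)
```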
As used herein, "specificity" of a marker refers to the percentage of samples characterized by the absence of an event or state of interest in which measurement of the marker accurately indicates the absence of the event or state of interest (true negative rate). In various embodiments, the characterization of the negative sample is independent of the marker and may be accomplished by any relevant measurement, such as any relevant measurement known to those of skill in the art. Thus, specificity reflects the probability that the marker will detect the absence of an event or state of interest when measured in a sample characterized by the absence of the event or state of interest. In certain embodiments in which the event or state of interest is lung cancer, specificity refers to the probability that a marker will detect the absence of lung cancer in a subject lacking lung cancer. The absence of lung cancer may be determined, for example, by histology.
As used herein, "sensitivity" of a marker refers to the percentage of a sample characterized by the presence of an event or state of interest, wherein measurement of the marker accurately indicates the presence of the event or state of interest (true positive rate). In various embodiments, the characterization of the positive sample is independent of the marker and may be accomplished by any relevant measurement, such as any relevant measurement known to those of skill in the art. Thus, sensitivity reflects the probability that a marker will detect the presence of an event or state of interest when measured in a sample characterized by the presence of the event or state of interest. In particular embodiments in which the event or state of interest is lung cancer, sensitivity refers to the probability that a marker will detect the presence of lung cancer in a subject having lung cancer. The presence of lung cancer may be determined, for example, by histology.
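Both definitions reduce to simple ratios over a confusion matrix: sensitivity is the true positive rate and specificity is the true negative rate. The sketch below computes them from reference labels and marker-based calls; the labels are invented and do not represent results from this application.

```python
def sensitivity_specificity(truth, predicted):
    """Return (sensitivity, specificity) for binary labels, 1 = lung cancer."""
    tp = sum(1 for t, p in zip(truth, predicted) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(truth, predicted) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(truth, predicted) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(truth, predicted) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical reference status (e.g., by histology) and marker-based calls.
truth = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
sens, spec = sensitivity_specificity(truth, predicted)
```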
The term "subject" as used herein refers to an organism, typically a mammal (e.g., a human). In one embodiment, the subject has cancer. In one embodiment, the subject has lung cancer.
Nucleic acid isolated from lung cancer patients
The invention provides isolated nucleic acids that are isolated from a sample of a subject. For example, the isolated nucleic acid is isolated from cell-free DNA in the plasma of a lung cancer patient. The isolated nucleic acid is one or more specific methylation markers, preferably lung cancer tissue-specific methylation markers. The methylation markers are the following regions, or sites within those regions, each region being one of the following genes together with the 2.2 kb upstream and the 2.2 kb downstream of that gene on the chromosome where it is located: gene ARHGEF16; gene CASZ1; gene MAP3K6; gene TRIM58; gene ARHGEF33; gene PSD4; gene HOXD4; gene SLC12A8; gene DGKG; gene TERT; gene NR2F1; gene PCDHGC5; gene KCNMB1; gene FOXC1; gene HIST1H4F; gene TYW; gene LRRC4; gene DGKI; gene PDLIM2; gene RHOBTB2; gene TMEM75; gene OPLAH; gene NR5A1; gene SPAG6; gene WAPAL; gene BTBD16; gene DPYSL4; gene TTC40; gene ADAM8; gene SLC22A11; gene CPT1A; gene B4GALNT1; gene FBRSL1; gene XPO4; gene TFDP1; gene GCH1; gene TMEM179; gene ITPKA; gene SOX8; gene SLC9A3R2; gene SEPT-9; gene MBP; gene NFATC1; gene DNM2; gene RASAL3; gene TAF4; gene NTSR1; gene SLC17A9. The site is a methylation site. It will be appreciated by those skilled in the art that mutations may be present in the genes of the genome, and thus it is contemplated that variants of these genes may also serve as methylation markers, provided that the methylation sites in the variants are not mutated. A variant may comprise a sequence having at least 70% identity to the sequence of any of these genes. The site selected as a marker may comprise 1 or more CpGs, for example 2 CpGs, 3 CpGs, 4 CpGs, 5 CpGs, 6 CpGs, 10 CpGs, 20 CpGs or 30 CpGs. Suitable sites may be 150 bp to 500 bp in length.
For example, the length of the site may be 160bp, 170bp, 180bp, 190bp, 200bp, 210bp, 220bp, 230bp, 240bp, 250bp, 260bp, 270bp, 280bp, 290bp, 300bp, 310bp, 320bp, 330bp, 340bp, 350bp, 360bp, 370bp, 380bp, 390bp, 400bp, 410bp, 420bp, 430bp, 440bp, 450bp, 460bp, 470bp, 480bp, 490bp or 500bp.
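By way of illustration only, the CpG count and length criteria described above can be checked for a candidate sequence with a few lines of Python. The function names and default arguments below are hypothetical; the sketch counts CG dinucleotides on the given strand and ignores letter case.

```python
def count_cpgs(sequence):
    """Count CG dinucleotides on the given strand, ignoring letter case."""
    return sequence.upper().count("CG")


def is_suitable_site(sequence, min_len=150, max_len=500, min_cpgs=1):
    """Check the length window (in bp) and the minimum number of CpGs."""
    return (min_len <= len(sequence) <= max_len
            and count_cpgs(sequence) >= min_cpgs)


# Hypothetical 200 bp toy sequence containing 50 CpGs.
toy_site = "ACGT" * 50
toy_ok = is_suitable_site(toy_site)
```

The same helper can be applied to any of the sequences listed at the end of this description.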
Those skilled in the art understand that a gene has the same or similar methylation level or status as the regions upstream and downstream of it. Thus, when the inventors found a methylation site within a particular gene, it was contemplated that the gene and the 2.2 kb upstream region and the 2.2 kb downstream region at its location on the chromosome also possessed the same or similar methylation level or status. The invention encompasses the gene of the invention and the 1.9kb, 1.8kb, 1.7kb, 1.6kb, 1.5kb, 1.4kb, 1.3kb, 1.2kb, 1.1kb, 1kb, 900bp, 800bp, 700bp, 600bp, 500bp, 400bp, 300bp, 200bp, 100bp, 90bp, 80bp, 70bp, 60bp, 50bp, 40bp, 30bp, 20bp, 10bp or 5bp upstream and downstream regions of the chromosome in which the gene is located.
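A minimal sketch of how a gene interval might be extended by a flanking distance on both sides is given below. It assumes 1-based inclusive coordinates; the function name, the optional chromosome length argument and the example coordinates are hypothetical. Because the same distance is added on both sides, the result does not depend on the strand of the gene.

```python
def flanked_region(start, end, flank=2200, chrom_length=None):
    """Extend the interval [start, end] by `flank` bp on each side.

    Coordinates are assumed 1-based and inclusive. The start is clamped
    at 1 and, if chrom_length is given, the end is clamped at it.
    """
    if start > end:
        raise ValueError("start must not exceed end")
    new_start = max(1, start - flank)
    new_end = end + flank
    if chrom_length is not None:
        new_end = min(chrom_length, new_end)
    return new_start, new_end


# Hypothetical gene interval, not a real annotation.
region = flanked_region(10000, 12000)
```

Smaller flanks such as 2 kb or 500 bp are obtained by changing the `flank` argument.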
In this context, the following nucleotide sequences are used as methylation markers in the present invention.
The coordinates of the chromosomal locations are determined with reference to the human whole genome sequence hg19. Based on the methylation markers screened for lung cancer tissue specificity and the genes in which they reside, one skilled in the art will appreciate that loci located within, or upstream or downstream of, any of the following genes can be used as methylation markers: ARHGEF16; CASZ1; MAP3K6; TRIM58; ARHGEF33; PSD4; HOXD4; SLC12A8; DGKG; TERT; NR2F1; PCDHGC5; KCNMB1; FOXC1; HIST1H4F; TYW; LRRC4; DGKI; PDLIM2; RHOBTB2; TMEM75; OPLAH; NR5A1; SPAG6; WAPAL; BTBD16; DPYSL4; TTC40; ADAM8; SLC22A11; CPT1A; B4GALNT1; FBRSL1; XPO4; TFDP1; GCH1; TMEM179; ITPKA; SOX8; SLC9A3R2; SEPT-9; MBP; NFATC1; DNM2; RASAL3; TAF4; NTSR1; SLC17A9. A single methylation marker, or a combination of several, may be used as a lung cancer specific methylation marker. In one embodiment, the methylation marker lies within the region from 2 kb upstream to 2 kb downstream of any of the genes described above.
Andy Feinberg, a pioneer in the field of epigenetics, has pointed out that most methylation changes in colon cancer occur neither in promoters nor in CpG islands, but in the sequences up to 2 kb away from them, termed "CpG island shores" (Andy Feinberg et al., 2009). CpG island shore methylation is closely related to gene expression, is highly conserved in mammals, and can distinguish tissue types. In subsequent studies, researchers found this phenomenon not only in colorectal cancer but also in breast cancer, gastric cancer and bladder cancer, and in some tissue types in the vicinity of these target methylation sites (Guo YL et al., 2016; Rao X et al., 2013; Dudziec E et al., 2011; Chae H et al., 2016). Therefore, protecting these neighboring regions is as important as protecting the target regions.
Kit for diagnosing lung cancer
Based on the methylation markers of the present invention, one skilled in the art can prepare kits or devices for detecting the methylation level or status of these markers in order to diagnose lung cancer or to distinguish lung cancer from other cancer types. The kit or device may comprise reagents or components to detect the status and/or level of one or more lung cancer tissue-specific methylation markers in nucleic acid from the sample. For example, the reagent or component may comprise a reagent or component used in one or more of the following methods: bisulfite conversion-based PCR, DNA sequencing, methylation-sensitive restriction enzyme analysis, fluorescence quantification, methylation-sensitive high-resolution melting, chip-based methylation profile analysis and mass spectrometry. The reagent may comprise an oligonucleotide for detecting a methylation marker. For example, the oligonucleotide is a primer and/or probe. Preferably, the primer is a primer for detecting the methylation level or state of a site using methylation sequencing, or a PCR primer for amplifying one or more methylation sites. Preferably, the reagent comprises bisulfite and derivatives thereof, PCR buffers, polymerase, dNTPs, primers, probes, methylation-sensitive or methylation-insensitive restriction enzymes, cleavage buffers, fluorescent dyes, fluorescence quenchers, fluorescence reporters, exonucleases, alkaline phosphatase, internal standards, and/or controls that are the aforementioned specific methylation markers from normal subjects or from patients with cancers other than lung cancer. Preferably, the non-lung cancer is colorectal cancer, liver cancer, gastric cancer, esophageal cancer, pancreatic cancer and/or breast cancer.
Methods for diagnosing lung cancer
The present invention provides a method of diagnosing lung cancer in a subject comprising: (1) determining the methylation status or level of one or more lung cancer tissue-specific methylation markers of the invention in a sample of the subject; and (2) identifying lung cancer based on the determined methylation status or level of the lung cancer tissue-specific methylation markers. In one embodiment, the subject is a cancer patient or a subject at risk for cancer. In one embodiment, the non-lung cancer is colorectal cancer, liver cancer, gastric cancer, esophageal cancer, pancreatic cancer, and/or breast cancer. In one embodiment, the sample is a cell, tissue, fine needle biopsy, or plasma. In one embodiment, the methylation level data may be obtained by any suitable method of determining the methylation level of a nucleic acid sequence, such as bisulfite conversion-based PCR, DNA sequencing, methylation-sensitive restriction enzyme analysis, fluorescence quantification, methylation-sensitive high-resolution melting, chip-based methylation profile analysis and mass spectrometry.
The present invention also provides a method for diagnosing lung cancer, comprising: (1) detecting in a sample from a subject the methylation level of a sequence described herein; (2) comparing with a control sample, or calculating a score; (3) identifying lung cancer in the subject based on the score. Typically, the method further comprises, prior to step (1): extracting the sample DNA and converting unmethylated cytosines in the DNA to bases that do not bind guanine. In one or more embodiments, the subject sample has an elevated or reduced methylation level when compared to a control sample. When the methylation level meets a certain threshold, lung cancer is identified. A mathematical analysis of the measured methylation levels is performed to obtain a score. For a tested sample, when the score is greater than the threshold, the result is judged to be lung cancer; otherwise the result is negative, namely a cancer other than lung cancer. Conventional methods of mathematical analysis and processes for determining thresholds are known in the art.
The invention also provides a method comprising: (1) obtaining the methylation level of a methylation marker described herein in genomic DNA of a lung cancer sample and a non-lung cancer sample; and (2) constructing a logistic regression machine learning model using the methylation level data of the methylation markers. The sample may be cells, tissue, a fine needle biopsy or plasma. The genomic DNA may be cell-free DNA in plasma. Step (1) may include obtaining methylation sequencing data of the sample DNA by a method that includes MethylTitan, and step (2) may include using the logistic regression model in the sklearn (V1.0.1) package in python (V3.9.7). The formula of the model is as follows, where x is the methylation level value of the target marker in the sample, w is the coefficient of the methylation marker, b is the intercept value, and y is the model predictive value:
$$y = \frac{1}{1 + e^{-(w \cdot x + b)}}$$
Training uses the methylation levels of the obtained methylation markers as the training set: AllModel.fit(TrainData, TrainPheno), where TrainData is the training set data and TrainPheno is the label of the training set samples (1 for lung cancer, 0 for other cancer types); the relevant threshold of the model is determined from the training set samples. The method further comprises using the methylation levels of the methylation markers in the genomic DNA of the sample to be tested as the test set: TestPred = AllModel.predict_proba(TestData)[:, 1], where TestData is the test set data and TestPred is the model predictive score. Whether a sample is lung cancer is judged by comparing the predictive score with the threshold: a score greater than the threshold is predicted as lung cancer, otherwise as another cancer type. The method may be used (1) to distinguish between a lung cancer patient and a non-lung cancer patient; (2) to diagnose or aid in diagnosing lung cancer; or (3) for tissue-of-origin tracing of lung cancer during pan-cancer screening.
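For illustration, the scoring and thresholding described in this paragraph can be sketched without any machine learning library, assuming the standard logistic (sigmoid) form of a logistic regression model. The coefficients, intercept, methylation values and threshold below are hypothetical placeholders, not trained values from the examples; in practice they would come from the fitted model.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def logistic_score(x, w, b):
    """Model score for one sample: sigmoid of the weighted sum plus intercept."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return sigmoid(z)


def classify(x, w, b, threshold):
    """A score above the threshold is called lung cancer, otherwise other cancer."""
    return "lung cancer" if logistic_score(x, w, b) > threshold else "other cancer"


# Hypothetical coefficients and methylation levels for two markers.
w = [2.0, -1.0]
b = 0.0
sample = [0.5, 0.2]
call = classify(sample, w, b, threshold=0.5)
```

With these placeholder values the weighted sum is 0.8, so the score is above 0.5 and the sample is called lung cancer.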
System or device for diagnosing lung cancer
The invention also provides a system or a device. The system or device may include a computer-readable storage medium or memory for storing programs or instructions. The programs or instructions are for executing a predictive model of the invention that distinguishes lung cancer from other, non-lung cancers, or for executing a method of the invention. Computer-readable storage media or memory include, but are not limited to, tangible storage media, carrier wave media, and physical transmission media. Non-volatile storage media include, for example, optical or magnetic disks, such as any storage device in any computer or the like; volatile storage media include dynamic memory, such as the main memory of a computer platform. Tangible transmission media include coaxial cables, copper wire and fiber optics, including the wires that comprise a bus within a computer system. Carrier wave transmission media can take the form of electrical or electromagnetic signals, or acoustic or light waves, such as those generated during radio frequency and infrared data communications. Thus, common forms of computer-readable media include, for example: a floppy disk, a flexible disk, a hard disk, magnetic tape, any other magnetic medium, a CD-ROM, DVD or DVD-ROM, any other optical medium, punch cards, paper tape, any other physical storage medium with patterns of holes, RAM, ROM, PROM and EPROM, FLASH-EPROM, any other memory chip or cartridge, a carrier wave transporting data or instructions, a cable or link transporting such a carrier wave, or any other medium from which a computer can read programming code and/or data. Many of these forms of computer-readable media may be involved in carrying one or more sequences of one or more instructions to a processor for execution. The memory and processor may be physically separate. In this case, the operative connection may be realized via a wired and/or wireless connection between the units that allows data transmission.
The wireless connection may use a Wireless LAN (WLAN) or the internet. The wired connection may be made through optical and non-optical cable connections between the units. The cable for wired connection is further suitable for high-throughput data transmission.
Use for diagnosing lung cancer
The invention also provides the use of an isolated nucleic acid or reagent or component in the preparation of a kit or device for (1) distinguishing between a lung cancer patient and a non-lung cancer patient; (2) diagnosing or aiding in diagnosing lung cancer; or (3) tissue-of-origin tracing of lung cancer during pan-cancer screening. Preferably, the non-lung cancer is colorectal cancer, liver cancer, gastric cancer, esophageal cancer, pancreatic cancer and/or breast cancer. The kit or device may contain reagents for determining the methylation level in a variety of available ways.
Examples
The invention will now be described in further detail with reference to the drawings and to specific examples. In the following examples, experimental procedures for which no specific conditions are given were generally carried out under conventional conditions.
Example 1: screening of lung cancer specific methylation sites by methylation targeted sequencing
The inventors collected samples from a total of 490 patients across the cancer types; all enrolled patients signed informed consent. The samples were divided into a training set and a test set in a fixed proportion, the training set being used to construct the machine learning model and the test set to test the performance of the model. Sample information is shown in Table 1 below; there are 51 lung cancer samples in the training set and 20 lung cancer samples in the test set.
TABLE 1 Number of plasma samples for each cancer type
Using MethylTitan™, developed by the applicant, methylation sequencing data of plasma cfDNA from the target samples were obtained, and DNA methylation classification markers were identified. The process is as follows:
1. Extraction of plasma cfDNA samples
A 2 ml whole blood sample was collected from each patient in a Streck blood collection tube. Plasma was separated by centrifugation in a timely manner (within 3 days), and after transport to the laboratory cfDNA was extracted with the QIAGEN QIAamp Circulating Nucleic Acid Kit according to the manufacturer's instructions.
2. Sequencing and data preprocessing
a) The library was subjected to paired-end sequencing using an Illumina Nextseq 500 sequencer.
b) Pear (v0.6.0) software was used to merge the paired-end reads of the same fragment, sequenced 150 bp from both ends on an Illumina Hiseq X10/Nextseq 500/Novaseq sequencer, into a single sequence, with a minimum overlap length of 20 bp and a minimum merged length of 30 bp.
c) The merged sequencing data were adapter-trimmed using trim_galore v0.6.0 and cutadapt v1.8.1 software. The adapter sequence "AGATCGGAAGAGCAC" was removed at the 5' end of the sequence, and bases with a sequencing quality value below 20 were removed from both ends.
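The end-quality criterion in step c) can be illustrated with the simplified Python sketch below, which strips bases from both ends of a read until a base with quality of at least 20 is reached. This is only an illustration of the idea under that simplifying assumption; it is not the trimming algorithm implemented in cutadapt or trim_galore, and the function name and toy read are hypothetical.

```python
def trim_low_quality_ends(bases, quals, min_q=20):
    """Strip leading and trailing bases whose quality is below min_q.

    bases : read sequence as a string
    quals : list of integer quality values, one per base
    """
    if len(bases) != len(quals):
        raise ValueError("bases and quals must have the same length")
    start, end = 0, len(bases)
    while start < end and quals[start] < min_q:
        start += 1  # drop low-quality bases at the left end
    while end > start and quals[end - 1] < min_q:
        end -= 1  # drop low-quality bases at the right end
    return bases[start:end], quals[start:end]


# Hypothetical toy read with low-quality bases at both ends.
trimmed = trim_low_quality_ends("ACGTAC", [10, 30, 35, 32, 15, 5])
```

Bases inside the read are kept even if their quality is low, because only the ends are examined.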
3. Sequencing data alignment
The reference genome data used herein are from the UCSC database (UCSC: HG19, http://hgdownload.soe.ucsc.edu/goldenPath/hg19/bigZips/hg19.fa.gz).
a) HG19 was first converted in silico, cytosine to thymine (CT) and guanine to adenine (GA), respectively, using Bismark software, and each converted genome was indexed using Bowtie2 software.
b) CT and GA conversion were likewise performed on the preprocessed sequencing data from the Illumina Nextseq 500 sequencer.
c) The converted sequences were aligned to the correspondingly converted HG19 reference genomes using Bowtie2 software, with a minimum seed length of 20 and no mismatches allowed in the seed.
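The in silico conversions used in the steps above can be illustrated with two short Python helpers. This is a sketch of the conversion idea only (every C replaced by T, or every G replaced by A); it is not the Bismark implementation, and the function names are hypothetical.

```python
def convert_c_to_t(seq):
    """CT conversion: replace every cytosine with thymine."""
    return seq.upper().replace("C", "T")


def convert_g_to_a(seq):
    """GA conversion: replace every guanine with adenine."""
    return seq.upper().replace("G", "A")


# Hypothetical toy sequence, converted both ways.
toy = "ACGTcg"
ct_version = convert_c_to_t(toy)
ga_version = convert_g_to_a(toy)
```

Aligning converted reads against a reference converted in the same way makes the alignment independent of methylation state; the methylation calls are then recovered by comparing the original read bases with the original reference.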
4. Calculation of Methylation Haplotype Frequencies (MHF)
For the CpG sites of each target region in HG19, the methylation state of each site is obtained from the alignment results. The nucleotide numbering of the sites herein corresponds to the nucleotide position numbering of HG19. A target methylation region may contain multiple methylation haplotypes, and the MHF value is calculated for each methylation haplotype within the target region according to the following formula:
$$\mathrm{MHF}_{i,h} = \frac{N_{i,h}}{N_i}$$
where i denotes the target methylation region, h denotes the target methylation haplotype, $N_i$ denotes the number of reads located in the target methylation region, and $N_{i,h}$ denotes the number of those reads that carry the target methylation haplotype.
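As a minimal sketch, and reading the symbol definitions above as the ratio of the number of reads carrying a haplotype to the number of reads in the region, the MHF of every haplotype observed in one region can be computed as follows. Encoding each read's haplotype as a string of 1 (methylated CpG) and 0 (unmethylated CpG) is an assumption made for illustration, as are the function name and the toy reads.

```python
from collections import Counter


def methylation_haplotype_frequencies(read_haplotypes):
    """Return {haplotype: MHF} for one target region.

    read_haplotypes : one string per read covering the region, e.g. "110",
                      where 1 = methylated CpG and 0 = unmethylated CpG.
    """
    n_i = len(read_haplotypes)  # reads located in the region
    if n_i == 0:
        return {}
    counts = Counter(read_haplotypes)  # reads per haplotype
    return {h: n_ih / n_i for h, n_ih in counts.items()}


# Hypothetical region with three CpG sites covered by four reads.
reads = ["111", "111", "110", "000"]
mhf = methylation_haplotype_frequencies(reads)
```

The frequencies of all haplotypes in a region sum to 1, so a fully methylated haplotype that dominates in one cancer type stands out directly in this representation.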
5. Methylation data matrix
a) The methylation sequencing data (methylation haplotype frequencies) of the samples in the training set and in the test set are combined into a data matrix for each set, and every site with a sequencing depth below 200 is treated as a missing value.
b) Sites with a missing-value rate above 10% are removed.
c) The remaining missing values in the data matrix are imputed using the KNN algorithm.
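Steps a) and b) above can be sketched in plain Python as follows, with rows as samples and columns as sites. The function names, the use of None to represent a missing value and the toy matrices are assumptions made for illustration; the KNN imputation of step c) is deliberately left out of this sketch.

```python
def mask_low_depth(values, depths, min_depth=200):
    """Replace a value with None wherever the depth at that site is below min_depth."""
    return [
        [v if d >= min_depth else None for v, d in zip(v_row, d_row)]
        for v_row, d_row in zip(values, depths)
    ]


def drop_sparse_sites(matrix, max_missing=0.10):
    """Remove columns (sites) whose fraction of missing values exceeds max_missing.

    Returns the filtered matrix and the indices of the columns kept.
    """
    n_samples = len(matrix)
    n_sites = len(matrix[0]) if matrix else 0
    keep = [
        j for j in range(n_sites)
        if sum(row[j] is None for row in matrix) / n_samples <= max_missing
    ]
    return [[row[j] for j in keep] for row in matrix], keep


# Hypothetical toy data: 2 samples x 3 sites.
values = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
depths = [[250, 100, 300], [220, 500, 199]]
masked = mask_low_depth(values, depths)
filtered, kept = drop_sparse_sites(masked)
```

In this toy case only the first site passes both criteria, since the other two sites are each missing in one of the two samples.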
6. Identification of lung cancer tissue-specific methylation markers from the training set samples
a) The AUC of each methylation haplotype marker for distinguishing lung cancer from the other cancer types is calculated in the training set, the markers are sorted from high to low AUC, and those that best distinguish lung cancer from the other cancer types are selected as candidate markers;
b) A logistic regression model is constructed on the training set using the methylation markers selected in the previous step, and the model is then validated on the test set samples. These steps rely mainly on the LogisticRegression class of the linear_model module of the Python 3 sklearn package, and proceed as follows:
1. StandardScaler is used to standardize the training set data, and the fitted standardization formula is saved. The formula is x' = (x - μ)/σ, where μ is the mean of all sample data and σ is the standard deviation of all sample data;
2. The standardized data are input into the logistic regression function to train a logistic regression model;
3. The saved standardization formula is applied to the test set data to standardize the test set;
4. The trained logistic regression model is applied to the test set samples for testing.
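The standardization in steps 1 and 3 can be illustrated for a single marker column with the standard library alone. The sketch fits the mean and the population standard deviation on the training values and reuses them unchanged on the test values, which is the behaviour assumed here for StandardScaler; the function names and toy values are hypothetical.

```python
import statistics


def fit_scaler(train_column):
    """Estimate mean and population standard deviation from training data only."""
    mu = statistics.mean(train_column)
    sigma = statistics.pstdev(train_column)
    return mu, sigma


def apply_scaler(column, mu, sigma):
    """Apply x' = (x - mu) / sigma using parameters saved from the training set."""
    if sigma == 0:
        return [0.0 for _ in column]  # constant column: no spread to scale by
    return [(x - mu) / sigma for x in column]


# Hypothetical methylation levels of one marker.
train = [0.0, 0.5, 1.0]
test = [0.5, 1.0]
mu, sigma = fit_scaler(train)
train_std = apply_scaler(train, mu, sigma)
test_std = apply_scaler(test, mu, sigma)
```

Reusing the training parameters on the test set keeps the test samples from influencing the model, which is why the fitted formula is saved in step 1.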
The methylation levels of these methylation markers in lung cancer and the other 6 cancer types are shown in Table 2 below and in Figures 1 and 2. In both the training set and the test set, the methylation levels of these markers differ significantly between lung cancer and the other cancer types (U test, p-value less than 0.05), and the magnitude of the difference is also large.
TABLE 2 Mean methylation levels of the methylation markers in lung cancer and the other 6 cancer types in the training and test sets
Taking the single lung cancer tissue-specific methylation marker Seq ID NO:1 as an example, the distributions of its methylation level across the seven cancer types in the training set and the test set are shown in Figures 3 and 4, respectively. The methylation level of this marker differs significantly between lung cancer and the other 6 cancer types (Wilcoxon test: P <= 0.05), indicating that it is a good lung cancer tissue-specific methylation marker.
Example 2: discrimination performance of single lung cancer tissue-specific methylation markers
To verify the potential of a single lung cancer tissue-specific methylation marker to distinguish lung cancer from the other 6 cancer types, a model was trained on the training set data of Example 1 using the methylation level data of a single lung cancer tissue-specific methylation marker, and the performance of the model was verified using the test set samples, as follows:
1. The logistic regression model in the sklearn (V1.0.1) package in python (V3.9.7) was used: AllModel = LogisticRegression(). The formula of the model is as follows, where x is the methylation level value of the target lung cancer tissue-specific methylation marker in the sample, w is the coefficient of the marker, b is the intercept value, and y is the model predictive value:
$$y = \frac{1}{1 + e^{-(w x + b)}}$$
2. Training is performed using the samples of the training set: AllModel.fit(TrainData, TrainPheno), where TrainData is the data of the target methylation sites in the training set samples and TrainPheno is the label of the training set samples (1 for lung cancer, 0 for other cancer types); the relevant threshold of the model is determined from the training set samples.
3. Testing is performed using the samples of the test set: TestPred = AllModel.predict_proba(TestData)[:, 1], where TestData is the data of the target methylation sites in the test set samples and TestPred is the model predictive score; the predictive score is used to judge whether a sample is lung cancer according to the above-mentioned threshold.
4. The AUC of the model is computed, and metrics such as sensitivity, specificity and accuracy are computed according to the determined threshold.
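The evaluation in step 4 can be sketched as follows. The AUC is computed here by direct pairwise comparison of lung cancer and non-lung cancer scores (ties count as one half), which is one standard way to obtain it and is not necessarily the routine used in the examples; the function names, scores, labels and threshold are hypothetical.

```python
def auc(scores, labels):
    """Area under the ROC curve by pairwise comparison.

    Equals the fraction of (positive, negative) pairs in which the positive
    sample has the higher score, counting ties as one half.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))


def accuracy_at_threshold(scores, labels, threshold):
    """Fraction of samples called correctly when score > threshold means lung cancer."""
    predicted = [1 if s > threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(predicted, labels)) / len(labels)


# Hypothetical model scores; label 1 = lung cancer, 0 = other cancer type.
scores = [0.9, 0.8, 0.3, 0.6, 0.2, 0.1]
labels = [1, 1, 1, 0, 0, 0]
model_auc = auc(scores, labels)
model_acc = accuracy_at_threshold(scores, labels, threshold=0.5)
```

In this toy case eight of the nine positive-negative pairs are ordered correctly, and four of the six samples are called correctly at the chosen threshold.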
The performance of the logistic regression models of the single lung cancer tissue-specific methylation markers in this example is shown in Table 3. All lung cancer tissue-specific methylation markers reach an AUC above 0.67 and an accuracy above 0.58 in both the test set and the training set, and are all good lung cancer tissue-specific markers. Among them, excellent markers such as Seq ID NO:45, Seq ID NO:23 and Seq ID NO:42 reach a sensitivity above 75% at a specificity above 80% in the test set, with an overall accuracy above 80%.
TABLE 3 Performance of the logistic regression models of single lung cancer tissue-specific methylation markers
Example 3: machine learning model for all target lung cancer tissue-specific methylation markers
In this example, a logistic regression machine learning model was constructed using the methylation levels of all 48 lung cancer tissue-specific methylation markers to accurately distinguish lung cancer samples within data from multiple cancer types. The specific procedure is consistent with Example 2, except that the data of all 48 target methylation markers are used for each sample. The procedure is as follows:
1. The logistic regression model in the sklearn (V1.0.1) package in python (V3.9.7) was used: AllModel = LogisticRegression(). The formula of the model is as follows, where x is the methylation level value of the target methylation markers in the sample, w is the coefficient of the different methylation markers, b is the intercept value (the parameters are obtained by training the logistic regression model), and y is the model predictive score:
$$y = \frac{1}{1 + e^{-(w \cdot x + b)}}$$
2. Training is performed using the samples of the training set: AllModel.fit(TrainData, TrainPheno), where TrainData is the training set data (methylation haplotype frequencies) and TrainPheno is the label of the training set samples (1 for lung cancer, 0 for other cancer types); the relevant threshold of the model is determined from the training set samples.
3. Testing is performed using the samples of the test set: TestPred = AllModel.predict_proba(TestData)[:, 1], where TestData is the test set data (methylation haplotype frequencies) and TestPred is the model predictive score, which is used to judge whether a sample is lung cancer according to the above-mentioned threshold.
The distribution of model predictive scores in the training and test sets is shown in Fig. 5, from which it can be seen that the model scores of lung cancer and the other cancer types differ significantly (Wilcoxon test: P <= 0.05). The ROC curve is shown in Fig. 6. In the test set, the AUC for distinguishing lung cancer from the other cancer types reaches 0.903. With the threshold set to 0.336, a sample scoring above the threshold is predicted as lung cancer and otherwise as another cancer type; at a specificity of 94.7% the sensitivity reaches 80.0%, and the overall prediction accuracy reaches 85.0%, so lung cancer samples can be well distinguished among samples of the 7 cancer types.
Example 4: lung cancer tissue-specific methylation marker combination 1 machine learning model
To verify the effect of combinations of lung cancer tissue-specific methylation markers, 10 markers were randomly selected in this example from all 48 lung cancer tissue-specific methylation markers, and a new machine learning model was constructed based on the methylation level data of Seq ID NO:2, Seq ID NO:5, Seq ID NO:9, Seq ID NO:13, Seq ID NO:24, Seq ID NO:37, Seq ID NO:39, Seq ID NO:41, Seq ID NO:46 and Seq ID NO:48.
The machine learning model construction method was also consistent with Example 2, except that only the data of these 10 lung cancer tissue-specific methylation markers were used for each sample. The model scores in the training set and test set are shown in Fig. 7, and the ROC curves of the model are shown in Fig. 8. In both the training set and the test set, the scores of lung cancer samples differ significantly from those of the other cancer types (Wilcoxon test: P <= 0.05), and the AUC of the model in the test set reaches 0.895. With the threshold set to 0.226, a sample scoring above the threshold is predicted as lung cancer and a sample scoring below it as another cancer type; at a specificity of 88.7% the sensitivity reaches 80.0% and the overall accuracy reaches 87.7%, illustrating the good performance of the combined model.
Example 5: lung cancer tissue specific methylation marker combination 2 machine learning model
This example uses another combination of lung cancer tissue-specific methylation markers: a machine learning model was constructed from a total of 5 lung cancer tissue-specific methylation markers, Seq ID NO:24, Seq ID NO:36, Seq ID NO:41, Seq ID NO:43 and Seq ID NO:46.
The model construction method was identical to Example 2, except that only the data of these 5 markers were used for each sample. The model scores in the training set and the test set are shown in Fig. 9, and the ROC curve is shown in Fig. 10. In both the training set and the test set, the scores of lung cancer samples are significantly higher than those of the other cancer types (Wilcoxon test: P <= 0.05). With the threshold set to 0.253, the sensitivity in the test set reaches 75.0% at a specificity of 95.4%, and the overall accuracy reaches 93.0%, so lung cancer can be well distinguished from the other cancer types.
According to the method, 48 lung cancer specific methylation markers were selected from methylation NGS sequencing data of 7 cancer types. A machine learning model constructed from the methylation level data of these markers can well distinguish lung cancer samples within the data of the 7 cancer types. These methylation markers are good lung cancer tissue-specific methylation markers and provide an important reference for tissue-of-origin tracing of lung cancer in early pan-cancer screening.
While various embodiments have been described, it will be apparent that the basic disclosure and examples can provide other embodiments that utilize or are encompassed in the markers and methods described herein. It is, therefore, to be understood that the scope of the invention is to be defined by the scope of the disclosure and the appended claims, and not by the specific embodiments.
Sequences as used herein:
>Seq ID NO:1
CTGGCCCTGACAGACTGCAGACCAGACCGGGGCATTGTTCTCTTTCTCGGCCTTCCCCGCCGTGGACGGGCCCCCCACCTGGTTTGTGAAACCTGCGCCCAGGCTGAGTTCACAGCTAAACTTAGCGCCTCCCATTGTTTCCCCGGGGCCGTGGAGTTTGGTTAATAACTTCCCCTGATTTTCCTCGGGATGGGCTGGAAAGAGCCACGAGCCAGCCAGGCGCATCCTGCGTTTGTTTGTGCGGGGAGCGAGGCCGGGAATATCTGATCGGGCGGAGCAAGCCGGGCGGGAGAGGCCCACCCAGGCCCGAGGAAGGGAGCCCAGCGGGGGGCAGTTTCCATTGTCCCTCCTGCCCGCTGCCCCCACGG
>Seq ID NO:2
CGAGAGAGTGCATTCAAGAAGGGCGATCCGGGCACATATGCGACCTGTGAGAGGCGGAGTCGGTGACAGGTGGGTCTTGTTTTTTAATAAAGAGCTTGTTCCTaatcagatcatggcactcagaactcttcaaaaagcttcttatttcactctgggtaaaagccagagttctcacaatggcctgcaaggcctacgggatctgagggccccccaccctgaccccctcgacttcagatggcatctgcccctcactctgctctagcca
>Seq ID NO:3
CCCAGTCCACAGGGCTCGAACTCTCAGGTCCTACGAGCCCGCCCACTAGGCCCCGCCCACAGGAGCCGCTCCGCTCGTGGCCCGGCTCACTCGGCCCTCGCGAGCCCTCAGCCCCACCCGCGCTGCCACGCACCGCACCTGCTGTCCCGCTCCGGGATCTCCTTGATGGCGATGCGCACCCTCGTGTGGCGATCGCGGCCCGCGTACACCACCCCATACGTGCCCTTGCCCAGCACCAGCCGCTCGCCCGTCTCCGTGTACTCATAATCAAACTGCCGGGCGCGGGGTGAGATGGGAGTTCAGCAGGGCCCGCGGCCCCTCGCCCTCCGCGAGCTCCCAGTCCCGCGTCCTCACCTCCAACATCTCCCCCGCGCCCTCCGCCTCCTCCGCGG
>Seq ID NO:4
GCGTGCGGCGGCTGGGGTTgggcgcggggcccggggcgcggcgATGCGCGCGGCACGGCGAGGACCTGAGCCGCTTCTGCGAGGAGGACGAGGCGGCGCTGTGCTGGGTGTGCGACGCCGGCCCCGAGCACAGGACGCACCGCACGGCGCCGCTGCAGGAGGCCGCCGGCAGCTACCAGGTGAggcgccccccggcgggggctgcgggcgcTGCGGTGACCGGGAAGCGGGCGACAGTCCGGAGCGGAGCCGCCGAGGCCACCCGTCTCCTGAGCGGCTCCCACGGCCGCTCCCCCCACCGCGCGCCGTCCCCCCCGCCCACGCGGCTCACTCAGTGTGGGTCTCTTTGCCTTGGCTGTGGTAACCCCCTTTGCGACACACACCCAT
>Seq ID NO:5
CAAACTGGAGGCGGCGGCGCAGGCGCACGGCAAGGCCAAGCCGCTGAGCCGCTCTCTCAAAGAGTTCCCGCGTGCGCCGCCAGCCGACGGCGTGGCCCCACGCCTCTACAGCACGCGCAGCAGCAGCGGCGGCCGCGCGCCCATCAAGGCCGAgcgcgccgcgcaggcgcacggcccggccgccgccgccgtcgccgcccg
>Seq ID NO:6
TGAGGAGGAGCGGAAGTCGGAAGCTCCAGCCGTCACAGCCACATTCACTGGGCAAGCCGACTGTGAGCCAGGAAGTGCTCTTGGGGAGCCCAGGCCAAGCCATCCATTCTTGGGTCCTTTGGAGGTGAGCTAAGTGGGTCTGCCTAGGTTGGGGCTGGTGGAACCTGTGGGAGCAGGGAATGTGGAGAGTCACATGTGGGT
>Seq ID NO:7
AGCGGTTgcggcgggccggcgggcccggggAAGCGGGCGGTGGCCGCTCAGAGAATACCTTCCTTCCGGCAGGAGACCGTTTGGCCCTGTATTCCGGGCCTGCGGTTGGGCCTCCAAGCTGAGTTGGGCAACTTCCCAGCACCGCAAGAAAGGGCGAGCCAGACCTATTTGGCACCCCTTTCCCAGGAGGAGCAGGGGATGGCGCCGGCGGAGTTTGGGGAGGCTGCCCTGGCCAGTTCCCCGGGCTAGAGGGTGGAGGAGAGGAGGAGGGAGAGGAAAGGGCAGCTGAGGACTTGGAAGAAATGAGAAGCCGTGC
>Seq ID NO:8
GGGCGCAGGAAGAGCGGCTCTGCGAGGAAAGGGAAAGGAGAGGCCGCTTCTGGGAAGGGACCCGCACGACGACGCCCGAAGGGCGTCGGGGGAAGTGGTAGGCCCCGGAGACTGCGCGAGGCTCCTCAGCAAAGGAAGTGGGCGCGGCGCGCACGCAAGACCTCGCACCCGGCCTCGCGCGCCGCCTCTGGACAGCCCAGC
>Seq ID NO:9
AAAACTAATGTTTCTTCCTCCTTCTGTGATCTTCCTTCTTTCTGTTTTGAGCAGCTTCTATCACCTGTGTCCTCTGCGGATGAACTGCATAAAGCTCTCCGCCAAAGCCTACTTCTCCCTCATGGTGGAGAGGGAGCCGTGTGAGTAGTCCGGTACCGCAGCCATCCACCCTCTGCAGATCAGCTTTTCCTTCCTTGGCTC
>Seq ID NO:10
ACTCACCCTGCACGGGACAGGGACACCCGGGGACAGTGCCTCACTCACCCTACACGTGACAGGGACACCTGGGGACCGCGCCTCACTCACCCTGCACGTGACAGGGACACCCGGGGACAGTGCCTCACTCACCCTATACCTGGGAGGGACACCCAGGGACGGTGCCTCACTCACCCTACACGTGACAGGGACACCTGGGGC
>Seq ID NO:11
CCAACTGCCCGCGCGGAACCGGGCCGTGGGCCTGGGGTTCGGGAAGCGTGCGCCACCCCCGGTCGGGCCTGGCTTCCTTCTTGAATGCCCCCGGCGCAGGCCCGGTGCTTTGTCCCTCCGGCCTTCTCAAGGAGTGGTGGCCTTCTGCGGGGGCGAGAGCACGGCCTCTAGCCTTCCGCCGACGTCTCAGTGCGCAGATAccgcggcccgggcccctccgccgcgcgggggACCGCACTAGCGTCGACCTCCCGGCAGCCAACCCCGCGCGCAAGGCTCCGCGGCCGGATATGGGCCTAGCTTCCGGGATCCGCTCCCTGCGGGGCCGCGCTTAGGGTCGGAGTTCGCTAGTCCAGGGAAAGG
>Seq ID NO:12
GCGTGTCAGTGTGCAGTGGAGTGTGCAGTCTAAGCTTGCGGCTGTCTCCAGGCAGAAGAGGAGAccccggcgcgggcgggggcgggTTGGCGCCGGGCAAACGCCTTGGGTAGAGGGGAGAGGACGTTTCGTTAGTTCCCGCCCCTTCCTGACTAAAATTGCCTACCCGAAGCGCCCCGGAGGGCTTCACGGGAGGAGGGTAGACTCTCC
>Seq ID NO:13
GGAATAGGACGCTGGTTTCGTTCCCCCGAGGTGCGGAGAAGCAGTAGAAGACCTGCTGCTCTTGGAATTTGGCTCTGACCTTCTCCACGTCGGCCCGGGCCGTCTGGTAATTGTCCACGCTGCCTGGGATGTAGGAGCACTGTGGGGAGAAACAAGAGCAGCTGTGGGCTTGGAAATCCCCATTTCTTAGCCAAGGGCTTG
>Seq ID NO:14
CTTAATGCtttttttttttttttttttttttttttATAACATGAAGTTGTCAGGGACGCTCCTATGAGAACTGTTTGGAATTGCTGCACTTCTCTGGCTAGGAGGGAAGTGAGTAAATCACCAGGCGCCCCTCCCAGCTGCCCGTGTCCCTGCGCCGCTCAGCTCCTGCCGCAGGGCTGGCCGCGCCAAGCGCGCGTCCTA
>Seq ID NO:15
CAAGCGCCATCGCAAAGTGCTGCGTGACAACATACAGGGCATCACGAAGCCCGCCATCCGTCGCTTGGCCCGACGCGGCGGCGTGAAACGCATTTCGGGCCTCATTTATGAGGAGACCCGCGGTGTTCTTAAGGTGTTCCTGGAGAATGTGATACGGGACGCCGTAACCTACACGGAGCACGCCAAGCGTAAGACAGTCAC
>Seq ID NO:16
AGCCGTGGCTTCCCGTGGCTGCACTTGGAAAAAGCACTCGACGCTGCCCGGGCAGCTTTCCATCTCAAGTGGGAACGCGGCTGCCGGCTGTCTCCGCTCTTCAAAGTTAGTGGAGGCTCATTTGGAATAAACTCTTCTCTTCTGCTTCCCAGTCAGGCCCTGGTGGAATACAGAGTCTGTCCTGATCCCTGCCCTTTGACA
>Seq ID NO:17
ctcggcaacgcgccctcggcccgcagcctcctgccCCCTGTGCCCCGCTTCGGCCCCCAGCGCAGCTGCAGAGGGGCCCCCCTCGACGCATACACTCAAGAGCCCGACCGCGCGGCTGAAATCGCGGAGCTCGGAGCCGCGGCTGGCTGAGCGATCGCGGTTCCTGGGCTGCGTGCGCGCCCCTTGGAGCTGAAAGGAGCGCCAGGATCGGGGGCGCTGCACCGGGCTGGGCCCCTCAACGCTCGCAGACCGGGCCGGGCTGCAGCTGGAGATGGCAGCAATCCCGGGAGGTCTCCGGGCCTCTTCAGGGTGCGTCCAGGAGGCGGGTTCCGTGCGACGCGGCGCAGCCCACCCCCACGAGACCGCTTAACTTCGCGGGGGCAGCCTCGGGCGCTCGGAGACGCGGAGGCCCAGACTGCAGCCTCCGGATGCTGGAAGCCCAGACTCCCTGGGGTCACCGGCTCTCCCGCCACCCCAGCTGCAAAGAGTCCCATTGCTTCACCGTCCGGAGCTTAGTCTCCTTGTTCCTCTACCAGTCCCTCCCTCCGCAGGTCTCTGGGGACTTCTGACCGCCTGTTCTTA
>Seq ID NO:18
atctcggctcactgcaagctctgcctcccgggttcacgccattctcctgcctcagcctcccaagtagctgggactacaggtgcccgccaccacgcccggctaattttttgtatttttagtagagacggggtttcactgtgttagccgggatggtctcgatctcctgatctcgtgatccacctgccttggcctcccaaagtg
>Seq ID NO:19
CGGGCCAGCGCCCTGGGGCTTCCGTATCACAGGGGGCAGGGATTTCCACACGCCCATCATGGTGACTAAGGTAAGGATGGTGGCTCAAAGAGATGAGAAGGTCCTGCCAGAAGCGAGGTCGGCCCTGTTCACCCCACTCTGCACAGATGGCTTGCTTTTTCTGTTCTGGAGCTAGGGATCTGCTGCTGCCTGGCGTGCTGG
>Seq ID NO:20
TGGCGGCAAAGAGGGGTTTGGTCTCGGGGCTTAAATGGCACCAGACTCTTGCTTTTGCCCATCTGGAGACTGCAGGCTCCCTTCCTTACCCTCAGAGAGTGCTTATGGTGGGTGTTTTTGCGGGGCTGCAATAGGGGCCAAAAGTCAGGGAAAGGGGCACTGACCTGTAGTGAAAGGCCACAGGACACAGCCTTATTACTG
>Seq ID NO:21
CTGGTGCTCTGCAGTGGCAGGGCTGAGATGATTATACAACCTGCACTCCAGGCCAAGTCCGGTACTCGTCCCAGCTGTCGGCTAAGCCTGCACTGCTATGGGTGAGGGAATCACTCCTCTCCAGCTGGCTTTCTCACGCTGGAGAAGCCTGACCTTTATTCAGAATCATCCTCCAGCGCCCACATCACACAGCACCCTGGC
>Seq ID NO:22
CTGCCGGCTGGGCACGCGCCAAAAGCAGCCCTGGGCCCTGGGTATCGCGCTTGGGGGGAGGGTACCCCCGCCGGCTGGGCACGCGCCAAGAGCAGCCCTGGGCCCTGGGTATCGTGCTTAGGGGGAGGGTATCGGAGCGGGAAGTGGACCTGGGGAGCGCCGTCGGCTGAGGCTCTGGCTGATGCCGCCCTCCCCCGGATCCCCCAGGGACCGCGCTGAGCACCTCCGTGCTCCACCAGTCCATGGCCTCCTCCCCCAAGATGCCGAGGCGGTGAGTTGCGACCTGGATGTAGGCACTGCCCGCCCGAAGCGCGCGGAGGGGCCCTGGCCTTGATGACACCGCCCCCCTACCAGGGCCCTGGAGCAGGAGAAAGGGCGCCACCTCTACCTGGCCGGCCTTCCCGGCAGAAGCCGCCGAGCTAAGCCCTGGAGAGGTCGGCGCCTGGACTACATCACGTACCGCGGAGTTCCCGGGTGGCTGGGCCTGCGGCACTGG
>Seq ID NO:23
TGAGGAGATAAGGCTTCAGGCCAAAAGCAGATGGGTCACGGTGACCCGGCTGGCCCAGCCCTGGGAGCAGGCTCTGTACCCAGACCTTAGACCCTGGATGGGGCAGCCCTGCCCAGTGAGGCTGATAGGGGTGCCAGGGGCACAGAGCCACAATATGGTCGCTGAGGCTTTGGTGCCCCGTGCCCTGCATTCGAGCCCCCATCCGGCCATGCATCCTCCACCCTAATTTCCTGTTTTGTGAAGCAGGAAATGTAATTTCTCTCTTTTTTGGTTAAAACGTAAGAACACACATTGGGATGTATGGGAATCGGTGGACCTGCTGTTGGTTCTTACGTGGATGCT
>Seq ID NO:24
CGAGTCCTCGAGCTCGGGCGTCTTCGCGCCGCCGCCCCGCTCAGTGCGCCCAGGCACCGCGGCCGTGACGTCACGCCCGGGACTGGCCGTTGCAGCAAGACGGCCGCGTTCCGGTTCCGGTAGGTTGCCCGGGAGACGCGGGTACACAGAGAAGCGGCTCCCGTCGGAGGCCGAGTCGTCGCCACGATCGCCCCCTTGGTG
>Seq ID NO:25
AGCCGCGGCGGATTAGGCCGCCCGCCCCAACCTGGGCTTTGATCTTATCTGAGACTTGTGAGTCCAAAAGGGCTTAGCAACCGCAGCCATGGCAGCCCCAACGACGTGAACATCCGCACCTCTGAGCCTCCCCCTGAGAAGTACCTTCGAGGTGAGGCCTGCGCAGCCCCAGGAAGAGGGTGTGGGCGCAAACCTGAGGTGGGGAGCAAGGCCCGCCGGCTACACGGTTCCTGCCATCCTCGCTGCGCCCTTT
>Seq ID NO:26
TGCGCTCTGGTGGACGTTCCGTCTAGTTAGCCTAAGCATCATCCACATACTCTGGTGAACACTCGAGGACAAGGCCGCTTGCTATTATTAGTAAAGGGCCGAACCGTCCTGTCATTGGTGGAGGCAGTGCTTGACTGTGCATCGATCCAGGAATCCGATCTTTTCTCTCAACCACAGAGCTAACGTGCTCAGAAGTGGCCT
>Seq ID NO:27
GCCTGCCGTGGTCATAAGTCAGGGCCGAGTGGCGCTGGAGGACGGGAAGATGTTTGTCACCCCGGGGGCGGGCCGCTTCGTCCCTCGGAAAACATTCCCGGACTTTGTCTACAAGAGGATCAAAGCTCGCAACAGGGTAGGGCGGCACCCGCAAGGGTGTTGTGCAGGTAGGCAGGTGGGCGCTGAGTTCTAGGCCCAGAACGCACCCCTGGTCA
>Seq ID NO:28
GGGCGACCCCGGGGGCTGGGCCTCCCCTGGCTGGTGTCCACCCTCTCGGCCAGCACAGGGGTTCACCTTCAGGAGCCACTCAACGGCATCCTCCCCTGGAGCCCGTGCCGCCCTCACTGCCCCTGGGCAGGGCCCCGCAGCACCTCCTGCTGGGTGTAGGTGCTGTCTCGGCCCCACAGCCAGCAGTGGACATGCACCTGACCCCCAGGCAGCCAGCAGCACA
>Seq ID NO:29
TCGCGTCCTGCGGGGAGAGCCACCCTGCCCCGCGCTGCGCCCGGGACGGTTCCCTGGAACCACTCACCAGGCAGCATCATCGCGCCCAGCAGCCAGAGCCCGAGGCCGCGCATGGCCGGGTCGGGGAGCAGAGGCGGAGGTGACAGCCCCGCGGGACACGGTCTGGTTCCTGCGCTCCTGGCCCGAGGCTCTTTTccgcgcgccccgccccggcgcc
>Seq ID NO:30
TACCACTTTCCTAGAGACCATGGCCATGCTCCTAGAGGGTGAACCTGCATTCGCTGACCCCTCCATGCAAccccacttcactgatggggaaagaggatcccagaggggtaaggaacaagcccaaaataatagagcCTGCATTGGAACCGGGCTGAGCTAACACTTGGCTTACCGGCACTGTCACTGCCAGGGCCCGCGCGA
>Seq ID NO:31
CCTCCTCTAAGGCCCAGGGTCGGGGGAGGTGGGGAGGGAGCGGCCGACCGGCCGAATAGCGCTGCTTTCTTTGTTTTTCATGCAACATAATTCCATGGCCAGTCCAGGCGCTGCAGCCCCCTCCCCTGCCGGCCCCGGCGCCCGCGCAGGACCGCAGAGGGGCTGGGGGTCCAGGGCGCAGTCTAGTTCCAGGGCGCCCGC
>Seq ID NO:32
CGCGTGACCGTGCGCCAGCTCCCCGTGGGGCTCCTGCCAGGGTCGACCGGGAGGGGGTGCCACTCACCCAGATGAGCCACGCGGCTGAGGCGGGGGTCGAAACCGACCTCGCGCACCTTGTCAGTCCGCGCCAGGAAGAAGTTAACCACGCCGTCGGTGACCACGCAGCCTGGGAAGCCGACGAGCTCGTGGTGGAAGCCG
>Seq ID NO:33
CCATCCTCAGGCCTGGCGTTGGCTGCTCCTTGGCTTGTGTGCCCCTCCCTGCACCCCAATATGCCAGGATCTCCCCGCACCTCCTCATTCTACCATCACCTCACGGAGACATCCTGGTCACCCCGTGAGGCATTGCTCACGCCCTCCCCGGCACTCCACAGCCTTGAAGGGCACTGACCGCCAGTGCCTCCACCCACTGTG
>Seq ID NO:34
AGGGCTCCGGAAAACTGCGTTCTCACAAGACCAAAGGGAGGGGAGGGAGGGGGAGATGTGGCTGCAAGTGCAGTTGGAGAGGGTGTGAAGAGATCGGGAGTCCTCTGCGAGGCTCTGGAGCACCCGGCGCCTAAGAGGCTAGTGCGCCCCGTGCCGCTGCGGTAGGACCTGGCGGTCCGCAGCTCCTGAAGGGCCTGGCCG
>Seq ID NO:35
GTCACGGGTCTGGACGGGGTCGCAGGTCTGGACGGGGTCGCAGGTCTGGATGGGGTCGCACAGCTTTGGACCGGGTCGCGGGTCTGGACGGGGTCGCGGGTCTGGACGGGGTTGCACAGGTCTGGATGGGGTCGCACAGGTCTGGACGGGGTCGCGAAGGTCTGGACAGGGTCGTGGGTCTGGACAGGGTCGCAGGTCTGG
>Seq ID NO:36
TGCAAGCCCCTTTTCTAGAAGTTAGAGTTCTCCTGGGATCTTTGCCTCCCAAATTCTTGCTGGCGGCTCTGCTCTCCACCCCAGTGGGGCTGAACTAACAAGTTCCCCTTTTGCTTTTCTCACCAGAACCTGTGGTTTGCCAACCCCGGGGGCAGCAATAGCATGCCAAGCCGCACCCACAGCTCAGTCCAGAGGACCCGC
>Seq ID NO:37
AGTGCTGCACTGGGGCCCCGGGAAGCAGAAGACGGCTCCTGGCACATCTCCTGGGTGCATCTGTGGATTGCTGGGGCCCCCAGCAGCTCTCCCAATCCCCAGAAACCCCTCCTGGATCTGCTGTATCCACCTGGAGCCTCTTGGTGCACAGCGGCACACACAATACCTCCACTCTCCACCCCGAAGGATGCCCACTGCAGCGGGGTCCTCA
>Seq ID NO:38
TCCTGAAGCGCTGCTCGGAGCCGGAGCGCTACTGCCTGGCGCGGCTGATGGCTGACGCGCTGCGCGGCTGCGTGCCTGCCTTCCACGGCGTGGTGGAGCGCGACGGCGAAAGCTACCTGCAGCTGCAGGACCTGCTCGATGGCTTCGACGGACCTTGTGTGCTCGACTGCAAAATGGGCGTCAGGTATGCGTGCCCTGCCAGGTCGGTTGGGGGGATCAAGTAGGGGTCCGGGGCCGGGACAGCTGCTTGAGGGGGACCCGGGGCGAGTGCTCGAAGGGGTCTCCGTGTGCGCCCCCTCATGCCCTGGCCGCTGCCTGCGCCCCCACAGGACTTACCTAGAGGAGGAGCTGACCAAGGCCCGTGAGCGGCCCAAGCTGCGGAAGGACATGTACAAGAAAATGCTGGCGGTGGATCCTGAAGCTCCCACGGAGGAGGAGCACGCGCAGCGCGCCGTCACCAAGCCGCGCTACATGCAGTGGCGGGAAGGCATCAGCTCCA
>Seq ID NO:39
ACCTGAGGCTGGTGCGGGGGCGTCTCGGGGCTGGGGGCCACCCCTGGGGTGCAGACACCCGGCTTCTCAAGGCATCTTGGTCGGGGGTGGCAGAGGATGCACTGCTCACAGGAACCCAAATTCGAAAGACAGCCGCATCTACAATTTTAACACGGTGGCCTGGGTAGGGGGCCACCCACCCCGTCTCCTTGCCCGCCTGGCCGCCCTGCCCCTCACCCCACAGTGG
>Seq ID NO:40
CCTGCCCCAGCCCCTGCTTGCTGGGCCCACGGGGGTGGGGCGGCTCATTTTCCTGGAATGTGAAAGCAAACAGAGCCGCCACCGCAGCCAGCCCCACGGAGGCCTCTGGAGAGAAAACAAAACTGCTGGCCTAGGAGCGCCTGCCCCACGCTCTGGAGGAGAGCCCGGGGCAGGGGGACGCACAGGCAGAGCCCTCAGGGACAACCGCCCCAGGAGGCCAACGGCGACAGTTCATCCCACCTGGTGCTTCCTCCCACCCTGCCTGTGCGCCACGCTGGCCTCGAGCCAAAGGAATTCTCCCAGCAACCCGGGAAGGCGGCTGGGCCCGTCGGGGAGGCTTCTGGGTTTGAAAACAGGCTTTGCCCAAGTTCCCACAGCT
>Seq ID NO:41
TTTGGCTCTCTCCTGTCTTCGGGGTTTACAAAGTGTGTTGGGACTTGCGGGGCTGCTCTGTCCAAGCCTGGGTCTGGCGTCCGCGTCTCTGAGCCTGTGAGTGCGTGCGCTTTCCTGCGTCCTCTTGACTGCCGGTGCTGGGGCTCTGCGTCCTGCGTCCGCGGGAGTAAATACAGCAGGCGAAGGGGAAGCTCACACAATGGTCTCCAGCGCTCTGGGGCAGGGCTTCTGAGGGGCGGGCCTGCCTCT
>Seq ID NO:42
ATTGTGTTCCTCAAAAGTCTCTCTTTAGAAAAGAGAATTGCCTGACAGCTGAGCTTTTCCATCTCCCATGTTACCGGGGTCCCTTTTTGGTGGCTCAGGAAGACTGGCTGAGGACACTTTTCTGCAGGCGGGCACCCCCATCACCCCACAGCCACTGGAAGGATTGCTGAGAAGAGAAGCAAACGCCTACAGCACAGTCGC
>Seq ID NO:43
CCACACGGAACGATGGCTTATCACTGGAGAAAACCAGCCAGTGAAAGGGTCGCGGGAGAAGCCCGGGGACGACCCTGGGACTGGAGGGTTTCTCGCCTCTGGAAAAGGCAGTGCCCGCGGGGCAGGCCAGAGGGAGCGCTCCGAGGAGCTTTGGGGTTGCCAGCCTTGACACGCGCACCCCTCCGCCCGGGCCGGCTCCCCTCCGCCCTCAGACTCCCACCATCCTCCTACTATTCCACATGTCGGGTGTATATGGTGCGGAGAGCCCGGGGGAAGTTAGAACACGCGGCGGGAGAGGCAGGCCCAGGGCGGCCTCAGCTAAGCAGCCCGGCTTTCCGGATCCCCGCCGCGCACAGGC
>Seq ID NO:44
TAACTTACAGAGTGTGTCTGTGTCTTCTTGAGGAAGTGGCCTGTCTGGGTCCCCCTCCCAGTCTGAGCGTCATTGCAGTGGAATATCTCCCCTTCTCACCAATCATAACACGTCACTGTGGCAGCAGCGGATAGCTGGAAACCACCTGCCAGTGCCCAGCATGTAGGGCGTGCCCCTAGAGCGGGAGCTGCCACCTGCTTC
>Seq ID NO:45
GGCTGTGCGGGCACAGCTGTTACAGGCAGGGGGCAGGGGCCTCGTGGAGCTTGTGTAGACGGAGGGGCGGCGGGCCGTGTAGTGCAGGCTGCGAAGACTCACCGCGGTGAAGTGCGGCCAGGTGCGCAGCAGGTCGAAGAGCGCGTCGCCGGGGCAGTCGGTGCGCACCAGCTGGCGGTGGCCCAGCAGCGCGTAGTCTGGCCGCAGGAGGCCGGCGCGCACCGCACAACTCGGGAGCGTGTCGCGCACCGTGCGCAGAGCGGCCTCGGTGGGCAGCGCCGCGGTGTAGTTGCCCACTATGGCCACGCCGAAGCCCCGGGAGTTGTGGCCGAGCGTGTGGGCGCCCACCCAGTGCCAGCCGCGTCCCTCGTACACGTAG
>Seq ID NO:46
CAGCAGGGCAAGCTGAGCACACACGTGTGCAGAGCCAGGGCAGGAACACCGGAAGGTGGCGGGCAGAGTCCAGCCCCAGGACTTCCAGGTGAGAGAGCCCGCCGTGCCAGCATCAGGAGACAGCAGTCAGGAGCTCACAGAGCGGGGCCTCCACCGGGTACAGCGCTAGCACAGAGTTGGTGCTCAGTAGGCAGGGACTAAAGCCCCCACCCACCACTGCTCCCAGCAGAGCTTGGTCCTCAGACCTGGAGATGTCCTGAGGCCA
>Seq ID NO:47
GTGGCGTCCAGGGCAGGGCAGGTGCGTCATCCGGGCGGGATGCAGAGACACGTCCTTCCACCAACCATCTGAGGAGCACTTGGCACCCACACAATGAGCCCGGCAAGGGCCACGCCAGGAGGCAGCGCACGGGGCAGAGCCTCTGAGCCAGAGAGGGGGAGGTCCCTTGGGAGGCCCCTGCCATCCCCCGCTCTGGGTGGGCCTCTCCAGCCAGACTCTGCGCCCCAA
>Seq ID NO:48
GTTGGAGGAGGGAAGGCTGTTCACTGAGAGAGCAGACCCAGGAGCCCCAGTGGCAGAAGGGGCCCGGCAGGGAGTGCTGGGCAGGGAGCGCCCATGTGCCCACCCGAGTGCCAGTGCCAGCCAGCTGCTGCCCGGAGAGCCCCGGCCCTCTGTAGCTATCTGGCCTCTGCTCATGGCTGTTGCTCAGAGAGAATCTGACCAGCACTGACTTCACCTCCGCCCACCCCCTGAGGCGGCAGCTGGACCTCAGCGTTGCTTCAGGAAGAAGTCCTCAGCCAATAGTGTCC
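As an aid to inspecting the sequences listed above, the following sketch counts CpG dinucleotides, computes the observed/expected CpG ratio, and performs an in-silico bisulfite conversion of a single strand. Some entries contain lower-case letters; treating them as ordinary bases is an assumption of this sketch. The function names and the toy input are hypothetical, and no result for any listed sequence is implied.

```python
def cpg_stats(seq):
    """Return length, CpG count, GC fraction and observed/expected CpG ratio."""
    s = seq.upper()  # lower-case bases are treated like upper-case ones (assumption)
    n = len(s)
    c = s.count("C")
    g = s.count("G")
    cpg = s.count("CG")
    gc_fraction = (c + g) / n if n else 0.0
    expected = (c * g) / n if n else 0.0  # CpG count expected from base composition
    obs_exp = cpg / expected if expected else 0.0
    return {"length": n, "cpg": cpg, "gc_fraction": gc_fraction, "obs_exp": obs_exp}

def bisulfite_convert(seq, methylated_positions=frozenset()):
    """Simulate bisulfite conversion of the given strand.

    Unmethylated cytosines read as T after conversion and amplification;
    cytosines at the 0-based indices in methylated_positions stay C.
    """
    s = seq.upper()
    converted = []
    for i, base in enumerate(s):
        if base == "C" and i not in methylated_positions:
            converted.append("T")
        else:
            converted.append(base)
    return "".join(converted)

def reverse_complement(seq):
    """Reverse complement, for working with the complementary strand."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]

# Toy input, not one of the listed sequences.
toy = "ACGTCGGG"
toy_stats = cpg_stats(toy)
toy_converted = bisulfite_convert(toy, methylated_positions={1})
```

Passing the set of methylated cytosine positions explicitly keeps the conversion deterministic, which is convenient when designing primers or probes against fully methylated and fully unmethylated templates.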
Claims (14)
1. Use of a reagent or component in the preparation of a kit or device for (1) distinguishing a lung cancer patient from a non-lung cancer patient; (2) diagnosing or aiding in the diagnosis of lung cancer; or (3) tracing the tissue of origin to lung cancer during a pan-cancer screening procedure, wherein the reagent or component comprises a reagent or component that detects the methylation level of a lung cancer tissue-specific methylation marker in the genomic DNA of a sample, said methylation marker being one of the following regions or a site thereof, each region consisting of the named gene together with the 2.2 kb upstream and the 2.2 kb downstream of that gene on the chromosome on which it is located: gene ARHGEF16; gene CASZ1; gene MAP3K6; gene TRIM58; gene ARHGEF33; gene PSD4; gene HOXD4; gene SLC12A8; gene DGKG; gene TERT; gene NR2F1; gene PCDHGC5; gene KCNMB1; gene FOXC1; gene HIST1H4F; gene TYW; gene LRRC4; gene DGKI; gene PDLIM2; gene RHOBTB2; gene TMEM75; gene OPLAH; gene NR5A1; gene SPAG6; gene WAPAL; gene BTBD16; gene DPYSL4; gene TTC40; gene ADAM8; gene SLC22A11; gene CPT1A; gene B4GALNT1; gene FBRSL1; gene XPO4; gene TFDP1; gene GCH1; gene TMEM179; gene ITPKA; gene SOX8; gene SLC9A3R2; gene SEPT-9; gene MBP; gene NFATC1; gene DNM2; gene RASAL3; gene TAF4; gene NTSR1; gene SLC17A9; or a complementary sequence or variant of any of the foregoing, provided that the methylation site in the variant is not mutated; preferably, the site is 120 bp-500 bp long, more preferably 200 bp-480 bp.
2. The use of claim 1, wherein the non-lung cancer or pan-cancer comprises colorectal cancer, liver cancer, gastric cancer, esophageal cancer, pancreatic cancer, and/or breast cancer.
3. The use of claim 1 or 2, wherein the methylation marker comprises a nucleotide sequence set forth in any one or more of the following, or a complement or variant sequence thereof: SEQ ID NOS: 1-48.
4. The use according to any one of claims 1-3, wherein the reagent or module comprises a reagent or module for use in a method of detecting methylation by one or more of: bisulfite conversion-based PCR, DNA sequencing, methylation-sensitive restriction enzyme analysis, fluorescence quantification, methylation-sensitive high resolution melting curve, and chip-based methylation profile analysis and mass spectrometry.
5. The use according to any one of claims 1-4, wherein the reagent or component comprises primers and/or probes for detecting the methylation markers, and/or the sample is a cell, a tissue, a fine needle biopsy and/or plasma; preferably, the genomic DNA of the sample is free DNA in plasma.
6. A method of constructing a predictive model for distinguishing lung cancer from other, non-lung cancers, comprising:
(1) obtaining the methylation levels of methylation markers in the genomic DNA of lung cancer samples and non-lung cancer samples as a training set; the methylation marker is selected from the following regions or sites thereof, each region consisting of the named gene together with the 2.2 kb upstream region and the 2.2 kb downstream region of that gene on the chromosome on which it is located: gene ARHGEF16; gene CASZ1; gene MAP3K6; gene TRIM58; gene ARHGEF33; gene PSD4; gene HOXD4; gene SLC12A8; gene DGKG; gene TERT; gene NR2F1; gene PCDHGC5; gene KCNMB1; gene FOXC1; gene HIST1H4F; gene TYW; gene LRRC4; gene DGKI; gene PDLIM2; gene RHOBTB2; gene TMEM75; gene OPLAH; gene NR5A1; gene SPAG6; gene WAPAL; gene BTBD16; gene DPYSL4; gene TTC40; gene ADAM8; gene SLC22A11; gene CPT1A; gene B4GALNT1; gene FBRSL1; gene XPO4; gene TFDP1; gene GCH1; gene TMEM179; gene ITPKA; gene SOX8; gene SLC9A3R2; gene SEPT-9; gene MBP; gene NFATC1; gene DNM2; gene RASAL3; gene TAF4; gene NTSR1; gene SLC17A9; or a complementary sequence or variant of any of the foregoing, provided that the methylation site in the variant is not mutated; preferably, the site is 120 bp-500 bp long, more preferably 200 bp-480 bp; preferably, the non-lung cancer is colorectal cancer, liver cancer, gastric cancer, esophageal cancer, pancreatic cancer and/or breast cancer; and
(2) constructing a logistic regression machine learning model using the methylation level data of the methylation markers.
7. The method of claim 6, wherein the methylation marker comprises a nucleotide sequence set forth in any one or more of the following, or a complement or variant sequence thereof: SEQ ID NOS: 1-48;
preferably, wherein the sample is a cell, tissue, fine needle biopsy or plasma, preferably the genomic DNA is free DNA in plasma.
8. The method of claim 6 or 7, wherein step (1) comprises obtaining methylation sequencing data of the sample DNA.
9. The method according to any one of claims 6-8, wherein step (2) comprises building a logistic regression model to obtain model prediction scores, training the model using the methylation levels of the obtained methylation markers as a training set, and determining the relevant threshold of the model from the samples of the training set.
10. A predictive model of lung cancer constructed according to the method of any one of claims 6-9.
11. An apparatus for diagnosing lung cancer, comprising a memory and a processor that executes instructions stored in the memory, the instructions performing the method of any one of claims 6-9 to construct a predictive model of lung cancer; wherein the methylation levels of the methylation markers in the genomic DNA of a sample to be tested are used as a test set to obtain a model prediction value, and the prediction value is compared with a threshold to judge whether the sample is lung cancer: a value greater than the threshold is predicted as lung cancer, and otherwise the sample is predicted as another cancer type.
12. A kit or device for detecting lung cancer tissue-specific methylation markers, comprising reagents or components for detecting the status and/or level of one or more lung cancer tissue-specific methylation markers in genomic DNA from a sample, said lung cancer tissue-specific methylation markers being the following regions or sites thereof, each region consisting of the named gene together with the 2.2 kb upstream region and the 2.2 kb downstream region of that gene on the chromosome on which it is located: gene ARHGEF16; gene CASZ1; gene MAP3K6; gene TRIM58; gene ARHGEF33; gene PSD4; gene HOXD4; gene SLC12A8; gene DGKG; gene TERT; gene NR2F1; gene PCDHGC5; gene KCNMB1; gene FOXC1; gene HIST1H4F; gene TYW; gene LRRC4; gene DGKI; gene PDLIM2; gene RHOBTB2; gene TMEM75; gene OPLAH; gene NR5A1; gene SPAG6; gene WAPAL; gene BTBD16; gene DPYSL4; gene TTC40; gene ADAM8; gene SLC22A11; gene CPT1A; gene B4GALNT1; gene FBRSL1; gene XPO4; gene TFDP1; gene GCH1; gene TMEM179; gene ITPKA; gene SOX8; gene SLC9A3R2; gene SEPT-9; gene MBP; gene NFATC1; gene DNM2; gene RASAL3; gene TAF4; gene NTSR1; gene SLC17A9; or a complementary sequence or variant of any of the foregoing, provided that the methylation site in the variant is not mutated; preferably, the site is 120 bp-500 bp long, more preferably 200 bp-480 bp;
preferably, wherein the methylation marker comprises a nucleotide sequence set forth in any one or more of the following, or a complement or variant sequence thereof: SEQ ID NOS: 1-48.
13. The kit or device of claim 12, wherein the sample is a cell, tissue, fine needle biopsy or plasma, preferably wherein the nucleic acid is free DNA in plasma.
14. The kit or device of claim 12 or 13, wherein the reagents or components comprise reagents or components for use in one or more of the following methods: bisulfite conversion-based PCR, DNA sequencing, methylation-sensitive restriction enzyme analysis, fluorescence quantification, methylation-sensitive high resolution melting curve, and chip-based methylation profile analysis and mass spectrometry;
preferably, the reagent comprises an oligonucleotide for detecting a methylation marker, preferably the oligonucleotide is a primer and/or a probe;
preferably, the primer is a primer for detecting the methylation level/state of a site using methylation sequencing or a PCR primer for amplifying one or more methylation sites;
preferably, the reagent comprises bisulfite and derivatives thereof, PCR buffers, polymerase, dNTPs, primers, probes, methylation-sensitive or methylation-insensitive restriction enzymes, cleavage buffers, fluorescent dyes, fluorescence quenchers, fluorescence reporters, exonucleases, alkaline phosphatase, internal standards, and/or controls, the controls being the aforementioned specific methylation markers from normal subjects or from patients with cancers other than lung cancer; preferably, the non-lung cancer is colorectal cancer, liver cancer, gastric cancer, esophageal cancer, pancreatic cancer and/or breast cancer.
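The thresholding described in claims 9 and 11 can be sketched as follows. The claims do not state how the threshold is derived from the training set, so this sketch assumes one common choice, the cutoff that maximizes Youden's J statistic (sensitivity + specificity - 1). The function names, the example scores, and the output labels are hypothetical.

```python
def choose_threshold(scores, labels):
    """Pick a cutoff from training-set scores by maximizing Youden's J.

    scores: model prediction scores for the training samples.
    labels: 1 = lung cancer, 0 = other cancer type.
    A sample counts as predicted lung cancer when its score is strictly
    greater than the cutoff.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("training set must contain both classes")
    distinct = sorted(set(scores))
    if len(distinct) < 2:
        raise ValueError("need at least two distinct scores")
    # Candidate cutoffs are midpoints between adjacent distinct scores.
    candidates = [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
    best_j, best_t = -1.0, candidates[0]
    for t in candidates:
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s > t)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s <= t)
        j = tp / pos + tn / neg - 1.0
        if j > best_j:
            best_j, best_t = j, t
    return best_t

def classify(score, threshold):
    """Apply the rule of claim 11: above the threshold means lung cancer."""
    return "lung cancer" if score > threshold else "other cancer type"

# Hypothetical training-set scores and labels.
train_scores = [0.10, 0.20, 0.35, 0.60, 0.80, 0.90]
train_labels = [0, 0, 0, 1, 1, 1]
threshold = choose_threshold(train_scores, train_labels)
call_high = classify(0.70, threshold)
call_low = classify(0.30, threshold)
```

Other criteria, such as fixing specificity at a target value, would fit the same interface; only the body of `choose_threshold` would change.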
Priority Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210787412.9A CN117385027A (en) | 2022-07-04 | 2022-07-04 | Lung cancer specific methylation marker and application thereof in diagnosis of lung cancer |
TW112124613A TW202403054A (en) | 2022-07-04 | 2023-06-30 | Cancer-specific methylation markers and their use |
PCT/CN2023/105537 WO2024008040A1 (en) | 2022-07-04 | 2023-07-03 | Cancer-specific methylation marker and use thereof |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210787412.9A CN117385027A (en) | 2022-07-04 | 2022-07-04 | Lung cancer specific methylation marker and application thereof in diagnosis of lung cancer |
Publications (1)
Publication Number | Publication Date |
---|---|
CN117385027A (en) | 2024-01-12 |
Family
ID=89435047
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210787412.9A Pending CN117385027A (en) | 2022-07-04 | 2022-07-04 | Lung cancer specific methylation marker and application thereof in diagnosis of lung cancer |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117385027A (en) |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
AU2017369018B2 (en) | Analysis of cell-free DNA in urine and other samples | |
US20150176079A1 (en) | Markers for cancer | |
EP3034624A1 (en) | Method for the prognosis of hepatocellular carcinoma | |
EP3249051B1 (en) | Use of methylation sites in y chromosome as prostate cancer diagnosis marker | |
US11866786B2 (en) | Kits and methods for diagnosing lung cancer | |
CN111051536A (en) | Improved cancer screening using cell-free viral nucleic acids | |
WO2020034888A1 (en) | Dna methylation-related marker for diagnosing tumor, and application thereof | |
CN115341031A (en) | Screening method of pan-cancer methylation biomarker, biomarker and application | |
CN116804218A (en) | Methylation marker for detecting benign and malignant lung nodules and application thereof | |
IL280297B (en) | Non-invasive cancer detection based on dna methylation changes | |
US20180223367A1 (en) | Assays, methods and compositions for diagnosing cancer | |
WO2022262831A1 (en) | Substance and method for tumor assessment | |
WO2017046714A1 (en) | Methylation signature in squamous cell carcinoma of head and neck (hnscc) and applications thereof | |
CN117385027A (en) | Lung cancer specific methylation marker and application thereof in diagnosis of lung cancer | |
CN115491411A (en) | Methylation marker for identifying pancreatitis and pancreatic cancer and application thereof | |
CN117385028A (en) | Colorectal cancer specific methylation marker and application thereof | |
CN117385026A (en) | Breast cancer specific methylation marker and application thereof in diagnosis of breast cancer | |
WO2024008040A1 (en) | Cancer-specific methylation marker and use thereof | |
CN117344012A (en) | Stomach cancer and/or esophageal cancer specific methylation marker and application thereof | |
CN117363728A (en) | Liver cancer tissue specific methylation marker and application thereof in diagnosis of liver cancer | |
CN118127150A (en) | Pancreatic cancer specific methylation marker and application thereof in diagnosis of pancreatic cancer | |
WO2022188776A1 (en) | Gene methylation marker or combination thereof that can be used for gastric carcinoma her2 companion diagnostics, and use thereof | |
CN115772566B (en) | Methylation biomarker for auxiliary detection of lung cancer somatic ERBB2 gene mutation and application thereof | |
TW202330938A (en) | Substance and method for evaluating tumor | |
WO2022126938A1 (en) | Method for detecting polynucleotide variations |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
REG | Reference to a national code | Ref country code: HK Ref legal event code: DE Ref document number: 40101981 Country of ref document: HK |