CN115982644A - Esophageal squamous cell carcinoma classification model construction and data processing method - Google Patents
Info
- Publication number
- CN115982644A CN202310063027.4A
- Authority
- CN
- China
- Prior art keywords
- ddr
- gene
- data
- sample
- gene expression
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Links
- 208000036765 Squamous cell carcinoma of the esophagus Diseases 0.000 title claims abstract description 80
- 208000007276 esophageal squamous cell carcinoma Diseases 0.000 title claims abstract description 80
- 206010061534 Oesophageal squamous cell carcinoma Diseases 0.000 title claims abstract description 68
- 238000013145 classification model Methods 0.000 title claims abstract description 36
- 238000010276 construction Methods 0.000 title claims description 8
- 238000003672 processing method Methods 0.000 title description 3
- 230000014509 gene expression Effects 0.000 claims abstract description 72
- 238000000034 method Methods 0.000 claims abstract description 54
- 230000004083 survival effect Effects 0.000 claims abstract description 52
- 108090000623 proteins and genes Proteins 0.000 claims abstract description 41
- 238000012545 processing Methods 0.000 claims abstract description 39
- 230000037361 pathway Effects 0.000 claims abstract description 28
- 238000012163 sequencing technique Methods 0.000 claims abstract description 27
- 238000012549 training Methods 0.000 claims abstract description 19
- 238000007621 cluster analysis Methods 0.000 claims abstract description 13
- 108091006146 Channels Proteins 0.000 claims abstract description 7
- 230000011559 double-strand break repair via nonhomologous end joining Effects 0.000 claims abstract 3
- 101150072950 BRCA1 gene Proteins 0.000 claims description 27
- 206010028980 Neoplasm Diseases 0.000 claims description 24
- 238000003559 RNA-seq method Methods 0.000 claims description 15
- 238000004393 prognosis Methods 0.000 claims description 14
- 230000001394 metastastic effect Effects 0.000 claims description 11
- 206010061289 metastatic neoplasm Diseases 0.000 claims description 11
- 108700040618 BRCA1 Genes Proteins 0.000 claims description 10
- 101150027186 Hfm1 gene Proteins 0.000 claims description 10
- 238000000491 multivariate analysis Methods 0.000 claims description 4
- 238000000611 regression analysis Methods 0.000 claims description 4
- 238000012360 testing method Methods 0.000 claims description 4
- 238000004590 computer program Methods 0.000 claims description 3
- 238000012315 univariate regression analysis Methods 0.000 claims description 3
- 230000005971 DNA damage repair Effects 0.000 description 48
- 210000004027 cell Anatomy 0.000 description 24
- 108700020463 BRCA1 Proteins 0.000 description 17
- 102100025401 Breast cancer type 1 susceptibility protein Human genes 0.000 description 17
- 101000843497 Homo sapiens Probable ATP-dependent DNA helicase HFM1 Proteins 0.000 description 16
- 102100030730 Probable ATP-dependent DNA helicase HFM1 Human genes 0.000 description 16
- 102100034533 Histone H2AX Human genes 0.000 description 13
- 101001067891 Homo sapiens Histone H2AX Proteins 0.000 description 13
- 238000002474 experimental method Methods 0.000 description 12
- 230000006801 homologous recombination Effects 0.000 description 10
- 238000002744 homologous recombination Methods 0.000 description 10
- 238000004458 analytical method Methods 0.000 description 9
- 210000001519 tissue Anatomy 0.000 description 9
- 230000033607 mismatch repair Effects 0.000 description 7
- 230000006780 non-homologous end joining Effects 0.000 description 7
- 230000005778 DNA damage Effects 0.000 description 5
- 231100000277 DNA damage Toxicity 0.000 description 5
- 108020004459 Small interfering RNA Proteins 0.000 description 5
- 238000001262 western blot Methods 0.000 description 5
- 108091032973 (ribonucleotides)n+m Proteins 0.000 description 4
- 102000015335 Ku Autoantigen Human genes 0.000 description 4
- 108010025026 Ku Autoantigen Proteins 0.000 description 4
- 208000007433 Lymphatic Metastasis Diseases 0.000 description 4
- 238000000692 Student's t-test Methods 0.000 description 4
- 238000007405 data analysis Methods 0.000 description 4
- 238000010586 diagram Methods 0.000 description 4
- 201000010099 disease Diseases 0.000 description 4
- 208000037265 diseases, disorders, signs and symptoms Diseases 0.000 description 4
- 230000008569 process Effects 0.000 description 4
- 238000011002 quantification Methods 0.000 description 4
- 238000012353 t test Methods 0.000 description 4
- 238000009825 accumulation Methods 0.000 description 3
- 230000033590 base-excision repair Effects 0.000 description 3
- 230000008878 coupling Effects 0.000 description 3
- 238000010168 coupling process Methods 0.000 description 3
- 238000005859 coupling reaction Methods 0.000 description 3
- 230000007812 deficiency Effects 0.000 description 3
- 238000005516 engineering process Methods 0.000 description 3
- 238000011156 evaluation Methods 0.000 description 3
- 238000002271 resection Methods 0.000 description 3
- 102100040051 Aprataxin and PNK-like factor Human genes 0.000 description 2
- 102100028907 Cullin-4A Human genes 0.000 description 2
- 108050006400 Cyclin Proteins 0.000 description 2
- 102000012698 DDB1 Human genes 0.000 description 2
- 108020004414 DNA Proteins 0.000 description 2
- 102100021122 DNA damage-binding protein 2 Human genes 0.000 description 2
- 102100029995 DNA ligase 1 Human genes 0.000 description 2
- 102100033195 DNA ligase 4 Human genes 0.000 description 2
- 108010032250 DNA polymerase beta2 Proteins 0.000 description 2
- 102100024829 DNA polymerase delta catalytic subunit Human genes 0.000 description 2
- 102100024823 DNA polymerase delta subunit 2 Human genes 0.000 description 2
- 102100020782 DNA polymerase delta subunit 3 Human genes 0.000 description 2
- 102100029765 DNA polymerase lambda Human genes 0.000 description 2
- 102100039116 DNA repair protein RAD50 Human genes 0.000 description 2
- 102100022204 DNA-dependent protein kinase catalytic subunit Human genes 0.000 description 2
- 102100027700 DNA-directed RNA polymerase I subunit RPA2 Human genes 0.000 description 2
- 101100170004 Dictyostelium discoideum repE gene Proteins 0.000 description 2
- 101100170005 Drosophila melanogaster pic gene Proteins 0.000 description 2
- 102100023877 E3 ubiquitin-protein ligase RBX1 Human genes 0.000 description 2
- 101710095156 E3 ubiquitin-protein ligase RBX1 Proteins 0.000 description 2
- 101000890463 Homo sapiens Aprataxin and PNK-like factor Proteins 0.000 description 2
- 101000916245 Homo sapiens Cullin-4A Proteins 0.000 description 2
- 101001041466 Homo sapiens DNA damage-binding protein 2 Proteins 0.000 description 2
- 101000863770 Homo sapiens DNA ligase 1 Proteins 0.000 description 2
- 101000927810 Homo sapiens DNA ligase 4 Proteins 0.000 description 2
- 101000909198 Homo sapiens DNA polymerase delta catalytic subunit Proteins 0.000 description 2
- 101000909189 Homo sapiens DNA polymerase delta subunit 2 Proteins 0.000 description 2
- 101000932004 Homo sapiens DNA polymerase delta subunit 3 Proteins 0.000 description 2
- 101000932009 Homo sapiens DNA polymerase delta subunit 4 Proteins 0.000 description 2
- 101000743929 Homo sapiens DNA repair protein RAD50 Proteins 0.000 description 2
- 101000619536 Homo sapiens DNA-dependent protein kinase catalytic subunit Proteins 0.000 description 2
- 101000650600 Homo sapiens DNA-directed RNA polymerase I subunit RPA2 Proteins 0.000 description 2
- 101000619640 Homo sapiens Leucine-rich repeats and immunoglobulin-like domains protein 1 Proteins 0.000 description 2
- 101000578059 Homo sapiens Non-homologous end-joining factor 1 Proteins 0.000 description 2
- 101000720958 Homo sapiens Protein artemis Proteins 0.000 description 2
- 101001096365 Homo sapiens Replication factor C subunit 2 Proteins 0.000 description 2
- 101001096355 Homo sapiens Replication factor C subunit 3 Proteins 0.000 description 2
- 101000582404 Homo sapiens Replication factor C subunit 4 Proteins 0.000 description 2
- 101000582412 Homo sapiens Replication factor C subunit 5 Proteins 0.000 description 2
- 101000709305 Homo sapiens Replication protein A 14 kDa subunit Proteins 0.000 description 2
- 101000709341 Homo sapiens Replication protein A 30 kDa subunit Proteins 0.000 description 2
- 101001092206 Homo sapiens Replication protein A 32 kDa subunit Proteins 0.000 description 2
- 102000046961 MRE11 Homologue Human genes 0.000 description 2
- 108700019589 MRE11 Homologue Proteins 0.000 description 2
- 102100028156 Non-homologous end-joining factor 1 Human genes 0.000 description 2
- 102100024168 Polymerase delta-interacting protein 2 Human genes 0.000 description 2
- 102100036691 Proliferating cell nuclear antigen Human genes 0.000 description 2
- 102100025918 Protein artemis Human genes 0.000 description 2
- -1 RFC1 Proteins 0.000 description 2
- 101710178916 RING-box protein 1 Proteins 0.000 description 2
- 102100037851 Replication factor C subunit 2 Human genes 0.000 description 2
- 102100037855 Replication factor C subunit 3 Human genes 0.000 description 2
- 102100030542 Replication factor C subunit 4 Human genes 0.000 description 2
- 102100030541 Replication factor C subunit 5 Human genes 0.000 description 2
- 102100034372 Replication protein A 14 kDa subunit Human genes 0.000 description 2
- 102100034373 Replication protein A 30 kDa subunit Human genes 0.000 description 2
- 239000000090 biomarker Substances 0.000 description 2
- 201000011510 cancer Diseases 0.000 description 2
- 230000008859 change Effects 0.000 description 2
- 238000002512 chemotherapy Methods 0.000 description 2
- DQLATGHUWYMOKM-UHFFFAOYSA-L cisplatin Chemical compound N[Pt](N)(Cl)Cl DQLATGHUWYMOKM-UHFFFAOYSA-L 0.000 description 2
- 229960004316 cisplatin Drugs 0.000 description 2
- 238000004891 communication Methods 0.000 description 2
- 101150077768 ddb1 gene Proteins 0.000 description 2
- 238000012217 deletion Methods 0.000 description 2
- 230000037430 deletion Effects 0.000 description 2
- 238000011161 development Methods 0.000 description 2
- 239000003814 drug Substances 0.000 description 2
- 230000002349 favourable effect Effects 0.000 description 2
- 230000006870 function Effects 0.000 description 2
- 238000009169 immunotherapy Methods 0.000 description 2
- 210000004185 liver Anatomy 0.000 description 2
- 208000014018 liver neoplasm Diseases 0.000 description 2
- 239000003550 marker Substances 0.000 description 2
- 108020004999 messenger RNA Proteins 0.000 description 2
- 101150071637 mre11 gene Proteins 0.000 description 2
- 230000020520 nucleotide-excision repair Effects 0.000 description 2
- 230000003287 optical effect Effects 0.000 description 2
- 210000000056 organ Anatomy 0.000 description 2
- 238000003068 pathway analysis Methods 0.000 description 2
- 239000000092 prognostic biomarker Substances 0.000 description 2
- 238000001959 radiotherapy Methods 0.000 description 2
- 230000008439 repair process Effects 0.000 description 2
- 238000011160 research Methods 0.000 description 2
- 230000004044 response Effects 0.000 description 2
- 230000028617 response to DNA damage stimulus Effects 0.000 description 2
- 238000010200 validation analysis Methods 0.000 description 2
- KIAPWMKFHIKQOZ-UHFFFAOYSA-N 2-[[(4-fluorophenyl)-oxomethyl]amino]benzoic acid methyl ester Chemical compound COC(=O)C1=CC=CC=C1NC(=O)C1=CC=C(F)C=C1 KIAPWMKFHIKQOZ-UHFFFAOYSA-N 0.000 description 1
- 102000000872 ATM Human genes 0.000 description 1
- 102100024044 Aprataxin Human genes 0.000 description 1
- 101100339431 Arabidopsis thaliana HMGB2 gene Proteins 0.000 description 1
- 108010004586 Ataxia Telangiectasia Mutated Proteins Proteins 0.000 description 1
- 102000010595 BABAM2 Human genes 0.000 description 1
- 101700002522 BARD1 Proteins 0.000 description 1
- 102100024641 BRCA1-A complex subunit Abraxas 1 Human genes 0.000 description 1
- 102100028048 BRCA1-associated RING domain protein 1 Human genes 0.000 description 1
- 108700020462 BRCA2 Proteins 0.000 description 1
- 108091032955 Bacterial small RNA Proteins 0.000 description 1
- 102100035631 Bloom syndrome protein Human genes 0.000 description 1
- 108091009167 Bloom syndrome protein Proteins 0.000 description 1
- 101150008921 Brca2 gene Proteins 0.000 description 1
- 102100025399 Breast cancer type 2 susceptibility protein Human genes 0.000 description 1
- 102100030933 CDK-activating kinase assembly factor MAT1 Human genes 0.000 description 1
- 208000005623 Carcinogenesis Diseases 0.000 description 1
- 102100037631 Centrin-2 Human genes 0.000 description 1
- 102100038117 Centromere protein S Human genes 0.000 description 1
- 102100033674 Centromere protein X Human genes 0.000 description 1
- 206010053138 Congenital aplastic anaemia Diseases 0.000 description 1
- 102100028908 Cullin-3 Human genes 0.000 description 1
- 102100025525 Cullin-5 Human genes 0.000 description 1
- 102100026810 Cyclin-dependent kinase 7 Human genes 0.000 description 1
- 102100035186 DNA excision repair protein ERCC-1 Human genes 0.000 description 1
- 108010035476 DNA excision repair protein ERCC-5 Proteins 0.000 description 1
- 102100031866 DNA excision repair protein ERCC-5 Human genes 0.000 description 1
- 102100031867 DNA excision repair protein ERCC-6 Human genes 0.000 description 1
- 102100031868 DNA excision repair protein ERCC-8 Human genes 0.000 description 1
- 108090000133 DNA helicases Proteins 0.000 description 1
- 102000003844 DNA helicases Human genes 0.000 description 1
- 102100033688 DNA ligase 3 Human genes 0.000 description 1
- 102100028849 DNA mismatch repair protein Mlh3 Human genes 0.000 description 1
- 102100034157 DNA mismatch repair protein Msh2 Human genes 0.000 description 1
- 102100037700 DNA mismatch repair protein Msh3 Human genes 0.000 description 1
- 102100021147 DNA mismatch repair protein Msh6 Human genes 0.000 description 1
- 102100033215 DNA nucleotidylexotransferase Human genes 0.000 description 1
- 102100029910 DNA polymerase epsilon subunit 2 Human genes 0.000 description 1
- 102100029905 DNA polymerase epsilon subunit 3 Human genes 0.000 description 1
- 102100036948 DNA polymerase epsilon subunit 4 Human genes 0.000 description 1
- 102100029094 DNA repair endonuclease XPF Human genes 0.000 description 1
- 102100034484 DNA repair protein RAD51 homolog 3 Human genes 0.000 description 1
- 102100027830 DNA repair protein XRCC2 Human genes 0.000 description 1
- 102100027829 DNA repair protein XRCC3 Human genes 0.000 description 1
- 102100027828 DNA repair protein XRCC4 Human genes 0.000 description 1
- 102100022474 DNA repair protein complementing XP-A cells Human genes 0.000 description 1
- 102100022477 DNA repair protein complementing XP-C cells Human genes 0.000 description 1
- 102100033072 DNA replication ATP-dependent helicase DNA2 Human genes 0.000 description 1
- 102100040401 DNA topoisomerase 3-alpha Human genes 0.000 description 1
- 102100040398 DNA topoisomerase 3-beta-1 Human genes 0.000 description 1
- 102100021429 DNA-directed RNA polymerase II subunit RPB1 Human genes 0.000 description 1
- 102100039302 DNA-directed RNA polymerase II subunit RPB11-a Human genes 0.000 description 1
- 102100039303 DNA-directed RNA polymerase II subunit RPB2 Human genes 0.000 description 1
- 102100039301 DNA-directed RNA polymerase II subunit RPB3 Human genes 0.000 description 1
- 102100032260 DNA-directed RNA polymerase II subunit RPB4 Human genes 0.000 description 1
- 102100031137 DNA-directed RNA polymerase II subunit RPB7 Human genes 0.000 description 1
- 102100028495 DNA-directed RNA polymerase II subunit RPB9 Human genes 0.000 description 1
- 102100032254 DNA-directed RNA polymerases I, II, and III subunit RPABC1 Human genes 0.000 description 1
- 102100023348 DNA-directed RNA polymerases I, II, and III subunit RPABC2 Human genes 0.000 description 1
- 102100023349 DNA-directed RNA polymerases I, II, and III subunit RPABC3 Human genes 0.000 description 1
- 102100028473 DNA-directed RNA polymerases I, II, and III subunit RPABC4 Human genes 0.000 description 1
- 102100028472 DNA-directed RNA polymerases I, II, and III subunit RPABC5 Human genes 0.000 description 1
- 101100226017 Dictyostelium discoideum repD gene Proteins 0.000 description 1
- 101150105460 ERCC2 gene Proteins 0.000 description 1
- 102100030208 Elongin-A Human genes 0.000 description 1
- 102100030209 Elongin-B Human genes 0.000 description 1
- 208000000461 Esophageal Neoplasms Diseases 0.000 description 1
- 102100029075 Exonuclease 1 Human genes 0.000 description 1
- 102000009095 Fanconi Anemia Complementation Group A protein Human genes 0.000 description 1
- 108010087740 Fanconi Anemia Complementation Group A protein Proteins 0.000 description 1
- 102000018825 Fanconi Anemia Complementation Group C protein Human genes 0.000 description 1
- 108010027673 Fanconi Anemia Complementation Group C protein Proteins 0.000 description 1
- 102000013601 Fanconi Anemia Complementation Group D2 protein Human genes 0.000 description 1
- 108010026653 Fanconi Anemia Complementation Group D2 protein Proteins 0.000 description 1
- 102000010634 Fanconi Anemia Complementation Group E protein Human genes 0.000 description 1
- 108010077898 Fanconi Anemia Complementation Group E protein Proteins 0.000 description 1
- 102000012216 Fanconi Anemia Complementation Group F protein Human genes 0.000 description 1
- 108010022012 Fanconi Anemia Complementation Group F protein Proteins 0.000 description 1
- 102000007122 Fanconi Anemia Complementation Group G protein Human genes 0.000 description 1
- 108010033305 Fanconi Anemia Complementation Group G protein Proteins 0.000 description 1
- 102000052930 Fanconi Anemia Complementation Group L protein Human genes 0.000 description 1
- 108700026162 Fanconi Anemia Complementation Group L protein Proteins 0.000 description 1
- 108010067741 Fanconi Anemia Complementation Group N protein Proteins 0.000 description 1
- 201000004939 Fanconi anemia Diseases 0.000 description 1
- 102100029347 Fanconi anemia core complex-associated protein 100 Human genes 0.000 description 1
- 102100022352 Fanconi anemia core complex-associated protein 24 Human genes 0.000 description 1
- 102100027285 Fanconi anemia group B protein Human genes 0.000 description 1
- 102100034554 Fanconi anemia group I protein Human genes 0.000 description 1
- 102100034553 Fanconi anemia group J protein Human genes 0.000 description 1
- 102100034552 Fanconi anemia group M protein Human genes 0.000 description 1
- 102100036089 Fascin Human genes 0.000 description 1
- 102000054184 GADD45 Human genes 0.000 description 1
- 102100031885 General transcription and DNA repair factor IIH helicase subunit XPB Human genes 0.000 description 1
- 102100035184 General transcription and DNA repair factor IIH helicase subunit XPD Human genes 0.000 description 1
- 102100038308 General transcription factor IIH subunit 1 Human genes 0.000 description 1
- 102100032864 General transcription factor IIH subunit 2 Human genes 0.000 description 1
- 102100032863 General transcription factor IIH subunit 3 Human genes 0.000 description 1
- 102100032862 General transcription factor IIH subunit 4 Human genes 0.000 description 1
- 102100032865 General transcription factor IIH subunit 5 Human genes 0.000 description 1
- 208000031448 Genomic Instability Diseases 0.000 description 1
- 102100031150 Growth arrest and DNA damage-inducible protein GADD45 alpha Human genes 0.000 description 1
- 108700010013 HMGB1 Proteins 0.000 description 1
- 101150021904 HMGB1 gene Proteins 0.000 description 1
- 102100022536 Helicase POLQ-like Human genes 0.000 description 1
- 102100037907 High mobility group protein B1 Human genes 0.000 description 1
- 102100022893 Histone acetyltransferase KAT5 Human genes 0.000 description 1
- 101000757586 Homo sapiens Aprataxin Proteins 0.000 description 1
- 101000785776 Homo sapiens Artemin Proteins 0.000 description 1
- 101000760704 Homo sapiens BRCA1-A complex subunit Abraxas 1 Proteins 0.000 description 1
- 101000874539 Homo sapiens BRISC and BRCA1-A complex member 2 Proteins 0.000 description 1
- 101000583935 Homo sapiens CDK-activating kinase assembly factor MAT1 Proteins 0.000 description 1
- 101000880516 Homo sapiens Centrin-2 Proteins 0.000 description 1
- 101000884588 Homo sapiens Centromere protein S Proteins 0.000 description 1
- 101000944476 Homo sapiens Centromere protein X Proteins 0.000 description 1
- 101000851684 Homo sapiens Chimeric ERCC6-PGBD3 protein Proteins 0.000 description 1
- 101000765038 Homo sapiens Class E basic helix-loop-helix protein 40 Proteins 0.000 description 1
- 101000916238 Homo sapiens Cullin-3 Proteins 0.000 description 1
- 101000856414 Homo sapiens Cullin-5 Proteins 0.000 description 1
- 101000911952 Homo sapiens Cyclin-dependent kinase 7 Proteins 0.000 description 1
- 101000876529 Homo sapiens DNA excision repair protein ERCC-1 Proteins 0.000 description 1
- 101000920783 Homo sapiens DNA excision repair protein ERCC-6 Proteins 0.000 description 1
- 101000920778 Homo sapiens DNA excision repair protein ERCC-8 Proteins 0.000 description 1
- 101000927847 Homo sapiens DNA ligase 3 Proteins 0.000 description 1
- 101000577867 Homo sapiens DNA mismatch repair protein Mlh3 Proteins 0.000 description 1
- 101001134036 Homo sapiens DNA mismatch repair protein Msh2 Proteins 0.000 description 1
- 101001027762 Homo sapiens DNA mismatch repair protein Msh3 Proteins 0.000 description 1
- 101000968658 Homo sapiens DNA mismatch repair protein Msh6 Proteins 0.000 description 1
- 101000800646 Homo sapiens DNA nucleotidylexotransferase Proteins 0.000 description 1
- 101000864190 Homo sapiens DNA polymerase epsilon subunit 2 Proteins 0.000 description 1
- 101000864175 Homo sapiens DNA polymerase epsilon subunit 3 Proteins 0.000 description 1
- 101000804960 Homo sapiens DNA polymerase epsilon subunit 4 Proteins 0.000 description 1
- 101001094659 Homo sapiens DNA polymerase kappa Proteins 0.000 description 1
- 101001132271 Homo sapiens DNA repair protein RAD51 homolog 3 Proteins 0.000 description 1
- 101000649306 Homo sapiens DNA repair protein XRCC2 Proteins 0.000 description 1
- 101000649315 Homo sapiens DNA repair protein XRCC4 Proteins 0.000 description 1
- 101000618531 Homo sapiens DNA repair protein complementing XP-A cells Proteins 0.000 description 1
- 101000618535 Homo sapiens DNA repair protein complementing XP-C cells Proteins 0.000 description 1
- 101000927313 Homo sapiens DNA replication ATP-dependent helicase DNA2 Proteins 0.000 description 1
- 101000611068 Homo sapiens DNA topoisomerase 3-alpha Proteins 0.000 description 1
- 101000611076 Homo sapiens DNA topoisomerase 3-beta-1 Proteins 0.000 description 1
- 101000729474 Homo sapiens DNA-directed RNA polymerase I subunit RPA1 Proteins 0.000 description 1
- 101001106401 Homo sapiens DNA-directed RNA polymerase II subunit RPB1 Proteins 0.000 description 1
- 101000669827 Homo sapiens DNA-directed RNA polymerase II subunit RPB11-a Proteins 0.000 description 1
- 101000669831 Homo sapiens DNA-directed RNA polymerase II subunit RPB2 Proteins 0.000 description 1
- 101000669859 Homo sapiens DNA-directed RNA polymerase II subunit RPB3 Proteins 0.000 description 1
- 101001088177 Homo sapiens DNA-directed RNA polymerase II subunit RPB4 Proteins 0.000 description 1
- 101000729332 Homo sapiens DNA-directed RNA polymerase II subunit RPB7 Proteins 0.000 description 1
- 101000723873 Homo sapiens DNA-directed RNA polymerase II subunit RPB9 Proteins 0.000 description 1
- 101001088179 Homo sapiens DNA-directed RNA polymerases I, II, and III subunit RPABC1 Proteins 0.000 description 1
- 101000686009 Homo sapiens DNA-directed RNA polymerases I, II, and III subunit RPABC2 Proteins 0.000 description 1
- 101000686022 Homo sapiens DNA-directed RNA polymerases I, II, and III subunit RPABC3 Proteins 0.000 description 1
- 101000723789 Homo sapiens DNA-directed RNA polymerases I, II, and III subunit RPABC4 Proteins 0.000 description 1
- 101000723805 Homo sapiens DNA-directed RNA polymerases I, II, and III subunit RPABC5 Proteins 0.000 description 1
- 101000670537 Homo sapiens E3 ubiquitin-protein ligase RNF168 Proteins 0.000 description 1
- 101001107071 Homo sapiens E3 ubiquitin-protein ligase RNF8 Proteins 0.000 description 1
- 101001011859 Homo sapiens Elongin-A Proteins 0.000 description 1
- 101001011846 Homo sapiens Elongin-B Proteins 0.000 description 1
- 101000918264 Homo sapiens Exonuclease 1 Proteins 0.000 description 1
- 101100119754 Homo sapiens FANCL gene Proteins 0.000 description 1
- 101001062402 Homo sapiens Fanconi anemia core complex-associated protein 100 Proteins 0.000 description 1
- 101000824568 Homo sapiens Fanconi anemia core complex-associated protein 24 Proteins 0.000 description 1
- 101000914679 Homo sapiens Fanconi anemia group B protein Proteins 0.000 description 1
- 101000848174 Homo sapiens Fanconi anemia group I protein Proteins 0.000 description 1
- 101000848171 Homo sapiens Fanconi anemia group J protein Proteins 0.000 description 1
- 101000848187 Homo sapiens Fanconi anemia group M protein Proteins 0.000 description 1
- 101000914689 Homo sapiens Fanconi-associated nuclease 1 Proteins 0.000 description 1
- 101000920748 Homo sapiens General transcription and DNA repair factor IIH helicase subunit XPB Proteins 0.000 description 1
- 101000666405 Homo sapiens General transcription factor IIH subunit 1 Proteins 0.000 description 1
- 101000655398 Homo sapiens General transcription factor IIH subunit 2 Proteins 0.000 description 1
- 101000655391 Homo sapiens General transcription factor IIH subunit 3 Proteins 0.000 description 1
- 101000655406 Homo sapiens General transcription factor IIH subunit 4 Proteins 0.000 description 1
- 101000655402 Homo sapiens General transcription factor IIH subunit 5 Proteins 0.000 description 1
- 101001002170 Homo sapiens Glutamine amidotransferase-like class 1 domain-containing protein 3, mitochondrial Proteins 0.000 description 1
- 101001066158 Homo sapiens Growth arrest and DNA damage-inducible protein GADD45 alpha Proteins 0.000 description 1
- 101001066163 Homo sapiens Growth arrest and DNA damage-inducible protein GADD45 gamma Proteins 0.000 description 1
- 101000899334 Homo sapiens Helicase POLQ-like Proteins 0.000 description 1
- 101001046996 Homo sapiens Histone acetyltransferase KAT5 Proteins 0.000 description 1
- 101000581326 Homo sapiens Mediator of DNA damage checkpoint protein 1 Proteins 0.000 description 1
- 101000968674 Homo sapiens MutS protein homolog 4 Proteins 0.000 description 1
- 101000968663 Homo sapiens MutS protein homolog 5 Proteins 0.000 description 1
- 101000981336 Homo sapiens Nibrin Proteins 0.000 description 1
- 101000738901 Homo sapiens PMS1 protein homolog 1 Proteins 0.000 description 1
- 101000589450 Homo sapiens Poly(ADP-ribose) glycohydrolase Proteins 0.000 description 1
- 101001094809 Homo sapiens Polynucleotide 5'-hydroxyl-kinase Proteins 0.000 description 1
- 101000647571 Homo sapiens Pre-mRNA-splicing factor SYF1 Proteins 0.000 description 1
- 101000735456 Homo sapiens Protein mono-ADP-ribosyltransferase PARP3 Proteins 0.000 description 1
- 101000670549 Homo sapiens RecQ-mediated genome instability protein 2 Proteins 0.000 description 1
- 101001092125 Homo sapiens Replication protein A 70 kDa DNA-binding subunit Proteins 0.000 description 1
- 101000597183 Homo sapiens Telomere length regulation protein TEL2 homolog Proteins 0.000 description 1
- 101000735431 Homo sapiens Terminal nucleotidyltransferase 4A Proteins 0.000 description 1
- 101000843556 Homo sapiens Transcription factor HES-1 Proteins 0.000 description 1
- 101000717428 Homo sapiens UV excision repair protein RAD23 homolog A Proteins 0.000 description 1
- 101000717424 Homo sapiens UV excision repair protein RAD23 homolog B Proteins 0.000 description 1
- 101000749407 Homo sapiens UV-stimulated scaffold protein A Proteins 0.000 description 1
- 101000607909 Homo sapiens Ubiquitin carboxyl-terminal hydrolase 1 Proteins 0.000 description 1
- 101000837581 Homo sapiens Ubiquitin-conjugating enzyme E2 T Proteins 0.000 description 1
- 101000814276 Homo sapiens WD repeat-containing protein 48 Proteins 0.000 description 1
- 102000037982 Immune checkpoint proteins Human genes 0.000 description 1
- 108091008036 Immune checkpoint proteins Proteins 0.000 description 1
- 229910015837 MSH2 Inorganic materials 0.000 description 1
- 102100027643 Mediator of DNA damage checkpoint protein 1 Human genes 0.000 description 1
- 108010074346 Mismatch Repair Endonuclease PMS2 Proteins 0.000 description 1
- 102100037480 Mismatch repair endonuclease PMS2 Human genes 0.000 description 1
- 102000013609 MutL Protein Homolog 1 Human genes 0.000 description 1
- 108010026664 MutL Protein Homolog 1 Proteins 0.000 description 1
- 102100021157 MutS protein homolog 4 Human genes 0.000 description 1
- 102100021156 MutS protein homolog 5 Human genes 0.000 description 1
- 102100024403 Nibrin Human genes 0.000 description 1
- 206010030155 Oesophageal carcinoma Diseases 0.000 description 1
- 102100037482 PMS1 protein homolog 1 Human genes 0.000 description 1
- 102100040884 Partner and localizer of BRCA2 Human genes 0.000 description 1
- 102100032347 Poly(ADP-ribose) glycohydrolase Human genes 0.000 description 1
- 102100035460 Polynucleotide 5'-hydroxyl-kinase Human genes 0.000 description 1
- 102100025391 Pre-mRNA-splicing factor SYF1 Human genes 0.000 description 1
- 102100034935 Protein mono-ADP-ribosyltransferase PARP3 Human genes 0.000 description 1
Classifications
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02A—TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
- Y02A90/00—Technologies having an indirect contribution to adaptation to climate change
- Y02A90/10—Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation
Landscapes
- Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
Abstract
The invention discloses a method, a system, a device and a computer-readable storage medium for constructing an esophageal squamous cell carcinoma classification model and processing data. The method comprises the following steps: obtaining sequencing data of training set samples and the survival data corresponding to each sample; extracting a DDR pathway gene set and its gene expression profile from the sequencing data of the training set samples; performing selection processing on the DDR pathway gene set to obtain survival-related pathways and their gene expression profiles, the survival-related pathways comprising one or more of the following: the MMR pathway, the NER pathway, the FA pathway and the NHEJ pathway; and performing cluster analysis on the training set samples based on the survival data to obtain different classification subtypes, characterizing the pathways of each classification subtype and their gene expression profiles, and obtaining a classification model.
Description
Technical Field
The invention relates to the field of data analysis, and in particular to a method and a system for constructing an esophageal squamous cell carcinoma classification model and for processing data.
Background
Esophageal squamous cell carcinoma (ESCC) is a malignant tumor that threatens human health. The five-year survival rate of ESCC patients is less than 20% in developed countries and less than 5% in many developing countries. Notably, some patients with primary esophageal cancer relapse rapidly after esophageal resection, and the prognosis of these patients remains poor. To date, no accurate molecular biomarker can predict disease progression in these primary ESCC patients, which leads to inadequate clinical management. Therefore, there is an urgent need to identify new prognostic biomarkers for primary ESCC.
Multiple synergistic repair mechanisms can rapidly and properly repair DNA damage in normal cells; DNA double strand breaks are repaired mainly by Homologous Recombination (HR) and non-homologous end joining (NHEJ), and DNA single strand breaks are repaired mainly by mismatch repair (MMR) and nucleotide excision repair pathway (NER). DNA Damage Repair (DDR) defects can lead to the accumulation of DNA damage and genomic instability, the generation of neoantigens, and the upregulation of immune checkpoints, ultimately altering immune balance in the Tumor Microenvironment (TME). Interestingly, DDR deficiency becomes an important determinant of anti-tumor immune response by affecting antigenicity, adjuvanticity, and reactivity, which may contribute to the response of immunotherapy. Recent studies have revealed the potential of some DDR-based biomarkers in predicting immunotherapeutic responses; however, the value of DDR-related features for prognostic evaluation and personalized immunotherapy has not yet been fully elucidated. Therefore, it is crucial to reveal a correlation between changes in the tumor DDR pathway and prognosis.
Disclosure of Invention
The present invention aims to solve at least one of the problems of the prior art. It provides a method for constructing an esophageal squamous cell carcinoma classification model: a DDR (DNA damage repair) pathway gene set and its gene expression profile are screened from the sequencing data of the samples, the samples are clustered according to their survival data to obtain a DDR-active subtype and a DDR-silent subtype, and the DDR pathway gene sets and gene expression profiles of the 2 subtypes are characterized to obtain the classification model. Based on this classification model, the method processes and analyzes the relevant data to type primary ESCC and evaluate its prognosis, mining the biological patterns hidden in the data in depth to address the related life science problem.
The first aspect of the application discloses a method for constructing an esophageal squamous cell carcinoma classification model, which comprises the following steps:
obtaining sequencing data of training set samples and the survival data corresponding to each sample;
extracting a DDR pathway gene set and its gene expression profile from the sequencing data of the training set samples;
performing selection processing on the DDR pathway gene set to obtain survival-related pathways and the gene expression profiles of the survival-related pathways; the survival-related pathways comprise one or more of the following: the MMR pathway, the NER pathway, the FA pathway and the NHEJ pathway;
and performing cluster analysis on the training set samples based on the survival data to obtain different classification subtypes, characterizing the pathways of each classification subtype and their gene expression profiles, and obtaining a classification model.
The DDR pathway gene set includes: the BER pathway, the MMR pathway, the NER pathway, the FA pathway, the HR pathway and the NHEJ pathway;
the method of cluster analysis comprises: a consensus clustering algorithm;
optionally, the method of selection processing includes: univariate Cox regression analysis;
optionally, the sequencing data of the training set samples comprises: RNA-seq data of primary ESCC tumor tissue samples and metastatic ESCC tumor tissue samples.
The construction method further comprises the following steps: based on the gene expression profiles of the survival-related pathways, obtaining a DDR gene set related to survival outcome and the corresponding gene expression profile by univariate regression analysis; processing the survival-outcome-related DDR gene set and the corresponding gene expression profile by multivariate analysis to obtain prognosis prediction genes and their gene expression profiles;
and performing cluster analysis on the training set samples based on the survival data to obtain different classification subtypes, characterizing the prognosis prediction genes of each classification subtype and their gene expression profiles, and obtaining a classification model.
The different classification subtypes of the classification model include: a DDR-active subtype and a DDR-silent subtype; the DDR-active subtype corresponds to the gene expression profile of pathways associated with a high survival rate, and the DDR-silent subtype corresponds to the gene expression profile of pathways associated with a low survival rate;
optionally, the prognosis prediction genes include: the BRCA1 gene and the HFM1 gene; the DDR-active subtype corresponds to a high expression level of the BRCA1 gene, and the DDR-silent subtype corresponds to a high expression level of the HFM1 gene.
In a second aspect, the present application discloses a method for processing esophageal squamous cell carcinoma data, comprising:
obtaining sequencing data of a sample to be tested;
inputting the sequencing data of the sample to be tested into the classification model disclosed in the first aspect of the application to obtain a classification result of the DDR-active subtype or the DDR-silent subtype;
optionally, the method further includes: predicting the survival rate of the sample to be tested based on the classification result; outputting a high survival rate for the sample to be tested when the classification result is the DDR-active subtype; and outputting a low survival rate for the sample to be tested when the classification result is the DDR-silent subtype.
In a third aspect of the present application, a method for processing esophageal squamous cell carcinoma data is disclosed, which comprises:
acquiring gene expression data of a sample to be tested; the gene expression data of the sample to be tested comprises gene expression data of one or more of the following genes: the BRCA1 gene, the HFM1 gene;
inputting the gene expression data of the sample to be tested into the classification model disclosed in the first aspect of the application to obtain a classification result.
A fourth aspect of the present application discloses a system for processing esophageal squamous cell carcinoma data, comprising:
an acquisition unit, configured to acquire sequencing data of a sample to be tested;
and an output unit, configured to input the sequencing data of the sample to be tested into the classification model disclosed in the first aspect of the application to obtain a classification result of the DDR-active subtype or the DDR-silent subtype.
A fifth aspect of the present application discloses an apparatus for processing esophageal squamous cell carcinoma data, the apparatus comprising: a memory and a processor;
the memory is configured to store program instructions; the processor is configured to invoke the program instructions and, when the program instructions are executed, to perform the method for processing esophageal squamous cell carcinoma data disclosed in the second aspect and/or the third aspect of the application.
A sixth aspect of the present application discloses a computer-readable storage medium, on which a computer program is stored, which, when executed by a processor, implements the method for processing esophageal squamous cell carcinoma data disclosed in the second and/or third aspects of the present application.
The application has the following beneficial effects:
1. The application discloses a model construction method that types primary ESCC according to the DDR pathway gene set and its gene expression profile, yielding a classification model with 2 classification results: the DDR-active subtype and the DDR-silent subtype. During model construction, two independent prognostic biomarkers, BRCA1 and HFM1, are also identified. The classification model can effectively predict the subsequent survival of primary ESCC patients, who frequently relapse rapidly and have a poor prognosis; it provides new clues and perspectives by identifying novel DDR-based molecular subtypes as a feature of tumor heterogeneity, and it reveals the potential clinical significance for treatment and management strategies of primary ESCC patients with the DDR-silent subtype.
2. The method is used for clinical prognosis evaluation of patients; it mines the biological patterns hidden in the data in depth and greatly improves the accuracy and depth of data analysis across multiple dimensions, such as the gene information and pathway information of a biological population.
Drawings
In order to illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings needed for the description of the embodiments are briefly introduced below. The drawings in the following description show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic flow chart diagram of a method for processing esophageal squamous cell carcinoma data provided by a second aspect of an embodiment of the present invention;
FIG. 2 is a schematic diagram of an esophageal squamous cell carcinoma data processing and analyzing device provided by the embodiment of the invention;
FIG. 3 is a schematic flow chart of a system for processing and analyzing esophageal squamous cell carcinoma data provided by an embodiment of the invention;
FIG. 4 is a diagram of cluster analysis of ESCC tumors based on DDR gene profiling provided by embodiments of the present invention;
FIG. 5 is a graph showing the results of BRCA1 and HFM1 modulating DNA damage response in ESCC as provided in the examples of the present invention.
Detailed Description
In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention.
Some of the flows described in the specification, the claims and the above figures include a number of operations that appear in a particular order. It should be clearly understood, however, that these operations may be performed out of the order in which they appear herein or in parallel; operation numbers such as 101 and 102 merely distinguish the operations from one another and do not by themselves represent any order of execution. In addition, the flows may include more or fewer operations, and the operations may be performed sequentially or in parallel. It should be noted that descriptions such as "first" and "second" herein are used to distinguish different messages, devices, modules and the like; they do not represent a sequential order, nor do they require that the "first" and "second" items be of different types.
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Fig. 1 is a schematic flow chart of a processing method of esophageal squamous cell carcinoma data provided by a second aspect of the embodiment of the invention, specifically, the method comprises the following steps:
101: obtaining sequencing data of a sample to be tested;
in one embodiment, the sequencing data of the sample to be tested is RNA-seq data of a primary ESCC patient. "Primary" is used in contrast to "secondary" and "metastatic": a disease is primary to the tissue or organ in which it first arises. For example, primary hepatocellular carcinoma is cancer that originates in the liver cells, whereas secondary liver cancer is cancer that originates in another tissue or organ and spreads to the liver through the bloodstream or the lymphatic route. Primary ESCC patients in this example refer to patients who have undergone surgical resection of a primary tumor, followed by radiation therapy, with or without chemotherapy.
In one embodiment, RNA-seq, or transcriptome sequencing, uses high-throughput sequencing to measure the expression levels of mRNA, small RNA, non-coding RNA and the like, or of some of them. Over the past decade, RNA-seq technology has evolved rapidly and has become an indispensable tool for analyzing differential gene expression and alternative mRNA splicing at the transcriptome level. With the development of next-generation sequencing, the range of applications of RNA-seq has widened: in the field of RNA biology, RNA-seq can be applied to the analysis of single-cell gene expression, protein expression and RNA structure; in addition, the concept of the spatial transcriptome is gradually emerging. Long-read and direct RNA-seq techniques, together with better computational tools for data analysis, help biologists use RNA-seq to deepen their understanding of RNA biology, for example when and where transcription begins and how folding and intermolecular interactions in vivo affect RNA function.
A transcriptome is the collection of all transcripts produced by a certain species or specific cell type. Transcriptome research can study gene functions and gene structures from the whole level, reveal molecular mechanisms in specific biological processes and disease occurrence processes, and has been widely applied in the fields of basic research, clinical diagnosis, drug research and development, and the like.
In one embodiment, the sample to be tested is taken from a primary ESCC patient who is receiving a clinical prognostic assessment.
102: inputting the sequencing data of the sample to be tested into the constructed classification model to obtain a classification result of the DDR-active subtype or the DDR-silent subtype;
in one embodiment, the method further comprises: predicting the survival rate of the sample to be tested based on the classification result; outputting a high survival rate for the sample to be tested when the classification result is the DDR-active subtype; outputting a low survival rate for the sample to be tested when the classification result is the DDR-silent subtype;
in one embodiment, the method for constructing the classification model comprises the following steps:
obtaining sequencing data of training set samples and the survival data corresponding to each sample;
extracting a DDR pathway gene set and its gene expression profile from the sequencing data of the training set samples; the training set samples included RNA-seq data of tumor tissues from 82 primary ESCCs and 73 ESCCs with lymph node metastasis; the patients received surgical resection of the primary tumor and lymph node dissection, followed by radiation therapy, with or without chemotherapy. The data for the 155 patients were from the ESCC cohort of the Cancer Hospital of Shanxi Province (SCH); the RNA-seq data of the SCH cohort were deposited in the Gene Expression Omnibus (GEO) under accession number GSE53625, and clinical and pathological data were determined for 97 patients by retrospective review of SCH electronic medical records, with follow-up ending in June 2019. RNA-seq data generated on the Illumina HiSeq platform were collected from UCSC Xena (https://xenabrowser.net/datapages/), as TPM-level values with log2(x + 1) normalization;
performing selection processing on the DDR pathway gene set to obtain survival-related pathways and the gene expression profiles of the survival-related pathways; the survival-related pathways comprise one or more of the following: the MMR pathway, the NER pathway, the FA pathway and the NHEJ pathway;
and performing cluster analysis on the training set samples based on the survival data to obtain different classification subtypes, characterizing the pathways of each classification subtype and their gene expression profiles, and obtaining a classification model.
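The extraction and normalization step above can be illustrated with a minimal sketch. It assumes the expression data are held as a mapping from gene symbol to per-sample TPM values; the data layout, the function names and the toy values are illustrative assumptions and are not taken from the disclosure.

```python
import math

# Hypothetical toy input: TPM values per gene across three samples.
tpm = {
    "MSH2": [3.0, 0.0, 15.0],
    "XRCC4": [1.0, 7.0, 0.0],
    "ACTB": [1023.0, 511.0, 255.0],  # not in the gene set; should be dropped
}

# A small illustrative subset of DDR pathway genes, not the full gene set.
ddr_gene_set = {"MSH2", "XRCC4", "LIG4"}


def log2_plus_one(values):
    """Apply the log2(x + 1) transform to a list of TPM values."""
    return [math.log2(v + 1.0) for v in values]


def extract_ddr_expression(tpm_by_gene, gene_set):
    """Keep only genes in gene_set and return their log2(TPM + 1) values."""
    return {
        gene: log2_plus_one(values)
        for gene, values in tpm_by_gene.items()
        if gene in gene_set
    }


ddr_expression = extract_ddr_expression(tpm, ddr_gene_set)
```

Genes that are in the gene set but absent from the expression table (LIG4 here) are simply skipped in this sketch; a real pipeline would probably report them.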
In one example, to characterize the DDR subtypes, differential expression (DE) analysis was first performed between the DDR subtypes using the R package limma (v3.50.3) to determine subtype-specific genes. Differentially expressed genes (DEGs) were defined as those with a log fold change (logFC) <= -1 or >= 1 and an adjusted P value < 0.05. Pathway enrichment analysis was then performed on the DEGs against a selected set of hallmark pathways from MSigDB (a gene set database; see https://www.jianshu.com/p/99369b2f7a7d) to identify the pathways enriched in each DDR subtype, using the R package clusterProfiler (version 4.2.2).
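The DEG definition above reduces to a simple filter. The following sketch applies the stated cutoffs to a table of differential expression results; the gene names and numbers are hypothetical placeholders, and the helper name is an assumption used only for illustration.

```python
def is_deg(log_fc, adj_p, lfc_cutoff=1.0, p_cutoff=0.05):
    """Return True when |logFC| >= lfc_cutoff and adjusted P < p_cutoff."""
    return abs(log_fc) >= lfc_cutoff and adj_p < p_cutoff


# Hypothetical rows: (gene, logFC, adjusted P value).
de_results = [
    ("GENE_A", 1.8, 0.001),    # up-regulated and significant
    ("GENE_B", -1.0, 0.020),   # exactly on the logFC boundary, still a DEG
    ("GENE_C", 0.6, 0.0001),   # significant but fold change too small
    ("GENE_D", -2.3, 0.200),   # large fold change but not significant
]

degs = [gene for gene, log_fc, adj_p in de_results if is_deg(log_fc, adj_p)]
up_regulated = [g for g, lfc, p in de_results if is_deg(lfc, p) and lfc > 0]
down_regulated = [g for g, lfc, p in de_results if is_deg(lfc, p) and lfc < 0]
```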
In one embodiment, the gene expression profile of the survival-related pathways comprises the gene expression profiles of one or more of the following genes: POLD1, POLD2, POLD3, POLD4, MSH2, MSH3, MSH6, MLH1, MLH3, PMS1, PMS2, MSH4, MSH5, EXO1, HMGB1, LIG1, PCNA, RFC2, RFC4, RFC3, RFC5, RFC1, RPA2, RPA3, RPA4, POLD1, POLD2, POLD3, POLD4, PCNA, RFC1, RFC2, RFC3, RFC4, RFC5, POLE2, POLE3, POLE4, POLK, CUL4A, DDB1, DDB2, RBX1, CUL4A, DDB1, DDB2, RBX1, CETN2, RAD23B, XPC, POLR2A, POLR2B, POLR2C, POLR2D, POLR2E, POLR2F, POLR2G, POLR2H, POLR2I, POLR2J, POLR2K, POLR2L, CUL3, CUL5, ERCC1, ERCC4, ERCC5, LIG1, TCEB2, TCEB3, UVSSA, XPA, RPA1, RPA2, RPA3, RPA4, CDK7, ERCC2, ERCC3, GTF2H1, GTF2H2, GTF2H3, GTF2H4, GTF2H5, MNAT1, ERCC6, ERCC8, LIG3, RAD23A, XAB2, XRCC1, GADD45A, GADD45G, BLM, RMI2, TOP3A, TOP3B, BARD1, BRCA2, BRIP1, PALB2, FAAP100, FAAP24, FANCA, FANCB, FANCC, FANCE, FANCF, FANCG, FANCL, FANCM, APITD1, HES1, STRA13, UBE2T, FANCD2, FANCI, BRE, CCDC98, DNA2, FAN1, HELQ, KAT5, RAD51C, TELO2, USP1, WDR48, APLF, ATM, MDC1, MRE11A, NBN, PARP3, RAD50, RNF168, RNF8, TP53BP1, DCLRE1C, LIG4, NHEJ1, PRKDC, XRCC5, XRCC6, LIG4, NHEJ1, XRCC4, PRKDC, XRCC5, XRCC6, PNKP, POLL, MRE11A, RAD50, DNTT, POLL, APLF, APTX, DCLRE1C, PARG, XRCC2, XRCC3;
optionally, the DDR pathway gene set comprises: the BER pathway (base excision repair, n = 43), the MMR pathway (mismatch repair, n = 27), the NER pathway (nucleotide excision repair, n = 70), the FA pathway (Fanconi anemia, n = 36), the HR pathway (homologous recombination, n = 55), and the NHEJ pathway (non-homologous end joining, n = 37);
in one embodiment, the method of cluster analysis is a consensus clustering algorithm. Consensus clustering (also rendered as consistency clustering) is a method of aggregating the results of multiple clustering runs, also known as cluster ensembles or clustering aggregation. Given many different (input) clusterings of a particular data set, the goal is to find a single (consensus) clustering that is, in some sense, more appropriate than the existing clusterings. Consensus clustering is therefore the problem of reconciling clustering information about the same data set from different sources or from different runs of the same algorithm. The clustering procedure was performed with the R consensus clustering package, using 1000 iterations and 90% resampling. The core algorithm is k-means based on Euclidean distance; a single clustering run on its own cannot achieve this.
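The resampling idea behind consensus clustering can be sketched as follows: k-means with Euclidean distance is run repeatedly on random 90% subsamples, and for every pair of samples the fraction of runs in which they fall in the same cluster is recorded. This is a simplified illustration and not the R package used in the embodiment; the farthest-point initialization, the function names, the toy points and the reduced number of runs are all assumptions made to keep the example small.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, rng, n_iter=20):
    """Plain k-means; returns one cluster label per point."""
    # Assumed initialization: a random first center, then farthest points.
    centers = [list(rng.choice(points))]
    while len(centers) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        centers.append(list(far))
    labels = [0] * len(points)
    for _ in range(n_iter):
        labels = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the previous center if a cluster is empty
                centers[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels


def consensus_matrix(points, k=2, n_runs=1000, frac=0.9, seed=0):
    """Fraction of co-sampled runs in which each pair shares a cluster."""
    rng = random.Random(seed)
    n = len(points)
    together = [[0] * n for _ in range(n)]
    sampled = [[0] * n for _ in range(n)]
    size = max(k, round(frac * n))
    for _ in range(n_runs):
        idx = rng.sample(range(n), size)  # resample without replacement
        labels = kmeans([points[i] for i in idx], k, rng)
        for a, i in enumerate(idx):
            for b, j in enumerate(idx):
                sampled[i][j] += 1
                together[i][j] += labels[a] == labels[b]
    return [
        [together[i][j] / sampled[i][j] if sampled[i][j] else 0.0 for j in range(n)]
        for i in range(n)
    ]


# Hypothetical 2-D expression summaries for ten samples in two clear groups.
points = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.3], [0.3, 0.2], [0.1, 0.1],
          [10.0, 10.1], [10.2, 10.0], [10.1, 10.3], [10.3, 10.2], [10.1, 10.1]]
consensus = consensus_matrix(points, k=2, n_runs=50)
```

Entries close to 1 mark pairs that are grouped together consistently, and entries close to 0 mark pairs that are consistently separated; final subtypes are then usually obtained by clustering this matrix.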
Optionally, the method of selection processing includes: univariate Cox regression analysis;
optionally, the sequencing data of the training set samples comprises: RNA-seq data of primary ESCC tumor tissue samples and metastatic ESCC tumor tissue samples. By analyzing RNA-seq data of primary and metastatic ESCC tumor tissue samples, DDR pathway analysis established that the DDR-active and DDR-silent subtypes have independent prognostic value in primary ESCCs, but not in metastatic ESCCs.
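As an illustration of univariate Cox regression used for selection, the sketch below fits the model h(t) = h0(t) * exp(beta * x) for a single covariate by Newton-Raphson on the partial likelihood and derives a hazard ratio and a Wald p-value. It is a simplified stand-in for a statistics package: it assumes no tied event times, and the function name and the toy follow-up data are hypothetical.

```python
import math


def cox_univariate(times, events, x, n_iter=25):
    """Return (beta, standard_error) for a one-covariate Cox model.

    Assumes no tied event times; events[i] is 1 for an event, 0 for censoring.
    """
    beta, info = 0.0, 0.0
    for _ in range(n_iter):
        grad, info = 0.0, 0.0
        for i, (t_i, d_i) in enumerate(zip(times, events)):
            if not d_i:
                continue
            # Risk set: subjects still under observation at time t_i.
            risk = [j for j, t_j in enumerate(times) if t_j >= t_i]
            w = [math.exp(beta * x[j]) for j in risk]
            s0 = sum(w)
            s1 = sum(wj * x[j] for wj, j in zip(w, risk))
            s2 = sum(wj * x[j] ** 2 for wj, j in zip(w, risk))
            grad += x[i] - s1 / s0                 # score contribution
            info += s2 / s0 - (s1 / s0) ** 2       # information contribution
        step = grad / info
        beta += step
        if abs(step) < 1e-10:
            break
    return beta, 1.0 / math.sqrt(info)


# Hypothetical data: follow-up time (months), event flag, gene expression.
times = [5, 8, 12, 14, 20, 25, 30, 36]
events = [1, 1, 1, 0, 1, 1, 0, 1]
expression = [2.1, 1.0, 1.8, 0.5, 1.2, 0.9, 0.3, 1.5]

beta, se = cox_univariate(times, events, expression)
hazard_ratio = math.exp(beta)
z = beta / se
p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided Wald test
```

Running this once per pathway or gene and keeping those with a small p-value mirrors the selection step; a hazard ratio above 1 indicates that higher expression goes with shorter survival in the toy data.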
In one embodiment, the construction method further comprises: based on the gene expression profiles of the survival-related pathways, obtaining (8) DDR gene sets related to survival outcome and the corresponding gene expression profiles by univariate regression analysis; processing the survival-outcome-related DDR gene sets and the corresponding gene expression profiles by multivariate analysis to obtain prognosis prediction genes and their gene expression profiles; sex, grade, smoking history and drinking history are controlled for in the multivariate analysis;
and performing cluster analysis on the training set samples based on the survival data to obtain different classification subtypes, characterizing the prognosis prediction genes of each classification subtype and their gene expression profiles, and obtaining a classification model.
The different classification subtypes of the classification model include: a DDR-active subtype and a DDR-silent subtype; the DDR-active subtype corresponds to the gene expression profile of pathways associated with a high survival rate, and the DDR-silent subtype corresponds to the gene expression profile of pathways associated with a low survival rate; no specific threshold is set for the survival rate, and high or low survival is determined by statistical comparison between the DDR-active subtype and the DDR-silent subtype.
In one example, the association between DDR subtype and ESCC survival with and without LNM (lymph node metastasis) was studied by stratified analysis. The results showed that primary ESCC tumors of the DDR-silent subtype had the worst survival of the groups compared (log-rank p = 0.032), whereas no significant difference in survival was observed between DDR subtypes for metastatic ESCC tumors (log-rank p = 0.34). DDR pathway analysis established that the DDR-active and DDR-silent subtypes have independent prognostic value in primary ESCCs, but not in metastatic ESCCs.
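The log-rank p-values quoted above come from comparing the survival of two groups. A minimal sketch of the two-group log-rank test is given below; the follow-up times are hypothetical and do not reproduce the cohort results, and the function name is an assumption.

```python
import math


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square statistic, p-value), 1 df."""
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    observed_a = expected_a = variance = 0.0
    for t in sorted({t for t, e, _ in data if e}):
        n = sum(1 for u, _, _ in data if u >= t)                # at risk, both groups
        n_a = sum(1 for u, _, g in data if u >= t and g == 0)   # at risk, group A
        d = sum(1 for u, e, _ in data if u == t and e)          # events at time t
        d_a = sum(1 for u, e, g in data if u == t and e and g == 0)
        observed_a += d_a
        expected_a += d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    chi2 = (observed_a - expected_a) ** 2 / variance
    p_value = math.erfc(math.sqrt(chi2 / 2.0))  # chi-square tail probability, 1 df
    return chi2, p_value


# Hypothetical follow-up times (months); 1 = death, 0 = censored.
silent_t, silent_e = [2, 3, 4, 5, 6], [1, 1, 1, 1, 1]
active_t, active_e = [10, 12, 14, 16, 18], [1, 1, 1, 1, 1]
chi2, p_value = logrank_test(silent_t, silent_e, active_t, active_e)

# Two identical groups give no evidence of a difference.
same_chi2, same_p = logrank_test([1, 2, 3], [1, 1, 1], [1, 2, 3], [1, 1, 1])
```

For a stratified analysis, the same test would be applied separately within each stratum, for example once for tumors without LNM and once for tumors with LNM.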
In one embodiment, to further validate the association between DDR subtype and survival outcome, DDR subtypes were also characterized for 74 tumors in the TCGA-ESCC cohort and 117 tumors in the Chen cohort. Consistent with the findings in this cohort, DDR subtype-assisted survival prediction was effective only for primary ESCC tumors, where it identified patient subgroups with good or poor outcome (TCGA-ESCC cohort: HR = 0.075, 95% CI 0.008-0.674, log-rank p = 0.004; Chen cohort: HR = 0.430, 95% CI 0.186-0.995, log-rank p = 0.042), whereas the survival of ESCC tumors with LNM could not be stratified. Multivariate Cox regression analysis indicated that the DDR subtype is a powerful predictor of survival outcome that is independent of clinical variables, underscoring the value and robustness of the DDR subtype in predicting survival outcomes for primary ESCC patients. Stratified analysis divides the population into different strata according to certain characteristics, such as gender or age, and analyzes the relationship between exposure and disease within each stratum. Its purpose is to control for confounding factors and adjust for their interference, that is, to estimate how strongly a confounding factor affects the association between exposure and outcome. Stratified analysis addresses situations in which an overall average is misleading. The TCGA-ESCC cohort included RNA-seq data for 74 patients, collected from UCSC Xena (https://xenabrowser.net/datapages/); the Chen cohort comprises microarray data for 117 ESCC patients from the Chinese Academy of Medical Sciences and Peking Union Medical College, with clinical data obtained from the Gene Expression Omnibus (GEO, www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE53624).
Optionally, the prognosis prediction genes include: the BRCA1 gene and the HFM1 gene; the DDR-active subtype corresponds to a high expression level of the BRCA1 gene, and the DDR-silent subtype corresponds to a high expression level of the HFM1 gene; no specific threshold is set for the gene expression level, and high or low expression is determined by statistical comparison between the DDR-active subtype and the DDR-silent subtype.
In one example, the prognostic value of DDR genes in primary and metastatic ESCC tumor tissues was assessed separately by meta-analysis of 3 independent cohorts. BRCA1 and HFM1 were found to be predictors of survival outcome in primary ESCC patients but did not contribute to the prognosis of metastatic ESCC. BRCA1 was identified as a favorable prognostic factor, with high expression associated with improved survival (pooled HR 0.22), while HFM1 is a risk factor, with increased expression associated with poor survival outcome (pooled HR 4.41).
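One common way to combine hazard ratios from several cohorts is fixed-effect inverse-variance pooling on the log scale. The text does not state which meta-analysis model was used, so the sketch below is an assumption chosen for illustration, and the per-cohort values are hypothetical rather than the reported results.

```python
import math


def pooled_hazard_ratio(cohorts, z=1.96):
    """Fixed-effect inverse-variance pooling of per-cohort hazard ratios.

    cohorts: list of (hr, ci_lower, ci_upper) with 95% confidence limits.
    Returns (pooled_hr, pooled_ci_lower, pooled_ci_upper).
    """
    weighted_sum = total_weight = 0.0
    for hr, lower, upper in cohorts:
        # Standard error of log(HR) recovered from the width of the 95% CI.
        se = (math.log(upper) - math.log(lower)) / (2.0 * z)
        weight = 1.0 / se ** 2
        weighted_sum += weight * math.log(hr)
        total_weight += weight
    log_hr = weighted_sum / total_weight
    se_pooled = math.sqrt(1.0 / total_weight)
    return (
        math.exp(log_hr),
        math.exp(log_hr - z * se_pooled),
        math.exp(log_hr + z * se_pooled),
    )


# Hypothetical per-cohort results for one gene: (HR, 95% CI lower, upper).
cohorts = [(0.5, 0.25, 1.0), (0.5, 0.25, 1.0), (0.5, 0.25, 1.0)]
pooled_hr, pooled_lower, pooled_upper = pooled_hazard_ratio(cohorts)
```

With three identical cohorts the pooled HR equals the common value, and the pooled confidence interval is narrower than any single-cohort interval, which is the point of pooling.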
In one embodiment, in the presence of BRCA1, cells sense and repair DNA damage, maintain genomic integrity and prevent tumorigenesis. BRCA1 deficiency disrupts normal DDR and leads to the accumulation of DNA damage. However, the role in DDR of HFM1, an ATP-dependent DNA helicase homolog, has not been studied. To determine the roles of BRCA1 and HFM1 in DDR in ESCC cells, an in vitro cell model of DNA damage induced by cisplatin (DDP) or X-ray irradiation (X-IR) was constructed: transient siRNA transfection was used to silence BRCA1 or HFM1 expression, and the cells were then treated with DDP or X-IR. The knockdown efficiency of BRCA1 and HFM1 was confirmed by Western blotting. To assess DDR directly, γH2AX (a well-established marker of DNA double-strand breaks) was visualized by immunofluorescence, and spontaneous and DDP- or IR-induced γH2AX foci were counted and analyzed. γH2AX accumulated after DDP or X-IR treatment. Furthermore, immunofluorescence analysis indicated a significant increase in endogenous γH2AX accumulation in KYSE410 and KYSE450 cells following BRCA1 knockdown under IR and DDP treatment. In contrast, knockdown of HFM1 significantly reduced the number of γH2AX foci in KYSE30 and KYSE450 cells treated with X-IR or DDP. These results indicate that loss of BRCA1 results in DDR deficiency, supporting the role of BRCA1 as a favorable prognostic factor, while loss of HFM1 promotes DDR, supporting the role of HFM1 as a prognostic risk factor.
In a third aspect of the present application, a method for processing esophageal squamous cell carcinoma data is disclosed, which comprises:
acquiring gene expression data of a sample to be tested; the gene expression data of the sample to be tested comprises gene expression data of one or more of the following genes: BRCA1 gene, HFM1 gene;
inputting the gene expression data of the sample to be tested into the classification model disclosed in the first aspect of the application to obtain a classification result;
optionally, the gene expression data of the sample to be tested is data from a primary ESCC patient.
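The method of the third aspect can be sketched as follows. This is not the classification model of the application, whose internal form is not given in this passage. It is a hypothetical decision rule that only reflects the stated directions (high BRCA1 expression associated with the DDR-active subtype, high HFM1 expression with the DDR-silent subtype). The function name, the reference values and the scoring rule are all assumptions.

```python
def classify_ddr_subtype(expression, reference):
    """Label a sample as DDR-active or DDR-silent from BRCA1 and/or HFM1 expression.

    expression: dict of gene -> expression value for the sample to be tested
    reference:  dict of gene -> reference value (e.g. a training-cohort median)

    Hypothetical rule: expression above the reference counts towards
    DDR-active for BRCA1 and towards DDR-silent for HFM1.
    """
    if "BRCA1" not in expression and "HFM1" not in expression:
        raise ValueError("need expression data for BRCA1 and/or HFM1")
    score = 0.0
    if "BRCA1" in expression:
        score += expression["BRCA1"] - reference["BRCA1"]
    if "HFM1" in expression:
        score -= expression["HFM1"] - reference["HFM1"]
    return "DDR-active" if score > 0 else "DDR-silent"


# Hypothetical reference values and sample, for illustration only.
reference = {"BRCA1": 5.0, "HFM1": 4.0}
print(classify_ddr_subtype({"BRCA1": 6.5, "HFM1": 3.2}, reference))
```

Either gene alone is accepted, matching the "one or more of the following genes" wording; a trained model would replace the difference-from-reference score.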
Fig. 2 shows a device for processing and analyzing esophageal squamous cell carcinoma data provided by an embodiment of the invention, comprising a memory and a processor; the memory is configured to store program instructions; the processor is configured to invoke the program instructions, which, when executed, perform the method of processing esophageal squamous cell carcinoma data described above.
Fig. 3 shows a system for processing and analyzing esophageal squamous cell carcinoma data, comprising:
an acquisition unit 301, configured to acquire sequencing data of a sample to be tested;
an output unit 302, configured to input the sequencing data of the sample to be tested into the classification model disclosed in the first aspect of the present application, so as to obtain a classification result of DDR-active subtype or DDR-silent subtype.
The processing and analyzing system for esophageal squamous cell carcinoma data provided by the embodiment of the invention comprises:
an acquisition unit, configured to acquire gene expression data of a sample to be tested; the gene expression data of the sample to be tested comprises gene expression data of one or more of the following genes: BRCA1 gene, HFM1 gene;
and an output unit, configured to input the gene expression data of the sample to be tested into the classification model disclosed in the first aspect of the application to obtain a classification result.
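The division into an acquisition unit and an output unit can be sketched as two small classes. The class names mirror the units named above; the `placeholder_model` function is a stand-in introduced only so that the sketch runs, and is not the classification model of the application.

```python
class AcquisitionUnit:
    """Collects the expression values that the model needs from a raw record."""

    def __init__(self, genes=("BRCA1", "HFM1")):
        self.genes = genes

    def acquire(self, record):
        # Keep only the genes of interest; "one or more" of them may be present.
        data = {g: float(record[g]) for g in self.genes if g in record}
        if not data:
            raise ValueError("record contains none of the expected genes")
        return data


class OutputUnit:
    """Feeds acquired data to a classification model and returns its label."""

    def __init__(self, model):
        self.model = model  # any callable mapping a dict of expression values to a label

    def classify(self, data):
        return self.model(data)


def placeholder_model(data):
    # Stand-in for a trained classification model (an assumption, not the source's model).
    brca1 = data.get("BRCA1", 0.0)
    hfm1 = data.get("HFM1", 0.0)
    return "DDR-active" if brca1 >= hfm1 else "DDR-silent"


acquisition = AcquisitionUnit()
output = OutputUnit(placeholder_model)
sample = acquisition.acquire({"BRCA1": 7.2, "HFM1": 3.1, "TP53": 5.0})
print(output.classify(sample))
```

Keeping the model as an injected callable means the same two units can serve both the sequencing-data variant and the BRCA1/HFM1 variant of the system.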
An embodiment of the invention further provides a computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, carries out the method of processing esophageal squamous cell carcinoma data described above.
FIG. 4 is a diagram of the cluster analysis of ESCC tumors based on DDR gene profiles provided in an embodiment of the present invention, wherein:
(A) Heat map of DDR gene expression fold change between DDR subtypes. The red bar represents the DDR-active subtype and the green bar represents the DDR-silent subtype. The DDR subtypes are classified by consensus clustering. (B-D) Kaplan-Meier curves comparing OS (log rank test) for DDR-active subtype, DDR-silent subtype and transition subtype groups. HR and 95% CI were calculated by a two-sided Wald test using univariate Cox regression.
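For readers unfamiliar with the Kaplan-Meier curves mentioned in this legend, the following is a minimal sketch of the product-limit estimator that such curves plot. The function name and the example data are hypothetical; the log-rank test and the Cox regression used for the HR and CI are not implemented here.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of the survival function.

    times:  follow-up time for each patient
    events: 1 if death was observed at that time, 0 if the patient was censored
    Returns a list of (time, survival_probability), one entry per event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all patients who share the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        # Both deaths and censored patients leave the risk set after time t.
        at_risk -= removed
    return curve


# Hypothetical follow-up times (months) and event flags, for illustration only.
example_curve = kaplan_meier([6, 12, 12, 20, 31, 40], [1, 1, 0, 1, 0, 1])
for time, probability in example_curve:
    print(time, round(probability, 3))
```

One such curve would be computed per subtype group and the groups compared, as in panels B-D.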
FIG. 5 shows the results of BRCA1 and HFM1 modulating the DNA damage response in ESCC, as provided in an embodiment of the present invention, wherein: (A, B) KYSE410 and KYSE450 cells were transfected with BRCA1 siRNA, treated with 2 μg/ml DDP, and analyzed for γH2AX by Western blotting. (C, D) KYSE410 and KYSE450 cells were transfected with BRCA1 siRNA, exposed to IR (4 Gy), harvested at the indicated times, and analyzed for γH2AX by Western blotting. (E, F) Representative images and quantification of γH2AX foci in control and BRCA1-knockdown KYSE410 and KYSE450 cells treated with 2 μg/ml DDP for the indicated times. (G, H) Representative images and quantification of γH2AX foci in control and BRCA1-knockdown KYSE410 and KYSE450 cells treated with IR (4 Gy) for the indicated times. (I, J) KYSE30 and KYSE450 cells were transfected with HFM1 siRNA, treated with 2 μg/ml DDP, and analyzed for γH2AX by Western blotting. (K, L) KYSE30 and KYSE450 cells were transfected with HFM1 siRNA, exposed to IR (4 Gy), harvested at the indicated times, and analyzed for γH2AX by Western blotting. (M, N) Representative images and quantification of γH2AX foci in control and HFM1-knockdown KYSE30 and KYSE450 cells treated with 2 μg/ml DDP for the indicated times. (O, P) Representative images and quantification of γH2AX foci in control and HFM1-knockdown KYSE30 and KYSE450 cells treated with IR (4 Gy) for the indicated times. For panels E-H and M-P: data are representative of three independent experiments; each point represents one cell, and 50 cells per group were counted using ImageJ; error bars represent SD; P values were determined by unpaired two-sided t-test.
The validation results of this validation example show that assigning an intrinsic weight to an indication can moderately improve the performance of the method relative to the default settings.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other manners. The apparatus embodiments described above are merely illustrative. For example, the division into units is only a logical functional division, and other divisions are possible in practice: multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices or units, and may be electrical, mechanical or in another form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
Those skilled in the art will appreciate that all or part of the steps in the methods of the above embodiments may be implemented by a program instructing the associated hardware. The program may be stored in a computer-readable storage medium, and the storage medium may include: Read-Only Memory (ROM), Random Access Memory (RAM), magnetic or optical disks, and the like.
It will likewise be understood by those skilled in the art that the program implementing all or part of the steps of the above embodiments may be stored in a computer-readable storage medium such as a read-only memory, a magnetic disk or an optical disk.
While the invention has been described in detail with reference to specific embodiments thereof, it will be apparent to one skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention.
Claims (10)
1. A method for constructing an esophageal squamous cell carcinoma classification model, comprising the following steps:
obtaining sequencing data of training set samples and the survival time information corresponding to each sample;
extracting a DDR pathway gene set and its gene expression levels from the sequencing data of the training set samples; performing selection on the DDR pathway gene set to obtain survival-related pathways and the gene expression levels of the survival-related pathways; the survival-related pathways comprise one or more of the following: MMR pathway, NER pathway, FA pathway and NHEJ pathway;
and performing cluster analysis on the training set samples based on the survival time information to obtain different classification subtypes, together with the pathways characterizing each classification subtype and their gene expression levels, thereby obtaining the classification model.
2. The method for constructing the esophageal squamous cell carcinoma classification model according to claim 1, wherein the DDR pathway gene set comprises genes of one or more of the following pathways: BER pathway, MMR pathway, NER pathway, FA pathway, HR pathway, and NHEJ pathway.
3. The method for constructing the esophageal squamous cell carcinoma classification model according to claim 1, wherein the cluster analysis is performed with a consensus clustering algorithm;
optionally, the selection is performed by univariate Cox regression analysis;
optionally, the sequencing data of the training set samples comprises: RNA-seq data for primary ESCC tumor tissue samples and metastatic ESCC tumor tissue samples.
4. The method for constructing the esophageal squamous cell carcinoma classification model according to claim 1, characterized in that the method further comprises: based on the gene expression levels of the survival-related pathways, obtaining a DDR gene set related to survival outcome and the corresponding gene expression levels by univariate regression analysis; processing the DDR gene set related to survival outcome by multivariate analysis to obtain prognosis prediction genes and the gene expression levels of the prognosis prediction genes;
and performing cluster analysis on the training set samples based on the survival time information to obtain different classification subtypes, together with the prognosis prediction genes characterizing each classification subtype and their gene expression levels, thereby obtaining the classification model.
5. The method for constructing the esophageal squamous cell carcinoma classification model according to any one of claims 1-4, characterized in that the different classification subtypes of the classification model comprise: a DDR-active subtype and a DDR-silent subtype; the DDR-active subtype corresponds to the gene expression levels of pathways associated with a high survival rate, and the DDR-silent subtype corresponds to the gene expression levels of pathways associated with a low survival rate;
optionally, the prognosis prediction genes include: BRCA1 gene and HFM1 gene; the DDR-active subtype corresponds to a high expression level of the BRCA1 gene, and the DDR-silent subtype corresponds to a high expression level of the HFM1 gene.
6. A method of processing esophageal squamous cell carcinoma data, comprising:
obtaining sequencing data of a sample to be tested;
inputting the sequencing data of the sample to be tested into the classification model of any one of claims 1-5 to obtain a classification result of DDR-active subtype or DDR-silent subtype;
optionally, the method further includes: predicting the survival rate of the sample to be tested based on the classification result;
outputting a result of high survival rate for the sample to be tested when the classification result is the DDR-active subtype; and outputting a result of low survival rate for the sample to be tested when the classification result is the DDR-silent subtype.
7. A method of processing esophageal squamous cell carcinoma data, comprising:
acquiring gene expression data of a sample to be tested; the gene expression data of the sample to be tested comprises gene expression data of one or more of the following genes: BRCA1 gene, HFM1 gene;
inputting the gene expression data of the sample to be tested into the classification model of any one of claims 1-5 to obtain a classification result;
optionally, the gene expression data of the sample to be tested is data from a primary ESCC patient.
8. A system for processing esophageal squamous cell carcinoma data, comprising:
an acquisition unit, configured to acquire sequencing data of a sample to be tested;
an output unit, configured to input the sequencing data of the sample to be tested into the classification model of any one of claims 1-5 to obtain a classification result of DDR-active subtype or DDR-silent subtype.
9. An apparatus for processing esophageal squamous cell carcinoma data, the apparatus comprising: a memory and a processor;
the memory is configured to store program instructions; the processor is configured to invoke the program instructions, which, when executed, perform the method of processing esophageal squamous cell carcinoma data of claim 6 or 7.
10. A computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, carries out the method of processing esophageal squamous cell carcinoma data of claim 6 or 7.
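As an informal illustration of the consensus clustering step named in claim 3 (not part of the claims), the sketch below builds a consensus matrix by repeatedly subsampling the samples and clustering each subsample. Several choices are assumptions, since the claims do not specify them: k-means with two clusters as the base algorithm, an 80% subsampling fraction, 50 resamples, clustering on gene expression values, and a simplified final split against the first sample instead of hierarchical clustering of the consensus matrix. All function names and data are hypothetical.

```python
import random


def two_means(points, n_rounds=20):
    """Plain k-means with k=2 and a deterministic farthest-point start."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    first = points[0]
    second = max(points, key=lambda p: dist2(p, first))
    centers = [first, second]
    labels = [0] * len(points)
    for _ in range(n_rounds):
        labels = [0 if dist2(p, centers[0]) <= dist2(p, centers[1]) else 1
                  for p in points]
        for k in (0, 1):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                centers[k] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels


def consensus_matrix(points, n_resamples=50, fraction=0.8, seed=0):
    """Fraction of resamples in which each pair of samples was clustered together."""
    rng = random.Random(seed)
    n = len(points)
    together = [[0] * n for _ in range(n)]
    sampled = [[0] * n for _ in range(n)]
    size = max(2, int(fraction * n))
    for _ in range(n_resamples):
        idx = rng.sample(range(n), size)
        labels = two_means([points[i] for i in idx])
        for a, i in enumerate(idx):
            for b, j in enumerate(idx):
                sampled[i][j] += 1
                if labels[a] == labels[b]:
                    together[i][j] += 1
    return [[together[i][j] / sampled[i][j] if sampled[i][j] else 0.0
             for j in range(n)] for i in range(n)]


def split_by_consensus(matrix, threshold=0.5):
    # Simplified final assignment: group 0 holds the samples that usually
    # co-cluster with the first sample, group 1 holds the rest.
    return [0 if value >= threshold else 1 for value in matrix[0]]


# Hypothetical two-gene expression values for eight samples (illustration only).
samples = [(8.0, 2.1), (7.6, 1.8), (8.3, 2.4), (7.9, 2.0),
           (2.0, 8.1), (1.7, 7.7), (2.3, 8.4), (2.1, 7.9)]
matrix = consensus_matrix(samples)
print(split_by_consensus(matrix))
```

Consensus values near 1 or 0 indicate stable cluster membership. Which resulting group is labelled DDR-active or DDR-silent would then follow from the expression and survival characteristics described in claim 5.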
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310063027.4A CN115982644B (en) | 2023-01-19 | 2023-01-19 | Esophageal squamous cell carcinoma classification model construction and data processing method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310063027.4A CN115982644B (en) | 2023-01-19 | 2023-01-19 | Esophageal squamous cell carcinoma classification model construction and data processing method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115982644A (en) | 2023-04-18 |
CN115982644B (en) | 2024-04-30 |
Family
ID=85960554
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310063027.4A Active CN115982644B (en) | 2023-01-19 | 2023-01-19 | Esophageal squamous cell carcinoma classification model construction and data processing method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115982644B (en) |
Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20180051346A1 (en) * | 2015-03-17 | 2018-02-22 | Stichting Het Nederlands Kanker Instituut-Antoni van Leeuwenhoek Ziekenhuis | Methods and means for subtyping invasive lobular breast cancer |
CN109863251A (en) * | 2016-05-17 | 2019-06-07 | 基因中心治疗公司 | To the method for squamous cell lung carcinoma subtype typing |
CN110863048A (en) * | 2019-12-06 | 2020-03-06 | 苏州卫生职业技术学院 | Probe library, detection method and kit for detecting effectiveness of DNA homologous recombination repair pathway |
CN112086199A (en) * | 2020-09-14 | 2020-12-15 | 中科院计算所西部高等技术研究院 | Liver cancer data processing system based on multiple groups of mathematical data |
US20210102260A1 (en) * | 2018-02-16 | 2021-04-08 | The Institute Of Cancer Research: Royal Cancer Hospital | Patient classification and prognositic method |
WO2021127610A1 (en) * | 2019-12-20 | 2021-06-24 | EDWARD Via COLLEGE OF OSTEOPATHIC MEDICINE | Cancer signatures, methods of generating cancer signatures, and uses thereof |
CN113345592A (en) * | 2021-06-18 | 2021-09-03 | 山东第一医科大学附属省立医院(山东省立医院) | Construction and diagnosis equipment for acute myeloid leukemia prognosis risk model |
CN114496066A (en) * | 2022-04-13 | 2022-05-13 | 南京墨宁医疗科技有限公司 | Construction method and application of gene model for prognosis of triple negative breast cancer |
CN114686591A (en) * | 2022-05-12 | 2022-07-01 | 浙江大学医学院附属第四医院 | Lung squamous carcinoma immunotherapy curative effect prediction model based on gene expression condition and construction method and application thereof |
CN115232877A (en) * | 2022-08-05 | 2022-10-25 | 中国医学科学院肿瘤医院 | Molecular typing diagnosis marker for esophageal squamous carcinoma and application thereof |
Non-Patent Citations (4)
Title |
---|
LIU Z, ET AL.: "Integrated multi-omics profiling yields a clinically relevant molecular classification for esophageal squamous cell carcinoma", CANCER CELL, 9 January 2023 (2023-01-09), pages 181 - 195 * |
ZHAO N, ET AL.: "DNA damage repair profiling of esophageal squamous cell carcinoma uncovers clinically relevant molecular subtypes with distinct prognoses and therapeutic vulnerabilities", EBIOMEDICINE, 17 September 2023 (2023-09-17) * |
王婷,等: "基于多数据库分析代谢相关基因DLAT在结直肠癌中的表达及其临床意义", 解放军医学杂志, no. 04, 16 April 2019 (2019-04-16), pages 49 - 55 * |
魏之菡,等: "多基因模型在肝细胞癌预后中的应用", 生物技术通报, 21 April 2020 (2020-04-21), pages 183 - 192 * |
Also Published As
Publication number | Publication date |
---|---|
CN115982644B (en) | 2024-04-30 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||