CN116656820A - Prognosis model based on breast tumor stem cell related genes and application thereof
- Publication number
- CN116656820A (application number CN202310557532.4A)
- Authority
- CN
- China
- Prior art keywords
- prognosis
- breast cancer
- tumor stem
- bcscrs
- model
- Prior art date
- Legal status
- Pending
Classifications
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
- C12Q1/6876—Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
- C12Q1/6883—Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material
- C12Q1/6886—Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material for cancer
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B25/00—ICT specially adapted for hybridisation; ICT specially adapted for gene or protein expression
- G16B25/10—Gene or protein expression profiling; Expression-ratio estimation or normalisation
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H50/00—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
- G16H50/30—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for calculating health indices; for individual health risk assessment
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q2600/00—Oligonucleotides characterized by their use
- C12Q2600/118—Prognosis of disease development
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q2600/00—Oligonucleotides characterized by their use
- C12Q2600/158—Expression markers
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02A—TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
- Y02A90/00—Technologies having an indirect contribution to adaptation to climate change
- Y02A90/10—Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation
Abstract
The invention belongs to the technical field of medical artificial intelligence and relates in particular to a prognosis model based on breast tumor stem cell-related genes and its application. The invention provides breast tumor stem cell prognostic signature genes (BRD4, RPS24, SERPINA3, SKP1, NTRK3, CD79A, JAK1, NT5E, NDRG1 and CD24) for constructing a breast cancer prognosis model, and their use in constructing a prognosis model based on breast tumor stem cell-related genes. The invention further covers the use of reagents that detect the expression levels of these prognostic signature genes in the preparation of a kit for evaluating breast cancer prognosis and survival. The invention judges a patient's prognosis from the breast tumor stem cell prognostic signature genes and a corresponding prediction model, namely from the sum of the products of each signature gene's expression level and its coefficient, combined with the patient's clinical characteristics. It predicts the prognosis of breast cancer patients efficiently and accurately, gives clinicians effective guidance for treatment decisions and reduces ineffective treatment, thereby lowering patients' treatment costs and promoting the clinical development and application of precision therapy.
Description
Technical Field
The invention belongs to the technical field of medical artificial intelligence, and particularly relates to a prognosis model based on breast tumor stem cell related genes and application thereof.
Background
Breast cancer is a highly heterogeneous malignancy arising in breast tissue. Although conventional treatments such as surgery, chemotherapy and radiotherapy, together with emerging immunotherapies, have substantially improved prognosis, tumor heterogeneity drives breast cancer recurrence, metastasis, drug resistance and immune escape, and remains a major challenge for clinical treatment. Tumor heterogeneity is therefore widely recognized, and taking it into account in clinical diagnosis and treatment would help further improve therapeutic outcomes for cancer patients.
Reported prognosis models have been built from various gene sets, including immune-related genes (CN113862363A), ferroptosis (iron-dependent cell death)-related genes, autophagy-related genes (CN113593648A), apoptosis-related genes, lactate metabolism genes, macrophage signature genes and copper-dependent cell death (cuproptosis)-related genes, as well as from single-cell transcriptome sequencing of breast cancer (CN109072481B, whose model involves 95 genes and is too complex). Some prognosis models require detecting hundreds of genes, which makes industrialization too costly and clinical adoption impractical. Most prognosis models are built around a single signaling pathway or target, and in practice they often show low prediction accuracy and precision, possibly because of breast cancer heterogeneity. As a result, no accurate and reliable breast cancer prognosis model is currently in clinical use.
Earlier studies have suggested that breast tumor stem cells may be a source of this heterogeneity; however, no breast cancer prognosis model based on breast tumor stem cell-related genes has yet been reported or applied clinically.
Disclosure of Invention
To address these shortcomings of the prior art, the invention provides a prognosis model based on breast tumor stem cell-related genes and its application.
To achieve this purpose, the invention adopts the following technical solution:
The invention provides prognostic signature genes for constructing a breast cancer prognosis model; the signature genes are breast tumor stem cell-related genes, namely BRD4, RPS24, SERPINA3, SKP1, NTRK3, CD79A, JAK1, NT5E, NDRG1 and CD24.
The prognostic signature genes are applied to construct a breast cancer prognosis model whose risk score, the breast tumor stem cell-related gene risk score (BCSCRS), is calculated as:
$$\mathrm{BCSCRS} = (-0.53045 \times \mathrm{BRD4}) + (-0.26259 \times \mathrm{RPS24}) + (-0.31334 \times \mathrm{SERPINA3}) + (0.434039 \times \mathrm{SKP1}) + (-0.53742 \times \mathrm{NTRK3}) + (-0.23344 \times \mathrm{CD79A}) + (-0.40628 \times \mathrm{JAK1}) + (0.192005 \times \mathrm{NT5E}) + (0.152866 \times \mathrm{NDRG1}) + (0.194872 \times \mathrm{CD24})$$
where each gene symbol denotes that gene's expression level. A patient is judged to be in a low-risk state when BCSCRS < 0.985569 and in a high-risk state when BCSCRS > 0.985569.
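A minimal Python sketch of the risk score and cutoff above. It assumes each gene's expression value is already on the same normalized scale used to fit the model (the text does not state the scale) and is supplied as a dictionary keyed by gene symbol; the constant and function names are illustrative, not part of the invention. Because the text only defines BCSCRS < 0.985569 (low) and BCSCRS > 0.985569 (high), the sketch assigns a score exactly equal to the cutoff to the high-risk group as an assumption.

```python
# Regression coefficients of the ten-gene BCSCRS model, as stated in the text.
BCSCRS_COEFFICIENTS = {
    "BRD4": -0.53045,
    "RPS24": -0.26259,
    "SERPINA3": -0.31334,
    "SKP1": 0.434039,
    "NTRK3": -0.53742,
    "CD79A": -0.23344,
    "JAK1": -0.40628,
    "NT5E": 0.192005,
    "NDRG1": 0.152866,
    "CD24": 0.194872,
}
# Median-based cutoff separating low- and high-risk patients.
BCSCRS_CUTOFF = 0.985569


def bcscrs(expression: dict[str, float]) -> float:
    """Weighted sum of the ten signature genes' expression values."""
    missing = sorted(set(BCSCRS_COEFFICIENTS) - set(expression))
    if missing:
        raise KeyError(f"missing expression values for: {missing}")
    return sum(coef * expression[gene] for gene, coef in BCSCRS_COEFFICIENTS.items())


def risk_group(score: float, cutoff: float = BCSCRS_CUTOFF) -> str:
    """Classify a score; a score equal to the cutoff counts as high risk (assumption)."""
    return "low" if score < cutoff else "high"


# Usage with made-up expression values (not patient data).
example = {gene: 1.0 for gene in BCSCRS_COEFFICIENTS}
score = bcscrs(example)
print(round(score, 6), risk_group(score))
```

Raising on a missing gene, rather than silently treating it as zero, keeps an incomplete assay from producing a misleadingly low score.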
Further, a nomogram is constructed from the breast tumor stem cell-related gene risk score and clinical variables to predict patients' overall survival and prognosis.
The clinical variables include sex, TNM stage and age.
Each patient's total score is obtained by adding the points for the breast tumor stem cell-related gene risk score and for the clinical variables, and the patient's 1-, 3- and 5-year survival probabilities are then estimated from the total score using the nomogram's conversion function; a lower total score corresponds to a higher survival probability.
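The point-to-probability conversion can be sketched as follows, assuming the nomogram sits on a proportional-hazards (Cox) model, as is usual for survival nomograms. Nomogram total points are a linear rescaling of the model's linear predictor lp, and the t-year survival probability is S0(t) ** exp(lp), where S0(t) is the baseline survival of a reference patient. The coefficients, baseline survival values and predictor names below are hypothetical placeholders, not values from the invention, and predictors are assumed to be expressed as deviations from the reference patient.

```python
import math

# Hypothetical Cox coefficients (illustrative only; not reported in the text).
HYPOTHETICAL_COEFS = {"age_decades": 0.30, "tnm_stage": 0.55, "bcscrs": 0.90}

# Hypothetical baseline survival S0(t) at 1, 3 and 5 years for the reference patient (lp = 0).
HYPOTHETICAL_BASELINE = {1: 0.98, 3: 0.92, 5: 0.86}


def linear_predictor(values: dict[str, float], coefs: dict[str, float]) -> float:
    """Weighted sum of predictors; nomogram total points rescale this linearly."""
    return sum(coefs[name] * values[name] for name in coefs)


def survival_probability(lp: float, baseline_survival: float) -> float:
    """Proportional-hazards survival: S(t | x) = S0(t) ** exp(lp)."""
    if not 0.0 < baseline_survival <= 1.0:
        raise ValueError("baseline survival must lie in (0, 1]")
    return baseline_survival ** math.exp(lp)


# Two hypothetical patients, predictors centred on the reference patient.
lower_points = {"age_decades": -0.5, "tnm_stage": 0.0, "bcscrs": -0.4}
higher_points = {"age_decades": 0.5, "tnm_stage": 1.0, "bcscrs": 0.6}
for label, patient in [("lower total", lower_points), ("higher total", higher_points)]:
    lp = linear_predictor(patient, HYPOTHETICAL_COEFS)
    probs = {t: round(survival_probability(lp, s0), 3) for t, s0 in HYPOTHETICAL_BASELINE.items()}
    print(label, probs)
```

Because exp(lp) grows with lp and S0(t) < 1, a larger total score always yields a lower survival probability, which is the monotone relationship described above.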
The prognosis model is constructed by the following steps:
(1) Data acquisition: transcriptome and clinical data for breast cancer are obtained from TCGA and GEO. Normal tissue samples and duplicate samples from the same patient are excluded. TCGA patients are randomly divided into a training cohort and an internal validation cohort at a 7:3 ratio using the createDataPartition function in the R package caret, and patients in GSE20685 serve as the external validation cohort;
(2) Screening of breast tumor stem cell genes: breast tumor stem cell-related genes are collected from the GeneCards database, genes with a relevance score greater than 30 are retained, and breast tumor stem cell genes associated with breast cancer prognosis are identified by univariate Cox analysis;
(3) Construction of the prognosis model: in the TCGA training cohort, the breast cancer prognostic signature genes are further screened by least absolute shrinkage and selection operator (LASSO) Cox regression, and the breast cancer prognosis model is constructed by multivariate Cox regression analysis. The risk score of the prognosis model is
$$\mathrm{BCSCRS} = (-0.53045 \times \mathrm{BRD4}) + (-0.26259 \times \mathrm{RPS24}) + (-0.31334 \times \mathrm{SERPINA3}) + (0.434039 \times \mathrm{SKP1}) + (-0.53742 \times \mathrm{NTRK3}) + (-0.23344 \times \mathrm{CD79A}) + (-0.40628 \times \mathrm{JAK1}) + (0.192005 \times \mathrm{NT5E}) + (0.152866 \times \mathrm{NDRG1}) + (0.194872 \times \mathrm{CD24})$$
The risk score of each sample is calculated with the predict function of the R survival package from the expression levels of the genes in the Cox regression model and their corresponding regression coefficients. All patients are divided into high-risk and low-risk groups at the median BCSCRS (0.985569): patients with BCSCRS < 0.985569 are assigned to the low-risk group and those with BCSCRS > 0.985569 to the high-risk group;
(4) Validation of the prognosis model: the TCGA test cohort and the GSE20685 cohort serve as the internal and external validation cohorts, respectively, and the accuracy of the prognosis model is evaluated with ROC curves, the C-index and decision curve analysis (DCA);
(5) Construction of the nomogram: a nomogram and calibration curves including patient age, sex, TNM stage and the breast tumor stem cell-related gene risk score (BCSCRS) are constructed with the rms package; receiver operating characteristic (ROC) and decision curve analyses (DCA) are performed with the timeROC and ggDCA packages to compare the predictive accuracy of the nomogram with that of other prognostic factors. The nomogram incorporating BCSCRS and clinical variables is used as a quantitative method for predicting the survival of breast cancer patients.
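For step (1), the 7:3 random split can be approximated in Python as below. caret's createDataPartition samples within outcome groups; stratifying by vital status (event = 1) is an assumption here, since the text does not say which variable, if any, was used for stratification, and the seed is arbitrary. This is an analogue for illustration, not a reimplementation of caret.

```python
import random


def split_train_validation(patient_ids, event_status, train_fraction=0.7, seed=2023):
    """Randomly split patients into training and internal-validation cohorts,
    stratified by event status so both cohorts keep a similar event rate."""
    rng = random.Random(seed)  # arbitrary seed, for reproducibility only
    strata = {}
    for pid, status in zip(patient_ids, event_status):
        strata.setdefault(status, []).append(pid)
    train, validation = [], []
    for status in sorted(strata):
        members = list(strata[status])
        rng.shuffle(members)
        n_train = round(train_fraction * len(members))
        train.extend(members[:n_train])
        validation.extend(members[n_train:])
    return train, validation
```

Fixing the seed makes the partition reproducible, which matters because every downstream coefficient depends on which patients landed in the training cohort.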
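Step (2) combines a GeneCards relevance filter with univariate Cox regression. The pure-Python sketch below fits a one-covariate Cox model by Newton-Raphson on the partial likelihood (Breslow handling of tied times, step halving for stability) and returns a Wald p-value; in practice a dedicated routine such as coxph in the R survival package would be used. The significance threshold alpha = 0.05 is an assumption (the text gives none), the function names are hypothetical, and the O(n²) loop per iteration is chosen for clarity, not speed.

```python
import math


def cox_univariate(x, time, event, max_iter=50, tol=1e-9):
    """Fit h(t | x) = h0(t) * exp(beta * x) by Newton-Raphson on the Cox partial
    likelihood (Breslow ties). Returns (beta, standard error, two-sided Wald p)."""
    n = len(x)

    def partial_likelihood(beta):
        loglik = score = info = 0.0
        for i in range(n):
            if not event[i]:
                continue
            s0 = s1 = s2 = 0.0
            for j in range(n):
                if time[j] >= time[i]:  # j is still at risk at time[i]
                    w = math.exp(beta * x[j])
                    s0 += w
                    s1 += w * x[j]
                    s2 += w * x[j] * x[j]
            mean = s1 / s0
            loglik += beta * x[i] - math.log(s0)
            score += x[i] - mean
            info += s2 / s0 - mean * mean
        return loglik, score, info

    beta = 0.0
    loglik, score, info = partial_likelihood(beta)
    for _ in range(max_iter):
        if info <= 0.0:
            break  # no information, e.g. a constant covariate or no events
        step = score / info
        while True:  # halve the Newton step until the log-likelihood does not drop
            candidate = partial_likelihood(beta + step)
            if candidate[0] >= loglik - 1e-12 or abs(step) < tol:
                break
            step /= 2.0
        beta += step
        loglik, score, info = candidate
        if abs(step) < tol:
            break
    se = 1.0 / math.sqrt(info) if info > 0.0 else math.inf
    p_value = math.erfc(abs(beta / se) / math.sqrt(2.0))
    return beta, se, p_value


def screen_prognostic_genes(relevance, expression, time, event,
                            min_relevance=30.0, alpha=0.05):
    """Keep genes with GeneCards relevance > min_relevance whose univariate
    Cox p-value is below alpha (alpha = 0.05 is an assumed threshold)."""
    kept = {}
    for gene, rel in relevance.items():
        if rel <= min_relevance or gene not in expression:
            continue
        beta, _se, p = cox_univariate(expression[gene], time, event)
        if p < alpha:
            kept[gene] = {"beta": beta, "hazard_ratio": math.exp(beta), "p": p}
    return kept
```

A positive beta (hazard ratio above 1) marks a gene whose higher expression is associated with earlier events.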
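Step (3) divides patients at the median risk score. The sketch below derives a cutoff as the median of a set of scores and labels patients against it; reusing a cutoff derived on one cohort unchanged on other cohorts is an assumption about how the fixed value 0.985569 is applied, and a score exactly equal to the cutoff is labelled high risk, also an assumption. Scores and cutoff must be on the same scale: depending on its options, R's predict for Cox models may return a mean-centred linear predictor, which can differ from a raw weighted sum by a constant.

```python
import statistics


def assign_risk_groups(scores: dict[str, float], cutoff: float) -> dict[str, str]:
    """Label each patient; a score equal to the cutoff is labelled high risk (assumption)."""
    return {pid: ("low" if s < cutoff else "high") for pid, s in scores.items()}


def median_split(scores: dict[str, float]) -> tuple[float, dict[str, str]]:
    """Derive the cutoff as the median score and label every patient against it."""
    cutoff = statistics.median(scores.values())
    return cutoff, assign_risk_groups(scores, cutoff)
```

A cutoff returned by median_split on the training cohort can be passed directly to assign_risk_groups for the validation cohorts.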
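Step (4) uses the C-index to measure discrimination. A minimal sketch of Harrell's concordance index follows: among comparable pairs (the earlier time is an observed event), it counts how often the patient who failed earlier had the higher risk score, with tied risk scores counted as 0.5. Pairs with tied survival times are simply skipped here; packaged implementations (for example concordance in the R survival package) handle ties more carefully, so values may differ slightly.

```python
from itertools import combinations


def harrell_c_index(time, event, risk):
    """Fraction of comparable pairs in which the earlier failure has the higher risk."""
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(time)), 2):
        if time[i] == time[j]:
            continue  # tied times skipped in this simplified sketch
        early, late = (i, j) if time[i] < time[j] else (j, i)
        if not event[early]:
            continue  # earlier time is censored: true failure order unknown
        comparable += 1
        if risk[early] > risk[late]:
            concordant += 1.0
        elif risk[early] == risk[late]:
            concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ranking, which is the scale on which the reported C-index of 0.744 should be read.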
The invention also provides the use of a detection reagent in the preparation of a kit for evaluating breast cancer prognosis and survival, wherein the detection reagent consists of reagents for detecting the expression levels of the following 10 genes: BRD4, RPS24, SERPINA3, SKP1, NTRK3, CD79A, JAK1, NT5E, NDRG1 and CD24, which are the only key components the kit needs to assess breast cancer prognosis and survival. The kit further comprises instructions stating:
$$\mathrm{BCSCRS} = (-0.53045 \times \mathrm{BRD4}) + (-0.26259 \times \mathrm{RPS24}) + (-0.31334 \times \mathrm{SERPINA3}) + (0.434039 \times \mathrm{SKP1}) + (-0.53742 \times \mathrm{NTRK3}) + (-0.23344 \times \mathrm{CD79A}) + (-0.40628 \times \mathrm{JAK1}) + (0.192005 \times \mathrm{NT5E}) + (0.152866 \times \mathrm{NDRG1}) + (0.194872 \times \mathrm{CD24})$$
A patient is judged to be in a low-risk state when BCSCRS < 0.985569 and in a high-risk state when BCSCRS > 0.985569; patients in the low-risk state have a better prognosis than those in the high-risk state.
Preferably, the sample detected using the kit is a fresh tissue tumor sample.
The advantages of the invention are as follows:
1. Transcriptome and clinical data for breast cancer were obtained from TCGA and GEO, survival-related breast tumor stem cell genes were screened, and a breast tumor stem cell-related prognosis model was constructed by LASSO regression and multivariate Cox regression; the model's accuracy was then compared with that of other existing prognosis models using ROC, C-index, DCA and similar metrics. To make the model easier to apply clinically, a nomogram combining the risk score and clinical variables was constructed to predict patients' overall survival. The prognosis model based on breast tumor stem cell-related genes showed good predictive accuracy in both the training and validation cohorts (1-, 3- and 5-year AUC values all reaching 0.7) and outperformed the other models, and the nomogram incorporating clinical variables performed better still (AUC = 0.758, C-index = 0.744). Applying the BCSCRS risk score to analyses of the tumor microenvironment, the immune infiltration landscape and responsiveness to immune checkpoint inhibitor therapy showed that the low-risk group responds better to immunotherapy and is more sensitive to chemotherapy, indicating that the prognosis model constructed from breast tumor stem cell-related genes can be effectively applied to the clinical treatment and prognosis of breast cancer.
2. The prognosis model based on breast tumor stem cell-related genes obtained by this method has high sensitivity, good specificity and high accuracy; it can give clinicians effective guidance in treatment decisions for breast cancer patients and reduce ineffective treatment, thereby lowering patients' treatment costs and promoting the clinical development and application of precision therapy.
3. The nomogram constructed based on the risk scores and clinical variables improves the prediction performance of the prognosis model in breast cancer prognosis.
Drawings
FIG. 1 is a flow chart of construction of a prognostic model based on breast tumor stem cell-related genes.
FIG. 2 shows the development and validation of the BCSC-related prognostic model: (A-B) least absolute shrinkage and selection operator (LASSO) regression used to further screen breast cancer prognostic signature genes; (C) forest plot of the 10 BCSC prognosis-related genes; (D) regression coefficient of each gene in the prognostic model; risk score and survival status scatter plots, Kaplan-Meier analysis and 1-, 3- and 5-year time-dependent ROC curve analysis for the TCGA training cohort (E), TCGA test cohort (F), TCGA total cohort (G) and GSE20685 test cohort (H).
FIG. 3 shows the development and validation of the nomogram: (A) forest plot of univariate Cox regression analysis; (B) forest plot of multivariate Cox regression analysis; (C) nomogram predicting the 1-, 3- and 5-year survival probabilities of breast cancer patients from the risk score and clinical factors; (D) calibration curves of the nomogram; (E-G) 1-, 3- and 5-year ROC curves for the nomogram, BCSCRS and clinical factors; (H-J) decision curve analysis (DCA) of the nomogram, BCSCRS and clinical factors.
FIG. 4 shows the results of comparing the prognostic value of different breast cancer prognostic models: (A-D) Kaplan-Meier survival curves of the BCSCRS model and the models constructed by Li et al., Wang et al. and Zhang et al., respectively; (E-H) ROC curves for 1-, 3- and 5-year overall survival; (I) comparison of the overall ROC curves of the prognostic models included in this study; (J-L) C-index, RMS and DCA analyses.
FIG. 5 shows the results of the immune landscape analysis of the tumor microenvironment and immune infiltration: (A) GSEA enrichment analysis of the high-risk and low-risk groups; (B) heatmap of the overall immune landscape in the different risk groups; (C) differential analysis of the tumor microenvironment between risk groups; (D) differential analysis of infiltrating immune cells between the two risk groups; (E) correlation analysis between BCSCRS and infiltrating immune cells.
FIG. 6 shows the results for responsiveness to immune checkpoint inhibitor therapy: (A) expression of 27 immune checkpoint molecules; (B) IPS analysis between the two risk groups.
FIG. 7 shows the relationship between BCSCRS and tumor progression: (A) relationship between BCSCRS and stage, age, gender and TNM stage in the TCGA cohort; (B) relationship between BCSCRS and TNM stage in the GSE20685 cohort.
Detailed Description
Abbreviations involved in the present invention:
BCSCs: breast tumor stem cells; IARC: International Agency for Research on Cancer; WHO: World Health Organization; PD-1: programmed death-1; CTLA-4: cytotoxic T lymphocyte-associated antigen-4; GEO: Gene Expression Omnibus; TCGA: The Cancer Genome Atlas; NMF: non-negative matrix factorization; PCA: principal component analysis; TMB: tumor mutational burden; OS: overall survival; NES: normalized enrichment score; WGCNA: weighted gene co-expression network analysis; GO: Gene Ontology; KEGG: Kyoto Encyclopedia of Genes and Genomes; LASSO: least absolute shrinkage and selection operator; ROC: receiver operating characteristic; AUC: area under the curve; DCA: decision curve analysis; C-index: concordance index; GSEA: gene set enrichment analysis; TME: tumor microenvironment; ICI: immune checkpoint inhibitor; IPS: immunophenoscore; IC50: half-maximal inhibitory concentration; BCSCRS: breast tumor stem cell-related risk score; NK: natural killer cells; TAMs: tumor-associated macrophages; APC: antigen-presenting cell; TCR: T cell receptor.
Embodiment 1: Construction of a prognostic model based on breast tumor stem cell-related genes.
The construction flow of the prognosis model is shown in figure 1.
1. Acquisition of data:
Transcriptome data and clinical phenotype data were obtained from the GDC-TCGA-BRCA project in the UCSC Xena browser (https://xenabrowser.net/datapages/) and from the Gene Expression Omnibus (GEO) database (https://www.ncbi.nlm.nih.gov/geo/). After excluding normal tissue samples and duplicate samples from the same patient, transcriptome data for 1069 breast cancer patients with complete clinical information were obtained from TCGA. The TCGA patients were randomly divided at a 7:3 ratio into a training cohort (n=749) and an internal validation cohort (n=320) using the createDataPartition function of the R package caret, and the 327 patients in GSE20685 were used as an external validation cohort. Table 1 summarizes the clinical characteristics of the breast cancer patients in all cohorts. Breast tumor stem cell-related genes (BCSCGs) were collected from the GeneCards database (https://www.genecards.org/), and genes with a relevance score greater than 30 were retained for subsequent analysis. Stem cell genes associated with breast cancer prognosis were then screened by univariate Cox analysis for construction of the model.
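The cohort split and the GeneCards filter described above can be sketched in Python. This is a minimal, hypothetical analogue rather than the authors' R code: `split_cohort` performs a plain random split (caret's createDataPartition also stratifies on the outcome vector it is given, which this sketch does not), and the seed and patient IDs are invented placeholders. Rounding the training size up is an assumption, chosen because it reproduces the reported 749/320 split of 1069 patients.

```python
import math
import random


def split_cohort(patient_ids, train_fraction=0.7, seed=2023):
    """Unstratified random split into training and internal-validation sets.

    Hypothetical seed: the source does not report one.
    """
    ids = list(patient_ids)
    rng = random.Random(seed)
    rng.shuffle(ids)
    # Round the training size up, e.g. 1069 * 0.7 = 748.3 -> 749.
    n_train = math.ceil(len(ids) * train_fraction)
    return ids[:n_train], ids[n_train:]


def filter_by_relevance(gene_scores, threshold=30.0):
    """Keep genes whose GeneCards relevance score is strictly above the threshold."""
    return sorted(gene for gene, score in gene_scores.items() if score > threshold)


# Hypothetical IDs standing in for the 1069 TCGA-BRCA patients.
patients = [f"patient_{i:04d}" for i in range(1069)]
train, validation = split_cohort(patients)
print(len(train), len(validation))  # 749 320
```

A stratified split would shuffle and cut each outcome stratum separately and then concatenate the pieces.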
Table 1. Clinical characteristics of breast cancer patients in all cohorts.
2. Construction and validation of the prognostic model:
In the TCGA training cohort (N=749), breast cancer prognostic signature genes were further screened by least absolute shrinkage and selection operator (LASSO) Cox regression, and a breast cancer prognostic model was constructed by multivariate Cox regression analysis. The TCGA test cohort (N=320) and the GSE20685 cohort (N=327) served as the internal and external validation cohorts, respectively, to verify the accuracy of the prognostic model. The risk score of each sample was calculated with the predict function of the survival package from the expression levels of the genes in the Cox regression model and their corresponding regression coefficients.
From the 749 patients in the training cohort, univariate Cox analysis identified 45 survival-related breast tumor stem cell genes. LASSO regression analysis then selected 17 breast tumor stem cell-related genes for the multivariate Cox regression model (Fig. 2A and B). Multivariate Cox regression analysis was used to construct the prognostic model, and 10 genes were identified as prognostic signature genes (Fig. 2C): BRD4, RPS24, SERPINA3, SKP1, NTRK3, CD79A, JAK1, NT5E, NDRG1 and CD24. As shown in Fig. 2D and Table 2, the breast tumor stem cell-related risk score (BCSCRS) is calculated with the following formula:
BCSCRS = (-0.53045 × BRD4) + (-0.26259 × RPS24) + (-0.31334 × SERPINA3) + (0.434039 × SKP1) + (-0.53742 × NTRK3) + (-0.23344 × CD79A) + (-0.40628 × JAK1) + (0.192005 × NT5E) + (0.152866 × NDRG1) + (0.194872 × CD24).
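The BCSCRS formula is a weighted sum of ten expression values, so it can be evaluated directly. The sketch below is a hedged illustration: the coefficients are copied from the formula, but the expression scale (for example log-transformed, normalized counts) is an assumption because the source does not state it. Note also that R's predict() on a coxph fit may return a centered linear predictor (and, with type = "risk", an exponentiated one), so check which scale produced the 0.985569 cutoff before comparing it with this raw weighted sum.

```python
# Coefficients copied from the BCSCRS formula.
BCSCRS_COEFFICIENTS = {
    "BRD4": -0.53045,
    "RPS24": -0.26259,
    "SERPINA3": -0.31334,
    "SKP1": 0.434039,
    "NTRK3": -0.53742,
    "CD79A": -0.23344,
    "JAK1": -0.40628,
    "NT5E": 0.192005,
    "NDRG1": 0.152866,
    "CD24": 0.194872,
}


def bcscrs(expression):
    """Weighted sum of the 10 signature genes for one sample.

    expression: mapping gene symbol -> expression value on the (assumed)
    same normalized scale that was used to fit the Cox model.
    """
    missing = sorted(set(BCSCRS_COEFFICIENTS) - set(expression))
    if missing:
        raise KeyError(f"missing expression values for: {missing}")
    return sum(coef * expression[gene] for gene, coef in BCSCRS_COEFFICIENTS.items())
```

For example, a sample with every gene at 1.0 scores the plain sum of the coefficients, about -1.3097.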
Table 2. Regression coefficients of the 10 signature genes in the prognostic model.
The accuracy of the Cox regression model was assessed by the area under the receiver operating characteristic (ROC) curve (AUC). Risk score distribution plots for all cohorts and survival status plots for all patients were drawn with the pheatmap package. Overall survival (OS) in the high-risk and low-risk groups was compared by Kaplan-Meier (KM) analysis.
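The Kaplan-Meier comparison rests on the product-limit estimator, which multiplies the conditional survival at each event time. A minimal Python sketch, assuming follow-up times and 0/1 death indicators for one risk group; the source ran this in R, and the log-rank test that usually accompanies KM curves is not shown here.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier (product-limit) survival estimate.

    times: follow-up time for each patient
    events: 1 if death was observed at that time, 0 if censored
    Returns a list of (event_time, survival_probability) steps.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group every patient sharing this time point (deaths and censorings).
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        # Patients observed (died or censored) at t leave the risk set afterwards.
        at_risk -= removed
    return curve
```

Calling it once on the low-risk group and once on the high-risk group gives the two step curves that a KM plot overlays.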
All patients were divided into high-risk and low-risk groups by the median BCSCRS: patients with BCSCRS < 0.985569 (the median risk score) were assigned to the low-risk group, and those with BCSCRS > 0.985569 to the high-risk group.
In the TCGA training cohort (n=749), overall survival was significantly higher in the low-risk group (n=374) than in the high-risk group (n=375). Time-dependent ROC analysis showed good prediction accuracy for BCSCRS in the TCGA training cohort, with AUCs of 0.733 (1 year), 0.742 (3 years) and 0.741 (5 years) (Fig. 2E). In the TCGA test cohort the 1-, 3- and 5-year AUCs were 0.808, 0.689 and 0.646, respectively (Fig. 2F); in the TCGA total cohort they were 0.751, 0.728 and 0.707 (Fig. 2G); and in the GSE20685 external validation cohort they were 0.765, 0.718 and 0.692 (Fig. 2H), indicating that BCSCRS predicts survival with good accuracy. The prognostic model constructed from the 10 breast tumor stem cell-related genes is therefore accurate and can be used in the clinical prognosis and treatment of breast cancer.
Screening of the target genes: a two-step dimensionality reduction was used. Genes associated with breast cancer prognosis were first identified by univariate Cox analysis and then further screened by LASSO regression, which further reduces the dimensionality of the data and yields more accurate prognostic genes.
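The median split can be sketched as follows. The source defines only the strict < and > cases, so sending a score exactly equal to the cutoff to the high-risk group is an assumption; with 749 distinct scores this rule yields 374 low-risk and 375 high-risk patients, matching the group sizes reported for the training cohort. Patient IDs and scores in the test are hypothetical.

```python
import statistics


def assign_risk_groups(scores, cutoff=None):
    """Split patients into 'low'/'high' risk groups at a BCSCRS cutoff.

    scores: mapping patient_id -> BCSCRS
    cutoff: defaults to the median of the scores passed in; pass 0.985569
            to reuse the fixed cutoff reported in the source.
    Scores equal to the cutoff go to 'high' (assumed tie rule).
    """
    if cutoff is None:
        cutoff = statistics.median(scores.values())
    groups = {pid: ("low" if s < cutoff else "high") for pid, s in scores.items()}
    return cutoff, groups
```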
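The first step of this two-step screen, univariate Cox analysis, fits one single-covariate Cox model per gene. Below is a stdlib-only sketch that maximizes the Cox partial likelihood by Newton-Raphson with Breslow handling of tied event times and reports a Wald p-value. It is an illustrative stand-in for R's survival::coxph, not the authors' code; the LASSO step is not shown, and the p-value cutoff used to keep genes is not stated in the source, so any threshold you apply is your own assumption.

```python
import math


def univariate_cox(times, events, x, max_iter=50, tol=1e-9):
    """Fit h(t|x) = h0(t) * exp(beta * x) by Newton-Raphson on the Cox
    partial likelihood (Breslow ties).

    Returns (beta, standard_error, two_sided_wald_p_value).
    """
    beta = 0.0
    info = 0.0
    for _ in range(max_iter):
        score, info = 0.0, 0.0
        for t_i, d_i, x_i in zip(times, events, x):
            if not d_i:
                continue  # censored subjects contribute only through risk sets
            s0 = s1 = s2 = 0.0
            for t_j, x_j in zip(times, x):
                if t_j >= t_i:  # subject j is still at risk at event time t_i
                    w = math.exp(beta * x_j)
                    s0 += w
                    s1 += w * x_j
                    s2 += w * x_j * x_j
            mean = s1 / s0
            score += x_i - mean            # gradient of the log partial likelihood
            info += s2 / s0 - mean * mean  # observed information (negative Hessian)
        if info <= 0:
            raise ValueError("no information: need events and a non-constant covariate")
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    se = 1.0 / math.sqrt(info)  # information from the last iteration
    z = beta / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return beta, se, p_value
```

In a screen, this would be called once per candidate gene with that gene's expression as `x`, keeping genes whose p-value falls below the chosen threshold. The double loop is O(n²) per iteration, which is acceptable for cohorts of about a thousand patients.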
Embodiment 2: Construction of a nomogram
To improve the accuracy of the prognostic model and make it easier to apply in clinical practice, we further developed a nomogram comprising the risk score (i.e. the breast tumor stem cell-related gene risk score, BCSCRS) and clinical variables (such as age and tumor stage).
First, univariate and multivariate Cox regression analyses were performed to assess whether the risk score and clinical variables were independent prognostic factors. The rms package was then used to construct the nomogram and calibration curves, including patient age, sex, TNM stage and risk score. To compare the predictive accuracy of the nomogram with that of other prognostic factors, receiver operating characteristic (ROC) analysis and decision curve analysis (DCA) were performed using the timeROC and ggDCA packages, respectively.
The potential of BCSCRS as an independent prognostic factor was examined by univariate and multivariate Cox regression analysis (Table 3).
Table 3. Univariate and multivariate Cox regression analysis of the risk score and clinical features.
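Decision curve analysis plots net benefit against a threshold probability pt, where net benefit = TP/n - FP/n × pt/(1 - pt), and compares it with the "treat all" strategy. The sketch below is a simplification: it treats the outcome as a binary event observed by a fixed time and ignores censoring, which a survival-aware implementation such as ggDCA has to handle. Function names and the example data are hypothetical.

```python
def net_benefit(pred_risk, outcome, threshold):
    """Net benefit of treating patients whose predicted risk >= threshold.

    pred_risk: predicted probability of the event for each patient
    outcome: 1 if the event occurred (assumed fully observed), else 0
    """
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(outcome)
    tp = sum(1 for p, y in zip(pred_risk, outcome) if p >= threshold and y)
    fp = sum(1 for p, y in zip(pred_risk, outcome) if p >= threshold and not y)
    odds = threshold / (1.0 - threshold)  # harm-to-benefit weight implied by pt
    return tp / n - fp / n * odds


def net_benefit_treat_all(outcome, threshold):
    """Reference curve: treat every patient regardless of predicted risk."""
    prevalence = sum(outcome) / len(outcome)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)
```

A model is clinically useful over the range of thresholds where its curve lies above both the "treat all" curve and zero ("treat none").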
* Independent prognostic factors.
The results showed that the risk score, age, stage and T, N and M stage were significantly associated with breast cancer prognosis (Fig. 3A, p < 0.001), and multivariate Cox regression analysis showed that both the risk score and age were independent prognostic factors for breast cancer patients (Fig. 3B, p < 0.001).
To investigate potential associations between BCSCRS and clinical variables, Wilcoxon and Kruskal-Wallis tests were performed. In the TCGA cohort, BCSCRS increased with tumor stage, with significant differences between stages (Fig. 7A). Risk scores also tended to rise with T and N stage, with significant differences between groups, although the N3 group did not follow this trend. In addition, BCSCRS was significantly higher in patients with advanced M stage and in patients over 65 years old. There was no statistically significant difference in risk scores between the different categories. Similar results were obtained in the GSE20685 cohort, where risk scores were significantly higher at advanced TNM stages (Fig. 7B).
These findings indicate that BCSCRS differs significantly across clinical variable groups and that higher risk scores are associated with poorer pathological features in breast cancer patients.
By combining the risk score and clinical variables, a nomogram was constructed as a quantitative tool for predicting the survival of breast cancer patients (Fig. 3C). Each patient's total score is the sum of the points for the risk score and for the clinical variables (sex, TNM stage and age); patients with lower total scores have higher survival probabilities. The accuracy of the nomogram was assessed with the calibration curve (Fig. 3D) and the area under the ROC curve. The nomogram predicted more accurately than the other clinical features and the risk score alone: its 1-, 3- and 5-year AUCs in the TCGA cohort were 0.805, 0.746 and 0.758, respectively (Fig. 3E, F, G). Decision curve analysis (DCA) also showed that the nomogram outperformed the other predictors (Fig. 3H, I, J). The nomogram thus improves the accuracy of the prognostic model and makes it better suited to clinical practice.
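The Kruskal-Wallis test compares risk-score distributions across more than two groups (for example stages I to IV) using ranks rather than raw values; the Wilcoxon rank-sum test is the two-group case. A minimal sketch of the tie-corrected H statistic, assuming each group is a list of BCSCRS values. For p-values, H is referred to a chi-square distribution with k - 1 degrees of freedom; scipy.stats.kruskal and scipy.stats.mannwhitneyu provide ready-made tests.

```python
from collections import Counter


def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(*groups):
    """Tie-corrected Kruskal-Wallis H statistic and its degrees of freedom."""
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * total - 3 * (n + 1)
    # Correct for ties: divide by 1 - sum(t^3 - t) / (n^3 - n).
    tie_term = sum(t ** 3 - t for t in Counter(pooled).values())
    if tie_term:
        h /= 1 - tie_term / (n ** 3 - n)
    return h, len(groups) - 1
```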
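How a nomogram turns a Cox model into points can be sketched as follows, in the style of the rms package. The predictor whose effect (|coefficient| × axis range) is widest gets a 0-100 point axis, the others are scaled relative to it, and survival follows S(t|x) = S0(t)^exp(linear predictor). All coefficients, ranges and the baseline survival in the example are hypothetical placeholders, not the fitted values from this study. Categorical variables such as sex or TNM stage would need dummy coding, and the baseline survival must match however the linear predictor is centered (an assumption to check against the fitted model).

```python
import math


def nomogram_points(coefficients, ranges, patient):
    """rms-style nomogram points for one patient.

    coefficients: Cox coefficient per numerically coded predictor
    ranges: (min, max) of each predictor on its nomogram axis
    patient: the patient's value for each predictor
    The lowest-risk end of every axis scores 0, so more points = higher risk.
    """
    spans = {k: abs(coefficients[k]) * (hi - lo) for k, (lo, hi) in ranges.items()}
    max_span = max(spans.values())
    points = {}
    for k, beta in coefficients.items():
        lo, hi = ranges[k]
        reference = lo if beta >= 0 else hi  # zero points at the low-risk end
        points[k] = 100.0 * beta * (patient[k] - reference) / max_span
    return points


def survival_probability(baseline_survival, linear_predictor):
    """Cox model prediction: S(t | x) = S0(t) ** exp(linear_predictor)."""
    return baseline_survival ** math.exp(linear_predictor)


# Hypothetical coefficients and axis ranges, for illustration only.
coefs = {"BCSCRS": 1.0, "age_decades": 0.3}
ranges = {"BCSCRS": (0.0, 3.0), "age_decades": (2.0, 9.0)}
pts = nomogram_points(coefs, ranges, {"BCSCRS": 1.5, "age_decades": 6.0})
total_points = sum(pts.values())
```

Because the points are a linear rescaling of the linear predictor, a higher total always maps to a lower predicted survival, which is the property the nomogram reading relies on.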
Embodiment 3: Comparison with existing breast cancer prognostic models
Although the accuracy of BCSCRS was demonstrated from several perspectives, the most important property of a clinical prognostic model is superior predictive performance in clinical practice. To verify that the breast cancer prognostic model constructed in this study performs better, we compared it with three different prognostic models.
The first model was the ferroptosis-related prognostic model established by Wang et al. [1], comprising 9 genes: ALOX15, CISD1, CS, GCLC, GPX4, SLC7A11, EMC2, G6PD and ACSF2. The second was the macrophage marker gene model constructed by Li et al. [2], comprising 7 genes: SERPINA1, CD74, STX11, ADAM9, CD24, NFKBIA and PGK1. The third was the lactate metabolism-related prognostic model proposed by Zhang et al. [3], comprising three genes: LDHD, LYRM7 and PNKD. Each model was constructed as described in the corresponding publication. To reduce errors arising from differences in the scale of the data, all analyses were carried out on the same transcriptome data: the expression levels of each model's genes were extracted, and multivariate Cox regression was performed to obtain their regression coefficients. A risk score was then calculated for each sample, and the predictive power and clinical utility of each model were assessed by the concordance index (C-index), decision curve analysis (DCA), receiver operating characteristic (ROC) curves and survival analysis. All analyses were performed with the timeROC and survival packages in R.
The references for the three prognostic models used for comparison are:
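The C-index used in this comparison measures how often, among comparable patient pairs, the patient with the higher risk score dies first. A minimal sketch of Harrell's C, assuming 0/1 event indicators; the source does not say which C-index estimator was used, so this is one common choice rather than the study's exact computation. Pairs with tied follow-up times are skipped for simplicity, and tied risk scores count as half-concordant.

```python
def harrell_c_index(times, events, risk_scores):
    """Harrell's concordance index for a survival risk score.

    A pair (i, j) is comparable when patient i had an observed event and
    patient j was followed for longer; it is concordant when i has the
    higher risk score.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot anchor a comparable pair
        for j in range(n):
            if times[j] > times[i]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ranking, which is the scale on which the nomogram's C-index of 0.744 is read.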
[1] Wang D, Wei G, Ma J, et al. Identification of the prognostic value of ferroptosis-related gene signature in breast cancer patients. BMC Cancer. 2021;21:645.
[2] Li Y, Zhao X, Liu Q, et al. Bioinformatics reveal macrophages marker genes signature in breast cancer to predict prognosis. Ann Med. 2021;53:1019-1031.
[3] Zhang Z, Fang T, Lv Y. A novel lactate metabolism-related signature predicts prognosis and tumor immune microenvironment of breast cancer. Front Genet. 2022;13:934830.
As shown in FIG. 4, survival was higher in the low-risk groups (Fig. 4A-D). Based on the area under the ROC curve, all signatures except that of Zhang et al. (AUC = 0.502, 0.522, 0.568) showed good potential for predicting 1-, 3- and 5-year survival of breast cancer patients (Fig. 4E-H). The BCSCRS (AUC = 0.694) and nomogram (AUC = 0.758) established in this study were more accurate than the other signatures (Fig. 4I). The nomogram, which incorporates clinical variables, was not part of the signature comparison and serves only as supporting validation. The C-index, RMS and DCA analyses further demonstrated that BCSCRS predicts breast cancer survival with excellent accuracy (Fig. 4J-L).
Embodiment 4: Application of the prognostic model based on breast tumor stem cell-related genes: BCSCRS-based analysis of the cancer immune landscape.
Given the significant correlation we observed between BCSC core genes and immune activity, we performed GSVA and GSEA analyses to explore this correlation further. In the high-risk group, pathways were significantly enriched in biological processes such as steroid biosynthesis, fructose and mannose metabolism, protein export, the proteasome and the citrate (TCA) cycle. In contrast, the low-risk group was characterized by primary immunodeficiency and T cell receptor signaling pathways (Fig. 5A), suggesting a potential link between the low-risk group and immunity.
To explore this relationship further, we examined features of the immune landscape, including the TME and immune infiltration (Fig. 5B). The ESTIMATE score, immune score and stromal score were significantly higher in the low-risk group than in the high-risk group, whereas tumor purity showed the opposite pattern (Fig. 5C), indicating that the TME of the low-risk group contains more stromal and immune cells. ssGSEA further showed that, except for macrophages, immune cell levels were higher in the TME of the low-risk group (Fig. 5B). The infiltration abundance of 22 immune cell types in the high-risk and low-risk groups was then analyzed with the CIBERSORT algorithm: B cells, plasma cells, activated memory CD4+ T cells, CD8+ T cells and gamma-delta T cells infiltrated more in the low-risk group, whereas M0 and M2 macrophages infiltrated more in the high-risk group (Fig. 5D). In addition, the infiltration levels of B cells, plasma cells, activated memory CD4+ T cells, CD8+ T cells and gamma-delta T cells were negatively correlated with the risk score (Fig. 5E).
These results indicate a close relationship between BCSCRS and immune cells: a lower risk score indicates a higher content of stromal and immune cells in the TME.
Embodiment 5: Application of the prognostic model based on breast tumor stem cell-related genes: BCSCRS-based assessment of immunotherapy response.
To further investigate the relationship between BCSCRS and immunotherapy response, we evaluated several indicators:
First, we analyzed the expression of immune checkpoint molecules. The expression of 27 immune checkpoints was significantly higher in the low-risk group, suggesting that these patients may respond better to immune checkpoint inhibitors (Fig. 6A).
We also used the IPS for PD-1 and CTLA-4 as quantitative indicators of the effectiveness of immune checkpoint inhibitors. The low-risk group had significantly higher IPS-CTLA4, IPS-PD1 and IPS-PD1-CTLA4 scores, indicating that these patients respond better to treatment with PD-1 and CTLA-4 inhibitors (Fig. 6B).
In addition, because BCSCs have been reported to be involved in cancer drug resistance, we analyzed the sensitivity of the different risk groups to common breast cancer chemotherapeutics. The low-risk group was more sensitive to chemotherapeutic agents such as cisplatin, doxorubicin, gemcitabine, methotrexate, paclitaxel and vinorelbine, indicating that low-risk patients will respond better to chemotherapy and are less likely to develop resistance.
These results show that the low-risk group responds better to immunotherapy and chemotherapy, which is of considerable clinical significance, and confirm that the prognostic model based on breast tumor stem cell-related genes constructed in this invention can be effectively applied to the clinical treatment and prognosis of breast cancer.
Breast cancer stem cells (BCSCs) may be the origin of breast cancer heterogeneity and are thought to help regulate the response of breast cancer to immunotherapy. Understanding the prognostic value and immune reactivity of BCSCs is therefore crucial for identifying patients likely to benefit from immunotherapy. Transcriptome and clinical data for breast cancer were first obtained from TCGA and GEO, survival-related breast tumor stem cell genes were screened, a breast cancer stem cell prognostic model was constructed by LASSO regression and multivariate Cox regression, and its accuracy was compared with that of other existing prognostic models using ROC, C-index, DCA and other metrics. To make the prognostic model easier to apply in the clinic, a nomogram was then constructed from the risk score and clinical variables to predict patients' overall survival, which improves the accuracy of the prognostic model and its applicability in clinical practice.
Results: we constructed a 10-gene breast cancer stem cell-related prognostic model with good prediction accuracy in both the training and validation cohorts (1-, 3- and 5-year AUC values of about 0.7). The model was also more accurate than the other models compared, and the nomogram incorporating clinical variables performed best (AUC = 0.758, C-index = 0.744).
The invention determines a patient's prognosis from the breast tumor stem cell prognostic signature genes and the corresponding prediction model (that is, from the sum of the products of the expression levels of the signature genes and their coefficients, together with the patient's clinical characteristics). It predicts the prognosis of breast cancer patients efficiently and accurately, gives clinicians effective guidance for treatment decisions, reduces ineffective treatment, and thereby lowers patients' treatment costs and discomfort.
The preferred embodiments of the invention disclosed above are intended only to assist in the explanation of the invention. The preferred embodiments are not all described in detail nor are they intended to limit the invention to the specific embodiments described. Obviously, other relevant modifications may be made in view of the present description. The embodiments were chosen and described in order to best explain the principles of the invention and the practical application, to thereby enable others skilled in the art to best understand and utilize the invention. The invention is limited only by the claims and the full scope and equivalents thereof.
Claims (9)
1. A prognostic signature for constructing a breast cancer prognostic model, characterized in that the prognostic signature genes are breast tumor stem cell-related genes.
2. A prognostic signature for breast cancer prognosis model construction according to claim 1, characterized in that the prognostic signature is BRD4, RPS24, SERPINA3, SKP1, NTRK3, CD79A, JAK1, NT5E, NDRG1, CD24.
3. Use of the prognostic signature genes according to claim 2 in the construction of a breast cancer prognostic model, wherein the prognostic model comprises: breast tumor stem cell-related gene risk score (BCSCRS) = (-0.53045 × BRD4) + (-0.26259 × RPS24) + (-0.31334 × SERPINA3) + (0.434039 × SKP1) + (-0.53742 × NTRK3) + (-0.23344 × CD79A) + (-0.40628 × JAK1) + (0.192005 × NT5E) + (0.152866 × NDRG1) + (0.194872 × CD24); when BCSCRS is less than 0.985569 the risk status is judged to be low, and when BCSCRS is greater than 0.985569 the risk status is judged to be high.
4. The use according to claim 3, characterized in that a nomogram is constructed from the breast tumor stem cell-related gene risk score and clinical variables to predict patients' overall survival and guide prognosis and treatment.
5. The use according to claim 4, characterized in that: the clinical variables include gender, TNM staging and age.
6. The use according to claim 5, characterized in that: the total score of each patient is calculated by adding the risk score of the breast tumor stem cell related genes and the scores of all clinical variables, so that the survival probability of breast cancer patients in 1 year, 3 years and 5 years is predicted, and the survival probability of patients with lower total score is higher.
7. The use according to claim 6, wherein the method of constructing a prognostic model comprises the steps of:
(1) Data acquisition: transcriptome and clinical data for breast cancer are obtained from TCGA and GEO; normal tissue samples and duplicate samples from the same patient are excluded; the TCGA patients are randomly divided into a training cohort and an internal validation cohort at a 7:3 ratio using the createDataPartition function of the R package caret; and the patients in GSE20685 are used as an external validation cohort;
(2) Screening of breast tumor stem cell genes: collecting breast tumor stem cell related genes from a GeneCards database, reserving genes with a correlation score of more than 30, and screening breast tumor stem cell genes related to breast cancer prognosis by single factor Cox analysis;
(3) Construction of the prognostic model: breast cancer prognostic signature genes are further screened by least absolute shrinkage and selection operator (LASSO) Cox regression in the TCGA training cohort, and a breast cancer prognostic model is constructed by multivariate Cox regression analysis; the risk score of the prognostic model is BCSCRS = (-0.53045 × BRD4) + (-0.26259 × RPS24) + (-0.31334 × SERPINA3) + (0.434039 × SKP1) + (-0.53742 × NTRK3) + (-0.23344 × CD79A) + (-0.40628 × JAK1) + (0.192005 × NT5E) + (0.152866 × NDRG1) + (0.194872 × CD24); the risk score of each sample is calculated with the predict function of the survival package from the expression levels of the genes in the Cox regression model and their corresponding regression coefficients; all patients are divided into high-risk and low-risk groups by the median BCSCRS (0.985569), i.e. patients with BCSCRS < 0.985569 are assigned to the low-risk group and patients with BCSCRS > 0.985569 to the high-risk group;
(4) Validation of the prognostic model: the TCGA test cohort and the GSE20685 cohort are used as the internal and external validation cohorts, respectively, to verify the accuracy of the prognostic model, which is evaluated with ROC, C-index and DCA metrics;
(5) Construction of the nomogram: nomogram and calibration curves including patient age, sex, TNM stage and the breast tumor stem cell-related gene risk score (BCSCRS) are constructed with the rms package; receiver operating characteristic (ROC) analysis and decision curve analysis (DCA) are performed with the timeROC and ggDCA packages to compare the predictive accuracy of the nomogram with that of other prognostic factors; the nomogram incorporating BCSCRS and clinical variables is used as a quantitative method for predicting the survival of breast cancer patients.
8. Use of a detection reagent in the preparation of a kit for evaluating prognostic survival in breast cancer, characterized in that the detection reagent consists of reagents for detecting the expression levels of the following 10 genes: BRD4, RPS24, SERPINA3, SKP1, NTRK3, CD79A, JAK1, NT5E, NDRG1 and CD24, which are the only key components of the kit for assessing prognostic survival in breast cancer; the kit further comprises instructions stating: BCSCRS = (-0.53045 × BRD4) + (-0.26259 × RPS24) + (-0.31334 × SERPINA3) + (0.434039 × SKP1) + (-0.53742 × NTRK3) + (-0.23344 × CD79A) + (-0.40628 × JAK1) + (0.192005 × NT5E) + (0.152866 × NDRG1) + (0.194872 × CD24); when BCSCRS is less than 0.985569 the patient is judged to be at low risk, and when BCSCRS is greater than 0.985569 the patient is judged to be at high risk, patients at low risk having a better prognosis than patients at high risk.
9. The use according to claim 8, characterized in that the specimen detected with the kit is a fresh tumor tissue specimen.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310557532.4A CN116656820A (en) | 2023-05-17 | 2023-05-17 | Prognosis model based on breast tumor stem cell related genes and application thereof |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310557532.4A CN116656820A (en) | 2023-05-17 | 2023-05-17 | Prognosis model based on breast tumor stem cell related genes and application thereof |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116656820A true CN116656820A (en) | 2023-08-29 |
Family
ID=87721618
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310557532.4A Pending CN116656820A (en) | 2023-05-17 | 2023-05-17 | Prognosis model based on breast tumor stem cell related genes and application thereof |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116656820A (en) |
2023-05-17: CN202310557532.4A, patent CN116656820A (active, pending)
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112011616B (en) | Immune gene prognosis model for predicting hepatocellular carcinoma tumor immunoinfiltration and postoperative survival time | |
CN112133365B (en) | Gene set for evaluating tumor microenvironment, scoring model and application of gene set | |
DK2922967T3 (en) | PROCEDURE FOR VIEWING A PRESENCE OR NON-PRESENCE OF AGGRESSIVE PROSTATANCES | |
CN103299188B (en) | Molecular diagnostic assay for cancer | |
CN110577998A (en) | Construction of molecular model for predicting postoperative early recurrence risk of liver cancer and application evaluation thereof | |
CN111564214A (en) | Establishment and verification method of breast cancer prognosis evaluation model based on 7 special genes | |
CN111676288B (en) | System for predicting lung adenocarcinoma patient prognosis and application thereof | |
JP2015535176A (en) | A novel method for predicting overall and relapse-free survival in hepatocellular carcinoma | |
CN112614546B (en) | Model for predicting hepatocellular carcinoma immunotherapy curative effect and construction method thereof | |
CN105874080A (en) | Molecular diagnostic test for oesophageal cancer | |
EP2922970B1 (en) | Prognostic method for individuals with prostate cancer | |
CN113192560A (en) | Construction method of hepatocellular carcinoma typing system based on iron death process | |
CN111128385A (en) | Prognosis early warning system for esophageal squamous carcinoma and application thereof | |
US20240002949A1 (en) | Panel of mirna biomarkers for diagnosis of ovarian cancer, method for in vitro diagnosis of ovarian cancer, uses of panel of mirna biomarkers for in vitro diagnosis of ovarian cancer and test for in vitro diagnosis of ovarian cancer | |
He et al. | Epstein-Barr virus DNA loads in the peripheral blood cells predict the survival of locoregionally-advanced nasopharyngeal carcinoma patients | |
CN113201590A (en) | lncRNA for evaluating early recurrence risk of hepatocellular carcinoma, evaluation method and device | |
WO2022156610A1 (en) | Prediction tool for determining sensitivity of liver cancer to drug and long-term prognosis of liver cancer on basis of genetic testing, and application thereof | |
CN115798703A (en) | Apparatus and computer-readable storage medium for predicting prognosis of renal clear cell carcinoma based on novel fatty acid metabolism-related gene | |
CN112481380B (en) | Marker for evaluating anti-tumor immunotherapy reactivity and prognosis survival of late bladder cancer and application thereof | |
Wang et al. | A five-gene signature for recurrence prediction of hepatocellular carcinoma patients | |
CN116656820A (en) | Prognosis model based on breast tumor stem cell related genes and application thereof | |
CN106119406A (en) | Multiple granuloma vasculitis and the genotyping diagnosis test kit of small arteritis and using method | |
CN114507717A (en) | Method for predicting bile duct cancer recurrence by combining multiple mRNAs and application thereof | |
CN117476097B (en) | Colorectal cancer prognosis and treatment response prediction model based on tertiary lymphoid structure characteristic genes, and construction method and application thereof | |
Zhong et al. | Distinguishing kawasaki disease from febrile infectious disease using gene pair signatures |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||