CN114438209A - Marker and model for predicting overall survival of triple negative breast cancer clinical prognosis - Google Patents

Info

Publication number
CN114438209A
CN114438209A CN202210118821.XA CN202210118821A CN114438209A CN 114438209 A CN114438209 A CN 114438209A CN 202210118821 A CN202210118821 A CN 202210118821A CN 114438209 A CN114438209 A CN 114438209A
Authority
CN
China
Prior art keywords
model
expression
breast cancer
overall survival
expression level
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210118821.XA
Other languages
Chinese (zh)
Inventor
饶皑炳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Luwei Biotechnology Co ltd
Original Assignee
Shenzhen Luwei Biotechnology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Luwei Biotechnology Co ltd
Priority to CN202210118821.XA
Publication of CN114438209A
Legal status: Pending

Classifications

    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6876: Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
    • C12Q1/6883: Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material
    • C12Q1/6886: Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material for cancer
    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B: BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B50/00: ICT programming tools or database systems specially adapted for bioinformatics
    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H: HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00: ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/30: ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for calculating health indices; for individual health risk assessment
    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q2600/00: Oligonucleotides characterized by their use
    • C12Q2600/118: Prognosis of disease development
    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q2600/00: Oligonucleotides characterized by their use
    • C12Q2600/158: Expression markers

Abstract

The invention relates to a marker and a model for predicting the overall survival of the clinical prognosis of triple negative breast cancer, wherein the marker comprises one or a combination of two or more of the following genes: BIRC7, BMPR1A, CASP3, CNN1, CSF3R, DTL, EXO1, EXTL2, FHL1, GREM1, HSD3B2, IGF1R, IKZF1, KIT, RAD51C, STAT3, TPMT, URM1. Taking the overall survival rate as the objective function of the model, the invention establishes an accurate clinical prognosis overall survival prediction model from whole-transcriptome data of triple negative breast cancer tissue sections, using an iterative linear regression algorithm based on correlation coefficient layering. The model provided by the invention can be used to evaluate the overall survival rate of clinical prognosis before a triple negative breast cancer patient is treated, and to guide the formulation of a precise treatment scheme.

Description

Marker and model for predicting overall survival of triple negative breast cancer clinical prognosis
Technical Field
The invention relates to the technical field of precision medicine, in particular to a marker and a model for predicting the overall survival of the clinical prognosis of triple negative breast cancer.
Background
Triple negative breast cancer (TNBC) refers to breast cancer in which immunohistochemical examination of the cancer tissue is negative for the estrogen receptor (ER), the progesterone receptor (PR), and the proto-oncogene Her-2; it accounts for about 20% of all breast cancers. It occurs mainly in young premenopausal women, follows an aggressive clinical course, and carries a higher risk of distant metastasis, with visceral metastasis more likely than bone metastasis and a higher probability of brain metastasis. The risk of distant metastasis of triple negative breast cancer peaks at 3 years. At present there is no treatment guideline specific to TNBC, and although chemotherapy and adjuvant chemotherapy have a certain curative effect, the prognosis is still poor and the risk of death is high. In principle, if molecular testing of triple negative breast cancer could be carried out with an algorithmic model based on transcriptome information, it would help provide personalized precision medicine for triple negative breast cancer patients.
In the prior art, the transcriptome is already used for the precise treatment of breast cancer. This originated with the 21-gene breast cancer test introduced by the U.S. company Genomic Health in 2005, in which a recurrence score is obtained from the mRNA expression levels of 21 genes. The test is used for early-stage breast cancer patients who are ER positive or PR positive, HER2 negative, and without lymph node spread, and it guides postoperative chemotherapy or adjuvant hormone therapy as follows: chemotherapy is not recommended for recurrence scores of no more than 17 points. After 15 years of clinical application, the 21-gene test has been written into the recommended tests of the NCCN breast cancer treatment guidelines and has achieved great success. For the high-risk population identified by the 21-gene test (recurrence score > 31), clinical results validated the effectiveness of chemotherapy. However, studies have shown that TNBC overlaps only 56% with basal-like breast cancer (BLBC), so an independent molecular typing model needs to be established for TNBC.
In the prior art, there have been reports relating to molecular models of TNBC. For example, in 2011 Lehmann et al. used public database data to classify triple negative breast cancer molecularly with k-means multi-tier clustering, dividing it into 6 subtypes according to gene expression patterns: basal-like 1 (BL1), basal-like 2 (BL2), immunomodulatory (IM), mesenchymal (M), mesenchymal stem-like (MSL), and luminal androgen receptor (LAR). They also attempted to establish a precise treatment regimen for each subtype using approximately 30 breast cancer cell line models corresponding to the 6 subtypes.
In 2020, Bian Xiuwu's team at the Institute of Pathology of Southwest Hospital, Third Military Medical University, and the Southwest Cancer Center published a comprehensive review of academic research on TNBC transcriptome molecular typing, dividing its development into three stages. The 6 subtypes mentioned above constitute the first stage. In 2015, Burstein et al. refined the classification into four subtypes: LAR, expressing AR and the cell surface mucin MUC1; M (mesenchymal), expressing growth factor receptors (platelet-derived growth factor receptor alpha [PDGFRα] and the c-Kit receptor); BLIS (basal-like immunosuppressed), expressing the immunosuppressive molecule VTCN1; and BLIA (basal-like immune-activated), expressing STAT signaling molecules and releasing cytokines. Prognostic analysis showed that disease-free survival (DFS) ranked BLIA > M > LAR > BLIS (p = 0.019) and disease-specific survival (DSS) ranked BLIA > M > LAR > BLIS (p = 0.07); this is the second stage. The third stage is the Fudan typing (FUSCC): Liu et al. optimized the model by combining mRNA with a long non-coding RNA (lncRNA) co-expression network and obtained 4 optimized subtypes: IM, immunomodulatory; LAR, luminal androgen receptor; MES, mesenchymal-like; and BLIS, basal-like and immunosuppressed. A TNBC prognosis prediction model was constructed from whole-transcriptome data to distinguish postoperative groups at high and low risk of recurrence.
However, the application of molecular models based on transcriptome gene expression in TNBC is still in the academic research stage at present, and no model is widely verified and applied clinically. Those skilled in the art would like to develop a prognostic model based on TNBC transcriptome gene expression, which would help to achieve accurate treatment of triple negative breast cancer, improve therapeutic approaches and prognosis.
Disclosure of Invention
The invention aims to provide a marker and a model for predicting the overall survival of the clinical prognosis of the triple negative breast cancer, and the marker and the model can be used for evaluating the overall survival condition of the clinical prognosis of a patient in advance before treatment, so that the development of a postoperative treatment scheme for the triple negative breast cancer patient can be guided.
To this end, in a first aspect, the present invention provides a marker for predicting the overall survival of the clinical prognosis of triple negative breast cancer, wherein the marker comprises one or a combination of two or more of the following genes: BIRC7, BMPR1A, CASP3, CNN1, CSF3R, DTL, EXO1, EXTL2, FHL1, GREM1, HSD3B2, IGF1R, IKZF1, KIT, RAD51C, STAT3, TPMT, URM1.
In some embodiments, the marker comprises a combination of at least seventeen, at least sixteen, at least fifteen, at least fourteen, at least thirteen, at least twelve, at least eleven, at least ten, at least nine, at least eight, at least seven, at least six, at least five, at least four, at least three, or at least two of the following genes: BIRC7, BMPR1A, CASP3, CNN1, CSF3R, DTL, EXO1, EXTL2, FHL1, GREM1, HSD3B2, IGF1R, IKZF1, KIT, RAD51C, STAT3, TPMT, URM1.
In some embodiments, the marker consists of the following genes: BIRC7, BMPR1A, CASP3, CNN1, CSF3R, DTL, EXO1, EXTL2, FHL1, GREM1, HSD3B2, IGF1R, IKZF1, KIT, RAD51C, STAT3, TPMT, URM1.
In some embodiments, the marker consists of the following genes: HSD3B2, RAD51C.
In some embodiments, the marker consists of the following genes: GREM1, TPMT.
In some embodiments, the marker consists of the following genes: BIRC7, BMPR1A, CASP3, CNN1, CSF3R, DTL, EXO1, EXTL2, FHL1, GREM1, IGF1R, IKZF1, KIT, RAD51C, STAT3, TPMT, URM1.
In some embodiments, the marker consists of the following genes: BIRC7, BMPR1A, CASP3, CNN1, CSF3R, DTL, EXO1, EXTL2, FHL1, GREM1, HSD3B2, IGF1R, KIT, RAD51C, STAT3, TPMT, URM1.
In a second aspect of the present invention, there is provided a model for predicting overall survival of triple negative breast cancer in clinical prognosis, wherein the model is:
$$\text{OS Score} = \sum_{t} \beta_t \times \mathrm{Exp}_t$$
The model takes one or a combination of two or more of the following genes as markers: BIRC7, BMPR1A, CASP3, CNN1, CSF3R, DTL, EXO1, EXTL2, FHL1, GREM1, HSD3B2, IGF1R, IKZF1, KIT, RAD51C, STAT3, TPMT, URM1; $\mathrm{Exp}_t$ is the mRNA expression level of each gene in the marker, and $\beta_t$ is the weighting coefficient corresponding to each gene.
Further, the weighting coefficients are calculated according to iterative linear regression.
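In some embodiments, the model is:
$$\begin{aligned}
\text{Survival Score (OS Score)} ={}& (\beta_{\mathrm{BIRC7}} \times \mathrm{Exp}_{\mathrm{BIRC7}}) + (\beta_{\mathrm{BMPR1A}} \times \mathrm{Exp}_{\mathrm{BMPR1A}}) + (\beta_{\mathrm{CASP3}} \times \mathrm{Exp}_{\mathrm{CASP3}}) \\
&+ (\beta_{\mathrm{CNN1}} \times \mathrm{Exp}_{\mathrm{CNN1}}) + (\beta_{\mathrm{CSF3R}} \times \mathrm{Exp}_{\mathrm{CSF3R}}) + (\beta_{\mathrm{DTL}} \times \mathrm{Exp}_{\mathrm{DTL}}) \\
&+ (\beta_{\mathrm{EXO1}} \times \mathrm{Exp}_{\mathrm{EXO1}}) + (\beta_{\mathrm{EXTL2}} \times \mathrm{Exp}_{\mathrm{EXTL2}}) + (\beta_{\mathrm{FHL1}} \times \mathrm{Exp}_{\mathrm{FHL1}}) \\
&+ (\beta_{\mathrm{GREM1}} \times \mathrm{Exp}_{\mathrm{GREM1}}) + (\beta_{\mathrm{HSD3B2}} \times \mathrm{Exp}_{\mathrm{HSD3B2}}) + (\beta_{\mathrm{IGF1R}} \times \mathrm{Exp}_{\mathrm{IGF1R}}) \\
&+ (\beta_{\mathrm{IKZF1}} \times \mathrm{Exp}_{\mathrm{IKZF1}}) + (\beta_{\mathrm{KIT}} \times \mathrm{Exp}_{\mathrm{KIT}}) + (\beta_{\mathrm{RAD51C}} \times \mathrm{Exp}_{\mathrm{RAD51C}}) \\
&+ (\beta_{\mathrm{STAT3}} \times \mathrm{Exp}_{\mathrm{STAT3}}) + (\beta_{\mathrm{TPMT}} \times \mathrm{Exp}_{\mathrm{TPMT}}) + (\beta_{\mathrm{URM1}} \times \mathrm{Exp}_{\mathrm{URM1}})
\end{aligned}$$
where $\mathrm{Exp}$ is the mRNA expression level of the corresponding gene, and $\beta$ is the weighting coefficient of the corresponding gene calculated by iterative linear regression.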
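Further, in the model, $\beta_{\mathrm{BIRC7}}$, $\beta_{\mathrm{IGF1R}}$, $\beta_{\mathrm{RAD51C}}$, $\beta_{\mathrm{CNN1}}$, $\beta_{\mathrm{IKZF1}}$, $\beta_{\mathrm{BMPR1A}}$, $\beta_{\mathrm{TPMT}}$, $\beta_{\mathrm{EXO1}}$, and $\beta_{\mathrm{EXTL2}}$ are all less than 0, and $\beta_{\mathrm{URM1}}$, $\beta_{\mathrm{CASP3}}$, $\beta_{\mathrm{STAT3}}$, $\beta_{\mathrm{DTL}}$, $\beta_{\mathrm{GREM1}}$, $\beta_{\mathrm{CSF3R}}$, $\beta_{\mathrm{KIT}}$, $\beta_{\mathrm{HSD3B2}}$, and $\beta_{\mathrm{FHL1}}$ are all greater than 0.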
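In some embodiments, the model is:
$$\begin{aligned}
\text{Survival Score (OS Score)} ={}& (-0.0445 \times \mathrm{Exp}_{\mathrm{BIRC7}}) + (-0.1073 \times \mathrm{Exp}_{\mathrm{BMPR1A}}) + (0.1153 \times \mathrm{Exp}_{\mathrm{CASP3}}) \\
&+ (-0.0622 \times \mathrm{Exp}_{\mathrm{CNN1}}) + (0.0642 \times \mathrm{Exp}_{\mathrm{CSF3R}}) + (0.0861 \times \mathrm{Exp}_{\mathrm{DTL}}) \\
&+ (-0.1108 \times \mathrm{Exp}_{\mathrm{EXO1}}) + (-0.1351 \times \mathrm{Exp}_{\mathrm{EXTL2}}) + (0.0436 \times \mathrm{Exp}_{\mathrm{FHL1}}) \\
&+ (0.0688 \times \mathrm{Exp}_{\mathrm{GREM1}}) + (0.0506 \times \mathrm{Exp}_{\mathrm{HSD3B2}}) + (-0.0472 \times \mathrm{Exp}_{\mathrm{IGF1R}}) \\
&+ (-0.0924 \times \mathrm{Exp}_{\mathrm{IKZF1}}) + (0.0565 \times \mathrm{Exp}_{\mathrm{KIT}}) + (-0.0533 \times \mathrm{Exp}_{\mathrm{RAD51C}}) \\
&+ (0.1121 \times \mathrm{Exp}_{\mathrm{STAT3}}) + (-0.1095 \times \mathrm{Exp}_{\mathrm{TPMT}}) + (0.1669 \times \mathrm{Exp}_{\mathrm{URM1}})
\end{aligned}$$
where $\mathrm{Exp}$ is the mRNA expression level of the corresponding gene.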
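The weighted sum in this embodiment can be evaluated directly. The following is a minimal Python sketch, assuming the expression values have already been normalized in the same way as the data used to fit the model (the normalization is not specified in this passage). The names `OS_COEFFICIENTS` and `os_score` and the example input are illustrative only and are not part of the invention. The BIRC7 coefficient is taken as negative, consistent with the statement that the BIRC7 weight is less than 0.

```python
# Illustrative sketch: evaluate the 18-gene OS Score as a weighted sum.
# Coefficients are copied from the embodiment above; all names are hypothetical.
OS_COEFFICIENTS = {
    "BIRC7": -0.0445, "BMPR1A": -0.1073, "CASP3": 0.1153, "CNN1": -0.0622,
    "CSF3R": 0.0642, "DTL": 0.0861, "EXO1": -0.1108, "EXTL2": -0.1351,
    "FHL1": 0.0436, "GREM1": 0.0688, "HSD3B2": 0.0506, "IGF1R": -0.0472,
    "IKZF1": -0.0924, "KIT": 0.0565, "RAD51C": -0.0533, "STAT3": 0.1121,
    "TPMT": -0.1095, "URM1": 0.1669,
}


def os_score(expression):
    """Return sum(beta_t * Exp_t) over all 18 marker genes."""
    missing = sorted(set(OS_COEFFICIENTS) - set(expression))
    if missing:
        raise ValueError(f"missing expression values for: {missing}")
    return sum(beta * expression[gene] for gene, beta in OS_COEFFICIENTS.items())


# Made-up input: every gene at expression level 1.0,
# so the score is simply the sum of the 18 weights.
example = {gene: 1.0 for gene in OS_COEFFICIENTS}
print(round(os_score(example), 4))
```

Keeping the coefficients in a dictionary keyed by gene symbol makes it easy to check that all eighteen markers are present before a score is reported.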
In a third aspect of the invention, the application of the marker or the model in the preparation of a product for predicting the overall survival of the triple negative breast cancer clinical prognosis is provided.
The fourth aspect of the present invention provides a method for constructing a model for predicting the overall survival of the clinical prognosis of triple negative breast cancer, comprising the following steps:
S1, acquiring whole-transcriptome data of breast cancer tissue sections from triple negative breast cancer patients, and performing preliminary data screening and standardization to form a data set;
S2, in the data set, using a t-test to find statistically significant genes (p < 0.05) that distinguish the population that survived in clinical prognosis from the population that died, thereby obtaining differentially expressed genes;
S3, dividing the differentially expressed genes into an up-regulated expression group and a down-regulated expression group, performing hierarchical correlation coefficient clustering on each group separately, and selecting from each cluster the gene with the largest average correlation with the other genes of that cluster as a model candidate gene;
S4, performing iterative linear regression analysis on the model candidate genes while iterating over different numbers of model variables (S), and establishing a model for the overall survival of the clinical prognosis of triple negative breast cancer;
the model is as follows:
$$\text{OS Score} = \sum_{t} \beta_t \times \mathrm{Exp}_t$$
The model takes one or a combination of two or more of the following genes as markers: BIRC7, BMPR1A, CASP3, CNN1, CSF3R, DTL, EXO1, EXTL2, FHL1, GREM1, HSD3B2, IGF1R, IKZF1, KIT, RAD51C, STAT3, TPMT, URM1; $\mathrm{Exp}_t$ is the mRNA expression level of each gene in the marker, and $\beta_t$ is the weighting coefficient corresponding to each gene.
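Step S3 selects, within each cluster, the gene with the largest average association with the other genes of that cluster. A minimal Python sketch of that selection follows, assuming the clusters are already given and that the association measure is the signed Pearson correlation coefficient; the text does not specify the measure or the clustering procedure, so both are assumptions. The helper names `pearson` and `cluster_representative` and the toy expression values are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def cluster_representative(cluster, expr):
    """Pick the gene whose mean correlation with the rest of its cluster is largest.

    cluster: list of gene names in one cluster (assumed precomputed).
    expr: dict mapping gene name -> list of expression values across samples.
    """
    if len(cluster) == 1:
        return cluster[0]

    def mean_corr(gene):
        others = [other for other in cluster if other != gene]
        return sum(pearson(expr[gene], expr[other]) for other in others) / len(others)

    return max(cluster, key=mean_corr)


# Toy data (made up): three genes measured in four samples.
expr = {
    "A": [1.0, 2.0, 3.0, 4.0],
    "B": [1.0, 2.0, 3.0, 5.0],
    "C": [2.0, 1.0, 4.0, 3.0],
}
print(cluster_representative(["A", "B", "C"], expr))
```

Choosing one representative per cluster reduces collinearity among the candidate variables before the regression in step S4.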
In a fifth aspect of the invention, a kit for predicting the overall survival of the triple negative breast cancer clinical prognosis is provided, and the kit comprises a detection reagent of the marker.
Further, the kit comprises the model of the invention.
Further, the step of predicting the clinical prognosis overall survival of triple negative breast cancer by using the kit comprises the following steps:
(1) detecting the mRNA expression level of the marker in a test sample from a patient with triple negative breast cancer;
(2) introducing the mRNA expression level of the marker detected in the step (1) into the model of the invention, and calculating to obtain a survival Score (OS Score);
(3) when the survival Score (OS Score) is greater than a threshold Score, the triple negative breast cancer patient has a low overall survival rate for clinical prognosis; when the survival Score (OS Score) is less than the threshold Score, the triple negative breast cancer patient has a high overall survival rate for clinical prognosis.
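The decision rule in step (3) amounts to a comparison against the threshold score. A minimal Python sketch follows; the function name `predict_survival_group` and the handling of a score exactly equal to the threshold are assumptions, since the text does not define the tie case. The value 0.4098 in the usage lines is the threshold score cited in the description of FIG. 9, and the scores passed in are made up.

```python
def predict_survival_group(score, threshold):
    """Map an OS Score to the predicted group described in step (3)."""
    if score > threshold:
        return "low overall survival"
    if score < threshold:
        return "high overall survival"
    # The text does not say how to treat a score equal to the threshold.
    return "undetermined"


# Made-up scores compared against the threshold cited for FIG. 9.
print(predict_survival_group(0.52, 0.4098))
print(predict_survival_group(0.10, 0.4098))
```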
According to the technical scheme of the invention, the treatment scheme of the patient can be guided by referring to the prediction result of the clinical prognosis overall survival.
Further, the sample to be tested is derived from tissue, body fluid, or the like.
In a sixth aspect of the present invention, there is provided an apparatus for predicting overall survival of clinical prognosis of triple negative breast cancer, the apparatus comprising a detection device and a processor;
the detection device is used for detecting the mRNA expression level of the gene in the marker of the invention in a sample to be detected;
the processor is used for reading the mRNA expression quantity of the gene measured by the detection device, calculating the survival Score (OS Score) according to the model provided by the invention, and predicting that the overall survival rate of the subject is high or low for clinical prognosis according to the threshold Score.
Further, the apparatus further comprises an output device for outputting the prediction result of the processor.
In some embodiments, the output device is a display.
Compared with the prior art, the technical scheme of the invention has the following advantages:
The invention takes the overall survival rate (OS) as the objective function of the model and establishes an accurate clinical prognosis overall survival prediction model from whole-transcriptome data of TNBC tissue sections by means of an original modeling method, namely an iterative linear regression algorithm based on correlation coefficient layering. The model provided by the invention can evaluate the overall survival rate of clinical prognosis before a TNBC patient is treated and guide the formulation of a precise treatment scheme for the TNBC patient.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the invention. In the drawings:
FIG. 1: representative AUC values from repeated random binary cross-validation of the data (20 repetitions) for the overall survival model using BIRC7, BMPR1A, CASP3, CNN1, CSF3R, DTL, EXO1, EXTL2, FHL1, GREM1, HSD3B2, IGF1R, IKZF1, KIT, RAD51C, STAT3, TPMT and URM1 as markers;
FIG. 2: a diagram of the expression level of each gene in the overall survival marker provided by the invention;
FIG. 3: ROC curves evaluating the overall survival prediction when a single gene is used as the marker;
FIG. 4: representative AUC values from repeated random binary cross-validation of the data (20 repetitions) for the overall survival model using HSD3B2 and RAD51C as markers;
FIG. 5: representative AUC values from repeated random binary cross-validation of the data (20 repetitions) for the overall survival model using GREM1 and TPMT as markers;
FIG. 6: representative AUC values from repeated random binary cross-validation of the data (20 repetitions) for the overall survival model using BIRC7, BMPR1A, CASP3, CNN1, CSF3R, DTL, EXO1, EXTL2, FHL1, GREM1, IGF1R, IKZF1, KIT, RAD51C, STAT3, TPMT and URM1 as markers;
FIG. 7: representative AUC values from repeated random binary cross-validation of the data (20 repetitions) for the overall survival model using BIRC7, BMPR1A, CASP3, CNN1, CSF3R, DTL, EXO1, EXTL2, FHL1, GREM1, HSD3B2, IGF1R, KIT, RAD51C, STAT3, TPMT and URM1 as markers;
FIG. 8: the ROC curve of the overall survival model provided by the invention (with 18 genes as markers) on the TCGA breast cancer data set BRCA used in Example 1, evaluating the model's ability to predict the overall survival of clinical prognosis;
FIG. 9: Kaplan-Meier (K-M) curves of overall survival time obtained by survival analysis with a Cox model, using 0.4098 as the threshold score; a score greater than the threshold indicates low survival, and a score less than the threshold indicates high survival.
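The AUC values referred to in the figure descriptions summarize how well a score separates the two outcome groups. As a minimal illustration in Python, the AUC can be computed as the fraction of (death, survivor) pairs in which the death case has the higher score, with ties counted as one half. This pairwise formulation is a standard equivalent of the area under the ROC curve and is not necessarily how the figures were produced; the function name `roc_auc` and the sample values are made up.

```python
def roc_auc(scores, died):
    """AUC as the probability that a death case scores higher than a survivor.

    scores: list of OS Scores; died: parallel list of truthy/falsy outcomes.
    Ties between a death case and a survivor count as 0.5.
    """
    pos = [s for s, d in zip(scores, died) if d]
    neg = [s for s, d in zip(scores, died) if not d]
    if not pos or not neg:
        raise ValueError("need at least one case in each outcome group")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Made-up example: two deaths and two survivors.
print(roc_auc([0.9, 0.2, 0.3, 0.1], [1, 1, 0, 0]))
```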
Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
The invention takes the overall survival rate (OS) as the objective function of the model and establishes an accurate clinical prognosis overall survival prediction model from whole-transcriptome data of TNBC tissue sections by means of an iterative linear regression algorithm based on correlation coefficient layering.
The model is as follows:
$$\text{OS Score} = \sum_{t} \beta_t \times \mathrm{Exp}_t$$
The model takes one or a combination of two or more of the following genes as markers: BIRC7, BMPR1A, CASP3, CNN1, CSF3R, DTL, EXO1, EXTL2, FHL1, GREM1, HSD3B2, IGF1R, IKZF1, KIT, RAD51C, STAT3, TPMT, URM1; $\mathrm{Exp}_t$ is the mRNA expression level of each gene in the marker, and $\beta_t$ is the weighting coefficient corresponding to each gene.
Any subset of the markers described herein is evaluated, and depending on the results of the evaluation, in some embodiments, at least seventeen, at least sixteen, at least fifteen, at least fourteen, at least thirteen, at least twelve, at least eleven, at least ten, at least nine, at least eight, at least seven, at least six, at least five, at least four, at least three, or a combination of at least two of the eighteen genes described above can be used as markers. In other embodiments, a combination of the eighteen genes described above may be used as a marker. In still other embodiments, any of the eighteen genes described above can be used as markers.
Among the genes in the marker provided by the present invention, a plurality of genes are involved in a plurality of signaling pathways and functional categories such as TGF-a/b, Wnt, NF-kB, Hippo, Hedgehog, Notch, etc., and they can be classified according to their main functions and signaling pathways as follows:
DNA damage repair and chromosome stability: URM1, RAD51C, EXO1;
programmed cell death, apoptosis and pyroptosis: BIRC7, CASP3, IGF1R;
TGF-b family members: BMPR1A;
cell cycle: DTL;
cytoskeleton and ECM: EXTL2, CNN1;
immune cells: HSD3B2, IKZF1;
hormones: CSF3R, FHL1;
STAT3-related pathway: STAT3, KIT, GREM1;
drug metabolism: TPMT.
Bray, S.M. [BRA2019] studied the disruptive role of URM1 in DNA damage repair: URM1 alters the MRE11/RAD50 complex, which is involved in the initial recognition of DNA damage and, through the regulatory mechanism of homologous recombination, in the recruitment of subsequent repair factors. The ubiquitin-proteasome degradation pathway, in turn, is a mechanism for extracting DNA repair components from chromatin when the DNA repair process terminates. URM1 plays a role in a ubiquitin-like conjugation pathway (called urmylation) and in tRNA thiolation, and URM1 is involved in the urmylation and degradation of RAD50, thereby interrupting the DNA damage repair process and causing chromosomal instability. In addition, Delaunay S, et al. [DEL2016] found that ELP3 and CTU1/2, key proteins in the post-transcriptional modification of the tRNA wobble uridine U34, and related enzymes such as URM1 are highly expressed in breast cancer and promote metastasis; through expression of the oncogene DEK, they enable internal ribosome entry site (IRES)-dependent translation of the pro-invasive factor LEF1. The weight of URM1 in the survival score model is positive and the largest, indicating that high URM1 expression is the greatest risk factor for death in TNBC patients.
The second important DNA damage repair gene in the model is the tumor suppressor RAD51C. First, RAD51C participates in homologous recombination during DNA damage repair and directs RAD51 to the break site when a DNA double-strand break (DSB) occurs. Second, RAD51C is further linked to breast cancer through its direct interaction with the BRCA-RAD51 complex. Third, RAD51C is a checkpoint signal regulator that helps ATM phosphorylate CHK2, slowing cell cycle progression. Studies by Alayev A, et al. [ALA2016] show that when estrogen causes DSB-type DNA damage it induces the expression of RAD51C for repair, so abnormalities in the DSB repair process may contribute to carcinogenesis in ER+ breast cancer. Analysis of ER+ breast cancer samples indicated that high expression of RAD51C was a negative factor for distant-metastasis-free survival, and this was verified with RFS. In contrast to ER+ breast cancer, in the TNBC survival model the weight of RAD51C is negative, so high RAD51C expression corresponds to a better survival prognosis.
The third important DNA damage repair gene in the model is EXO1. Yan S, et al. [YAN2021] reviewed the functions of EXO1, which include participation in DNA double-strand break repair, mismatch repair (MMR), meiotic recombination, and telomere maintenance. In the TNBC survival model, the weight of EXO1 is negative with a relatively large absolute value, indicating that EXO1 is an important positive factor for the survival time of TNBC patients.
BIRC7 is a gene encoding a member of the inhibitor of apoptosis protein family. Studies show that many cancer cells become tolerant to apoptosis through the expression of BIRC family genes, and abnormal expression of BIRC genes in TNBC samples indicates that the normal apoptosis of TNBC cancer cells is blocked, conferring resistance to chemotherapy. Makuch-Kocka et al. [MAK2021] investigated the clinical significance of the expression in TNBC of 8 BIRC family genes, including BIRC7. BIRC7 has both pro-apoptotic and anti-apoptotic activity; its coefficient in the model is negative, which favors survival and suggests that BIRC7 is pro-apoptotic in TNBC and that its high expression has a positive effect on clinical prognosis.
CASP3 is an important gene for programmed cell death. Jiang MX, et al. [JIA2020] reviewed how the CASP3/GSDME signaling pathway selects between apoptosis and pyroptosis. Pyroptosis is an inflammatory form of programmed cell death [TAN2021]. When the tumor suppressor gene GSDME is highly expressed, CASP3 cleaves GSDME to release its N-terminal domain, which perforates the cell membrane and causes pyroptosis; when GSDME expression is low, the cells undergo classical apoptosis. GSDME also promotes CASP3 expression, forming a positive feedback loop that promotes pyroptosis, which is associated with the side effects of chemotherapy and with anti-tumor immunity. The weight of CASP3 in the survival score model is positive and the second largest, indicating that high expression of the cell death gene CASP3 is also a significant risk factor for death in TNBC patients.
Farabaugh SM, et al. [FAR2015] reviewed the role and mechanism of IGF1R in different breast cancer subtypes. Insulin-like growth factor (IGF1/2) is an endocrine hormone in the blood circulation that is important for body growth and is secreted mainly by the liver under stimulation by pituitary growth hormone. IGF1/2 acts on downstream genes by binding and activating the receptor IGF1R, while the corresponding IGF2R acts as a brake. IGF1R resists apoptosis through three major pathways: PI3K/AKT, RAS/MAPK, and Raf mitochondrial translocation induced by binding to 14-3-3 protein. The TCGA breast cancer data set showed IGF pathway gene abnormalities in 45.3% of cases, while IGF1R abnormalities (amplification, overexpression, mutation) were present in 9%. The incidence of breast cancer subtypes varies significantly between ethnic groups, with TNBC accounting for 30% in African American populations and 12% in white American populations. This difference may be related to differences in IGF1R/IGF2R expression between ethnic groups: African American women express more IGF1R than white American women, but less IGF2R. The negative weight of IGF1R in the survival score model suggests that high IGF1R expression is a positive factor for longer survival in TNBC patients. This is contrary to the anti-apoptotic and cancer-promoting effects reported for high IGF1R expression, suggesting that IGF1R may have other, unique protective mechanisms in TNBC.
The TGF-b signaling pathway underlies many other signaling pathways. Sulaiman et al. [SUL2021] first introduced the TGF-b signaling pathway and then reviewed potential clinical strategies for treating TNBC that inhibit the TGF-b pathway with the goal of eliminating tumor stem cells. The TGF-b signaling pathway can promote the MAPK/ERK, PI3K/Akt/mTOR/S6K, and RhoA/Rac signaling pathways, many of which are aberrant in TNBC. BMPR1A, a TGF-b family member, is negatively associated with TNBC relapse and poor prognosis. The weight of BMPR1A in the model is negative, indicating that BMPR1A expression favors survival.
STAT3 is a very important oncogenic transcription factor and a target of many anticancer drugs, for which dozens of clinical studies have been conducted. The mechanisms of STAT3 carcinogenesis have been studied extensively. Qin JJ, et al. [QIN2019] systematically reviewed research on the oncogenic role of STAT3 in TNBC. Overexpressed cytokine receptors, such as IL-6R and IL-10R, and hyperactive growth factor receptors, such as EGFR, FGFR, and IGFR, trigger a tyrosine phosphorylation cascade when ligands bind to them, leading to abnormal activation of STAT3 and transcription of its downstream target genes; this results in cancer cell proliferation, resistance to apoptosis, migration, invasion, angiogenesis, chemoresistance, immunosuppression, stem cell self-renewal and maintenance, and autophagy. Sirkisoon SR, et al. [SIR2018] found that the interaction of STAT3 with the oncogenic transcription factor GLI1 and its truncated form tGLI1 enhances the aggressiveness of TNBC. The weight of STAT3 in the survival score model is positive and large, indicating that high STAT3 expression is an important risk factor for death in TNBC patients.
Another gene in the model that may be involved in activating STAT3 is KIT. KIT activates a number of signaling pathways that are critical for regulating cell survival and proliferation, hematopoiesis, stem cell maintenance, gametogenesis, mast cell development, migration and function, and melanogenesis. In a follow-up study of 667 breast cancer patients (median follow-up 39 months), S Kashiwagi, et al [ KAS2013 ] reported 190 (28.5%) TNBC cases, of which 149 (78.4%) were basal-like (BL1/2, KRT5/6+ and/or EGFR+). KIT+ cases numbered 111 in the whole population (16.6%), 42 in the basal-like subset (28.2%), and 47 in the lymph node metastasis subset (47/216 = 21.8%), and KIT positivity was an independent negative prognostic factor in TNBC (HR = 2.29, 95% CI = 1.11-4.72). Regarding the specific role of KIT in TNBC, Hu JY, et al [ HUJ2021 ] found that treacle ribosome biogenesis factor 1 (TCOF1) is highly expressed in TNBC, promotes cancer cell stemness and tumor growth, and correlates with poor prognosis, and demonstrated that KIT is a downstream effector of the TCOF1-regulated tumor stem cell (TSC) pathway. The weight of KIT in the survival score model is positive, indicating that high KIT expression is an important risk factor for death in TNBC patients.
Another gene in the model, GREM1, is a bone morphogenetic protein (BMP) inhibitor and is involved in organogenesis, tissue differentiation and organ fibrosis. Sung NJ, et al [ SUN2020 ] found that GREM1 promotes lung metastasis of breast cancer cells via the STAT3-MMP13 pathway. Neckmann U, et al [ NEC2019 ] found that high GREM1 expression correlates with metastasis and poor prognosis in ER-negative breast cancer. The weight of GREM1 in the survival score model is positive and relatively large, indicating that high GREM1 expression is a significant risk factor for death in TNBC patients.
DTL is also known as retinoic acid-regulated nuclear matrix-associated protein (RAMP); retinoic acid (tretinoin) is a vitamin A derivative. Ueki T, et al [ UEK2008 ] found that, during the growth of breast cancer cell lines, DTL is phosphorylated by the mitotic kinase AURKB and is involved in regulating multiple stages of the cell cycle. Cui HR, et al [ Cui2019 ] showed that DTL promotes tumor development by enhancing cancer cell motility and proliferation through degradation of PDCD4 (programmed cell death 4). The weight of DTL in the survival score model is positive and large, indicating that high DTL expression is an important risk factor for death in TNBC patients.
The granulocyte colony-stimulating factor (G-CSF) receptor CSF3R is an oncogene; its mutations are hallmarks of some leukemias, and it is abnormally expressed in many other cancers. G-CSF is associated with tumor-associated macrophages (TAM) and thus with breast cancer growth and metastasis. Hollmén M, et al [ HOL2015 ] studied macrophage activation mechanisms using different breast cancer cell lines and found that the TNBC cell line MDA-MB-231 secreted high levels of G-CSF, which induced immunosuppressive HLA-DRlo macrophages that secreted TGF-α to promote cancer cell metastasis. In tumor tissue samples, G-CSF was highly expressed in TNBC and was associated with CD163+ macrophages, and the population with high G-CSF expression had poorer OS. Huang X, et al [ HUA2020 ] used public data sets to study CSF expression in 24 cancers and found that, in breast cancer, high expression of G-CSF and its receptor gene was associated with poor prognosis. The weight of CSF3R in the survival score model is positive, which likewise indicates that high CSF3R expression is a risk factor for death in TNBC patients.
IKZF1 is another immune cell-associated gene in the model and belongs to the Ikaros family of hematopoietic and immune transcriptional regulators. Ikaros plays an important role in the differentiation of hematopoietic stem cells into immune cells. Arco PGD, et al [ ARC2005 ] described the role of the Ikaros gene family in lymphocyte development, including the regulation of CD8 expression by Ikaros. With respect to the cell cycle, Ikaros negatively regulates the G1/S transition through its DNA-binding function, thereby inhibiting cell growth and exerting a tumor-suppressive effect. Chen JC, et al [ CHE2018 ] found that IKZF1 enhances immune infiltration in solid tumors and can improve sensitivity to immunotherapy. The weight of IKZF1 in the survival score model is negative, which indicates that high IKZF1 expression is a favorable factor for longer survival in TNBC patients.
HSD3B2 enzyme is a key catalyst for the biosynthesis of all hormonal steroids. In a scientific report, Wiggins GAR, et al [ WIG2021 ] found that HSD3B2, one of four genes associated with the C21-steroid biosynthesis pathway that converts cholesterol to breast-associated hormones, whose SNP (rs11075995) is a breast cancer risk allele, was associated with down-regulation of DNA demethylase FTO, resulting in abnormal hormone biosynthesis, thereby increasing breast cancer risk. In addition, there is a large body of research on SNPs for HSD3B2 as risk alleles for prostate cancer. The weight of HSD3B2 in the survival score model is positive, which indicates that the high expression HSD3B2 is a risk factor for the death of TNBC patients.
FHL1 is four-and-a-half LIM domains protein 1. The LIM domain is a distinctive double zinc finger structure shared by the LIN-11, ISL-1 and MEC-3 proteins. Ding LH, et al [ DIN2009 ] found that FHL1 inhibits anchorage-dependent and anchorage-independent growth of breast cancer cells by down-regulating ER. The weight of FHL1 in the survival score model is positive, indicating that high FHL1 expression may lead to an ER-negative state and weaken the inhibition of cancer cell growth, making it a risk factor for death in TNBC patients.
Heparan sulfate (HS) proteoglycan (PG), or HSPG, is an important component of the extracellular matrix (ECM). During HSPG synthesis, glycosyltransferases of the EXT gene family, including EXT1/2 and EXTL1/2/3, regulate the elongation of the HS backbone. Busse-Wicher M, et al [ BUS2014 ] reviewed the EXT family genes and their relationship to disease. Faria-Ramos I, et al [ FRA2021 ] reviewed in detail the (un)expected role of HS glycosaminoglycans in clinical cancer management. Nadanaka S, et al [ NAD2012 ] found that EXTL2 regulates glycosaminoglycan (GAG) synthesis during HSPG synthesis by terminating chain elongation. Sembajwe LF, et al [ FRA2021 ] investigated differences in EXT family gene expression, glycosyltransferase activity and HS structure among the ER+/PR+ breast cancer cell line MCF-7, the ER-/PR- non-cancer cell line MCF10A, and the TNBC cell lines MDA-MB-231 and HCC38. First, EXT1 expression was extremely low in MCF-7, while EXTL2 expression was high in MDA-MB-231 and HCC38. Second, gene expression in MDA-MB-231 and HCC38 was almost identical, but their HS structures differed considerably. Third, glycosyltransferase activity in HCC38 was twice that of the normal cell line MCF10A, whereas in MDA-MB-231 it was half that of MCF10A. The weight of EXTL2 in the survival score model is negative and has the largest absolute value among the negative weights, indicating that high EXTL2 expression is the most important favorable factor for longer survival in TNBC patients.
CCN1, cellular communication network (CCN) factor 1, plays an important role in many cellular functions, as its name suggests. Li J, et al [ LIJ2015 ] provide a very good overview of the role of the CCN family in tumorigenesis and metastasis. CCN proteins have four domains:
IGFBP: insulin-like growth factor (IGF) binding domain;
VWC: von Willebrand factor type C domain, which binds TGF-β, BMP and integrins;
TSP1: thrombospondin type 1 repeat, which binds VEGF, LPR, HSPG and integrins;
CT: cysteine knot domain, which binds VEGF, LPR, HSPG, integrins, Notch1 and fibrin C1.
The multiple binding domains indicate that CCN proteins are involved in many signaling pathways, such as TGF-α/β and Wnt. CCN1 is highly expressed in ovarian cancer, prostate cancer, glioma, breast cancer, renal cancer, etc., but shows low expression in endometrial cancer, lung cancer, chondrosarcoma and intestinal cancer. In breast cancer, Lin MT, et al [ Lin2004 ] showed in MCF-7 cells that CNN1 (i.e., CYR61) confers resistance to chemotherapy-induced apoptosis by activating integrin/NF-kB/XIAP signaling. Lai D, et al [ Lai2011 ] found that resistance of breast cancer cells to paclitaxel (Taxol) is due to activation of TAZ, a transcriptional co-activator with a PDZ-binding motif. TAZ is a main component of the Hippo-LATS pathway; it interacts with TEAD family transcription factors and activates the promoters of the downstream genes CNN1/CTGF, inducing resistance to paclitaxel. Harris LG, et al [ HAR2012 ] found that, in the Hedgehog pathway, which is involved in angiogenesis during development, SHH (Sonic Hedgehog) activates the HH transcription factor GLI1, which in turn activates the potent pro-angiogenic secreted molecule CNN1 and promotes cancer cell metastasis. The negative weight of CNN1 in the survival score model suggests that high CNN1 expression is a favorable factor for longer survival in TNBC patients. This is contrary to the above findings that highly expressed CNN1 causes chemotherapy resistance and cancer metastasis, but it is consistent with the result for IGF1R in the model, suggesting that CNN1 and IGF1R may have other protective mechanisms specific to TNBC.
TPMT is involved in the metabolism of thiopurine drugs, including azathioprine, mercaptopurine and thioguanine, which are immunosuppressants used for immune system diseases and blood diseases. Ruwali M, et al [ RUW2019 ] reviewed the current status and progress of pharmacogenetics in cancer treatment. TPMT is directly related to the toxicity of chemotherapeutic drugs: in some patients with TPMT deletion or low expression, chemotherapeutic drugs such as cisplatin and 5-FU cannot be metabolized and broken down, which causes higher toxicity and can negatively affect survival. TPMT therefore plays a protective role in patients receiving chemotherapy. The weight of TPMT in the survival score model is negative, which indicates that high TPMT expression is a favorable factor for longer survival in TNBC patients, consistent with TPMT promoting drug metabolism.
In conclusion, most genes in the overall survival score model provided by the present invention participate in, and play major roles in, important cancer-related pathways, and the signs of the model weights are consistent with the positive or negative effects reported in the literature. The main exception is IGF1R (and, as noted above, CNN1): the literature indicates that IGF1R is anti-apoptotic and promotes cancer, but the model shows that IGF1R is a favorable factor for longer survival, probably because IGF1R rescues normal cells damaged during chemotherapy from apoptosis. The marker genes and gene combinations discovered by the algorithm model agree with research results in the field, which supports the soundness of the algorithm model provided by the present invention.
Abbreviations and terms used throughout this disclosure
AUC: Area Under the Curve
BL1/2: Basal-Like 1/2 Subtype
DFS: Disease-Free Survival
DSB: Double-Strand Break (of DNA)
ECM: Extracellular Matrix
IM: Immunomodulatory Subtype
IRES: Internal Ribosome Entry Site
LAR: Luminal Androgen Receptor Subtype
M: Mesenchymal Subtype
MSL: Mesenchymal Stem-Like Subtype
OS: Overall Survival
RAMP: Retinoic acid-regulated nuclear matrix-associated protein
RFS: Relapse-Free Survival
ROC: Receiver Operating Characteristic (curve)
RSQ: R-squared (squared correlation coefficient)
SHH: Sonic hedgehog (ligand)
SNP: Single Nucleotide Polymorphism
TAM: Tumor-Associated Macrophage
TNBC: Triple-Negative Breast Cancer
TSC: Tumor Stem Cell
References
【LIN2004】 Lin MT, et al. Cyr61 Expression Confers Resistance to Apoptosis in Breast Cancer MCF-7 Cells by a Mechanism of NF-kB-dependent XIAP Up-Regulation. The Journal of Biological Chemistry, Vol. 279, No. 23, Issue of June 4, pp. 24015–24023, 2004.
【ARC2005】 Arco PGD, et al. The Role of the Ikaros Gene Family in Lymphocyte Development. Chapter 27 in book: Zinc Finger Proteins, pp. 200-206, 2005.
【UEK2008】 Ueki T, et al. Involvement of elevated expression of multiple cell-cycle regulator, DTL/RAMP (denticleless/RA-regulated nuclear matrix associated protein), in the growth of breast cancer cells. Oncogene (2008) 27, 5672–5683.
【LAI2011】 Lai D, et al. Taxol Resistance in Breast Cancer Cells Is Mediated by the Hippo Pathway Component TAZ and Its Downstream Transcriptional Targets Cyr61 and CTGF. Cancer Research; 71(7), 2011.
【HAR2012】 Harris LG, Pannell LK, Singh S, Samant RS and Shevde LA. Increased vascularity and spontaneous metastasis of breast cancer by hedgehog signaling mediated upregulation of cyr61. Oncogene 31: 3370-3380, 2012.
【BUS2014】 Busse-Wicher M, et al. The exostosin family: Proteins with many functions. Matrix Biology 35 (2014) 25–33.
【FAR2015】 Farabaugh SM, et al. Role of IGF1R in breast cancer subtypes, stemness, and lineage differentiation. Frontiers in Endocrinology. 6(59), 2015.
【HOL2015】 Hollmén M, Karaman S, Schwager S, et al. G-CSF regulates macrophage phenotype and associates with poor overall survival in human triple-negative breast cancer. Oncoimmunology 2015; 5: e1115177.
【LIJ2015】 Li J, et al. Emerging role of CCN family proteins in tumorigenesis and cancer metastasis (Review). International Journal of Molecular Medicine 36: 1451-1463, 2015.
【ALA2016】 Alayev A, et al. Estrogen induces RAD51C expression and localization to sites of DNA damage. Cell Cycle 15(23): 3230–3239, 2016.
【DEL2016】 Delaunay S, et al. Elp3 links tRNA modification to IRES-dependent translation of LEF1 to sustain metastasis in breast cancer. J. Exp. Med. 2016, Vol. 213, No. 11, 2503–2523. https://doi.org/10.1084/jem.20160397.
【CHE2018】 Chen JC, et al. IKZF1 Enhances Immune Infiltrate Recruitment in Solid Tumors and Susceptibility to Immunotherapy. Cell Systems, 2018, 7(1): 92-103. doi:10.1016/j.cels.2018.05.020.
【BRA2019】 Bray SM (2019). Mechanisms and regulation of dsDNA break repair in the Sulfolobus genus of thermophilic archaea (Doctoral thesis). https://doi.org/10.17863/CAM.37526.
【CUI2019】 Cui HR, et al. DTL promotes cancer progression by PDCD4 ubiquitin-dependent degradation. Journal of Experimental & Clinical Cancer Research (2019) 38: 350.
【NEC2019】 Neckmann U, et al. GREM1 is associated with metastasis and predicts poor prognosis in ER-negative breast cancer patients. Cell Communication and Signaling (2019) 17: 140. https://doi.org/10.1186/s12964-019-0467-7.
【RUW2019】 Ruwali M, et al. Pharmacogenetics and Cancer Treatment: Progress and Prospects. In book: Molecular Medicine, 2019.
【HUA2020】 Huang X, Hu P, Zhang J. Genomic analysis of the prognostic value of CSFs and CSFRs across 24 solid cancer types. Ann Transl Med 2020; 8(16): 994. doi:10.21037/atm-20-5363.
【JIA2020】 Jiang MX, et al. The caspase-3/GSDME signal pathway as a switch between apoptosis and pyroptosis in cancer. Cell Death Discovery (2020) 6: 112. https://doi.org/10.1038/s41420-020-00349-0.
【SUN2020】 Sung NJ, et al. Gremlin-1 Promotes Metastasis of Breast Cancer Cells by Activating STAT3-MMP13 Signaling Pathway. Int. J. Mol. Sci. 2020, 21, 9227; doi:10.3390/ijms21239227.
【FRA2021】 Faria-Ramos I, et al. Heparan Sulfate Glycosaminoglycans: (Un)Expected Allies in Cancer Clinical Management. Biomolecules 2021, 11, 136. https://doi.org/10.3390/biom11020136.
【HUJ2021】 Hu JY, et al. TCOF1 upregulation in triple-negative breast cancer promotes stemness and tumour growth and correlates with poor prognosis. British Journal of Cancer; https://doi.org/10.1038/s41416-021-01596-3.
【MAK2021】 Makuch-Kocka, et al. The BIRC Family Genes Expression in Patients with Triple Negative Breast Cancer. Int. J. Mol. Sci. 2021, 22, 1820. https://doi.org/10.3390/ijms22041820.
【TAN2021】 Tan YX, et al. Pyroptosis: a new paradigm of cell death for fighting against cancer. Journal of Experimental & Clinical Cancer Research (2021) 40: 153. https://doi.org/10.1186/s13046-021-01959-x.
【WIG2021】 Wiggins GAR, et al. Variable expression quantitative trait loci analysis of breast cancer risk variants. Scientific Reports (2021) 11: 7192, Nature Portfolio. https://doi.org/10.1038/s41598-021-86690-5.
Example 1 Screening of gene markers for TNBC using mRNA expression data of the genes
First, data set preparation
1. The data set GSE69031 (GPL571) was downloaded from the Gene Expression Omnibus (GEO) database, and the data set TCGA-BRCA (GPL96, etc.) was downloaded from TCGA. Both data sets consist of gene chip data from breast cancer tissue sections (Affymetrix platforms GPL96, GPL571, etc.). Only TNBC patients were selected, giving a total of 171 patients, of which 150 came from TCGA-BRCA and 21 from GSE69031.
2. Transcripts with extremely low expression (non-zero expression in no more than 10 samples) were removed, miRNAs and lncRNAs were removed, and the genes common to the two data sets were selected, giving 9524 genes in total.
3. The data were normalized step by step, first by sample and then by gene:
For each sample, the median of all gene expression levels was calculated, and the normalized expression was obtained by subtracting this sample median from the original expression level. This removes differences in mRNA input amount between samples.
Then, based on the sample-normalized data, the median expression of each gene across all samples was calculated, and the normalized expression of each gene was obtained by subtracting this gene median from the expression level. This removes differences between platforms.
The normalized data were assembled into the TNBC data set.
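The two-step median normalization described above can be sketched in Python as follows. This is a minimal illustration only: the function name `normalize` and the nested-dictionary input format are assumptions made here, and the expression values are assumed to be on a scale where subtraction is meaningful (for example log-transformed intensities), which the text does not state.

```python
from statistics import median


def normalize(expr):
    """Two-step median centering.

    expr: dict mapping sample id -> dict mapping gene -> expression value.
    All samples are assumed to contain the same set of genes.
    """
    # Step 1: subtract each sample's median over all genes
    # (intended to remove differences in mRNA input amount).
    by_sample = {}
    for sample, genes in expr.items():
        sample_median = median(genes.values())
        by_sample[sample] = {g: v - sample_median for g, v in genes.items()}

    # Step 2: subtract each gene's median over all samples
    # (intended to remove differences between platforms).
    gene_names = list(next(iter(by_sample.values())).keys())
    gene_median = {
        g: median(by_sample[s][g] for s in by_sample) for g in gene_names
    }
    return {
        s: {g: v - gene_median[g] for g, v in genes.items()}
        for s, genes in by_sample.items()
    }
```

After step 2 every gene has a median of zero across samples, so downstream weights act on deviations from the cohort median rather than on raw intensities.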
Second, screening and modeling of overall survival markers
With overall survival (OS) as the target variable, the following operations were performed to build a model for the target variable:
1. Determining genes associated with the target variable:
A t-test was used to find genes that distinguish the target variable groups (0 represents survival and 1 represents death) with statistical significance (p < 0.05), giving a preliminary set of differentially expressed genes.
2. Grouping genes by up- or down-regulation:
The differentially expressed genes were divided into two groups: a positive t value in the t-test result indicates a gene whose expression is down-regulated in the patient's tissue, and a negative t value indicates a gene whose expression is up-regulated in the patient's tissue. Hierarchical correlation coefficient analysis was then performed on each of the two groups separately.
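The sign convention used for grouping can be illustrated with a short Python sketch. It is written under assumptions that the text does not confirm: the t statistic is taken as mean(survival group) minus mean(death group), so that a positive t corresponds to lower expression in patients who died, and Welch's unequal-variance form is used. The p < 0.05 filter is omitted because it needs a t-distribution, which the Python standard library does not provide; the function names are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(group0, group1):
    """Welch t statistic for mean(group0) - mean(group1).

    Each group needs at least two values and non-zero pooled variance.
    """
    se = sqrt(variance(group0) / len(group0) + variance(group1) / len(group1))
    return (mean(group0) - mean(group1)) / se


def split_by_direction(expr_alive, expr_dead):
    """Split genes into (down_regulated, up_regulated) by the sign of t.

    expr_alive / expr_dead: dict mapping gene -> list of expression values
    in the survival (0) and death (1) groups respectively.
    """
    down, up = [], []
    for gene in expr_alive:
        t = welch_t(expr_alive[gene], expr_dead[gene])
        # Positive t: expression is lower in the death group (down-regulated).
        (down if t > 0 else up).append(gene)
    return down, up
```

In practice the genes passed to `split_by_direction` would be only those that already met the significance threshold.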
3. Hierarchical correlation coefficient analysis:
Hierarchical correlation coefficient clustering was performed separately on the up-regulated and down-regulated gene groups, such that, at a given correlation coefficient level, the genes within each cluster are approximately pairwise correlated with one another. In each cluster, the gene with the highest average correlation with the other genes of the cluster was selected as its representative. The representative genes of all clusters constitute the model candidate genes for the marker.
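The selection of a representative gene from one cluster can be sketched as below. The clustering step itself is not reproduced, since the text does not specify the algorithm; the cluster membership is taken as given. Pearson correlation is assumed as the correlation coefficient, and the names `pearson` and `representative` are introduced here for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def representative(cluster, expr):
    """Return the gene with the highest mean correlation to the other
    genes of the cluster.

    cluster: list of gene names; expr: dict gene -> list of expression values.
    """
    if len(cluster) == 1:
        return cluster[0]

    def average_correlation(gene):
        others = [h for h in cluster if h != gene]
        return sum(pearson(expr[gene], expr[h]) for h in others) / len(others)

    return max(cluster, key=average_correlation)
```

Keeping one gene per cluster reduces collinearity among the candidates before the regression step.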
4. Determining the markers and the model by iterative linear regression analysis:
(i) For the model candidate genes, given a number of model parameters (s), iterative linear regression analysis was performed. Different values of s were looped over to find the optimal number of model parameters, determined by the maximum value of rsq, thereby obtaining an optimal model.
(ii) 741 genes were pre-selected from a cancer-related gene mutation map, and step (i) was repeated to obtain an optimal model.
(iii) The candidate markers obtained in step (i) and step (ii) were combined, and step (i) was repeated once more to obtain the overall survival model. The model includes a marker consisting of 18 genes: BIRC7, BMPR1A, CASP3, CNN1, CSF3R, DTL, EXO1, EXTL2, FHL1, GREM1, HSD3B2, IGF1R, IKZF1, KIT, RAD51C, STAT3, TPMT, URM1.
The model is as follows:
$$\text{Survival score} = \sum_{t} \beta_t \times \mathrm{Exp}_t$$
where $\mathrm{Exp}_t$ is the mRNA expression level of each gene $t$ in the marker, and $\beta_t$ is the weighting coefficient corresponding to that gene.
TABLE 1 Parameters of the final overall survival model after iterative linear regression
Figure BDA0003497645900000182
Figure BDA0003497645900000191
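As an illustration of how the overall survival model is applied, the following Python sketch computes the survival score as the weighted sum of expression levels. The coefficients are those of the preferred model stated in claim 5, and the default threshold of 0.4098 is the one reported in Example 4. The function names and the dictionary input format are assumptions made for illustration, and the expression values are assumed to be normalized as in Example 1.

```python
# Weighting coefficients of the preferred model (claim 5).
BETA = {
    "BIRC7": -0.0445, "BMPR1A": -0.1073, "CASP3": 0.1153,
    "CNN1": -0.0622, "CSF3R": 0.0642, "DTL": 0.0861,
    "EXO1": -0.1108, "EXTL2": -0.1351, "FHL1": 0.0436,
    "GREM1": 0.0688, "HSD3B2": 0.0506, "IGF1R": -0.0472,
    "IKZF1": -0.0924, "KIT": 0.0565, "RAD51C": -0.0533,
    "STAT3": 0.1121, "TPMT": -0.1095, "URM1": 0.1669,
}


def survival_score(expression):
    """Weighted sum of normalized mRNA expression over the 18 marker genes.

    expression: dict mapping gene symbol -> normalized expression level.
    """
    missing = sorted(BETA.keys() - expression.keys())
    if missing:
        raise KeyError(f"missing expression values for: {missing}")
    return sum(BETA[g] * expression[g] for g in BETA)


def is_high_risk(score, threshold=0.4098):
    """True when the score is at or above the threshold (low overall survival).

    Example 4 treats scores greater than or equal to 0.4098 as high risk.
    """
    return score >= threshold
```

A positive coefficient raises the score, and hence the predicted risk, as expression increases; a negative coefficient lowers it.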
Example 2 Cross-validation
The data set of Example 1 was split according to the population average of the target variable, with one half used as the training set and the other half as the validation set; ROC curve analysis was performed on the model obtained in Example 1, and the AUC was calculated. This was repeated N (= 20) times, and the statistics of the AUC were calculated. See fig. 1: the minimum AUC is 0.65, the maximum AUC is 0.96, and the median AUC is 0.79. The median cross-validation AUC is used as the index for evaluating the model, and it shows that the model provided by the present invention performs well.
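The AUC used in this evaluation can be computed directly from scores and outcome labels, as in the sketch below. It uses the rank interpretation of the AUC (the probability that a subject who died has a higher score than a subject who survived, with ties counted as one half). The function names are assumptions, and the sketch covers only the AUC calculation and its summary, not the repeated refitting of the model on each training half.

```python
from statistics import median


def auc(scores, events):
    """Area under the ROC curve for predicting events (1 = death, 0 = survival)."""
    pos = [s for s, e in zip(scores, events) if e == 1]
    neg = [s for s, e in zip(scores, events) if e == 0]
    if not pos or not neg:
        raise ValueError("both outcome groups must be present")
    # Count pairs where the death case outranks the survivor; ties count 0.5.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def summarize(aucs):
    """Minimum, median and maximum of the AUC values from repeated splits."""
    return {"min": min(aucs), "median": median(aucs), "max": max(aucs)}
```

Reporting the median rather than the mean makes the summary less sensitive to a single unusually favorable or unfavorable split.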
Example 3 evaluation of any subset of markers
Any subset of the markers obtained in Example 1 (a subset containing K genes, where K is a positive integer of 1 or more and less than 16) was taken and evaluated as a biomarker.
For K = 1, i.e., any single gene in the marker evaluated as a biomarker, ROC curves were plotted and the AUC was calculated; the results are shown in fig. 3.
For K ≥ 2, K genes were randomly selected from the marker, the model was rebuilt and cross-validated, ROC curves were plotted and the AUC was calculated; partial results are shown in FIGS. 4-7.
Example 4 model verification
The TCGA breast cancer data set BRCA from Example 1 was used. There were 150 TNBC samples with complete data, of which 26 were labeled as dead (EVENT = 1) and 124 as alive (EVENT = 0). An ROC curve was plotted using the model established in Example 1 (with the 18 genes as markers) to evaluate the model's ability to predict overall survival. The ROC curve is shown in fig. 8.
According to the ROC curve in FIG. 8, the AUC is 0.9789, and the optimal decision point on the ROC curve (shown as a dotted line) corresponds to 92% specificity and 96% sensitivity. In addition, a chi-square statistic (chi-sq) was calculated for the prediction scores computed with the model, and the score at which it reaches its maximum was set as the optimal threshold score. The threshold score is 0.4098: a score greater than the threshold indicates low overall survival or death, and a score less than the threshold indicates high overall survival or survival. The relevant indices are shown in table 2.
TABLE 2
Predictive ability index    Index value
AUC                         0.9789
Specificity                 92%
Sensitivity                 96%
Threshold score             0.4098
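One way to select a threshold by maximizing a chi-square statistic is sketched below. The text does not state exactly how chi-sq was computed, so this sketch assumes a Pearson chi-square on the 2x2 table of predicted risk group against outcome, without continuity correction, evaluated at every observed score. The function names are hypothetical.

```python
def confusion(scores, events, threshold):
    """Counts (a, b, c, d): high-risk dead, high-risk alive,
    low-risk dead, low-risk alive. High risk means score >= threshold."""
    a = b = c = d = 0
    for s, e in zip(scores, events):
        high = s >= threshold
        if high and e == 1:
            a += 1
        elif high:
            b += 1
        elif e == 1:
            c += 1
        else:
            d += 1
    return a, b, c, d


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square for the table [[a, b], [c, d]]; 0.0 if degenerate."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom


def best_threshold(scores, events):
    """Return (threshold, chi_square) with the largest chi-square value."""
    best = (None, -1.0)
    for thr in sorted(set(scores)):
        chi = chi_square_2x2(*confusion(scores, events, thr))
        if chi > best[1]:
            best = (thr, chi)
    return best


def sensitivity_specificity(scores, events, threshold):
    """Sensitivity and specificity for predicting death at the threshold."""
    a, b, c, d = confusion(scores, events, threshold)
    return a / (a + c), d / (b + d)
```

For example, with scores `[0.1, 0.2, 0.3, 0.6, 0.7, 0.9]` and events `[0, 0, 0, 1, 1, 1]`, the groups separate perfectly at 0.6, which is the threshold this sketch returns.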
Survival analysis was performed using the Cox model, and the corresponding Kaplan-Meier (K-M) curve of overall survival time is shown in fig. 9. According to fig. 9, the three-year survival rate is 93% for low-risk subjects (score less than 0.4098) and 52% for high-risk subjects (score greater than or equal to 0.4098), and the difference is statistically significant, which indicates that the model provided by the present invention has an excellent clinical predictive effect.
Example 5
This example provides a kit comprising reagents for quantifying the mRNA levels of the 18 genes in the overall survival model provided by the present invention: BIRC7, BMPR1A, CASP3, CNN1, CSF3R, DTL, EXO1, EXTL2, FHL1, GREM1, HSD3B2, IGF1R, IKZF1, KIT, RAD51C, STAT3, TPMT, URM1. The reagents include reverse transcriptase, primers, Taq enzyme, fluorescent dyes, and the like.
Example 6
The present embodiment provides a method for predicting the overall survival of the clinical prognosis of a TNBC patient, comprising the following steps:
(1) extracting mRNA from a postoperative cancer tissue section sample or a liquid biopsy sample of the subject as the sample to be tested;
(2) detecting the mRNA expression levels of the following 18 genes in the sample to be tested: BIRC7, BMPR1A, CASP3, CNN1, CSF3R, DTL, EXO1, EXTL2, FHL1, GREM1, HSD3B2, IGF1R, IKZF1, KIT, RAD51C, STAT3, TPMT, URM1;
(3) calculating a survival score (OS Score) according to the overall survival model provided by the present invention, and evaluating the overall survival of the clinical prognosis of the subject against a preset threshold score: a survival score higher than the threshold score indicates low overall survival or death, and a survival score lower than the threshold score indicates high overall survival or survival.
Example 7
The present embodiment provides an apparatus for predicting the overall survival of the clinical prognosis of a TNBC patient, the apparatus comprising a detection device, a processor and an output device. The detection device takes mRNA as the sample to be tested and quantitatively detects the mRNA expression levels of the following 18 genes: BIRC7, BMPR1A, CASP3, CNN1, CSF3R, DTL, EXO1, EXTL2, FHL1, GREM1, HSD3B2, IGF1R, IKZF1, KIT, RAD51C, STAT3, TPMT, URM1. The processor reads the mRNA expression levels of the genes measured by the detection device, calculates a survival score (OS Score) according to the model provided by the present invention, and predicts whether the overall survival rate of the subject is high or low according to a preset threshold score. The output device outputs the prediction result of the processor. In some embodiments, the output device is a display.
The method for predicting the overall survival of the clinical prognosis of a TNBC subject using the apparatus comprises the following steps:
(1) taking mRNA extracted from a postoperative cancer tissue section sample or a liquid biopsy sample of the subject as the sample to be tested;
(2) feeding the sample to be tested into the detection device, and quantitatively detecting the mRNA expression levels of the following 18 genes: BIRC7, BMPR1A, CASP3, CNN1, CSF3R, DTL, EXO1, EXTL2, FHL1, GREM1, HSD3B2, IGF1R, IKZF1, KIT, RAD51C, STAT3, TPMT, URM1;
(3) the processor reads the mRNA expression levels of the genes measured by the detection device, calculates the survival score (OS Score) according to the model provided by the present invention, and predicts whether the overall survival rate of the subject is high or low according to the preset threshold score;
(4) outputting the prediction result of the processor through the output device.
The above description is only for the preferred embodiment of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present invention are included in the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the appended claims.

Claims (10)

1. A marker for predicting the overall survival of the clinical prognosis of triple negative breast cancer, wherein the marker comprises one or a combination of two or more of the following genes: BIRC7, BMPR1A, CASP3, CNN1, CSF3R, DTL, EXO1, EXTL2, FHL1, GREM1, HSD3B2, IGF1R, IKZF1, KIT, RAD51C, STAT3, TPMT, URM1.
2. A model for predicting the overall survival of the clinical prognosis of triple negative breast cancer, wherein the model is:
$$\text{Survival score} = \sum_{t} \beta_t \times \mathrm{Exp}_t$$
the model takes one or a combination of two or more of the following genes as markers: BIRC7, BMPR1A, CASP3, CNN1, CSF3R, DTL, EXO1, EXTL2, FHL1, GREM1, HSD3B2, IGF1R, IKZF1, KIT, RAD51C, STAT3, TPMT, URM1; $\mathrm{Exp}_t$ is the mRNA expression level of each gene in the marker, and $\beta_t$ is the weighting coefficient corresponding to each gene.
3. The model of claim 2, wherein said weighting coefficients are calculated from an iterative linear regression.
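4. A model according to claim 2 or 3, characterized in that the model is:
$$
\begin{aligned}
\text{Survival score} ={} & (\beta_{\mathrm{BIRC7}} \times \mathrm{Exp}_{\mathrm{BIRC7}}) + (\beta_{\mathrm{BMPR1A}} \times \mathrm{Exp}_{\mathrm{BMPR1A}}) + (\beta_{\mathrm{CASP3}} \times \mathrm{Exp}_{\mathrm{CASP3}}) \\
& + (\beta_{\mathrm{CNN1}} \times \mathrm{Exp}_{\mathrm{CNN1}}) + (\beta_{\mathrm{CSF3R}} \times \mathrm{Exp}_{\mathrm{CSF3R}}) + (\beta_{\mathrm{DTL}} \times \mathrm{Exp}_{\mathrm{DTL}}) \\
& + (\beta_{\mathrm{EXO1}} \times \mathrm{Exp}_{\mathrm{EXO1}}) + (\beta_{\mathrm{EXTL2}} \times \mathrm{Exp}_{\mathrm{EXTL2}}) + (\beta_{\mathrm{FHL1}} \times \mathrm{Exp}_{\mathrm{FHL1}}) \\
& + (\beta_{\mathrm{GREM1}} \times \mathrm{Exp}_{\mathrm{GREM1}}) + (\beta_{\mathrm{HSD3B2}} \times \mathrm{Exp}_{\mathrm{HSD3B2}}) + (\beta_{\mathrm{IGF1R}} \times \mathrm{Exp}_{\mathrm{IGF1R}}) \\
& + (\beta_{\mathrm{IKZF1}} \times \mathrm{Exp}_{\mathrm{IKZF1}}) + (\beta_{\mathrm{KIT}} \times \mathrm{Exp}_{\mathrm{KIT}}) + (\beta_{\mathrm{RAD51C}} \times \mathrm{Exp}_{\mathrm{RAD51C}}) \\
& + (\beta_{\mathrm{STAT3}} \times \mathrm{Exp}_{\mathrm{STAT3}}) + (\beta_{\mathrm{TPMT}} \times \mathrm{Exp}_{\mathrm{TPMT}}) + (\beta_{\mathrm{URM1}} \times \mathrm{Exp}_{\mathrm{URM1}}).
\end{aligned}
$$
5. The model of claim 4, wherein in said model $\beta_{\mathrm{BIRC7}}$, $\beta_{\mathrm{IGF1R}}$, $\beta_{\mathrm{RAD51C}}$, $\beta_{\mathrm{CNN1}}$, $\beta_{\mathrm{IKZF1}}$, $\beta_{\mathrm{BMPR1A}}$, $\beta_{\mathrm{TPMT}}$, $\beta_{\mathrm{EXO1}}$ and $\beta_{\mathrm{EXTL2}}$ are all less than 0, and $\beta_{\mathrm{URM1}}$, $\beta_{\mathrm{CASP3}}$, $\beta_{\mathrm{STAT3}}$, $\beta_{\mathrm{DTL}}$, $\beta_{\mathrm{GREM1}}$, $\beta_{\mathrm{CSF3R}}$, $\beta_{\mathrm{KIT}}$, $\beta_{\mathrm{HSD3B2}}$ and $\beta_{\mathrm{FHL1}}$ are all greater than 0;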
preferably, the model is:
$$
\begin{aligned}
\text{Survival score} ={} & (-0.0445 \times \mathrm{Exp}_{\mathrm{BIRC7}}) + (-0.1073 \times \mathrm{Exp}_{\mathrm{BMPR1A}}) + (0.1153 \times \mathrm{Exp}_{\mathrm{CASP3}}) \\
& + (-0.0622 \times \mathrm{Exp}_{\mathrm{CNN1}}) + (0.0642 \times \mathrm{Exp}_{\mathrm{CSF3R}}) + (0.0861 \times \mathrm{Exp}_{\mathrm{DTL}}) \\
& + (-0.1108 \times \mathrm{Exp}_{\mathrm{EXO1}}) + (-0.1351 \times \mathrm{Exp}_{\mathrm{EXTL2}}) + (0.0436 \times \mathrm{Exp}_{\mathrm{FHL1}}) \\
& + (0.0688 \times \mathrm{Exp}_{\mathrm{GREM1}}) + (0.0506 \times \mathrm{Exp}_{\mathrm{HSD3B2}}) + (-0.0472 \times \mathrm{Exp}_{\mathrm{IGF1R}}) \\
& + (-0.0924 \times \mathrm{Exp}_{\mathrm{IKZF1}}) + (0.0565 \times \mathrm{Exp}_{\mathrm{KIT}}) + (-0.0533 \times \mathrm{Exp}_{\mathrm{RAD51C}}) \\
& + (0.1121 \times \mathrm{Exp}_{\mathrm{STAT3}}) + (-0.1095 \times \mathrm{Exp}_{\mathrm{TPMT}}) + (0.1669 \times \mathrm{Exp}_{\mathrm{URM1}}).
\end{aligned}
$$
6. Use of the marker of claim 1 or the model of any one of claims 2 to 5 for the manufacture of a product for predicting the overall survival of the triple negative breast cancer clinical prognosis.
7. A method for constructing a model for predicting the overall survival of the clinical prognosis of triple negative breast cancer is characterized by comprising the following steps of:
S1, acquiring whole-transcriptome data of breast cancer tissue sections from triple negative breast cancer patients, and performing preliminary data screening and normalization to form a data set;
S2, in the data set, using a t-test to find genes that distinguish, with statistical significance (p < 0.05), the population that survived from the population that died in clinical prognosis, thereby obtaining differentially expressed genes;
S3, dividing the differentially expressed genes into an expression up-regulated group and an expression down-regulated group, performing hierarchical correlation coefficient clustering on each of the two groups separately, and selecting, in each cluster, the gene with the highest average correlation with the other genes of the cluster as a model candidate gene;
S4, performing iterative linear regression analysis on the model candidate genes while looping over different numbers of model variables (s), and establishing the model for the overall survival of the clinical prognosis of triple negative breast cancer; the model is as follows:
$$\text{Survival score} = \sum_{t} \beta_t \times \mathrm{Exp}_t$$
the model takes one or a combination of two or more of the following genes as markers: BIRC7, BMPR1A, CASP3, CNN1, CSF3R, DTL, EXO1, EXTL2, FHL1, GREM1, HSD3B2, IGF1R, IKZF1, KIT, RAD51C, STAT3, TPMT, URM1; $\mathrm{Exp}_t$ is the mRNA expression level of each gene in the marker, and $\beta_t$ is the weighting coefficient corresponding to each gene.
8. A kit for predicting the overall survival of the clinical prognosis of triple negative breast cancer, comprising a detection reagent for the marker of claim 1;
preferably, the kit comprises the model of any one of claims 2-5.
9. The kit of claim 8, wherein using the kit to predict the overall survival of the clinical prognosis of triple negative breast cancer comprises the following steps:
(1) detecting the mRNA expression level of the marker in a test sample from a patient with triple negative breast cancer;
(2) substituting the mRNA expression level of the marker detected in the step (1) into the model of any one of claims 2-5, and calculating the survival score;
(3) when the survival score is greater than the threshold score, the predicted overall survival rate of the triple negative breast cancer patient is low; when the survival score is less than the threshold score, the predicted overall survival rate of the triple negative breast cancer patient is high;
preferably, the sample to be tested is from a tissue or a body fluid.
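The decision rule of step (3) can be illustrated with a small Python sketch. The function name `predict_overall_survival` is hypothetical, and the threshold is left as a parameter because no threshold value is stated in these claims. The claims also do not say what happens when the score equals the threshold, so returning `"undetermined"` in that case is an assumption of this example.

```python
def predict_overall_survival(score: float, threshold: float) -> str:
    """Map a survival score to a predicted overall survival category.

    A score above the threshold means low predicted overall survival;
    a score below it means high predicted overall survival.
    """
    if score > threshold:
        return "low"
    if score < threshold:
        return "high"
    # Equality is not covered by the claim wording (assumption of this sketch).
    return "undetermined"
```

Note the direction of the rule: a larger score indicates a worse predicted outcome, so the score behaves as a risk score rather than a survival probability.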
10. An apparatus for predicting overall survival of a clinical prognosis of triple negative breast cancer, the apparatus comprising a detection device and a processor;
the detection device is used for detecting the mRNA expression level of the gene in the marker of the invention in a sample to be detected;
the processor is used for reading the mRNA expression level of the gene measured by the detection device, calculating a survival score according to the model of any one of claims 2 to 5, and predicting whether the overall survival rate of the subject is high or low by comparing the survival score with the threshold score;
preferably, the apparatus further comprises an output device for outputting the prediction result of the processor.
CN202210118821.XA 2022-02-08 2022-02-08 Marker and model for predicting overall survival of three-negative breast cancer clinical prognosis Pending CN114438209A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210118821.XA CN114438209A (en) 2022-02-08 2022-02-08 Marker and model for predicting overall survival of three-negative breast cancer clinical prognosis

Publications (1)

Publication Number Publication Date
CN114438209A true CN114438209A (en) 2022-05-06

Family

ID=81372279

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210118821.XA Pending CN114438209A (en) 2022-02-08 2022-02-08 Marker and model for predicting overall survival of three-negative breast cancer clinical prognosis

Country Status (1)

Country Link
CN (1) CN114438209A (en)

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140045915A1 (en) * 2010-08-31 2014-02-13 The General Hospital Corporation Cancer-related biological materials in microvesicles
CN110499364A (en) * 2019-07-30 2019-11-26 北京凯昂医学诊断技术有限公司 A kind of probe groups and its kit and application for detecting the full exon of extended pattern hereditary disease

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
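ANNA MAKUCH-KOCKA et al.: "The BIRC Family Genes Expression in Patients with Triple Negative Breast Cancer", INT. J. MOL. SCI., vol. 22, no. 4, pages 1820 *
Song Xiaowei: "Diagnostic and prognostic value of STAT1 in breast cancer and its effect on the function of triple negative breast cancer cells", Wanfang China Dissertation Full-text Database, pages 1 - 106 *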

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114540500A (en) * 2022-03-21 2022-05-27 深圳市陆为生物技术有限公司 Product for evaluating overall survival of breast cancer patients
CN115478106A (en) * 2022-08-18 2022-12-16 南方医科大学南方医院 LR (low rate) based method for typing triple negative breast cancer and application thereof
CN117607443A (en) * 2024-01-23 2024-02-27 杭州华得森生物技术有限公司 Biomarker combinations for diagnosing breast cancer
CN117607443B (en) * 2024-01-23 2024-04-16 杭州华得森生物技术有限公司 Biomarker combinations for diagnosing breast cancer

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination