CN116926190A - Prognosis marker for measuring breast cancer remote metastasis risk and application thereof - Google Patents

Publication number
CN116926190A
Authority
CN
China
Prior art keywords
breast cancer
sample
genes
gene
gene expression
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210342312.5A
Other languages
Chinese (zh)
Inventor
施冠卉
陈定壕
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
On Chi Biomedical Pte Ltd
Original Assignee
On Chi Biomedical Pte Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by On Chi Biomedical Pte Ltd
Priority to CN202210342312.5A
Publication of CN116926190A

Classifications

    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6876: Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
    • C12Q1/6883: Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material
    • C12Q1/6886: Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material for cancer
    • C12Q2600/00: Oligonucleotides characterized by their use
    • C12Q2600/158: Expression markers

Abstract

The invention provides a prognosis marker for measuring breast cancer distant metastasis risk and an application thereof, wherein the prognosis marker is at least seven genes in a gene group, and the gene group comprises: BLM, BUB1B, CLCA2, DTX2, ERBB2, ESR1, OBSL1, PIM1, SF3B5, TPX2, and YWHAB. The invention helps medical staff assist breast cancer patients in deciding on the type of treatment required, and reduces medical costs, patient burden, and the waste of health care or insurance resources.

Description

Prognosis marker for measuring breast cancer remote metastasis risk and application thereof
Technical Field
The present invention relates to a detection tool for breast cancer distant metastasis and a combination application thereof, and more particularly to a detection tool and a combination for predicting breast cancer distant metastasis by constructing a gene model by using specific gene expression values.
Background
Breast cancer is the most common cancer in women worldwide. It accounts for 1/3 of female cancers and 1/10 of all cancers, is one of the most common causes of death for women aged 45-55 years, and causes 1 breast cancer death (6.8%) per 38 women each year. Breast cancer is a polygenic disease, and the complex interaction of genetic factors determines its cause. As a result, breast cancer is a highly heterogeneous disease with variable characteristics, patterns, disease course, therapeutic response and prognosis. Many studies state that a breast cancer does not consist of a single type of cancer cell, and that tumors of multiple subtypes may be present in the same person, which makes complete eradication by treatment difficult.
Although breast cancer was found to be effective in improving survival by 90% in early stages, there are still roughly five patients who develop breast cancer recurrence within 5 to 10 years after surgery. Breast cancer recurrence can be categorized into locoregional recurrence and distant metastasis: locoregional (regional lymph node) recurrence is the entry of cancer cells into the breast lymphatics, whereas distant metastasis is the spread of cancer cells through blood vessels to internal organs such as the lungs, liver or brain. The strategy for reducing locoregional recurrence of breast cancer is postoperative radiation therapy, while the strategy for reducing distant metastasis is systemic adjuvant chemotherapy and hormonal therapy.
About 60% of early breast cancer patients opt to receive adjuvant chemotherapy, yet only a small fraction (2-15%) of them actually benefit from it, while all of them are exposed to the risk of chemotherapy toxicity.
At present, the possibility of distant metastasis can be evaluated only through periodic follow-up, and over-treatment or under-treatment often occurs. Giving every patient treatment of the same intensity causes some patients to suffer unnecessary side effects and others to miss the desired therapeutic effect, which burdens society and families and wastes medical resources. For postoperative patients, the uncertainty of recurrence is a prolonged torment.
At present, most subjects of studies on breast cancer recurrence, survival and tumor subtype are Caucasian. In recent years, genomic analysis has revealed significant differences in breast cancer tumor type and subtype between populations of different regions. For example, highly penetrant breast cancer susceptibility genes (e.g., BRCA1 and BRCA2) that are important in Caucasian populations have low mutation rates in Asian populations. Thus, BRCA1 and BRCA2 mutations account for breast cancer development or recurrence in only a small portion of Asian populations. Most breast cancer-related genes identified so far are also thought to affect Asian populations to a limited extent. Considering the differences in basic epidemiology and genetic risk factors between ethnic groups, ethnic genetic differences may be a potential cause of the difference in breast cancer risk between ethnic groups. Modeling the effects of ethnic differences gives a more thorough understanding of patient prognosis and thereby supports more appropriate treatment decisions. Therefore, it is of great significance to conduct breast cancer research on Asian females and to establish an assessment of their recurrence probability.
Disclosure of Invention
The invention provides a specific protein binding molecule, a nucleic acid probe or a nucleic acid primer for measuring a prognosis marker of a breast cancer distant metastasis risk prediction gene model, wherein the prognosis marker is at least seven genes in a gene group, and the gene group comprises: BLM, BUB1B, CLCA2, DTX2, ERBB2, ESR1, OBSL1, PIM1, SF3B5, TPX2, and YWHAB.
The specific protein binding molecule, nucleic acid probe or nucleic acid primer is further used for measuring the expression values of the at least seven genes in a sample, and the expression values of the at least seven genes are substituted into a calculation formula after normalization to obtain a gene expression value, wherein the gene expression value is used for judging whether the sample is a breast cancer distant metastasis low risk sample, and the calculation formula is as follows:
Gene expression value = (0.4-0.5) × BUB1B + (0.20-0.24) × BLM + (0.001-0.003) × CLCA2 + (0.0001-0.0010) × ERBB2 + (0.0002-0.0004) × TPX2 + (0.0001-0.0005) × PIM1 + (0.00001-0.00005) × YWHAB - (0.1-0.2) × ESR1 - (0.05-0.10) × OBSL1 - (0.003-0.010) × DTX2 - (0.0001-0.0005) × SF3B5, where each parenthesized range is an allowed interval and any real number within it may be used as the weighted score.
The invention also provides a kit for measuring gene expression values of prognostic markers of breast cancer distant metastasis risk, wherein the prognostic markers are at least seven genes in a gene group comprising: BLM, BUB1B, CLCA2, DTX2, ERBB2, ESR1, OBSL1, PIM1, SF3B5, TPX2, and YWHAB.
The kit further comprises a plurality of specific protein binding molecules, nucleic acid probes or nucleic acid primers for measuring the expression values of the at least seven genes in a sample, wherein the expression values of the at least seven genes are substituted into a calculation formula after normalization to obtain a gene expression value, the gene expression value is used for judging whether the sample is a breast cancer distant metastasis low risk sample, and the calculation formula is as follows:
Gene expression value = (0.4-0.5) × BUB1B + (0.20-0.24) × BLM + (0.001-0.003) × CLCA2 + (0.0001-0.0010) × ERBB2 + (0.0002-0.0004) × TPX2 + (0.0001-0.0005) × PIM1 + (0.00001-0.00005) × YWHAB - (0.1-0.2) × ESR1 - (0.05-0.10) × OBSL1 - (0.003-0.010) × DTX2 - (0.0001-0.0005) × SF3B5, where each parenthesized range is an allowed interval and any real number within it may be used as the weighted score.
The invention also provides a breast cancer distant metastasis risk assessment kit for measuring gene expression values of prognostic markers of breast cancer distant metastasis risk, wherein the prognostic markers are at least seven genes in a gene group, and the gene group comprises: BLM, BUB1B, CLCA2, DTX2, ERBB2, ESR1, OBSL1, PIM1, SF3B5, TPX2, and YWHAB.
Wherein the kit further comprises: a plurality of specific protein binding molecules, nucleic acid probes or nucleic acid primers for measuring the expression values of the at least seven genes in a sample of the subject; and a breast cancer remote metastasis risk prediction gene model, further comprising a calculation formula, wherein the calculation formula is used for substituting the standardized expression values of the at least seven genes, calculating to obtain a gene expression value, and the gene expression value is used for judging whether the sample is a breast cancer remote metastasis low risk sample, and the calculation formula is as follows:
Gene expression value = (0.4-0.5) × BUB1B + (0.20-0.24) × BLM + (0.001-0.003) × CLCA2 + (0.0001-0.0010) × ERBB2 + (0.0002-0.0004) × TPX2 + (0.0001-0.0005) × PIM1 + (0.00001-0.00005) × YWHAB - (0.1-0.2) × ESR1 - (0.05-0.10) × OBSL1 - (0.003-0.010) × DTX2 - (0.0001-0.0005) × SF3B5, where each parenthesized range is an allowed interval and any real number within it may be used as the weighted score.
The invention also provides an application of a specific protein binding molecule, a nucleic acid probe or a nucleic acid primer for measuring the gene expression value of a prognosis marker of breast cancer distant metastasis risk, wherein the prognosis marker is at least seven genes in a gene group, and the gene group comprises: BLM, BUB1B, CLCA2, DTX2, ERBB2, ESR1, OBSL1, PIM1, SF3B5, TPX2, and YWHAB.
The specific protein binding molecule, nucleic acid probe or nucleic acid primer is used for measuring the expression values of the at least seven genes in a sample, and the expression values of the at least seven genes are substituted into a calculation formula after normalization to obtain a gene expression value, wherein the gene expression value is used for judging whether the sample is a breast cancer distant metastasis low risk sample or not, and the calculation formula is as follows:
Gene expression value = (0.4-0.5) × BUB1B + (0.20-0.24) × BLM + (0.001-0.003) × CLCA2 + (0.0001-0.0010) × ERBB2 + (0.0002-0.0004) × TPX2 + (0.0001-0.0005) × PIM1 + (0.00001-0.00005) × YWHAB - (0.1-0.2) × ESR1 - (0.05-0.10) × OBSL1 - (0.003-0.010) × DTX2 - (0.0001-0.0005) × SF3B5, where each parenthesized range is an allowed interval and any real number within it may be used as the weighted score.
The invention also provides a molecular typing gene group for evaluating breast cancer distant metastasis risk, which consists of the following genes: BLM, BUB1B, CLCA2, DTX2, ERBB2, ESR1, OBSL1, PIM1, SF3B5, TPX2, and YWHAB.
The expression value of the gene group is substituted into a calculation formula after normalization to obtain a gene expression value, the gene expression value is used for judging whether the sample is a breast cancer remote metastasis low risk sample, and the calculation formula is as follows:
Gene expression value = (0.4-0.5) × BUB1B + (0.20-0.24) × BLM + (0.001-0.003) × CLCA2 + (0.0001-0.0010) × ERBB2 + (0.0002-0.0004) × TPX2 + (0.0001-0.0005) × PIM1 + (0.00001-0.00005) × YWHAB - (0.1-0.2) × ESR1 - (0.05-0.10) × OBSL1 - (0.003-0.010) × DTX2 - (0.0001-0.0005) × SF3B5, where each parenthesized range is an allowed interval and any real number within it may be used as the weighted score.
Compared with the prior art, the detection tool and the combined application provided by the invention facilitate risk assessment of breast cancer distant metastasis in Asian female populations. The prediction model, the genes and the calculation formula of the invention have not been reported previously, and they enable effective risk prediction of breast cancer distant metastasis with excellent accuracy. The invention helps medical staff assist breast cancer patients in deciding on the type of treatment required, and reduces medical costs, patient burden, and the waste of health care or insurance resources.
Drawings
FIG. 1 is a flowchart showing the steps of a method for constructing a breast cancer distant metastasis risk prediction gene model according to the present invention.
FIG. 2 is a flowchart showing a method for constructing a breast cancer distant metastasis risk prediction gene model according to another embodiment of the present invention.
FIG. 3 is a flowchart illustrating a method for constructing a remote breast cancer metastasis risk prediction gene model according to the embodiment of FIG. 2.
FIG. 4 is a flowchart showing a method for constructing a remote breast cancer metastasis risk prediction gene model according to another embodiment of the present invention.
FIG. 5 is a flowchart showing a method for constructing a remote breast cancer metastasis risk prediction gene model according to another embodiment of the present invention.
FIG. 6 is a flowchart showing a method for predicting risk of distant metastasis of breast cancer according to an embodiment of the present invention.
FIG. 7 shows survival curves for high-risk patients and low-risk patients from the date of onset to 10 years.
FIG. 8 is a box plot of gene expression values for each gene based on patients with relapse.
FIG. 9 shows a remote breast cancer metastasis risk prediction gene model according to the present invention.
Detailed Description
In order that the advantages of the invention may be readily understood, a more particular description of the invention is given below with reference to specific embodiments illustrated in the appended drawings. It should be noted that these embodiments are merely representative embodiments of the present invention, and the specific methods, devices, conditions, materials, etc. are not meant to limit the present invention or the corresponding embodiments. The step numbers used in the present invention serve only to distinguish different steps and do not represent the order of the steps; this is stated here at the outset.
Unless defined otherwise, technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art. In addition, singular terms also include plural meanings unless defined otherwise. Generally, the terms used in this specification, and the terms related to molecular biology, protein, oligonucleotide or polynucleotide chemistry and hybridization techniques, are terms that are well known and commonly used in the art. The scientific terms used herein have been used for specific purposes only and are not intended to limit the scope or field of the invention.
The training sample or the sample of the present invention refers to a tumor tissue sample of a breast cancer patient, and the collection method is not limited. The samples of the present invention were obtained as follows: after surgical excision, the breast cancer tumor was fixed and embedded in paraffin as tissue blocks (FFPE tissue). From each block, 9 to 15 blank serial sections with a thickness of 4-10 μm were cut. RNA was extracted from the blank sections using an FFPE RNA extraction reagent (RNeasy FFPE Kit). The extracted RNA was reverse transcribed to synthesize cDNA, and a polymerase chain reaction was performed on the ABI 7500 Fast PCR system, with SYBR Green I fluorescence intensity detected in real time.
The term "distant metastasis" as used herein refers to breast cancer that has spread from a primary tumor to one or more parts, organs, or distant lymph nodes of the body after mastectomy and/or breast-conserving surgery, or to invasive breast cancer that has been confirmed by biopsy or clinically diagnosed as recurrent. The term "invasive breast cancer" refers to a cancer that spreads from the membrane of the lobule or duct of the breast into the breast tissue, after which cancer cells may continue to spread into the axillary or other lymph nodes. When breast cancer cells are found in other parts of the body, the disease is called "metastatic breast cancer".
The term "proportional hazards model" as used herein refers to a survival model in statistics. When survival data also include covariates and risk factors, the model can be used to estimate the effect of these covariates on survival time and to predict the probability of survival over a specified period of time. The Cox proportional hazards model, proposed by Sir David Cox in 1972, is the regression model most commonly used in survival analysis. This approach is often referred to as the Cox model or the proportional hazards model.
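For reference, the Cox proportional hazards model is conventionally written as below. The symbols h_0, x_i and \beta_i are standard textbook notation and are not taken from the source text.

```latex
h(t \mid x_1, \dots, x_p) = h_0(t)\,\exp\!\left(\sum_{i=1}^{p} \beta_i x_i\right)
```

Here h_0(t) is the baseline hazard at time t, x_i are the covariates, and \beta_i are the regression coefficients, so that \exp(\beta_i) is the hazard ratio associated with a one-unit increase in x_i when the other covariates are held fixed.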
Reference in this specification to Asian females means females native to the Asian region or females of Asian ancestry, regardless of where they live. Asian females include, among others, females from Northeast Asia, East Asia and Southeast Asia.
In the present specification, measuring the expression level of a gene in a sample refers to measuring the expression level of the messenger ribonucleic acid (mRNA) transcribed from the gene in the sample, or measuring the expression level of the complementary deoxyribonucleic acid (cDNA) obtained by reverse transcription of that mRNA, or measuring the expression level of the protein translated from the mRNA corresponding to the gene. In particular, the expression value of the cDNA is measured using real-time polymerase chain reaction (qPCR) or reverse transcription polymerase chain reaction (RT-PCR).
Please refer to FIG. 1. FIG. 1 is a flowchart showing a method for constructing a breast cancer distant metastasis risk prediction gene model according to an embodiment of the invention. As shown in FIG. 1, the method for constructing a breast cancer distant metastasis risk prediction gene model in this embodiment includes the following steps: step A1, providing a plurality of training samples, and selectively setting a distant metastasis marker for each training sample according to its clinical prognosis result; step A2, determining the expression values of at least seven genes in each training sample, wherein the at least seven genes are selected from a gene group consisting of BLM, BUB1B, CLCA2, DTX2, ERBB2, ESR1, OBSL1, PIM1, SF3B5, TPX2 and YWHAB, and any gene of the gene group can be replaced by a homologous gene, a variant gene or a derivative gene thereof; step A3, setting weighted scores of the genes through regression analysis, and generating a predicted value for each training sample according to the genes and the weighted scores; and step A4, grouping the training samples according to the distant metastasis markers to generate a first threshold value corresponding to the predicted values of the training samples, wherein the first threshold value is used in the breast cancer distant metastasis risk prediction gene model for comparison with the predicted value of a sample to be tested, to judge whether that sample is a breast cancer distant metastasis low-risk sample. In other words, the breast cancer distant metastasis risk prediction gene model constructed by the method of this embodiment uses the first threshold as the criterion for judging the distant metastasis risk of a sample.
The training sample is a tissue sample taken after mastectomy or breast-conserving surgery from an Asian female breast cancer patient. The clinical prognosis result of the training sample refers to whether breast cancer recurrence was observed during 5 to 10 years of follow-up after surgery. When the patient develops distant metastasis within 5 to 10 years, a distant metastasis marker is set for the training sample; if distant metastasis does not occur, the training sample is left unmarked or is marked as having no distant metastasis. The proportion of distant metastasis markers in the training sample set can be regarded as the actual distant metastasis rate of Asian female breast cancer patients 5 to 10 years after surgery.
Starting from thousands of candidate genes, feature selection and regularized regression analysis were performed simultaneously using the Lasso algorithm (least absolute shrinkage and selection operator) from statistics and machine learning, and, through repeated cross-validation, the 11 genes with the best accuracy in predicting breast cancer recurrence were selected. These 11 genes are defined in this invention as the breast cancer related molecular typing gene group and are considered important prognostic markers, each contributing to the prediction of breast cancer distant metastasis risk and being significantly correlated with the distant metastasis rate. The expression value of any one gene can be used on its own to calculate a corresponding value for judging whether the sample is a breast cancer distant metastasis low-risk sample. The standard identifiers of the 11 genes are shown in Table 1.
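To illustrate how Lasso performs feature selection and regularized regression at the same time, the following is a minimal sketch of Lasso by cyclic coordinate descent on a squared-error loss. The source does not state which loss function, penalty strength or software was used, so the loss, the `alpha` value, the function names and the toy data are all assumptions made for illustration only.

```python
def soft_threshold(value, penalty):
    """Shrink value toward zero by penalty; this is what zeroes out weak features."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0


def lasso_coordinate_descent(rows, targets, alpha, sweeps=100):
    """Minimize (1/2n)*||y - Xw||^2 + alpha*||w||_1 by cyclic coordinate descent."""
    n = len(rows)
    p = len(rows[0])
    weights = [0.0] * p
    for _ in range(sweeps):
        for j in range(p):
            rho = 0.0   # correlation of feature j with the residual excluding j
            norm = 0.0  # mean squared value of feature j
            for row, y in zip(rows, targets):
                fitted_without_j = sum(
                    w * x for k, (w, x) in enumerate(zip(weights, row)) if k != j
                )
                rho += row[j] * (y - fitted_without_j)
                norm += row[j] ** 2
            rho /= n
            norm /= n
            weights[j] = soft_threshold(rho, alpha) / norm if norm > 0 else 0.0
    return weights


# Hypothetical toy data: the outcome depends only on the first feature.
X = [[1.0, 1.0], [-1.0, 1.0], [1.0, -1.0], [-1.0, -1.0]]
y = [2.0, -2.0, 2.0, -2.0]
w = lasso_coordinate_descent(X, y, alpha=0.5)
selected = [j for j, wj in enumerate(w) if wj != 0.0]
print(w, selected)
```

Features whose weight is driven exactly to zero are dropped, which is the sense in which selection and regression happen in one step.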
Table 1. Standard identifiers of the 11 genes
Gene HGNC Ensembl
BLM 1058 ENSG00000197299
BUB1B 1149 ENSG00000156970
CLCA2 2016 ENSG00000137975
ERBB2 3430 ENSG00000141736
TPX2 1249 ENSG00000088325
PIM1 8986 ENSG00000137193
YWHAB 12849 ENSG00000166913
ESR1 3467 ENSG00000091831
OBSL1 29092 ENSG00000124006
DTX2 15973 ENSG00000091073
SF3B5 21083 ENSG00000169976
In practical application, any 7 of the 11 genes may be selected, which gives 330 possible combinations. According to the verification results, even the least accurate of the 330 combinations still has a negative predictive value (NPV) exceeding 80%. If 8 or more of the 11 genes are selected, the NPV improves further. When a plurality of genes are selected, the expression value of each gene is normalized and multiplied by its corresponding weighted score, so that the contribution of each gene is included in the model.
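The count of 330 can be checked directly. The short sketch below enumerates the seven-gene panels and also counts the larger panels; the variable names are illustrative and not from the source.

```python
from itertools import combinations
from math import comb

GENE_GROUP = [
    "BLM", "BUB1B", "CLCA2", "DTX2", "ERBB2", "ESR1",
    "OBSL1", "PIM1", "SF3B5", "TPX2", "YWHAB",
]

# Every way of choosing 7 of the 11 genes (order does not matter).
seven_gene_panels = list(combinations(GENE_GROUP, 7))
print(len(seven_gene_panels))

# Number of possible panels for each panel size from 7 to 11 genes.
panel_counts = {size: comb(len(GENE_GROUP), size) for size in range(7, 12)}
print(panel_counts)
```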
When the predicted value of a sample is smaller than the first threshold obtained in step A4, the sample is determined to be a breast cancer distant metastasis low-risk sample, meaning that the source patient of the sample has a low probability of developing distant metastasis in the future. In another embodiment, a second threshold may also be set when the training samples are grouped by distant metastasis marker. When the predicted value of a sample is greater than the second threshold, the sample is determined to be a breast cancer distant metastasis high-risk sample, meaning that the source patient of the sample has a high probability of developing distant metastasis in the future. The second threshold is greater than or equal to the first threshold. It should be noted that, when a sample is determined to be a high-risk sample, the source patient usually begins more aggressive treatment. As a result, when such patients are followed up after 5 or 10 years, the observed occurrence of distant metastasis does not show ideal accuracy for the high-risk prediction. In practice, therefore, the positive predictive value (PPV) is usually used only as an auxiliary reference.
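The two-threshold rule described above can be sketched as follows. The function name, the label for samples that fall between the two thresholds, and the second threshold value of 60.0 are hypothetical; the source only requires that the second threshold be greater than or equal to the first, and gives 29.0 as an example first threshold elsewhere in the description.

```python
def classify_sample(predicted_value, first_threshold, second_threshold):
    """Apply the low-risk and high-risk cut-offs to one predicted value."""
    if second_threshold < first_threshold:
        raise ValueError("second threshold must be >= first threshold")
    if predicted_value < first_threshold:
        return "low risk"
    if predicted_value > second_threshold:
        return "high risk"
    # Label chosen for illustration; the source does not name this group.
    return "indeterminate"


FIRST_THRESHOLD = 29.0   # example value given in the description
SECOND_THRESHOLD = 60.0  # hypothetical value for illustration only

for value in (12.5, 45.0, 71.0):
    print(value, classify_sample(value, FIRST_THRESHOLD, SECOND_THRESHOLD))
```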
Thus, the method is suitable for predicting the likelihood of distant metastasis within 5 years and within 10 years after a breast cancer patient has undergone mastectomy or breast-conserving surgery.
In order to normalize the gene expression values, one or more housekeeping genes (housekeeping gene) may be additionally selected as endogenous reference genes, such as ACTB, RPLP0 and TFRC. The original gene expression value can be calculated as a normalized gene expression value by the housekeeping gene.
In one embodiment, the step A3 of setting the weighted scores of the at least seven genes may further comprise the steps of: setting a weighted score for each gene in the gene group; and calculating the sum of the normalized expression value of each gene in the gene group multiplied by its weighted score to obtain a predicted value. Gene expression values were normalized using the housekeeping genes ACTB, RPLP0 and TFRC as reference genes, as follows:
Average housekeeping gene expression value = (ACTB + RPLP0 + TFRC) / 3
Normalized expression value = 25 - target gene expression value + average housekeeping gene expression value.
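The normalization above can be sketched as a small function. The raw readings used here are invented for illustration, and treating them as qPCR cycle-threshold-like values is an assumption; the source only gives the two formulas.

```python
HOUSEKEEPING_GENES = ("ACTB", "RPLP0", "TFRC")


def normalize_expression(raw_values, offset=25.0):
    """Return offset - target + mean(housekeeping) for every non-housekeeping gene."""
    housekeeping_mean = sum(raw_values[g] for g in HOUSEKEEPING_GENES) / len(
        HOUSEKEEPING_GENES
    )
    return {
        gene: offset - value + housekeeping_mean
        for gene, value in raw_values.items()
        if gene not in HOUSEKEEPING_GENES
    }


# Hypothetical raw readings for one sample (not data from the source).
raw = {"ACTB": 18.0, "RPLP0": 20.0, "TFRC": 25.0, "ESR1": 24.0, "BUB1B": 30.0}
normalized = normalize_expression(raw)
print(normalized)
```

With these made-up readings the housekeeping mean is 21.0, so ESR1 normalizes to 25 - 24 + 21 = 22.0 and BUB1B to 25 - 30 + 21 = 16.0.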
Measuring the expression values of genes other than the gene group and the housekeeping genes, and using them in the subsequent calculation, mostly does not contribute to accuracy. For example, additionally determining the expression values of the C16ORF7, CCNB1, ENSA, MMP15, NFATC2IP, TCF3 and TRPV6 genes and including them in the calculation did not increase the accuracy of predicting breast cancer recurrence risk in Asian females.
Substituting the normalized expression value into the algorithm to obtain a predicted value, and converting the predicted value into a fraction scale of 0 to 100 so as to facilitate interpretation and subsequent risk estimation of the result.
Please refer to FIG. 2. FIG. 2 is a flowchart showing a method for constructing a breast cancer distant metastasis risk prediction gene model according to another embodiment of the present invention. As shown in FIG. 2, the step A3 of setting weighted scores of at least seven genes further includes the steps of: step A31, setting a weighted score of each gene in the gene group through regression analysis to generate a calculation formula; and step A32, substituting the normalized expression value of each gene in the gene group into the calculation formula to obtain a gene expression value, wherein the gene expression value is used for calculating the predicted value of each training sample. The calculation formula obtained in step A31 is as follows: gene expression value = 0.445623356 × BUB1B + 0.224466639 × BLM + 0.002146431 × CLCA2 + 0.00044734 × ERBB2 + 0.000307024 × TPX2 + 0.000290927 × PIM1 + 0.0000278046 × YWHAB - 0.167402736 × ESR1 - 0.080103018 × OBSL1 - 0.006563626 × DTX2 - 0.000295266 × SF3B5. The weighted scores and the calculation formula were obtained through repeated validation on the earlier training samples and validation samples. If the expression value of a gene is not measured, its expression value is taken as 0 in the calculation. Other steps of the construction method of the present embodiment are substantially the same as the corresponding steps of the foregoing embodiment, and thus are not repeated herein.
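As a sketch of how the formula from step A31 might be evaluated, the function below computes the weighted sum and, following the text, treats any unmeasured gene as 0. The coefficients are those stated above; the function name and the input values are illustrative assumptions.

```python
# Weighted scores from the calculation formula of step A31.
WEIGHTS = {
    "BUB1B": 0.445623356,
    "BLM": 0.224466639,
    "CLCA2": 0.002146431,
    "ERBB2": 0.00044734,
    "TPX2": 0.000307024,
    "PIM1": 0.000290927,
    "YWHAB": 0.0000278046,
    "ESR1": -0.167402736,
    "OBSL1": -0.080103018,
    "DTX2": -0.006563626,
    "SF3B5": -0.000295266,
}


def gene_expression_value(normalized_values):
    """Weighted sum of normalized expression values; unmeasured genes count as 0."""
    return sum(
        weight * normalized_values.get(gene, 0.0)
        for gene, weight in WEIGHTS.items()
    )


# Hypothetical normalized values for a sample where only two genes were measured.
example = {"BUB1B": 1.0, "ESR1": 1.0}
print(gene_expression_value(example))
```

Because unmeasured genes contribute 0, the same function serves panels of seven to eleven genes without modification.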
In one embodiment, the calculation formula may also be: gene expression value = (0.4-0.5) × BUB1B + (0.20-0.24) × BLM + (0.001-0.003) × CLCA2 + (0.0001-0.0010) × ERBB2 + (0.0002-0.0004) × TPX2 + (0.0001-0.0005) × PIM1 + (0.00001-0.00005) × YWHAB - (0.1-0.2) × ESR1 - (0.05-0.10) × OBSL1 - (0.003-0.010) × DTX2 - (0.0001-0.0005) × SF3B5.
The weighting score may be adjusted to some extent as the number of training samples and validation samples increases. For the current validation results, the weighted score should not be outside the range of brackets. That is, the weighting score actually used may be any real number within a range between brackets.
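A simple way to enforce this constraint is to check a candidate set of weighted scores against the bracketed ranges. In the sketch below the ranges are stored as signed intervals (negative for the genes that are subtracted in the formula); this representation and the function name are assumptions made for illustration.

```python
# Signed (low, high) intervals taken from the bracketed ranges of the formula.
WEIGHT_RANGES = {
    "BUB1B": (0.4, 0.5),
    "BLM": (0.20, 0.24),
    "CLCA2": (0.001, 0.003),
    "ERBB2": (0.0001, 0.0010),
    "TPX2": (0.0002, 0.0004),
    "PIM1": (0.0001, 0.0005),
    "YWHAB": (0.00001, 0.00005),
    "ESR1": (-0.2, -0.1),
    "OBSL1": (-0.10, -0.05),
    "DTX2": (-0.010, -0.003),
    "SF3B5": (-0.0005, -0.0001),
}


def genes_out_of_range(weights):
    """Return the genes whose supplied weight falls outside its allowed interval."""
    return [
        gene
        for gene, weight in weights.items()
        if not (WEIGHT_RANGES[gene][0] <= weight <= WEIGHT_RANGES[gene][1])
    ]


# The specific coefficients given in the description fall inside every range.
stated = {
    "BUB1B": 0.445623356, "BLM": 0.224466639, "CLCA2": 0.002146431,
    "ERBB2": 0.00044734, "TPX2": 0.000307024, "PIM1": 0.000290927,
    "YWHAB": 0.0000278046, "ESR1": -0.167402736, "OBSL1": -0.080103018,
    "DTX2": -0.006563626, "SF3B5": -0.000295266,
}
print(genes_out_of_range(stated))
```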
The higher the predicted value obtained by using the above calculation formula, the higher the risk of distant metastasis. Based on selecting different models (e.g., a distant metastasis prediction model, a comprehensive recurrence prediction model, a five-year prediction model, or a 10-year prediction model), different scoring algorithms may be selected for operation.
Please refer to FIG. 3. FIG. 3 is a flowchart showing further steps of the method for constructing the breast cancer distant metastasis risk prediction gene model of FIG. 2. As shown in FIG. 3, the step A4 of grouping the training samples further includes the following steps: step A41, introducing a Cox proportional hazards model, grouping the training samples according to the distant metastasis markers, and establishing the association between the predicted value and the percentage risk of distant metastasis within five years, so as to generate a first threshold corresponding to the predicted values; and step A42, comparing the predicted value of a sample with the first threshold, wherein, when the predicted value of the sample is less than the first threshold, the recurrence rate of the sample is less than 4% and the sample is determined to be a breast cancer distant metastasis low-risk sample. For example, when the predicted value of the sample is less than 29.0, the sample is determined to be a breast cancer distant metastasis low-risk sample. This completes the construction of the breast cancer distant metastasis risk prediction gene model.
The first threshold corresponding to a recurrence rate of less than 4% can be found accurately by comparing the predicted values with the distant metastasis markers through ROC (receiver operating characteristic) curve analysis and optimizing the negative predictive value (NPV) against the actual distant metastasis rate.
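One simple way to picture the threshold search is a scan over candidate cut-offs that keeps the highest one whose low-risk group still has a recurrence rate below 4%, that is, an NPV above 96%. This is a simplified stand-in rather than the ROC procedure of the source, and the function names and toy data are hypothetical.

```python
def negative_predictive_value(scores, metastasis_flags, threshold):
    """Share of samples scoring below threshold that did not metastasize."""
    below = [flag for score, flag in zip(scores, metastasis_flags) if score < threshold]
    if not below:
        return None  # no sample is called low risk at this cut-off
    return below.count(False) / len(below)


def highest_threshold_meeting_npv(scores, metastasis_flags, min_npv=0.96):
    """Scan observed scores as cut-offs; keep the highest one with NPV > min_npv."""
    best = None
    for candidate in sorted(set(scores)):
        npv = negative_predictive_value(scores, metastasis_flags, candidate)
        if npv is not None and npv > min_npv:
            best = candidate
    return best


# Hypothetical training data: predicted values and distant metastasis markers.
scores = [10, 15, 20, 25, 30, 40, 50, 60]
flags = [False, False, False, False, False, True, False, True]
threshold = highest_threshold_meeting_npv(scores, flags)
print(threshold)
```

In this toy set every sample scoring below 40 is free of distant metastasis, while raising the cut-off to 50 admits one metastatic sample and drops the NPV to 5/6.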
Please refer to FIG. 4. FIG. 4 is a flowchart showing a method for constructing a breast cancer distant metastasis risk prediction gene model according to another embodiment of the present invention. As shown in FIG. 4, the present embodiment differs from the previous embodiment in that the step A3 of setting weighted scores of at least seven genes further comprises the steps of: step A33, setting weighted scores of the at least seven genes through regression analysis; step A34, calculating a clinical observation value according to a clinical factor of the source of the training sample; and step A35, generating a predicted value for each training sample according to the weighted scores of the at least seven genes and the clinical observation value. Clinical factors include, among others, age at the time of initial diagnosis, primary tumor size, number of lymph node metastases, lymphatic vessel invasion status, tumor classification, and estrogen receptor status.
Please refer to fig. 5. FIG. 5 is a flowchart showing a method for constructing a breast cancer distant metastasis risk prediction gene model according to another embodiment of the invention. As shown in fig. 5, the method for constructing a breast cancer distant metastasis risk prediction gene model according to the present embodiment includes the following steps: step B1, providing a plurality of training samples, and selectively setting a far-end transfer mark for the training samples according to the clinical prognosis result of the training samples; step B2, determining the expression values of a plurality of genes in each training sample, wherein the plurality of genes comprise BLM, BUB1B, CLCA, DTX2, ERBB2, ESR1, OBSL1, PIM1, SF3B5, TPX2 and YWHAB, and any one of the genes can be replaced by a homologous gene, a variant gene or a derivative gene thereof; step B3, setting weighted scores of a plurality of genes through regression analysis, and respectively generating a predicted value of each training sample according to the plurality of genes and the weighted scores; and step B4, grouping the training samples according to the remote metastasis markers to generate a first threshold value corresponding to each predicted value, wherein the first threshold value is used in the breast cancer remote metastasis risk prediction gene model to compare with a predicted value of a sample of the sample to judge whether the sample is a breast cancer remote metastasis low risk sample. And when the predicted value of the sample is smaller than the first threshold value, judging that the sample is a breast cancer remote metastasis low-risk sample, and completing construction of a breast cancer remote metastasis risk prediction gene model.
In step B3 of this embodiment, the method may further comprise the steps of: setting a weighted score for each of the plurality of genes; calculating the sum of the normalized expression values of each gene multiplied by the weighted score; calculating a clinical observation value according to a clinical factor of the source of the training sample; and calculating a predicted value according to the weighted score of the gene and the clinical observed value. Clinical factors include, among others, age at the time of initial diagnosis, primary tumor size, number of lymph node metastases, lymphatic vessel invasion status, tumor classification, and estrogen receptor status.
This embodiment differs from the foregoing embodiment in that all 11 genes are selected to construct the breast cancer distant metastasis risk prediction gene model, and the negative predictive value (NPV) reaches 94% when the clinical observation value is incorporated into the calculation.
Step B3 of this embodiment may further comprise the following steps: setting a weighted score for each gene in the gene group to generate a calculation formula; calculating a clinical observation value according to clinical factors of the source of the training sample; substituting the normalized expression value of each gene in the gene group into the calculation formula to obtain a gene expression value; and combining the clinical observation value and the gene expression value to obtain a predicted value. The clinical factors include age at initial diagnosis, primary tumor size, number of lymph node metastases, lymphatic vessel invasion status, tumor classification, and estrogen receptor status. The calculation formula is as follows: gene expression value = 0.445623356 × BUB1B + 0.224466639 × BLM + 0.002146431 × CLCA2 + 0.00044734 × ERBB2 + 0.000307024 × TPX2 + 0.000290927 × PIM1 + 0.0000278046 × YWHAB - 0.167402736 × ESR1 - 0.080103018 × OBSL1 - 0.006563626 × DTX2 - 0.000295266 × SF3B5.
In this embodiment, the predicted value is calculated as: predicted value = 2.969 × gene expression value + 1.617 × clinical observation value. The predicted values obtained in this way fall on a score scale of 0-100. After evaluation, the first threshold may be set at 29.0. When the predicted value is lower than 29.0, the sample is a breast cancer distant metastasis low-risk sample, and the sample source (an Asian female breast cancer patient) is assigned to the breast cancer distant metastasis low-risk group. If the predicted value is higher than 29.0, the sample is a breast cancer distant metastasis high-risk sample, and the sample source (an Asian female breast cancer patient) is assigned to the breast cancer distant metastasis high-risk group.
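As an illustration, the calculation of this embodiment can be sketched as follows. The coefficients and the threshold of 29.0 are taken from the text; the function names, the dictionary layout, and the handling of a score exactly equal to the threshold are assumptions made for the sketch.

```python
# Weighted scores of the 11 genes, copied from the calculation formula above.
GENE_WEIGHTS = {
    "BUB1B": 0.445623356,
    "BLM": 0.224466639,
    "CLCA2": 0.002146431,
    "ERBB2": 0.00044734,
    "TPX2": 0.000307024,
    "PIM1": 0.000290927,
    "YWHAB": 0.0000278046,
    "ESR1": -0.167402736,
    "OBSL1": -0.080103018,
    "DTX2": -0.006563626,
    "SF3B5": -0.000295266,
}
FIRST_THRESHOLD = 29.0


def gene_expression_value(normalized_expression):
    """Weighted sum of the normalized expression values of the 11 genes."""
    missing = set(GENE_WEIGHTS) - set(normalized_expression)
    if missing:
        raise KeyError(f"missing genes: {sorted(missing)}")
    return sum(w * normalized_expression[g] for g, w in GENE_WEIGHTS.items())


def predicted_value(gene_value, clinical_observation):
    """Combine the gene expression value with the clinical observation value."""
    return 2.969 * gene_value + 1.617 * clinical_observation


def risk_group(score, threshold=FIRST_THRESHOLD):
    # Assumption: a score exactly equal to the threshold is treated as high risk.
    return "low" if score < threshold else "high"
```

A caller would pass a mapping from gene symbol to normalized expression value, together with a clinical observation value computed separately; how that clinical value is derived from the clinical factors is not specified here and is left outside the sketch.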
Please refer to FIG. 6. FIG. 6 is a flowchart showing a method for predicting the risk of distant metastasis of breast cancer according to an embodiment of the invention. As shown in FIG. 6, the method of this embodiment includes the following steps: step C1, providing a sample; step C2, determining the expression values of at least seven genes in the sample, wherein the at least seven genes are selected from any combination of a gene group consisting of BLM, BUB1B, CLCA2, DTX2, ERBB2, ESR1, OBSL1, PIM1, SF3B5, TPX2 and YWHAB, and any gene of the gene group can be replaced by a homologous gene, a variant gene or a derivative gene thereof; and step C3, comparing the expression values of the at least seven genes in the sample with the expression values of the at least seven genes in a non-breast-cancer-distant-metastasis sample, and judging the sample to be a breast cancer distant metastasis low-risk sample when the expression values of the at least seven genes in the sample are higher than those in the non-breast-cancer-distant-metastasis sample.
According to another embodiment, the risk prediction method based on the breast cancer distant metastasis risk prediction gene model may comprise the following steps: providing a test sample; determining the expression values of at least seven genes in the sample, wherein the at least seven genes are selected from any combination of a gene group consisting of BLM, BUB1B, CLCA2, DTX2, ERBB2, ESR1, OBSL1, PIM1, SF3B5, TPX2, and YWHAB, and any gene of the gene group can be replaced by a homologous gene, a variant gene, or a derivative gene thereof; and substituting the normalized expression values of the at least seven genes into a calculation formula to obtain a predicted value, wherein the predicted value represents the risk of breast cancer distant metastasis for the sample.
According to another embodiment, the risk prediction method based on the breast cancer distant metastasis risk prediction gene model may comprise the following steps: providing a test sample; determining the expression values of a plurality of genes in the sample, wherein the plurality of genes comprises BLM, BUB1B, CLCA2, DTX2, ERBB2, ESR1, OBSL1, PIM1, SF3B5, TPX2 and YWHAB, and any one of the genes can be replaced by a homologous gene, a variant gene or a derivative gene thereof; and substituting the normalized expression value of each gene into a calculation formula to obtain a predicted value, wherein the predicted value represents the risk of breast cancer distant metastasis for the sample.
In the risk prediction method, the step of substituting each gene into the calculation formula further comprises the following steps: substituting the normalized expression value of each gene into the calculation formula, which is: gene expression value = 0.445623356 × BUB1B + 0.224466639 × BLM + 0.002146431 × CLCA2 + 0.00044734 × ERBB2 + 0.000307024 × TPX2 + 0.000290927 × PIM1 + 0.0000278046 × YWHAB - 0.167402736 × ESR1 - 0.080103018 × OBSL1 - 0.006563626 × DTX2 - 0.000295266 × SF3B5; calculating a clinical observation value based on clinical factors of the source of the sample, the clinical factors including age at first diagnosis, primary tumor size, number of lymph node metastases, lymphatic vessel invasion status, tumor classification, and estrogen receptor status; combining the clinical observation value and the gene expression value to obtain a predicted value; and setting a first threshold, wherein when the predicted value of the sample is smaller than the first threshold, the sample is determined to be a breast cancer distant metastasis low-risk sample whose recurrence rate is less than 4%.
An advantage of the breast cancer distal metastasis risk prediction method of the present invention is that any number of the 11 genes mentioned above can be used to assess the likelihood of distal metastasis in breast cancer patients after mastectomy and/or breast preservation. Even a single gene is predictive. The combination of the 11 genes provides a better predictive power. In a preferred embodiment, all 11 genes are selected for calculation and prediction, and the prediction accuracy is higher. Yet another advantage is that the type of adjuvant therapy can be determined by medical personnel and breast cancer patients after mastectomy or breast preservation based on the calculated estimated likelihood of distant metastasis.
In an embodiment, the present invention further provides a specific protein binding molecule, nucleic acid probe or nucleic acid primer for measuring a prognostic marker of a breast cancer distant metastasis risk prediction gene model, wherein the prognostic marker is at least seven genes in a gene group comprising: BLM, BUB1B, CLCA2, DTX2, ERBB2, ESR1, OBSL1, PIM1, SF3B5, TPX2, and YWHAB.
The gene sequences in the gene groups are public information, and the specific protein binding molecules, nucleic acid probes or nucleic acid primers corresponding to these genes can be synthesized by known biological methods, which are not described in detail herein.
The specific protein binding molecule, nucleic acid probe or nucleic acid primer is further used for measuring the expression values of the at least seven genes in a sample, and the expression values of the at least seven genes are normalized and then substituted into a calculation formula to obtain a gene expression value, wherein the gene expression value is used for judging whether the sample is a breast cancer distant metastasis low-risk sample, and the calculation formula is as follows: gene expression value = (0.4-0.5) × BUB1B + (0.20-0.24) × BLM + (0.001-0.003) × CLCA2 + (0.0001-0.0010) × ERBB2 + (0.0002-0.0004) × TPX2 + (0.0001-0.0005) × PIM1 + (0.00001-0.00005) × YWHAB - (0.1-0.2) × ESR1 - (0.05-0.10) × OBSL1 - (0.003-0.010) × DTX2 - (0.0001-0.0005) × SF3B5, where each parenthesized expression is an optional range and any real number within the range can be used as the weighted score.
The invention also provides a kit for measuring gene expression values of prognostic markers of breast cancer distant metastasis risk, wherein the prognostic markers are at least seven genes in a gene group comprising: BLM, BUB1B, CLCA2, DTX2, ERBB2, ESR1, OBSL1, PIM1, SF3B5, TPX2, and YWHAB.
The kit further comprises a plurality of specific protein binding molecules, nucleic acid probes or nucleic acid primers for measuring the expression values of the at least seven genes in a sample, wherein the expression values of the at least seven genes are normalized and then substituted into a calculation formula to obtain a gene expression value, the gene expression value is used for judging whether the sample is a breast cancer distant metastasis low-risk sample, and the calculation formula is as follows: gene expression value = (0.4-0.5) × BUB1B + (0.20-0.24) × BLM + (0.001-0.003) × CLCA2 + (0.0001-0.0010) × ERBB2 + (0.0002-0.0004) × TPX2 + (0.0001-0.0005) × PIM1 + (0.00001-0.00005) × YWHAB - (0.1-0.2) × ESR1 - (0.05-0.10) × OBSL1 - (0.003-0.010) × DTX2 - (0.0001-0.0005) × SF3B5, where each parenthesized expression is an optional range and any real number within the range can be used as the weighted score.
Those skilled in the art can easily understand the elements included in the kit for determining the specific gene expression value, and the details are not repeated here.
The invention also provides a breast cancer distant metastasis risk assessment kit for measuring gene expression values of prognostic markers of breast cancer distant metastasis risk, wherein the prognostic markers are at least seven genes in a gene group, and the gene group comprises: BLM, BUB1B, CLCA2, DTX2, ERBB2, ESR1, OBSL1, PIM1, SF3B5, TPX2, and YWHAB.
The kit further comprises: a plurality of specific protein binding molecules, nucleic acid probes or nucleic acid primers for measuring the expression values of the at least seven genes in a sample of the subject; and a breast cancer distant metastasis risk prediction gene model, which further comprises a calculation formula into which the normalized expression values of the at least seven genes are substituted to obtain a gene expression value, wherein the gene expression value is used for judging whether the sample is a breast cancer distant metastasis low-risk sample, and the calculation formula is as follows: gene expression value = (0.4-0.5) × BUB1B + (0.20-0.24) × BLM + (0.001-0.003) × CLCA2 + (0.0001-0.0010) × ERBB2 + (0.0002-0.0004) × TPX2 + (0.0001-0.0005) × PIM1 + (0.00001-0.00005) × YWHAB - (0.1-0.2) × ESR1 - (0.05-0.10) × OBSL1 - (0.003-0.010) × DTX2 - (0.0001-0.0005) × SF3B5, where each parenthesized expression is an optional range and any real number within the range can be used as the weighted score.
The invention also provides a use of a specific protein binding molecule, a nucleic acid probe or a nucleic acid primer for measuring the gene expression value of a prognostic marker of breast cancer distant metastasis risk, wherein the prognostic marker is at least seven genes in a gene group, and the gene group comprises: BLM, BUB1B, CLCA2, DTX2, ERBB2, ESR1, OBSL1, PIM1, SF3B5, TPX2, and YWHAB.
The specific protein binding molecule, nucleic acid probe or nucleic acid primer is used for measuring the expression values of the at least seven genes in a sample, and the expression values of the at least seven genes are normalized and then substituted into a calculation formula to obtain a gene expression value, wherein the gene expression value is used for judging whether the sample is a breast cancer distant metastasis low-risk sample, and the calculation formula is as follows: gene expression value = (0.4-0.5) × BUB1B + (0.20-0.24) × BLM + (0.001-0.003) × CLCA2 + (0.0001-0.0010) × ERBB2 + (0.0002-0.0004) × TPX2 + (0.0001-0.0005) × PIM1 + (0.00001-0.00005) × YWHAB - (0.1-0.2) × ESR1 - (0.05-0.10) × OBSL1 - (0.003-0.010) × DTX2 - (0.0001-0.0005) × SF3B5, where each parenthesized expression is an optional range and any real number within the range can be used as the weighted score.
The invention also provides a molecular typing gene group for evaluating breast cancer distant metastasis risk, which consists of the following genes: BLM, BUB1B, CLCA2, DTX2, ERBB2, ESR1, OBSL1, PIM1, SF3B5, TPX2, and YWHAB.
The expression values of the gene group are normalized and then substituted into a calculation formula to obtain a gene expression value, the gene expression value is used for judging whether the sample is a breast cancer distant metastasis low-risk sample, and the calculation formula is as follows: gene expression value = (0.4-0.5) × BUB1B + (0.20-0.24) × BLM + (0.001-0.003) × CLCA2 + (0.0001-0.0010) × ERBB2 + (0.0002-0.0004) × TPX2 + (0.0001-0.0005) × PIM1 + (0.00001-0.00005) × YWHAB - (0.1-0.2) × ESR1 - (0.05-0.10) × OBSL1 - (0.003-0.010) × DTX2 - (0.0001-0.0005) × SF3B5, where each parenthesized expression is an optional range and any real number within the range can be used as the weighted score.
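The ranged form of the calculation formula can be read as a constraint on each weighted score. The following minimal sketch, with hypothetical names, checks a set of weight magnitudes against the stated ranges and applies the signs of the formula; the specific weights used as an example are those of the 11-gene embodiment described earlier.

```python
# Permitted magnitude of each weighted score, from the ranged formula.
WEIGHT_RANGES = {
    "BUB1B": (0.4, 0.5), "BLM": (0.20, 0.24), "CLCA2": (0.001, 0.003),
    "ERBB2": (0.0001, 0.0010), "TPX2": (0.0002, 0.0004),
    "PIM1": (0.0001, 0.0005), "YWHAB": (0.00001, 0.00005),
    "ESR1": (0.1, 0.2), "OBSL1": (0.05, 0.10),
    "DTX2": (0.003, 0.010), "SF3B5": (0.0001, 0.0005),
}
# Genes whose terms are subtracted in the formula.
NEGATIVE_GENES = {"ESR1", "OBSL1", "DTX2", "SF3B5"}

# Magnitudes of the specific weights given in the 11-gene embodiment.
EMBODIMENT_WEIGHTS = {
    "BUB1B": 0.445623356, "BLM": 0.224466639, "CLCA2": 0.002146431,
    "ERBB2": 0.00044734, "TPX2": 0.000307024, "PIM1": 0.000290927,
    "YWHAB": 0.0000278046, "ESR1": 0.167402736, "OBSL1": 0.080103018,
    "DTX2": 0.006563626, "SF3B5": 0.000295266,
}


def out_of_range(magnitudes):
    """Return the sorted gene symbols whose weight lies outside its range."""
    return sorted(
        gene for gene, (low, high) in WEIGHT_RANGES.items()
        if not low <= magnitudes[gene] <= high
    )


def gene_expression_value(magnitudes, normalized_expression):
    """Signed weighted sum: negative genes are subtracted, the rest added."""
    total = 0.0
    for gene, weight in magnitudes.items():
        sign = -1.0 if gene in NEGATIVE_GENES else 1.0
        total += sign * weight * normalized_expression[gene]
    return total
```

Under this reading, the specific weights of the embodiment all fall inside the stated ranges, so that embodiment is one instance of the ranged formula.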
The following description is provided to illustrate the technical means, processes and effects of the present invention. The method for measuring gene expression values in the following examples is to quantify genes in a sample using a kit, a specific protein binding molecule, a nucleic acid probe or a nucleic acid primer corresponding to 11 genes.
The following examples all use the expression values of the 11 genes as predictors and apply logistic regression to predict breast cancer recurrence. The best-fit logistic regression model is selected through model training, which yields the optimal values of the parameters that control the model. The study trained the model using a supervised learning method from machine learning. For example, 50% of the total samples were used as training samples: the model took the input vector x (the expression values of the 11 genes) as predictor variables and produced a prediction y (recurrence or no recurrence) to determine high or low risk for each patient; the predicted value (predicted high risk or predicted low risk) was then compared with the observed status (high risk or low risk), and the model parameters were adjusted according to the comparison result and the specific learning algorithm.
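The training loop described above can be sketched with a small logistic regression fitted by gradient descent. This is a minimal illustration only: the two-feature toy data stand in for the 11 gene expression values, and the learning rate, epoch count, and function names are assumptions rather than details of the study.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(X, y, lr=0.1, epochs=2000):
    """Fit weights w and bias b by batch gradient descent on the log-loss."""
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # Difference between the predicted probability and the observed state.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        # Adjust the parameters according to the comparison result.
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b


def predict(w, b, x):
    """Return 1 (predicted recurrence) or 0 (predicted no recurrence)."""
    z = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1 if sigmoid(z) >= 0.5 else 0


# Hypothetical toy data: half of the samples for training, half for testing.
train_X = [[0.1, 0.9], [0.2, 0.8], [0.9, 0.1], [0.8, 0.2]]
train_y = [0, 0, 1, 1]
test_X = [[0.0, 1.0], [0.3, 0.7], [1.0, 0.0], [0.7, 0.3]]
test_y = [0, 0, 1, 1]

w, b = train_logistic(train_X, train_y)
test_predictions = [predict(w, b, x) for x in test_X]
```

Holding out half of the samples, as in the text, keeps the evaluation separate from the data used to adjust the parameters.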
Example 1
In this embodiment, data from a total of 422 patients were obtained from Gene Expression Omnibus (GEO) datasets. The first dataset, GSE20685 [19], contained 312 randomly selected gene expression profiles from Asian patients diagnosed with breast cancer and treated at the hospital cancer center (KFSYSCC) from 1991 to 2004, plus data from 15 additional lobular breast cancer samples. The second dataset, GSE45255 [20], consisted of 1,954 breast tumor data with corresponding clinicopathological data, from which 95 Asian samples were randomly selected. The sample characteristics were: (1) invasive breast cancer, (2) clinical stages T1-T4, (3) lymph node status L0-L3, (4) primary mastectomy/breast-conserving treatment.
Follow-up tracking data: of a total of 422 patients, 197 entered follow-up. Data from 197 patients were examined to determine recurrence and survival analysis patterns over the 5-and 10-year follow-up period.
After the model is trained, the model is tested to determine the execution accuracy of the prediction model in practice. The remaining 50% (another 211) of the total samples were used as test datasets to make unbiased evaluations (unbiased evaluation) of the final model appropriate for the training dataset.
Clinical manifestations can be used to determine the clinical accuracy of this model through indices such as sensitivity (sensitivity), specificity (specificity), positive predictive value (positive predictive value, PPV), and negative predictive value (negative Predictive value, NPV). Sensitivity refers to the proportion of patients with recurrence or metastasis that are correctly predicted to be at high risk: true positive/(true positive + false negative). Specificity refers to the proportion of patients with no recurrence or metastasis that are correctly predicted to be at low risk: true negative/(true negative + false positive). Positive predictive value is the proportion of subjects with predicted high risk that do relapse or metastasis; negative predictive value is the proportion of predicted subjects with low risk that do not relapse or metastasize.
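The four indices defined above follow directly from the counts of a two-by-two classification table. A minimal sketch, with a hypothetical function name and hypothetical counts that are not taken from the study:

```python
def performance(tp, fp, tn, fn):
    """Compute the four indices from confusion-matrix counts.

    tp -- relapsed patients predicted high risk (true positives)
    fp -- non-relapsed patients predicted high risk (false positives)
    tn -- non-relapsed patients predicted low risk (true negatives)
    fn -- relapsed patients predicted low risk (false negatives)
    """
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


# Hypothetical counts, for illustration only.
example = performance(tp=8, fp=30, tn=60, fn=2)
```

With a low recurrence prevalence, as in these hypothetical counts, the NPV can be high even when the PPV is modest, which is why the NPV is the index emphasized for identifying low-risk patients.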
Model verification and testing: once the model parameters have been estimated in the previous step, the best-fit model is applied to all samples in the study, and a leave-one-out cross-validation (LOOCV) procedure is performed to check accuracy. LOOCV provides an estimate of generalization performance with little bias: the model is trained on n-1 samples and evaluated on the remaining sample. This process is repeated for all n possible subsets of n-1 samples, and the accuracy is then calculated to determine model performance.
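The LOOCV procedure can be sketched as follows. To keep the example short, a simple midpoint-threshold classifier on a one-dimensional score stands in for the logistic regression model of the study; the classifier, the function names, and the toy data are all assumptions.

```python
def fit_midpoint(scores, labels):
    """Stand-in model: cut-off halfway between the two class means."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2


def loocv_accuracy(scores, labels):
    """Train on n-1 samples, test on the held-out one, repeat n times."""
    n = len(scores)
    correct = 0
    for i in range(n):
        train_scores = scores[:i] + scores[i + 1:]
        train_labels = labels[:i] + labels[i + 1:]
        cutoff = fit_midpoint(train_scores, train_labels)
        prediction = 1 if scores[i] >= cutoff else 0
        correct += prediction == labels[i]
    return correct / n


# Hypothetical toy data; each class needs at least two samples so that
# every training fold still contains both classes.
toy_scores = [10, 12, 15, 18, 30, 33, 35, 40]
toy_labels = [0, 0, 0, 0, 1, 1, 1, 1]
toy_accuracy = loocv_accuracy(toy_scores, toy_labels)
```

Each sample is predicted by a model that never saw it during fitting, which is what makes the resulting accuracy an estimate of generalization performance.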
Survival analysis: the prognostic significance of age at diagnosis, T stage, and N stage was assessed using a Cox proportional hazards regression model. Overall survival was estimated, and statistically significant differences in survival between the indicated groups were determined using the log-rank test. Comparative analyses between groups were performed using the chi-square test and the t-test. Statistical significance was defined as p < 0.05. For both the 5-year and 10-year follow-up data, univariate and multivariate Cox proportional hazards analyses included age at diagnosis, T stage, N stage, and gene expression values, yielding hazard ratios (HRs) with 95% confidence intervals and p-values.
Finally, subgroup analyses were performed on T-stage T1-T2 and N-stage N0-N1 tumors, respectively, using Cox proportional hazards tests, to assess whether the model had a significant effect in predicting patient survival within 10 years from surgery or diagnosis.
Patients after breast cancer treatment surgery in this example were classified according to biological characteristics, such as diagnosis age, stage T (stage of tumor itself), stage N (stage of metastasis to lymph node), and recurrence, and are summarized in Table 2 below.
TABLE 2 statistics of total samples diagnosed with breast cancer
To further determine the recurrence rate and survival rate of the patients, a further 5-year and 10-year follow-up study was performed on 197 of the 422 patients. Table 3 shows demographic details of the followed-up patient samples, including age at diagnosis, tumor stage, N stage, and recurrence status.
Table 3, demographic table of predictive model classification for 5 year and 10 year follow-up
In this example, 19 were predicted to be at high risk of relapse, with an average age of 49 years, with 5 (29.4%) relapsing within 5 years and 7 (36.8%) relapsing within 10 years; 178 were predicted to have a low risk of relapse, with an average age of about 50 years, 24 (14%) of which relapsed within 5 years and 31 (17.4%) of which relapsed within 10 years. Patient risk prediction performance p values classified by lymph node status (N stage: N0-N3) and tumor stage (T stage: T1-T4) were 0.979 and 0.567, respectively.
Please refer to FIG. 7. FIG. 7 shows survival curves for patients at high risk of relapse versus patients at low risk of relapse, from the date of onset to 10 years. In the survival analysis, the survival rate of high-risk patients is 52%, and the survival rate of low-risk patients is 80%. The p-value for the difference in survival time between the two groups is 0.019, a significant difference. This indicates that the actual survival rate of patients scored as high risk is significantly lower than that of patients scored as low risk.
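Survival curves of this kind are commonly estimated with the Kaplan-Meier method; the text does not state which estimator was used, so the following is a sketch under that assumption. The function name and the follow-up data are hypothetical.

```python
def kaplan_meier(times, events):
    """Return [(time, survival_probability)] at each distinct event time.

    times  -- follow-up time of each patient (e.g. in years)
    events -- 1 if the event (death or relapse) was observed at that time,
              0 if the patient was censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        observed = removed = 0
        # Collect every patient whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            observed += data[i][1]
            removed += 1
            i += 1
        if observed:
            survival *= 1 - observed / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve


# Hypothetical follow-up data for one risk group, for illustration only.
toy_curve = kaplan_meier([2, 3, 3, 5, 8, 10], [1, 1, 0, 1, 0, 0])
```

Computing one such curve per risk group and comparing them with a log-rank test is the usual way to obtain a between-group p-value like the one reported for FIG. 7.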
Example 2
Please refer to FIG. 8. FIG. 8 is a box plot of the expression values of each gene, grouped by recurrence status. The gene expression profile shows that all genes in patients with recurrence have high or median expression values (log2 expression > 7). The vertical axis is the expression value of each gene, and the horizontal axis lists 23 genes, including the 11 genes of the present invention. In particular, ACTB, PTI1 and RPLPO have high expression values in all patients. On the other hand, the expression values of the ERBB2 and ESR1 genes are evenly distributed. Each gene is divided into two groups on the horizontal axis: the left is the sample group without recurrence, and the right is the sample group with recurrence. In the figure, the center line of each box is the median, the upper edge is the upper quartile, the lower edge is the lower quartile, and isolated points are outliers or extreme values.
Table 4 below lists the odds ratio for each gene. The odds ratio represents the factor by which the corresponding risk of recurrence increases for each one-unit increase in the expression of a gene. For example, in the single-gene model, each one-unit increase in BLM expression raises the risk of recurrence to 133% of its original value. In the multi-gene model, with the other genes controlled for, each one-unit increase in BLM expression raises the risk of recurrence by 31%, and likewise for the remaining genes. Thus, each gene can individually be used to estimate the risk of breast cancer recurrence.
TABLE 4 Odds ratios of each gene in single-gene and multi-gene prediction
The highest prediction accuracy was obtained by using the 11 gene expression values together with clinical factors, including age at diagnosis, age at operation, T stage (stage of the tumor itself), N stage (stage of tumor metastasis to lymph nodes), postoperative (prognosis) condition, and the like.
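The relationship between a logistic regression coefficient and the odds ratio discussed for Table 4 can be sketched as follows. The function names are hypothetical, and the value 1.33 is the BLM single-gene figure quoted in the text (risk growing to 133% per unit of expression).

```python
import math


def odds_ratio(beta):
    """Odds ratio for a one-unit increase of a predictor with coefficient beta."""
    return math.exp(beta)


def odds_after_increase(baseline_odds, or_per_unit, units):
    """Odds after increasing the predictor by `units`, other predictors fixed."""
    return baseline_odds * or_per_unit ** units


# A coefficient of ln(1.33) corresponds to an odds ratio of 1.33.
blm_single_gene_or = odds_ratio(math.log(1.33))
# Assumed baseline odds of 0.2, raised by two units of expression.
example_odds = odds_after_increase(0.2, 1.33, 2)
```

Odds ratios multiply across units, so a two-unit increase scales the odds by the square of the per-unit ratio rather than by twice the per-unit increase.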
Example 3
For a group of 384 early-stage (T1-2, N0-1) breast cancer patients, the breast cancer distant metastasis risk prediction gene model was used to calculate predicted values, and a Cox proportional hazards regression model was applied to construct the correlation between the predicted value and the five-year distant metastasis risk percentage. With 29.0 as the first threshold of the predicted value, the average recurrence rate of the distant metastasis high-risk group (predicted value ≥ 29.0) is 14.3%, the average recurrence rate of the distant metastasis low-risk group (predicted value < 29.0) is 5.8%, and the difference in survival between the high-risk and low-risk groups is statistically significant (p-value = 0.009).
Example 4
For a group of 420 mid-stage (T1-3, N0-2) breast cancer patients, the breast cancer distant metastasis risk prediction gene model was used to calculate predicted values, and the patients were classified into the distant metastasis high-risk or low-risk group with 29.0 as the first threshold. After 10 years of follow-up, Table 5 below was compiled according to whether distant metastasis actually occurred. In the performance results, the sensitivity was 72.5%, the specificity was 59.9%, the positive predictive value was 20.0%, and the negative predictive value was 94.0%.
Table 5 Gene expression of the 11-gene panel in 420 T1-3, N0-2 breast cancer patients
Example 5
In this embodiment, the hospital was commissioned to conduct the evaluation test, and all patients were asian females. The prediction by the method of the invention is then compared with the actual recurrence situation. The performance characteristics of the distal recurrence after comparison are shown in table 6.
TABLE 6 statistical tables categorized by predictive model
The negative predictive value exceeds 95%; that is, more than 95% of the patients assigned to the low-risk group did not relapse, so that overtreatment of breast cancer patients with a low risk of recurrence can be avoided more reliably.
Example 6
Please refer to fig. 9. FIG. 9 shows a remote breast cancer metastasis risk prediction gene model in the present invention. The method can be used for manufacturing a predictive classification model, wherein the horizontal axis is the score obtained by calculation, and the vertical axis is the recurrence risk of 5 years. The solid line is the predicted value, the short dashed line is the lower limit of the 95% confidence interval, and the long dashed line is the upper limit of the 95% confidence interval. After the asian female patient sample is measured to obtain the gene expression value, the calculation formula can be used to calculate the score, and then the predictive classification model of fig. 9 is compared to estimate the remote metastasis risk.
In the predictive classification model for distant metastasis of FIG. 9, the first threshold and the second threshold are both set to 0.29, and when the calculated score is less than 0.29, the patient is assigned to the low distant metastasis risk group. Within five years, the probability of distant metastasis is less than 8% for low-risk patients and can reach 40% at higher scores. The higher the score, the higher the chance of distant metastasis.
In summary, in the construction method, risk prediction method, kit, specific protein binding molecule, nucleic acid probe or nucleic acid primer using the breast cancer distant metastasis risk prediction gene model of the present invention, high-accuracy prediction can be achieved without clinical data. The invention can accurately evaluate the risk index of recurrence after mastectomy and/or breast conservation operation to relevant medical staff, help the medical staff determine the treatment type required by breast cancer patients, and reduce the medical cost, the burden and the waste of health care payment or insurance resources. Because the invention is constructed and demonstrated using a large number of samples from asian breast cancer patients, the invention is particularly suitable for asian women who are taking into account post-operative adjuvant chemotherapy or radiation therapy, avoiding excessive treatment. Compared with the prior art, the invention discloses a plurality of genes which are not confirmed or disclosed before, so that higher accuracy is achieved.
The foregoing detailed description of the preferred embodiments is intended to more clearly describe the features and spirit of the invention, but is not intended to limit the scope of the invention by way of the preferred embodiments disclosed above. On the contrary, the intention is to cover all modifications, equivalents, and alternatives falling within the scope of the invention as defined by the appended claims. The scope of the invention as claimed should therefore be accorded the broadest interpretation based upon the foregoing description so as to encompass all such modifications and equivalent arrangements.

Claims (10)

1. A specific protein binding molecule, nucleic acid probe or nucleic acid primer for measuring gene expression values of a prognostic marker of a distant metastasis risk prediction gene model of breast cancer, characterized in that the prognostic marker is at least seven genes of a gene group consisting of BLM, BUB1B, CLCA2, DTX2, ERBB2, ESR1, OBSL1, PIM1, SF3B5, TPX2 and YWHAB.
2. The specific protein binding molecule, nucleic acid probe or nucleic acid primer of claim 1, further used for measuring the expression values of the at least seven genes in a sample, and substituting the expression values of the at least seven genes into a calculation formula after normalization to obtain a gene expression value, wherein the gene expression value is used for determining whether the sample is a breast cancer distant metastasis low risk sample, and the calculation formula is:
gene expression value = (0.4-0.5) × BUB1B + (0.20-0.24) × BLM + (0.001-0.003) × CLCA2 + (0.0001-0.0010) × ERBB2 + (0.0002-0.0004) × TPX2 + (0.0001-0.0005) × PIM1 + (0.00001-0.00005) × YWHAB - (0.1-0.2) × ESR1 - (0.05-0.10) × OBSL1 - (0.003-0.010) × DTX2 - (0.0001-0.0005) × SF3B5.
3. A kit for measuring the gene expression value of a prognostic marker for the risk of distant metastasis of breast cancer, wherein the prognostic marker is at least seven genes in a gene group consisting of BLM, BUB1B, CLCA2, DTX2, ERBB2, ESR1, OBSL1, PIM1, SF3B5, TPX2 and YWHAB.
4. The kit of claim 3, further comprising a plurality of specific protein binding molecules, nucleic acid probes or nucleic acid primers for measuring the expression values of the at least seven genes in a sample, wherein the expression values of the at least seven genes are normalized and substituted into a calculation formula to obtain a gene expression value, wherein the gene expression value is used for determining whether the sample is a sample with low risk of distant metastasis of breast cancer, and the calculation formula is:
gene expression value = (0.4-0.5) × BUB1B + (0.20-0.24) × BLM + (0.001-0.003) × CLCA2 + (0.0001-0.0010) × ERBB2 + (0.0002-0.0004) × TPX2 + (0.0001-0.0005) × PIM1 + (0.00001-0.00005) × YWHAB - (0.1-0.2) × ESR1 - (0.05-0.10) × OBSL1 - (0.003-0.010) × DTX2 - (0.0001-0.0005) × SF3B5.
5. A breast cancer distant metastasis risk assessment kit for measuring gene expression values of prognostic markers for measuring breast cancer distant metastasis risk, characterized in that the prognostic markers are at least seven genes in a gene group consisting of BLM, BUB1B, CLCA2, DTX2, ERBB2, ESR1, OBSL1, PIM1, SF3B5, TPX2 and YWHAB.
6. The breast cancer distant metastasis risk assessment kit of claim 5, further comprising:
a plurality of specific protein binding molecules, nucleic acid probes or nucleic acid primers for measuring the expression values of the at least seven genes in a sample of the subject; and
a breast cancer distant metastasis risk prediction gene model comprising a calculation formula, wherein the normalized expression values of the at least seven genes are substituted into the calculation formula to obtain a gene expression value, the gene expression value is used for determining whether the sample is a breast cancer distant metastasis low risk sample, and the calculation formula is:
the gene expression value = (0.4-0.5) × BUB1B + (0.20-0.24) × BLM + (0.001-0.003) × CLCA2 + (0.0001-0.0010) × ERBB2 + (0.0002-0.0004) × TPX2 + (0.0001-0.0005) × PIM1 + (0.00001-0.00005) × YWHAB - (0.1-0.2) × ESR1 - (0.05-0.10) × OBSL1 - (0.003-0.010) × DTX2 - (0.0001-0.0005) × SF3B5.
7. Use of a specific protein binding molecule, nucleic acid probe or nucleic acid primer for measuring the gene expression value of a prognostic marker for the risk of distant metastasis of breast cancer, characterized in that the prognostic marker is at least seven genes in a group of genes consisting of BLM, BUB1B, CLCA2, DTX2, ERBB2, ESR1, OBSL1, PIM1, SF3B5, TPX2 and YWHAB.
8. The use of claim 7, wherein the specific protein binding molecule, nucleic acid probe or nucleic acid primer is used to measure the expression value of the at least seven genes in a sample, and the expression value of the at least seven genes is normalized and substituted into a calculation formula to obtain a gene expression value, wherein the gene expression value is used to determine whether the sample is a breast cancer distant metastasis low risk sample, and the calculation formula is:
the gene expression value = (0.4-0.5) × BUB1B + (0.20-0.24) × BLM + (0.001-0.003) × CLCA2 + (0.0001-0.0010) × ERBB2 + (0.0002-0.0004) × TPX2 + (0.0001-0.0005) × PIM1 + (0.00001-0.00005) × YWHAB - (0.1-0.2) × ESR1 - (0.05-0.10) × OBSL1 - (0.003-0.010) × DTX2 - (0.0001-0.0005) × SF3B5.
9. A molecular typing gene group for assessing breast cancer distant metastasis risk, characterized by consisting of the following genes: BLM, BUB1B, CLCA2, DTX2, ERBB2, ESR1, OBSL1, PIM1, SF3B5, TPX2 and YWHAB.
10. The molecular typing gene group of claim 9, wherein the expression values of the genes in the gene group are normalized and substituted into a calculation formula to obtain a gene expression value, the gene expression value being used to determine whether the sample is a breast cancer distant metastasis low risk sample, the calculation formula being:
the gene expression value = (0.4-0.5) × BUB1B + (0.20-0.24) × BLM + (0.001-0.003) × CLCA2 + (0.0001-0.0010) × ERBB2 + (0.0002-0.0004) × TPX2 + (0.0001-0.0005) × PIM1 + (0.00001-0.00005) × YWHAB - (0.1-0.2) × ESR1 - (0.05-0.10) × OBSL1 - (0.003-0.010) × DTX2 - (0.0001-0.0005) × SF3B5.
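The calculation formula recited in the claims above is a weighted sum of normalized expression values, which can be sketched in a few lines of Python. This is an illustrative sketch only, not the applicant's implementation. The coefficient ranges are copied from the claimed formula, but several things are assumptions made here for illustration: the use of each range's midpoint as the weight, the function names `midpoint_weights` and `gene_expression_value`, the choice to simply omit terms for unmeasured genes, and the expectation that the caller has already normalized the expression values. The claims in this section state neither the normalization procedure nor the cutoff that separates low-risk samples, so neither is modelled.

```python
# Illustrative sketch: weighted-sum score from the claimed formula.
# Coefficient ranges come from the claims; everything else is an assumption.

# Genes whose terms are added in the formula: (lower, upper) coefficient bounds.
POSITIVE = {
    "BUB1B": (0.4, 0.5),
    "BLM": (0.20, 0.24),
    "CLCA2": (0.001, 0.003),
    "ERBB2": (0.0001, 0.0010),
    "TPX2": (0.0002, 0.0004),
    "PIM1": (0.0001, 0.0005),
    "YWHAB": (0.00001, 0.00005),
}

# Genes whose terms are subtracted in the formula.
NEGATIVE = {
    "ESR1": (0.1, 0.2),
    "OBSL1": (0.05, 0.10),
    "DTX2": (0.003, 0.010),
    "SF3B5": (0.0001, 0.0005),
}


def midpoint_weights():
    """Return one signed weight per gene, using the midpoint of each range (assumption)."""
    weights = {gene: (lo + hi) / 2 for gene, (lo, hi) in POSITIVE.items()}
    weights.update({gene: -(lo + hi) / 2 for gene, (lo, hi) in NEGATIVE.items()})
    return weights


def gene_expression_value(normalized, weights=None):
    """Compute the weighted sum over the measured genes.

    `normalized` maps gene symbols to already-normalized expression values.
    Genes that were not measured contribute no term (assumption).
    """
    if weights is None:
        weights = midpoint_weights()
    unknown = set(normalized) - set(weights)
    if unknown:
        raise ValueError(f"genes outside the claimed gene group: {sorted(unknown)}")
    if len(normalized) < 7:
        # The claims require expression values for at least seven of the eleven genes.
        raise ValueError("at least seven genes are required")
    return sum(weights[gene] * value for gene, value in normalized.items())
```

A caller would pass a dictionary such as `{"BUB1B": 1.2, "BLM": 0.8, ...}` containing at least seven of the eleven gene symbols. Any fixed choice of coefficients inside the claimed ranges could be supplied through the `weights` argument in place of the midpoints.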
CN202210342312.5A 2022-03-31 2022-03-31 Prognosis marker for measuring breast cancer remote metastasis risk and application thereof Pending CN116926190A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210342312.5A CN116926190A (en) 2022-03-31 2022-03-31 Prognosis marker for measuring breast cancer remote metastasis risk and application thereof

Publications (1)

Publication Number Publication Date
CN116926190A (en) 2023-10-24

Family

ID=88392906

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210342312.5A Pending CN116926190A (en) 2022-03-31 2022-03-31 Prognosis marker for measuring breast cancer remote metastasis risk and application thereof

Country Status (1)

Country Link
CN (1) CN116926190A (en)

Similar Documents

Publication Publication Date Title
JP6246845B2 (en) Methods for quantifying prostate cancer prognosis using gene expression
JP6351112B2 (en) Gene expression profile algorithms and tests to quantify the prognosis of prostate cancer
US8183353B2 (en) Breast cancer prognostics
KR101672531B1 (en) Genetic markers for prognosing or predicting early stage breast cancer and uses thereof
US11434536B2 (en) Diagnostic test for predicting metastasis and recurrence in cutaneous melanoma
JP2009528825A (en) Molecular analysis to predict recurrence of Dukes B colorectal cancer
CN113785076A (en) Methods and compositions for predicting cancer prognosis
WO2010080933A1 (en) Cancer biomarkers
US20090192045A1 (en) Molecular staging of stage ii and iii colon cancer and prognosis
US20210301353A1 (en) Gene signatures for predicting metastasis of melanoma and patient prognosis
JP2020507320A (en) Algorithms and methods for evaluating late clinical endpoints in prostate cancer
CN116926190A (en) Prognosis marker for measuring breast cancer remote metastasis risk and application thereof
CN116936086A (en) Construction method and risk prediction method of breast cancer distant metastasis risk prediction gene model
CN117012376A (en) Construction method and risk prediction method of breast cancer local recurrence model
CN117004711A (en) Tool for measuring prognosis marker of breast cancer local recurrence risk and application thereof
CN115216543A (en) Application of nucleic acid probe or primer in preparation of kit for evaluating breast cancer recurrence and metastasis risk method
TW202242143A (en) Risk estimation method of breast cancer recurrence or metastasis and kit thereof
WO2022225447A1 (en) Risk assessment method of breast cancer recurrence or metastasis and kit thereof
CN116479123A (en) Application of m7G related lncRNA as biomarker in liver cancer prognosis or treatment response prediction, product and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination