CN111105879A - Probabilistic identification model for breast cancer prognosis generated by deep machine learning - Google Patents
- Publication number
- CN111105879A (application number CN201811265590.5A)
- Authority
- CN
- China
- Prior art keywords
- gene
- genes
- model
- machine learning
- prognosis
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H50/00—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
- G16H50/70—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients
Landscapes
- Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Health & Medical Sciences (AREA)
- Medical Informatics (AREA)
- Public Health (AREA)
- Biomedical Technology (AREA)
- Databases & Information Systems (AREA)
- Pathology (AREA)
- Epidemiology (AREA)
- General Health & Medical Sciences (AREA)
- Primary Health Care (AREA)
- Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
Abstract
Technical field: an identification model for breast cancer prognosis, used to estimate clinical prognosis and whether adjuvant chemotherapy is worthwhile, in support of precision treatment. By applying a self-developed deep machine learning data mining algorithm, a probabilistic identification model for cancer prognosis was developed. The "70 gene signature" was the first, and is so far the only, US FDA-approved test for breast cancer prognosis. Based on the same clinical dataset of about 25,000 RNAs (151 lymph-node-negative breast cancer patients: 97 who survived five years or more and 54 controls), my deep machine learning started with 2 genes, added 1 gene each time, and selected the combination with the strongest identification ability. It produced a 7-gene "probabilistic identification model of breast cancer prognosis" whose identification ability exceeds that of the "70 gene signature".
Description
The inventor: Zhang Peesen
(I) technical field
An identification model for breast cancer prognosis, used to estimate clinical prognosis and whether adjuvant chemotherapy is worthwhile.
(II) background of the invention
(2.1) overview:
"Breast cancer patients at the same disease stage may have markedly different treatment responses and outcomes. Clinical predictors of metastasis (e.g., lymph node status and histological grade) do not accurately classify breast tumors. Chemotherapy or hormone therapy reduces the risk of metastasis by about one-third; however, 70-80% of patients receiving such treatment would have survived without it." (technical literature [1])
Several gene-based identification models have been developed to predict clinical outcome and to determine whether adjuvant chemotherapy is worthwhile. Among them is the "70 gene signature" (70-Gene Signature) (technical literature [1, 2, 3]), a test that classifies tumors as having a good or poor prognosis according to the 5-year risk of recurrence. The translational research consortium TRANSBIG is a network of about 40 partners in 21 countries and includes the Breast International Group (BIG). An independent validation study by this consortium showed that the "70 gene signature", approved by the U.S. Food and Drug Administration (FDA), can distinguish patients at high risk of metastatic recurrence and death from low-risk patients. (technical literature [3])
The expression of some genes is correlated. Correlated genes are redundant in an identification model: they increase the detection cost, introduce noise, and increase error. Some of the 70 genes used in the "70 gene signature" are correlated with one another. It is therefore desirable to build the model from genes that are as independent of each other as possible, which reduces the detection cost, reduces noise, and improves accuracy.
(2.2) data sources (patient selection, RNA isolation, and biochip expression):
We used the same clinical dataset of about 25,000 RNAs as the "70 gene signature" (151 lymph-node-negative breast cancer patients: 97 who survived for more than five years and 54 controls). The "70 gene signature" was derived from a 78-patient subset of the 151 patients (34 patients with no metastasis for more than five years, 44 controls). Our probabilistic identification model uses the entire sample set of 151 cases.
Tumors from 295 women with breast cancer were selected from the fresh-frozen tissue bank of the Netherlands Cancer Institute according to the following criteria: the tumor was primary invasive breast cancer less than 5 cm in diameter at pathological examination (pT1 or pT2); the apical axillary lymph nodes were tumor negative, as determined by biopsy of the infraclavicular lymph nodes; age at diagnosis was 52 years or younger; the diagnosis was made between 1984 and 1995; and there was no previous history of cancer other than non-melanoma skin cancer. All patients were treated by modified radical mastectomy or breast-conserving surgery, including axillary lymph node dissection, followed by radiotherapy if indicated. Of the 295 patients, 151 were lymph node negative (pathological examination result pN0) and 144 were lymph node positive (pN+). (technical literature [2])
Tumor material was snap-frozen in liquid nitrogen within 1 hour after surgery. Frozen sections were stained with hematoxylin and eosin; only samples with more than 50% tumor cells were selected. 30-μm sections were used for RNA isolation. Total RNA was isolated with RNAzol B and dissolved in RNase-free water. 25 μg of total RNA was then treated with the Qiagen RNase-free DNase kit and RNeasy spin columns and dissolved in RNase-free water to a final concentration of 0.2 μg/μl. cRNA was generated by in vitro transcription using T7 RNA polymerase and 5 μg of total RNA, and was labeled with Cy3 or Cy5 (Cy Dye, Amersham Pharmacia Biotech). Five micrograms of Cy-labeled cRNA from one breast cancer tumor was mixed with the same amount of reverse-color Cy-labeled product from a pool consisting of equal amounts of cRNA from each patient. The labeled cRNA was fragmented to an average size of about 50 to 100 nucleotides by heating the sample to 60 °C in the presence of 10 mM zinc chloride, and a hybridization buffer was added containing 1 M sodium chloride, 0.5% sodium sarcosinate, 50 mM morpholinoethanesulfonic acid (pH 6.5), and formamide (final concentration, 30%), to a final volume of 3 ml at 40 °C. The microarray included 24,479 biological oligonucleotides and 1,281 control probes. After hybridization, the slides were washed and scanned with a confocal laser scanner (Agilent Technologies). The fluorescence intensities of the scanned images were quantified, corrected for background level, and normalized. (technical literature [2])
(2.3) "70 gene signature" big data analysis and data mining algorithm (technical literature [1]):
In the first step, the "70 gene signature" screened the 24,479 genes on the biochip down to about 5,000 significant genes. These genes showed at least a twofold difference in expression, with significance p < 0.01, in more than 5 experiments.
In the second step, for each of the 5,000 significant genes, the "70 gene signature" calculated the correlation coefficient between the prognostic class (metastasis vs. no metastasis) and the log expression ratio across all 78 samples. It found 231 genes with a correlation coefficient greater than 0.3 ("correlated genes") or less than -0.3 ("anti-correlated genes").
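The correlation filter described in this step can be sketched as follows. This is a minimal illustration, not the original authors' code: the function names, the 0/1 class encoding, and the toy expression values are assumptions, and only the 0.3 threshold comes from the text.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # a constant series carries no correlation information
    return cov / sqrt(vx * vy)

def select_correlated_genes(expr, labels, threshold=0.3):
    """Keep genes whose |r| with the class label exceeds the threshold.

    expr:   hypothetical dict, gene name -> log expression ratio per sample
    labels: prognostic class per sample, encoded here as 1 or 0 (assumption)
    """
    selected = {}
    for gene, values in expr.items():
        r = pearson(values, labels)
        if abs(r) > threshold:
            selected[gene] = r
    return selected

# Synthetic toy data (not from the patent): six samples, three genes.
labels = [1, 1, 1, 0, 0, 0]
expr = {
    "geneA": [0.9, 0.8, 1.1, -0.7, -0.9, -1.0],   # correlated with the class
    "geneB": [-0.5, -0.6, -0.4, 0.5, 0.7, 0.6],   # anti-correlated
    "geneC": [0.1, -0.1, 0.0, 0.0, 0.1, -0.1],    # uninformative
}
selected = select_correlated_genes(expr, labels)
print(sorted(selected))
```

In this toy input, geneA and geneB pass the filter and geneC does not.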
In the third step, the "70 gene signature" was cross-validated using the leave-one-out method. One sample at a time is held out, the remaining samples are used for learning to generate a model, and the model is then used to identify the held-out sample. This is repeated, one sample at a time, until every sample has been held out. This approach avoids information leakage, because the sample to be identified is never in the "learning set". Specifically, the "70 gene signature" holds out one sample and uses the remaining 77 samples to define a classifier based on the 231 genes. The outcome of the held-out sample is then predicted. The prediction is based on the sample's correlation coefficients with a "good prognosis" template and a "poor prognosis" template, where the templates are the average expression profiles of the clinically "good" and "poor" samples among the 77 remaining samples. The correlation coefficients are calculated using the selected reporter genes. This procedure is repeated until each of the 78 samples has been held out once. Finally, the numbers of correct and incorrect predictions are counted. The performance of the classifier is measured by the type 1 (false negative) and type 2 (false positive) error rates for the selected gene set. The "70 gene signature" repeats this leave-one-out performance assessment while adding 5 more marker genes at a time from the top of the candidate list, until all 231 genes are used as discriminators. The numbers of type 1 and type 2 errors vary significantly with the number of marker genes used. The combined error rate is lowest when the top 70 genes of the candidate list are used. The "70 gene signature" therefore takes this group of 70 genes as the optimal marker gene set for classifying patients into two prognostic subgroups, a "good prognosis" group and a "poor prognosis" group.
Interestingly, the accuracy of predicting the prognosis of "sporadic" breast cancer patients is rather low when only a few marker genes are used. Accuracy increases with the number of marker genes until the optimal number (70 genes) is reached. Beyond the optimal number of marker genes, however, accuracy deteriorates because noise is introduced.
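The leave-one-out procedure with "good" and "poor" templates can be sketched roughly as below. It is a simplified illustration under stated assumptions: the sample is assigned to whichever template it correlates with more strongly (the exact decision rule is not spelled out in the text), gene selection is not repeated inside each fold, and the data and function names are hypothetical.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)

def mean_profile(profiles):
    """Average expression per gene over a list of sample profiles."""
    n = len(profiles)
    return [sum(column) / n for column in zip(*profiles)]

def leave_one_out(samples, labels):
    """Hold out each sample once; return (correct, wrong) prediction counts."""
    correct = wrong = 0
    for i, (held_out, truth) in enumerate(zip(samples, labels)):
        # Templates are built only from the remaining samples.
        rest = [(s, l) for j, (s, l) in enumerate(zip(samples, labels)) if j != i]
        good_template = mean_profile([s for s, l in rest if l == "good"])
        poor_template = mean_profile([s for s, l in rest if l == "poor"])
        r_good = pearson(held_out, good_template)
        r_poor = pearson(held_out, poor_template)
        predicted = "good" if r_good >= r_poor else "poor"  # assumed rule
        if predicted == truth:
            correct += 1
        else:
            wrong += 1
    return correct, wrong

# Synthetic toy profiles over four reporter genes (not real patient data).
samples = [
    [1.0, 0.8, -0.9, -1.1], [0.9, 1.1, -1.0, -0.8], [1.2, 0.7, -0.8, -1.0],
    [-1.0, -0.9, 1.1, 0.8], [-0.8, -1.2, 0.9, 1.0], [-1.1, -0.7, 1.0, 1.2],
]
labels = ["good", "good", "good", "poor", "poor", "poor"]
correct, wrong = leave_one_out(samples, labels)
print(correct, wrong)
```

Because the held-out sample never contributes to either template, the counts estimate performance on unseen samples, which is the point of the procedure described above.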
(2.4) The "70 gene signature" was granted a US patent in 2007, U.S. Patent No. 7171311 (patent document [1])
(III) disclosure of the invention
(3.1) overview:
By applying our self-developed deep machine learning data mining algorithm, we developed our own probabilistic identification model for cancer prognosis. The "70 gene signature" was the first, and is so far the only, US FDA-approved test for breast cancer prognosis. Our probabilistic identification model is based on the same clinical dataset of about 25,000 RNAs (151 lymph-node-negative breast cancer patients: 97 who survived five years or more and 54 controls); it reduces the number of genes to be tested, and its accuracy exceeds that of the "70 gene signature".
(3.2) our deep machine learning data mining algorithm:
In the first step, we construct our own identification model using a deep machine learning algorithm. Taking the 70 genes of the "70 gene signature" as the basis, we use our self-developed deep machine learning data mining algorithm to compute the identification ability, starting from 2 genes and adding 1 gene each time. All combinations of genes are learned. For example, choosing 5 genes out of 70 requires about 12 million learning runs, and choosing 6 genes out of 70 requires about 130 million learning runs. Before each learning run the data are normalized to ensure the accuracy of the data.
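The number of gene subsets involved can be checked with a short calculation. The sketch below uses the standard binomial coefficient; the second function counts evaluations under one possible reading of "start with 2 genes, add 1 gene each time" (exhaustive pairs, then one-gene extensions of the current best set), which is an assumption rather than a confirmed description of the algorithm.

```python
from math import comb

def exhaustive_subsets(n_genes, size):
    """Number of distinct gene subsets of a given size (binomial coefficient)."""
    return comb(n_genes, size)

def stepwise_evaluations(n_genes, start_size, final_size):
    """Evaluations if all starting subsets are tried, then one gene is added at a time.

    Assumption: at each later step only the current best subset is extended.
    """
    total = comb(n_genes, start_size)
    for size in range(start_size + 1, final_size + 1):
        total += n_genes - (size - 1)  # genes not yet in the current subset
    return total

for size in range(2, 8):
    print(size, exhaustive_subsets(70, size))

print("stepwise 2 -> 7 genes:", stepwise_evaluations(70, 2, 7))
```

Choosing 5 of 70 genes gives 12,103,014 subsets and choosing 6 of 70 gives 131,115,985, which matches the orders of magnitude quoted in the text.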
In the second step, our identification model aims for the expression of its genes to be as independent of one another as possible, so that each gene can contribute fully to the identification. Our identification model is an independent probabilistic model. Breast cancer patients are classified by their probabilities of "good prognosis" and "poor prognosis": a patient is assigned to whichever class has the higher probability. The model treats the gene expression probabilities as independent, so the overall probability of good prognosis is the product of the per-gene probabilities of good prognosis, and likewise for poor prognosis. Classification is then made according to these probabilities.
How is the probability of "good prognosis" and "poor prognosis" determined for a single gene of a sample? First, the sample dataset used for machine learning (the learning set) is divided into a "good" set (survived more than 5 years) and a "poor" set (survived less than 5 years) according to clinical 5-year survival. Then the mean expression intensity of the gene (RNA) is calculated separately for the "good" set and the "poor" set. Using the midpoint of the two means as a boundary, the expression values of the whole learning set at this gene are divided into two groups. The group containing the mean of the "good" set is called the "near-good group"; similarly, the group containing the mean of the "poor" set is called the "near-poor group". The same boundary also places the gene expression of the sample being identified in either the "near-good group" or the "near-poor group". The probabilities of "good prognosis" and "poor prognosis" are calculated separately for each group. For example, suppose the expression of the gene in the sample being tested falls in the "near-good group", and the "near-good group" has 80 members from the "good" set and 10 members from the "poor" set; then at this gene the probability of "good prognosis" for the test sample is 80/90 and the probability of "poor prognosis" is 10/90. In other words, whichever group the gene (RNA) expression intensity of the test sample falls into, that group's probabilities of "good prognosis" and "poor prognosis" are taken as the sample's probabilities at this gene. The total probabilities of "good prognosis" and "poor prognosis" for the test sample are the products of the corresponding per-gene probabilities. The test sample is assigned to the class whose total probability is higher.
In the third step, we perform cross-validation with the leave-one-out method to verify the identification ability of the model built by our deep machine learning.
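The per-gene midpoint rule and the product of probabilities can be sketched as follows. This is a minimal illustration under assumptions: the function names and data layout are hypothetical, ties at the boundary are placed on the lower side, and the synthetic data are constructed only to reproduce the 80/90 worked example from the text.

```python
def fit_gene(values, is_good):
    """Learn the midpoint boundary and group counts for one gene.

    values:  expression intensity of this gene in each learning-set sample
    is_good: True for the "good" set, False for the "poor" set
    """
    good = [v for v, g in zip(values, is_good) if g]
    poor = [v for v, g in zip(values, is_good) if not g]
    mean_good = sum(good) / len(good)
    mean_poor = sum(poor) / len(poor)
    boundary = (mean_good + mean_poor) / 2
    good_is_high = mean_good > mean_poor
    # counts[group] = [members from the "good" set, members from the "poor" set]
    counts = {"near_good": [0, 0], "near_poor": [0, 0]}
    for v, g in zip(values, is_good):
        group = "near_good" if (v > boundary) == good_is_high else "near_poor"
        counts[group][0 if g else 1] += 1
    return {"boundary": boundary, "good_is_high": good_is_high, "counts": counts}

def gene_probabilities(model, value):
    """Return (P(good), P(poor)) for one gene, given the sample's expression."""
    in_near_good = (value > model["boundary"]) == model["good_is_high"]
    n_good, n_poor = model["counts"]["near_good" if in_near_good else "near_poor"]
    total = n_good + n_poor
    if total == 0:
        return 0.5, 0.5  # assumption: uninformative when a group is empty
    return n_good / total, n_poor / total

def classify(models, sample):
    """Multiply per-gene probabilities and pick the class with the larger product."""
    p_good = p_poor = 1.0
    for gene, model in models.items():
        g, p = gene_probabilities(model, sample[gene])
        p_good *= g
        p_poor *= p
    return ("good" if p_good >= p_poor else "poor"), p_good, p_poor

# Synthetic learning set built to match the worked example (80 good, 10 poor
# in the near-good group); these are not real expression values.
values = [1.0] * 80 + [-1.0] * 17 + [1.0] * 10 + [-1.0] * 44
is_good = [True] * 97 + [False] * 54
model = fit_gene(values, is_good)
p_good, p_poor = gene_probabilities(model, 0.7)
print(p_good, p_poor)
print(classify({"gene_x": model}, {"gene_x": -0.5})[0])
```

With several genes, `classify` multiplies the per-gene probabilities, so one gene with a very small probability can dominate the decision; that is a property of any product-of-independent-probabilities rule.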
(IV) 7-gene probability identification model for breast cancer prognosis
We started the deep machine learning with 2 genes, added 1 gene each time, and selected the combination with the strongest identification ability. The identification ability of the 7-gene combination exceeds that of the "70 gene signature". We selected the following 7 genes to construct our identification model: Contig46223_RC, X05610, NM_006931, Contig55725_RC, NM_020386, AF055033, Contig2399_RC. Because our model contains only 7 genes, detecting the expression of these 7 genes (RNAs) is much simpler than detecting 70 genes; the cost can be as low as one tenth, less noise is introduced, and accuracy is improved. The detection can be done by RT-PCR. Technical literature [2] published a comparison of the accuracy of the "70 gene signature" with the traditional clinical criteria (St. Gallen and NIH). Here we add the results of the 7-gene probabilistic identification model. Clinical samples: 151 lymph-node-negative breast cancer patients, of whom 97 survived for more than five years and 54 were controls. "7 gene model": accuracy, 84.1%; "70 gene signature": accuracy, 80.8%; "St. Gallen": accuracy, 59.0%; "NIH": accuracy, 46.2%.
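One way to read "start with 2 genes, add 1 gene each time, keep the strongest combination" is a stepwise forward selection, sketched below. The scoring function here is a made-up placeholder with hypothetical gene names; in practice it would be something like the leave-one-out accuracy of the probabilistic model, and nothing in this sketch is claimed to be the inventor's actual procedure.

```python
from itertools import combinations

def forward_select(genes, score, start_size=2, max_size=7):
    """Pick the best starting subset, then add one gene at a time.

    score: callable taking a tuple of gene names and returning a number,
           higher meaning stronger identification ability (assumed interface).
    Returns the list of (subset, score) pairs chosen at each size.
    """
    best = max(combinations(genes, start_size), key=score)
    history = [(best, score(best))]
    while len(best) < max_size:
        candidates = [best + (g,) for g in genes if g not in best]
        if not candidates:
            break
        best = max(candidates, key=score)
        history.append((best, score(best)))
    return history

# Placeholder score: per-gene weights minus a penalty for a redundant
# (correlated) pair. All names and numbers are hypothetical.
WEIGHTS = {"g1": 3, "g2": 2, "g3": 2, "g4": 1}
REDUNDANT = {frozenset({"g1", "g2"})}

def toy_score(subset):
    total = sum(WEIGHTS[g] for g in subset)
    for pair in combinations(subset, 2):
        if frozenset(pair) in REDUNDANT:
            total -= 2
    return total

history = forward_select(list(WEIGHTS), toy_score, start_size=2, max_size=3)
for subset, value in history:
    print(subset, value)
```

In the toy run the redundant pair (g1, g2) is avoided even though both genes have high individual weights, which mirrors the stated preference for genes that are as independent as possible.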
(V) detailed description of the preferred embodiments
(5.1) overview:
The "learning set" of our 7-gene probabilistic identification model for breast cancer prognosis includes the "learning set" of the "70 gene signature". Our "7 gene model" is the result of deep machine learning and can be regarded as an upgraded version of the "70 gene signature".
(5.2) detection method:
We will produce a test kit for the 7 genes of the "7 gene model" to serve hospitals and other institutions that need it. We are also preparing to set up a third-party testing facility to carry out testing of the 7 genes of the "7 gene model".
(5.3) calculation of the "7 gene model":
We will provide an online computation server for the "7 gene model". A mobile phone app for computing the "7 gene model" will also be provided.
Technical literature
【1】Gene expression profiling predicts clinical outcome of breast cancer. Nature. 2002 Jan 31;415(6871):530-536.
【2】A Gene-Expression Signature as a Predictor of Survival in Breast Cancer. N Engl J Med. 2002 Dec 19;347(25).
【3】70-Gene Signature as an Aid to Treatment Decisions in Early-Stage Breast Cancer. N Engl J Med. 2016;375:717-29.
Patent document
【1】Methods of assigning treatment to breast cancer patients. US Patent 7171311; January 30, 2007.
Claims (1)
1. Probabilistic identification model for breast cancer prognosis generated by deep machine learning
The inventor: Zhang Peesen
1, independent claim
(I) description of the invention
Our invention is a "probabilistic identification model of breast cancer prognosis generated by deep machine learning". It is a gene-based identification model for breast cancer prognosis, used to estimate clinical prognosis and to determine whether adjuvant chemotherapy is worthwhile. We used the same clinical dataset of about 25,000 RNAs as the "70 gene signature" (151 lymph-node-negative breast cancer patients: 97 who survived for more than five years and 54 controls).
(II) characteristic section
Our invention is characterized by deep machine learning and a probabilistic identification model. From the 151-case dataset of the 70 genes of the "70 gene signature", we developed a probabilistic identification model for cancer prognosis by applying our self-developed deep machine learning data mining algorithm. We started the deep machine learning with 2 genes, added 1 gene each time, and selected the combination with the strongest identification ability. The identification ability of the 7-gene combination exceeds that of the "70 gene signature". We selected the following 7 genes to construct our identification model: Contig46223_RC, X05610, NM_006931, Contig55725_RC, NM_020386, AF055033, Contig2399_RC.
Dependent claims
2, deep machine learning:
(2.1) our invention is a "probabilistic identification model of breast cancer prognosis generated by deep machine learning".
(2.2) our invention is characterized by deep machine learning. Variations of the machine learning scheme (e.g., starting with 3 genes and adding 2 genes at a time) should all be considered as included in the present invention.
3, probabilistic identification model:
(3.1) our invention is a "probabilistic identification model of breast cancer prognosis generated by deep machine learning".
(3.2) our invention is characterized by: the identification model treats the gene expression probabilities as independent, so the probability of good prognosis is the product of the per-gene probabilities of good prognosis, and the probability of poor prognosis is the product of the per-gene probabilities of poor prognosis. Modifications of the identification model (e.g., classification tree models) should all be considered as included in the present invention.
4, gene combination:
(4.1) we selected the following 7 genes to construct our identification model: Contig46223_RC, X05610, NM_006931, Contig55725_RC, NM_020386, AF055033, Contig2399_RC.
(4.2) our invention is characterized by: the 7-gene model formed by this combination of 7 genes is one of the best models obtained by deep learning, and other gene combinations can form similar models with very close accuracy. These similar gene combinations should be considered as included in the present invention.
5, detection method:
(5.1) we selected the following 7 genes to construct our identification model: Contig46223_RC, X05610, NM_006931, Contig55725_RC, NM_020386, AF055033, Contig2399_RC.
(5.2) we will produce a test kit for the 7 genes of the "7 gene model" to serve hospitals and other institutions that need it. We are also preparing to set up a testing facility to carry out testing of the 7 genes of the "7 gene model".
6, calculation of the "7 gene model":
(6.1) we selected the following 7 genes to construct our identification model: Contig46223_RC, X05610, NM_006931, Contig55725_RC, NM_020386, AF055033, Contig2399_RC.
(6.2) we will provide an online computation server for the "7 gene model". A mobile phone app for computing the "7 gene model" will also be provided.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811265590.5A CN111105879A (en) | 2018-10-29 | 2018-10-29 | Probabilistic identification model for breast cancer prognosis generated by deep machine learning |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811265590.5A CN111105879A (en) | 2018-10-29 | 2018-10-29 | Probabilistic identification model for breast cancer prognosis generated by deep machine learning |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111105879A true CN111105879A (en) | 2020-05-05 |
Family
ID=70420268
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811265590.5A Pending CN111105879A (en) | 2018-10-29 | 2018-10-29 | Probabilistic identification model for breast cancer prognosis generated by deep machine learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111105879A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113764101A (en) * | 2021-09-18 | 2021-12-07 | 新疆医科大学第三附属医院 | CNN-based breast cancer neoadjuvant chemotherapy multi-modal ultrasonic diagnosis system |
-
2018
- 2018-10-29 CN CN201811265590.5A patent/CN111105879A/en active Pending
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113764101A (en) * | 2021-09-18 | 2021-12-07 | 新疆医科大学第三附属医院 | CNN-based breast cancer neoadjuvant chemotherapy multi-modal ultrasonic diagnosis system |
CN113764101B (en) * | 2021-09-18 | 2023-08-25 | 新疆医科大学第三附属医院 | Novel auxiliary chemotherapy multi-mode ultrasonic diagnosis system for breast cancer based on CNN |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Ozawa et al. | A microRNA signature associated with metastasis of T1 colorectal cancers to lymph nodes | |
CN105121665B (en) | Medical prognosis and prediction using the active therapeutic response of cell multiplex signal transduction path | |
Tothill et al. | An expression-based site of origin diagnostic method designed for clinical application to cancer of unknown origin | |
JP7186700B2 (en) | Methods to Distinguish Tumor Suppressor FOXO Activity from Oxidative Stress | |
JP7354099B2 (en) | How to diagnose, stage, and monitor melanoma using microRNA gene expression | |
ES2821300T3 (en) | Prognostic Prediction for Cancer Melanoma | |
Smeets et al. | Prediction of lymph node involvement in breast cancer from primary tumor tissue using gene expression profiling and miRNAs | |
CN113462776B (en) | Application of m6A modification-related combined genome in prediction of immunotherapy efficacy of renal clear cell carcinoma patient | |
EP3931318A1 (en) | Purity independent subtyping of tumors (purist), a platform and sample type independent single sample classifier for treatment decision making in pancreatic cancer | |
CN114150063A (en) | Urine miRNA marker for bladder cancer diagnosis, diagnostic reagent and kit | |
Maxwell et al. | Transcript expression in endometrial cancers from Black and White patients | |
CN111105879A (en) | Probabilistic identification model for breast cancer prognosis generated by deep machine learning | |
WO2009002175A1 (en) | A method of typing a sample comprising colorectal cancer cells | |
CN111748626B (en) | System for predicting treatment effect and prognosis of neoadjuvant radiotherapy and chemotherapy of esophageal squamous carcinoma patient and application of system | |
Kroon et al. | Microarray gene‐expression profiling to predict lymph node metastasis in penile carcinoma | |
TW202242143A (en) | Risk estimation method of breast cancer recurrence or metastasis and kit thereof | |
Wen et al. | Breast Cancer Pathology in the Era of Genomics | |
EP2459748B1 (en) | Determination of the risk of distant metastases in surgically treated patients with non-small cell lung cancer in stage i-iiia | |
KR102138517B1 (en) | Extracting method for biomarker for diagnosis of pancreatic cancer, computing device therefor, biomarker, and pancreatic cancer diagnosis device comprising same | |
CN117004711A (en) | Tool for measuring prognosis marker of breast cancer local recurrence risk and application thereof | |
Nomikou | Investigating Nuclear Morphological Features and Chromatin Architecture in Cancer | |
WO2024002599A1 (en) | Novel signatures for lung cancer detection | |
Mook et al. | Personalized Medicine by the Use of Microarray Gene Expression Profiling | |
Kroon et al. | Microarray gene-expression profiling to predict lymph node metastasis in penile carcinoma | |
WO2022018086A1 (en) | Prognostic and treatment response predictive method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
WD01 | Invention patent application deemed withdrawn after publication | ||
Application publication date: 20200505 |