CN112201346B - Cancer lifetime prediction method, device, computing equipment and computer readable storage medium - Google Patents

Cancer lifetime prediction method, device, computing equipment and computer readable storage medium

Info

Publication number
CN112201346B
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011086809.2A
Other languages
Chinese (zh)
Other versions
CN112201346A (en)
Inventor
李君一
平原
李辉年
许清哲
王立新
刘莹
刘博�
王亚东
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Graduate School Harbin Institute of Technology
Original Assignee
Shenzhen Graduate School Harbin Institute of Technology
Application filed by Shenzhen Graduate School Harbin Institute of Technology
Priority to CN202011086809.2A
Publication of CN112201346A
Application granted
Publication of CN112201346B

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/20 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00 ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
    • G16B40/20 Supervised data analysis
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/70 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Medical Informatics (AREA)
  • Biomedical Technology (AREA)
  • Theoretical Computer Science (AREA)
  • Public Health (AREA)
  • General Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Epidemiology (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Databases & Information Systems (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Primary Health Care (AREA)
  • Pathology (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Evolutionary Biology (AREA)
  • Biotechnology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioethics (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The invention relates to a cancer survival prediction method, a cancer survival prediction device, a computing device and a computer-readable storage medium. The cancer survival prediction method comprises the following steps: obtaining gene expression profile data of a cancer patient to be predicted; providing the gene expression profile data as input to a trained neural network prediction model, the model being trained to predict the survival time of a cancer patient based on the gene expression profile data of the cancer patient; and obtaining the output of the neural network prediction model to obtain a survival prediction result for the cancer patient to be predicted. The method alleviates overfitting to a certain extent and can more accurately predict the prognostic survival state of cancer.

Description

Cancer lifetime prediction method, device, computing equipment and computer readable storage medium
Technical Field
The present invention relates to cancer survival prediction, and in particular, to a cancer survival prediction method, apparatus, computing device, and computer-readable storage medium.
Background
Cancer is one of the common diseases leading to death in humans: normal cells of the human body become cancerous, and cancerous cells possess unlimited differentiation and proliferation capabilities. The occurrence and progression of cancer is often a complex, multifactorial, multi-step process. Global tumor statistics for 2018 estimate 18.19 million new cancer cases and 9.6 million cancer deaths worldwide. Accurately predicting the survival of cancer patients is therefore of great importance both for the psychological rehabilitation of cancer patients and for guiding clinicians in developing appropriate treatment regimens.
There are two main approaches to cancer survival prediction: 1) Clinical survival prediction, in which a doctor uses clinical data combined with informal, subjective methods (such as experience) to judge the survival time of a patient. This evaluation is flexible, but its accuracy is inevitably reduced by cognitive bias. For some cancers, even experienced clinical oncologists predict survival with an accuracy of only about 20%. 2) Computational survival prediction, in which a prediction algorithm is used to analyze survival-related factors and establish a survival prediction model. Research shows that the occurrence and development of cancer are significantly affected by certain gene markers, so early computational survival prediction methods focused mainly on mining cancer-related gene markers. Numerous studies have shown that, in contrast to monogenic diseases, many genes are involved in the development and progression of cancer, and each of these genes has an important effect on the human body. Cancer survival prediction today is therefore mainly developed around gene expression data. For example, using a multivariate analysis method on the gene expression data of 98 breast cancer patients, van de Vijver et al. identified a 70-gene marker associated with cancer survival. In addition, Wang et al. found a survival-related gene marker comprising 76 genes in a cancer patient data set and used it to predict the test set data, obtaining a predictive performance of 48% specificity and 93% sensitivity.
Although the above work suggests that gene markers play an important role in cancer survival prediction, such methods rely on simple gene marker screening techniques, such as multivariate analysis and hypothesis testing, and therefore still have drawbacks. Since gene expression data is high-dimensional data containing a large number of genes, such methods are inefficient. Xu et al. therefore proposed a feature selection method based on a support vector machine for selecting key genes (features) in the data. The method adopts a two-step feature selection algorithm to process the high-dimensional feature set and screen out important features that help prediction. The results show that machine-learning-based feature selection is clearly superior to traditional manual selection.
Establishing prediction models for gene expression profile data using machine learning methods still faces some problems: (1) the number of samples is far smaller than the number of feature genes; and (2) the sample data is noisy. These can cause the prediction model to overfit, which impedes the use of deep learning techniques, because the training process typically requires a large number of samples.
Disclosure of Invention
According to a first aspect of the present invention there is provided a cancer survival prediction method comprising: obtaining gene expression profile data of a cancer patient to be predicted; providing the gene expression profile data as input to a trained neural network prediction model, the model being trained to predict the survival time of a cancer patient based on the gene expression profile data of the cancer patient; and obtaining the output of the neural network prediction model to obtain a survival prediction result for the cancer patient to be predicted.
In one embodiment, the neural network prediction model is trained as follows: constructing a differential gene regulation network A according to gene expression profile data of cancer patients, wherein A is a p×p adjacency matrix over the p feature genes of the cancer patients whose expression values serve as features, with $A_{ij} = 1$ if a regulatory relationship exists between gene $i$ and gene $j$ and $A_{ij} = 0$ otherwise; constructing a graph-embedded deep neural network prediction model comprising an input layer, an output layer and three hidden layers, wherein the first hidden layer is a graph-embedded layer operating according to the formula $t_i = \sigma\big((W_1 \odot A) \cdot x_1 + b_{in}\big)$, wherein the weight $W_1$ and the bias term $b_{in}$ are model parameters whose initial values are drawn randomly from a normal distribution and whose final values are obtained by training, and $\sigma$ is the ELU activation function, $\mathrm{ELU}(x) = x$ for $x > 0$ and $\mathrm{ELU}(x) = \alpha\,(e^{x} - 1)$ for $x \le 0$, with $\alpha > 0$ a constant; and training the graph-embedded deep neural network prediction model on the input sample set to obtain the final values of the model parameters.
In one embodiment, the cancer patient gene expression profile data is pre-processed prior to constructing a differential gene regulation network.
In one embodiment, the preprocessing includes at least one of missing value imputation, data normalization and data feature selection.
In one embodiment, differentially co-expressed genes and differential co-expression relationships are screened according to changes in the correlation of expression levels between genes under different states, so as to construct a differential co-expression network; wherein links are filtered using a half-threshold strategy: a link is retained in the two gene co-expression networks from the two different states if at least one of its two co-expression values exceeds a threshold, and uninformative links whose correlation values are insignificant in both networks are deleted.
In one embodiment, in the constructed differential co-expression network, a specific regulatory sub-network is identified using prior knowledge of TF-target regulatory relationships to obtain differential regulatory genes and differential regulatory relationships, wherein the differentially co-expressed genes and differential co-expression relationships are mapped to a TF2target library. If a differentially co-expressed gene is a TF, it is considered a differential regulatory gene; if it is not a TF but an upstream TF for it exists in the TF2target library, it is also retained in the list of differential regulatory genes. Differential co-expression relationships are identified as two classes of differential regulatory relationships: if a differential co-expression relationship is exactly a relationship between a TF and a target gene it regulates, it is defined as a first class of differential regulatory relationship; if the two genes in a differential co-expression relationship are regulated by the same upstream TF, it is defined as a second class of differential regulatory relationship.
According to a second aspect of the present invention, there is provided a cancer survival prediction device comprising: a data acquisition module for obtaining gene expression profile data of a cancer patient to be predicted; a prediction module for providing the gene expression profile data as input to a trained neural network prediction model, the model being trained to predict the survival time of a cancer patient based on the gene expression profile data of the cancer patient; and a prediction result acquisition module for obtaining the output of the neural network prediction model to obtain a survival prediction result for the cancer patient to be predicted.
The prediction module trains the neural network prediction model in the following manner: constructing a differential gene regulation network A according to gene expression profile data of cancer patients, wherein A is a p×p adjacency matrix over the p feature genes of the cancer patients whose expression values serve as features, with $A_{ij} = 1$ if a regulatory relationship exists between gene $i$ and gene $j$ and $A_{ij} = 0$ otherwise; constructing a graph-embedded deep neural network prediction model comprising an input layer, an output layer and three hidden layers, wherein the first hidden layer is a graph-embedded layer operating according to the formula $t_i = \sigma\big((W_1 \odot A) \cdot x_1 + b_{in}\big)$, wherein the weight $W_1$ and the bias term $b_{in}$ are model parameters whose initial values are drawn randomly from a normal distribution and whose final values are obtained by training, and $\sigma$ is the ELU activation function, $\mathrm{ELU}(x) = x$ for $x > 0$ and $\mathrm{ELU}(x) = \alpha\,(e^{x} - 1)$ for $x \le 0$, with $\alpha > 0$ a constant; and training the graph-embedded deep neural network prediction model on the input sample set to obtain the final values of the model parameters.
According to a third aspect of the present invention, there is provided a computing device comprising a memory storing a program and a processor which implements the above-described cancer survival prediction method when executing the program.
According to a fourth aspect of the present invention, there is provided a computer-readable storage medium having stored thereon a program which, when executed by a processor, implements the above-described cancer survival prediction method.
The method alleviates overfitting to a certain extent and can more accurately predict the prognostic survival state of cancer.
Drawings
FIG. 1 shows a flowchart of a cancer survival prediction method according to an embodiment of the invention.
FIG. 2 shows a gene co-expression network according to an embodiment of the present invention.
FIG. 3 illustrates a flow of feature selection using differential regulation analysis in accordance with an embodiment of the present invention.
Fig. 4 shows a neural network structure diagram of an embodiment of the present invention.
Fig. 5a shows ROC curves for a GSE10143 sample set of an embodiment of the invention.
Fig. 5b shows ROC curves for GSE14520 sample sets of an embodiment of the present invention.
Fig. 5c shows ROC curves for TCGA sample sets for embodiments of the present invention.
FIG. 6a shows a survival curve of a GSE10143 sample set according to an embodiment of the invention.
Fig. 6b shows a survival curve of a GSE14520 sample set of an embodiment of the present invention.
Fig. 6c shows a survival curve of a TCGA sample set according to an embodiment of the present invention.
Fig. 7 is a block diagram showing a cancer survival prediction device according to an embodiment of the present invention.
FIG. 8 illustrates a block diagram of the interior of a computing device for implementing a method of cancer survival prediction according to an embodiment of the invention.
Detailed Description
Deep learning is a new field of machine learning that has greatly promoted the development of intelligent applications such as image processing and speech recognition. Deep neural networks (DNNs) are an important technique in deep learning. DNNs have been used in medical research areas such as neuroimaging and medical tissue imaging. However, few studies have applied DNNs to building disease models. An important characteristic of DNNs is that they depend little on prior feature selection; feature selection can be carried out within model training.
Furthermore, the graph-embedded deep feedforward neural network model (GEDNN) integrates a known gene network into the deep neural network architecture, so that sparse connections between network layers can be realized and overfitting is effectively reduced. The present application applies this method to predicting the prognostic survival state of cancers such as liver cancer, and constructs a deep neural network model based on embedding a differential gene regulatory network (GRN).
Studies have shown that gene-specific physiological activities are related to the expression levels of genes in cells, that the expression level of a gene may be affected by other genes, and that the regulatory network formed by these interrelationships dominates the various vital activities of the organism. The gene regulation network plays a decisive role in regulating gene expression and controlling phenotype. By constructing and using gene regulatory network information, one can understand, from a system perspective, the physiological activities occurring in cells, the interactions among the various biomolecules in cells, and the coordinated changes in gene expression levels, and thereby predict the course of life activities. Applying gene regulation networks to cancer research can yield knowledge that is difficult to obtain through traditional molecular experiments. Because cancer is a complex system involving multiple genes, the regulatory networks at different stages differ; identifying the changed regulatory relationships and genes and constructing a differential gene regulation network has been shown to be effective in revealing the mechanisms of complex diseases. The application uses the differential regulation analysis method in the DCGL software package to construct a differential gene regulation network for feature screening of gene expression profile data, so as to obtain key pathogenic genes and regulatory relationships, and embeds the regulation network information into the GEDNN neural network model for survival prediction.
Specifically, referring to Fig. 1, in one embodiment of the present application, a cancer survival prediction method includes the following steps: 1) obtaining gene expression profile data of a cancer patient to be predicted; 2) providing the gene expression profile data as input to a trained neural network prediction model, the model being trained to predict the survival time of a cancer patient based on the gene expression profile data of the cancer patient; and 3) obtaining the output of the neural network prediction model to obtain a survival prediction result for the cancer patient to be predicted. The neural network prediction model is trained as follows.
Constructing a differential gene regulation network from gene expression profile data of cancer patients
A large number of existing omics databases are surveyed for tumor-related data. Tumor-related omics data, such as gene expression data, are downloaded from the relevant databases and then preprocessed, for example by missing value imputation, data normalization and data feature selection, to construct a tumor-related data set, i.e., the gene expression profile data, for later use.
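The preprocessing step can be illustrated with a minimal sketch. The patent names missing value filling and normalization but does not state which variants are used, so per-gene mean imputation and z-score scaling are assumptions here, as is the function name `impute_and_standardize`. The sketch also assumes every gene has at least one observed value.

```python
import math


def impute_and_standardize(matrix):
    """Fill missing values and z-score each gene (column).

    matrix: list of samples, each a list of expression values,
            with None marking a missing value.
    Assumption: mean imputation and z-score scaling; the source does
    not specify which imputation or normalization is used.
    """
    n_genes = len(matrix[0])
    out = [list(row) for row in matrix]
    for g in range(n_genes):
        observed = [row[g] for row in matrix if row[g] is not None]
        mean = sum(observed) / len(observed)
        # Replace missing entries by the mean of the observed entries.
        col = [row[g] if row[g] is not None else mean for row in matrix]
        var = sum((v - mean) ** 2 for v in col) / len(col)
        std = math.sqrt(var) or 1.0  # constant genes are mapped to 0
        for i, v in enumerate(col):
            out[i][g] = (v - mean) / std
    return out
```

Applied to a samples-by-genes matrix, the function returns a matrix of the same shape in which each gene has zero mean and, unless it is constant, unit variance.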
Samples with paired gene expression data in both normal and cancer tissue were selected from the preprocessed gene expression profile data. Differentially co-expressed genes (DCGs) and differential co-expression relationships (DCLs) are screened according to changes in the correlation of expression levels between genes under different states, so as to construct a specific differential co-expression network (DCN), as shown in Fig. 2. Links are filtered using a half-threshold strategy: a link is retained in the two gene co-expression networks from the two different states if at least one of its two co-expression values exceeds a threshold, and uninformative links whose correlation values are insignificant in both networks are deleted.
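A minimal sketch of the half-threshold link filter described above follows. It assumes that the co-expression value is the Pearson correlation and that the threshold is compared against its absolute value; neither detail is stated in the text, and the function names are hypothetical rather than the DCGL API.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def half_threshold_links(expr_normal, expr_cancer, threshold):
    """Keep a gene pair if its co-expression value exceeds the threshold
    in at least one of the two states.

    expr_normal, expr_cancer: dict gene -> list of expression values
    across the samples of that state (same gene keys in both dicts).
    Returns dict (gene_i, gene_j) -> (r_normal, r_cancer).
    """
    kept = {}
    for gi, gj in combinations(sorted(expr_normal), 2):
        r_n = pearson(expr_normal[gi], expr_normal[gj])
        r_c = pearson(expr_cancer[gi], expr_cancer[gj])
        # Assumption: the absolute correlation is compared to the threshold.
        if abs(r_n) > threshold or abs(r_c) > threshold:
            kept[(gi, gj)] = (r_n, r_c)
    return kept
```

Pairs whose correlation is weak in both states are dropped, so the retained links are those that are informative in at least one state.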
Since most regulatory relationships remain unchanged during a change in biological phenotype, a group of genes whose regulatory relationships change significantly must first be screened so that attention can be focused on the changing regulatory relationships. In the constructed differential co-expression network, a specific regulatory sub-network is identified using prior knowledge of transcription factor (TF)-target regulatory relationships, and differential regulatory genes (DRGs) and differential regulatory relationships (DRLs) are obtained. To screen DRGs and DRLs, the differential regulation analysis method in the DCGL v2.0.0 software package maps the DCGs and DCLs to a TF2target library. If a DCG is itself a TF, it is considered a DRG (such as genes A and B in the left table in Fig. 3); if a DCG is not a TF but an upstream TF for it exists in the TF2target library, it is not a DRG but is still retained in the DRG list (such as genes C and D in the left table in Fig. 3). DCLs are identified as two classes of DRLs: if a DCL is exactly a relationship between a TF and a target gene it regulates, it is defined as a first class of DRL, TF2target_DCL (such as (A, B) in the right table in Fig. 3); if the two genes in a DCL are regulated by the same upstream TF, it is defined as a second class of DRL, TF_bridged_DCL (such as (B, C) in the right table in Fig. 3). The gene regulation network can be abstracted as an unweighted graph in which each gene is a node and an edge exists between two nodes if there is a regulatory relationship between the two genes; the graph can then be represented by an adjacency matrix.
Constructing the graph-embedded deep neural network prediction model
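The mapping rules above and the final adjacency-matrix representation can be sketched as follows. This is an illustration under assumptions, not the DCGL implementation: the TF2target library is modelled as a plain dictionary from each TF to its set of targets, the function names are hypothetical, and a link is allowed to fall into both DRL classes.

```python
def classify_regulation(dcgs, dcls, tf2target):
    """Apply the DRG / DRL rules to a set of DCGs and a list of DCLs.

    dcgs: set of differentially co-expressed genes.
    dcls: iterable of (gene_a, gene_b) differential co-expression links.
    tf2target: dict TF -> set of target genes (stand-in for the library).
    """
    tfs = set(tf2target)
    regulated = set().union(*tf2target.values()) if tf2target else set()
    # Retained list: DCGs that are TFs, or that have an upstream TF.
    drg_list = {g for g in dcgs if g in tfs or g in regulated}

    tf2target_dcl, tf_bridged_dcl = [], []
    for a, b in dcls:
        # First class: one gene is a TF that regulates the other.
        if b in tf2target.get(a, ()) or a in tf2target.get(b, ()):
            tf2target_dcl.append((a, b))
        # Second class: both genes share a common upstream TF.
        if any(a in t and b in t for t in tf2target.values()):
            tf_bridged_dcl.append((a, b))
    return drg_list, tf2target_dcl, tf_bridged_dcl


def adjacency_matrix(genes, links):
    """Symmetric 0/1 adjacency matrix of an unweighted gene graph."""
    index = {g: i for i, g in enumerate(genes)}
    p = len(genes)
    A = [[0] * p for _ in range(p)]
    for a, b in links:
        if a in index and b in index:
            i, j = index[a], index[b]
            A[i][j] = A[j][i] = 1
    return A
```

With a toy library in which TF `A` regulates `B` and `C`, the link `(A, B)` falls into the first class and `(B, C)` into the second, mirroring the example of Fig. 3.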
The graph-embedded deep neural network (GEDNN) model differs from an ordinary fully connected deep neural network in that the input layer is not fully connected to the first hidden layer; instead, the feature graph is embedded into the first hidden layer, yielding a sparse connection pattern fixed by prior information.
Referring to Fig. 4, the present application contemplates a five-layer neural network comprising an input layer (p features), an output layer (two features) and three hidden layers (p, 64 and 16 features from the first to the third hidden layer, respectively), wherein the first hidden layer is a graph-embedded layer:
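$$t_i = \sigma\big((W_1 \odot A) \cdot x_1 + b_{in}\big)$$
where $\odot$ denotes the element-wise product and A is the differential gene regulation network constructed previously, which is a p×p adjacency matrix:
$$A_{ij} = \begin{cases} 1, & \text{if a regulatory relationship exists between gene } i \text{ and gene } j \\ 0, & \text{otherwise} \end{cases}$$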
That is, if there is a regulatory relationship between gene i and gene j, the corresponding matrix entry is 1; otherwise it is 0. The p features are the expression values of the p feature genes of each patient sample in the data set, each corresponding to a gene node of the graph represented by matrix A. The weight $W_1$ and the bias term $b_{in}$ are model parameters; their initial values are drawn randomly from a normal distribution, and their final values are learned when the samples are fed into the model for training.
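Using the ELU activation function:
$$\mathrm{ELU}(x) = \begin{cases} x, & x > 0 \\ \alpha\,(e^{x} - 1), & x \le 0 \end{cases}$$
where $\alpha > 0$ is a constant.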
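The forward pass of the graph-embedded layer, with ELU as the activation σ, can be sketched in plain Python as follows. The value α = 1, the standard deviation of the normal initialization, and the function names are assumptions made for illustration; the text only states that initial values are drawn from a normal distribution.

```python
import math
import random


def elu(z, alpha=1.0):
    """ELU activation; alpha = 1.0 is an assumed default."""
    return z if z > 0 else alpha * (math.exp(z) - 1.0)


def graph_embedded_layer(x, W, A, b):
    """Compute t_i = ELU(sum_j (W[i][j] * A[i][j]) * x[j] + b[i]).

    The element-wise mask W * A zeroes every weight whose gene pair has
    no regulatory relationship, which makes the layer sparse.
    """
    p = len(x)
    return [
        elu(sum(W[i][j] * A[i][j] * x[j] for j in range(p)) + b[i])
        for i in range(p)
    ]


def init_params(p, seed=0, scale=0.1):
    """Draw initial W (p x p) and b (p) from N(0, scale^2); scale is assumed."""
    rng = random.Random(seed)
    W = [[rng.gauss(0.0, scale) for _ in range(p)] for _ in range(p)]
    b = [rng.gauss(0.0, scale) for _ in range(p)]
    return W, b
```

Because masked weights never contribute to the output, only the entries of the weight matrix that correspond to edges of A act as effective parameters of this layer.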
The input sample set is used to train the graph-embedded deep neural network prediction model to obtain the final values of the model parameters. To train the DNN model, the present application uses the L2 norm as a regularization term, reduces overfitting with the dropout method, and uses the Adam optimizer, a widely used variant of the conventional gradient descent algorithm, for parameter learning. Furthermore, the present application uses a mini-batch training strategy, training on a small random subset of the samples in each iteration.
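The training ingredients named above (mini-batches, dropout, L2 regularization and Adam) can be sketched as small building blocks. The function names and the hyperparameter defaults are assumptions: the defaults are the values commonly used for Adam, not values taken from the source. The L2 term is added to the gradient and `dropout` uses the common inverted-dropout scaling.

```python
import math
import random


def minibatches(n_samples, batch_size, rng):
    """Yield shuffled index batches covering all samples once."""
    idx = list(range(n_samples))
    rng.shuffle(idx)
    for start in range(0, n_samples, batch_size):
        yield idx[start:start + batch_size]


def dropout(values, rate, rng):
    """Inverted dropout: zero each unit with probability `rate`,
    rescale survivors by 1 / (1 - rate). Requires rate < 1."""
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in values]


def new_adam_state(n_params):
    """Zero-initialised first and second moment estimates."""
    return {"t": 0, "m": [0.0] * n_params, "v": [0.0] * n_params}


def adam_step(params, grads, state, lr=1e-3, beta1=0.9, beta2=0.999,
              eps=1e-8, l2=0.0):
    """One in-place Adam update of a flat parameter list.

    l2 adds the gradient of (l2 / 2) * ||params||^2 to each gradient.
    """
    state["t"] += 1
    t = state["t"]
    for k in range(len(params)):
        g = grads[k] + l2 * params[k]
        state["m"][k] = beta1 * state["m"][k] + (1 - beta1) * g
        state["v"][k] = beta2 * state["v"][k] + (1 - beta2) * g * g
        m_hat = state["m"][k] / (1 - beta1 ** t)  # bias correction
        v_hat = state["v"][k] / (1 - beta2 ** t)
        params[k] -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return params
```

In a full training loop, each epoch would iterate over `minibatches(...)`, apply `dropout` to hidden activations during the forward pass, compute gradients by backpropagation, and call `adam_step` once per batch.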
The application uses two liver cancer RNA-seq data sets from the GEO database and one liver cancer RNA-seq data set from the TCGA (The Cancer Genome Atlas) database to construct gene regulation networks respectively, obtaining differential regulatory genes (DRGs) and differential regulatory relationships (DRLs). Each data set consists of gene expression profiles and clinical data containing measurements of various disease states. Data set 1, from GEO, has 80 cancer patient tissue samples and 82 normal tissue samples, each sample containing 6100 feature genes; data set 2, from GEO, has 221 cancer patient tissue samples and 210 normal tissue samples, each sample containing 13050 feature genes; data set 3, from TCGA, has 373 cancer patient tissue samples and 50 normal tissue samples, each sample containing 56926 feature genes. A differential gene regulation network is constructed for the expression profile data of each data set using the differential regulation analysis method in the DCGL software package, and the results are then combined to obtain 1869 differential regulatory genes and 421380 differential regulatory relationships.
Using the obtained differential regulatory genes and regulatory relationships, the application builds sample sets for the survival prediction task from the GSE10143 and GSE14520 data sets from the GEO database and data set 3 from the TCGA database. For the GSE10143 sample set, 1378 differential regulatory genes are selected as the features of the sample set, the expression value of each gene is normalized, and the ten-year survival state of liver cancer, determined from the survival information in the samples, is taken as the prediction target. For the GSE14520 sample set, 1253 differential regulatory genes are selected as the features of the sample set, the expression value of each gene is normalized, and the five-year survival state of liver cancer, determined from the survival information in the samples, is taken as the prediction target. For the TCGA sample set, 768 differential regulatory genes are selected as the features of the sample set, the expression value of each gene is normalized, and the ten-year survival state of liver cancer, determined from the survival time information in the samples, is taken as the prediction target.
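One possible way to derive such an N-year survival label from follow-up data is sketched below. The text does not describe how the label is computed or how patients lost to follow-up are handled, so the rule used here (patients censored before the horizon get no label) and the name `survival_label` are assumptions.

```python
def survival_label(time_years, died, horizon_years):
    """Binary N-year survival state from follow-up time and event flag.

    Returns 1 if the patient is known to have survived to the horizon,
    0 if the patient died before it, and None if follow-up ended before
    the horizon without a death (status at the horizon is unknown).
    Assumption: such censored patients are excluded from the sample set.
    """
    if time_years >= horizon_years:
        return 1
    if died:
        return 0
    return None


# Usage: keep only the samples whose label is known.
follow_up = [(12.0, True), (3.0, True), (3.0, False)]
labels = [survival_label(t, d, 10) for t, d in follow_up]
known = [y for y in labels if y is not None]
```

Here a patient followed for 12 years counts as a ten-year survivor even if a death was recorded later, because the patient was alive at the ten-year mark.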
The three sample sets were trained using the constructed graph-embedded deep neural network model and an ordinary DNN model, and 10 cross-validation experiments were performed on the training set. The average AUC results are shown in Table 1, where AUC is the area under the ROC curve and ROC stands for receiver operating characteristic; the ROC curve plots the false positive rate (FPR) on the horizontal axis against the true positive rate (TPR) on the vertical axis:
TABLE 1
Table 1 shows the cross-validated AUC predictions for the trained model in the training set, indicating the generalization ability of the model training.
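For reference, the AUC can be computed without drawing the ROC curve, because it equals the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one, with ties counted as one half. The sketch below uses this pairwise formulation; the function name `roc_auc` is hypothetical, and the example scores are illustrative rather than data from the patent.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: 0/1 ground-truth classes; scores: predicted scores, where a
    higher score means "more likely positive".
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # prints 0.75
```

The pairwise form is quadratic in the number of samples, which is acceptable for sample sets of a few hundred patients.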
In addition to the average AUC, the overall results for four evaluation indicators (accuracy, precision, recall and F1 score) are shown in Table 2 below:
TABLE 2
Accuracy is the proportion of all samples, positive and negative, that are predicted correctly.
Precision is the proportion of samples predicted as positive that are actually positive. Recall is the proportion of actually positive samples that are correctly predicted as positive. The F1 score is the harmonic mean of precision and recall.
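The four indicators follow directly from the counts of true positives (TP), false positives (FP), true negatives (TN) and false negatives (FN). A minimal sketch follows; the function name is hypothetical, and returning 0.0 when a denominator is zero is a convention chosen here, not something stated in the text.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}
```

For example, with true labels `[1, 1, 1, 0, 0, 0]` and predictions `[1, 1, 0, 1, 0, 0]` there are 2 TP, 1 FP, 2 TN and 1 FN, so all four indicators equal 2/3.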
On the final test set, the neural network model based on differential gene regulation network embedding achieved a predicted AUC of 0.9583 on the GSE10143 samples, 0.7823 on the GSE14520 samples and 0.7214 on the TCGA samples; the corresponding ROC curves are shown in Figs. 5a, 5b and 5c, respectively. Compared with the ordinary DNN model, the results show that the method of the application alleviates overfitting to a certain extent and can more accurately predict the prognostic survival state of cancer patients, such as liver cancer patients. By constructing the differential regulation network in a targeted manner and training the neural network model with it, the application obtains a better prediction effect on the data, without increasing hardware consumption (such as memory resources) or processing time relative to the prior art.
All samples of the three sample sets were classified using the final model, yielding three groups of predicted patients respectively, and survival curves were drawn. A survival curve is used to study the relationship between an individual's survival probability and time: follow-up time is plotted on the horizontal axis and survival rate on the vertical axis, and the points are connected into a curve. The results are shown in Figs. 6a, 6b and 6c.
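The text does not say which estimator is used to draw the survival curves. The sketch below assumes the Kaplan-Meier estimator, the usual choice for curves of survival rate against follow-up time with censored patients. The function name and the toy data are illustrative.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate (assumed estimator, see lead-in).

    times: follow-up time of each patient.
    events: 1 if death was observed at that time, 0 if censored.
    Returns a list of (time, survival_probability) at each death time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all patients whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            surv *= 1.0 - deaths / n_at_risk
            curve.append((t, surv))
        n_at_risk -= leaving
    return curve


curve = kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0])
```

For the toy data, the survival estimate drops to 0.8 at time 1, 0.6 at time 2 and 0.3 at time 3; computing one such curve per predicted group gives the curves to compare.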
Referring to fig. 7, a cancer lifetime prediction apparatus according to an embodiment of the present invention includes: a data acquisition module for acquiring gene expression profile data of a cancer patient to be predicted; a prediction module for providing the gene expression profile data as input to a trained neural network prediction model, the model being trained to predict the lifetime of a cancer patient based on the gene expression profile data of the cancer patient; and a prediction result acquisition module for acquiring the output of the neural network prediction model to obtain the survival time prediction result of the cancer patient to be predicted. The prediction module trains the neural network prediction model in the following manner: constructing a differential gene regulation network A according to gene expression profile data of cancer patients, wherein A is a $p \times p$ adjacency matrix comprising expression values of p characteristic genes of the cancer patients, $A = (a_{ij})_{p \times p}$; if a regulation relation exists between gene i and gene j, the corresponding matrix value $a_{ij}$ is 1, otherwise it is 0; constructing a graph-embedded deep neural network prediction model comprising an input layer, an output layer and three hidden layers, wherein the first hidden layer is a graph embedded layer operated according to the following formula: $t_i = \sigma((W_1 \odot A) \cdot x_1 + b_{in})$, wherein the weight $W_1$ and the bias term $b_{in}$ are model parameters whose initial values are randomly selected according to a normal distribution and whose final values are obtained by training, and an ELU activation function is used: $\sigma(x) = \begin{cases} x, & x > 0 \\ \alpha(e^{x} - 1), & x \le 0 \end{cases}$; and training on the input sample set using the graph-embedded deep neural network prediction model to obtain the final values of the model parameters.
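The forward pass of the graph embedded layer can be sketched in a few lines. The example below is a minimal pure-Python illustration of the formula σ((W1⊙A)·x1+bin) with an ELU activation, where ⊙ is the element-wise product, so a weight contributes only where the adjacency matrix holds a 1. The function names, the ELU constant `alpha=1.0`, and the zero-mean, unit-variance normal initialization are assumptions for this sketch; the application states only that initial values are drawn from a normal distribution.

```python
import math
import random


def elu(x, alpha=1.0):
    """ELU activation: x for x > 0, alpha * (exp(x) - 1) otherwise."""
    return x if x > 0 else alpha * (math.exp(x) - 1.0)


def init_params(p, seed=0):
    """Draw W (p x p) and b (length p) from N(0, 1); mean and variance are assumed."""
    rng = random.Random(seed)
    W = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(p)]
    b = [rng.gauss(0.0, 1.0) for _ in range(p)]
    return W, b


def graph_embedded_layer(x, A, W, b, alpha=1.0):
    """Compute t = ELU((W * A) @ x + b), with * the element-wise product.

    x: expression values of the p genes for one patient.
    A: p x p 0/1 adjacency matrix of the differential gene regulation network.
    """
    p = len(x)
    out = []
    for i in range(p):
        # Only entries with A[i][j] == 1 let gene j contribute to unit i.
        z = sum(W[i][j] * A[i][j] * x[j] for j in range(p)) + b[i]
        out.append(elu(z, alpha))
    return out


if __name__ == "__main__":
    A = [[1, 1, 0], [0, 1, 0], [0, 0, 1]]
    W, b = init_params(3)
    print(graph_embedded_layer([1.0, -1.0, 2.0], A, W, b))
```

Because the mask is applied inside the forward pass, gradients for weights at positions where A is 0 are also zero, so those connections stay inactive throughout training in a framework with automatic differentiation.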
The method of the present application may be implemented in a computing device. An exemplary internal architecture diagram of a computing device may be as shown in fig. 8, which may include a processor, memory, external interfaces, display and input devices connected by a system bus. Wherein the processor is configured to provide computing and control capabilities. The memory includes a nonvolatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, application programs, databases, etc. The internal memory provides an environment for the operation of the operating system and programs in the non-volatile storage media. The external interface includes, for example, a network interface for communicating with an external terminal through a network connection. The external interface may also include a USB interface, or the like. The display of the computing device may be a liquid crystal display or an electronic ink display, and the input device may be a touch layer covered on the display, or may be a key, a track ball or a touch pad arranged on a shell of the computing device, or may be an external keyboard, a touch pad or a mouse, for example.
The program stored on the non-volatile storage medium in the computing device, when executed by the processor, may implement the cancer lifetime prediction method described above. Alternatively, the non-volatile storage medium may exist in a separate physical form, for example a USB flash drive; when it is connected to a processor, the program stored on it is executed to implement the method described above. The method of the invention can also be implemented as an app (application program) in the Apple or Android application markets for users to download to their mobile terminals and run.
It will be appreciated by those skilled in the art that the architecture shown in fig. 8 is merely a block diagram of some of the architecture relevant to the present inventive arrangements and is not limiting as to the computing device to which the present inventive arrangements may be applied, and that a particular computing device may include more or fewer components than shown, or may combine some of the components, or have a different arrangement of components.
As described above, those skilled in the art will understand that all or part of the methods of the above embodiments may be implemented by a computer program stored in a non-transitory computer readable storage medium; when executed, the program may include the steps of the method embodiments described above. Any reference to memory, storage, a database, or another medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM), among others.
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; however, as long as a combination of technical features involves no contradiction, it should be considered to fall within the scope of this description.
The computer according to the present invention is, in a broad sense, a computing device capable of automatically performing numerical calculation and/or information processing according to preset or stored instructions; its hardware may include at least one memory, at least one processor, and at least one communication bus, the communication bus being used to enable connection and communication between these elements. The processor may include, but is not limited to, a microprocessor. Computer hardware may also include application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), digital signal processors (DSPs), embedded devices, and the like. The computer may also include network devices and/or user devices. The network device includes, but is not limited to, a single network server, a server group composed of a plurality of network servers, or a cloud based on cloud computing and composed of a large number of hosts or network servers, wherein cloud computing is a form of distributed computing: a super virtual computer composed of a group of loosely coupled computers.
The computing device may be, but is not limited to, any terminal, such as a personal computer or a server, that can interact with a user by means of a keyboard, a touch pad, a voice-controlled device, or the like. The computing device herein may also include a mobile terminal, which may be, but is not limited to, any electronic device that can interact with a user by means of a keyboard, a touch pad, a voice-controlled device, or the like, such as a tablet, a smartphone, a personal digital assistant (PDA), or a smart wearable device. The network in which the computing device is located includes, but is not limited to, the internet, a wide area network, a metropolitan area network, a local area network, a virtual private network (VPN), and the like.
The memory is used for storing program code. The memory may be a circuit with a storage function that has no physical form of its own within the integrated circuit, such as RAM (random-access memory) or a FIFO (first in, first out) memory. Alternatively, the memory may have a physical form, such as a memory module, a TF card (TransFlash card), a smart media card, a secure digital card, or a flash memory card.
The processor may include one or more microprocessors or digital processors. The processor may call program code stored in the memory to perform the relevant functions. For example, each of the modules depicted in fig. 8 is program code stored in the memory and executed by the processor to implement the methods described above. The processor, also called a central processing unit (CPU), can be a very-large-scale integrated circuit and serves as the computing core and the control unit.
It should be noted that, for simplicity of description, the foregoing method embodiments are all described as a series of acts, but it should be understood by those skilled in the art that the present invention is not limited by the order of acts described, as some steps may be performed in other orders or concurrently in accordance with the present invention. Further, those skilled in the art will also appreciate that the embodiments described in the specification are all preferred embodiments, and that the acts and modules referred to are not necessarily required for the present invention.
In the foregoing embodiments, each embodiment is described with its own emphasis; for parts of one embodiment that are not described in detail, reference may be made to the related descriptions of other embodiments.
In the several embodiments provided by the present application, it should be understood that the disclosed apparatus may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative: the division into units is merely a division by logical function, and other divisions are possible in an actual implementation; for example, multiple units or elements may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection via some interfaces, devices or units, and may be electrical or take other forms.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present invention may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer readable storage medium. Based on this understanding, the technical solution of the present invention, in essence or in whole or in part, may be embodied in the form of a software product stored in a storage medium, including instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes: a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, an optical disk, or various other media capable of storing program code.
The above embodiments are only for illustrating the technical solution of the present invention, and are not limiting; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit of the invention.
The above examples illustrate only a few embodiments of the application, which are described in detail and are not to be construed as limiting the scope of the claims. It should be noted that it will be apparent to those skilled in the art that several variations and modifications can be made without departing from the spirit of the application, which are all within the scope of the application. Accordingly, the scope of protection of the present application is to be determined by the appended claims.

Claims (8)

1. A method for predicting survival of cancer, comprising:
Obtaining gene expression profile data of a patient with cancer to be predicted;
providing the gene expression profile data as input to a trained neural network prediction model trained to predict a lifetime of a cancer patient based on the gene expression profile data of the cancer patient;
Obtaining the output of the neural network prediction model to obtain a survival prediction result of a cancer patient to be predicted; the neural network predictive model is trained as follows:
constructing a differential gene regulation network A according to gene expression profile data of cancer patients, wherein A is a $p \times p$ adjacency matrix comprising expression values of p signature genes of cancer patients: $A = (a_{ij})_{p \times p}$, wherein, if there is a regulation relation between gene i and gene j, the corresponding matrix value $a_{ij}$ is 1, otherwise, the corresponding matrix value is 0;
Constructing a graph-embedded deep neural network prediction model, the graph-embedded deep neural network prediction model comprising: an input layer, an output layer and three hidden layers, wherein the first hidden layer is a graph embedded layer, and the graph embedded layer is operated according to the following formula: $t_i = \sigma((W_1 \odot A) \cdot x_1 + b_{in})$, wherein the weight $W_1$ and the bias term $b_{in}$ are model parameters, initial values are randomly selected according to a normal distribution, final values are obtained by training, and an ELU activation function is used: $\sigma(x) = \begin{cases} x, & x > 0 \\ \alpha(e^{x} - 1), & x \le 0 \end{cases}$;
Training the input sample set by using the graph embedded deep neural network prediction model to obtain the final value of the model parameter.
2. The method of claim 1, wherein the cancer patient gene expression profile data is pre-processed prior to constructing a differential gene regulation network.
3. The method of claim 2, wherein the preprocessing comprises at least one of missing value padding, data normalization, data feature selection.
4. The method of claim 1, wherein differential co-expression genes and differential co-expression relationships are screened according to changes in the correlation of expression levels between genes under different conditions, so as to construct a differential co-expression network; wherein links are quantified using a half-threshold strategy: if at least one of the two co-expression values of a particular link exceeds a threshold, the link is retained in both gene co-expression networks from the two different states, and non-informative links whose correlation values are insignificant in both networks are deleted.
5. The method of claim 4, wherein, in the constructed differential co-expression network, prior knowledge of TF-target regulatory relationships is used to identify specific regulatory sub-networks and to obtain differential regulatory genes and differential regulatory relationships, wherein the differential co-expression genes and differential co-expression relationships are mapped to a TF2target library; if a differential co-expression gene is a TF, it is considered a differential regulatory gene; if a differential co-expression gene is not a TF but a TF upstream of it exists in the TF2target library, it is also retained in the list of differential regulatory genes; the differential co-expression relationships are identified as two classes of differential regulatory relationship: if a pair of genes in a differential co-expression relationship is exactly a TF and a target gene regulated by it, the relationship is defined as a first-class differential regulatory relationship, and if the two genes in a differential co-expression relationship are regulated by the same upstream TF, the relationship is defined as a second-class differential regulatory relationship.
6. A cancer survival prediction apparatus, comprising:
the data acquisition module is used for acquiring gene expression profile data of a cancer patient to be predicted;
a prediction module for providing the gene expression profile data as input to a trained neural network prediction model trained to predict a lifetime of a cancer patient based on the gene expression profile data of the cancer patient;
The prediction result acquisition module is used for acquiring the output of the neural network prediction model to obtain a survival period prediction result of a cancer patient to be predicted;
the prediction module trains the neural network prediction model in the following manner:
constructing a differential gene regulation network A according to gene expression profile data of cancer patients, wherein A is a $p \times p$ adjacency matrix comprising expression values of p signature genes of cancer patients: $A = (a_{ij})_{p \times p}$, wherein, if there is a regulation relation between gene i and gene j, the corresponding matrix value $a_{ij}$ is 1, otherwise, the corresponding matrix value is 0;
Constructing a graph-embedded deep neural network prediction model, the graph-embedded deep neural network prediction model comprising: an input layer, an output layer and three hidden layers, wherein the first hidden layer is a graph embedded layer, and the graph embedded layer is operated according to the following formula: $t_i = \sigma((W_1 \odot A) \cdot x_1 + b_{in})$, wherein the weight $W_1$ and the bias term $b_{in}$ are model parameters, initial values are randomly selected according to a normal distribution, final values are obtained by training, and an ELU activation function is used: $\sigma(x) = \begin{cases} x, & x > 0 \\ \alpha(e^{x} - 1), & x \le 0 \end{cases}$;
Training the input sample set by using the graph embedded deep neural network prediction model to obtain the final value of the model parameter.
7. A computing device comprising a memory and a processor, the memory storing a program, wherein the processor implements the method of any of claims 1-5 when executing the program.
8. A computer readable storage medium having a program stored thereon, which when executed by a processor implements the method of any of claims 1-5.
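The link screening of claim 4 and the relationship classes of claim 5 can be illustrated with a short sketch. Everything in it is hypothetical scaffolding: the function names, the dictionary layouts, the use of the absolute co-expression value when comparing against the threshold, and the representation of the TF2target library as a mapping from a TF to its set of target genes are assumptions made for illustration, not details taken from the claims.

```python
def half_threshold_links(coexp_state1, coexp_state2, threshold):
    """Keep a link if at least one of its two co-expression values exceeds the threshold.

    coexp_state1, coexp_state2: dicts mapping (gene_i, gene_j) to the co-expression
    value of that gene pair in each of the two states. Comparing absolute values is
    an assumption of this sketch.
    """
    kept = {}
    for pair in set(coexp_state1) | set(coexp_state2):
        r1 = coexp_state1.get(pair, 0.0)
        r2 = coexp_state2.get(pair, 0.0)
        if abs(r1) > threshold or abs(r2) > threshold:
            kept[pair] = (r1, r2)  # retained in both networks
    return kept


def classify_relationship(gene_a, gene_b, tf2target):
    """Return 1, 2 or None for a differential co-expression relationship.

    tf2target: dict mapping each TF to the set of its target genes
    (a stand-in for the TF2target library).
    1    -> one gene is a TF that regulates the other (first class)
    2    -> both genes are targets of the same upstream TF (second class)
    None -> neither pattern is found in the library
    """
    if gene_b in tf2target.get(gene_a, set()) or gene_a in tf2target.get(gene_b, set()):
        return 1
    if any(gene_a in targets and gene_b in targets for targets in tf2target.values()):
        return 2
    return None


if __name__ == "__main__":
    state1 = {("g1", "g2"): 0.9, ("g1", "g3"): 0.1}
    state2 = {("g1", "g2"): 0.2, ("g1", "g3"): 0.05}
    print(half_threshold_links(state1, state2, threshold=0.5))
    library = {"TF1": {"g1", "g2"}, "g3": {"g4"}}
    print(classify_relationship("g1", "g2", library))
```

In this sketch a relationship that fits both patterns is reported as first class, since the direct TF-target check runs first; the claims do not say how such a case is resolved.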
CN202011086809.2A 2020-10-12 2020-10-12 Cancer lifetime prediction method, device, computing equipment and computer readable storage medium Active CN112201346B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011086809.2A CN112201346B (en) 2020-10-12 2020-10-12 Cancer lifetime prediction method, device, computing equipment and computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011086809.2A CN112201346B (en) 2020-10-12 2020-10-12 Cancer lifetime prediction method, device, computing equipment and computer readable storage medium

Publications (2)

Publication Number Publication Date
CN112201346A CN112201346A (en) 2021-01-08
CN112201346B true CN112201346B (en) 2024-05-07

Family

ID=74008610

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011086809.2A Active CN112201346B (en) 2020-10-12 2020-10-12 Cancer lifetime prediction method, device, computing equipment and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN112201346B (en)

Also Published As

Publication number Publication date
CN112201346A (en) 2021-01-08

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant