CN113284611B - Cancer diagnosis and prognosis prediction system, apparatus and storage medium based on individual pathway activity - Google Patents

Info

Publication number
CN113284611B
Authority
CN
China
Prior art keywords
cancer
data
model
individual
gene
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110535516.6A
Other languages
Chinese (zh)
Other versions
CN113284611A (en)
Inventor
杨铁林
柯欣
董珊珊
郭燕
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xian Jiaotong University
Original Assignee
Xian Jiaotong University
Application filed by Xian Jiaotong University
Priority to CN202110535516.6A
Publication of CN113284611A
Application granted
Publication of CN113284611B
Legal status: Active

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/20 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00 ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B20/30 Detection of binding sites or motifs
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00 Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10 Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Abstract

The invention discloses a cancer diagnosis and prognosis prediction system, device and storage medium based on individual pathway activity. The system comprises: a data acquisition module for acquiring transcriptome sequencing data of an individual to be tested; a normalization module for normalizing the transcriptome sequencing data of the individual to be tested and converting the normalized gene expression values into gene rank values; a pathway activity calculation module for calculating pathway activity from the gene rank values; and a prediction module for performing cancer diagnosis and prognosis prediction, using a cancer diagnosis model and a cancer prognosis prediction model respectively, based on the pathway activity data. Compared with the prior art, the system does not depend on a population, eliminates the batch effects arising from heterogeneity among cancer samples and from cross-platform sequencing, reflects the metabolic level of an individual more comprehensively and stably, improves the performance of cancer diagnosis and prognosis prediction, and provides a reference for subsequent research on cancer heterogeneity and the development of personalized medicine.

Description

Cancer diagnosis and prognosis prediction system, apparatus and storage medium based on individual pathway activity
Technical Field
The invention relates to the field of molecular diagnosis of cancer, and in particular to a cancer diagnosis and prognosis prediction system, device and storage medium based on individual pathway activity.
Background
Screening transcriptomic data for characteristic molecular markers and using them to stratify cancer patients can markedly improve the diagnosis, risk assessment and prognosis prediction of cancer. However, most existing molecular markers for cancer are based on a single gene or a single molecule, have limited reproducibility and sensitivity, and are difficult to apply in clinical practice. A growing number of studies show that cancer is essentially the result of disturbed regulatory relationships among multiple functionally related genes, which suggests that cancer expression data should be interpreted at the level of functional modules (e.g., biological pathways) rather than at the level of individual genes and molecules. Most existing cancer pathway activity algorithms depend on a population or on accumulated normal samples, and are susceptible to inter-sample heterogeneity and to batch effects caused by different sequencing analysis methods.
An artificial neural network is a mathematical model, built on network topology, that simulates how the nervous system of the human brain processes complex information. Rather than executing operations step by step according to a given program, it adapts to its environment, summarizes rules, and completes computation, recognition or process control. The back propagation (BP) neural network, one of the most widely used artificial neural network models, is a multi-layer feedforward network trained with the error back propagation algorithm. It uses the steepest descent method and continuously adjusts the weights and thresholds of the network through back propagation so as to minimize the error between the actual and expected output values of the network. The BP neural network has excellent nonlinear approximation capability and clear advantages in handling missing values and nonlinear problems, and has been applied successfully in fields such as pattern recognition, intelligent control, risk assessment and artificial intelligence.
It can therefore be introduced into the field of clinical medical diagnosis. However, the original standard BP algorithm tends to become trapped in local minima, converges slowly, and is prone to overfitting. Researchers have therefore made many useful improvements to the standard BP algorithm, such as the momentum method, the Levenberg-Marquardt (LM) optimization method and conjugate gradient learning algorithms.
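For orientation, the difference between the steepest-descent step of standard BP and the LM step can be written in the usual textbook form. The symbols below (w, e, J, \eta, \mu) are notation assumed here for illustration; they do not appear in the original text.

```latex
% w   : vector of all network weights and thresholds
% e(w): vector of errors between actual and expected outputs
% J   : Jacobian of e with respect to w
% eta : learning rate;  mu : LM damping factor
E(w) = \tfrac{1}{2}\, e(w)^{\top} e(w), \qquad \nabla E(w) = J^{\top} e
\Delta w_{\mathrm{SD}} = -\eta\, J^{\top} e
\Delta w_{\mathrm{LM}} = -\left( J^{\top} J + \mu I \right)^{-1} J^{\top} e
```

With a large \mu the LM step approaches a short steepest-descent step, and with a small \mu it approaches a Gauss-Newton step, which is the usual explanation for its faster convergence on small and medium-sized networks.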
Disclosure of Invention
The invention aims to provide a cancer diagnosis and prognosis prediction system, device and storage medium based on individual pathway activity, which perform clinical diagnosis and prognosis prediction for cancer patients by combining an individual pathway activity algorithm with machine learning, and which provide a reference for subsequent research on cancer heterogeneity and the development of personalized medicine.
To achieve the above object, the technical solution of the invention is as follows:
A cancer diagnosis and prognosis prediction system based on individual pathway activity, comprising:
a data acquisition module for acquiring transcriptome sequencing data of an individual to be tested;
a normalization module for normalizing the transcriptome sequencing data of the individual to be tested and converting the normalized gene expression values into gene rank values;
a pathway activity calculation module for calculating pathway activity from the gene rank values;
and a diagnosis prediction module for performing cancer diagnosis and prognosis prediction, using a cancer diagnosis model and a cancer prognosis prediction model respectively, based on the pathway activity data.
As a further improvement of the invention, the normalization module is specifically configured to: normalize the transcriptome sequencing data of the individual to be tested; rank the normalized gene expression values from smallest to largest and take the rank as the expression level of each gene; and, according to the quantiles of the rank values, assign the same value to all genes whose ranks fall within the same quantile interval, thereby obtaining the final expression level of each gene.
As a further improvement of the invention, the pathway activity calculation module is specifically configured to: collect and collate information on all biological pathways from the Kyoto Encyclopedia of Genes and Genomes (KEGG) database and extract the list of genes participating in each pathway; calculate the mean of the expression levels of the genes in a pathway to obtain the activity level of that pathway; and calculate the activity levels of all KEGG pathways in batch to obtain the pathway activity.
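The pathway activity described above is a plain average and can be written compactly. The symbols A_p, G_p and r_g are notation assumed here for illustration, not taken from the original text.

```latex
% G_p : set of genes in pathway p that are measured in the sample
% r_g : final (binned) expression level of gene g in that sample
% A_p : activity level of pathway p for that sample
A_{p} = \frac{1}{\lvert G_{p} \rvert} \sum_{g \in G_{p}} r_{g}
```

Because every r_g is derived only from the ordering of genes within one sample, A_p can be computed for a single individual without reference to any other sample.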
As a further improvement of the invention, the cancer diagnosis model is constructed by the following method:
calculating the pathway activity of the samples in The Cancer Genome Atlas (TCGA) database;
constructing a cancer diagnosis model using the pathway activity data:
for each cancer, randomly dividing the sample dataset into a training set and a test set; creating a BP neural network with two hidden layers using the training set data, training the network with the LM algorithm to optimize its predictive ability, and finally testing the performance of the model on the test set;
and validating and optimizing the model using independent cancer datasets to obtain the cancer diagnosis model.
As a further improvement of the invention, the prognosis prediction model is constructed by the following method:
for each cancer, performing survival analysis on each pathway using the pathway activity data combined with the clinical prognosis data of the samples, and screening for pathways that significantly affect patient survival time;
for each cancer, constructing a univariate Cox regression model for each pathway that significantly affects patient survival time;
screening out the pathways that are significant in univariate Cox regression, and further screening out representative pathways using Lasso regression;
for each cancer, constructing a multivariate Cox regression model from the resulting representative pathways to obtain the prognosis prediction model.
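For reference, the Cox model and the Lasso-penalized selection step named above have the following standard textbook forms. The notation (x, \beta, h_0, \lambda, \ell) is assumed here for illustration; the original text gives no formulas or penalty settings.

```latex
% x      : vector of pathway activity levels for one patient
% beta   : regression coefficients;  h_0(t): baseline hazard
% ell    : Cox log partial likelihood;  lambda: Lasso penalty weight
h(t \mid x) = h_{0}(t)\, \exp\!\left( \beta^{\top} x \right)
\hat{\beta}_{\mathrm{Lasso}} = \arg\max_{\beta} \left[ \ell(\beta) - \lambda \sum_{k} \lvert \beta_{k} \rvert \right]
```

In the univariate model x contains a single pathway; pathways whose Lasso coefficient is shrunk to zero are dropped, and the remaining representative pathways enter the multivariate model.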
An electronic device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the steps of a cancer diagnosis and prognosis prediction method based on individual pathway activity.
The cancer diagnosis and prognosis prediction method based on individual pathway activity comprises the following steps:
acquiring transcriptome sequencing data of an individual to be tested;
normalizing the transcriptome sequencing data of the individual to be tested, and converting the normalized gene expression values into gene rank values;
calculating the pathway activity from the gene rank values;
and performing cancer diagnosis and prognosis prediction, using a cancer diagnosis model and a cancer prognosis prediction model respectively, based on the pathway activity data.
As a further improvement of the invention, normalizing the transcriptome sequencing data of the individual to be tested and converting the normalized gene expression values into gene rank values specifically comprises:
normalizing the transcriptome sequencing data of the individual to be tested; ranking the normalized gene expression values from smallest to largest and taking the rank as the expression level of each gene; and, according to the quantiles of the rank values, assigning the same value to all genes whose ranks fall within the same quantile interval, thereby obtaining the final expression level of each gene.
As a further improvement of the invention, calculating the pathway activity from the gene rank values specifically comprises:
collecting and collating information on all biological pathways from the KEGG database and extracting the list of genes participating in each pathway; calculating the mean of the expression levels of the genes in a pathway to obtain the activity level of that pathway; and calculating the activity levels of all KEGG pathways in batch to obtain the pathway activity.
A computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the cancer diagnosis and prognosis prediction method based on individual pathway activity.
The cancer diagnosis and prognosis prediction method based on individual pathway activity comprises the following steps:
acquiring transcriptome sequencing data of an individual to be tested;
normalizing the transcriptome sequencing data of the individual to be tested, and converting the normalized gene expression values into gene rank values;
calculating the pathway activity from the gene rank values;
and performing cancer diagnosis and prognosis prediction, using a cancer diagnosis model and a cancer prognosis prediction model respectively, based on the pathway activity data.
Compared with the prior art, the invention has the following beneficial effects:
The prediction system based on the individual pathway activity algorithm eliminates the batch effects produced by different sequencing analysis methods, is applicable to data generated by a variety of sequencing platforms, and reflects the metabolic level of an individual more comprehensively and stably. An individual-based cancer diagnosis and prognosis prediction model is constructed by combining the individual pathway activity algorithm with machine learning; it shows good predictive performance across a variety of cancers and provides a reference for subsequent research on cancer heterogeneity and the development of personalized medicine. In view of the heterogeneity of cancer samples, the invention adopts the LM algorithm, whose generalization ability is markedly better than that of other models, to improve the BP neural network algorithm used for training on and predicting from cancer pathway data. Compared with the prior art, the individual-based pathway activity algorithm provided by the invention does not depend on a population, eliminates the batch effects arising from heterogeneity among cancer samples and from cross-platform sequencing, reflects the metabolic level of an individual more comprehensively and stably, improves the performance of cancer diagnosis and prognosis prediction, and provides a reference for subsequent research on cancer heterogeneity and the development of personalized medicine.
Drawings
FIG. 1 is a flow chart of an individual-based pathway activity algorithm in accordance with the present invention;
FIG. 2 is a comparison of different pathway activity algorithms for cancer diagnosis efficiency;
FIG. 3 is a graph comparing the efficiency of different pathway activity algorithms in cancer diagnosis in independent cancer datasets;
FIG. 4 is a graph comparing the efficiency of different pathway activity algorithms for prognosis prediction of cancer;
FIG. 5 is a schematic diagram of a cancer diagnosis and prognosis prediction system based on individual pathway activity;
FIG. 6 is a schematic structural diagram of an electronic device.
Detailed Description
The invention is described in further detail below with reference to the drawings and examples. The examples are given solely for the purpose of illustration and are not intended to limit the scope of the invention.
As shown in FIG. 5, the cancer diagnosis and prognosis prediction system based on individual pathway activity of the invention comprises:
a data acquisition module for acquiring transcriptome sequencing data of an individual to be tested;
a normalization module for normalizing the transcriptome sequencing data of the individual to be tested and converting the normalized gene expression values into gene rank values;
a pathway activity calculation module for calculating pathway activity from the gene rank values;
and a diagnosis prediction module for performing cancer diagnosis and prognosis prediction, using a cancer diagnosis model and a cancer prognosis prediction model respectively, based on the pathway activity data.
Specific examples are given below to illustrate the modules of the invention.
Taking TCGA pan-cancer samples as an example, cancer diagnosis and prognosis models are constructed by the method of the invention, as described in detail below.
As shown in FIG. 1, the normalization module is specifically configured to carry out the following method, comprising steps P1-P3.
P1: normalizing the TCGA transcriptome sequencing data; for each sample, ranking the normalized gene expression values within the sample from smallest to largest, and taking the rank as the expression level of each gene.
P2: to keep small variations from affecting the overall level while highlighting the influence of genes with large variation, assigning the same value to all genes whose ranks fall within the same quantile interval, according to the quantiles of the rank values, thereby obtaining the final expression level of each gene.
P3: collecting and collating information on all biological pathways from the KEGG database, and extracting the list of genes participating in each pathway; for each sample, calculating the mean of the expression levels obtained in P2 over the genes in a pathway to obtain the activity level of that pathway.
The pathway activity calculation module is specifically configured to: collect and collate information on all biological pathways from the KEGG database and extract the list of genes participating in each pathway; calculate the mean of the expression levels of the genes in a pathway to obtain the activity level of that pathway; and calculate the activity levels of all KEGG pathways in batch to obtain the pathway activity.
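Steps P1-P3 can be sketched for a single sample as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function names, the toy gene values, the toy pathway gene lists and the choice of four equal-width rank intervals (`n_bins=4`) are all hypothetical, since the text does not specify the number of intervals or how ties are handled.

```python
from statistics import mean


def rank_transform(expression):
    """P1: rank genes within one sample from lowest to highest expression.

    Ranks are 1-based. Tied values keep their sorted order, which is an
    assumption; the source does not say how ties are handled.
    """
    ordered = sorted(expression, key=lambda gene: expression[gene])
    return {gene: rank for rank, gene in enumerate(ordered, start=1)}


def bin_ranks(ranks, n_bins=4):
    """P2: give every gene in the same rank interval the same value.

    n_bins is a hypothetical parameter; intervals are equal-width in rank.
    """
    n = len(ranks)
    # Maps ranks 1..n onto interval labels 1..n_bins.
    return {gene: (rank - 1) * n_bins // n + 1 for gene, rank in ranks.items()}


def pathway_activity(levels, pathways):
    """P3: mean binned level of the pathway genes measured in the sample."""
    activity = {}
    for name, genes in pathways.items():
        present = [levels[g] for g in genes if g in levels]
        if present:  # skip pathways with no measured genes
            activity[name] = mean(present)
    return activity


# Toy normalized expression values for one individual (illustrative only).
sample = {
    "TP53": 5.1, "MYC": 9.8, "EGFR": 0.4, "KRAS": 7.2,
    "BRCA1": 2.3, "AKT1": 6.0, "PTEN": 1.1, "CDK4": 8.5,
}
# Toy gene lists standing in for KEGG pathways (not real KEGG content).
pathways = {
    "toy_pathway_A": ["MYC", "KRAS", "CDK4"],
    "toy_pathway_B": ["EGFR", "PTEN", "BRCA1", "NOT_MEASURED"],
}

levels = bin_ranks(rank_transform(sample), n_bins=4)
activity = pathway_activity(levels, pathways)
print(activity)
```

Only the ordering of genes inside the one sample is used, so the same code applies unchanged to data from a different sequencing platform.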
The cancer diagnosis model is constructed by the following method:
The invention constructs a cancer diagnosis model using the pathway activity data of pan-cancer samples. The specific procedure is as follows:
P4: constructing the cancer diagnosis model using machine learning.
Specifically: the calculated TCGA pathway activity data are used to construct the cancer diagnosis model. For each cancer, the sample dataset is randomly divided into a training set and a test set at a ratio of 7:3. A BP neural network with two hidden layers is created using the training set data, and the network is trained with the LM algorithm to optimize its predictive ability.
P5: plotting the receiver operating characteristic (ROC) curve of the model, calculating the area under the ROC curve (AUC), evaluating the predictive performance of the model, and comparing it with existing pathway activity algorithms.
Specifically: with reference to the literature, pathway activity algorithms with good predictive performance (PLAGE, Pathifier, iPAS and individPath) are selected, and the KEGG pathway activity of all TCGA samples is calculated with each of them. Using each set of pathway activity data, a cancer diagnosis model is constructed according to the method of P4. An ROC curve is plotted and the AUC calculated for each model, and the predictive performance of the present algorithm is compared with that of the existing algorithms.
P6: validating the model in independent cancer datasets.
Specifically: transcriptome sequencing data for each cancer are collected from the Gene Expression Omnibus (GEO) database, and for each cancer the independent dataset with the largest number of samples is selected as validation data. The data are normalized, and the pathway activity level of each sample in each dataset is calculated. The cancer prediction model constructed in P4 is validated using the pathway activity data, the ROC curve is plotted and the AUC calculated, and the predictive performance of the present algorithm is compared with that of the existing algorithms.
P7: validating the model in an independent dataset based on liquid biopsy.
Specifically: transcriptome sequencing data from tumor-educated platelets (TEPs) are collected and normalized. The pathway activity level of each sample is calculated, and the cancer prediction model constructed in P4 is validated.
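The random 7:3 split of P4 and the AUC evaluation of P5-P7 can be sketched as below. This is a hedged illustration only: `split_train_test`, `roc_auc`, the seed and the toy labels and scores are hypothetical, and the two-hidden-layer BP network trained with the LM algorithm is not reproduced here. The AUC is computed through its pairwise-ranking interpretation rather than by plotting the ROC curve.

```python
import random


def split_train_test(samples, train_fraction=0.7, seed=0):
    """Randomly split samples into training and test sets (7:3 by default)."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def roc_auc(labels, scores):
    """Area under the ROC curve.

    Computed as the probability that a randomly chosen positive sample
    receives a higher score than a randomly chosen negative one; tied
    scores count as one half.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Ten hypothetical sample identifiers split 7:3.
train, test = split_train_test(range(10))

# Toy labels (1 = tumor, 0 = normal) and toy model output scores.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]
auc = roc_auc(labels, scores)
print(len(train), len(test), auc)
```

The pairwise form is quadratic in the number of samples, which is acceptable for a sketch; a sort-based computation would be the usual choice for large validation sets.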
The prognosis prediction model is constructed by the following method:
Combining prognosis data, the invention constructs a cancer prognosis prediction model based on individual pathway activity. The specific procedure is as follows:
P8: for each cancer, performing survival analysis on each pathway using the TCGA pathway activity data combined with the clinical prognosis data of the samples, and screening for pathways that significantly affect patient survival time (P < 0.05).
P9: for each cancer, constructing a univariate Cox regression model for each pathway that significantly affects patient survival time.
P10: screening out the pathways that are significant in univariate Cox regression (P < 0.05), and further screening out representative pathways using Lasso regression to construct a multivariate Cox regression model.
P11: calculating the concordance index (C-index) of the multivariate Cox regression model, and comparing the prognosis prediction performance of the present algorithm with that of existing algorithms.
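The concordance index used in P11 can be illustrated with a short sketch of Harrell's C-index. The function name, the toy survival times, event flags and risk scores are hypothetical; it is assumed here that the risk score is the linear predictor of the multivariate Cox model, which the text does not state explicitly.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C-index for right-censored survival data.

    A pair (i, j) is comparable when subject i has an observed event
    (events[i] == 1) strictly before times[j]. The pair is concordant
    when the subject with the shorter survival has the higher risk
    score; tied risk scores count as one half.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data: survival time, event flag (1 = death observed, 0 = censored),
# and a hypothetical risk score per patient.
times = [5, 8, 12, 20]
events = [1, 1, 0, 1]
risk = [2.1, 0.7, 1.5, 0.3]

c_index = concordance_index(times, events, risk)
print(c_index)
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, so models built from different pathway activity algorithms can be compared on the same scale.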
Table 1. Predictive performance of the invention on independent tumor-educated platelet data
[Table 1 is provided as an image (BDA0003069475240000091) in the original publication; its contents are not reproduced here.]
Experimental results: the invention constructs cancer diagnosis and prognosis prediction models based on individual pathway activity.
Compared with existing methods, the pan-cancer diagnosis and prognosis prediction models constructed by the invention show better predictive performance (FIGS. 2-4) and have high clinical application value in liquid biopsy based on tumor-educated platelets (Table 1).
As shown in FIG. 6, a second object of the invention is to provide an electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the steps of the cancer diagnosis and prognosis prediction method based on individual pathway activity.
The cancer diagnosis and prognosis prediction method based on individual pathway activity comprises the following steps:
acquiring transcriptome sequencing data of an individual to be tested;
normalizing the transcriptome sequencing data of the individual to be tested, and converting the normalized gene expression values into gene rank values;
calculating the pathway activity from the gene rank values;
and performing cancer diagnosis and prognosis prediction, using a cancer diagnosis model and a cancer prognosis prediction model respectively, based on the pathway activity data.
Converting the normalized gene expression values into gene rank values specifically comprises:
normalizing the transcriptome sequencing data of the individual to be tested; ranking the normalized gene expression values from smallest to largest and taking the rank as the expression level of each gene; and, according to the quantiles of the rank values, assigning the same value to all genes whose ranks fall within the same quantile interval, thereby obtaining the final expression level of each gene.
Calculating the pathway activity from the gene rank values specifically comprises:
collecting and collating information on all biological pathways from the KEGG database and extracting the list of genes participating in each pathway; calculating the mean of the expression levels of the genes in a pathway to obtain the activity level of that pathway; and calculating the activity levels of all KEGG pathways in batch to obtain the pathway activity.
A third object of the invention is to provide a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the cancer diagnosis and prognosis prediction method based on individual pathway activity.
The cancer diagnosis and prognosis prediction method based on individual pathway activity comprises the following steps:
acquiring transcriptome sequencing data of an individual to be tested;
normalizing the transcriptome sequencing data of the individual to be tested, and converting the normalized gene expression values into gene rank values;
calculating the pathway activity from the gene rank values;
and performing cancer diagnosis and prognosis prediction, using a cancer diagnosis model and a cancer prognosis prediction model respectively, based on the pathway activity data.
It will be appreciated by those skilled in the art that embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
Finally, it should be noted that: the above embodiments are only for illustrating the technical aspects of the present invention and not for limiting the same, and although the present invention has been described in detail with reference to the above embodiments, it should be understood by those of ordinary skill in the art that: modifications and equivalents may be made to the specific embodiments of the invention without departing from the spirit and scope of the invention, which is intended to be covered by the claims.

Claims (5)

1. A cancer diagnosis and prognosis prediction system based on individual pathway activity, comprising:
a data acquisition module for acquiring transcriptome sequencing data of an individual to be tested;
a normalization module for normalizing the transcriptome sequencing data of the individual to be tested and converting the normalized gene expression values into gene rank values;
a pathway activity calculation module for calculating pathway activity from the gene rank values;
a diagnosis prediction module for performing cancer diagnosis and prognosis prediction, using a cancer diagnosis model and a cancer prognosis prediction model respectively, based on the pathway activity data;
wherein the normalization module is specifically configured to: normalize the transcriptome sequencing data of the individual to be tested; rank the normalized gene expression values from smallest to largest and take the rank as the expression level of each gene; and, according to the quantiles of the rank values, assign the same value to all genes whose ranks fall within the same quantile interval, thereby obtaining the final expression level of each gene;
the cancer diagnosis model is constructed by the following method:
calculating the pathway activity of the samples in a database;
constructing a cancer diagnosis model using the pathway activity data:
for each cancer, randomly dividing the sample dataset into a training set and a test set; creating a back propagation neural network with two hidden layers using the training set data, training the network with the LM algorithm to optimize its predictive ability, and finally testing the performance of the model on the test set;
validating and optimizing the model using independent cancer datasets to obtain the cancer diagnosis model;
the prognosis prediction model is constructed by the following method:
for each cancer, performing survival analysis on each pathway using the pathway activity data combined with the clinical prognosis data of the samples, and screening for pathways that significantly affect patient survival time;
for each cancer, constructing a univariate Cox regression model for each pathway that significantly affects patient survival time;
screening out the pathways that are significant in univariate Cox regression, and further screening out representative pathways using Lasso regression;
for each cancer, constructing a multivariate Cox regression model from the resulting representative pathways to obtain the prognosis prediction model.
2. The system of claim 1, wherein
the pathway activity calculation module is specifically configured to: collect and organize information on all biological pathways from the KEGG database and extract the list of genes participating in each pathway; calculate the mean expression level of the genes in a pathway to obtain the activity level of that pathway; and calculate the activity levels of all KEGG pathways in batch to obtain the pathway activity.
3. An electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the steps of a method for cancer diagnosis and prognosis prediction based on individual pathway activity;
the method for cancer diagnosis and prognosis prediction based on individual pathway activity comprises the following steps:
acquiring transcriptome sequencing data of an individual to be tested;
standardizing the transcriptome sequencing data of the individual to be tested, and converting the standardized gene expression values into gene rank values;
calculating pathway activity from the gene rank values;
performing cancer diagnosis and prognosis prediction from the pathway activity data, using a cancer diagnosis model and a cancer prognosis prediction model, respectively;
wherein standardizing the transcriptome sequencing data of the individual to be tested and converting the standardized gene expression values into gene rank values specifically comprises:
standardizing the transcriptome sequencing data of the individual to be tested; sorting the standardized gene expression values in ascending order and taking each gene's rank as its expression level; and, according to the level into which each rank value falls, assigning the same value to all gene ranks within a given level, thereby obtaining the final expression level of each gene;
the cancer diagnosis model is constructed by the following method:
calculating the pathway activity of the samples in a database;
constructing the cancer diagnosis model from the pathway activity data:
for each cancer, randomly dividing the sample data set into a training set and a test set; creating a back-propagation neural network with two hidden layers from the training set data, training the network with the LM algorithm to optimize its predictive ability, and finally testing the performance of the model on the test set;
validating and optimizing the model with an independent cancer data set to obtain the cancer diagnosis model;
the cancer prognosis prediction model is constructed by the following method:
for each cancer, performing a survival analysis on each pathway using the pathway activity data together with the clinical prognosis data of the samples, and screening for pathways that significantly affect patient survival time;
for each cancer, constructing a separate univariate Cox regression model for each pathway that significantly affects patient survival time;
selecting the pathways that are significant in the univariate Cox regression, and further selecting representative pathways by Lasso regression;
for each cancer, constructing a multivariate Cox regression model from the resulting representative pathways to obtain the prognosis prediction model.
4. The electronic device of claim 3, wherein
calculating the pathway activity specifically comprises:
collecting and organizing information on all biological pathways from the KEGG database and extracting the list of genes participating in each pathway; calculating the mean expression level of the genes in a pathway to obtain the activity level of that pathway; and calculating the activity levels of all KEGG pathways in batch to obtain the pathway activity.
5. A computer-readable storage medium storing a computer program which, when executed by a processor, performs the steps of a method for cancer diagnosis and prognosis prediction based on individual pathway activity;
the method for cancer diagnosis and prognosis prediction based on individual pathway activity comprises the following steps:
acquiring transcriptome sequencing data of an individual to be tested;
standardizing the transcriptome sequencing data of the individual to be tested, and converting the standardized gene expression values into gene rank values;
calculating pathway activity from the gene rank values;
performing cancer diagnosis and prognosis prediction from the pathway activity data, using a cancer diagnosis model and a cancer prognosis prediction model, respectively;
wherein standardizing the transcriptome sequencing data of the individual to be tested and converting the standardized gene expression values into gene rank values specifically comprises:
standardizing the transcriptome sequencing data of the individual to be tested; sorting the standardized gene expression values in ascending order and taking each gene's rank as its expression level; and, according to the level into which each rank value falls, assigning the same value to all gene ranks within a given level, thereby obtaining the final expression level of each gene;
the cancer diagnosis model is constructed by the following method:
calculating the pathway activity of the samples in a database;
constructing the cancer diagnosis model from the pathway activity data:
for each cancer, randomly dividing the sample data set into a training set and a test set; creating a back-propagation neural network with two hidden layers from the training set data, training the network with the LM algorithm to optimize its predictive ability, and finally testing the performance of the model on the test set;
validating and optimizing the model with an independent cancer data set to obtain the cancer diagnosis model;
the cancer prognosis prediction model is constructed by the following method:
for each cancer, performing a survival analysis on each pathway using the pathway activity data together with the clinical prognosis data of the samples, and screening for pathways that significantly affect patient survival time;
for each cancer, constructing a separate univariate Cox regression model for each pathway that significantly affects patient survival time;
selecting the pathways that are significant in the univariate Cox regression, and further selecting representative pathways by Lasso regression;
for each cancer, constructing a multivariate Cox regression model from the resulting representative pathways to obtain the prognosis prediction model.
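
As a minimal illustrative sketch, and not part of the claims, the standardization, ranking, and pathway activity steps recited above might look as follows. The z-score standardization, the number of rank levels `n_bins`, the equal-sized rank intervals, and the function names `rank_expression` and `pathway_activity` are all assumptions made for illustration; the claims do not specify them. The gene and pathway names in the usage example are hypothetical placeholders, not real KEGG entries.

```python
import numpy as np


def rank_expression(values, n_bins=10):
    """Standardize one individual's expression values, rank them in
    ascending order, and collapse the ranks into n_bins shared levels."""
    values = np.asarray(values, dtype=float)
    n = len(values)

    # Assumed standardization: z-score within the individual. It is
    # monotonic, so it does not change the ranking computed below.
    std = values.std()
    z = (values - values.mean()) / std if std > 0 else np.zeros(n)

    # Ascending ranks 1..n; ties are broken by input order (stable sort).
    order = np.argsort(z, kind="stable")
    ranks = np.empty(n, dtype=int)
    ranks[order] = np.arange(1, n + 1)

    # Genes whose ranks fall in the same interval receive the same level
    # (1..n_bins). Integer ceiling division avoids floating-point error.
    return (ranks * n_bins + n - 1) // n


def pathway_activity(level_by_gene, pathways):
    """Pathway activity = mean expression level of the pathway's genes.
    Genes absent from the individual's data are skipped."""
    activity = {}
    for name, genes in pathways.items():
        present = [level_by_gene[g] for g in genes if g in level_by_gene]
        if present:
            activity[name] = float(np.mean(present))
    return activity


# Hypothetical toy input: six genes and two made-up pathways.
genes = ["G1", "G2", "G3", "G4", "G5", "G6"]
levels = rank_expression([5.0, 1.0, 3.0, 9.0, 7.0, 2.0], n_bins=3)
toy_pathways = {"pathway_A": ["G1", "G4"], "pathway_B": ["G2", "G6", "G9"]}
activity = pathway_activity(dict(zip(genes, levels)), toy_pathways)
print(levels.tolist())  # [2, 1, 2, 3, 3, 1]
print(activity)         # {'pathway_A': 2.5, 'pathway_B': 1.0}
```

Because the levels depend only on the ordering of genes within one individual, the resulting pathway activities can be computed for a single sample without reference to a cohort, which is consistent with the "individual pathway activity" framing of the claims.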
CN202110535516.6A 2021-05-17 2021-05-17 Cancer diagnosis and prognosis prediction system, apparatus and storage medium based on individual pathway activity Active CN113284611B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110535516.6A CN113284611B (en) 2021-05-17 2021-05-17 Cancer diagnosis and prognosis prediction system, apparatus and storage medium based on individual pathway activity

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110535516.6A CN113284611B (en) 2021-05-17 2021-05-17 Cancer diagnosis and prognosis prediction system, apparatus and storage medium based on individual pathway activity

Publications (2)

Publication Number Publication Date
CN113284611A (en) 2021-08-20
CN113284611B (en) 2023-06-06

Family

ID=77279463

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110535516.6A Active CN113284611B (en) 2021-05-17 2021-05-17 Cancer diagnosis and prognosis prediction system, apparatus and storage medium based on individual pathway activity

Country Status (1)

Country Link
CN (1) CN113284611B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107341366A (en) * 2017-07-19 2017-11-10 西安交通大学 A kind of method that complex disease susceptibility loci is predicted using machine learning
CN109599157A (en) * 2018-11-29 2019-04-09 同济大学 A kind of accurate intelligent diagnosis and treatment big data system
CN110706749A (en) * 2019-09-10 2020-01-17 至本医疗科技(上海)有限公司 Cancer type prediction system and method based on tissue and organ differentiation hierarchical relation
WO2020232548A1 (en) * 2019-05-21 2020-11-26 Ontario Institute For Cancer Research (Oicr) Pan-cancer transcriptional signature
CN112725454A (en) * 2021-02-03 2021-04-30 山东第一医科大学附属省立医院(山东省立医院) Bladder cancer patient overall survival rate prognosis model

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CA3044709A1 (en) * 2016-11-25 2018-05-31 Koninklijke Philips N.V. Method to distinguish tumor suppressive foxo activity from oxidative stress

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
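Title
Interpretation and Application of the "Pan-Cancer Atlas"; 王思琪; 唐明; 王梁华; 焦炳华; 孙铭娟; Chemistry of Life (生命的化学) (No. 05); full text *
Pujan Joshi; Brent Basso; Honglin Wang; Seung-Hyun Hong; Charles Giardina; Dong-Guk Shin. Identification of Key Biological Pathway Routes in Cancer Cohorts. IEEE. 2021, full text. *
Ovarian Cancer Subtype Analysis Based on Multi-Dimensional Genomics; 孟令豪; 章琳; 厉力华; Journal of Hangzhou Dianzi University (Natural Sciences) (杭州电子科技大学学报(自然科学版)) (No. 04); full text *
Analysis of Gastric Cancer Prognostic Genes Based on Tumor Stromal Score; 罗安; 朱欣彦; 胡晔东; 刘雁冰; 冉晨曦; 刘菲; Journal of Tongji University (Medical Science) (同济大学学报(医学版)) (No. 04); full text *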

Also Published As

Publication number Publication date
CN113284611A (en) 2021-08-20

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant