CN112599207A - Cancer drug sensitivity prediction method based on pathway activity and elastic net - Google Patents


Info

Publication number
CN112599207A
Authority
CN
China
Prior art keywords
drug
matrix
pathway
pathway activity
cell line
Legal status
Pending
Application number
CN202011539984.2A
Other languages
Chinese (zh)
Inventor
秦玉芳
高冲
陈明
宋春晖
孙浩
Current Assignee
Shanghai Ocean University
Original Assignee
Shanghai Ocean University
Application filed by Shanghai Ocean University

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16C COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
    • G16C20/00 Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
    • G16C20/30 Prediction of properties of chemical compounds, compositions or mixtures
    • G16C20/70 Machine learning, data mining or chemometrics

Landscapes

  • Engineering & Computer Science (AREA)
  • Chemical & Material Sciences (AREA)
  • Crystallography & Structural Chemistry (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computing Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The invention provides a cancer drug sensitivity prediction method based on pathway activity and the elastic net. First, for each pathway network, the expression values of key genes (genes closely connected to other genes in that network) are selected from the gene expression matrix and used to compute a pathway activity vector; all pathway activity vectors are then combined into a matrix. Next, the molecular descriptors of each drug, forming a chemical feature matrix, are merged with the pathway activity feature matrix into a new feature matrix. Finally, an elastic net is trained on the feature matrices of the 24 drugs and its parameters are optimized, with mean squared error (MSE) as the evaluation criterion. The invention can effectively predict the sensitivity level of each drug and identify its biomarkers, achieves high accuracy on most drugs, and can help clinical experiments reduce time and cost, thereby effectively predicting drug response. The method learns latent relationships of anticancer drugs from multi-dimensional features and offers better biological interpretability.

Description

Cancer drug sensitivity prediction method based on pathway activity and elastic net
Technical Field
The invention belongs to the technical fields of bioinformatics and machine learning. It particularly relates to a cancer drug sensitivity prediction method based on pathway activity and elastic net regression, and more specifically to a method for predicting which cancer cell lines are sensitive to 24 anticancer drugs.
Background
Determining whether a patient will respond to an anticancer drug typically takes months of trial and error, during which the drug may be misused. Developing effective methods for predicting anticancer drug response has therefore been an important topic in oncology research.
At present, most methods predict drug sensitivity by building prediction models based on single genes. They do not consider the interactions among genes within a pathway or the key genes of the pathway. Moreover, because of biological diversity, such models have low reproducibility, and the biological relevance of newly discovered gene-drug associations is difficult to explain.
Disclosure of Invention
In view of these deficiencies of the prior art, an object of the present invention is to provide a method for predicting cancer drug sensitivity based on pathway activity and the elastic net. By representing the activity level of each pathway with the expression levels of the key genes in that pathway, the model achieves high prediction accuracy, predicts drug sensitivity effectively, and offers better biological interpretability.
To achieve the above object, the invention adopts the following solution:
A method for predicting cancer drug sensitivity based on pathway activity and the elastic net, comprising the following steps:
S1: acquire gene expression feature data of cancer cell lines and chemical feature data of anticancer drugs;
S2: download and integrate data from the KEGG pathway database to obtain an interaction network table of the genes in each pathway (gene set), and select from this table the key genes, i.e. the genes that are tightly connected to other genes in the network and are present in the gene expression feature data of the cancer cell lines;
S3: calculate the activity vector of each pathway, and combine all pathway activity vectors into a pathway activity feature matrix;
S4: integrate the chemical feature data of the anticancer drugs from step S1 into a drug chemical feature matrix, and concatenate it with the pathway activity feature matrix from step S3 to obtain a new matrix containing both the cancer cell line features and the drug chemical features;
S5: based on steps S1-S4, construct an elastic net model for predicting anticancer drug sensitivity, feed the new matrix containing the cancer cell line features and the drug chemical features into the model as the feature tensor for training and prediction, and verify the reproducibility of the model.
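As a rough illustration of step S4, the sketch below assembles one feature row per (cell line, drug) pair by concatenating the cell line's pathway activity vector with the drug's chemical descriptor vector, paired with the measured IC50 as the target. It is a minimal sketch under assumptions: the dictionary layout and the function name `build_feature_matrix` are hypothetical, and since the text does not say how pairs with missing features are handled, they are simply skipped here.

```python
from typing import Dict, List, Tuple

def build_feature_matrix(
    pathway_activity: Dict[str, List[float]],  # cell line -> pathway activity vector
    drug_descriptors: Dict[str, List[float]],  # drug -> chemical descriptor vector
    ic50: Dict[Tuple[str, str], float],        # (cell line, drug) -> measured IC50
) -> Tuple[List[List[float]], List[float], List[Tuple[str, str]]]:
    """Hypothetical helper: concatenate cell line and drug features per pair."""
    X: List[List[float]] = []
    y: List[float] = []
    keys: List[Tuple[str, str]] = []
    # Sorting the pairs makes the row order deterministic.
    for (cell_line, drug), value in sorted(ic50.items()):
        # Assumption: pairs lacking either feature vector are skipped.
        if cell_line not in pathway_activity or drug not in drug_descriptors:
            continue
        # Pathway activity features first, then the drug's chemical descriptors.
        X.append(list(pathway_activity[cell_line]) + list(drug_descriptors[drug]))
        y.append(value)
        keys.append((cell_line, drug))
    return X, y, keys
```

Each returned row has length (number of pathways) + (number of descriptors), so the same column layout holds for every drug.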
Preferably, in step S1, the chemical features of the anticancer drugs and the gene expression features of the cancer cell lines are integrated from public Internet databases, forming a two-dimensional drug chemical feature tensor and a two-dimensional gene expression feature tensor that correspond to the known sensitivity levels of the anticancer drugs. The pathway gene interaction network data are downloaded from the pathway database.
Preferably, in step S2, the two-dimensional gene expression feature tensor of the cancer cell lines and the two-dimensional chemical feature tensor of the anticancer drugs are normalized; genes with a high degree in the pathway gene interaction network are then selected as key genes, and the mean or variance of their expression levels is taken. Because drug response is the synergistic behavior of multiple genes, applying the key genes of the gene interaction networks in pathways to drug sensitivity prediction improves the prediction performance.
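A minimal sketch of the key-gene selection in step S2, assuming each pathway's interaction network table is available as a list of undirected gene-gene edges. The function name, the edge-list format and the default of keeping the top 10% by degree (the fraction reported in section 2.1) are illustrative assumptions; ties are broken alphabetically only to make the output deterministic.

```python
from collections import Counter
from typing import Iterable, List, Optional, Set, Tuple

def key_genes_by_degree(
    edges: Iterable[Tuple[str, str]],
    expressed: Optional[Set[str]] = None,  # genes present in the expression matrix
    top_fraction: float = 0.10,
) -> List[str]:
    """Hypothetical helper: return the highest-degree genes of one pathway."""
    degree: Counter = Counter()
    seen: Set[Tuple[str, str]] = set()
    for a, b in edges:
        if a == b:
            continue  # ignore self-loops
        pair = (a, b) if a < b else (b, a)
        if pair in seen:
            continue  # count each undirected interaction only once
        seen.add(pair)
        degree[a] += 1
        degree[b] += 1
    # Only genes measured in the expression matrix can contribute to activity.
    candidates = [g for g in degree if expressed is None or g in expressed]
    if not candidates:
        return []
    k = max(1, round(top_fraction * len(candidates)))
    # Highest degree first; ties broken alphabetically for reproducibility.
    ranked = sorted(candidates, key=lambda g: (-degree[g], g))
    return ranked[:k]
```

Rounding rather than truncating keeps at least one gene for small pathways, which is a design choice of this sketch, not something the text specifies.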
Preferably, in step S3, for every pathway obtained in step S1, the mean or variance of the expression levels of all key genes selected in step S2 is computed for each sample (cancer cell line), giving one pathway activity vector per pathway. Each pathway activity vector is thus the mean or variance, in the gene expression matrix, of the expression levels of the key genes in that pathway's interaction network table; after all pathways are combined, the features of the pathway activity matrix are pathways rather than genes.
Preferably, in step S4, the pathway activity feature matrix and the drug chemical feature matrix are merged into a new feature matrix, i.e. a new matrix containing both the cancer cell line features and the drug chemical features, which serves as the input of the elastic net model for regression prediction.
Preferably, in step S5, an elastic net model is constructed from the new matrix and the drug sensitivity levels obtained in step S4: the new matrix containing the cancer cell line features and the drug chemical features is the model input, the drug sensitivity level is the model output, and the model is validated while its tuning parameters are optimized according to the mean squared error (MSE).
Owing to the above solution, the invention has the following beneficial effects:
First, the method of the invention, which predicts cancer drug sensitivity using pathway activity and the elastic net, achieves high accuracy on most drugs and can help clinical experiments reduce time and cost, thereby effectively predicting drug response.
Second, by combining the chemical features of anticancer drugs with the pathway activity features of cancer cell lines, the method can learn latent relationships of anticancer drugs from multi-dimensional features and offers better biological interpretability.
Drawings
FIG. 1 is a flow chart of pathway activity inference in an embodiment of the invention.
FIG. 2 shows the prediction performance of the 24 drugs on the validation set in an embodiment of the invention.
FIG. 3 shows the pathway activity of drug PF2341066 in sensitive versus resistant cell lines in an embodiment of the invention.
FIG. 4 shows the pathway activity of drug 17-AAG in sensitive versus resistant cell lines in an embodiment of the invention.
Detailed Description
The invention provides a cancer drug sensitivity prediction method based on pathway activity and the elastic net.
The present invention will be further described with reference to the following example.
Example:
The method for predicting cancer drug sensitivity based on pathway activity and the elastic net in this embodiment comprises the following.
1 Materials and analysis
1.1 Data source
The data used for model training and prediction in the invention come from publicly available datasets. Specifically, the chemical features of the anticancer drugs were computed with the alvaDesc software; the gene expression data and the anticancer drug sensitivity values of the cancer cell lines come from the CCLE database (Cancer Cell Line Encyclopedia, https://portals.broadinstitute.org/ccle/); and the pathway data come from the KEGG database (Kyoto Encyclopedia of Genes and Genomes, https://www.genome.jp/kegg/).
1.2 Model method
FIG. 1 is a flow chart of the pathway activity inference of the invention. The invention first computes pathway activity vectors from the expression levels of the highly connected genes (key genes) in each pathway gene interaction network, then combines all pathway activity vectors into a pathway activity feature matrix. This matrix is merged with the drug chemical feature matrix into a new matrix, which serves as the input of the elastic net model that finally predicts drug sensitivity.
1.2.1 Pathway activity inference
Inferring pathway activity comprises the following steps:
First step: normalize the gene expression data. For each dimension of the data tensor, subtract the mean of that dimension from every element and divide the result by the standard deviation of that dimension; the resulting tensor is the normalized data.
Second step: take the mean or variance of the expression levels of the highly connected genes in each pathway gene interaction network as that pathway's activity vector.
Third step: combine all pathway activity vectors into a pathway activity feature matrix.
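The three steps above can be sketched as follows, assuming the expression matrix is stored as a mapping from gene to its values across cell lines and that the key genes of each pathway are already known. The population standard deviation is used for the z-score, constant genes are mapped to zero, and a pathway with no measured key gene gets activity 0.0; these choices and the names `zscore_genes` and `pathway_activity_matrix` are hypothetical, not taken from the patent.

```python
import statistics
from typing import Dict, List, Tuple

def zscore_genes(expr: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """First step: normalize each gene across cell lines, (x - mean) / SD."""
    z = {}
    for gene, values in expr.items():
        mu = statistics.fmean(values)
        sd = statistics.pstdev(values)  # assumption: population SD
        # A constant gene carries no information; map it to all zeros.
        z[gene] = [0.0] * len(values) if sd == 0 else [(v - mu) / sd for v in values]
    return z

def pathway_activity_matrix(
    expr: Dict[str, List[float]],
    key_genes: Dict[str, List[str]],  # pathway -> key genes (e.g. top-degree genes)
    stat: str = "mean",               # "mean" or "variance", as in the text
) -> Tuple[List[str], List[List[float]]]:
    """Second and third steps: one row per cell line, one column per pathway."""
    z = zscore_genes(expr)
    n_samples = len(next(iter(z.values())))
    agg = statistics.fmean if stat == "mean" else statistics.pvariance
    pathways = sorted(key_genes)
    matrix = []
    for s in range(n_samples):
        row = []
        for p in pathways:
            # Key genes absent from the expression matrix are ignored.
            vals = [z[g][s] for g in key_genes[p] if g in z]
            row.append(float(agg(vals)) if vals else 0.0)
        matrix.append(row)
    return pathways, matrix
```

With roughly 20,000 genes and 388 pathways, as reported in section 2.1, this mapping is what produces the large reduction in feature dimension.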
1.2.2 Model construction
The pathway activity feature matrix and the drug chemical feature matrix are combined into a new feature matrix containing both the cancer cell line features and the drug chemical features. This matrix serves as the input of the elastic net model, with the corresponding cancer drug sensitivity (IC50) values as targets. A grid search is used to select the optimal parameters for predicting the drug sensitivity level; in addition, the pathway features are ranked by their elastic net model coefficients, and the top 10% are selected for analysis.
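The patent fits the model with R's glmnet. Purely to illustrate what such a fit does, the following pure-Python sketch minimizes an elastic net objective of the glmnet form by cyclic coordinate descent, selects (lam, alpha) by grid search on validation MSE, and ranks features by absolute coefficient to keep the top 10%. It is a simplified stand-in, not glmnet itself: there is no feature standardization, warm start or regularization path, and all function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]

def soft_threshold(z: float, gamma: float) -> float:
    """Lasso shrinkage operator S(z, gamma)."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0

def elastic_net_cd(X: Matrix, y: List[float], lam: float, alpha: float,
                   n_iter: int = 1000, tol: float = 1e-10) -> Tuple[float, List[float]]:
    """Minimize (1/2n)||y - b0 - Xb||^2 + lam*((1-alpha)/2 ||b||^2 + alpha ||b||_1)."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    # Centering lets the intercept be recovered after fitting the slopes.
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    resid = [v - y_mean for v in y]  # residual y - Xb for b = 0
    col_sq = [sum(r[j] ** 2 for r in Xc) / n for j in range(p)]
    beta = [0.0] * p
    for _ in range(n_iter):
        max_step = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # constant feature: coefficient stays 0
            # Inner product of feature j with the partial residual excluding it.
            rho = sum(Xc[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam * alpha) / (col_sq[j] + lam * (1.0 - alpha))
            step = new - beta[j]
            if step != 0.0:
                for i in range(n):
                    resid[i] -= Xc[i][j] * step
                beta[j] = new
                max_step = max(max_step, abs(step))
        if max_step < tol:
            break
    b0 = y_mean - sum(m * b for m, b in zip(x_mean, beta))
    return b0, beta

def predict(b0: float, beta: List[float], X: Matrix) -> List[float]:
    return [b0 + sum(b * x for b, x in zip(beta, row)) for row in X]

def mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)

def grid_search(X_tr, y_tr, X_val, y_val, lams, alphas):
    """Return (validation MSE, lam, alpha, b0, beta) with the lowest MSE."""
    best = None
    for alpha in alphas:
        for lam in lams:
            b0, beta = elastic_net_cd(X_tr, y_tr, lam, alpha)
            score = mse(y_val, predict(b0, beta, X_val))
            if best is None or score < best[0]:
                best = (score, lam, alpha, b0, beta)
    return best

def top_features(names: List[str], beta: List[float], fraction: float = 0.10) -> List[str]:
    """Rank features by |coefficient| and keep the top fraction (at least one)."""
    k = max(1, round(fraction * len(beta)))
    order = sorted(range(len(beta)), key=lambda j: (-abs(beta[j]), j))
    return [names[j] for j in order[:k]]
```

For real data with hundreds of pathway and descriptor features, a vectorized library implementation (glmnet, as used in the patent) is the practical choice; the loops here only make the update rule explicit.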
2 Results and analysis
2.1 Pathway activity inference
Pathway activity inference takes into account the interactions among the genes of each pathway: the genes whose degree ranks in the top 10% of the pathway are selected, and the gene relationships within the pathway are used to convert the gene features of the gene expression matrix into the pathway features of the pathway activity matrix. This converts the more than 20,000 gene features of the original gene expression matrix into 388 pathway features, greatly reducing the dimensionality.
2.2 Prediction of cancer drug sensitivity based on pathway activity and the elastic net
An elastic net is used to construct the regression model. The input is the new matrix containing the cancer cell line features and the drug chemical features, and the output is the drug sensitivity (IC50) value, giving a model that predicts drug sensitivity from pathway activity. Simulation experiments were carried out with the glmnet package in R.
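For reference, glmnet's Gaussian elastic net solves the penalized least-squares problem below. In this setting $y_i$ would be the IC50 value of sample $i$, $x_i$ the corresponding row of the combined feature matrix, $\lambda \ge 0$ the overall penalty strength and $\alpha \in [0, 1]$ the mix between the lasso ($\alpha = 1$) and ridge ($\alpha = 0$) penalties; these two parameters are presumably what the grid search of section 1.2.2 tunes.

```latex
\min_{\beta_0,\,\beta}\;
\frac{1}{2N}\sum_{i=1}^{N}\bigl(y_i-\beta_0-x_i^{\top}\beta\bigr)^2
+\lambda\Bigl[\tfrac{1-\alpha}{2}\lVert\beta\rVert_2^2+\alpha\lVert\beta\rVert_1\Bigr]
```

The $\ell_1$ term drives many coefficients to exactly zero, which is what makes ranking the surviving pathway coefficients meaningful.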
The model was trained, tested and used for prediction. The results obtained on the validation set are shown in FIG. 2, and the results obtained on the test set are shown in Table 1.
Table 1 Predictive performance (MSE) of the pathway-activity-based drug sensitivity prediction model on the test set
Drug        MSE     Drug        MSE     Drug          MSE     Drug       MSE
17-AAG      3.730   L-68545     3.464   Panobinostat  0.006   RAF265     8.241
AEW541      6.566   Lapatinib   7.265   PD-0325901    9.112   Sorafenib  2.753
AZD0530     7.287   LBW242      3.107   PD-0332991    3.601   TAE684     7.465
AZD6244     10.381  Nilotinib   2.437   PF2341066     4.488   TKI258     5.766
Erlotinib   5.765   Nutlin-3    1.614   PHA-665752    1.814   Topotecan  4.256
Irinotecan  1.516   Paclitaxel  4.945   PLX4720       2.412   ZD-6474    6.595
2.3 Biological interpretation of pathways and drug sensitivity
For drug PF2341066 (FIG. 3), the 30 cell lines with the lowest IC50 and the 30 with the highest IC50 were selected as the sensitive and resistant groups, respectively, and the p-value of a t-test was used to assess the difference in pathway activity between the groups. The TGF-beta signaling pathway ranks highly for PF2341066, and activation of this pathway has indeed been found to cause resistance to PF2341066.
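The group comparison described above can be sketched as follows: rank cell lines by IC50, take the 30 lowest as sensitive and the 30 highest as resistant, and compare one pathway's activity between the two groups. The text does not say which t-test variant was used, so Welch's unequal-variance t statistic is assumed here, and the function names are hypothetical. The Python standard library has no t distribution, so the sketch returns only the statistic and the Welch-Satterthwaite degrees of freedom; in practice `scipy.stats.ttest_ind(a, b, equal_var=False)` returns the statistic together with the p-value.

```python
import math
import statistics
from typing import Dict, List, Sequence, Tuple

def split_sensitive_resistant(ic50: Dict[str, float], n: int = 30) -> Tuple[List[str], List[str]]:
    """Lowest-IC50 cell lines are 'sensitive', highest-IC50 ones 'resistant'."""
    if len(ic50) < 2 * n:
        raise ValueError("need at least 2*n cell lines for disjoint groups")
    ranked = sorted(ic50, key=lambda c: (ic50[c], c))
    return ranked[:n], ranked[-n:]

def welch_t(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va = statistics.variance(a) / len(a)  # squared standard error of mean(a)
    vb = statistics.variance(b) / len(b)
    t = (statistics.fmean(a) - statistics.fmean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

def compare_pathway(activity: Dict[str, float], ic50: Dict[str, float], n: int = 30):
    """Compare one pathway's activity between sensitive and resistant groups."""
    sensitive, resistant = split_sensitive_resistant(ic50, n)
    return welch_t([activity[c] for c in sensitive], [activity[c] for c in resistant])
```

A negative statistic means the pathway is less active in the sensitive group than in the resistant group, the direction one would expect for a resistance-associated pathway.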
FIG. 4 likewise shows the difference in pathway activity for drug 17-AAG between sensitive and resistant cell lines, which is consistent with known findings.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. It will be readily apparent to those skilled in the art that various modifications to these embodiments and the generic principles defined herein may be applied to other embodiments without the use of the inventive faculty. Therefore, the present invention is not limited to the above-described embodiments. Those skilled in the art should appreciate that many modifications and variations are possible in light of the above teaching without departing from the scope of the invention.

Claims (5)

1. A method for predicting cancer drug sensitivity based on pathway activity and the elastic net, characterized in that it comprises the following steps:
S1: acquiring gene expression feature data of cancer cell lines and chemical feature data of anticancer drugs;
S2: obtaining, from a pathway database, an interaction network table of the genes in each pathway, and selecting from the network table the key genes that are tightly connected to other genes in the network and are present in the gene expression feature data of the cancer cell lines;
S3: calculating the activity vector of each pathway, and combining all pathway activity vectors into a pathway activity feature matrix;
S4: integrating the chemical feature data of the anticancer drugs from step S1 into a drug chemical feature matrix, and concatenating it with the pathway activity feature matrix from step S3 to obtain a new matrix containing both the cancer cell line features and the drug chemical features;
S5: based on steps S1-S4, constructing an elastic net model for predicting anticancer drug sensitivity, feeding the new matrix containing the cancer cell line features and the drug chemical features into the model as the feature tensor for training and prediction, and verifying the reproducibility of the model.
2. The method of claim 1, wherein in step S1, the gene expression feature data of the cancer cell lines and the chemical feature data of the anticancer drugs are collected from the Internet, and the sensitivity level of the anticancer drugs is learned from the multi-dimensional features of the cancer cell lines.
3. The method of claim 1, wherein in step S2, genes with a high degree in the gene interaction network of the pathway are selected as key genes, and the mean or variance of their expression levels is taken.
4. The method of claim 1, wherein in step S3, the pathway activity vectors, each given by the mean or variance of the key genes' expression levels, are combined to obtain a pathway activity feature matrix.
5. The method of claim 1, wherein in step S5, the elastic net model is built on the results of steps S2-S4, with the new matrix containing the cancer cell line features and the drug chemical features as the model input and the drug sensitivity level as the model output, and the model parameters are tuned according to the mean squared error for training and validation.
CN202011539984.2A 2020-12-23 2020-12-23 Cancer drug sensitivity prediction method based on pathway activity and elastic net Pending CN112599207A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011539984.2A CN112599207A (en) 2020-12-23 2020-12-23 Cancer drug sensitivity prediction method based on pathway activity and elastic net


Publications (1)

Publication Number Publication Date
CN112599207A true CN112599207A (en) 2021-04-02

Family

ID=75200439


Country Status (1)

Country Link
CN (1) CN112599207A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115458188A (en) * 2022-11-11 2022-12-09 神州医疗科技股份有限公司 Method and system for mining candidate markers of highly efficient drug response

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107609326A (en) * 2017-07-26 2018-01-19 同济大学 Drug sensitivity prediction method in cancer precision medicine
CN108830040A (en) * 2018-06-07 2018-11-16 中南大学 Drug sensitivity prediction method based on cell line and drug similarity networks
CN110232978A (en) * 2019-06-14 2019-09-13 西安电子科技大学 Method for predicting therapeutic drugs for cancer cell lines based on multidimensional networks
CN110277174A (en) * 2019-06-14 2019-09-24 上海海洋大学 Neural-network-based method for predicting synergistic effects of anticancer drugs


Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
LEE E et al.: "Inferring Pathway Activity toward Precise Disease Classification", PLOS COMPUT, vol. 4, no. 11, 30 November 2008, pages 1000217 *
MENDEN MP et al.: "Machine Learning Prediction of Cancer Cell Sensitivity to Drugs Based on Genomic and Chemical Properties", PLOS ONE, vol. 8, no. 4, 30 April 2013, pages 61318 *
陈希, 秦玉芳, 陈明, 张重阳: "Prediction of synergistic effects of drug combinations based on a multi-input neural network", Journal of Biomedical Engineering (生物医学工程学杂志), vol. 37, no. 4, 31 August 2020 *



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination