CN117438102A - Anti-tumor drug efficacy prediction method based on knowledge graph embedding representation relearning - Google Patents
- Publication number
- CN117438102A (application CN202311560265.2A)
- Authority
- CN
- China
- Prior art keywords
- relearning
- characterization
- cell line
- cell
- drug
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H70/00—ICT specially adapted for the handling or processing of medical references
- G16H70/40—ICT specially adapted for the handling or processing of medical references relating to drugs, e.g. their side effects or intended usage
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/243—Classification techniques relating to the number of classes
- G06F18/2431—Multiple classes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
- G06F18/253—Fusion techniques of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0464—Convolutional networks [CNN, ConvNet]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N5/00—Computing arrangements using knowledge-based models
- G06N5/02—Knowledge representation; Symbolic representation
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Artificial Intelligence (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Evolutionary Computation (AREA)
- Life Sciences & Earth Sciences (AREA)
- Health & Medical Sciences (AREA)
- Software Systems (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Molecular Biology (AREA)
- Biophysics (AREA)
- Biomedical Technology (AREA)
- Evolutionary Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioinformatics & Computational Biology (AREA)
- Epidemiology (AREA)
- Chemical & Material Sciences (AREA)
- Medicinal Chemistry (AREA)
- Pharmacology & Pharmacy (AREA)
- Toxicology (AREA)
- Public Health (AREA)
- Medical Informatics (AREA)
- Primary Health Care (AREA)
- Investigating Or Analysing Biological Materials (AREA)
Abstract
The invention relates to an anti-tumor drug efficacy prediction method based on knowledge graph embedding representation relearning, comprising the following steps: preparing the original data; obtaining the cell line embedded characterization Embed_1; constructing a cell line embedded characterization relearning deep network model and training it with the cell line embedded characterization Embed_1 to obtain the tumor cell relearning characterization Embed; obtaining a trained DNN classification model; and inputting the tumor cell relearning characterization Embed into the trained DNN classification model to predict the relationship between the tumor cells under test and drug sensitivity. By constructing a convolutional neural network model from the original gene expression profiles and the cell line embedded characterizations, the invention allows a new sample to be represented directly with the trained model, overcoming the defect of existing methods that the model must be retrained whenever a new sample is added. It also integrates the original expression profiles, the drug effect label pairs and the gene regulation network information of tumor cell lines, improving the prediction of tumor cell drug sensitivity.
Description
Technical Field
The invention relates to the technical field of tumor cell drug sensitivity detection and evaluation, in particular to an anti-tumor drug efficacy prediction method based on knowledge graph embedding representation relearning.
Background
Because of tumor heterogeneity and genetic diversity, individual patients with the same cancer may respond differently even to the same drug. Blind administration can cause serious toxic side effects and even overtreatment. Methods based on network representation learning have been shown to extract the gene regulation features of a sample effectively and to predict tumor cell drug sensitivity well.
However, existing network-based representation learning methods must fuse the samples into a prior gene regulation network when extracting gene regulation features, and then learn the embedded representation of the fused network. Because the fused network is constructed over all samples, adding a new sample requires the fused network to be rebuilt and its representation learning model to be retrained; this is inconvenient in field applications and does not help improve predictive ability.
Disclosure of Invention
In order to overcome the defect that the fusion network must be reconstructed and its representation learning model retrained whenever a new sample is added, the invention aims to provide an anti-tumor drug efficacy prediction method based on knowledge graph embedding representation relearning. By relearning the fused gene regulation features, the method both addresses the high dimensionality of high-throughput gene data and improves the prediction of tumor cell drug sensitivity.
To achieve the above purpose, the invention adopts the following technical scheme: an anti-tumor drug efficacy prediction method based on knowledge graph embedding representation relearning, comprising the following steps in sequence:
(1) Preparing original data: the original data comprise the original gene expression profiles of N cell lines, drug effect label pairs, and a gene regulation network;
(2) Obtaining the cell line embedded characterization Embed_1: the cells are fused with the gene regulation network to obtain a cell-gene fusion regulation network map, which is input into a knowledge graph embedding model for learning to obtain the cell line embedded characterization Embed_1;
(3) Constructing a cell line embedded characterization relearning deep network model, and training it with the cell line embedded characterization Embed_1 to obtain the tumor cell relearning characterization Embed;
(4) Constructing a DNN (deep neural network) classification model, and training it with the tumor cell relearning characterization Embed to obtain a trained DNN classification model;
(5) Inputting the tumor cell relearning characterization Embed into the trained DNN classification model, and predicting the relationship between the tumor cells under test and drug sensitivity.
Step (2) specifically comprises the following steps:
(2a) Constructing the cell-gene fusion regulation network map: all tumor cell nodes are fused with the gene regulation network; the probability density distribution of each tumor cell sample's gene expression is fitted, the genes falling outside the quantile Z_{1-α} are taken as the hotspot genes of that cell, and the hotspot genes are linked to the tumor cell node to obtain the cell-gene fusion regulation network map;
(2b) Inputting the cell-gene fusion regulation network map into the knowledge graph embedding model and computing the gene fusion feature representations of all tumor cell samples, specifically:
(2b1) Extracting the positive triplets from the cell-gene fusion regulation network map;
(2b2) Sampling negative triplets to obtain a negative triplet set, and computing the importance of each negative triplet by the following formula:
where α is a constant representing the sampling rate, (h'_j, r, o'_j) denotes the j-th negative triplet sample, h' denotes the head vector of the negative triplet sample, o' its tail vector, and r its relation vector; P_j = ||h' ∘ r - o'|| is the scoring function of the sample, and ∘ denotes the Hadamard product;
(2b3) Scoring the resulting positive and negative triplets to compute the total loss:
where g(h'_i, r, o'_i) is the weight of negative triplet sample i, M is the number of negative triplet samples, σ denotes the Sigmoid activation function, γ denotes a constant, and p(h, r, o) is the scoring function of the positive triplet;
(2b4) Updating the regulatory fusion feature representations of all nodes and edges of the cell-gene fusion regulation network map with the Adam optimization algorithm;
(2b5) Repeating steps (2b2) to (2b4) until the loss function of step (2b3) converges, and taking the regulatory fusion feature representations of the cell line nodes as the cell line embedded characterization Embed_1.
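The hotspot gene selection of step (2a) can be sketched as follows. This is only an illustration under stated assumptions: the text does not name the fitted distribution, so a normal distribution and an upper-tail cut at the (1 - α) quantile are assumed here, and the names `hotspot_genes`, `cell_gene_edges`, the relation label `"hotspot"` and the toy profile are hypothetical.

```python
from statistics import NormalDist, fmean, pstdev


def hotspot_genes(expression, alpha=0.05):
    """Return the genes whose expression exceeds the (1 - alpha) quantile
    of a normal distribution fitted to one cell line's expression values.
    The normal fit and the upper-tail cut are assumptions of this sketch."""
    values = list(expression.values())
    fitted = NormalDist(mu=fmean(values), sigma=pstdev(values))
    cutoff = fitted.inv_cdf(1.0 - alpha)
    return sorted(gene for gene, value in expression.items() if value > cutoff)


def cell_gene_edges(cell_id, expression, alpha=0.05, relation="hotspot"):
    """Link a cell node to each of its hotspot genes as (head, relation, tail)."""
    return [(cell_id, relation, gene) for gene in hotspot_genes(expression, alpha)]


# Toy expression profile for one hypothetical cell line.
profile = {"G1": 1.0, "G2": 1.2, "G3": 0.9, "G4": 1.1, "G5": 0.8,
           "G6": 1.0, "G7": 1.3, "G8": 0.7, "G9": 1.0, "G10": 9.0}
edges = cell_gene_edges("CELL_A", profile, alpha=0.05)
print(edges)
```

The resulting triplets would be added to the triplets of the gene regulation network to form the fused map that the knowledge graph embedding model consumes.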
Step (3) specifically comprises the following steps:
(3a) Constructing the cell line embedded characterization relearning training set, which consists of the original gene expression profiles of the cell lines and the cell line embedded characterization Embed_1;
(3b) Constructing the cell line embedded characterization relearning deep network model, namely a one-dimensional convolutional neural network with a plurality of convolutional layers; convolution kernels of different sizes are set, and the cell line embedded characterization is processed through convolution, activation, batch normalization and pooling operations; after the convolutions a fully connected layer serves as the output of the whole convolutional network, and the Dropout rate is set to 0.5;
(3c) Inputting the cell line embedded characterization relearning training set into the constructed model, obtaining the tumor cell relearning characterization Embed_2 through the one-dimensional convolutional neural network, and computing the mean square error between Embed_2 and the cell line embedded characterization Embed_1 as the loss function of the one-dimensional convolutional neural network:
$$\mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N}\left(Y_i - \hat{Y}_i\right)^2$$
where N is the number of cell lines, Y_i is the value of the tumor cell relearning characterization Embed_2, and \hat{Y}_i is the value of the cell line embedded characterization Embed_1;
(3d) Updating the tumor cell relearning characterization Embed_2 with the Adam optimization algorithm;
(3e) Repeating steps (3c) to (3d) until the loss function of step (3c) converges, to obtain the tumor cell relearning characterization Embed.
Step (4) specifically comprises the following steps:
(4a) Constructing the drug effect prediction training set, which consists of the tumor cell relearning characterization Embed and the drug effect label pairs;
(4b) Constructing the DNN classification model:
(4b1) Inputting the drug effect prediction training set into the constructed DNN classification model to obtain the probability that a cell line is sensitive to a drug, and judging whether the output is sensitive or resistant to obtain the sensitivity relationship; the DNN classification model has a plurality of hidden layers, with the number of inter-layer units set according to the dimension of the cell line embedded characterization; the number of hidden layers L is at least 3, the ReLU activation function is used between layers, the output layer has 1 neuron, and its activation function is set to Sigmoid for the binary classification task; the Sigmoid function outputs the event probability, a value between 0 and 1, and when the result exceeds the threshold of 0.5 the sample is assigned to the positive class, i.e. sensitive;
(4b2) Computing the binary cross-entropy loss between the sensitivity relationship obtained in step (4b1) and the true sensitivity relationship given by the drug effect labels, as the loss function of the DNN binary classification model:
$$\mathrm{Loss} = -\frac{1}{N}\sum_{i=1}^{N}\left[y_i\log p(y_i) + (1-y_i)\log\left(1-p(y_i)\right)\right]$$
where N is the number of cell lines, y_i is the binary label value 0 or 1, and p(y_i) is the predicted probability for the label value y_i;
(4b3) Optimizing the sensitivity relationship output by the DNN binary classification model with the Adam algorithm;
(4b4) Repeating steps (4b1) to (4b3) until the loss function of step (4b2) converges, to obtain the trained DNN classification model.
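The binary cross-entropy of step (4b2) can be sketched in a few lines. The function name `binary_cross_entropy`, the clipping constant `eps` and the toy labels are assumptions of this illustration; here `probs[i]` is read as the predicted probability that cell line i is sensitive (label 1).

```python
import math


def binary_cross_entropy(labels, probs, eps=1e-12):
    """Mean binary cross-entropy between 0/1 labels and predicted
    probabilities of the positive (sensitive) class."""
    if len(labels) != len(probs) or not labels:
        raise ValueError("labels and probs must be non-empty and equally long")
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / len(labels)


# Confident correct predictions give a lower loss than confident wrong ones.
good = binary_cross_entropy([1, 0], [0.9, 0.1])
bad = binary_cross_entropy([1, 0], [0.1, 0.9])
print(good, bad)
```

Clipping the probabilities is a common numerical safeguard and is not something the text specifies.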
Step (5) specifically refers to: using the trained DNN classification model with the tumor cell relearning characterization Embed to predict the relationship between the tumor cells under test and drug sensitivity, where sensitive is 1 and resistant is 0:
where f denotes the trained DNN classification model and z_i denotes the probability, output through the Sigmoid function, that the i-th tumor cell to be predicted in Embed is sensitive to the drug; if this probability is greater than 0.5, 1 is output, indicating sensitivity to the drug; if it is less than 0.5, 0 is output, indicating resistance to the drug.
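The decision rule of step (5) amounts to thresholding the Sigmoid output, as in the sketch below. The function name `predict_labels` is hypothetical, and because the text leaves the case of a probability exactly equal to 0.5 unspecified, this sketch assigns it to the resistant class as an assumption.

```python
def predict_labels(probabilities, threshold=0.5):
    """Map sensitivity probabilities z_i to labels: 1 = sensitive, 0 = resistant.
    A probability exactly at the threshold is treated as resistant (assumption)."""
    return [1 if z > threshold else 0 for z in probabilities]


# Toy probabilities for three hypothetical cell lines.
labels = predict_labels([0.9, 0.2, 0.5])
print(labels)
```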
According to the above technical scheme, the beneficial effects of the invention are as follows: first, a convolutional neural network model is constructed from the original gene expression profiles and the cell line embedded characterizations, so that a new sample can be represented directly with the trained model, overcoming the defect of existing methods that the model must be retrained whenever a new sample is added; second, through the relearning of the cell line embedded characterization, the original expression profiles, the drug effect label pairs and the gene regulation network information of tumor cell lines are integrated, improving the prediction of tumor cell drug sensitivity; third, a deep learning encoding technique is introduced to address the high dimensionality of high-throughput gene data.
Drawings
FIG. 1 is a flow chart of the method of the present invention;
FIG. 2 is a predictive flow chart of a trained DNN classification model.
Detailed Description
As shown in FIG. 1, an anti-tumor drug efficacy prediction method based on knowledge graph embedding representation relearning comprises the following steps in sequence:
(1) Preparing original data: the original data comprise the original gene expression profiles of N cell lines, drug effect label pairs, and a gene regulation network;
(2) Obtaining the cell line embedded characterization Embed_1: the cells are fused with the gene regulation network to obtain a cell-gene fusion regulation network map, which is input into a knowledge graph embedding model for learning to obtain the cell line embedded characterization Embed_1;
(3) Constructing a cell line embedded characterization relearning deep network model, and training it with the cell line embedded characterization Embed_1 to obtain the tumor cell relearning characterization Embed;
(4) Constructing a DNN (deep neural network) classification model, and training it with the tumor cell relearning characterization Embed to obtain a trained DNN classification model;
(5) Inputting the tumor cell relearning characterization Embed into the trained DNN classification model, and predicting the relationship between the tumor cells under test and drug sensitivity, as shown in FIG. 2.
Step (2) specifically comprises the following steps:
(2a) Constructing the cell-gene fusion regulation network map: all tumor cell nodes are fused with the gene regulation network; the probability density distribution of each tumor cell sample's gene expression is fitted, the genes falling outside the quantile Z_{1-α} are taken as the hotspot genes of that cell, and the hotspot genes are linked to the tumor cell node to obtain the cell-gene fusion regulation network map;
(2b) Inputting the cell-gene fusion regulation network map into the knowledge graph embedding model and computing the gene fusion feature representations of all tumor cell samples, specifically:
(2b1) Extracting the positive triplets from the cell-gene fusion regulation network map;
(2b2) Sampling negative triplets to obtain a negative triplet set, and computing the importance of each negative triplet by the following formula:
where α is a constant representing the sampling rate, (h'_j, r, o'_j) denotes the j-th negative triplet sample, h' denotes the head vector of the negative triplet sample, o' its tail vector, and r its relation vector; P_j = ||h' ∘ r - o'|| is the scoring function of the sample, and ∘ denotes the Hadamard product;
(2b3) Scoring the resulting positive and negative triplets to compute the total loss:
where g(h'_i, r, o'_i) is the weight of negative triplet sample i, M is the number of negative triplet samples, σ denotes the Sigmoid activation function, γ denotes a constant, and p(h, r, o) is the scoring function of the positive triplet;
(2b4) Updating the regulatory fusion feature representations of all nodes and edges of the cell-gene fusion regulation network map with the Adam optimization algorithm;
(2b5) Repeating steps (2b2) to (2b4) until the loss function of step (2b3) converges, and taking the regulatory fusion feature representations of the cell line nodes as the cell line embedded characterization Embed_1.
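Steps (2b2) and (2b3) can be illustrated with a minimal sketch. It assumes that the weighting and loss follow the self-adversarial negative sampling scheme commonly used with rotation-based knowledge graph embeddings, which is consistent with the symbols defined above but is not confirmed by the text: $g_j = \exp(-\alpha P_j) / \sum_{i=1}^{M} \exp(-\alpha P_i)$ and $\mathrm{Loss} = -\log\sigma(\gamma - p(h,r,o)) - \sum_{i=1}^{M} g(h'_i,r,o'_i)\log\sigma(P_i - \gamma)$. The use of complex-valued vectors, the function names and the toy triplets are likewise assumptions.

```python
import math


def distance(h, r, o):
    """Score ||h ∘ r - o||: sum of moduli of the element-wise (Hadamard) residual."""
    return sum(abs(hk * rk - ok) for hk, rk, ok in zip(h, r, o))


def negative_weights(neg_distances, alpha=1.0):
    """Softmax over -alpha * distance, so closer (harder) negatives weigh more."""
    logits = [-alpha * d for d in neg_distances]
    top = max(logits)
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def triplet_loss(positive, negatives, gamma=1.0, alpha=1.0):
    """Assumed total loss for one positive triplet and its M negative samples."""
    h, r, o = positive
    neg_d = [distance(hn, r, on) for hn, on in negatives]
    weights = negative_weights(neg_d, alpha)
    pos_term = -log_sigmoid(gamma - distance(h, r, o))
    neg_term = -sum(w * log_sigmoid(d - gamma) for w, d in zip(weights, neg_d))
    return pos_term + neg_term


# Toy 2-dimensional complex embeddings; r has unit-modulus entries (rotations).
h = [1 + 0j, 1j]
r = [1j, -1j]
o = [1j, 1 + 0j]                                   # h ∘ r equals o exactly
negatives = [(h, [-1j, -1 + 0j]), (h, [1j, 0j])]   # corrupted tails
loss = triplet_loss((h, r, o), negatives, gamma=1.0, alpha=1.0)
print(loss)
```

In training, this loss would be summed over the positive triplets of the fused map and minimized with Adam, as step (2b4) describes.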
Step (3) specifically comprises the following steps:
(3a) Constructing the cell line embedded characterization relearning training set, which consists of the original gene expression profiles of the cell lines and the cell line embedded characterization Embed_1;
(3b) Constructing the cell line embedded characterization relearning deep network model, namely a one-dimensional convolutional neural network. The one-dimensional convolutional neural network may have a plurality of convolutional layers; convolution kernels of different sizes are set, and the cell line embedded characterization is processed through operations such as convolution, activation, batch normalization and pooling. Here the one-dimensional convolutional network is set to three convolutional layers: the convolution width K_1 is 7, the convolution stride S_1 is 1, the max-pooling width K_2 is 3, and the pooling stride S_2 is 3; the only difference between layers is the number of channels C, which is 8, 16 and 32 respectively; after the convolutions a fully connected layer serves as the output of the whole convolutional network, and the Dropout rate is set to 0.5;
(3c) Inputting the cell line embedded characterization relearning training set into the constructed model, obtaining the tumor cell relearning characterization Embed_2 through the one-dimensional convolutional neural network, and computing the mean square error between Embed_2 and the cell line embedded characterization Embed_1 as the loss function of the one-dimensional convolutional neural network:
$$\mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N}\left(Y_i - \hat{Y}_i\right)^2$$
where N is the number of cell lines, Y_i is the value of the tumor cell relearning characterization Embed_2, and \hat{Y}_i is the value of the cell line embedded characterization Embed_1;
(3d) Updating the tumor cell relearning characterization Embed_2 with the Adam optimization algorithm;
(3e) Repeating steps (3c) to (3d) until the loss function of step (3c) converges, to obtain the tumor cell relearning characterization Embed.
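To see what the three-layer configuration of step (3b) does to the sequence length, the shape arithmetic can be worked through as below. The sketch assumes unpadded convolution and pooling, and the input length of 1000 is a hypothetical value, since the text does not state the length of the input profile; the function names are also illustrative.

```python
def out_length(length, kernel, stride):
    """Output length of an unpadded 1-D convolution or pooling window."""
    if length < kernel:
        raise ValueError("input is shorter than the kernel")
    return (length - kernel) // stride + 1


def cnn_shapes(input_length, channels=(8, 16, 32), k1=7, s1=1, k2=3, s2=3):
    """Return (channels, length) after each conv + max-pool block."""
    shapes = []
    length = input_length
    for c in channels:
        length = out_length(length, k1, s1)  # convolution: width K_1, stride S_1
        length = out_length(length, k2, s2)  # max pooling: width K_2, stride S_2
        shapes.append((c, length))
    return shapes


shapes = cnn_shapes(1000)                      # hypothetical input length
flat_features = shapes[-1][0] * shapes[-1][1]  # input size of the fully connected layer
print(shapes, flat_features)
```

With these assumptions the blocks produce lengths 331, 108 and 34, so the fully connected layer would receive 32 × 34 = 1088 features and map them to the dimension of Embed_1.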
Step (4) specifically comprises the following steps:
(4a) Constructing the drug effect prediction training set, which consists of the tumor cell relearning characterization Embed and the drug effect label pairs;
(4b) Constructing the DNN classification model:
(4b1) Inputting the drug effect prediction training set into the constructed DNN classification model to obtain the probability that a cell line is sensitive to a drug, and judging whether the output is sensitive or resistant to obtain the sensitivity relationship; the DNN classification model may have a plurality of hidden layers, with the number of inter-layer units set reasonably according to the dimension of the cell line embedded characterization; here the number of hidden layers L is set to 3 and the numbers of inter-layer units a_i are: 200, 50; ReLU or another activation function is used between layers, the output layer has 1 neuron, and its activation function is set to Sigmoid for the binary classification task; the Sigmoid function outputs the event probability, a value between 0 and 1, and when the result exceeds the threshold of 0.5 the sample is assigned to the positive class, i.e. sensitive;
(4b2) Computing the binary cross-entropy loss between the sensitivity relationship obtained in step (4b1) and the true sensitivity relationship given by the drug effect labels, as the loss function of the DNN binary classification model:
$$\mathrm{Loss} = -\frac{1}{N}\sum_{i=1}^{N}\left[y_i\log p(y_i) + (1-y_i)\log\left(1-p(y_i)\right)\right]$$
where N is the number of cell lines, y_i is the binary label value 0 or 1, and p(y_i) is the predicted probability for the label value y_i;
(4b3) Optimizing the sensitivity relationship output by the DNN binary classification model with the Adam algorithm;
(4b4) Repeating steps (4b1) to (4b3) until the loss function of step (4b2) converges, to obtain the trained DNN classification model.
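The forward pass of the classifier in step (4b1) can be sketched without any deep learning library. This is an untrained illustration, not the trained model: the hidden sizes (200, 50) are the two values that appear in the text, while the input dimension of 16, the random initialisation and all function names are assumptions.

```python
import math
import random


def relu(v):
    return [max(0.0, x) for x in v]


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def dense(v, weights, bias):
    """Fully connected layer: one output per weight row."""
    return [sum(w * x for w, x in zip(row, v)) + b for row, b in zip(weights, bias)]


def build_dnn(input_dim, hidden=(200, 50), seed=0):
    """Randomly initialised layers: hidden ReLU layers plus a 1-unit output."""
    rng = random.Random(seed)
    sizes = [input_dim, *hidden, 1]
    layers = []
    for n_in, n_out in zip(sizes, sizes[1:]):
        scale = math.sqrt(2.0 / n_in)
        weights = [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]
        layers.append((weights, [0.0] * n_out))
    return layers


def predict_proba(model, embed):
    """Probability that a cell line with embedding `embed` is sensitive."""
    v = list(embed)
    for weights, bias in model[:-1]:
        v = relu(dense(v, weights, bias))
    weights, bias = model[-1]
    return sigmoid(dense(v, weights, bias)[0])


model = build_dnn(input_dim=16)           # hypothetical embedding dimension
prob = predict_proba(model, [0.1] * 16)   # toy relearned embedding
print(prob)
```

Training would adjust the weights with Adam against the binary cross-entropy of step (4b2); only the architecture and the Sigmoid output are shown here.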
Step (5) specifically refers to: using the trained DNN classification model with the tumor cell relearning characterization Embed to predict the relationship between the tumor cells under test and drug sensitivity, where sensitive is 1 and resistant is 0:
where f denotes the trained DNN classification model and z_i denotes the probability, output through the Sigmoid function, that the i-th tumor cell to be predicted in Embed is sensitive to the drug; if this probability is greater than 0.5, 1 is output, indicating sensitivity to the drug; if it is less than 0.5, 0 is output, indicating resistance to the drug.
In summary, the invention constructs a convolutional neural network model from the original gene expression profiles and the cell line embedded characterizations, so that a new sample can be represented directly with the trained model, overcoming the defect of existing methods that the model must be retrained whenever a new sample is added; through the relearning of the cell line embedded characterization, the original expression profiles, the drug effect label pairs and the gene regulation network information of tumor cell lines are integrated, improving the prediction of tumor cell drug sensitivity; and a deep learning encoding technique is introduced to address the high dimensionality of high-throughput gene data.
Claims (5)
1. An anti-tumor drug efficacy prediction method based on knowledge graph embedding representation relearning, characterized in that the method comprises the following steps in sequence:
(1) Preparing original data: the original data comprise the original gene expression profiles of N cell lines, drug effect label pairs, and a gene regulation network;
(2) Obtaining the cell line embedded characterization Embed_1: the cells are fused with the gene regulation network to obtain a cell-gene fusion regulation network map, which is input into a knowledge graph embedding model for learning to obtain the cell line embedded characterization Embed_1;
(3) Constructing a cell line embedded characterization relearning deep network model, and training it with the cell line embedded characterization Embed_1 to obtain the tumor cell relearning characterization Embed;
(4) Constructing a DNN (deep neural network) classification model, and training it with the tumor cell relearning characterization Embed to obtain a trained DNN classification model;
(5) Inputting the tumor cell relearning characterization Embed into the trained DNN classification model, and predicting the relationship between the tumor cells under test and drug sensitivity.
2. The method for predicting the efficacy of the antitumor drug based on knowledge-graph embedding representation relearning according to claim 1, which is characterized in that: the step (2) specifically comprises the following steps:
(2a) Constructing a cell-gene fusion regulation network map: fusing all tumor cell nodes with a gene regulation network, fitting probability density distribution of tumor cell sample gene expression, and setting the probability density distribution at a quantile Z 1-α The other genes are used as hot spot genes of the cells, and the hot spot genes are linked with tumor cell nodes to obtain a cell-gene fusion regulation network map;
(2b) Inputting a cell-gene fusion regulation network map into a knowledge map embedding model, and calculating gene fusion expression characteristic expression of all tumor cell samples, wherein the method specifically comprises the following steps of:
(2b1) Extracting positive triplets in a cell-gene fusion regulation network map;
(2b2) And (3) carrying out negative triplet sampling to obtain a negative triplet set, and calculating the importance of the negative triplet by using the following formula:
wherein α is a constant, a substitutionTable sample rate, (h) j ′,r,o j ' represents the j-th negative triplet sample, h ' represents the negative triplet sample head vector representation, o ' represents the negative triplet sample tail vector representation, r represents the negative triplet sample relationship vector representation, P j = |h 'or-o' || is a scoring function of the sample, and o represents the hadamard product;
(2b3) The resulting positive and negative triples are scored to calculate the total Loss:
wherein g (h' i ,r,o′ i ) Is the weight of a negative triplet sample i, M is the number of negative triplet samples, sigma represents the Sigmoid activation function, gamma represents a constant, and p (h, r, o) is the scoring function of the positive triplet;
(2b4) Updating the regulatory fusion feature representations of all nodes and edges of the cell-gene fusion regulation network map using the Adam optimization algorithm;
(2b5) Repeating steps (2b2) to (2b4) until the loss function of step (2b3) converges, and taking the regulatory fusion feature representation of the cell line nodes as the cell line embedded characterization Embed_1.
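The steps of claim 2 can be illustrated with a minimal sketch. It rests on several assumptions that this text does not confirm: the expression distribution in step (2a) is fitted as a normal distribution and "beyond the quantile" means the upper tail; the negative triplet weight g in step (2b2) is a softmax over −α·P_j (a RotatE-style self-adversarial weighting, so that closer negatives weigh more); and the total loss in step (2b3) has the form −log σ(γ − p) − Σ g_i · log σ(P_i − γ). The function names and default values are hypothetical, and the Adam update of step (2b4) is omitted.

```python
import math
from statistics import NormalDist, mean, stdev


def hotspot_genes(expression, alpha=0.05):
    """Step (2a) sketch: genes above the (1 - alpha) quantile of a normal
    distribution fitted to one sample (assumed distribution, upper tail).
    Requires at least two genes with non-constant expression."""
    values = list(expression.values())
    fitted = NormalDist(mean(values), stdev(values))
    cutoff = fitted.inv_cdf(1.0 - alpha)
    return [gene for gene, value in expression.items() if value > cutoff]


def distance(head, relation, tail):
    """Score ||h ∘ r - o||: Hadamard product, then Euclidean norm."""
    return math.sqrt(sum((h * r - o) ** 2
                         for h, r, o in zip(head, relation, tail)))


def negative_weights(neg_scores, alpha=1.0):
    """Step (2b2) sketch: softmax over -alpha * P_j (assumed form)."""
    shifted = [-alpha * p for p in neg_scores]
    top = max(shifted)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in shifted]
    total = sum(exps)
    return [e / total for e in exps]


def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def total_loss(pos_score, neg_scores, gamma=6.0, alpha=1.0):
    """Step (2b3) sketch: margin loss with weighted negatives (assumed form)."""
    weights = negative_weights(neg_scores, alpha)
    pos_term = -log_sigmoid(gamma - pos_score)
    neg_term = -sum(w * log_sigmoid(p - gamma)
                    for w, p in zip(weights, neg_scores))
    return pos_term + neg_term
```

Under these assumptions the loss is small when the positive triplet scores well below the margin γ and the negative triplets score well above it, which is the behaviour the iteration in step (2b5) drives toward.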
3. The method for predicting the efficacy of an antitumor drug based on knowledge graph embedding representation relearning according to claim 1, characterized in that step (3) specifically comprises the following steps:
(3a) Constructing a cell line embedded characterization relearning training set, which consists of the original gene expression profiles of the cell lines and the cell line embedded characterization Embed_1;
(3b) Constructing a cell line embedded characterization relearning deep network model, namely a one-dimensional convolutional neural network, wherein the one-dimensional convolutional neural network has a plurality of convolutional layers with different convolution kernel sizes, the cell line embedded characterization is processed through convolution, activation, batch normalization and pooling operations, a fully connected layer after the convolutions serves as the output of the whole convolutional network, and the dropout rate is set to 0.5;
(3c) Inputting the cell line embedded characterization relearning training set into the constructed cell line embedded characterization relearning deep network model, obtaining the tumor cell relearning characterization Embed_2 through the one-dimensional convolutional neural network, comparing the tumor cell relearning characterization Embed_2 with the cell line embedded characterization Embed_1 by mean square error, and taking the mean square error as the loss function of the one-dimensional convolutional neural network:
$$\mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N}\left(Y_i - \hat{Y}_i\right)^2$$
wherein N is the number of cell lines, Y_i is the value of the tumor cell relearning characterization Embed_2, and \hat{Y}_i is the value of the cell line embedded characterization Embed_1;
(3d) Updating the tumor cell relearning characterization Embed_2 using the Adam optimization algorithm;
(3e) Repeating steps (3c) to (3d) until the loss function of step (3c) converges, to obtain the tumor cell relearning characterization Embed.
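The forward pass and loss of claim 3 can be sketched in plain Python. This is an illustration under stated assumptions, not the claimed network: it uses a single convolution kernel, one pooling stage and one fully connected layer, omits batch normalization, dropout and the Adam update, and computes the mean square error over the components of one output vector. All function names, kernel values and weights are hypothetical.

```python
def conv1d(signal, kernel):
    """Valid 1-D cross-correlation (the operation deep learning
    libraries call convolution)."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]


def relu(values):
    """Element-wise ReLU activation."""
    return [max(0.0, v) for v in values]


def max_pool(values, size=2):
    """Non-overlapping max pooling; a trailing partial window is dropped."""
    return [max(values[i:i + size])
            for i in range(0, len(values) - size + 1, size)]


def dense(values, weights, bias):
    """Fully connected layer: one weight row and one bias per output unit."""
    return [sum(w * v for w, v in zip(row, values)) + b
            for row, b in zip(weights, bias)]


def mse(predicted, target):
    """Mean square error between two equally long vectors."""
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(target)


def relearn_forward(expression, kernel, weights, bias):
    """Expression profile -> convolution -> ReLU -> pooling -> dense output."""
    hidden = max_pool(relu(conv1d(expression, kernel)))
    return dense(hidden, weights, bias)
```

In a training loop, the output of `relearn_forward` for each cell line would be compared with that cell line's knowledge graph embedding through `mse`, and the kernel and dense weights would be updated until the loss converges.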
4. The method for predicting the efficacy of an antitumor drug based on knowledge graph embedding representation relearning according to claim 1, characterized in that step (4) specifically comprises the following steps:
(4a) Constructing a drug effect prediction training set, which consists of pairs of the tumor cell relearning characterization Embed and drug effect labels;
(4b) Constructing a DNN classification model:
(4b1) Inputting the drug effect prediction training set into the constructed DNN classification model to obtain the probability that a cell line is sensitive to the drug, and judging whether the output is sensitive or resistant to obtain the sensitivity relationship; the DNN classification model is provided with a plurality of hidden layers, the number of units in each layer being set according to the dimension of the cell line embedded characterization; the number of hidden layers L of the DNN classification model is greater than or equal to 3; a ReLU activation function is used between layers; the output layer has 1 neuron, whose activation function is set to Sigmoid for the binary classification task; the Sigmoid function outputs an event probability between 0 and 1, and when the result is greater than the threshold of 0.5, the sample is assigned to the positive class, that is, classified as sensitive;
(4b2) Calculating the binary cross entropy loss from the sensitivity relationship obtained in step (4b1) and the true sensitivity relationship given by the drug effect label, as the loss function of the DNN binary classification model:
$$\mathrm{Loss} = -\frac{1}{N}\sum_{i=1}^{N}\left[y_i \log p(y_i) + (1 - y_i)\log\left(1 - p(y_i)\right)\right]$$
wherein N is the number of cell lines, y_i is the binary label value 0 or 1, and p(y_i) is the predicted probability of the label value y_i;
(4b3) Optimizing the sensitivity relationship output by the DNN binary classification model using the Adam algorithm;
(4b4) Repeating steps (4b1) to (4b3) until the loss function of step (4b2) converges, to obtain the trained DNN classification model.
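The classifier structure of claim 4 can be sketched as a forward pass with a binary cross entropy loss. The sketch assumes three hidden layers of hypothetical sizes (16, 8, 4), random Gaussian initialization and zero biases; none of these values come from the text, and the Adam training loop of steps (4b3) and (4b4) is omitted.

```python
import math
import random


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def init_layer(n_in, n_out, rng):
    """Random weights (scaled Gaussian) and zero biases for one layer."""
    scale = math.sqrt(2.0 / n_in)
    weights = [[rng.gauss(0.0, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


def build_dnn(input_dim, hidden_dims=(16, 8, 4), seed=0):
    """At least three hidden layers followed by a single output unit."""
    if len(hidden_dims) < 3:
        raise ValueError("the claim requires at least 3 hidden layers")
    rng = random.Random(seed)
    dims = [input_dim, *hidden_dims, 1]
    return [init_layer(a, b, rng) for a, b in zip(dims, dims[1:])]


def predict_proba(layers, features):
    """ReLU between layers, Sigmoid on the single output neuron."""
    activation = list(features)
    for index, (weights, bias) in enumerate(layers):
        z = [sum(w * a for w, a in zip(row, activation)) + b
             for row, b in zip(weights, bias)]
        if index == len(layers) - 1:
            activation = [sigmoid(v) for v in z]
        else:
            activation = [max(0.0, v) for v in z]
    return activation[0]


def binary_cross_entropy(labels, probs, eps=1e-12):
    """Mean binary cross entropy; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / len(labels)
```

A usage example would build the model with `build_dnn(len(embedding))`, call `predict_proba` on each cell line's relearned characterization, and pass the resulting probabilities with the drug effect labels to `binary_cross_entropy`.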
5. The method for predicting the efficacy of an antitumor drug based on knowledge graph embedding representation relearning according to claim 1, characterized in that step (5) specifically refers to: using the trained DNN classification model and the tumor cell relearning characterization Embed to predict the relationship between the tumor cells to be tested and drug sensitivity, where sensitive is 1 and resistant is 0:
wherein f represents the trained DNN classification model, and z_i represents the probability, output through the Sigmoid function, that the i-th tumor cell to be predicted in Embed is sensitive to the drug; if the probability that the drug response is sensitive is greater than 0.5, 1 is output, indicating sensitivity to the drug; if the probability that the drug response is sensitive is less than 0.5, 0 is output, indicating resistance to the drug.
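The decision rule of claim 5 amounts to thresholding the Sigmoid output. A minimal sketch follows; the function name is hypothetical, and assigning a probability of exactly 0.5 to the resistant class is an assumption, since the text only specifies the cases above and below the threshold.

```python
def predict_sensitivity(probabilities, threshold=0.5):
    """Map Sigmoid probabilities to labels: 1 = sensitive, 0 = resistant.

    A probability exactly equal to the threshold is treated as resistant,
    which is an assumption not stated in the text.
    """
    return [1 if p > threshold else 0 for p in probabilities]
```

For example, probabilities of 0.9 and 0.2 map to the labels 1 (sensitive) and 0 (resistant).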
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311560265.2A CN117438102A (en) | 2023-11-22 | 2023-11-22 | Anti-tumor drug efficacy prediction method based on knowledge graph embedding representation relearning |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311560265.2A CN117438102A (en) | 2023-11-22 | 2023-11-22 | Anti-tumor drug efficacy prediction method based on knowledge graph embedding representation relearning |
Publications (1)
Publication Number | Publication Date |
---|---|
CN117438102A (en) | 2024-01-23 |
Family
ID=89549793
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311560265.2A Pending CN117438102A (en) | 2023-11-22 | 2023-11-22 | Anti-tumor drug efficacy prediction method based on knowledge graph embedding representation relearning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117438102A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117612747A (en) * | 2024-01-24 | 2024-02-27 | 杭州广科安德生物科技有限公司 | Drug sensitivity prediction method and device for klebsiella pneumoniae |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117612747A (en) * | 2024-01-24 | 2024-02-27 | 杭州广科安德生物科技有限公司 | Drug sensitivity prediction method and device for klebsiella pneumoniae |
CN117612747B (en) * | 2024-01-24 | 2024-05-03 | 杭州广科安德生物科技有限公司 | Drug sensitivity prediction method and device for klebsiella pneumoniae |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN117438102A (en) | Anti-tumor drug efficacy prediction method based on knowledge graph embedding representation relearning | |
CN105488528B (en) | Neural network image classification method based on improving expert inquiry method | |
CN111563431A (en) | Plant leaf disease and insect pest identification method based on improved convolutional neural network | |
CN110473592B (en) | Multi-view human synthetic lethal gene prediction method | |
Maulik et al. | Simulated annealing based automatic fuzzy clustering combined with ANN classification for analyzing microarray data | |
CN107992945B (en) | Characteristic gene selection method based on deep learning and evolutionary computation | |
CN116194995A (en) | Method for identifying chromosomal dimensional instability such as homologous repair defects in next generation sequencing data of low coverage | |
CN111222638B (en) | Neural network-based network anomaly detection method and device | |
CN110993113B (en) | LncRNA-disease relation prediction method and system based on MF-SDAE | |
CN115985503B (en) | Cancer prediction system based on ensemble learning | |
CN108520201A (en) | Robust face recognition method based on weighted mixed norm regression | |
Zhu et al. | Deep-gknock: nonlinear group-feature selection with deep neural networks | |
CN116469561A (en) | Breast cancer survival prediction method based on deep learning | |
CN113870951A (en) | Prediction system for predicting head and neck squamous cell carcinoma immune subtype | |
CN113450562B (en) | Road network traffic state discrimination method based on clustering and graph convolution network | |
CN113177587B (en) | Generalized zero sample target classification method based on active learning and variational self-encoder | |
CN112819087B (en) | Method for detecting abnormality of BOD sensor of outlet water based on modularized neural network | |
CN114462670A (en) | LSTM model-based power consumption prediction method | |
CN109858245A (en) | A kind of intrusion detection method based on improvement depth confidence network | |
US20220076782A1 (en) | Community Assignments in Identity by Descent Networks and Genetic Variant Origination | |
CN110459266B (en) | Method for establishing SNP (Single nucleotide polymorphism) pathogenic factor and disease association relation model | |
CN116913390B (en) | Gene regulation network prediction method based on multi-view attention network | |
CN116913390A (en) | Gene regulation network prediction method based on multi-view attention network | |
CN113516180B (en) | Method for identifying Z-Wave intelligent equipment | |
CN112832744A (en) | Pumping unit well pump detection period prediction method based on LSTM neural network |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||