CN117438102A - Anti-tumor drug efficacy prediction method based on knowledge graph embedding representation relearning - Google Patents


Info

Publication number
CN117438102A
Authority
CN
China
Prior art keywords
relearning
characterization
cell line
cell
drug
Prior art date
Legal status
Pending
Application number
CN202311560265.2A
Other languages
Chinese (zh)
Inventor
谢新平
汪凤婷
王红强
姜晓东
Current Assignee
Anhui Jianzhu University
Original Assignee
Anhui Jianzhu University
Priority date
Filing date
Publication date
Application filed by Anhui Jianzhu University
Priority to CN202311560265.2A
Publication of CN117438102A
Legal status: Pending (current)

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H70/00 ICT specially adapted for the handling or processing of medical references
    • G16H70/40 ICT specially adapted for the handling or processing of medical references relating to drugs, e.g. their side effects or intended usage
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/243 Classification techniques relating to the number of classes
    • G06F18/2431 Multiple classes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 Computing arrangements using knowledge-based models
    • G06N5/02 Knowledge representation; Symbolic representation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Epidemiology (AREA)
  • Chemical & Material Sciences (AREA)
  • Medicinal Chemistry (AREA)
  • Pharmacology & Pharmacy (AREA)
  • Toxicology (AREA)
  • Public Health (AREA)
  • Medical Informatics (AREA)
  • Primary Health Care (AREA)
  • Investigating Or Analysing Biological Materials (AREA)

Abstract

The invention relates to an anti-tumor drug efficacy prediction method based on knowledge graph embedding representation relearning, comprising the following steps: preparing raw data; obtaining the cell line embedding representation Embed_1; constructing a deep network model for relearning the cell line embedding representation and training it with Embed_1 to obtain the tumor cell relearned representation Embed; obtaining a trained DNN classification model; and inputting the tumor cell relearned representation Embed into the trained DNN classification model to predict the sensitivity relationship between the tumor cells under test and the drug. By building a convolutional neural network model from the original gene expression profiles and the cell line embedding representations, the invention represents new samples directly with the trained model, overcoming the drawback of existing methods that the model must be retrained whenever a new sample is added. It also integrates the original expression profiles of the tumor cell lines, the drug effect label pairs and gene regulatory network information, improving the prediction of tumor cell drug sensitivity.

Description

Anti-tumor drug efficacy prediction method based on knowledge graph embedding representation relearning
Technical Field
The invention relates to the technical field of tumor cell drug sensitivity detection and evaluation, and in particular to an anti-tumor drug efficacy prediction method based on knowledge graph embedding representation relearning.
Background
Because of tumor heterogeneity and genetic diversity, patients with the same cancer may respond differently even to the same drug, and blind administration can cause serious toxic side effects or even overtreatment. Methods based on network representation learning have been shown to extract the gene regulation features of samples effectively and to predict tumor cell drug sensitivity well.
However, existing network-based representation learning methods must fuse the samples into a prior gene regulatory network when extracting gene regulation features and then learn an embedding of the fused network. Because the fused network is built over all samples, adding a new sample requires rebuilding the fused network and retraining its representation learning model, which is inconvenient in practical applications and limits further gains in prediction performance.
Disclosure of Invention
To overcome the need to rebuild the fused network and retrain its representation learning model whenever a new sample is added, the invention provides an anti-tumor drug efficacy prediction method based on knowledge graph embedding representation relearning, which relearns the fused gene regulation features. The method addresses the high dimensionality of high-throughput gene data and improves the drug sensitivity prediction performance for tumor cells.
To achieve the above purpose, the invention adopts the following technical scheme: an anti-tumor drug efficacy prediction method based on knowledge graph embedding representation relearning, comprising the following steps in sequence:
(1) Preparing raw data: the raw data comprise the original gene expression profiles of N cell lines, drug effect label pairs and a gene regulatory network;
(2) Obtaining the cell line embedding representation Embed_1: the cells are fused with the gene regulatory network to obtain a cell-gene fusion regulatory network graph, which is input into a knowledge graph embedding model for learning to obtain the cell line embedding representation Embed_1;
(3) Constructing a deep network model for relearning the cell line embedding representation, and training this model with the cell line embedding representation Embed_1 to obtain the tumor cell relearned representation Embed;
(4) Constructing a DNN (deep neural network) classification model and training it with the tumor cell relearned representation Embed to obtain a trained DNN classification model;
(5) Inputting the tumor cell relearned representation Embed into the trained DNN classification model to predict the sensitivity relationship between the tumor cells under test and the drug.
The step (2) specifically comprises the following steps:
(2a) Constructing the cell-gene fusion regulatory network graph: all tumor cell nodes are fused with the gene regulatory network; for each tumor cell sample, the probability density distribution of its gene expression is fitted, the genes falling beyond the quantile Z_{1-α} of this distribution are taken as the hot-spot genes of that cell, and the hot-spot genes are linked to the tumor cell node to obtain the cell-gene fusion regulatory network graph;
(2b) Inputting the cell-gene fusion regulatory network graph into the knowledge graph embedding model and computing the fused gene regulation feature representations of all tumor cell samples, specifically comprising the following steps:
(2b1) Extracting the positive triples from the cell-gene fusion regulatory network graph;
(2b2) Sampling negative triples to obtain a negative triple set, and computing the importance of each negative triple using the following formula:
where α is a constant representing the sampling rate, (h'_j, r, o'_j) denotes the j-th negative triple sample, h' is the head vector representation of the negative triple sample, o' is its tail vector representation, r is its relation vector representation, P_j = ||h' ∘ r − o'|| is the scoring function of the sample, and ∘ denotes the Hadamard product;
(2b3) Scoring the resulting positive and negative triples to calculate the total loss Loss:
where g(h'_i, r, o'_i) is the weight of negative triple sample i, M is the number of negative triple samples, σ is the Sigmoid activation function, γ is a constant, and p(h, r, o) is the scoring function of the positive triple;
(2b4) Updating the fused regulation feature representations of all nodes and edges of the cell-gene fusion regulatory network graph using the Adam optimization algorithm;
(2b5) Repeating steps (2b2) to (2b4) until the loss function of step (2b3) converges, and taking the fused regulation feature representations of the cell line nodes as the cell line embedding representation Embed_1.
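The hot-spot gene selection and linking of step (2a) can be sketched as follows. This is a minimal illustration, assuming that a normal distribution is fitted to each cell line's expression values and that hot-spot genes are those whose expression exceeds the value at the Z_{1-α} quantile; the function names, the relation label `has_hot_spot_gene` and the default α = 0.05 are hypothetical choices, not taken from the text.

```python
from statistics import NormalDist


def hot_spot_genes(expression, alpha=0.05):
    """Genes of one cell line whose expression lies above the (1 - alpha)
    quantile of a normal distribution fitted to that cell line's profile.

    expression: dict mapping gene name -> expression value.
    Assumption: the fitted density is normal; the text does not name it.
    """
    dist = NormalDist.from_samples(list(expression.values()))
    threshold = dist.inv_cdf(1 - alpha)  # expression value at the Z_{1-alpha} quantile
    return sorted(gene for gene, value in expression.items() if value > threshold)


def link_cell_to_graph(cell_id, expression, alpha=0.05, relation="has_hot_spot_gene"):
    """Triples (cell, relation, gene) attaching a tumor cell node to its
    hot-spot genes in the cell-gene fusion regulatory network graph.
    The relation name is hypothetical."""
    return [(cell_id, relation, gene) for gene in hot_spot_genes(expression, alpha)]


if __name__ == "__main__":
    profile = {"G1": 1.0, "G2": 1.1, "G3": 0.9, "G4": 1.0, "G5": 5.0}  # toy profile
    print(link_cell_to_graph("CELL_A", profile))
```

Because each cell only contributes its own edges, triples for a newly added cell can be generated independently; whether the graph embedding itself must then be recomputed is exactly the cost that steps (3) to (5) are designed to avoid.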
The step (3) specifically comprises the following steps:
(3a) Constructing the relearning training set for the cell line embedding representation, which consists of the original gene expression profiles of the cell lines and the cell line embedding representation Embed_1;
(3b) Constructing the deep network model for relearning the cell line embedding representation, namely a one-dimensional convolutional neural network with multiple convolutional layers; the data are processed by convolution, activation, batch normalization and pooling operations with different configured convolution kernel sizes, a fully connected layer after the convolutions serves as the output of the whole convolutional network, and the dropout rate is set to 0.5;
(3c) Inputting the relearning training set into the constructed relearning deep network model, obtaining the tumor cell relearned representation Embed_2 from the configured one-dimensional convolutional neural network, and comparing Embed_2 with the cell line embedding representation Embed_1 by mean squared error, which serves as the loss function of the one-dimensional convolutional neural network:
$$\mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N}\left(Y_i - \hat{Y}_i\right)^2$$
where N is the number of cell lines, Y_i is the value of the tumor cell relearned representation Embed_2, and \hat{Y}_i is the value of the cell line embedding representation Embed_1;
(3d) Updating the tumor cell relearned representation Embed_2 using the Adam optimization algorithm;
(3e) Repeating steps (3c) to (3d) until the loss function of step (3c) converges, to obtain the tumor cell relearned representation Embed.
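Steps (3c) to (3e) fit the relearned representation Embed_2 to the graph embedding Embed_1 under a mean-squared-error loss, with Adam updates. The standard-library sketch below shows one possible reading: the MSE is averaged over every entry of the N embedding vectors (an assumption), and the optimizer is textbook Adam with its usual default hyperparameters. For brevity the usage example fits the embedding values directly rather than CNN weights, mirroring the wording of step (3d); the target values are hypothetical.

```python
import math


def mse(predicted, target):
    """Mean squared error between relearned embeddings (Embed_2) and
    graph embeddings (Embed_1); both are lists of N equal-length vectors."""
    diffs = [(p - t) ** 2 for pv, tv in zip(predicted, target) for p, t in zip(pv, tv)]
    return sum(diffs) / len(diffs)


class Adam:
    """Textbook Adam (Kingma and Ba) for a flat list of parameters."""

    def __init__(self, n_params, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.m = [0.0] * n_params  # first-moment (mean) estimates
        self.v = [0.0] * n_params  # second-moment (uncentred variance) estimates
        self.t = 0

    def step(self, params, grads):
        """Update params in place from grads and return them."""
        self.t += 1
        for i, g in enumerate(grads):
            self.m[i] = self.beta1 * self.m[i] + (1 - self.beta1) * g
            self.v[i] = self.beta2 * self.v[i] + (1 - self.beta2) * g * g
            m_hat = self.m[i] / (1 - self.beta1 ** self.t)  # bias correction
            v_hat = self.v[i] / (1 - self.beta2 ** self.t)
            params[i] -= self.lr * m_hat / (math.sqrt(v_hat) + self.eps)
        return params


if __name__ == "__main__":
    embed_1 = [[1.0, -2.0]]  # hypothetical Embed_1 for one cell line
    embed_2 = [[0.0, 0.0]]   # values being fitted
    opt = Adam(2, lr=0.05)
    for _ in range(2000):
        # d(MSE)/d(embed_2) = 2 (p - t) / 2 = (p - t) for a 2-entry vector
        grads = [p - t for p, t in zip(embed_2[0], embed_1[0])]
        opt.step(embed_2[0], grads)
    print(mse(embed_2, embed_1))
```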
The step (4) specifically comprises the following steps:
(4a) Constructing the drug effect prediction training set, which consists of the tumor cell relearned representation Embed and the drug effect label pairs;
(4b) Constructing the DNN classification model:
(4b1) Inputting the drug effect prediction training set into the constructed DNN classification model to obtain the probability that a cell line is sensitive to the drug, and judging whether the output is sensitive or resistant to obtain the sensitivity relationship. The DNN classification model is given multiple hidden layers, with the number of units per layer set according to the dimension of the cell line embedding representation; the number of hidden layers L is at least 3, ReLU activation functions are used between layers, the output layer has 1 neuron, and its activation function is Sigmoid for the classification task. The Sigmoid function outputs an event probability between 0 and 1, and when the result exceeds a threshold, here 0.5, the sample is assigned to the positive class, i.e. sensitive;
(4b2) Calculating the binary cross-entropy loss between the sensitivity relationship obtained in step (4b1) and the true sensitivity relationship given by the drug effect labels, which serves as the loss function of the DNN binary classification model:
$$\mathrm{Loss} = -\frac{1}{N}\sum_{i=1}^{N}\left[y_i \log p(y_i) + (1 - y_i)\log\left(1 - p(y_i)\right)\right]$$
where N is the number of cell lines, y_i is the binary label value 0 or 1, and p(y_i) is the predicted probability that sample i has label 1;
(4b3) Optimizing the DNN binary classification model with the Adam algorithm;
(4b4) Repeating steps (4b1) to (4b3) until the loss function of step (4b2) converges, to obtain the trained DNN classification model.
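The binary cross-entropy loss of step (4b2) can be sketched in plain Python as below, averaged over the N cell lines. Clipping the probabilities with a small `eps` is an implementation assumption added here so that a prediction of exactly 0 or 1 does not produce log(0).

```python
import math


def binary_cross_entropy(labels, probs, eps=1e-12):
    """Mean binary cross-entropy over N cell lines.

    labels: true labels y_i (1 = sensitive, 0 = resistant).
    probs:  predicted probabilities p(y_i) that each label is 1.
    eps clips probabilities away from 0 and 1 so log() stays finite.
    """
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / len(labels)


if __name__ == "__main__":
    print(binary_cross_entropy([1, 0, 1], [0.9, 0.2, 0.6]))  # toy labels and outputs
```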
Step (5) specifically refers to: using the trained DNN classification model and the tumor cell relearned representation Embed to predict the sensitivity relationship between the tumor cells under test and the drug, where sensitive is 1 and resistant is 0:
$$z_i = f(x_i), \qquad \hat{y}_i = \begin{cases} 1, & z_i > 0.5 \\ 0, & z_i < 0.5 \end{cases}$$
where f is the trained DNN classification model, x_i is the relearned representation of the i-th tumor cell to be predicted in Embed, and z_i is the probability, output through the Sigmoid function, that this tumor cell responds sensitively to the drug. If this probability is greater than 0.5, the output is 1, indicating sensitivity to the drug; if it is less than 0.5, the output is 0, indicating resistance to the drug.
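Step (5) reduces to thresholding the Sigmoid output z_i at 0.5. A minimal sketch with a hypothetical function name follows; the text leaves the case z_i = 0.5 unspecified, and this sketch assigns it to resistant (0) as an assumption.

```python
def predict_labels(probabilities, threshold=0.5):
    """Map Sigmoid outputs z_i to labels: 1 = sensitive, 0 = resistant.
    A probability exactly equal to the threshold is mapped to 0 here (an assumption)."""
    return [1 if z > threshold else 0 for z in probabilities]


if __name__ == "__main__":
    z = {"CELL_A": 0.91, "CELL_B": 0.12, "CELL_C": 0.5}  # hypothetical model outputs
    labels = dict(zip(z, predict_labels(z.values())))
    print(labels)  # {'CELL_A': 1, 'CELL_B': 0, 'CELL_C': 0}
```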
According to the above technical scheme, the beneficial effects of the invention are as follows: first, a convolutional neural network model is built from the original gene expression profiles and the cell line embedding representations, so that new samples are represented directly by the trained model, overcoming the drawback of existing methods that the model must be retrained whenever a new sample is added; second, relearning the cell line embedding representation integrates the original expression profiles of the tumor cell lines, the drug effect label pairs and the gene regulatory network information, improving tumor cell drug sensitivity prediction; third, a deep learning encoding technique is introduced to address the high dimensionality of high-throughput gene data.
Drawings
FIG. 1 is a flow chart of the method of the present invention;
FIG. 2 is a flow chart of prediction with the trained DNN classification model.
Detailed Description
As shown in FIG. 1, an anti-tumor drug efficacy prediction method based on knowledge graph embedding representation relearning comprises the following steps in sequence:
(1) Preparing raw data: the raw data comprise the original gene expression profiles of N cell lines, drug effect label pairs and a gene regulatory network;
(2) Obtaining the cell line embedding representation Embed_1: the cells are fused with the gene regulatory network to obtain a cell-gene fusion regulatory network graph, which is input into a knowledge graph embedding model for learning to obtain the cell line embedding representation Embed_1;
(3) Constructing a deep network model for relearning the cell line embedding representation, and training this model with the cell line embedding representation Embed_1 to obtain the tumor cell relearned representation Embed;
(4) Constructing a DNN (deep neural network) classification model and training it with the tumor cell relearned representation Embed to obtain a trained DNN classification model;
(5) Inputting the tumor cell relearned representation Embed into the trained DNN classification model to predict the sensitivity relationship between the tumor cells under test and the drug, as shown in FIG. 2.
The step (2) specifically comprises the following steps:
(2a) Constructing the cell-gene fusion regulatory network graph: all tumor cell nodes are fused with the gene regulatory network; for each tumor cell sample, the probability density distribution of its gene expression is fitted, the genes falling beyond the quantile Z_{1-α} of this distribution are taken as the hot-spot genes of that cell, and the hot-spot genes are linked to the tumor cell node to obtain the cell-gene fusion regulatory network graph;
(2b) Inputting the cell-gene fusion regulatory network graph into the knowledge graph embedding model and computing the fused gene regulation feature representations of all tumor cell samples, specifically comprising the following steps:
(2b1) Extracting the positive triples from the cell-gene fusion regulatory network graph;
(2b2) Sampling negative triples to obtain a negative triple set, and computing the importance of each negative triple using the following formula:
where α is a constant representing the sampling rate, (h'_j, r, o'_j) denotes the j-th negative triple sample, h' is the head vector representation of the negative triple sample, o' is its tail vector representation, r is its relation vector representation, P_j = ||h' ∘ r − o'|| is the scoring function of the sample, and ∘ denotes the Hadamard product;
(2b3) Scoring the resulting positive and negative triples to calculate the total loss Loss:
where g(h'_i, r, o'_i) is the weight of negative triple sample i, M is the number of negative triple samples, σ is the Sigmoid activation function, γ is a constant, and p(h, r, o) is the scoring function of the positive triple;
(2b4) Updating the fused regulation feature representations of all nodes and edges of the cell-gene fusion regulatory network graph using the Adam optimization algorithm;
(2b5) Repeating steps (2b2) to (2b4) until the loss function of step (2b3) converges, and taking the fused regulation feature representations of the cell line nodes as the cell line embedding representation Embed_1.
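The formulas of steps (2b2) and (2b3) are not reproduced in the extracted text. Their variable definitions (a sampling constant α, the Hadamard-product distance P_j = ||h' ∘ r − o'||, weights g, a constant γ and the Sigmoid σ) closely match the self-adversarial negative sampling loss of the RotatE knowledge graph embedding model, so the sketch below assumes that form. It also assumes that harder negatives (smaller distance) should receive larger weights:

$$g(h'_j, r, o'_j) = \frac{\exp(-\alpha P_j)}{\sum_{i=1}^{M}\exp(-\alpha P_i)}, \qquad \mathrm{Loss} = -\log\sigma\big(\gamma - p(h,r,o)\big) - \sum_{i=1}^{M} g(h'_i, r, o'_i)\,\log\sigma\big(P_i - \gamma\big)$$

The default γ = 6.0, α = 1.0 and the toy vectors are hypothetical.

```python
import math


def distance(h, r, o):
    """P = ||h ∘ r - o||: L2 norm of the element-wise (Hadamard) product h * r minus o.
    Real-valued vectors are assumed for simplicity (RotatE itself uses complex ones)."""
    return math.sqrt(sum((hi * ri - oi) ** 2 for hi, ri, oi in zip(h, r, o)))


def negative_weights(neg_distances, alpha=1.0):
    """Softmax of -alpha * P_j over the M negatives (assumed RotatE-style form):
    negatives with a smaller distance, i.e. harder ones, get larger weights."""
    logits = [-alpha * d for d in neg_distances]
    top = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def total_loss(pos_distance, neg_distances, alpha=1.0, gamma=6.0):
    """-log σ(γ - p(h,r,o)) - Σ_i g_i · log σ(P_i - γ)  (assumed form)."""
    weights = negative_weights(neg_distances, alpha)
    pos_term = -log_sigmoid(gamma - pos_distance)
    neg_term = -sum(w * log_sigmoid(d - gamma) for w, d in zip(weights, neg_distances))
    return pos_term + neg_term


if __name__ == "__main__":
    h, r, o = [0.5, 1.0], [1.0, -1.0], [0.5, -1.0]  # positive triple with distance 0
    negatives = [([0.5, 1.0], [1.0, -1.0], [0.4, -0.8]),
                 ([2.0, 0.0], [1.0, -1.0], [-1.0, 3.0])]
    neg_d = [distance(*t) for t in negatives]
    print(negative_weights(neg_d), total_loss(distance(h, r, o), neg_d))
```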
The step (3) specifically comprises the following steps:
(3a) Constructing the relearning training set for the cell line embedding representation, which consists of the original gene expression profiles of the cell lines and the cell line embedding representation Embed_1;
(3b) Constructing the deep network model for relearning the cell line embedding representation, namely a one-dimensional convolutional neural network. The network may have multiple convolutional layers, and the data are processed by operations such as convolution, activation, batch normalization and pooling with different configured convolution kernel sizes. Here the network is set up with three convolutional layers: convolution width K_1 = 7, convolution stride S_1 = 1, max-pooling width K_2 = 3 and pooling stride S_2 = 3; the only difference between layers is the number of channels C, which is 8, 16 and 32 respectively. After the convolutions, a fully connected layer serves as the output of the whole convolutional network, and the dropout rate is set to 0.5;
(3c) Inputting the relearning training set into the constructed relearning deep network model, obtaining the tumor cell relearned representation Embed_2 from the configured one-dimensional convolutional neural network, and comparing Embed_2 with the cell line embedding representation Embed_1 by mean squared error, which serves as the loss function of the one-dimensional convolutional neural network:
$$\mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N}\left(Y_i - \hat{Y}_i\right)^2$$
where N is the number of cell lines, Y_i is the value of the tumor cell relearned representation Embed_2, and \hat{Y}_i is the value of the cell line embedding representation Embed_1;
(3d) Updating the tumor cell relearned representation Embed_2 using the Adam optimization algorithm;
(3e) Repeating steps (3c) to (3d) until the loss function of step (3c) converges, to obtain the tumor cell relearned representation Embed.
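The layer settings of step (3b) (three convolutional layers with K_1 = 7 and S_1 = 1, max pooling with K_2 = 3 and S_2 = 3, channels 8, 16 and 32) determine how many features reach the fully connected layer. The sketch below computes that size, assuming no padding (the text does not state one) and the floor-mode output-length rule used by common deep learning libraries. The function names and the input length of 1000 genes in the usage example are hypothetical.

```python
def out_len(length, kernel, stride, padding=0):
    """Output length of a 1-D convolution or pooling window: floor((L + 2p - k) / s) + 1."""
    return (length + 2 * padding - kernel) // stride + 1


def flatten_size(input_len, channels=(8, 16, 32), conv_k=7, conv_s=1,
                 pool_k=3, pool_s=3, padding=0):
    """Return (features fed to the fully connected layer, final sequence length)."""
    length = input_len
    for _ in channels:
        length = out_len(length, conv_k, conv_s, padding)  # convolution (activation and batch norm keep the length)
        length = out_len(length, pool_k, pool_s)           # max pooling
        if length < 1:
            raise ValueError("input too short for three conv/pool stages")
    return channels[-1] * length, length


if __name__ == "__main__":
    print(flatten_size(1000))  # (1088, 34) for a hypothetical 1000-gene profile
```

The stride-3 pooling shrinks the sequence roughly threefold per layer, which is how the network compresses a high-dimensional expression profile before the fully connected layer maps it to the embedding dimension.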
The step (4) specifically comprises the following steps:
(4a) Constructing the drug effect prediction training set, which consists of the tumor cell relearned representation Embed and the drug effect label pairs;
(4b) Constructing the DNN classification model:
(4b1) Inputting the drug effect prediction training set into the constructed DNN classification model to obtain the probability that a cell line is sensitive to the drug, and judging whether the output is sensitive or resistant to obtain the sensitivity relationship. The DNN classification model can have multiple hidden layers, with the number of units per layer set appropriately according to the dimension of the cell line embedding representation; here the number of hidden layers L is set to 3, and the numbers of inter-layer units a_i are 200, 50. ReLU or other activation functions are used between layers, the output layer has 1 neuron, and its activation function is Sigmoid for the classification task. The Sigmoid function outputs an event probability between 0 and 1, and when the result exceeds a threshold, here 0.5, the sample is assigned to the positive class, i.e. sensitive;
(4b2) Calculating the binary cross-entropy loss between the sensitivity relationship obtained in step (4b1) and the true sensitivity relationship given by the drug effect labels, which serves as the loss function of the DNN binary classification model:
$$\mathrm{Loss} = -\frac{1}{N}\sum_{i=1}^{N}\left[y_i \log p(y_i) + (1 - y_i)\log\left(1 - p(y_i)\right)\right]$$
where N is the number of cell lines, y_i is the binary label value 0 or 1, and p(y_i) is the predicted probability that sample i has label 1;
(4b3) Optimizing the DNN binary classification model with the Adam algorithm;
(4b4) Repeating steps (4b1) to (4b3) until the loss function of step (4b2) converges, to obtain the trained DNN classification model.
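The classifier of step (4b1) is a fully connected network with ReLU hidden layers and a single Sigmoid output unit. The sketch below runs its forward pass in plain Python. The layer widths [embedding dim, 200, 50, 1] (using the two widths listed in step (4b1)), the uniform initialisation, the 16-dimensional toy input and the function names are illustrative assumptions; training (steps (4b2) to (4b4)) is omitted.

```python
import math
import random


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def init_mlp(sizes, seed=0):
    """Weights and biases for layer widths `sizes`, e.g. [embed_dim, 200, 50, 1].
    Uniform(-1/sqrt(fan_in), 1/sqrt(fan_in)) initialisation is an assumption."""
    rng = random.Random(seed)
    layers = []
    for fan_in, fan_out in zip(sizes[:-1], sizes[1:]):
        bound = 1.0 / math.sqrt(fan_in)
        weights = [[rng.uniform(-bound, bound) for _ in range(fan_in)] for _ in range(fan_out)]
        layers.append((weights, [0.0] * fan_out))
    return layers


def predict_proba(x, layers):
    """Forward pass: ReLU after every hidden layer, Sigmoid on the single output unit."""
    for k, (weights, biases) in enumerate(layers):
        z = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]
        x = [max(0.0, v) for v in z] if k < len(layers) - 1 else [sigmoid(v) for v in z]
    return x[0]


if __name__ == "__main__":
    layers = init_mlp([16, 200, 50, 1])
    p = predict_proba([0.1] * 16, layers)
    print(p, "sensitive" if p > 0.5 else "resistant")
```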
Step (5) specifically refers to: using the trained DNN classification model and the tumor cell relearned representation Embed to predict the sensitivity relationship between the tumor cells under test and the drug, where sensitive is 1 and resistant is 0:
$$z_i = f(x_i), \qquad \hat{y}_i = \begin{cases} 1, & z_i > 0.5 \\ 0, & z_i < 0.5 \end{cases}$$
where f is the trained DNN classification model, x_i is the relearned representation of the i-th tumor cell to be predicted in Embed, and z_i is the probability, output through the Sigmoid function, that this tumor cell responds sensitively to the drug. If this probability is greater than 0.5, the output is 1, indicating sensitivity to the drug; if it is less than 0.5, the output is 0, indicating resistance to the drug.
In summary, a convolutional neural network model is built from the original gene expression profiles and the cell line embedding representations, so that new samples are represented directly by the trained model, overcoming the drawback of existing methods that the model must be retrained whenever a new sample is added; relearning the cell line embedding representation integrates the original expression profiles of the tumor cell lines, the drug effect label pairs and the gene regulatory network information, improving tumor cell drug sensitivity prediction; and a deep learning encoding technique is introduced to address the high dimensionality of high-throughput gene data.

Claims (5)

1. An anti-tumor drug efficacy prediction method based on knowledge graph embedding representation relearning, characterized in that it comprises the following steps in sequence:
(1) Preparing raw data: the raw data comprise the original gene expression profiles of N cell lines, drug effect label pairs and a gene regulatory network;
(2) Obtaining the cell line embedding representation Embed_1: the cells are fused with the gene regulatory network to obtain a cell-gene fusion regulatory network graph, which is input into a knowledge graph embedding model for learning to obtain the cell line embedding representation Embed_1;
(3) Constructing a deep network model for relearning the cell line embedding representation, and training this model with the cell line embedding representation Embed_1 to obtain the tumor cell relearned representation Embed;
(4) Constructing a DNN (deep neural network) classification model and training it with the tumor cell relearned representation Embed to obtain a trained DNN classification model;
(5) Inputting the tumor cell relearned representation Embed into the trained DNN classification model to predict the sensitivity relationship between the tumor cells under test and the drug.
2. The anti-tumor drug efficacy prediction method based on knowledge graph embedding representation relearning according to claim 1, characterized in that step (2) specifically comprises the following steps:
(2a) Constructing the cell-gene fusion regulatory network graph: all tumor cell nodes are fused with the gene regulatory network; for each tumor cell sample, the probability density distribution of its gene expression is fitted, the genes falling beyond the quantile Z_{1-α} of this distribution are taken as the hot-spot genes of that cell, and the hot-spot genes are linked to the tumor cell node to obtain the cell-gene fusion regulatory network graph;
(2b) Inputting the cell-gene fusion regulatory network graph into the knowledge graph embedding model and computing the fused gene regulation feature representations of all tumor cell samples, specifically comprising the following steps:
(2b1) Extracting the positive triples from the cell-gene fusion regulatory network graph;
(2b2) Sampling negative triples to obtain a negative triple set, and computing the importance of each negative triple using the following formula:
where α is a constant representing the sampling rate, (h'_j, r, o'_j) denotes the j-th negative triple sample, h' is the head vector representation of the negative triple sample, o' is its tail vector representation, r is its relation vector representation, P_j = ||h' ∘ r − o'|| is the scoring function of the sample, and ∘ denotes the Hadamard product;
(2b3) Scoring the resulting positive and negative triples to calculate the total loss Loss:
where g(h'_i, r, o'_i) is the weight of negative triple sample i, M is the number of negative triple samples, σ is the Sigmoid activation function, γ is a constant, and p(h, r, o) is the scoring function of the positive triple;
(2b4) Updating the fused regulation feature representations of all nodes and edges of the cell-gene fusion regulatory network graph using the Adam optimization algorithm;
(2b5) Repeating steps (2b2) to (2b4) until the loss function of step (2b3) converges, and taking the fused regulation feature representations of the cell line nodes as the cell line embedding representation Embed_1.
3. The method for predicting the efficacy of the antitumor drug based on knowledge-graph embedding representation relearning according to claim 1, which is characterized in that: the step (3) specifically comprises the following steps:
(3a) Constructing a cell line embedded characterization relearning training set, wherein the cell line embedded characterization relearning training set is characterized by original gene expression profile of the cell line and embedded characterization Ebed of the cell line 1 Composition;
(3b) Constructing a cell line embedded characterization relearning depth network model, namely a one-dimensional convolutional neural network, wherein the one-dimensional convolutional neural network is provided with a plurality of convolutional layers, the cell line embedded characterization is processed through convolution, activation, batch standardization and pooling operations according to different convolution kernel sizes of facilities, a full connection layer is used as the output of the whole convolutional network after convolution, and the rejection rate Dropout is set to be 0.5;
(3c) The cell line embedding characterization relearning training set is input into a constructed cell line embedding characterization relearning depth network model, and the tumor cell relearning characterization embedded is obtained through the set one-dimensional convolutional neural network 2 Relearning tumor cells to characterize Ebed 2 Characterization of embedded with cell lines 1 Comparing the mean square error, and taking the mean square error as a loss function of the one-dimensional convolutional neural network;
$$\mathrm{Loss}_{\mathrm{MSE}}=\frac{1}{N}\sum_{i=1}^{N}\left(Y_i-\hat{Y}_i\right)^2$$
where $N$ is the number of cell lines, $Y_i$ is the value of the tumor cell relearning representation Embed_2, and $\hat{Y}_i$ is the value of the cell line embedding Embed_1;
(3d) Updating the tumor cell relearning representation Embed_2 using the Adam optimization algorithm;
(3e) Repeating steps (3c) to (3d) until the loss function of step (3c) converges, yielding the tumor cell relearning representation Embed.
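Steps (3b) and (3c) can be illustrated with a dependency-free Python sketch of one convolution block in the order the claim lists (convolution, activation, batch normalization, pooling), inverted dropout at the claimed rate of 0.5, and the mean-squared-error comparison between Embed_2 and Embed_1. This is a forward-pass illustration only: the kernel values, the ReLU activation, max pooling with window 2, the omission of learned batch-norm scale/shift and of the final fully connected layer, and all function names are assumptions. The MSE here averages over every cell line and every embedding dimension, which is one common reading of the per-cell-line formula (it matches the default `reduction='mean'` of PyTorch's `MSELoss`).

```python
import math
import random
from typing import List, Optional, Sequence


def conv1d(x: Sequence[float], kernel: Sequence[float], bias: float = 0.0) -> List[float]:
    """Valid (unpadded), stride-1 1-D cross-correlation, as computed by CNN layers."""
    k = len(kernel)
    if k > len(x):
        raise ValueError("kernel is longer than the input")
    return [sum(x[i + j] * kernel[j] for j in range(k)) + bias
            for i in range(len(x) - k + 1)]


def relu(x: Sequence[float]) -> List[float]:
    return [max(0.0, v) for v in x]


def batch_norm(batch: Sequence[Sequence[float]], eps: float = 1e-5) -> List[List[float]]:
    """Normalise each position across the batch (training-mode statistics,
    no learned scale/shift)."""
    n = len(batch)
    out = [list(row) for row in batch]
    for j in range(len(batch[0])):
        col = [row[j] for row in batch]
        mean = sum(col) / n
        var = sum((c - mean) ** 2 for c in col) / n
        for i in range(n):
            out[i][j] = (batch[i][j] - mean) / math.sqrt(var + eps)
    return out


def max_pool1d(x: Sequence[float], size: int = 2) -> List[float]:
    """Non-overlapping max pooling; a trailing remainder shorter than `size` is dropped."""
    return [max(x[i:i + size]) for i in range(0, len(x) - size + 1, size)]


def dropout(x: Sequence[float], rate: float = 0.5, training: bool = True,
            rng: Optional[random.Random] = None) -> List[float]:
    """Inverted dropout: zero each unit with probability `rate` and rescale the
    survivors by 1 / (1 - rate); identity at inference time."""
    if not training or rate == 0.0:
        return list(x)
    rng = rng or random.Random()
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in x]


def conv_block(batch: Sequence[Sequence[float]], kernel: Sequence[float],
               bias: float = 0.0) -> List[List[float]]:
    """Convolution, then ReLU, then batch normalisation, then max pooling."""
    activated = [relu(conv1d(x, kernel, bias)) for x in batch]
    return [max_pool1d(row) for row in batch_norm(activated)]


def mse_loss(embed2: Sequence[Sequence[float]], embed1: Sequence[Sequence[float]]) -> float:
    """Mean squared error between Embed_2 (relearned) and Embed_1 (target),
    averaged over all cell lines and embedding dimensions."""
    if len(embed2) != len(embed1) or not embed2:
        raise ValueError("need the same, non-zero number of cell lines")
    total, count = 0.0, 0
    for y_row, t_row in zip(embed2, embed1):
        if len(y_row) != len(t_row):
            raise ValueError("embedding dimensions differ")
        total += sum((y - t) ** 2 for y, t in zip(y_row, t_row))
        count += len(y_row)
    return total / count
```

In a full model several such blocks with different kernel sizes would be stacked, flattened and passed to the fully connected output layer, with `dropout(..., training=False)` at inference.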
4. The method for predicting the efficacy of an antitumor drug based on knowledge-graph embedding representation relearning according to claim 1, characterized in that step (4) specifically comprises the following steps:
(4a) Constructing the drug efficacy prediction training set, which consists of pairs of the tumor cell relearning representation Embed and drug efficacy labels;
(4b) Constructing a DNN classification model:
(4b1) Inputting the drug efficacy prediction training set into the constructed DNN classification model to obtain the probability that a cell line is sensitive to a drug, and judging from the output whether the cell line is sensitive or resistant, thereby obtaining the sensitivity relationship. The DNN classification model has multiple hidden layers, with the number of units per layer set according to the dimension of the cell line embedding; the number of hidden layers L is at least 3, ReLU activation functions are used between layers, the output layer has a single neuron, and its activation function is Sigmoid for the binary classification task. The Sigmoid function outputs an event probability between 0 and 1; when the result exceeds a threshold, set to 0.5, the sample is assigned to the positive class, i.e. sensitive;
(4b2) Calculating the binary cross-entropy loss between the sensitivity relationship obtained in step (4b1) and the true sensitivity relationship given by the drug efficacy labels, as the loss function of the DNN binary classification model:
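$$\mathrm{Loss}_{\mathrm{BCE}}=-\frac{1}{N}\sum_{i=1}^{N}\left[y_i\log p(y_i)+(1-y_i)\log\left(1-p(y_i)\right)\right]$$
where $N$ is the number of cell lines, $y_i$ is the binary label value (0 or 1), and $p(y_i)$ is the predicted probability of label $y_i$;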
(4b3) Optimizing the sensitivity relationship output by the DNN binary classification model using the Adam algorithm;
(4b4) Repeating steps (4b1) to (4b3) until the loss function of step (4b2) converges, to obtain the trained DNN classification model.
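The classifier of step (4b1) and the loss of step (4b2) can be sketched in dependency-free Python. Only the structure (at least three ReLU hidden layers, a single Sigmoid output unit, binary cross-entropy averaged over N cell lines) comes from the claim; He-initialised weights, hidden-layer widths supplied by the caller (the claim only says they follow the embedding dimension), and the helper names `init_mlp`, `forward` and `binary_cross_entropy` are hypothetical. Training would repeat the forward pass, loss and Adam update as in steps (4b3) and (4b4).

```python
import math
import random
from typing import List, Sequence, Tuple

# One dense layer: weights[out][in] and biases[out].
Layer = Tuple[List[List[float]], List[float]]


def init_mlp(input_dim: int, hidden_dims: Sequence[int], seed: int = 0) -> List[Layer]:
    """Build >= 3 ReLU hidden layers plus one Sigmoid output unit (He init)."""
    if len(hidden_dims) < 3:
        raise ValueError("the claim requires at least 3 hidden layers")
    rng = random.Random(seed)
    dims = [input_dim, *hidden_dims, 1]
    layers: List[Layer] = []
    for fan_in, fan_out in zip(dims[:-1], dims[1:]):
        std = math.sqrt(2.0 / fan_in)
        weights = [[rng.gauss(0.0, std) for _ in range(fan_in)] for _ in range(fan_out)]
        layers.append((weights, [0.0] * fan_out))
    return layers


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def forward(layers: Sequence[Layer], x: Sequence[float]) -> float:
    """Return the predicted probability that the cell line is drug-sensitive."""
    h = list(x)
    for k, (weights, biases) in enumerate(layers):
        z = [sum(w * a for w, a in zip(row, h)) + b for row, b in zip(weights, biases)]
        # ReLU on hidden layers only; the last layer's pre-activation goes to Sigmoid.
        h = z if k == len(layers) - 1 else [max(0.0, v) for v in z]
    return sigmoid(h[0])


def binary_cross_entropy(labels: Sequence[int], probs: Sequence[float],
                         eps: float = 1e-12) -> float:
    """Mean BCE over N samples; probs are predicted P(label = 1)."""
    if len(labels) != len(probs) or not labels:
        raise ValueError("need the same, non-zero number of labels and probabilities")
    total = 0.0
    for y, p in zip(labels, probs):
        if y not in (0, 1):
            raise ValueError("labels must be 0 or 1")
        p = min(max(p, eps), 1.0 - eps)  # clip so log() never sees 0
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / len(labels)
```

Clipping the probabilities keeps the loss finite when the Sigmoid saturates at exactly 0 or 1 in floating point.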
5. The method for predicting the efficacy of an antitumor drug based on knowledge-graph embedding representation relearning according to claim 1, characterized in that step (5) specifically consists of: using the trained DNN classification model to predict, from the tumor cell relearning representation Embed, the sensitivity relationship between the tumor cells under test and the drug, where sensitive is denoted 1 and resistant 0:
$$z_i=f(\mathrm{Embed}_i),\qquad \hat{y}_i=\begin{cases}1, & z_i>0.5\\ 0, & z_i<0.5\end{cases}$$
where $f$ denotes the trained DNN classification model and $z_i$ denotes the probability, output through the Sigmoid function, that the i-th tumor cell to be predicted in Embed is sensitive to the drug. If the output probability of a sensitive response is greater than 0.5, 1 is output, indicating sensitivity to the drug; if it is less than 0.5, 0 is output, indicating resistance to the drug.
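The decision rule of step (5) reduces to a threshold on the Sigmoid output. A minimal Python sketch, assuming the probabilities z_i have already been computed by the trained model; the function name is hypothetical, and mapping a value exactly equal to 0.5 to resistant is an assumption, since the claim leaves that case open.

```python
from typing import List, Sequence


def predict_sensitivity(probs: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Map Sigmoid outputs z_i to labels: 1 = sensitive, 0 = resistant.

    The claim fixes z_i > 0.5 as 1 and z_i < 0.5 as 0; sending z_i == 0.5
    to 0 is an assumption made here.
    """
    labels = []
    for z in probs:
        if not 0.0 <= z <= 1.0:
            raise ValueError(f"expected a probability, got {z}")
        labels.append(1 if z > threshold else 0)
    return labels
```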

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311560265.2A CN117438102A (en) 2023-11-22 2023-11-22 Anti-tumor drug efficacy prediction method based on knowledge graph embedding representation relearning

Publications (1)

Publication Number Publication Date
CN117438102A (en) 2024-01-23

Family

ID=89549793

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311560265.2A Pending CN117438102A (en) 2023-11-22 2023-11-22 Anti-tumor drug efficacy prediction method based on knowledge graph embedding representation relearning

Country Status (1)

Country Link
CN (1) CN117438102A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117612747A (en) * 2024-01-24 2024-02-27 杭州广科安德生物科技有限公司 Drug sensitivity prediction method and device for Klebsiella pneumoniae
CN117612747B (en) * 2024-01-24 2024-05-03 杭州广科安德生物科技有限公司 Drug sensitivity prediction method and device for Klebsiella pneumoniae

Similar Documents

Publication Publication Date Title
CN117438102A (en) Anti-tumor drug efficacy prediction method based on knowledge graph embedding representation relearning
CN105488528B (en) Neural network image classification method based on improving expert inquiry method
CN111563431A (en) Plant leaf disease and insect pest identification method based on improved convolutional neural network
CN110473592B (en) Multi-view human synthetic lethal gene prediction method
Maulik et al. Simulated annealing based automatic fuzzy clustering combined with ANN classification for analyzing microarray data
CN107992945B (en) Characteristic gene selection method based on deep learning and evolutionary computation
CN116194995A (en) Method for identifying chromosomal dimensional instability such as homologous repair defects in next generation sequencing data of low coverage
CN111222638B (en) Neural network-based network anomaly detection method and device
CN110993113B (en) LncRNA-disease relation prediction method and system based on MF-SDAE
CN115985503B (en) Cancer prediction system based on ensemble learning
CN108520201A (en) Robust face recognition method based on weighted mixed norm regression
Zhu et al. Deep-gknock: nonlinear group-feature selection with deep neural networks
CN116469561A (en) Breast cancer survival prediction method based on deep learning
CN113870951A (en) Prediction system for predicting head and neck squamous cell carcinoma immune subtype
CN113450562B (en) Road network traffic state discrimination method based on clustering and graph convolution network
CN113177587B (en) Generalized zero sample target classification method based on active learning and variational self-encoder
CN112819087B (en) Method for detecting abnormality of BOD sensor of outlet water based on modularized neural network
CN114462670A (en) LSTM model-based power consumption prediction method
CN109858245A (en) Intrusion detection method based on an improved deep belief network
US20220076782A1 (en) Community Assignments in Identity by Descent Networks and Genetic Variant Origination
CN110459266B (en) Method for establishing SNP (Single nucleotide polymorphism) pathogenic factor and disease association relation model
CN116913390B (en) Gene regulation network prediction method based on multi-view attention network
CN116913390A (en) Gene regulation network prediction method based on multi-view attention network
CN113516180B (en) Method for identifying Z-Wave intelligent equipment
CN112832744A (en) Pumping unit well pump detection period prediction method based on LSTM neural network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination