CN113782089A - Drug sensitivity prediction method and device based on multi-omics data fusion

Info

Publication number
CN113782089A
Authority
CN
China
Prior art keywords
data
cell line
drug
genes
drug sensitivity
Prior art date
Legal status
Granted
Application number
CN202111349387.8A
Other languages
Chinese (zh)
Other versions
CN113782089B (en)
Inventor
吴健
冯芮苇
谢雨峰
赖泯汕
郭越
曹戟
何俏军
杨波
Current Assignee
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date
Filing date
Publication date
Application filed by Zhejiang University ZJU
Priority to CN202111349387.8A
Publication of CN113782089A
Application granted
Publication of CN113782089B
Active (current legal status)
Anticipated expiration

Classifications

    • G16B20/00 ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G06N3/045 Combinations of networks
    • G06N3/08 Learning methods
    • G16B20/20 Allele or variant detection, e.g. single nucleotide polymorphism [SNP] detection
    • G16B20/50 Mutagenesis
    • G16B40/00 ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
    • G16H70/40 ICT specially adapted for the handling or processing of medical references relating to drugs, e.g. their side effects or intended usage

Abstract

The invention discloses a drug sensitivity prediction method and device based on multi-omics data fusion, in the field of drug sensitivity detection. A cell line graph characterization module integrates three kinds of omics information of an individual cell line, namely genomics, proteomics and metabolomics data, into a cell line multi-edge graph that captures each omics layer of the cell line and the latent relationships among gene products across these layers. A cell line graph feature extraction module then extracts the node features and edge features of this graph as the cell line features. Finally, a drug sensitivity prediction module predicts the half-maximal inhibitory concentration (IC50) of a drug from the cell line features and the drug features extracted by a drug feature extraction module. By jointly considering genomics, proteomics and metabolomics data, the method improves the accuracy of drug sensitivity prediction.

Description

Drug sensitivity prediction method and device based on multi-omics data fusion
Technical Field
The invention belongs to the technical field of drug sensitivity detection and evaluation, and particularly relates to a drug sensitivity prediction method and device based on multi-omics data fusion.
Background
Cancer treatment is a major challenge that researchers worldwide are working to solve, and advances in high-throughput sequencing and artificial intelligence open up broad possibilities for precision cancer therapy. A key question for researchers and industry alike is how to use an individual's rich biological information, together with efficient analysis tools such as deep learning, to automatically learn individual-specific characteristics and design a tailored diagnosis and treatment plan for each patient, thereby achieving precise diagnosis and treatment. Many researchers have contributed to this problem by applying individual genomic data to personalized diagnosis and medication recommendation. However, an important problem remains unsolved: how to make full use of each individual's complex and diverse multi-omics data to predict drug efficacy and recommend drugs more accurately.
With the progress of genomics research, public datasets are increasingly used in bioinformatics research, such as the Cancer Cell Line Encyclopedia (CCLE), Genomics of Drug Sensitivity in Cancer (GDSC) and The Cancer Genome Atlas (TCGA), as well as the proteomics dataset STRING, used to study interactions between human genes/proteins, and the metabolomics dataset GSEA, used to study human pathway information. For example, patent application publication No. CN105005693A discloses a tumor cell drug sensitivity assessment method based on genetic material specificity that uses a tumor cell sample set alone to predict the half-maximal inhibitory concentration (IC50), and patent application publication No. CN112164474A discloses a drug sensitivity prediction method based on a self-expression model that uses the GDSC dataset and the CCLE to predict IC50.
These datasets continue to grow and provide a rich sample base for studying the occurrence, development, prognosis and regression of diseases. However, existing data are rarely fully exploited for drug sensitivity prediction and drug recommendation. For example, existing methods use only the individual genomics data in the CCLE and GDSC databases to predict IC50 through genomic analysis, and thus ignore possible associations of genes at other omics levels. Although such methods have made some progress, their IC50 prediction accuracy remains insufficient. At present, there is no good model that fully fuses an individual's multi-omics information to predict drug sensitivity (IC50) more accurately.
Disclosure of Invention
In view of the above, the present invention aims to provide a drug sensitivity prediction method and device based on multi-omics data fusion, to solve the problem of poor drug sensitivity prediction accuracy caused by neglecting the latent relationships between genes.
To achieve this objective, the invention provides the following technical solutions:
in a first aspect, an embodiment provides a drug sensitivity prediction method based on multi-omics data fusion, comprising the following steps:
acquiring the multi-omics data of a cell line, drug data, and half-maximal inhibitory concentration (IC50) data of drugs on the cell line, wherein the multi-omics data of the cell line comprise genomics data, proteomics data and metabolomics data;
constructing a drug sensitivity prediction model comprising a cell line graph characterization module, a cell line graph feature extraction module, a drug feature extraction module and a drug sensitivity prediction module. The cell line graph characterization module encodes the multi-omics data of a cell line into a cell line multi-edge graph: the genes of each sample serve as the nodes of the graph, the expression level, mutation status and copy number variation status of each gene serve as its node features, and the connecting edges between nodes are constructed from the correlation between genes determined from the genomics data, the protein interactions between genes determined from the proteomics data, and the metabolic pathway information between genes determined from the metabolomics data. The cell line graph feature extraction module extracts cell line features from the cell line multi-edge graph; the drug feature extraction module extracts drug features from the drug data; and the drug sensitivity prediction module predicts the IC50 of the drug from the cell line features and the drug features;
optimizing the parameters of the drug sensitivity prediction model, using the multi-omics data and drug data of the cell line as sample data and the IC50 data of the drug on the cell line as the ground-truth label;
and performing drug sensitivity prediction with the parameter-optimized drug sensitivity prediction model.
In one embodiment, in the cell line graph characterization module, the Pearson correlation coefficient between the expression data of two genes is calculated from the genomics data to determine their correlation, and when the coefficient is greater than a set threshold, a connecting edge is constructed between the nodes corresponding to the two genes;
the interaction between two genes is obtained from the proteomics data as a protein interaction, a connecting edge is constructed between the nodes corresponding to two genes that have a protein interaction, and the interaction score is used as the weight of that edge;
metabolic pathway information between genes is obtained from the metabolomics data, and when multiple genes appear together in the same metabolic pathway, a hyperedge is constructed between the nodes corresponding to these genes as a connecting edge.
In one embodiment, the cell line graph feature extraction module comprises a first graph neural network unit and a gated recurrent unit (GRU). The first graph neural network unit consists of multiple graph convolutional layers, and every two adjacent graph convolutional layers are connected through the gated recurrent unit. The first graph neural network unit extracts cell line features from the cell line multi-edge graph, and the gated recurrent unit applies feature attention to the extracted cell line features.
In one embodiment, each graph convolutional layer aggregates the node features in three steps:
first step of feature aggregation: all first-order neighbor nodes of the current node are determined from the first connecting edges, i.e. those constructed from the correlation between genes, and their features are aggregated by formula (1):
(1) [formula image not reproduced in this text]
where $h_i$ denotes the current feature of the $i$-th (current) node, $h_j$ denotes the feature of its $j$-th first-order neighbor node, $w_{ij}$ denotes the weight of the first connecting edge between the current node $i$ and neighbor node $j$, $|N_1(i)|$ denotes the number of such first-order neighbor nodes, and $h_i^{(1)}$ denotes the new node feature after the first aggregation step;
second step of feature aggregation: all first-order neighbor nodes of the current node under the second connecting edges, i.e. those constructed from protein interactions between genes, are determined, and their features are aggregated by formula (2):
(2) [formula image not reproduced in this text]
where $\tilde{h}_i^{(1)}$ denotes the new feature $h_i^{(1)}$ of the current node $i$ after attention by the node gating unit, $h_k$ denotes the feature of the $k$-th first-order neighbor node under the second connecting edges, $w_{ik}$ denotes the weight of the second connecting edge between the current node $i$ and neighbor node $k$, $|N_2(i)|$ denotes the number of such neighbor nodes, and $h_i^{(2)}$ denotes the new node feature after the second aggregation step;
third step of feature aggregation: all first-order neighbor nodes of the current node under the third connecting edges, i.e. those constructed from metabolic pathway information between genes, are determined, and their features are aggregated by formula (3):
(3) [formula image not reproduced in this text]
where $\tilde{h}_i^{(2)}$ denotes the new feature $h_i^{(2)}$ of the current node $i$ after attention by the node gating unit, $h_t$ denotes the feature of the $t$-th first-order neighbor node under the third connecting edges, $w_{it}$ denotes the weight of the third connecting edge between the current node $i$ and neighbor node $t$, $|N_3(i)|$ denotes the number of such neighbor nodes, and $h_i^{(3)}$ denotes the new node feature after the third aggregation step.
Finally, the new feature $h_i^{(3)}$ of the current node passes through the node gating unit, and the resulting gated feature $\tilde{h}_i^{(3)}$ serves as the current node feature for the next graph convolutional layer.
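Formulas (1) to (3) survive only as images, so their exact form is unknown. The sketch below assumes, purely for illustration, that each step adds the weighted mean of the neighbor features, (1/|N(i)|) * sum_j w_ij * h_j, to the node's own feature, and that the node gating unit is a caller-supplied function. The names `aggregate` and `three_step_layer` are hypothetical. What it does reflect from the text is the ordering: correlation edges, then protein-interaction edges, then pathway edges, with gating after each step.

```python
from typing import Callable, Dict, List, Tuple

Vector = List[float]
Features = Dict[int, Vector]                     # node id -> feature vector
Adjacency = Dict[int, List[Tuple[int, float]]]   # node id -> [(neighbor id, edge weight)]


def aggregate(h: Features, adj: Adjacency) -> Features:
    """ASSUMED form of one step: h_i + (1/|N(i)|) * sum_j w_ij * h_j."""
    out: Features = {}
    for i, hi in h.items():
        nbrs = adj.get(i, [])
        if not nbrs:                  # no neighbors under this edge type: keep feature
            out[i] = list(hi)
            continue
        acc = [0.0] * len(hi)
        for j, w in nbrs:             # weighted sum of neighbor features
            for d, x in enumerate(h[j]):
                acc[d] += w * x
        out[i] = [a + s / len(nbrs) for a, s in zip(hi, acc)]
    return out


def three_step_layer(
    h: Features,
    adj_corr: Adjacency,   # first edges: expression correlation
    adj_ppi: Adjacency,    # second edges: protein interactions
    adj_path: Adjacency,   # third edges: merged metabolic-pathway edges
    gate: Callable[[Features], Features],
) -> Features:
    """One graph convolutional layer with gating after each of the three steps."""
    h1 = gate(aggregate(h, adj_corr))
    h2 = gate(aggregate(h1, adj_ppi))
    return gate(aggregate(h2, adj_path))   # becomes the next layer's input
```

With an identity gate (`lambda f: f`) the layer reduces to three successive neighborhood averages; a real model would plug in a learned gating function and learned transforms.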
In one embodiment, the drug feature extraction module comprises a conversion unit and a second graph neural network unit: the conversion unit converts the drug data into a drug molecular graph, and the second graph neural network unit extracts drug features from the input drug molecular graph.
In one embodiment, the conversion unit encodes the drug data into a drug molecular graph using the open-source library RDKit, and the second graph neural network unit is built on the graph isomorphism principle (a graph isomorphism network, GIN).
In one embodiment, the drug sensitivity prediction module comprises several fully connected layers that perform feature fusion and regression on the concatenation of the cell line features and the drug features to predict the IC50 of the drug.
In one embodiment, after the multi-omics data of cell lines, drug data and IC50 data of drugs on the cell lines are acquired, outliers and missing values are removed, and the processed data are used to construct training samples.
In one embodiment, when the drug sensitivity prediction model is optimized, its parameters are updated using the mean squared error between the predicted IC50 values and the corresponding ground-truth labels as the loss function.
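The text says only that the second graph neural network unit follows the graph isomorphism principle. As a sketch, the standard GIN update is h_v' = MLP((1 + eps) * h_v + sum of h_u over neighbors u), followed here by sum pooling to get one graph-level drug feature. The toy molecule (the heavy atoms of ethanol, C-C-O) and its one-hot atom features are hand-coded; in practice RDKit would supply the atoms and bonds, for example through `Chem.MolFromSmiles` and `mol.GetBonds()`. The ReLU "MLP" is a placeholder, not a trained network, and `gin_layer`/`readout` are hypothetical names.

```python
from typing import Callable, Dict, List

Vector = List[float]


def gin_layer(
    h: Dict[int, Vector],
    adj: Dict[int, List[int]],
    mlp: Callable[[Vector], Vector],
    eps: float = 0.0,
) -> Dict[int, Vector]:
    """GIN update: MLP((1 + eps) * h_v + sum of neighbor features)."""
    out = {}
    for v, hv in h.items():
        agg = [(1.0 + eps) * x for x in hv]
        for u in adj.get(v, []):
            agg = [a + b for a, b in zip(agg, h[u])]
        out[v] = mlp(agg)
    return out


def readout(h: Dict[int, Vector]) -> Vector:
    """Sum pooling over all atoms gives a graph-level drug feature."""
    dim = len(next(iter(h.values())))
    return [sum(vec[d] for vec in h.values()) for d in range(dim)]


def relu(v: Vector) -> Vector:   # placeholder for a learned MLP
    return [max(0.0, x) for x in v]


# Toy graph: atoms 0 (C), 1 (C), 2 (O); features are one-hot [is_C, is_O].
atoms = {0: [1.0, 0.0], 1: [1.0, 0.0], 2: [0.0, 1.0]}
bonds = {0: [1], 1: [0, 2], 2: [1]}
drug_feature = readout(gin_layer(atoms, bonds, relu))
```

Sum aggregation plus an injective MLP is what gives GIN its discriminative power; mean or max aggregation would make the layer weaker at distinguishing molecular graphs.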
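A minimal sketch of such a regression head, assuming plain dense layers with ReLU on the hidden layers and a linear output unit. The number of layers, their sizes and the weights in the example are made-up illustrative values, not values from the source; `dense` and `predict_ic50` are hypothetical names.

```python
from typing import List, Sequence, Tuple

Vector = List[float]
Layer = Tuple[List[List[float]], List[float]]   # (weight rows, bias)


def dense(x: Vector, layer: Layer, relu: bool = True) -> Vector:
    """Fully connected layer y = W x + b, optionally followed by ReLU."""
    w, b = layer
    y = [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]
    return [max(0.0, v) for v in y] if relu else y


def predict_ic50(cell_feat: Vector, drug_feat: Vector, layers: Sequence[Layer]) -> float:
    """Concatenate cell line and drug features, then regress a single IC50 value."""
    x = list(cell_feat) + list(drug_feat)          # feature concatenation
    for layer in layers[:-1]:
        x = dense(x, layer, relu=True)             # hidden fusion layers
    return dense(x, layers[-1], relu=False)[0]     # linear output unit


hidden: Layer = ([[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]], [0.0, 0.0])
output: Layer = ([[1.0, 1.0]], [0.5])
ic50 = predict_ic50([1.0, 2.0], [3.0], [hidden, output])
```

The output unit is kept linear because IC50 (or its logarithm) is an unbounded regression target.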
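The mean squared error loss and its gradient with respect to the predictions can be sketched as follows. To show a parameter update, the example fits a hypothetical one-parameter model `pred = w * x` by a single gradient-descent step; the real model would backpropagate the same gradient through all of its layers.

```python
from typing import List


def mse(pred: List[float], target: List[float]) -> float:
    """Mean squared error between predicted and ground-truth IC50 values."""
    if len(pred) != len(target) or not pred:
        raise ValueError("pred and target must be non-empty and equally long")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def mse_grad(pred: List[float], target: List[float]) -> List[float]:
    """d(MSE)/d(pred_i) = 2 * (pred_i - target_i) / n."""
    n = len(pred)
    return [2.0 * (p - t) / n for p, t in zip(pred, target)]


def sgd_step(w: float, xs: List[float], ys: List[float], lr: float) -> float:
    """One gradient step on the toy model pred = w * x (chain rule: dpred/dw = x)."""
    preds = [w * x for x in xs]
    grads = mse_grad(preds, ys)
    dw = sum(g * x for g, x in zip(grads, xs))
    return w - lr * dw
```

For example, starting from `w = 0.0` with inputs `[1.0, 2.0]` and targets `[2.0, 4.0]`, one step with learning rate 0.1 moves `w` to 1.0, halfway toward the exact fit of 2.0.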
In a second aspect, an embodiment provides a drug sensitivity prediction device based on multi-omics data fusion, comprising:
a data acquisition unit for acquiring the multi-omics data of a cell line, drug data, and IC50 data of drugs on the cell line, wherein the multi-omics data of the cell line comprise genomics data, proteomics data and metabolomics data;
a model construction unit for constructing a drug sensitivity prediction model comprising a cell line graph characterization module, a cell line graph feature extraction module, a drug feature extraction module and a drug sensitivity prediction module. The cell line graph characterization module encodes the multi-omics data of a cell line into a cell line multi-edge graph: the genes of each sample serve as the nodes, the expression level, mutation status and copy number variation status of each gene serve as its node features, and the connecting edges between nodes are constructed from the correlation between genes determined from the genomics data, the protein interactions between genes determined from the proteomics data, and the metabolic pathway information between genes determined from the metabolomics data. The cell line graph feature extraction module extracts cell line features from the cell line multi-edge graph; the drug feature extraction module extracts drug features from the drug data; and the drug sensitivity prediction module predicts the IC50 of the drug from the cell line features and the drug features;
an optimization learning unit for optimizing the parameters of the drug sensitivity prediction model, using the multi-omics data and drug data of the cell line as sample data and the IC50 data of the drug on the cell line as the ground-truth label;
and a prediction unit for predicting drug sensitivity with the parameter-optimized drug sensitivity prediction model.
In a third aspect, an embodiment provides a drug sensitivity prediction apparatus based on multi-omics data fusion, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the drug sensitivity prediction method based on multi-omics data fusion of the first aspect.
Compared with the prior art, the invention has at least the following beneficial effects:
a cell line graph characterization module integrates the genomics, proteomics and metabolomics data of an individual cell line into a cell line multi-edge graph, fully accounting for each omics layer of the cell line and the latent relationships among gene products across these layers; a cell line graph feature extraction module then fully extracts the node features and edge features of the graph as the cell line features; and finally a drug sensitivity prediction module predicts the IC50 of the drug from the cell line features and the drug features extracted by a drug feature extraction module. Jointly considering genomics, proteomics and metabolomics data in this way improves the accuracy of drug sensitivity prediction.
Drawings
To illustrate the embodiments of the present invention or the technical solutions of the prior art more clearly, the drawings used in describing them are briefly introduced below. The drawings described below show only some embodiments of the invention; those skilled in the art can derive other drawings from them without creative effort.
FIG. 1 is a flowchart of the drug sensitivity prediction method based on multi-omics data fusion provided by an embodiment;
FIG. 2 is a schematic structural diagram of the drug sensitivity prediction model provided by an embodiment;
FIG. 3 is a schematic diagram of the cell line multi-edge graph constructed in the cell line graph characterization module provided by an embodiment;
FIG. 4 is a schematic diagram of feature extraction in the cell line graph feature extraction module provided by an embodiment;
FIG. 5 is a schematic structural diagram of the drug sensitivity prediction device based on multi-omics data fusion provided by an embodiment.
Detailed Description
To make the objectives, technical solutions and advantages of the present invention clearer, the invention is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the invention and do not limit its scope.
Existing drug sensitivity prediction models have low accuracy and struggle to predict drug sensitivity precisely because they ignore the complex characteristics of an individual's multi-omics information and the various latent relationships that genes may have at the multi-omics level. To address this, the embodiment provides a drug sensitivity prediction method and device based on multi-omics data fusion. It considers multi-omics information such as genomics, proteomics and metabolomics together with the latent relationships between genes, and extracts data features by combining a graph neural network unit with a gated recurrent unit, thereby improving the prediction accuracy of the drug sensitivity prediction model.
FIG. 1 is a flowchart of the drug sensitivity prediction method based on multi-omics data fusion provided in the embodiment. As shown in FIG. 1, the method comprises the following steps:
S110: acquire the multi-omics data of cell lines, the drug data, and the IC50 data of drugs on the cell lines, and construct training samples.
For each cell line sample, the corresponding multi-omics data, drug data and IC50 data of the drug on the cell line can be obtained. The multi-omics data comprise genomics, proteomics and metabolomics data. The drug data refer to the name of the drug acting on the cell line, from which the drug's molecular structure can be obtained. The IC50 data characterize the cell line's resistance to the drug: the smaller the IC50, the weaker the resistance, i.e. the more sensitive the cell line is to the drug. For all genes in the cell line, the genomics data comprise the gene expression level, gene mutation status and copy number variation status; the proteomics data reflect protein-protein interactions (PPIs) between genes at the protein level; and the metabolomics data reflect the relationships between genes at the metabolic pathway level, i.e. whether multiple genes occur in the same metabolic pathway.
In the embodiment, the data may come from several multi-omics datasets: the CCLE dataset records the genomics data of cell lines, including gene expression level, copy number variation status and gene mutation status; the STRING dataset records interactions between human genes/proteins; the GSEA dataset records human metabolic pathway information; and the GDSC dataset records the IC50 values of cell lines for given drugs. Drugs in these datasets are generally identified by name, so to facilitate extraction of the drug molecular graph, the drug's molecular structural formula must also be obtained from a database (such as PubChem) as the drug research object.
The acquired data must be organized into samples to optimize the parameters of the drug sensitivity prediction model. Specifically, the cell line, drug and IC50 value are extracted from each record of the GDSC dataset, and the multi-omics data of each cell line are then obtained from the CCLE, STRING and GSEA datasets. The data for each record form one training sample: the multi-omics data and drug data of the cell line are the sample data, and the IC50 of the drug on the cell line is the ground-truth label.
In one possible embodiment, to improve the quality of the training samples and thus the training effect of the model, outliers and missing values are removed after the multi-omics data of the cell lines, the drug data and the IC50 data of the drugs on the cell lines are acquired, and the processed data are used to construct the training samples.
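A minimal sketch of this join, assuming the GDSC records have already been parsed into dictionaries and the per-cell-line multi-omics data (from CCLE, STRING and GSEA) into a lookup table keyed by cell line name. The field names `cell_line`, `drug` and `ic50`, the `Sample` type and the choice to skip records without an omics profile are all hypothetical, not taken from the source.

```python
from typing import Dict, List, NamedTuple


class Sample(NamedTuple):
    cell_line: str
    drug: str        # drug identifier; its molecular graph is built separately
    omics: dict      # multi-omics data of the cell line (sample data)
    ic50: float      # ground-truth label


def build_samples(gdsc_records: List[dict], omics_by_cell_line: Dict[str, dict]) -> List[Sample]:
    """One training sample per GDSC record whose cell line has a multi-omics profile."""
    samples = []
    for rec in gdsc_records:
        omics = omics_by_cell_line.get(rec["cell_line"])
        if omics is None:          # assumption: records without omics data are skipped
            continue
        samples.append(Sample(rec["cell_line"], rec["drug"], omics, rec["ic50"]))
    return samples
```

Keeping the multi-omics profile in a lookup table means each cell line's graph can be built once and shared by every drug record that refers to it.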
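The source does not say which outlier criterion is used. The sketch below assumes two common choices: drop any record with a missing (None or NaN) field, then drop records whose IC50 lies outside the interquartile-range fences [Q1 - 1.5*IQR, Q3 + 1.5*IQR]. `clean_records` and the `ic50` field name are hypothetical; quartiles come from the standard library's `statistics.quantiles`.

```python
import math
import statistics
from typing import List


def is_missing(v) -> bool:
    return v is None or (isinstance(v, float) and math.isnan(v))


def clean_records(records: List[dict], k: float = 1.5) -> List[dict]:
    """Remove records with missing values, then IC50 outliers by the IQR rule (assumed)."""
    complete = [r for r in records if not any(is_missing(v) for v in r.values())]
    if len(complete) < 4:          # too few points for meaningful quartiles
        return complete
    q1, _, q3 = statistics.quantiles([r["ic50"] for r in complete], n=4)
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr
    return [r for r in complete if lo <= r["ic50"] <= hi]
```

The IQR rule is used here because it does not assume normally distributed IC50 values; a z-score filter would be an equally plausible reading of the text.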
S120: construct the drug sensitivity prediction model.
FIG. 2 is a schematic structural diagram of the drug sensitivity prediction model provided in the embodiment. As shown in FIG. 2, the model comprises a cell line graph characterization module, a cell line graph feature extraction module, a drug feature extraction module and a drug sensitivity prediction module. The cell line graph characterization module encodes the multi-omics data of the cell line into a cell line multi-edge graph, giving a graph representation of the cell line that fuses the multi-omics data; the cell line graph feature extraction module extracts cell line features from this graph; the drug feature extraction module extracts drug features from the drug data; and the drug sensitivity prediction module predicts the IC50 of the drug from the cell line features and the drug features.
FIG. 3 is a schematic diagram of constructing the cell line multi-edge graph in the cell line graph characterization module provided in the embodiment. As shown in FIG. 3, the nodes and node features are constructed first: the genes of each sample are used as the nodes of the cell line multi-edge graph, and three node features are built from each gene's genomics data, namely its expression level, mutation status and copy number variation status. Here the mutation status indicates whether the gene is mutated, and the copy number variation status indicates whether a copy number variation exists.
Next, the connecting edges between nodes are constructed from the correlation between genes determined from the genomics data, the protein interactions between genes determined from the proteomics data, and the metabolic pathway information between genes determined from the metabolomics data.
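Building the three node features per gene can be sketched as follows, assuming expression levels are given as a mapping from gene to value and the mutation and copy number variation statuses as sets of affected genes (encoded as 1.0 or 0.0, since the text describes them as yes/no conditions). The function name and input layout are hypothetical.

```python
from typing import Dict, List, Set


def node_features(
    genes: List[str],
    expression: Dict[str, float],
    mutated: Set[str],
    cnv: Set[str],
) -> Dict[str, List[float]]:
    """Per-gene node feature: [expression level, mutated flag, copy number variation flag]."""
    return {
        g: [
            float(expression[g]),
            1.0 if g in mutated else 0.0,   # whether the gene is mutated
            1.0 if g in cnv else 0.0,       # whether a copy number variation exists
        ]
        for g in genes
    }


feats = node_features(["TP53", "EGFR"], {"TP53": 5.2, "EGFR": 7.1}, {"TP53"}, {"EGFR"})
```

Each sample (cell line) gets its own feature table over the same gene set, so graphs from different cell lines share node identities and differ only in features.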
When constructing a connecting edge between nodes according to the correlation between genes, calculating a Pearson correlation coefficient between gene expression data of two genes to determine the correlation between the genes, and when the Pearson correlation coefficient is larger than a set threshold value, constructing a connecting edge between nodes corresponding to the two genes, wherein the corresponding weight is set to be 1, and the weight of the non-existing connecting edge is 0.
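A sketch of the correlation edges, assuming each gene's expression data is a vector of values across cell lines (the text does not say across which samples the correlation is taken) and using an illustrative threshold of 0.8. Only edges that exist are stored, each with weight 1.0; a missing key means weight 0. The function names are hypothetical.

```python
import math
from itertools import combinations
from typing import Dict, List, Tuple


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient of two equally long vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:       # constant vector: correlation undefined, treat as 0
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def correlation_edges(expr: Dict[str, List[float]], threshold: float) -> Dict[Tuple[str, str], float]:
    """Weight-1 edge between every gene pair whose correlation exceeds the threshold."""
    edges = {}
    for g1, g2 in combinations(sorted(expr), 2):
        if pearson(expr[g1], expr[g2]) > threshold:
            edges[(g1, g2)] = 1.0
    return edges


expr = {"A": [1, 2, 3, 4], "B": [2, 4, 6, 8], "C": [4, 3, 2, 1]}
corr_edges = correlation_edges(expr, threshold=0.8)
```

Because the comparison is strictly "greater than", strongly anti-correlated pairs (here A and C, r = -1) do not get an edge; using the absolute value would be a different design choice.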
When edges are constructed from the protein interactions between genes, the interactions between pairs of genes are obtained from the proteomics data, an edge is constructed between the nodes corresponding to each pair of interacting genes, and the interaction score is used as the edge weight.
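The protein-interaction edges can be sketched as a symmetric weighted adjacency matrix. The input format (gene pairs with a score), the placeholder gene names, and the function name `ppi_edges` are hypothetical; the text does not state the source or scale of the interaction scores.

```python
import numpy as np

def ppi_edges(genes, interactions):
    """Weighted adjacency from protein interactions.

    genes: list of gene identifiers in node order.
    interactions: iterable of (gene_a, gene_b, score) tuples.
    Pairs involving genes outside `genes` are skipped; the score becomes the edge weight.
    """
    index = {g: i for i, g in enumerate(genes)}
    adj = np.zeros((len(genes), len(genes)))
    for a, b, score in interactions:
        if a in index and b in index and a != b:
            i, j = index[a], index[b]
            adj[i, j] = adj[j, i] = score  # undirected edge
    return adj

genes = ["G1", "G2", "G3"]
adj = ppi_edges(genes, [("G1", "G2", 0.9), ("G3", "G9", 0.7)])  # illustrative scores
```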
When edges are constructed from metabolic pathway information, the pathway membership of each gene is obtained from the metabolomics data, and when several genes appear together in the same metabolic pathway, a hyperedge connecting the corresponding nodes is constructed. In the embodiment, the edge and weight between any two genes are obtained in two steps: hyperedge expansion (Clique Expansion) and edge merging (Edge Merging). Specifically, each hyperedge spanning several genes is first expanded into a fully connected graph in which every pair of the corresponding nodes is joined by an ordinary edge. After all hyperedges have been expanded, two nodes may be joined by several edges; edge merging then collapses them into one edge whose weight is the number of edges that existed between the two nodes.
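Clique expansion followed by edge merging amounts to counting, for every gene pair, how many pathways contain both genes. A minimal sketch, assuming pathways are given as a mapping from a pathway name to its member genes; the pathway and gene names below are placeholders, and `pathway_edges` is a hypothetical name.

```python
from itertools import combinations

import numpy as np

def pathway_edges(genes, pathways):
    """Edge weights from metabolic-pathway hyperedges.

    genes: list of gene identifiers in node order.
    pathways: dict mapping pathway name -> list of member genes (one hyperedge each).
    """
    index = {g: i for i, g in enumerate(genes)}
    adj = np.zeros((len(genes), len(genes)))
    for members in pathways.values():
        nodes = sorted({index[g] for g in members if g in index})
        for i, j in combinations(nodes, 2):  # clique expansion: every pair gets an edge
            adj[i, j] += 1                   # edge merging: weight = number of edges
            adj[j, i] += 1
    return adj

genes = ["G1", "G2", "G3", "G4"]
pathways = {"pathway_A": ["G1", "G2", "G3"],
            "pathway_B": ["G1", "G2"],
            "pathway_C": ["G3", "G4"]}
adj = pathway_edges(genes, pathways)
```

Here G1 and G2 share two pathways, so their merged edge weight is 2, while G1 and G4 share none and stay unconnected.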
FIG. 4 is a schematic diagram of feature extraction in the cell line graph feature extraction module of the embodiment. As shown in FIG. 4, to suit the specific structure of the constructed cell line multi-edge graph, the module comprises a first graph neural network unit and a gated recurrent unit (GRU). The first graph neural network unit comprises multiple graph convolution layers, for example 8, which extract node features from the cell line multi-edge graph as the cell line features. The gated recurrent unit connects each pair of adjacent graph convolution layers and assigns different attention to the extracted node features: the node features extracted by one graph convolution layer pass through the gated recurrent unit before they serve as the input to the next graph convolution layer, so that effective features receive greater attention during feature extraction.
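The text does not specify how the gated recurrent unit is wired between layers. One common pattern, assumed here rather than taken from the source, treats the new layer output as the GRU input and the previous layer's node features as its hidden state, as in gated graph neural networks. The `GRUGate` class is a hypothetical NumPy sketch with untrained random weights.

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

class GRUGate:
    """Minimal GRU cell applied row-wise to node features (no training code)."""

    def __init__(self, dim, seed=0):
        rng = np.random.default_rng(seed)
        # Three gates (update, reset, candidate), each with input weights W,
        # recurrent weights U and a bias b.
        self.W = rng.normal(scale=0.1, size=(3, dim, dim))
        self.U = rng.normal(scale=0.1, size=(3, dim, dim))
        self.b = np.zeros((3, dim))

    def __call__(self, x, h):
        """x: node features from the current layer; h: features from the previous layer."""
        z = sigmoid(x @ self.W[0] + h @ self.U[0] + self.b[0])        # update gate
        r = sigmoid(x @ self.W[1] + h @ self.U[1] + self.b[1])        # reset gate
        n = np.tanh(x @ self.W[2] + (r * h) @ self.U[2] + self.b[2])  # candidate features
        return (1.0 - z) * h + z * n  # per-feature mix of old and new information

rng = np.random.default_rng(1)
h_prev = rng.normal(size=(5, 4))  # 5 nodes, 4 features from layer l
x_new = rng.normal(size=(5, 4))   # features produced by layer l + 1
gate = GRUGate(dim=4)
out = gate(x_new, h_prev)         # passed on as input to the next layer
```

The update gate `z` decides, per node and per feature, how much of the newly extracted information replaces the previous representation, which is one way to realize the "different attention" described above.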
As shown in FIG. 4, each graph convolution layer aggregates the node features in three steps, each of which aggregates information in the cell line multi-edge graph through a different type of edge:
Step 1, feature aggregation: all first-order neighbor nodes of the current node are determined from the first edges, which are constructed from the correlation between genes, and feature aggregation is performed by the following formula (1):
[Formula (1): image not reproduced]
where $h_i$ denotes the current feature of the $i$-th node, $h_j$ the feature of its $j$-th first-order neighbor, $w_{ij}$ the weight of the first edge between node $i$ and neighbor $j$, $|\mathcal{N}_1(i)|$ the number of first-order neighbors, and $h_i^{(1)}$ the new node feature after the first aggregation step;
Step 2, feature aggregation: all first-order neighbor nodes of the current node are determined from the second edges, which are constructed from the protein interactions between genes, and feature aggregation is performed by the following formula (2):
[Formula (2): image not reproduced]
where $\tilde h_i^{(1)}$ denotes the new feature of the $i$-th node, i.e. $h_i^{(1)}$ after attention by the node gating unit, $h_k$ the feature of its $k$-th first-order neighbor along the second edges, $w_{ik}$ the weight of the second edge between node $i$ and neighbor $k$, $|\mathcal{N}_2(i)|$ the number of such neighbors, and $h_i^{(2)}$ the new node feature after the second aggregation step;
Step 3, feature aggregation: all first-order neighbor nodes of the current node are determined from the third edges, which are constructed from the metabolic pathway information between genes, and feature aggregation is performed by the following formula (3):
[Formula (3): image not reproduced]
where $\tilde h_i^{(2)}$ denotes the new feature of the $i$-th node, i.e. $h_i^{(2)}$ after attention by the node gating unit, $h_t$ the feature of its $t$-th first-order neighbor along the third edges, $w_{it}$ the weight of the third edge between node $i$ and neighbor $t$, $|\mathcal{N}_3(i)|$ the number of such neighbors, and $h_i^{(3)}$ the new node feature after the third aggregation step;
the new node feature $h_i^{(3)}$ of the current node, after attention by the node gating unit, becomes $\tilde h_i^{(3)}$, which serves as the current node feature for the next graph convolution layer.
Through these three steps, each graph convolution layer aggregates once the features of all first-order neighbors reachable through each of the three edge types. After each step, a node-level gating unit applies feature attention, so that the features aggregated through different edge types receive appropriately different weights.
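Because formulas (1) to (3) survive only as images, their exact form cannot be recovered here. The sketch below is one plausible reading, not the patented formula: each step adds the weighted sum of neighbor features, divided by the neighbor count, to the node's own feature, and the node gating unit is modelled as one sigmoid score per node. All function names and the gate parameters are hypothetical.

```python
import numpy as np

def aggregate(h, adj):
    """Add the count-normalized, weighted sum of first-order neighbor features.

    h: (n_nodes, d) node features; adj: (n_nodes, n_nodes) edge weights of one edge type.
    """
    counts = (adj > 0).sum(axis=1, keepdims=True)   # number of first-order neighbors
    neigh = adj @ h / np.maximum(counts, 1)         # isolated nodes receive zeros
    return h + neigh

def node_gate(h, w, b=0.0):
    """Hypothetical node-level gate: one sigmoid score per node scales its feature vector."""
    score = 1.0 / (1.0 + np.exp(-(h @ w + b)))      # shape (n_nodes,)
    return score[:, None] * h

def multi_edge_layer(h, adj_corr, adj_ppi, adj_path, gate_ws):
    """One graph convolution layer: correlation, then PPI, then pathway edges, gating after each."""
    for adj, w in zip((adj_corr, adj_ppi, adj_path), gate_ws):
        h = node_gate(aggregate(h, adj), w)
    return h

# Two nodes joined by one edge of weight 2, single feature each
h = np.array([[1.0], [3.0]])
adj = np.array([[0.0, 2.0], [2.0, 0.0]])
step = aggregate(h, adj)  # [[1 + 2*3], [3 + 2*1]] = [[7], [5]]
```

Processing the edge types sequentially, with a gate between them, lets the model down-weight one omics source for a node without discarding the others.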
As shown in FIG. 2, the drug feature extraction module comprises a conversion unit and a second graph neural network unit: the conversion unit converts the drug data into a drug molecular graph, and the second graph neural network unit extracts the drug features from the input drug molecular graph. In one possible embodiment, the conversion unit encodes the drug data into a drug molecular graph using the open-source library RDKit, and the second graph neural network unit, built on the graph isomorphism principle, then extracts features from this molecular graph to obtain the drug features.
The second graph neural network unit built on the graph isomorphism principle comprises multiple Graph Isomorphism Network (GIN) structures, each consisting of a GIN convolutional layer (GINConv), a batch normalization layer (BN), and a ReLU activation layer. Each graph attention network (GAT) module likewise consists of a GAT convolutional layer (GATConv), a batch normalization layer (BN), and a ReLU activation layer.
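A GIN structure as listed above (GINConv, then BN, then ReLU) can be sketched in NumPy using the standard GIN update, MLP((1 + ε)·h_v + Σ neighbor features). This is a minimal sketch under stated assumptions: the two-layer MLP, the batch normalization without learned scale and shift, the one-hot atom features, and the sum readout that turns node outputs into one drug vector are all hypothetical choices, not taken from the text.

```python
import numpy as np

def gin_block(h, adj, W1, W2, eps=0.0, bn_eps=1e-5):
    """One GIN structure: GINConv -> batch normalization -> ReLU.

    h: (n_atoms, d_in) atom features; adj: (n_atoms, n_atoms) bond adjacency.
    """
    agg = (1.0 + eps) * h + adj @ h              # GIN sum aggregation over bonded atoms
    out = np.maximum(agg @ W1, 0.0) @ W2         # two-layer MLP inside GINConv
    mean, var = out.mean(axis=0), out.var(axis=0)
    out = (out - mean) / np.sqrt(var + bn_eps)   # batch normalization over the nodes
    return np.maximum(out, 0.0)                  # ReLU

# Toy 3-atom chain with one-hot atom-type features [C, O]
h = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
adj = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
rng = np.random.default_rng(0)
node_out = gin_block(h, adj, rng.normal(size=(2, 8)), rng.normal(size=(8, 8)))
drug_feature = node_out.sum(axis=0)  # hypothetical sum readout -> shape (8,)
```

Sum aggregation, rather than mean or max, is what gives GIN its ability to distinguish neighborhoods of different sizes.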
As shown in FIG. 2, the drug sensitivity prediction module comprises multiple fully connected layers, for example 3. The cell line features and drug features are concatenated and input to the module, whose fully connected layers perform feature fusion and regression on the concatenated features to output the predicted IC50 of the drug-cell line pair.
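The concatenate-then-regress head can be sketched as a small multilayer perceptron. The layer sizes, the ReLU between layers, and the linear output are assumptions for illustration; the text only states that several fully connected layers, for example 3, perform fusion and regression.

```python
import numpy as np

def predict_ic50(cell_feat, drug_feat, layers):
    """Concatenate cell-line and drug features, then apply fully connected layers.

    layers: list of (W, b) pairs; ReLU between layers, linear final layer for regression.
    """
    x = np.concatenate([cell_feat, drug_feat], axis=-1)  # feature splicing
    for i, (W, b) in enumerate(layers):
        x = x @ W + b
        if i < len(layers) - 1:
            x = np.maximum(x, 0.0)
    return x.squeeze(-1)  # one IC50 value per drug-cell line pair

rng = np.random.default_rng(0)
dims = [14, 16, 8, 1]  # 8 cell-line + 6 drug features -> 3 FC layers (hypothetical sizes)
layers = [(rng.normal(scale=0.1, size=(a, b)), np.zeros(b))
          for a, b in zip(dims[:-1], dims[1:])]
pred = predict_ic50(rng.normal(size=(4, 8)), rng.normal(size=(4, 6)), layers)
```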
S130: optimize the parameters of the drug sensitivity prediction model using the training samples.
In the embodiment, the parameters of the drug sensitivity prediction model are optimized using the multi-omics data and drug data of the cell lines as sample data and the IC50 data of the drugs on the cell lines as ground-truth labels. Specifically, the multi-omics data of a cell line are input into the cell line graph characterization module, and the resulting cell line multi-edge graph is input into the cell line graph feature extraction module, which obtains the cell line features through information characterization and feature extraction. The drug data are input into the drug feature extraction module, which obtains the drug features in the same way. The cell line features and drug features are then input into the drug sensitivity prediction module, which computes a predicted IC50. The model parameters are updated using the mean squared error between the predicted IC50 and the corresponding ground-truth label as the loss function.
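The MSE-driven parameter update can be illustrated on a stand-in model. In this sketch a plain linear model replaces the full graph network (an assumption made to keep the example short), and its parameters are updated by gradient descent on the mean squared error between predictions and labels; the learning rate and epoch count are arbitrary.

```python
import numpy as np

def mse(pred, target):
    """Mean squared error between predicted and ground-truth IC50 values."""
    return float(np.mean((pred - target) ** 2))

def train_linear(x, y, lr=0.1, epochs=500):
    """Stand-in for the full network: a linear model fitted by gradient descent on MSE."""
    w = np.zeros(x.shape[1])
    b = 0.0
    n = len(y)
    for _ in range(epochs):
        err = x @ w + b - y                # prediction minus truth label
        w -= lr * (2.0 / n) * (x.T @ err)  # gradient of MSE with respect to w
        b -= lr * (2.0 / n) * err.sum()    # gradient of MSE with respect to b
    return w, b

rng = np.random.default_rng(0)
x = rng.normal(size=(50, 2))        # synthetic features
y = x @ np.array([1.5, -2.0]) + 0.5  # synthetic labels
w, b = train_linear(x, y)
loss = mse(x @ w + b, y)
rmse = np.sqrt(loss)  # RMSE, the evaluation metric, is the square root of the MSE
```

In the actual model the same loss would be back-propagated through the prediction head, both graph networks, and the gating units.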
S140: perform drug sensitivity prediction using the parameter-optimized drug sensitivity prediction model.
In application, the multi-omics data of a cell line are input into the drug sensitivity prediction model. The cell line graph characterization module encodes them into a cell line multi-edge graph, which is input into the cell line graph feature extraction module to obtain the cell line features through information characterization and feature extraction; the drug data are input into the drug feature extraction module to obtain the drug features; and the drug sensitivity prediction module computes the predicted IC50 from the cell line features and the drug features. For example, when the drug sensitivity prediction model was trained and tested on 564 pan-cancer cell lines and 170 drugs, the RMSE on the test set was only 0.7943, substantially better than existing models of various types.
FIG. 5 is a schematic structural diagram of the drug sensitivity prediction device based on multi-omics data fusion provided by the embodiment. As shown in FIG. 5, the embodiment provides a drug sensitivity prediction device 500, comprising:
a data acquisition unit 510, configured to acquire the multi-omics data and drug data of cell lines and the IC50 data of drugs on the cell lines, wherein the multi-omics data comprise genomics data, proteomics data, and metabolomics data;
a model construction unit 520, configured to construct the drug sensitivity prediction model, which comprises a cell line graph characterization module, a cell line graph feature extraction module, a drug feature extraction module, and a drug sensitivity prediction module, wherein the cell line graph characterization module encodes the multi-omics data of a cell line into a cell line multi-edge graph, in which the genes of each sample serve as nodes, the gene expression level, gene mutation status, and copy number variation status of each gene serve as node features, and the edges between nodes are constructed according to the correlation between genes determined from the genomics data, the protein interactions between genes determined from the proteomics data, and the metabolic pathway information between genes determined from the metabolomics data; the cell line graph feature extraction module extracts cell line features from the cell line multi-edge graph; the drug feature extraction module extracts drug features from the drug data; and the drug sensitivity prediction module predicts the IC50 of the drug from the cell line features and the drug features;
an optimization learning unit 530, configured to optimize the parameters of the drug sensitivity prediction model using the multi-omics data and drug data of the cell lines as sample data and the IC50 data of the drugs on the cell lines as ground-truth labels;
and a prediction unit 540, configured to perform drug sensitivity prediction using the parameter-optimized drug sensitivity prediction model.
It should be noted that the division into the functional units above is only one example of how the drug sensitivity prediction device based on multi-omics data fusion of the above embodiment may perform drug sensitivity prediction. As needed, these functions may be assigned to different functional units; that is, the internal structure of the terminal or server may be divided into different functional units to perform all or part of the functions described above. In addition, the device and the drug sensitivity prediction method based on multi-omics data fusion provided in the above embodiments belong to the same concept; the specific implementation of the device is described in detail in the method embodiment and is not repeated here.
The embodiment also provides a drug sensitivity prediction device based on multi-omics data fusion, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the above drug sensitivity prediction method based on multi-omics data fusion when executing the computer program.
Those skilled in the art will understand that all or part of the processes of the above method embodiments can be implemented by a computer program, which may be stored in a non-volatile computer-readable storage medium and which, when executed, may include the processes of the above method embodiments. In the embodiments provided in this application, the memory may be local volatile memory such as RAM; non-volatile memory such as ROM, flash memory, a floppy disk, or a mechanical hard disk; or remote cloud storage. The processor may be a central processing unit (CPU), a microprocessor (MPU), a digital signal processor (DSP), or a field-programmable gate array (FPGA).
The embodiments above describe the technical solutions and advantages of the present invention in detail. It should be understood that they are only the most preferred embodiments of the invention and are not intended to limit it; any modifications, additions, equivalent substitutions, and the like made within the scope of the principles of the invention shall fall within its scope of protection.

Claims (10)

1. A drug sensitivity prediction method based on multi-omics data fusion, characterized by comprising the following steps:
acquiring the multi-omics data and drug data of cell lines and the half-maximal inhibitory concentration (IC50) data of drugs on the cell lines, wherein the multi-omics data of a cell line comprise genomics data, proteomics data, and metabolomics data;
constructing a drug sensitivity prediction model comprising a cell line graph characterization module, a cell line graph feature extraction module, a drug feature extraction module, and a drug sensitivity prediction module, wherein the cell line graph characterization module is used to encode the multi-omics data of a cell line into a cell line multi-edge graph, in which the genes of each sample serve as the nodes of the cell line multi-edge graph, the gene expression level, gene mutation status, and copy number variation status of each gene serve as the node features, and the edges between nodes are constructed according to the correlation between genes determined from the genomics data, the protein interactions between genes determined from the proteomics data, and the metabolic pathway information between genes determined from the metabolomics data; the cell line graph feature extraction module is used to extract cell line features from the cell line multi-edge graph; the drug feature extraction module is used to extract drug features from the drug data; and the drug sensitivity prediction module is used to predict the IC50 of the drug from the cell line features and the drug features;
optimizing the parameters of the drug sensitivity prediction model using the multi-omics data and drug data of the cell lines as sample data and the IC50 data of the drugs on the cell lines as ground-truth labels;
and performing drug sensitivity prediction using the parameter-optimized drug sensitivity prediction model.
2. The drug sensitivity prediction method based on multi-omics data fusion according to claim 1, wherein, in the cell line graph characterization module, the Pearson correlation coefficient between the expression data of two genes is calculated from the genomics data to determine the correlation between the genes, and when the coefficient is greater than a set threshold, an edge is constructed between the nodes corresponding to the two genes;
the interactions between pairs of genes are obtained from the proteomics data as protein interactions, an edge is constructed between the nodes corresponding to each pair of genes having a protein interaction, and the interaction score is used as the edge weight;
and the metabolic pathway information between genes is obtained from the metabolomics data, and when multiple genes appear together in the same metabolic pathway, a hyperedge is constructed among the nodes corresponding to those genes to serve as an edge.
3. The drug sensitivity prediction method based on multi-omics data fusion according to claim 1, wherein the cell line graph feature extraction module comprises a first graph neural network unit and a gated recurrent unit; the first graph neural network unit consists of multiple graph convolution layers, and each pair of adjacent graph convolution layers is connected through the gated recurrent unit, wherein the first graph neural network unit is used to extract cell line features from the cell line multi-edge graph, and the gated recurrent unit is used to apply feature attention to the extracted cell line features.
4. The drug sensitivity prediction method based on multi-omics data fusion according to claim 3, wherein, in each graph convolution layer, the node features undergo three-step feature aggregation, comprising:
Step 1, feature aggregation: all first-order neighbor nodes of the current node are determined from the first edges, which are constructed from the correlation between genes, and feature aggregation is performed by the following formula (1):
[Formula (1): image not reproduced]
where $h_i$ denotes the current feature of the $i$-th node, $h_j$ the feature of its $j$-th first-order neighbor, $w_{ij}$ the weight of the first edge between node $i$ and neighbor $j$, $|\mathcal{N}_1(i)|$ the number of first-order neighbors, and $h_i^{(1)}$ the new node feature after the first aggregation step;
Step 2, feature aggregation: all first-order neighbor nodes of the current node are determined from the second edges, which are constructed from the protein interactions between genes, and feature aggregation is performed by the following formula (2):
[Formula (2): image not reproduced]
where $\tilde h_i^{(1)}$ denotes the new feature of the $i$-th node, i.e. $h_i^{(1)}$ after attention by the node gating unit, $h_k$ the feature of its $k$-th first-order neighbor along the second edges, $w_{ik}$ the weight of the second edge between node $i$ and neighbor $k$, $|\mathcal{N}_2(i)|$ the number of such neighbors, and $h_i^{(2)}$ the new node feature after the second aggregation step;
Step 3, feature aggregation: all first-order neighbor nodes of the current node are determined from the third edges, which are constructed from the metabolic pathway information between genes, and feature aggregation is performed by the following formula (3):
[Formula (3): image not reproduced]
where $\tilde h_i^{(2)}$ denotes the new feature of the $i$-th node, i.e. $h_i^{(2)}$ after attention by the node gating unit, $h_t$ the feature of its $t$-th first-order neighbor along the third edges, $w_{it}$ the weight of the third edge between node $i$ and neighbor $t$, $|\mathcal{N}_3(i)|$ the number of such neighbors, and $h_i^{(3)}$ the new node feature after the third aggregation step;
the new node feature $h_i^{(3)}$ of the current node, after attention by the node gating unit, becomes $\tilde h_i^{(3)}$, which serves as the current node feature for the next graph convolution layer.
5. The drug sensitivity prediction method based on multi-omics data fusion according to claim 1, wherein the drug feature extraction module comprises a conversion unit and a second graph neural network unit, the conversion unit being used to convert the drug data into a drug molecular graph, and the second graph neural network unit being used to extract the drug features from the input drug molecular graph.
6. The drug sensitivity prediction method based on multi-omics data fusion according to claim 5, wherein the conversion unit encodes the drug data into a drug molecular graph using the open-source library RDKit, and the second graph neural network unit is constructed based on the graph isomorphism principle.
7. The drug sensitivity prediction method based on multi-omics data fusion according to claim 1, wherein the drug sensitivity prediction module comprises multiple fully connected layers, which perform feature fusion and regression on the concatenation of the input cell line features and drug features to obtain the IC50 of the drug.
8. The drug sensitivity prediction method based on multi-omics data fusion according to claim 1, characterized in that, after the multi-omics data and drug data of the cell lines and the IC50 data of the drugs on the cell lines are acquired, outliers and missing values are further removed from the data, and the processed data are used to construct the training samples;
and, when the parameters of the drug sensitivity prediction model are optimized, the model parameters are updated using the mean squared error between the predicted IC50 and the corresponding ground-truth label as the loss function.
9. A drug sensitivity prediction device based on multi-omics data fusion, comprising:
a data acquisition unit, used to acquire the multi-omics data and drug data of cell lines and the half-maximal inhibitory concentration (IC50) data of drugs on the cell lines, wherein the multi-omics data of a cell line comprise genomics data, proteomics data, and metabolomics data;
a model construction unit, used to construct a drug sensitivity prediction model comprising a cell line graph characterization module, a cell line graph feature extraction module, a drug feature extraction module, and a drug sensitivity prediction module, wherein the cell line graph characterization module is used to encode the multi-omics data of a cell line into a cell line multi-edge graph, in which the genes of each sample serve as nodes, the gene expression level, gene mutation status, and copy number variation status of each gene serve as node features, and the edges between nodes are constructed according to the correlation between genes determined from the genomics data, the protein interactions between genes determined from the proteomics data, and the metabolic pathway information between genes determined from the metabolomics data; the cell line graph feature extraction module is used to extract cell line features from the cell line multi-edge graph; the drug feature extraction module is used to extract drug features from the drug data; and the drug sensitivity prediction module is used to predict the IC50 of the drug from the cell line features and the drug features;
an optimization learning unit, used to optimize the parameters of the drug sensitivity prediction model using the multi-omics data and drug data of the cell lines as sample data and the IC50 data of the drugs on the cell lines as ground-truth labels;
and a prediction unit, used to perform drug sensitivity prediction using the parameter-optimized drug sensitivity prediction model.
10. A drug sensitivity prediction device based on multi-omics data fusion, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the drug sensitivity prediction method based on multi-omics data fusion of any one of claims 1 to 8 when executing the computer program.
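Claim 8 requires removing outliers and missing values before building training samples but does not say how. The sketch below is one possible rule, assumed for illustration: drop any sample with a missing value, then drop samples whose IC50 label lies more than three standard deviations from the mean. The function name `clean_samples` and the z-score threshold are hypothetical.

```python
import numpy as np

def clean_samples(x, y, z_max=3.0):
    """Remove samples with missing values, then samples whose IC50 label is an outlier.

    x: (n_samples, n_features) inputs; y: (n_samples,) IC50 labels.
    The z-score rule on y is a hypothetical choice of outlier criterion.
    """
    keep = ~np.isnan(x).any(axis=1) & ~np.isnan(y)  # missing-value removal
    x, y = x[keep], y[keep]
    std = y.std()
    if std == 0:
        return x, y  # all labels identical: nothing can be an outlier
    z = (y - y.mean()) / std
    keep = np.abs(z) <= z_max                       # outlier removal
    return x[keep], y[keep]

x = np.ones((22, 2))
x[0, 0] = np.nan  # one sample with a missing feature
y = np.array([0.0] * 11 + [1.0] * 10 + [100.0])  # one extreme label
xc, yc = clean_samples(x, y)
```

With very small datasets a z-score rule rarely fires, because the largest possible |z| shrinks with the sample size; a robust rule based on the median absolute deviation is a common alternative.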
CN202111349387.8A 2021-11-15 2021-11-15 Drug sensitivity prediction method and device based on multigroup chemical data fusion Active CN113782089B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111349387.8A CN113782089B (en) 2021-11-15 2021-11-15 Drug sensitivity prediction method and device based on multigroup chemical data fusion

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111349387.8A CN113782089B (en) 2021-11-15 2021-11-15 Drug sensitivity prediction method and device based on multigroup chemical data fusion

Publications (2)

Publication Number Publication Date
CN113782089A true CN113782089A (en) 2021-12-10
CN113782089B CN113782089B (en) 2022-02-18

Family

ID=78873903

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111349387.8A Active CN113782089B (en) 2021-11-15 2021-11-15 Drug sensitivity prediction method and device based on multigroup chemical data fusion

Country Status (1)

Country Link
CN (1) CN113782089B (en)

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114255886A (en) * 2022-02-28 2022-03-29 浙江大学 Multi-group similarity guide-based drug sensitivity prediction method and device
CN114429787A (en) * 2021-12-30 2022-05-03 北京百度网讯科技有限公司 Omics data processing method and device, electronic device and storage medium
CN114678069A (en) * 2022-05-27 2022-06-28 浙江大学 Immune rejection prediction and signal path determination device for organ transplantation
CN114999630A (en) * 2022-06-07 2022-09-02 浙江大学 Liver transplantation recipient prognosis prediction device based on multi-source data fusion
CN115206421A (en) * 2022-07-19 2022-10-18 北京百度网讯科技有限公司 Drug repositioning method, and repositioning model training method and device
CN116110509A (en) * 2022-11-15 2023-05-12 浙江大学 Method and device for predicting drug sensitivity based on histology consistency pretraining
CN116597902A (en) * 2023-04-24 2023-08-15 浙江大学 Method and device for screening multiple groups of chemical biomarkers based on drug sensitivity data
CN116705194A (en) * 2023-06-06 2023-09-05 之江实验室 Method and device for predicting drug cancer suppression sensitivity based on graph neural network
WO2023221125A1 (en) * 2022-05-20 2023-11-23 京东方科技集团股份有限公司 Drug sensitivity prediction method, model training method, storage medium and device
CN117524346A (en) * 2023-11-20 2024-02-06 东北林业大学 Multi-view cancer drug response prediction system

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050131057A1 (en) * 2003-08-25 2005-06-16 Board Of Regents, The University Of Texas System Taxane chemosensitivity prediction test
CN107609326A (en) * 2017-07-26 2018-01-19 同济大学 Drug sensitivity prediction method in the accurate medical treatment of cancer
CN108877953A (en) * 2018-06-06 2018-11-23 中南大学 A kind of drug sensitivity prediction method based on more similitude networks
CN112599218A (en) * 2020-12-16 2021-04-02 北京深度制耀科技有限公司 Training method and prediction method of drug sensitivity prediction model and related device

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
JOSÉ ZARIFFA: "Relationship Between Clinical Assessments of Function and Measurements From an Upper-Limb Robotic Rehabilitation Device in Cervical Spinal Cord Injury", IEEE Transactions on Neural Systems and Rehabilitation Engineering *
李叙潼: "人工智能算法在药物细胞敏感性预测中的应用" (Application of artificial intelligence algorithms in cellular drug sensitivity prediction), Chinese Science Bulletin (科学通报) *

Also Published As

Publication number Publication date
CN113782089B (en) 2022-02-18

Similar Documents

Publication Publication Date Title
CN113782089B (en) Drug sensitivity prediction method and device based on multi-omics data fusion
CN112863696B (en) Drug sensitivity prediction method and device based on transfer learning and graph neural network
CN114255886B (en) Multi-omics similarity-guided drug sensitivity prediction method and device
Zeng et al. Review of statistical learning methods in integrated omics studies (an integrated information science)
WO2022111385A1 (en) Graph neural network-based clinical omics data processing method and apparatus, device, and medium
Wang et al. FastGGM: an efficient algorithm for the inference of Gaussian graphical model in biological networks
CN112784913B (en) miRNA-disease association prediction method and device based on graph neural network fusion of multi-view information
CN112037912A (en) Triage model training method, device and equipment based on a medical knowledge graph
US20220130541A1 (en) Disease-gene prioritization method and system
CN105653846A (en) Drug repositioning method based on integrated similarity measurement and bidirectional random walk
CN115798598B (en) Hypergraph-based miRNA-disease association prediction model and method
CN116110509B (en) Method and device for predicting drug sensitivity based on omics consistency pretraining
D’Agaro Artificial intelligence used in genome analysis studies
Yang et al. Machine learning methods for exploring sequence determinants of 3D genome organization
WO2024164739A1 (en) Graph network construction method and apparatus, electronic device, and storage medium
Bansal et al. A review on machine learning aided multi-omics data integration techniques for healthcare
CN117079804A (en) Method and system for constructing a clinical outcome prediction model for digestive system tumors
CN116978464A (en) Data processing method, device, equipment and medium
CN115410642A (en) Biological relation network information modeling method and system
Wang et al. Network clustering analysis using mixture exponential-family random graph models and its application in genetic interaction data
Wang et al. Prediction of the disease causal genes based on heterogeneous network and multi-feature combination method
CN114999630A (en) Liver transplantation recipient prognosis prediction device based on multi-source data fusion
Li et al. iEnhance: a multi-scale spatial projection encoding network for enhancing chromatin interaction data resolution
Gentry et al. Missingness adapted group informed clustered (MAGIC)-LASSO: A novel paradigm for prediction in data with widespread non-random missingness
CN116994652B (en) Information prediction method and device based on neural network and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant