CN116110509A - Method and device for predicting drug sensitivity based on omics consistency pre-training - Google Patents

Publication number: CN116110509A
Application number: CN202211422775.9A
Authority: CN (China)
Legal status: Granted (Active)
Other languages: Chinese (zh)
Other versions: CN116110509B (en)
Inventors: 曹戟, 陈文博, 欧阳振球, 杨波, 何俏军, 吴健
Current assignee: Zhejiang University (ZJU)
Original assignee: Zhejiang University (ZJU)
Prior art keywords: cell line; drug; tumor cell; consistency; coding module

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16C COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
    • G16C20/00 Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
    • G16C20/30 Prediction of properties of chemical compounds, compositions or mixtures
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00 ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B20/50 Mutagenesis
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16C COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
    • G16C20/00 Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
    • G16C20/70 Machine learning, data mining or chemometrics
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00 Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10 Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Abstract

The invention discloses a method and a device for predicting drug sensitivity based on omics consistency pre-training. The method comprises the following steps: constructing a drug graph, a gene graph, a gene expression graph and a gene mutation graph, and acquiring sensitivity data of tumor cell lines to drugs; constructing a tumor cell line coding module and pre-training it on the gene expression graph and the gene mutation graph based on omics consistency, wherein the omics consistency is any one, or a combination of at least two, of predictive omics consistency, contrastive omics consistency and generative omics consistency; constructing a drug sensitivity prediction model based on the pre-trained tumor cell line coding module; optimizing the parameters of the drug sensitivity prediction model according to the gene graph, the drug graph and the sensitivity data; and predicting drug sensitivity with the parameter-optimized drug sensitivity prediction model. The method and the device can improve the accuracy of drug sensitivity prediction.

Description

Method and device for predicting drug sensitivity based on omics consistency pre-training
Technical Field
The invention belongs to the technical field of drug sensitivity detection and evaluation, and particularly relates to a method and a device for predicting drug sensitivity based on omics consistency pre-training.
Background
With the rise of personalized medicine, researchers and physicians have turned their attention to precision treatment. Because tumors are heterogeneous in both time and space, cancer patients may respond differently to the same drug or therapy, and an ill-suited treatment may cause toxic side effects or even accelerate tumor progression. There is therefore a pressing clinical need for a method that quickly and accurately predicts the drug sensitivity of individual patients so as to guide clinical medication. With the development of high-throughput technologies, various sequencing methods have brought explosive growth in omics data. Because omics data differ strongly between patients, predicting drug sensitivity from multi-omics information has become an extremely important task in personalized medicine.
Many public resources already exist, such as the anticancer drug sensitivity genomic and transcriptomic datasets CCLE (Cancer Cell Line Encyclopedia) and GDSC (Genomics of Drug Sensitivity in Cancer), the cancer patient atlas TCGA (The Cancer Genome Atlas), and the protein-protein interaction database STRING. They provide a rich basis of molecular-level data and clinical sample data for studying the occurrence, development and prognosis of disease. Based on these large datasets, researchers have proposed several machine learning methods that explore the relationship between omics information and drug response, using the genetic information of tumor cell lines to predict the half-maximal inhibitory concentration (IC50). However, because the amount of tumor cell line data used is insufficient, the tumor cell line coding module is under-trained; these methods therefore cannot extract cell line features well enough to characterize the cell lines, and drug sensitivity prediction performance suffers. To address this problem, some methods use more than one type of omics data, for example characterizing a cell line by gene expression level, gene mutation and gene copy number simultaneously, to achieve better prediction accuracy. Examples are the drug sensitivity prediction method and device guided by multi-omics similarity disclosed in patent publication No. CN114255886A, and the drug sensitivity prediction method and device based on transfer learning and graph neural networks disclosed in patent publication No. CN112863696A. However, these methods may overfit the tumor cell line coding module and still do not adequately solve the problem. At present, therefore, no model extracts tumor cell line features accurately enough to achieve more accurate drug sensitivity prediction.
With the development of big data and high-performance hardware, pre-trained models that exploit massive unlabeled data have achieved great success in many areas of deep learning, but few pre-trained models exist for tumor cell lines in the field of drug sensitivity prediction.
Disclosure of Invention
In view of the above, the present invention aims to provide a method and a device for predicting drug sensitivity based on omics consistency pre-training, so as to solve the problem that under-training of the tumor cell line coding module degrades the performance of the drug sensitivity prediction model.
To achieve the above object, an embodiment provides a method for predicting drug sensitivity based on omics consistency pre-training, comprising the following steps:
obtaining drug small molecule data and constructing a drug graph; obtaining proteomics data and omics information of tumor cell lines, the omics information including gene expression levels and gene mutation information, and constructing a gene graph, a gene expression graph and a gene mutation graph; and obtaining sensitivity data of the tumor cell lines to the drugs as label data;
constructing a tumor cell line coding module, and pre-training the tumor cell line coding module on the gene expression graph and the gene mutation graph based on omics consistency, wherein the omics consistency is any one, or a combination of at least two, of predictive omics consistency, contrastive omics consistency and generative omics consistency;
constructing a drug sensitivity prediction model comprising the pre-trained tumor cell line coding module, a drug small molecule coding module and a drug sensitivity prediction module, wherein the pre-trained tumor cell line coding module extracts a cell line representation from the gene graph, the drug small molecule coding module extracts a drug representation from the drug graph, and the drug sensitivity prediction module calculates, from the cell line representation and the drug representation, a sensitivity prediction result for the drug acting on the tumor cell line;
taking the gene graph and the drug graph as input, optimizing the parameters of the drug sensitivity prediction model under the supervision of the label data, and predicting drug sensitivity with the parameter-optimized drug sensitivity prediction model.
In one embodiment, when the tumor cell line coding module is pre-trained on the gene expression graph based on predictive omics consistency, a predictive training system is constructed, comprising the tumor cell line coding module and, connected to its output, a first mapping head followed by a first regularization operation and a second mapping head followed by a second regularization operation;
pre-training the tumor cell line coding module with the predictive training system comprises:
acquiring intrinsic features of the tumor cell line as a first supervision label and taking the gene mutation information as a second supervision label, wherein the intrinsic features include cancer type, tissue source, tissue type, sex or age;
inputting the gene expression graph into the predictive training system, where the tumor cell line coding module extracts a gene expression representation; the first mapping head transforms this representation and the first regularization operation then predicts the intrinsic features, while the second mapping head transforms the same representation and the second regularization operation then predicts the gene mutation information;
calculating a first cross-entropy loss from the predicted intrinsic features and the first supervision label, calculating a second cross-entropy loss from the predicted gene mutation information and the second supervision label, and pre-training the tumor cell line coding module with the weighted sum of the two cross-entropy losses as the predictive omics consistency loss.
In one embodiment, when the tumor cell line coding module is pre-trained on the gene expression graph and the gene mutation graph based on contrastive omics consistency, the two graphs are each input to the tumor cell line coding module to extract a gene expression representation and a gene mutation representation, a contrastive loss is calculated from the two representations, and the tumor cell line coding module is pre-trained by minimizing this contrastive loss as the contrastive omics consistency loss.
In one embodiment, when the tumor cell line coding module is pre-trained on the gene expression graph and the gene mutation graph based on generative omics consistency, a generative training system is constructed, comprising the tumor cell line coding module and a first variational autoencoder and a second variational autoencoder connected to its output;
pre-training the tumor cell line coding module with the generative training system comprises:
inputting the gene expression graph into the generative training system, extracting a gene expression representation from it with the tumor cell line coding module, and predicting gene mutation data from that representation through the encoding and decoding of the first variational autoencoder;
inputting the gene mutation graph into the generative training system, extracting a gene mutation representation from it with the tumor cell line coding module, and predicting gene expression data from that representation through the encoding and decoding of the second variational autoencoder;
calculating a first mean squared error loss from the predicted gene mutation data and the gene mutation information serving as the supervision label, calculating a second mean squared error loss from the predicted gene expression data and the gene expression levels serving as the supervision label, and pre-training the tumor cell line coding module with the weighted sum of the two mean squared error losses as the generative omics consistency loss.
In one embodiment, constructing the gene graph, the gene expression graph and the gene mutation graph from the proteomics data and the tumor cell line omics information, which includes gene expression levels and gene mutation information, comprises:
taking genes as the nodes of the gene graph, the gene expression graph and the gene mutation graph, determining from the proteomics data the interactions between the proteins encoded by the genes, determining the connection relations among the genes from these protein-protein interactions, and constructing edges between the nodes according to the connection relations;
using the tumor cell line omics information as node features for the gene graph, the gene expression levels as node features for the gene expression graph, and the gene mutation information as node features for the gene mutation graph.
In one embodiment, both the tumor cell line coding module and the drug small molecule coding module employ a graph attention network.
In one embodiment, when the parameters of the drug sensitivity prediction model are optimized under the supervision of the label data, the gene graph and the drug graph are taken as input, and the cross-entropy between the sensitivity prediction result output by the model and the label data is used as the total loss function to optimize the parameters of the pre-trained tumor cell line coding module, the drug small molecule coding module and the drug sensitivity prediction module.
In one embodiment, acquiring the drug small molecule data and constructing the drug graph comprises: constructing the drug graph with the atoms of the drug small molecule as nodes and the chemical bonds between the atoms as edges.
To achieve the above object, an embodiment of the present invention further provides a device for predicting drug sensitivity based on omics consistency pre-training, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the steps of the above method for predicting drug sensitivity based on omics consistency pre-training.
Compared with the prior art, the invention has at least the following beneficial effects:
The tumor cell line coding module is pre-trained through the consistency among the omics information of a tumor cell line, which fully mines the potential links between the different omics data of the cell line and enables the module to extract cell line representations more accurately. Meanwhile, the gene expression and gene mutation data are fully utilized, and predictive, contrastive and generative consistency pre-training modes are provided, so that the robustness and generalization of the omics data and the latent hierarchical semantic information in the omics information of tumor cell lines are fully considered.
The provided drug sensitivity prediction model based on omics consistency pre-training exploits the correlation among omics so that the tumor cell line coding module captures richer biological information while extracting cell line features efficiently, which allows a more accurate drug sensitivity prediction model to be trained and improves its prediction accuracy.
Drawings
To illustrate the embodiments of the present invention or the technical solutions of the prior art more clearly, the drawings required for describing them are briefly introduced below. The drawings described below show only some embodiments of the present invention; a person skilled in the art may derive other drawings from them without inventive effort.
FIG. 1 is a flow chart of the method for predicting drug sensitivity based on omics consistency pre-training provided in an embodiment;
FIG. 2 is a flow chart of pre-training based on predictive omics consistency provided in an embodiment;
FIG. 3 is a flow chart of pre-training based on contrastive omics consistency provided in an embodiment;
FIG. 4 is a flow chart of pre-training based on generative omics consistency provided in an embodiment;
FIG. 5 is a schematic structural diagram of the drug sensitivity prediction model provided in an embodiment.
Detailed Description
To make the objects, technical solutions and advantages of the present invention clearer, the invention is described in further detail below with reference to the drawings and embodiments. It should be understood that the detailed description is given by way of example only and does not limit the scope of the invention.
Extensive research was carried out to solve the problem that under-training of the tumor cell line coding module degrades the performance of the drug sensitivity prediction model. It was found that tumor cell line omics information, comprising gene expression levels from transcriptomics and gene mutation information from genomics, reflects the molecular-level characteristics of a cell line well, and that potential links and similarities exist between the different kinds of omics information of a tumor cell line. On this basis, the embodiments of the invention provide a method and a device for predicting drug sensitivity based on omics consistency pre-training, which use the omics consistency of tumor cell lines to train the tumor cell line coding module fully, so as to extract more accurate cell line representations and improve the prediction accuracy of the drug sensitivity prediction model.
FIG. 1 is a flow chart of the method for predicting drug sensitivity based on omics consistency pre-training provided in an embodiment. As shown in FIG. 1, the method includes the following steps:
Step 1: obtain omics information and proteomics data of tumor cell lines, and construct a gene graph, a gene expression graph and a gene mutation graph.
In an embodiment, the omics information of the tumor cell lines may be obtained from various datasets, for example the TCGA dataset, which records omics information including gene expression levels and gene mutation information. The proteomics data may likewise be obtained from various datasets, for example the STRING dataset, which records protein-protein interactions.
In an embodiment, when the gene graph, the gene expression graph and the gene mutation graph are constructed from the tumor cell line omics information and the proteomics data, the nodes of all three graphs are genes, but the node features differ: the gene graph uses the tumor cell line omics information, the gene expression graph uses the gene expression levels, and the gene mutation graph uses the gene mutation information. The edges are constructed in the same way in all three graphs: the interactions between the proteins encoded by the genes are determined from the proteomics data, and the connection relations among the genes are determined from these interactions. When the correlation coefficient determined from the protein-protein interaction of two genes is greater than a threshold, the two genes are considered connected, and an edge is constructed between the corresponding nodes.
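The graph construction in Step 1 can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the helper names `build_edges` and `build_graph`, the toy interaction scores and feature values, the threshold of 0.7, and the choice to form the gene graph's node features by concatenating expression and mutation features. The text only states that the three graphs share nodes and edges and differ in node features.

```python
from typing import Dict, List, Tuple

def build_edges(genes: List[str],
                ppi_scores: Dict[Tuple[str, str], float],
                threshold: float) -> List[Tuple[int, int]]:
    """Return undirected edges (i, j), i < j, for gene pairs scoring above the threshold."""
    index = {g: i for i, g in enumerate(genes)}
    edges = set()
    for (a, b), score in ppi_scores.items():
        if a in index and b in index and a != b and score > threshold:
            i, j = sorted((index[a], index[b]))
            edges.add((i, j))
    return sorted(edges)

def build_graph(genes, edges, node_features):
    """Bundle the shared topology with graph-specific node features."""
    return {"nodes": list(genes),
            "edges": list(edges),
            "x": [list(node_features[g]) for g in genes]}

# Toy inputs (hypothetical values, not taken from any dataset).
genes = ["TP53", "EGFR", "KRAS"]
ppi = {("TP53", "EGFR"): 0.9, ("EGFR", "KRAS"): 0.95, ("TP53", "KRAS"): 0.3}
expression = {"TP53": [2.1], "EGFR": [5.4], "KRAS": [0.7]}
mutation = {"TP53": [1.0], "EGFR": [0.0], "KRAS": [1.0]}

edges = build_edges(genes, ppi, threshold=0.7)
expr_graph = build_graph(genes, edges, expression)
mut_graph = build_graph(genes, edges, mutation)
# Assumption: the gene graph carries both kinds of omics features per node.
gene_graph = build_graph(genes, edges,
                         {g: expression[g] + mutation[g] for g in genes})
```

Because the edge list is computed once and reused, the three graphs are guaranteed to have identical topology, which matches the description above.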
Step 2: obtain drug small molecule data and construct a drug graph.
In an embodiment, the obtained drug small molecule data is typically given as a drug name or drug ID. To construct the drug graph, the SMILES string of the drug is first obtained from a database (e.g., the PubChem database). The drug small molecule is then represented as a 2D graph: nodes and edges are constructed from the atoms and chemical bonds of the molecule, respectively, with the atom information encoded as node features and the chemical bond information encoded as edge features.
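As a rough illustration of the drug graph, the sketch below encodes atoms as one-hot node features and bonds as one-hot edge features. In practice a cheminformatics toolkit would parse the SMILES string; to stay self-contained, the atoms and bonds here are written out by hand. The vocabularies, the `DrugGraph` container and the `build_drug_graph` helper are all hypothetical and not taken from the text.

```python
from dataclasses import dataclass
from typing import List, Tuple

ATOM_TYPES = ["C", "N", "O"]         # toy vocabulary (assumption)
BOND_TYPES = ["single", "double"]    # toy vocabulary (assumption)

def one_hot(value: str, vocabulary: List[str]) -> List[float]:
    return [1.0 if value == v else 0.0 for v in vocabulary]

@dataclass
class DrugGraph:
    node_features: List[List[float]]   # one row per atom
    edges: List[Tuple[int, int]]       # directed (source, target) pairs
    edge_features: List[List[float]]   # one row per directed edge

def build_drug_graph(atoms: List[str],
                     bonds: List[Tuple[int, int, str]]) -> DrugGraph:
    """atoms: element symbols; bonds: (atom index, atom index, bond type)."""
    nodes = [one_hot(a, ATOM_TYPES) for a in atoms]
    edges, edge_feats = [], []
    for i, j, kind in bonds:
        # Store both directions so that message passing is symmetric.
        for src, dst in ((i, j), (j, i)):
            edges.append((src, dst))
            edge_feats.append(one_hot(kind, BOND_TYPES))
    return DrugGraph(nodes, edges, edge_feats)

# Ethanol (SMILES "CCO"), heavy atoms only.
ethanol = build_drug_graph(["C", "C", "O"],
                           [(0, 1, "single"), (1, 2, "single")])
```

Real atom and bond encodings are usually richer (degree, charge, aromaticity and so on); the one-hot element and bond type used here is only the simplest possible choice.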
Step 3: obtain sensitivity data of the tumor cell lines to the drugs as label data.
In an embodiment, the sensitivity data of the tumor cell lines to the drugs may be obtained from various datasets, for example the TCGA dataset, which records whether a tumor cell line is sensitive or insensitive to a given drug. These sensitivity data serve as label data for training the drug sensitivity prediction model.
Step 4: pre-train the tumor cell line coding module on the gene expression graph and the gene mutation graph based on omics consistency.
In an embodiment, a tumor cell line coding module is constructed, which may employ a graph attention network (GAT). After the structure of the module is built, it is pre-trained on the gene expression graph and the gene mutation graph based on omics consistency, wherein the omics consistency is any one, or a combination of at least two, of predictive omics consistency, contrastive omics consistency and generative omics consistency. That is, pre-training may use predictive, contrastive or generative omics consistency alone; a weighted sum of two of them (for example, predictive and contrastive, or predictive and generative); or a weighted sum of all three. The three individual pre-training modes are described in detail below.
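The weighted combination of pre-training objectives can be written as a small helper. This is a sketch under assumptions: the function name, the dictionary keys and the numeric loss values are illustrative, and the text does not say how the weights are chosen.

```python
def combined_pretraining_loss(losses, weights):
    """Weighted sum over the enabled objectives.

    losses:  dict mapping objective name to its current loss value.
    weights: dict mapping objective name to a weight; objectives that are
             absent or have a non-positive weight are left out.
    """
    active = {name: w for name, w in weights.items() if w > 0}
    if not active:
        raise ValueError("at least one objective must be enabled")
    missing = [name for name in active if name not in losses]
    if missing:
        raise KeyError(f"no loss value for: {missing}")
    return sum(w * losses[name] for name, w in active.items())

# Hypothetical loss values for one training step.
losses = {"predictive": 0.8, "contrastive": 1.2, "generative": 0.5}

single = combined_pretraining_loss(losses, {"generative": 1.0})
pair = combined_pretraining_loss(losses, {"predictive": 1.0, "contrastive": 0.5})
```

Setting exactly one weight recovers a single-objective pre-training mode, and setting two or three gives the combined modes listed above.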
In an embodiment, as shown in FIG. 2, when the tumor cell line coding module is pre-trained on the gene expression graph based on predictive omics consistency, a predictive training system is constructed, comprising the tumor cell line coding module and, connected to its output, a first mapping head followed by a first regularization operation and a second mapping head followed by a second regularization operation;
pre-training the tumor cell line coding module with the predictive training system comprises:
acquiring intrinsic features of the tumor cell line as a first supervision label and taking the gene mutation information as a second supervision label, wherein the intrinsic features include cancer type, tissue source, tissue type, sex or age;
inputting the gene expression graph into the predictive training system, where the tumor cell line coding module extracts a gene expression representation; the first mapping head transforms this representation and the first regularization operation then predicts the intrinsic features, while the second mapping head transforms the same representation and the second regularization operation then predicts the gene mutation information;
calculating a first cross-entropy loss from the predicted intrinsic features and the first supervision label, calculating a second cross-entropy loss from the predicted gene mutation information and the second supervision label, and pre-training the tumor cell line coding module with the weighted sum of the two cross-entropy losses as the predictive omics consistency loss.
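The predictive objective can be sketched as two heads on top of one shared representation. Several choices below are assumptions rather than details from the text: each mapping head is a single linear layer, the two "regularization operations" are interpreted as softmax (for a multi-class intrinsic feature such as cancer type) and sigmoid (for per-gene mutation status), and the weights `w1` and `w2` default to 1.

```python
import math

def linear(x, weight, bias):
    """Dense layer: weight has one row per output unit."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def cross_entropy(probs, target_index):
    return -math.log(probs[target_index])

def binary_cross_entropy(probs, targets, eps=1e-12):
    terms = [-(t * math.log(p + eps) + (1 - t) * math.log(1 - p + eps))
             for p, t in zip(probs, targets)]
    return sum(terms) / len(terms)

def predictive_loss(h, head1, head2, cancer_type, mutations, w1=1.0, w2=1.0):
    """h: gene expression representation from the cell line encoder."""
    p_type = softmax(linear(h, *head1))               # first head: intrinsic feature
    p_mut = [sigmoid(v) for v in linear(h, *head2)]   # second head: mutation status
    return (w1 * cross_entropy(p_type, cancer_type)
            + w2 * binary_cross_entropy(p_mut, mutations))

# Toy example: zero-initialised heads give uniform predictions.
h = [0.0, 0.0]
head1 = ([[0.0, 0.0]] * 3, [0.0] * 3)   # 3 hypothetical cancer types
head2 = ([[0.0, 0.0]] * 2, [0.0] * 2)   # 2 hypothetical genes
loss = predictive_loss(h, head1, head2, cancer_type=1, mutations=[1, 0])
```

With uniform predictions the two terms are ln 3 and ln 2, so the toy loss equals ln 6; training would reduce both terms by updating the shared encoder and the heads.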
In an embodiment, as shown in FIG. 3, when the tumor cell line coding module is pre-trained on the gene expression graph and the gene mutation graph based on contrastive omics consistency, the two graphs are each input to the tumor cell line coding module to extract a gene expression representation and a gene mutation representation, a contrastive loss is calculated from the two representations, and the tumor cell line coding module is pre-trained by minimizing this contrastive loss as the contrastive omics consistency loss.
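The text does not name a specific contrastive loss. As one possible instance, the sketch below uses an InfoNCE-style loss, treating the expression and mutation representations of the same cell line as a positive pair and those of other cell lines in the batch as negatives. The function names, the temperature of 0.1 and the one-directional form (expression to mutation only) are assumptions.

```python
import math

def l2_normalize(v):
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]

def info_nce(expr_reps, mut_reps, temperature=0.1):
    """Row i of each list belongs to the same cell line (the positive pair)."""
    a = [l2_normalize(v) for v in expr_reps]
    b = [l2_normalize(v) for v in mut_reps]
    total = 0.0
    for i, ai in enumerate(a):
        # Cosine similarity of cell line i's expression view to every mutation view.
        logits = [sum(x * y for x, y in zip(ai, bj)) / temperature for bj in b]
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denominator - logits[i]
    return total / len(a)

# Two cell lines: matching views give a low loss, swapped views a high one.
aligned = info_nce([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
swapped = info_nce([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
```

Minimizing such a loss pulls the two omics views of one cell line together and pushes views of different cell lines apart; a symmetric variant would average the loss over both directions.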
In an embodiment, as shown in FIG. 4, when the tumor cell line coding module is pre-trained on the gene expression graph and the gene mutation graph based on generative omics consistency, a generative training system is constructed, comprising the tumor cell line coding module and a first variational autoencoder and a second variational autoencoder connected to its output;
pre-training the tumor cell line coding module with the generative training system comprises:
inputting the gene expression graph into the generative training system, extracting a gene expression representation from it with the tumor cell line coding module, and predicting gene mutation data from that representation through the encoding and decoding of the first variational autoencoder;
inputting the gene mutation graph into the generative training system, extracting a gene mutation representation from it with the tumor cell line coding module, and predicting gene expression data from that representation through the encoding and decoding of the second variational autoencoder;
calculating a first mean squared error loss from the predicted gene mutation data and the gene mutation information serving as the supervision label, calculating a second mean squared error loss from the predicted gene expression data and the gene expression levels serving as the supervision label, and pre-training the tumor cell line coding module with the weighted sum of the two mean squared error losses as the generative omics consistency loss.
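The generative objective can be sketched as cross-modal reconstruction. The single-layer encoder and decoder, the function names and the toy numbers are assumptions. Only the two mean squared error terms described above are included; a variational autoencoder is normally also trained with a KL-divergence term, which the text does not mention and which is therefore omitted here.

```python
import math
import random

def linear(x, weight, bias):
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]

def vae_forward(h, enc_mu, enc_logvar, dec, rng):
    """Encode h to a Gaussian, sample with the reparameterization trick, decode."""
    mu = linear(h, *enc_mu)
    logvar = linear(h, *enc_logvar)
    z = [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, logvar)]
    return linear(z, *dec)

def mse(pred, target):
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)

def generative_loss(pred_mut, true_mut, pred_expr, true_expr, w1=1.0, w2=1.0):
    """Weighted sum of the expression-to-mutation and mutation-to-expression errors."""
    return w1 * mse(pred_mut, true_mut) + w2 * mse(pred_expr, true_expr)

# Forward pass with near-zero variance so the output is effectively deterministic.
rng = random.Random(0)
identity = ([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])
tiny_variance = ([[0.0, 0.0], [0.0, 0.0]], [-100.0, -100.0])
decoder = ([[1.0, 1.0]], [0.0])
decoded = vae_forward([1.0, 2.0], identity, tiny_variance, decoder, rng)

# Hypothetical predictions and targets for one cell line.
loss = generative_loss(pred_mut=[1.0, 0.0], true_mut=[1.0, 1.0],
                       pred_expr=[2.0, 4.0], true_expr=[2.0, 2.0],
                       w1=1.0, w2=0.5)
```

In the full system, `h` would be the representation produced by the tumor cell line coding module, so gradients from both reconstruction errors flow back into the shared encoder.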
In an embodiment, pre-training of the tumor cell line coding module is accomplished through the histology consistency between the histology information of the tumor cell line, so that the potential links of different histology data of the cell line are fully mined, and the tumor cell line coding module has the capability of extracting the cell line characterization more accurately. The three pre-training modes fully consider robustness and generalization of the histology data and potential hierarchical structural semantic information in the histology information of the tumor cell line, so that the accuracy of cell line representation and extraction of the tumor cell line coding module can be improved.
And 5, constructing a drug sensitivity prediction model based on the pre-trained tumor cell line coding module.
In an embodiment, after obtaining the pre-trained tumor cell line coding module, a drug susceptibility prediction model is constructed according to the pre-trained tumor cell line coding module, as shown in fig. 5, where the constructed drug susceptibility prediction model includes the pre-trained tumor cell line coding module, the drug small molecule coding module, and the drug susceptibility prediction module, the pre-trained tumor cell line coding module is used for extracting a cell line representation of a gene map, the drug small molecule coding module is used for extracting a drug representation of the drug map, and the drug susceptibility prediction module is used for calculating a susceptibility prediction result after the drug acts on the tumor cell line according to the cell line representation and the drug representation. Wherein the sensitivity prediction result comprises a sensitivity prediction result or a insensitivity prediction result.
In the embodiment, the drug small molecule coding module adopts a graph attention network, and the drug sensitivity prediction module adopts a fully connected layer.
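For readers unfamiliar with graph attention networks, the following is a minimal single-head attention update written in plain Python. It follows the commonly used formulation (a shared linear map, LeakyReLU-scored neighbor pairs, softmax over each neighborhood) and is not taken from the source. The names `gat_layer`, `W` and `a` are assumptions, and the output nonlinearity and multi-head concatenation are omitted for brevity.

```python
import math
from typing import Dict, Hashable, List

Vector = List[float]


def leaky_relu(x: float, slope: float = 0.2) -> float:
    return x if x > 0 else slope * x


def matvec(W: List[Vector], h: Vector) -> Vector:
    """Multiply matrix W (out_dim x in_dim) by vector h."""
    return [sum(w * x for w, x in zip(row, h)) for row in W]


def gat_layer(
    features: Dict[Hashable, Vector],
    neighbors: Dict[Hashable, List[Hashable]],
    W: List[Vector],
    a: Vector,
) -> Dict[Hashable, Vector]:
    """One single-head graph attention update.

    features:  node -> input feature vector
    neighbors: node -> neighbor list (include the node itself for a self-loop)
    W:         shared linear map applied to every node
    a:         attention vector of length 2 * out_dim
    """
    z = {n: matvec(W, h) for n, h in features.items()}
    out: Dict[Hashable, Vector] = {}
    for i, nbrs in neighbors.items():
        # Unnormalized attention score for each pair (i, j): a . [z_i || z_j]
        scores = [
            leaky_relu(sum(ak * v for ak, v in zip(a, z[i] + z[j])))
            for j in nbrs
        ]
        # Numerically stable softmax over the neighborhood of i
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        alphas = [e / total for e in exps]
        # Attention-weighted sum of the transformed neighbor features
        out[i] = [
            sum(alpha * z[j][k] for alpha, j in zip(alphas, nbrs))
            for k in range(len(z[i]))
        ]
    return out
```

In practice such a layer would normally come from a graph learning library rather than be written by hand; the sketch only shows what "attention over neighbors" means for the atoms of a drug graph or the genes of a gene graph.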
Step 6, optimizing the parameters of the drug sensitivity prediction model according to the gene graph, the drug graph and the tag data.
In the embodiment, when the parameters of the drug sensitivity prediction model are optimized, the gene graph and the drug graph are taken as input. The pre-trained tumor cell line coding module extracts the cell line representation from the gene graph, and the drug small molecule coding module extracts the drug representation from the drug graph. The two representations are concatenated and input into the drug sensitivity prediction module, which calculates and outputs the sensitivity prediction result of the drug small molecule on the tumor cell line. The cross entropy between the sensitivity prediction result and the tag data is taken as the total loss function to optimize the parameters of the pre-trained tumor cell line coding module, the drug small molecule coding module and the drug sensitivity prediction module.
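The concatenate-then-classify step and its cross-entropy loss can be sketched as follows. This is a minimal illustration under stated assumptions: the two-class output layout (index 0 for insensitive, index 1 for sensitive), the function names and the single fully connected layer with a softmax are illustrative choices, not details confirmed by the source.

```python
import math
from typing import List, Sequence


def predict_sensitivity(
    cell_repr: Sequence[float],
    drug_repr: Sequence[float],
    weights: List[List[float]],  # shape: 2 x (len(cell_repr) + len(drug_repr))
    bias: Sequence[float],       # shape: 2
) -> List[float]:
    """Concatenate both representations and apply one fully connected layer.

    Returns softmax probabilities [p_insensitive, p_sensitive] (assumed layout).
    """
    x = list(cell_repr) + list(drug_repr)
    logits = [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs: Sequence[float], label: int) -> float:
    """Cross entropy between predicted probabilities and an integer tag."""
    return -math.log(probs[label])
```

During parameter optimization this loss would be averaged over all cell line and drug pairs in a batch, and its gradient would flow back into both coding modules as well as the prediction module.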
Step 7, predicting the drug sensitivity by using the parameter-optimized drug sensitivity prediction model.
In the embodiment, when the parameter-optimized drug sensitivity prediction model is used for prediction, the histology information of a tumor cell line is first converted into a gene graph and the small molecule data of a drug is converted into a drug graph. The tumor cell line coding module extracts features from the input gene graph to obtain the cell line representation, and the drug small molecule coding module extracts features from the drug graph to obtain the drug representation. The drug sensitivity prediction module then calculates and outputs, from the concatenation of the cell line representation and the drug representation, the sensitivity prediction result of the drug acting on the tumor cell line, thereby realizing drug sensitivity prediction.
Based on the same inventive concept, the embodiment also provides a drug sensitivity prediction device based on histology consistency pre-training, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the drug sensitivity prediction method based on histology consistency pre-training, the method comprising the following steps:
Step 1, obtaining histology information and proteomics data of tumor cell lines, and constructing a gene graph, a gene expression graph and a gene mutation graph.
Step 2, obtaining small molecule data of the drug and constructing a drug graph.
Step 3, obtaining sensitivity data of the tumor cell line to the drug as tag data.
Step 4, pre-training the tumor cell line coding module based on histology consistency according to the gene expression graph and the gene mutation graph.
Step 5, constructing a drug sensitivity prediction model based on the pre-trained tumor cell line coding module.
Step 6, optimizing the parameters of the drug sensitivity prediction model according to the gene graph, the drug graph and the tag data.
Step 7, predicting the drug sensitivity by using the parameter-optimized drug sensitivity prediction model.
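The graph construction in steps 1 and 2 can be illustrated with a small helper. In line with the claims, genes are nodes connected according to protein-protein interactions, and atoms are nodes connected by chemical bonds. The function `build_graph`, the dictionary layout and all toy values below are assumptions made for illustration only.

```python
from typing import Dict, Hashable, Iterable, List, Sequence, Tuple


def build_graph(
    nodes: Sequence[Hashable],
    edges: Iterable[Tuple[Hashable, Hashable]],
    features: Dict[Hashable, Sequence[float]],
) -> dict:
    """Build an undirected graph with per-node feature vectors."""
    adjacency: Dict[Hashable, List[Hashable]] = {n: [] for n in nodes}
    for u, v in edges:
        if u not in adjacency or v not in adjacency:
            continue  # skip pairs that involve a node outside the node set
        adjacency[u].append(v)
        adjacency[v].append(u)
    return {
        "nodes": list(nodes),
        "adjacency": adjacency,
        "features": {n: list(features[n]) for n in nodes},
    }


# Gene expression graph: genes as nodes, protein-protein interactions as edges,
# expression quantity as the node feature (values are toy numbers).
genes = ["TP53", "MDM2", "EGFR"]
ppi_pairs = [("TP53", "MDM2")]
expression = {"TP53": [2.1], "MDM2": [0.7], "EGFR": [5.3]}
expr_graph = build_graph(genes, ppi_pairs, expression)

# Drug graph: atoms as nodes, chemical bonds as edges. A three-atom C-C-O chain
# stands in for a real drug molecule; the feature is a toy atomic-number encoding.
atoms = [0, 1, 2]
bonds = [(0, 1), (1, 2)]
atom_features = {0: [6.0], 1: [6.0], 2: [8.0]}
drug_graph = build_graph(atoms, bonds, atom_features)
```

The gene mutation graph would reuse the same nodes and edges with gene mutation information as the node feature, which is why one helper can serve all three gene-level graphs.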
In a specific application, the memory may be a local volatile memory such as RAM, a non-volatile memory such as ROM, FLASH, a floppy disk or a mechanical hard disk, or remote cloud storage. The processor may be a central processing unit (CPU), a microprocessor (MPU), a digital signal processor (DSP) or a field programmable gate array (FPGA); that is, the steps of the drug sensitivity prediction method based on histology consistency pre-training may be implemented by any of these processors.
The method and the device provided by the embodiment use a drug sensitivity prediction model based on histology consistency pre-training to extract the histology information of the cell line efficiently. At the same time, the correlations among the different histology data allow the tumor cell line coding module to capture richer biological information, so that a more accurate drug sensitivity prediction model is trained and its prediction accuracy is improved.
The foregoing embodiments describe the technical solutions and advantages of the invention in detail. It should be understood that the foregoing is merely the presently preferred embodiment of the invention and is not intended to limit it; any changes, additions, substitutions and equivalents made within the principles of the invention are intended to be included within the scope of the invention.

Claims (9)

1. A method for predicting drug sensitivity based on histology consistency pre-training, comprising the steps of:
obtaining small molecule data of a drug and constructing a drug graph; obtaining proteomics data and histology information of a tumor cell line, the histology information including gene expression quantity and gene mutation information, and constructing a gene graph, a gene expression graph and a gene mutation graph; and obtaining sensitivity data of the tumor cell line to the drug as tag data;
constructing a tumor cell line coding module, and pre-training the tumor cell line coding module based on histology consistency according to the gene expression graph and the gene mutation graph, wherein the histology consistency is any one of, or a combination of at least two of, prediction-based histology consistency, contrast-based histology consistency and generation-based histology consistency;
constructing a drug sensitivity prediction model, wherein the drug sensitivity prediction model comprises the pre-trained tumor cell line coding module, a drug small molecule coding module and a drug sensitivity prediction module, the pre-trained tumor cell line coding module is used for extracting a cell line representation of the gene graph, the drug small molecule coding module is used for extracting a drug representation of the drug graph, and the drug sensitivity prediction module is used for calculating, according to the cell line representation and the drug representation, a sensitivity prediction result of the drug acting on the tumor cell line;
and taking the gene graph and the drug graph as input, optimizing the parameters of the drug sensitivity prediction model under the supervision of the tag data, and predicting drug sensitivity by using the parameter-optimized drug sensitivity prediction model.
2. The method for predicting drug sensitivity based on histology consistency pre-training according to claim 1, wherein when the tumor cell line coding module is pre-trained based on prediction-based histology consistency according to the gene expression graph, a predictive training system is constructed, the predictive training system comprising the tumor cell line coding module and, connected to the output end of the tumor cell line coding module, a first mapping head with a first regularization operation and a second mapping head with a second regularization operation;
pretraining a tumor cell line encoding module with a predictive training system, comprising:
acquiring inherent characteristics related to a tumor cell line and taking the inherent characteristics as a first supervision tag and taking gene mutation information as a second supervision tag, wherein the inherent characteristics comprise cancer type, tissue source, tissue type, sex or age;
inputting the gene expression graph into the predictive training system, extracting a gene expression representation by the tumor cell line coding module, mapping and transforming the gene expression representation by the first mapping head and then predicting the inherent characteristics through the first regularization operation, and at the same time mapping and transforming the gene expression representation by the second mapping head and then predicting the gene mutation information through the second regularization operation;
calculating a first cross entropy loss according to the predicted inherent characteristics and the first supervision tag, constructing a second cross entropy loss according to the predicted gene mutation information and the second supervision tag, and pre-training a tumor cell line coding module by taking weighted summation of the first cross entropy loss and the second cross entropy loss as a prediction-based histology consistency loss.
3. The method for predicting drug sensitivity based on histology consistency pre-training according to claim 1, wherein when the tumor cell line coding module is pre-trained based on contrast-based histology consistency according to the gene expression graph and the gene mutation graph, the gene expression graph and the gene mutation graph are respectively input into the tumor cell line coding module to extract a gene expression representation and a gene mutation representation, a contrast loss is calculated from the gene expression representation and the gene mutation representation, and the tumor cell line coding module is pre-trained by minimizing the contrast loss as the contrast-based histology consistency loss.
4. The method for predicting drug sensitivity based on histology consistency pre-training according to claim 1, wherein when the tumor cell line coding module is pre-trained based on generation-based histology consistency according to the gene expression graph and the gene mutation graph, a generative training system is constructed, the generative training system comprising the tumor cell line coding module, and a first variational autoencoder and a second variational autoencoder which are connected to the output end of the tumor cell line coding module;
pre-training the tumor cell line coding module with the generative training system, comprising:
inputting the gene expression graph into the generative training system, extracting a gene expression representation from the gene expression graph through the tumor cell line coding module, and predicting gene mutation data from the gene expression representation through the encoding and decoding of the first variational autoencoder;
inputting the gene mutation graph into the generative training system, extracting a gene mutation representation from the gene mutation graph through the tumor cell line coding module, and predicting gene expression data from the gene mutation representation through the encoding and decoding of the second variational autoencoder;
calculating a first mean square error loss according to the predicted gene mutation data and the gene mutation information serving as a supervision tag, constructing a second mean square error loss according to the predicted gene expression data and the gene expression quantity serving as the supervision tag, and pre-training a tumor cell line coding module by taking weighted summation of the first mean square error loss and the second mean square error loss as a generation-based histology consistency loss.
5. The method for predicting drug sensitivity based on histology consistency pre-training according to claim 1, wherein constructing the gene graph, the gene expression graph and the gene mutation graph from the proteomics data and the tumor cell line histology information including the gene expression quantity and the gene mutation information comprises:
taking genes as the nodes of the gene graph, the gene expression graph and the gene mutation graph, determining the interactions between the proteins encoded by the genes according to the proteomics data, determining the connection relations among the genes according to the protein-protein interactions, and constructing the connecting edges among the nodes according to the connection relations;
for the gene graph, taking the tumor cell line histology information as the node features; for the gene expression graph, taking the gene expression quantity as the node features; and for the gene mutation graph, taking the gene mutation information as the node features.
6. The method for predicting drug sensitivity based on histology consistency pre-training according to claim 1, wherein the tumor cell line coding module employs a graph attention network and the drug small molecule coding module employs a graph attention network.
7. The method for predicting drug sensitivity based on histology consistency pre-training according to claim 1, wherein when the parameters of the drug sensitivity prediction model are optimized under the supervision of the tag data with the gene graph and the drug graph as inputs, the cross entropy between the sensitivity prediction result output by the drug sensitivity prediction model and the tag data is used as a total loss function to optimize the parameters of the pre-trained tumor cell line coding module, the drug small molecule coding module and the drug sensitivity prediction module.
8. The method for predicting drug sensitivity based on histology consistency pre-training according to claim 1, wherein obtaining the small molecule data of the drug and constructing the drug graph comprises: constructing the drug graph by taking the atoms of the drug small molecule as nodes and the chemical bonds among the atoms as connecting edges.
9. A drug sensitivity prediction device based on histology consistency pre-training, comprising a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the steps of the drug sensitivity prediction method based on histology consistency pre-training according to any one of claims 1-8.
CN202211422775.9A 2022-11-15 2022-11-15 Method and device for predicting drug sensitivity based on histology consistency pretraining Active CN116110509B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211422775.9A CN116110509B (en) 2022-11-15 2022-11-15 Method and device for predicting drug sensitivity based on histology consistency pretraining

Publications (2)

Publication Number Publication Date
CN116110509A true CN116110509A (en) 2023-05-12
CN116110509B CN116110509B (en) 2023-08-04

Family

ID=86266388

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211422775.9A Active CN116110509B (en) 2022-11-15 2022-11-15 Method and device for predicting drug sensitivity based on histology consistency pretraining

Country Status (1)

Country Link
CN (1) CN116110509B (en)

Citations (22)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2015051192A1 (en) * 2013-10-03 2015-04-09 The Board Of Trustees Of The University Of Illinois System and method of predicting personal therapeutic response
CN105005693A (en) * 2015-07-08 2015-10-28 中国科学院合肥物质科学研究院 Genetic material specificity based tumor cell drug sensitivity evaluation method
CN107609326A (en) * 2017-07-26 2018-01-19 同济大学 Drug sensitivity prediction method in the accurate medical treatment of cancer
US20180190381A1 (en) * 2015-06-15 2018-07-05 Nantomics, Llc Systems And Methods For Patient-Specific Prediction Of Drug Responses From Cell Line Genomics
CN109952611A (en) * 2016-08-03 2019-06-28 南托米克斯有限责任公司 Dasatinib response prediction model and its method
CN112599218A (en) * 2020-12-16 2021-04-02 北京深度制耀科技有限公司 Training method and prediction method of drug sensitivity prediction model and related device
CN112768089A (en) * 2021-04-09 2021-05-07 至本医疗科技(上海)有限公司 Method, apparatus and storage medium for predicting drug sensitivity status
CN112863696A (en) * 2021-04-25 2021-05-28 浙江大学 Drug sensitivity prediction method and device based on transfer learning and graph neural network
CN113178234A (en) * 2021-02-23 2021-07-27 北京亿药科技有限公司 Compound function prediction method based on neural network and connection graph algorithm
CN113782089A (en) * 2021-11-15 2021-12-10 浙江大学 Drug sensitivity prediction method and device based on multigroup chemical data fusion
CN114121150A (en) * 2020-08-27 2022-03-01 中国科学院分子细胞科学卓越创新中心 Cancer drug sensitivity prediction method, system, storage medium and terminal
CN114174527A (en) * 2019-05-31 2022-03-11 I·A·路易斯 Metabonomics characterization of microorganisms
CN114255886A (en) * 2022-02-28 2022-03-29 浙江大学 Multi-group similarity guide-based drug sensitivity prediction method and device
CN114373550A (en) * 2022-03-21 2022-04-19 普瑞基准科技(北京)有限公司 Medicine IC50 deep learning model prediction method based on molecular structure and gene expression
CN114450750A (en) * 2019-05-17 2022-05-06 英科智能有限公司 Deep proteomic markers of human biological aging and method for determining biological aging clock
WO2022111385A1 (en) * 2020-11-30 2022-06-02 腾讯科技(深圳)有限公司 Graph neural network-based clinical omics data processing method and apparatus, device, and medium
CN114694770A (en) * 2020-12-30 2022-07-01 中国人民解放军军事科学院军事医学研究院 Method for constructing drug hepatotoxicity prediction model and application thereof
CN114842983A (en) * 2022-06-08 2022-08-02 浙江大学温州研究院 Anti-cancer drug response prediction method and device based on tumor cell line self-supervision learning
CN114999630A (en) * 2022-06-07 2022-09-02 浙江大学 Liver transplantation recipient prognosis prediction device based on multi-source data fusion
CN115116624A (en) * 2022-06-29 2022-09-27 广西大学 Drug sensitivity prediction method and device based on semi-supervised transfer learning
CN115171779A (en) * 2022-07-13 2022-10-11 浙江大学 Cancer driver gene prediction device based on graph attention network and multigroup chemical fusion
CN115274136A (en) * 2022-08-26 2022-11-01 上海交通大学 Tumor cell line drug response prediction method integrating multiomic and essential genes

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
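Jianing Xi et al.: "Cancer omic data based explainable AI drug recommendation inference: A traceability perspective for explainability", Biomedical Signal Processing and Control, pages 1 - 9 *
Zhao Qian (赵倩): "Study on predicting the efficacy of immunotherapy for non-small cell lung cancer using clinical parameters combined with radiomics", China Master's Theses Full-text Database, Medicine and Health Sciences, vol. 2021, no. 12, pages 072 - 105 *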

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116705194A (en) * 2023-06-06 2023-09-05 之江实验室 Method and device for predicting drug cancer suppression sensitivity based on graph neural network
CN116705194B (en) * 2023-06-06 2024-06-04 之江实验室 Method and device for predicting drug cancer suppression sensitivity based on graph neural network
CN117079716A (en) * 2023-09-13 2023-11-17 江苏运动健康研究院 Deep learning prediction method of tumor drug administration scheme based on gene detection
CN117079716B (en) * 2023-09-13 2024-04-05 江苏运动健康研究院 Deep learning prediction method of tumor drug administration scheme based on gene detection

Also Published As

Publication number Publication date
CN116110509B (en) 2023-08-04

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant