CN115472216B - Data integration-based cross-adaptive tumor drug combination recommendation method and system - Google Patents


Info

Publication number
CN115472216B
Authority
CN
China
Prior art keywords
drug
list
tumor
protein
gene
Legal status
Active
Application number
CN202211417378.2A
Other languages
Chinese (zh)
Other versions
CN115472216A (en)
Inventor
赵再戌
安琪儿
闵浩巍
郭栋梁
Current Assignee
Digital Health China Technologies Co Ltd
Original Assignee
Digital Health China Technologies Co Ltd
Application filed by Digital Health China Technologies Co Ltd
Priority to CN202211417378.2A
Publication of CN115472216A
Application granted
Publication of CN115472216B
Legal status: Active

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B15/00 ICT specially adapted for analysing two-dimensional or three-dimensional molecular structures, e.g. structural or functional relations or structure alignment
    • G16B15/30 Drug targeting using structural data; Docking or binding prediction
    • G16B25/00 ICT specially adapted for hybridisation; ICT specially adapted for gene or protein expression
    • G16B50/00 ICT programming tools or database systems specially adapted for bioinformatics
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H70/00 ICT specially adapted for the handling or processing of medical references
    • G16H70/40 ICT specially adapted for the handling or processing of medical references relating to drugs, e.g. their side effects or intended usage
    • G16H70/60 ICT specially adapted for the handling or processing of medical references relating to pathologies
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00 Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10 Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Landscapes

  • Health & Medical Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Medical Informatics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Health & Medical Sciences (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Biotechnology (AREA)
  • Chemical & Material Sciences (AREA)
  • Biophysics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Theoretical Computer Science (AREA)
  • Evolutionary Biology (AREA)
  • Epidemiology (AREA)
  • Medicinal Chemistry (AREA)
  • Pharmacology & Pharmacy (AREA)
  • Primary Health Care (AREA)
  • Public Health (AREA)
  • Crystallography & Structural Chemistry (AREA)
  • Genetics & Genomics (AREA)
  • Molecular Biology (AREA)
  • Bioethics (AREA)
  • Databases & Information Systems (AREA)
  • Toxicology (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses a data integration-based method and system for recommending cross-indication tumor drug combinations, relating to the field of bioinformatics. By integrating multiple types of data, developing a novel bioinformatics workflow, and applying a network-based model to the analysis process, the disclosed method and system can identify potential multi-target therapies for tumors and recommend cross-indication drug combinations.

Description

Data integration-based cross-adaptive tumor drug combination recommendation method and system
Technical Field
The application relates to the field of bioinformatics, in particular to a data integration-based method and system for recommending cross-indication tumor drug combinations.
Background
Traditional drugs are developed as single compounds acting on a single target. Complex diseases such as tumors involve complex biological processes, and this one-drug-one-target mode of action cannot achieve a significant therapeutic effect in tumor treatment. Multi-target analysis is therefore needed for complex diseases such as tumors, and cross-indication combination medication can accelerate the treatment process.
Integrating data and knowledge from multiple sources is a key factor in drug design, drug repurposing, and the success of multi-target therapies. Biological networks that fuse multi-omics data can reveal the relationships underlying tumor treatment effects and simulate tumor treatment responses.
Disclosure of Invention
(I) Object of the application
To this end, the present application aims to identify potential multi-target therapies for tumors and to recommend cross-indication drug combinations for them by integrating a variety of data, developing a novel bioinformatics workflow, and applying network-based models to the physiological analysis process. For this purpose, the present application discloses the following technical solutions.
(II) Technical solution
The application discloses a data integration-based method for recommending cross-indication tumor drug combinations, characterized by comprising the following steps:
obtaining a target protein list of the tumor;
generating a candidate drug list based on the target protein list;
constructing and optimizing a drug response prediction model;
recommending the optimal drug combination in the candidate drug list based on the drug response prediction model.
In one possible embodiment, obtaining the target protein list of the tumor comprises:
constructing a tumor-specific PPI network and screening a plurality of potential target proteins;
scoring the target combinations of the potential target proteins to generate the target protein list.
In one possible embodiment, constructing the tumor-specific PPI network and screening a plurality of potential target proteins comprises:
screening tumor-related genes to generate a related gene list;
obtaining the related encoded protein of each gene in the related gene list;
obtaining the proteins that interact with the related encoded proteins, and constructing the tumor-specific PPI network;
selecting a subset of the bridging protein nodes of the tumor-specific PPI network, and filtering them to obtain a plurality of potential target proteins.
In one possible embodiment, the tumor-associated genes are genes that are significantly mutated and that are significantly differentially expressed in tumors.
In one possible embodiment, the protein that interacts with the related encoded protein is selected from the STRING database.
In one possible embodiment, selecting a subset of the bridging protein nodes of the tumor-specific PPI network and filtering them to obtain a plurality of potential target proteins comprises:
ranking the nodes of the tumor-specific PPI network by node degree, and selecting the top 20% of bridging protein nodes;
filtering the top 20% of bridging protein nodes using the STITCH and DrugBank databases to obtain a plurality of potential target proteins.
In one possible embodiment, scoring the target combinations of the potential target proteins to generate the target protein list comprises:
calculating the TSDS scores and spatial distribution of the target combinations of the potential target proteins;
selecting, based on the TSDS scores and the spatial distribution, the potential target proteins with scores greater than zero to generate the target protein list.
In one possible embodiment, generating the candidate drug list based on the target protein list comprises:
extracting the drugs that interact with the target proteins in the target protein list to generate a preliminary candidate drug list;
adding drugs applied to other diseases to the preliminary candidate drug list to generate the candidate drug list.
In one possible embodiment, the drugs interacting with the target proteins in the target protein list are extracted from the DrugBank database and the CTD repository.
In one possible embodiment, the drugs applied to other diseases are added to the preliminary candidate drug list based on a matrix three-factor (tri-factorization) algorithm.
In one possible embodiment, the three factor matrices cover disease-gene, drug-target, and protein-protein relations.
In one possible embodiment, the data sources used by the matrix three-factor algorithm include Disease Ontology, DrugBank, STRING, GeneRIF, and TTD.
In one possible embodiment, constructing and optimizing the drug response prediction model comprises:
extracting the pathways related to the tumor, converting the regulatory reactions of these pathways into logical formulas, and constructing the drug response prediction model;
integrating gene transcriptome expression data to optimize the drug response prediction model, and initializing the values of the DP nodes and TP nodes of the model.
In one possible embodiment, the pathways associated with the tumor are extracted from the KEGG database.
In one possible embodiment, integrating the gene transcriptome expression data to optimize the drug response prediction model comprises:
extracting differentially expressed genes from a disease database to generate a differentially expressed gene list;
determining whether each gene is up-regulated or down-regulated based on the FC value of the differentially expressed gene, and optimizing the drug response prediction model accordingly.
In one possible embodiment, the differentially expressed genes are genes with an FC value greater than 2.
In one possible embodiment, a differentially expressed gene is discarded when it is up-regulated and down-regulated an equal number of times across multiple experiments.
In one possible embodiment, recommending the optimal drug combination in the candidate drug list based on the drug response prediction model comprises:
calculating the values of the other nodes of the drug response prediction model, and converting the model into ODEs;
analyzing the effects of drug-drug combinations in the candidate drug list based on the ODEs to generate a drug combination list;
defining an EFFECT index to score the drug combinations in the drug combination list, and recommending the optimal drug combination.
In one possible embodiment, the other nodes are nodes other than the DP node and the TP node in the drug response prediction model.
As a second aspect, the present application also discloses a data integration-based tumor cross-indication drug combination recommendation system, comprising:
the target protein list acquisition module is used for acquiring a target protein list of the tumor;
a candidate drug list generation module for generating a candidate drug list based on the target protein list;
the prediction model construction optimization module is used for constructing and optimizing a drug response prediction model;
and the optimal combination recommending module is used for acquiring the optimal drug combination in the candidate drug list based on the drug response predicting model.
In a possible embodiment, the target protein list obtaining module includes:
the PPI network construction submodule is used for constructing a specific PPI network of the tumor and screening a plurality of potential target proteins;
and the target protein list acquisition submodule is used for scoring the target combination of the potential target proteins to generate a target protein list.
In a possible implementation manner, the PPI network construction sub-module includes:
a related gene list generating unit for screening tumor related genes to generate a related gene list;
a coding protein obtaining unit for obtaining a coding protein related to each gene in the related gene list;
the PPI network construction unit is used for obtaining the protein interacted with the related coding protein and constructing a specific PPI network of the tumor;
and the potential target protein acquisition unit is used for selecting part of bridging protein nodes of the specific PPI network of the tumor, and filtering the bridging protein nodes to acquire a plurality of potential target proteins.
In one possible embodiment, the tumor-associated gene is a gene that is significantly mutated and significantly differentially expressed in the tumor.
In one possible embodiment, the protein that interacts with the related encoded protein is selected from the STRING database.
In one possible embodiment, the potential target protein acquisition unit comprises:
a sorting and selection subunit, configured to rank the nodes of the tumor-specific PPI network by node degree and select the top 20% of bridging protein nodes;
a filtering subunit, configured to filter the top 20% of bridging protein nodes using the STITCH and DrugBank databases to obtain a plurality of potential target proteins.
In a possible embodiment, the above-mentioned target protein list acquisition submodule comprises:
a score calculating unit for calculating the TSDS score and spatial distribution of the target combination of the potential target proteins;
and the target protein list acquisition unit is used for selecting potential target proteins with scores larger than zero based on the TSDS scores and the spatial distribution to generate a target protein list.
In a possible embodiment, the candidate drug list generating module includes:
the primary candidate drug list submodule is used for extracting drugs which interact with the target proteins in the target protein list to generate a primary candidate drug list;
and the candidate drug list generation submodule is used for adding drugs applied to other diseases to the preliminary candidate drug list to generate a candidate drug list.
In one possible embodiment, the drugs interacting with the target proteins in the target protein list are extracted from the DrugBank database and the CTD repository.
In one possible embodiment, the drugs applied to other diseases are added to the preliminary candidate drug list based on a matrix three-factor (tri-factorization) algorithm.
In one possible embodiment, the three factor matrices cover disease-gene, drug-target, and protein-protein relations.
In one possible embodiment, the data sources used by the matrix three-factor algorithm include Disease Ontology, DrugBank, STRING, GeneRIF, and TTD.
In a possible implementation manner, the prediction model building optimization module includes:
the model construction submodule is used for extracting relevant pathways of tumors, converting the regulation reaction of the relevant pathways into a logic formula and constructing a drug response prediction model;
and the model optimization submodule is used for integrating the expression data of the gene transcriptome to optimize the drug response prediction model and initializing and setting the values of the DP node and the TP node of the drug response prediction model.
In one possible embodiment, the pathways associated with the tumor are extracted from the KEGG database.
In a possible implementation, the model optimization submodule includes:
a difference list generating unit for extracting the difference expression genes from the disease database to generate a difference expression gene list;
a model optimization unit for determining gene up-regulation or down-regulation based on the FC value of the differentially expressed gene, and optimizing the drug response prediction model;
and the initialization setting unit is used for carrying out initialization setting on the values of the DP node and the TP node of the drug response prediction model.
In one possible embodiment, the differentially expressed gene is a gene having an FC value greater than 2.
In one possible embodiment, the system further comprises a gene discarding unit for discarding a differentially expressed gene when it is up-regulated and down-regulated an equal number of times across multiple experiments.
In a possible implementation manner, the optimal combination obtaining module includes:
the conversion submodule is used for calculating values of other nodes of the drug response prediction model and converting the drug response prediction model into ODE;
a combination list generation submodule for analyzing the effect of the drug-drug combination in the candidate drug list based on the ODE to generate a drug combination list;
and the optimal combination recommending submodule is used for defining an EFFECT index to score the medicine combination in the medicine combination list and recommending the optimal medicine combination.
In one possible embodiment, the other nodes are nodes other than the DP node and the TP node in the drug response prediction model.
(III) Advantageous effects
By integrating multiple types of data, developing a novel bioinformatics workflow, and applying a network-based model to the analysis process, the data integration-based method and system for cross-indication tumor drug combinations disclosed in the application can identify potential multi-target therapies for tumors and recommend cross-indication drug combinations.
Drawings
The embodiments described below with reference to the drawings are exemplary and intended to be used for explaining and illustrating the present application and should not be construed as limiting the scope of the present application.
Fig. 1 is a schematic flow chart of a data integration-based recommendation method for cross-adaptive tumor drug combination disclosed in the present application.
Fig. 2 is a block diagram of a data integration-based tumor cross-indication drug combination recommendation system disclosed in the present application.
Detailed Description
To make the objectives, technical solutions, and advantages of the present application clearer, the technical solutions in the embodiments of the present application are described in more detail below with reference to the drawings.
Embodiments of the data integration-based tumor cross-indication combination drug method disclosed in the present application are described in detail below with reference to fig. 1. As shown in fig. 1, the method disclosed in this embodiment mainly includes the following steps S100 to S400.
S100, obtaining a target protein list of the tumor; wherein S100 includes S110-S120.
S110, constructing a specific PPI network of the tumor, and screening a plurality of potential target proteins; wherein S110 includes S111-S114.
S111, screening tumor-related genes to generate a related gene list. Specifically, for a given tumor type or subtype, the tumor-related genes, i.e., genes that are significantly mutated and significantly differentially expressed in the tumor, are screened from the candidate genes based on existing knowledge such as public databases and published articles, and the screened genes form the related gene list.
S112, obtaining the related encoded protein of each gene in the related gene list. Specifically, the proteins encoded by the genes in the related gene list are identified and used as the related encoded proteins.
S113, obtaining the proteins that interact with the related encoded proteins, and constructing the tumor-specific PPI network. The STRING database is a database for analyzing protein-protein interactions; it covers many species and proteins, and its interaction information includes both direct physical interactions and indirect functional associations.
A PPI (protein-protein interaction) network is formed by the interactions of individual proteins, which take part in many life processes such as biological signal transduction, gene expression regulation, energy and substance metabolism, and cell cycle regulation. Systematically analyzing the interactions among large numbers of proteins in a biological system is important for understanding how proteins work, how biological signaling and energy and substance metabolism respond under special physiological states such as disease, and how proteins are functionally related.
Specifically, after the related encoded proteins are obtained, other proteins interacting with them are screened from the STRING database. The PPI network is then constructed with the encoded proteins and these interacting proteins as nodes and the STRING interactions between the proteins as edges.
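The network construction in S113 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the applicant's implementation: it assumes the interactions have already been exported from STRING as `(protein_a, protein_b, combined_score)` rows on STRING's 0-1000 score scale, and the 400 cut-off (STRING's "medium confidence" level), the gene-symbol identifiers, and the function name `build_ppi_network` are illustrative choices.

```python
def build_ppi_network(seed_proteins, interactions, min_score=400):
    """Build an undirected PPI network as a dict mapping protein -> set of neighbours.

    seed_proteins: proteins encoded by the tumor-related genes.
    interactions:  iterable of (protein_a, protein_b, combined_score) rows.
    Nodes are the seeds plus their direct interaction partners; edges are all
    retained interactions among those nodes.
    """
    rows = [(a, b) for a, b, score in interactions if score >= min_score and a != b]
    seeds = set(seed_proteins)
    nodes = set(seeds)
    for a, b in rows:
        if a in seeds or b in seeds:
            nodes.update((a, b))  # add the partner of a seed protein
    graph = {node: set() for node in nodes}
    for a, b in rows:
        if a in nodes and b in nodes:
            graph[a].add(b)
            graph[b].add(a)
    return graph


if __name__ == "__main__":
    toy_rows = [  # hypothetical rows, not real STRING scores
        ("TP53", "MDM2", 999),
        ("EGFR", "GRB2", 990),
        ("MDM2", "GRB2", 450),
        ("GRB2", "SOS1", 980),  # neither end is a seed: SOS1 is not added
        ("TP53", "XYZ", 150),   # below the score cut-off
    ]
    print(build_ppi_network(["TP53", "EGFR"], toy_rows))
```

Keeping edges between two non-seed partners (here MDM2-GRB2) follows the description's use of all STRING interactions among the selected proteins as edges.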
S114, selecting a subset of the bridging protein nodes of the tumor-specific PPI network, and filtering them to obtain a plurality of potential target proteins; specifically, S114 includes S1141-S1142.
S1141, ranking the nodes of the tumor-specific PPI network by node degree, and selecting the top 20% of bridging protein nodes. The degree of a node is the number of edges incident to it in the graph, also called its association degree; in a directed graph, the in-degree of a node is the number of edges entering it, and the out-degree is the number of edges leaving it.
Specifically, the nodes (proteins) of the PPI network are sorted by degree, i.e., by the number of edges incident to each node, and the top 20% of bridging protein nodes are selected.
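The degree ranking in S1141 amounts to sorting nodes by the size of their neighbour sets. The sketch below assumes an adjacency map (dict of sets) for the network; rounding the 20% cut-off up and breaking ties alphabetically are assumptions, since the text specifies neither, and `top_degree_nodes` is a hypothetical name.

```python
import math


def top_degree_nodes(graph, fraction=0.2):
    """Return the top `fraction` of nodes ranked by degree (number of incident edges).

    graph: dict mapping node -> set of neighbours (undirected network).
    Ties are broken alphabetically; the cut-off is rounded up so at least one
    node is always returned.
    """
    ranked = sorted(graph, key=lambda node: (-len(graph[node]), node))
    k = max(1, math.ceil(len(ranked) * fraction))
    return ranked[:k]


if __name__ == "__main__":
    toy = {
        "A": {"B", "C", "D"},
        "B": {"A", "C"},
        "C": {"A", "B"},
        "D": {"A"},
        "E": set(),
    }
    print(top_degree_nodes(toy))       # 5 nodes -> top 20% is one node
    print(top_degree_nodes(toy, 0.4))  # two nodes; B wins the B/C tie
```

If "bridging" were meant in the graph-theoretic sense, betweenness centrality would be the usual measure; the sketch follows the degree ranking the text describes.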
S1142, filtering the top 20% of bridging protein nodes using the STITCH and DrugBank databases to obtain a plurality of potential target proteins. The STITCH database is a chemical-gene interaction prediction database used to predict interactions between chemicals and genes.
The DrugBank database, provided by the University of Alberta, is a unique bioinformatics and cheminformatics resource that combines detailed drug data with comprehensive drug target information.
The top 20% of bridging protein nodes are then filtered using the STITCH and DrugBank databases, and the bridging protein nodes with high interaction coefficients are retained as therapeutic drug proteins, i.e., potential target proteins.
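How the two databases are combined in S1142 is not specified. As a minimal sketch, assume each candidate protein has an optional best chemical-protein score from STITCH (0-1000 scale) and a flag for whether DrugBank lists it as a drug target. The 700 threshold, the `require_both` switch between OR and AND logic, and the function name are all hypothetical.

```python
def filter_druggable(candidates, stitch_scores, drugbank_targets,
                     min_score=700, require_both=False):
    """Keep candidate proteins with drug-interaction evidence.

    stitch_scores:    dict protein -> best chemical-protein score (assumed 0-1000).
    drugbank_targets: set of proteins listed as drug targets in DrugBank.
    require_both:     if True, demand both kinds of evidence; otherwise either.
    """
    kept = []
    for protein in candidates:
        strong_chemical = stitch_scores.get(protein, 0) >= min_score
        in_drugbank = protein in drugbank_targets
        if require_both:
            passes = strong_chemical and in_drugbank
        else:
            passes = strong_chemical or in_drugbank
        if passes:
            kept.append(protein)
    return kept


if __name__ == "__main__":
    candidates = ["EGFR", "GRB2", "MDM2"]
    stitch = {"EGFR": 950, "GRB2": 300, "MDM2": 720}  # toy scores
    drugbank = {"EGFR"}
    print(filter_druggable(candidates, stitch, drugbank))
    print(filter_druggable(candidates, stitch, drugbank, require_both=True))
```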
S120, scoring the target combination of the potential target proteins to generate a target protein list; wherein S120 includes S121-S122.
S121, calculating the TSDS scores and spatial distribution of the target combinations of the potential target proteins. Specifically, TSDS scores are calculated for the three possible target combinations of each potential target protein, and their spatial distribution is calculated to identify the potential target proteins with significant TSDS scores (p-value < 0.01).
S122, selecting, based on the TSDS scores and the spatial distribution, the potential target proteins with scores greater than zero to generate the target protein list.
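S121-S122 do not define how the TSDS score itself is computed, so the sketch below takes precomputed values as input. It assumes one aggregated TSDS value and one p-value per protein, keeps proteins that are significant at p < 0.01 and have a positive score, and orders them by score. `TargetScore` and `build_target_list` are hypothetical names.

```python
from typing import List, NamedTuple


class TargetScore(NamedTuple):
    protein: str
    tsds: float     # aggregated TSDS score (its computation is not specified in the text)
    p_value: float  # significance from the spatial-distribution analysis


def build_target_list(scores: List[TargetScore], alpha: float = 0.01) -> List[str]:
    """Keep proteins with a significant (p < alpha) and positive TSDS score, best first."""
    kept = [s for s in scores if s.p_value < alpha and s.tsds > 0]
    kept.sort(key=lambda s: s.tsds, reverse=True)
    return [s.protein for s in kept]


if __name__ == "__main__":
    toy = [  # hypothetical values
        TargetScore("EGFR", 1.8, 0.001),
        TargetScore("MDM2", 0.6, 0.004),
        TargetScore("GRB2", -0.3, 0.002),  # non-positive score: dropped
        TargetScore("SOS1", 2.1, 0.2),     # not significant: dropped
    ]
    print(build_target_list(toy))
```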
S200, generating a candidate drug list based on the target protein list; wherein S200 includes S210-S220.
S210, extracting the drugs that interact with the target proteins in the target protein list to generate a preliminary candidate drug list. The DrugBank database of the University of Alberta is a bioinformatics and cheminformatics database that reliably combines detailed drug data with comprehensive drug target information.
The CTD repository (Comparative Toxicogenomics Database) integrates data on interactions among a large number of chemicals, genes, functional phenotypes, and diseases.
Specifically, the drugs that interact with the target proteins in the target protein list are extracted from the DrugBank database and the CTD repository, and the preliminary candidate drug list is generated from the extracted drugs.
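S210 is essentially a join between the target protein list and drug-target tables. Assuming DrugBank and CTD have each been reduced to `(drug, target)` pairs (parsing their actual export formats is not shown), a preliminary candidate list could be assembled as below. The function name is hypothetical and the example pairs are toy data.

```python
from collections import defaultdict


def preliminary_candidates(target_list, *sources):
    """Map each drug to the listed target proteins it interacts with.

    Each source is an iterable of (drug, target) pairs, e.g. rows already
    extracted from DrugBank or CTD. Drugs that hit no listed target are skipped,
    and duplicate pairs across sources are merged.
    """
    targets = set(target_list)
    hits = defaultdict(set)
    for source in sources:
        for drug, target in source:
            if target in targets:
                hits[drug].add(target)
    return {drug: hits[drug] for drug in sorted(hits)}


if __name__ == "__main__":
    drugbank_pairs = [("gefitinib", "EGFR"), ("imatinib", "ABL1")]
    ctd_pairs = [("nutlin-3", "MDM2"), ("gefitinib", "EGFR")]
    print(preliminary_candidates(["EGFR", "MDM2"], drugbank_pairs, ctd_pairs))
```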
S220, adding drugs applied to other diseases to the preliminary candidate drug list to generate the candidate drug list. Specifically, drugs applied to other diseases are added to the preliminary candidate drug list based on a matrix three-factor (tri-factorization) algorithm, which updates the preliminary list into the candidate drug list. The three factor matrices cover disease-gene, drug-target, and protein-protein relations, and the data sources of the algorithm include Disease Ontology, DrugBank, STRING, GeneRIF, and TTD.
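The text does not spell out the matrix three-factor algorithm. Assuming it refers to non-negative matrix tri-factorization (NMTF), which approximates a relation matrix R as G S H^T with non-negative factors, a minimal NumPy sketch using the standard multiplicative updates for the Frobenius loss is shown below. It factorizes a single drug-target matrix only; fusing the disease-gene and protein-protein matrices named above would require sharing factors across matrices and is not shown. Ranks, iteration count, and function names are illustrative.

```python
import numpy as np


def nmtf(R, k1, k2, n_iter=300, seed=0, eps=1e-9):
    """Factorize a non-negative matrix R (n x m) as G @ S @ H.T.

    G: n x k1, S: k1 x k2, H: m x k2, all kept non-negative by
    multiplicative updates for the squared Frobenius loss.
    Returns the factors and the reconstruction error after each iteration.
    """
    rng = np.random.default_rng(seed)
    n, m = R.shape
    G = rng.random((n, k1))
    S = rng.random((k1, k2))
    H = rng.random((m, k2))
    errors = [np.linalg.norm(R - G @ S @ H.T)]
    for _ in range(n_iter):
        G *= (R @ H @ S.T) / (G @ S @ H.T @ H @ S.T + eps)
        H *= (R.T @ G @ S) / (H @ S.T @ G.T @ G @ S + eps)
        S *= (G.T @ R @ H) / (G.T @ G @ S @ H.T @ H + eps)
        errors.append(np.linalg.norm(R - G @ S @ H.T))
    return G, S, H, errors


def suggest_new_links(R, R_hat, drugs, targets, top=3):
    """Rank unobserved (zero) drug-target pairs by their reconstructed score."""
    scored = [
        (float(R_hat[i, j]), drugs[i], targets[j])
        for i in range(R.shape[0])
        for j in range(R.shape[1])
        if R[i, j] == 0
    ]
    scored.sort(reverse=True)
    return [(drug, target, score) for score, drug, target in scored[:top]]


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    R = (rng.random((8, 6)) > 0.6).astype(float)  # toy binary drug-target matrix
    drugs = [f"drug_{i}" for i in range(R.shape[0])]
    targets = [f"target_{j}" for j in range(R.shape[1])]
    G, S, H, errors = nmtf(R, k1=3, k2=3)
    print(f"reconstruction error: {errors[0]:.3f} -> {errors[-1]:.3f}")
    for drug, target, score in suggest_new_links(R, G @ S @ H.T, drugs, targets):
        print(drug, target, round(score, 3))
```

Unobserved pairs with high reconstructed scores are the usual repurposing candidates in this family of methods; how the application turns such scores into added drugs is not stated.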
S300, constructing and optimizing a drug response prediction model; wherein S300 includes S310-S320.
S310, extracting the pathways related to the tumor, converting their regulatory reactions into logical formulas, and constructing the drug response prediction model. Specifically, the tumor-related pathways are extracted from the KEGG database, the KEGG KGML files of these pathways are parsed, their regulatory reactions are converted into logical formulas, and the drug response prediction model is constructed.
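The KGML-to-logic conversion in S310 can be sketched with Python's standard XML parser. This is a minimal illustration, not the applicant's converter: it assumes that "activation"/"expression" subtypes are positive and "inhibition"/"repression" subtypes are negative, and that each regulated node follows the rule (any activator) AND NOT (any inhibitor). Group entries and compound-mediated relations are not handled, and the KGML fragment is a hand-written toy.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Assumed mapping of KGML relation subtypes to regulatory signs.
ACTIVATING = {"activation", "expression"}
INHIBITING = {"inhibition", "repression"}


def kgml_to_rules(kgml_text):
    """Turn KGML relations into Boolean update rules, one per regulated node."""
    root = ET.fromstring(kgml_text)
    labels = {}
    for entry in root.iter("entry"):
        graphics = entry.find("graphics")
        label = graphics.get("name") if graphics is not None else None
        # Graphics names list aliases ("EGFR, ERBB1, ..."); keep the first one.
        labels[entry.get("id")] = (label or entry.get("name")).split(",")[0].strip()

    activators, inhibitors = defaultdict(set), defaultdict(set)
    for rel in root.iter("relation"):
        src = labels.get(rel.get("entry1"))
        dst = labels.get(rel.get("entry2"))
        if src is None or dst is None:
            continue
        for subtype in rel.iter("subtype"):
            if subtype.get("name") in ACTIVATING:
                activators[dst].add(src)
            elif subtype.get("name") in INHIBITING:
                inhibitors[dst].add(src)

    rules = {}
    for node in sorted(set(activators) | set(inhibitors)):
        terms = []
        if activators[node]:
            terms.append("(" + " | ".join(sorted(activators[node])) + ")")
        if inhibitors[node]:
            terms.append("!(" + " | ".join(sorted(inhibitors[node])) + ")")
        rules[node] = " & ".join(terms)
    return rules


TOY_KGML = """<pathway name="path:hsa_toy" org="hsa" number="00000">
  <entry id="1" name="hsa:1956" type="gene"><graphics name="EGFR, ERBB1"/></entry>
  <entry id="2" name="hsa:3845" type="gene"><graphics name="KRAS"/></entry>
  <entry id="3" name="hsa:5594" type="gene"><graphics name="MAPK1"/></entry>
  <entry id="4" name="hsa:1843" type="gene"><graphics name="DUSP1"/></entry>
  <relation entry1="1" entry2="2" type="PPrel"><subtype name="activation" value="--&gt;"/></relation>
  <relation entry1="2" entry2="3" type="PPrel"><subtype name="activation" value="--&gt;"/></relation>
  <relation entry1="4" entry2="3" type="PPrel"><subtype name="inhibition" value="--|"/></relation>
</pathway>"""

if __name__ == "__main__":
    for node, rule in kgml_to_rules(TOY_KGML).items():
        print(f"{node} = {rule}")
```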
S320, integrating gene transcriptome expression data to optimize the drug response prediction model, and initializing the values of the DP nodes and TP nodes of the model; wherein S320 includes S321-S323.
S321, extracting differentially expressed genes from a disease database to generate a differentially expressed gene list, where FC denotes fold change.
Specifically, genes with an FC value greater than 2 are selected as differentially expressed genes, i.e., disease-related genes, and are collected into the differentially expressed gene list.
S322, determining whether each gene is up-regulated or down-regulated based on the FC value of the differentially expressed gene, and optimizing the drug response prediction model accordingly. Note that when a differentially expressed gene is up-regulated and down-regulated an equal number of times across multiple experiments, it is discarded.
S323, initializing the values of the DP nodes and TP nodes of the drug response prediction model.
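The differential-expression filtering and up/down voting described above can be sketched as follows. The text selects genes with "FC greater than 2" yet also reads up- or down-regulation from the FC value, so this sketch assumes the symmetric reading: FC > 2 counts as an up call and FC < 1/2 as a down call in a given experiment. Genes with equal up and down counts (including genes never called) are discarded. The function name and data are hypothetical; the resulting calls could, for example, seed initial node states in the logic model.

```python
def classify_degs(fold_changes, threshold=2.0):
    """Assign an up/down call to each gene from per-experiment fold changes.

    fold_changes: dict mapping gene -> list of FC values, one per experiment.
    FC > threshold counts as up-regulated and FC < 1/threshold as
    down-regulated; ties (including zero calls) are discarded.
    """
    calls = {}
    for gene, fcs in fold_changes.items():
        up = sum(fc > threshold for fc in fcs)
        down = sum(fc < 1.0 / threshold for fc in fcs)
        if up == down:
            continue
        calls[gene] = "up" if up > down else "down"
    return calls


if __name__ == "__main__":
    toy = {
        "EGFR": [3.1, 2.5, 1.1],
        "DUSP1": [0.3, 0.4, 2.2],
        "MYC": [2.5, 0.2],   # one up, one down: discarded
        "GAPDH": [1.0, 1.1], # never differentially expressed
    }
    print(classify_degs(toy))  # {'EGFR': 'up', 'DUSP1': 'down'}
```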
S400, recommending the optimal drug combination in the candidate drug list based on the drug response prediction model; wherein S400 includes S410-S430.
S410, calculating the values of the other nodes of the drug response prediction model, and converting the model into ODEs, where the other nodes are the nodes other than the DP and TP nodes;
S420, analyzing the effects of drug-drug combinations in the candidate drug list based on the ODEs to generate a drug combination list;
S430, defining an EFFECT index to score the drug combinations in the drug combination list, and recommending the optimal drug combination.
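Neither the Boolean-to-ODE conversion nor the EFFECT index is defined in the text, so the following is only a sketch of one common approach under stated assumptions. Each node relaxes toward its fuzzy-logic target activity, dx_i/dt = (1 - u_i) B_i(x) - x_i, where OR is taken as max and u_i is the fractional inhibition a drug applies to its target node; Odefy-style HillCubes are another established conversion. The network, drug names, and the "EFFECT" proxy (drop in the steady-state activity of a disease output node) are all hypothetical.

```python
from itertools import combinations

# Toy network (hypothetical, not from the application). Each rule gives a
# node's target activity in [0, 1] using fuzzy logic (OR -> max).
RULES = {
    "GF": lambda x: 1.0,                          # constant growth-factor input
    "EGFR": lambda x: x["GF"],
    "MET": lambda x: x["GF"],
    "RAS": lambda x: x["EGFR"],
    "PI3K": lambda x: max(x["EGFR"], x["MET"]),
    "ERK": lambda x: x["RAS"],
    "AKT": lambda x: x["PI3K"],
    "PROLIF": lambda x: max(x["ERK"], x["AKT"]),  # disease output node
}
DRUG_TARGETS = {"drug_A": "EGFR", "drug_B": "RAS", "drug_C": "PI3K"}


def steady_state(inhibition=None, dt=0.1, steps=500):
    """Euler-integrate dx_i/dt = (1 - u_i) * B_i(x) - x_i starting from x = 0."""
    inhibition = inhibition or {}
    x = {node: 0.0 for node in RULES}
    for _ in range(steps):
        target = {n: (1.0 - inhibition.get(n, 0.0)) * f(x) for n, f in RULES.items()}
        x = {n: x[n] + dt * (target[n] - x[n]) for n in RULES}
    return x


def effect_score(drugs, strength=0.9, output="PROLIF"):
    """Hypothetical EFFECT proxy: drop in the output node's steady-state activity."""
    inhibition = {DRUG_TARGETS[d]: strength for d in drugs}
    return steady_state()[output] - steady_state(inhibition)[output]


def rank_pairs():
    """Score every two-drug combination, best first."""
    pairs = combinations(sorted(DRUG_TARGETS), 2)
    scored = [(effect_score(pair), pair) for pair in pairs]
    return sorted(scored, key=lambda item: item[0], reverse=True)


if __name__ == "__main__":
    for score, pair in rank_pairs():
        print(pair, round(score, 3))
```

In this toy network, hitting only EGFR and RAS leaves the MET-driven PI3K branch active, so that pair scores near zero, while pairs that block both branches reduce the output; this is the kind of combination effect S420 asks the ODE model to expose.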
Finally, by analyzing the results and taking into account the number of conventional disease-associated nodes (DNs) in the disease pathway, a ranked list of DPs to be tested in in vitro experiments (a list of genes to detect, i.e., a gene panel) can be obtained. The expression of this gene panel under different drugs is then measured to verify the effectiveness of the candidate combination therapies obtained by the method.
An embodiment of the data integration-based tumor cross-indication combination drug recommendation system disclosed in the present application is described in detail below with reference to fig. 2. As shown in fig. 2, the system disclosed in the present embodiment includes:
a target protein list obtaining module 201, configured to obtain a target protein list of a tumor;
a candidate drug list generation module 202, configured to generate a candidate drug list based on the target protein list;
the prediction model construction optimization module 203 is used for constructing and optimizing a drug response prediction model;
an optimal combination recommending module 204, configured to recommend an optimal drug combination in the candidate drug list based on the drug response prediction model.
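As an architectural sketch only, modules 201-204 can be wired as injectable callables so that each stage (such as the helper sketches for the individual steps) can be swapped independently. The class and field names below are hypothetical, and "best" is assumed to mean the combination with the highest score.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Combination = Tuple[str, ...]
ScoredCombinations = Sequence[Tuple[float, Combination]]


@dataclass
class CombinationRecommender:
    """Hypothetical wiring of modules 201-204; each module is an injected callable."""

    get_targets: Callable[[str], List[str]]                           # module 201
    get_candidates: Callable[[List[str]], List[str]]                  # module 202
    build_model: Callable[[str], object]                              # module 203
    score_combinations: Callable[[object, List[str]], ScoredCombinations]  # module 204

    def recommend(self, tumor: str) -> Combination:
        targets = self.get_targets(tumor)
        candidates = self.get_candidates(targets)
        model = self.build_model(tumor)
        scored = self.score_combinations(model, candidates)
        _, best_combo = max(scored, key=lambda item: item[0])
        return best_combo


if __name__ == "__main__":
    recommender = CombinationRecommender(  # stub modules with toy outputs
        get_targets=lambda tumor: ["EGFR", "MDM2"],
        get_candidates=lambda targets: ["d1", "d2", "d3"],
        build_model=lambda tumor: None,
        score_combinations=lambda model, drugs: [
            (0.2, ("d1", "d2")), (0.7, ("d1", "d3")), (0.5, ("d2", "d3"))
        ],
    )
    print(recommender.recommend("toy_tumor"))
```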
In one embodiment, the target protein list obtaining module 201 includes:
the PPI network construction submodule is used for constructing a specific PPI network of the tumor and screening a plurality of potential target proteins;
and the target protein list acquisition submodule is used for scoring the target combination of the potential target proteins to generate a target protein list.
In an embodiment, the PPI network construction sub-module includes:
a related gene list generating unit, configured to screen tumor-related genes to generate a related gene list;
an encoded protein obtaining unit, configured to obtain the protein encoded by each gene in the related gene list;
a PPI network construction unit, configured to obtain the proteins that interact with the encoded proteins and construct a tumor-specific PPI network; and
a potential target protein obtaining unit, configured to select a subset of the bridging protein nodes of the tumor-specific PPI network and filter them to obtain a plurality of potential target proteins.
In one embodiment, the tumor-associated gene is a gene that is significantly mutated and significantly differentially expressed in the tumor.
In one embodiment, the proteins that interact with the encoded proteins are obtained from the STRING database.
In one embodiment, the potential target protein obtaining unit includes:
a sorting and selection subunit, configured to rank the nodes of the tumor-specific PPI network by node value and select the top 20% of the bridging protein nodes; and
a filtering subunit, configured to filter the top 20% of the bridging protein nodes using the STITCH database and the DrugBank database to obtain a plurality of potential target proteins.
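The sorting and filtering subunits can be sketched as follows. Two details are not defined in the text and are assumed here: a bridging node is taken to be a non-seed protein that interacts with at least two proteins encoded by tumor-related genes, and the node value is taken to be the node degree. The set DRUGGABLE is a hypothetical stand-in for the STITCH and DrugBank lookups, and all protein names are invented.

```python
from collections import defaultdict

def build_network(edges):
    """Undirected adjacency sets for the tumor-specific PPI network."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    return adj

def bridging_nodes(adj, seeds):
    """Assumed definition: non-seed proteins linked to at least two seed proteins."""
    return [n for n in adj if n not in seeds and len(adj[n] & seeds) >= 2]

def potential_targets(edges, seeds, druggable, percent=20):
    adj = build_network(edges)
    # Node value assumed to be degree; the name breaks ties for a stable order.
    ranked = sorted(bridging_nodes(adj, seeds), key=lambda n: (-len(adj[n]), n))
    k = max(1, (len(ranked) * percent + 99) // 100)   # ceil(percent% of the bridges)
    return [n for n in ranked[:k] if n in druggable]

# Hypothetical data: seed proteins encoded by tumor-related genes and their interactors.
SEEDS = {"S1", "S2", "S3", "S4"}
LINKS = {
    "B1": "S1 S2 S3 S4", "B2": "S1 S2 S3", "B3": "S2 S4", "B4": "S3 S4",
    "B5": "S1 S3", "B6": "S1 S4", "B7": "S2 S3", "B8": "S1 S2",
    "B9": "S2 S4", "B10": "S1 S3", "N1": "S4",
}
EDGES = [("S1", "S2")] + [(p, s) for p, linked in LINKS.items() for s in linked.split()]
DRUGGABLE = {"B2", "B7"}   # stand-in for proteins with STITCH/DrugBank drug links

print(potential_targets(EDGES, SEEDS, DRUGGABLE))
```

Here ten bridging proteins are found (N1 touches only one seed), the top 20% by degree are B1 and B2, and only B2 survives the drug-interaction filter.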
In one embodiment, the target protein list obtaining submodule includes:
a score calculating unit, configured to calculate the TSDS score and spatial distribution of each target combination of the potential target proteins; and
a target protein list obtaining unit, configured to select, based on the TSDS scores and the spatial distribution, the potential target proteins with scores greater than zero to generate the target protein list.
In one embodiment, the candidate drug list generating module 202 includes:
a preliminary candidate drug list submodule, configured to extract drugs that interact with the target proteins in the target protein list to generate a preliminary candidate drug list; and
a candidate drug list generation submodule, configured to add drugs used for other diseases to the preliminary candidate drug list to generate the candidate drug list.
In one embodiment, the drugs that interact with the target proteins in the target protein list are extracted from the DrugBank database and the CTD repository.
In one embodiment, drugs used for other diseases are added to the preliminary candidate drug list based on a matrix tri-factorization (three-factor) algorithm.
In one embodiment, the three matrices comprise disease-gene, drug-target and protein-protein associations.
In one embodiment, the data sources used by the matrix tri-factorization algorithm include Disease Ontology, DrugBank, STRING, GeneRIF and TTD.
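A minimal sketch of non-negative matrix tri-factorization, X ≈ G1 S G2^T, fitted with the standard multiplicative updates for the squared Frobenius loss, is given below. It factorizes a single hypothetical drug-disease association matrix only; the data-fusion variant implied by the text, which jointly factorizes disease-gene, drug-target and protein-protein matrices built from the listed databases, is not reproduced, and the ranks k1 and k2 are arbitrary choices. High reconstructed scores at unobserved (zero) entries are treated as repurposing candidates.

```python
import random

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def T(A):
    return [list(r) for r in zip(*A)]

def update(F, num, den, eps=1e-9):
    """Multiplicative update F <- F * num / den, element-wise."""
    return [[f * n / (d + eps) for f, n, d in zip(fr, nr, dr)]
            for fr, nr, dr in zip(F, num, den)]

def error(X, G1, S, G2):
    """Squared Frobenius reconstruction error ||X - G1 S G2^T||^2."""
    R = matmul(matmul(G1, S), T(G2))
    return sum((x - r) ** 2 for xr, rr in zip(X, R) for x, r in zip(xr, rr))

def nmtf(X, k1, k2, iters=300, seed=0):
    """Non-negative tri-factorization X ~ G1 S G2^T (all factors >= 0)."""
    rng = random.Random(seed)
    def rand(r, c):
        return [[rng.random() for _ in range(c)] for _ in range(r)]
    G1, S, G2 = rand(len(X), k1), rand(k1, k2), rand(len(X[0]), k2)
    for _ in range(iters):
        SG2t = matmul(S, T(G2))                                # k1 x m
        G1 = update(G1, matmul(X, T(SG2t)), matmul(matmul(G1, SG2t), T(SG2t)))
        G1S = matmul(G1, S)                                    # n x k2
        G2 = update(G2, matmul(T(X), G1S), matmul(G2, matmul(T(G1S), G1S)))
        S = update(S, matmul(matmul(T(G1), X), G2),
                   matmul(matmul(matmul(T(G1), G1), S), matmul(T(G2), G2)))
    return G1, S, G2

# Hypothetical drug x disease association matrix (1 = known indication).
X = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
G1, S, G2 = nmtf(X, k1=2, k2=2)
R = matmul(matmul(G1, S), T(G2))
# Rank unobserved drug-disease pairs by reconstructed score.
candidates = sorted(((R[i][j], i, j) for i in range(len(X)) for j in range(len(X[0]))
                     if X[i][j] == 0), reverse=True)
print("top candidate (drug, disease):", candidates[0][1:])
```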
In an embodiment, the above prediction model building optimization module 203 includes:
a model construction submodule, configured to extract tumor-related pathways, convert the regulatory reactions of these pathways into logical formulas, and construct a drug response prediction model; and
a model optimization submodule, configured to integrate gene transcriptome expression data to optimize the drug response prediction model, and to initialize the values of the DP nodes and TP nodes of the drug response prediction model.
In one embodiment, the tumor-related pathways are extracted from the KEGG database.
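The conversion of regulatory reactions into logical formulas can be sketched with one common convention, assumed here rather than taken from the text: a node is active when at least one of its activators is active and none of its inhibitors is. The reactions and node names are hypothetical stand-ins for activation and inhibition relations parsed from a KEGG pathway.

```python
# Hypothetical regulatory reactions: (source, target, sign); +1 activates, -1 inhibits.
REACTIONS = [
    ("GF", "TP", +1),     # growth-factor input activates the target protein
    ("DRUG", "TP", -1),   # a candidate drug inhibits the target protein
    ("TP", "DP", +1),
    ("DP", "DN", +1),
]

def build_rules(reactions):
    """Group regulators per target node: node -> (activators, inhibitors)."""
    rules = {}
    for src, tgt, sign in reactions:
        acts, inhs = rules.setdefault(tgt, ([], []))
        (acts if sign > 0 else inhs).append(src)
    return rules

def as_text(rules):
    """Render each rule as (OR of activators) AND NOT (OR of inhibitors)."""
    out = {}
    for node, (acts, inhs) in sorted(rules.items()):
        expr = " OR ".join(acts) if acts else "TRUE"
        if inhs:
            expr = f"({expr}) AND NOT ({' OR '.join(inhs)})"
        out[node] = expr
    return out

def step(state, rules):
    """Synchronous update; nodes without incoming reactions keep their value."""
    new = dict(state)
    for node, (acts, inhs) in rules.items():
        on = any(state[a] for a in acts) if acts else True
        new[node] = on and not any(state[i] for i in inhs)
    return new

def steady_state(state, rules, max_steps=100):
    """Iterate until a fixed point; return None if none is reached."""
    for _ in range(max_steps):
        nxt = step(state, rules)
        if nxt == state:
            return state
        state = nxt
    return None

rules = build_rules(REACTIONS)
print(as_text(rules))
start = {"GF": True, "DRUG": False, "TP": False, "DP": False, "DN": False}
print(steady_state(start, rules), steady_state({**start, "DRUG": True}, rules))
```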
In an embodiment, the model optimization submodule includes:
a differential expression list generating unit, configured to extract differentially expressed genes from a disease database to generate a differentially expressed gene list;
a model optimization unit, configured to determine whether each gene is up-regulated or down-regulated based on the fold change (FC) value of the differentially expressed gene, and to optimize the drug response prediction model accordingly; and
an initialization setting unit, configured to initialize the values of the DP nodes and TP nodes of the drug response prediction model.
In one embodiment, the differentially expressed genes are genes with an FC value greater than 2.
In one possible embodiment, the system further comprises a gene discarding unit, configured to discard a differentially expressed gene when it is up-regulated and down-regulated an equal number of times across multiple experiments.
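The up/down calls and the discarding rule can be sketched as a simple vote across experiments. The threshold of 2 comes from the text; treating FC below 1/2 as down-regulation (equivalently |log2 FC| > 1) is an assumption, as are the gene names and fold-change values.

```python
from collections import defaultdict

FC_THRESHOLD = 2.0  # from the text: differentially expressed means FC > 2

def direction(fc, threshold=FC_THRESHOLD):
    """Return +1 (up), -1 (down) or 0 (not differential) for one fold change.
    Assumption: down-regulation is FC < 1/threshold, i.e. |log2 FC| > 1."""
    if fc > threshold:
        return 1
    if fc < 1.0 / threshold:
        return -1
    return 0

def consensus(measurements):
    """measurements: (gene, fc) pairs pooled from several experiments.
    Returns {gene: 'up' | 'down'}; genes whose up and down counts are equal
    are discarded."""
    up, down = defaultdict(int), defaultdict(int)
    for gene, fc in measurements:
        d = direction(fc)
        if d > 0:
            up[gene] += 1
        elif d < 0:
            down[gene] += 1
    calls = {}
    for gene in set(up) | set(down):
        if up[gene] > down[gene]:
            calls[gene] = "up"
        elif down[gene] > up[gene]:
            calls[gene] = "down"
        # equal counts: the gene is discarded
    return calls

# Hypothetical fold changes for four genes across three experiments.
DATA = [("G1", 3.1), ("G1", 2.5), ("G1", 0.3),
        ("G2", 0.2), ("G2", 0.4),
        ("G3", 4.0), ("G3", 0.1),
        ("G4", 1.5)]
print(consensus(DATA))
```

G3 is up-regulated once and down-regulated once, so it is discarded; G4 never passes the threshold and is not called at all.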
In one embodiment, the optimal combination recommending module 204 includes:
a conversion submodule, configured to calculate the values of the other nodes of the drug response prediction model and convert the drug response prediction model into an ODE model;
a combination list generation submodule, configured to analyze the combined effect of drug pairs in the candidate drug list based on the ODE model to generate a drug combination list; and
an optimal combination recommending submodule, configured to define an EFFECT index, score the drug combinations in the drug combination list, and recommend the optimal drug combination.
In one embodiment, the other nodes are the nodes of the drug response prediction model other than the DP nodes and the TP nodes.
In the description of the present application, it should be understood that terms indicating orientation or positional relationships, such as "central," "longitudinal," "lateral," "front," "back," "left," "right," "vertical," "horizontal," "top," "bottom," "inner," and "outer," are based on the orientations or positional relationships shown in the drawings. They are used merely for convenience and simplicity of description, and do not indicate or imply that the referenced device or element must have a particular orientation or be constructed and operated in a particular orientation; therefore, they should not be construed as limiting the scope of the present application.
In this document, terms such as "first" and "second" are used only to distinguish one item from another and do not indicate importance, order, or the like.
The division into modules, units, or components herein is merely a logical division; other divisions are possible in an actual implementation. For example, multiple modules and/or units may be combined or integrated into another system. Modules, units, or components described as separate parts may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed across multiple network units. Some or all of the units may therefore be selected according to actual needs to implement the solution of the embodiment.
The above description covers only specific embodiments of the present application, but the scope of protection of the present application is not limited thereto. Any change or substitution readily conceivable by a person skilled in the art within the technical scope disclosed herein shall fall within the scope of protection of the present application. Therefore, the scope of protection of the present application shall be determined by the claims.

Claims (26)

1. A data integration-based method for recommending cross-indication tumor drug combinations, characterized by comprising the following steps:
obtaining a target protein list of a tumor;
generating a candidate drug list based on the target protein list;
constructing and optimizing a drug response prediction model; and
recommending an optimal drug combination from the candidate drug list based on the drug response prediction model;
wherein obtaining the target protein list of the tumor comprises: constructing a tumor-specific PPI network and screening out a plurality of potential target proteins; and scoring target combinations of the potential target proteins to generate the target protein list;
constructing the tumor-specific PPI network and screening out the plurality of potential target proteins comprises: screening tumor-related genes to generate a related gene list; obtaining the protein encoded by each gene in the related gene list; obtaining the proteins that interact with the encoded proteins and constructing the tumor-specific PPI network; and selecting a subset of the bridging protein nodes of the tumor-specific PPI network and filtering them to obtain the plurality of potential target proteins;
selecting the subset of bridging protein nodes of the tumor-specific PPI network and filtering them to obtain the plurality of potential target proteins comprises: ranking the nodes of the tumor-specific PPI network by node value and selecting the top 20% of the bridging protein nodes; and filtering the top 20% of the bridging protein nodes using the STITCH database and the DrugBank database to obtain the plurality of potential target proteins;
constructing and optimizing the drug response prediction model comprises: extracting tumor-related pathways, converting the regulatory reactions of the pathways into logical formulas, and constructing the drug response prediction model; and integrating gene transcriptome expression data to optimize the drug response prediction model, and initializing the values of the DP nodes and TP nodes of the drug response prediction model;
the tumor-related pathways are extracted from the KEGG database; and
optimizing the drug response prediction model by integrating gene transcriptome expression data specifically comprises: extracting differentially expressed genes from a disease database to generate a differentially expressed gene list; and determining whether each gene is up-regulated or down-regulated based on the FC value of the differentially expressed gene, and optimizing the drug response prediction model accordingly.
2. The data integration-based recommendation method for tumor cross-indication drug combinations according to claim 1, wherein the tumor-associated genes are genes that are significantly mutated and significantly differentially expressed in tumors.
3. The data-integration-based recommendation method for tumor cross-indication drug combination according to claim 1, wherein the protein interacting with the related encoded protein is selected from a STRING database.
4. The data integration-based recommendation method for tumor cross-indication drug combinations according to claim 1, wherein scoring the target combinations of potential targeting proteins to generate a list of targeting proteins comprises:
calculating the TSDS score and spatial distribution of the target combination of potential targeting proteins;
and selecting potential target proteins with scores larger than zero based on the TSDS scores and the spatial distribution, and generating a target protein list.
5. The data integration-based recommendation method for tumor cross-indication drug combination according to claim 1, wherein the generating a list of candidate drugs based on the list of targeting proteins comprises:
extracting drugs interacting with the targeted proteins in the targeted protein list to generate a primary candidate drug list;
adding drugs applied to other diseases to the preliminary drug candidate list, generating a drug candidate list.
6. The data integration-based recommendation method for tumor cross-indication drug combinations according to claim 5, wherein the drugs interacting with the target proteins in the target protein list are extracted from the DrugBank database and the CTD repository.
7. The data integration-based recommendation method for tumor cross-indication drug combinations according to claim 5, wherein drugs used for other diseases are added to the preliminary candidate drug list based on a matrix tri-factorization (three-factor) algorithm.
8. The data integration-based recommendation method for tumor cross-indication drug combinations according to claim 7, wherein the three matrices comprise disease-gene, drug-target and protein-protein associations.
9. The data integration-based recommendation method for tumor cross-indication drug combinations according to claim 7, wherein the data sources used by the matrix tri-factorization algorithm include Disease Ontology, DrugBank, STRING, GeneRIF and TTD.
10. The data integration-based recommendation method for tumor cross-indication drug combination according to claim 1, wherein the differentially expressed gene is a gene with an FC value greater than 2.
11. The data integration-based recommendation method for tumor cross-indication combination medication according to claim 1, wherein the differentially expressed genes are discarded when the number of times the differentially expressed genes are up-regulated and down-regulated in multiple experiments is equal.
12. The data integration-based tumor cross-indication drug combination recommendation method of claim 1, wherein recommending an optimal drug combination in the drug candidate list based on the drug response prediction model comprises:
calculating values of other nodes of the drug response prediction model, and converting the drug response prediction model into ODE;
analyzing the combined effect of drug pairs in the candidate drug list based on the ODE model to generate a drug combination list;
defining an EFFECT index to score the drug combinations in the drug combination list and recommending the best drug combination.
13. The data integration-based recommendation method for tumor cross-indication combination medication according to claim 12, wherein the other nodes are nodes other than DP node and TP node in the prediction model of drug response.
14. A data integration-based tumor cross-indication combination drug recommendation system, characterized by comprising:
the target protein list acquisition module is used for acquiring a target protein list of the tumor;
a candidate drug list generation module for generating a candidate drug list based on the list of targeted proteins;
the prediction model construction optimization module is used for constructing and optimizing a drug response prediction model;
a best combination recommendation module for recommending a best drug combination in the list of drug candidates based on the drug response prediction model;
the target protein list acquisition module comprises: the PPI network construction submodule is used for constructing a specific PPI network of the tumor and screening a plurality of potential target proteins; the target protein list acquisition submodule is used for scoring the target combination of the potential target proteins to generate a target protein list;
the PPI network construction submodule comprises: a related gene list generating unit for screening tumor related genes to generate a related gene list; a coding protein obtaining unit for obtaining a relevant coding protein of each gene in the relevant gene list; a PPI network construction unit, which is used for obtaining the protein interacted with the related coding protein and constructing a specific PPI network of the tumor; the potential target protein obtaining unit is used for selecting part of bridging protein nodes of the specific PPI network of the tumor, and filtering the bridging protein nodes to obtain a plurality of potential target proteins;
the potential target protein obtaining unit comprises: a sorting and selection subunit, configured to rank the nodes of the tumor-specific PPI network by node value and select the top 20% of the bridging protein nodes; and a filtering subunit, configured to filter the top 20% of the bridging protein nodes using the STITCH database and the DrugBank database to obtain the plurality of potential target proteins;
the prediction model building and optimizing module comprises: the model construction submodule is used for extracting relevant pathways of tumors, converting the regulation reaction of the relevant pathways into a logic formula and constructing a drug response prediction model; the model optimization submodule is used for integrating the expression data of the gene transcriptome to optimize the drug response prediction model and initializing and setting the values of the DP node and the TP node of the drug response prediction model;
relevant pathways of the tumor are extracted from a KEGG database;
the model optimization submodule comprises: a differential expression list generating unit, configured to extract differentially expressed genes from a disease database to generate a differentially expressed gene list; a model optimization unit, configured to determine whether each gene is up-regulated or down-regulated based on the FC value of the differentially expressed gene and optimize the drug response prediction model accordingly; and an initialization setting unit, configured to initialize the values of the DP nodes and TP nodes of the drug response prediction model.
15. The data integration-based tumor cross-indication drug combination recommendation system of claim 14, wherein the tumor-associated gene is a gene that is significantly mutated and significantly differentially expressed in tumors.
16. The data-integration-based tumor cross-indication combination recommendation system of claim 14, wherein the protein interacting with the related encoded protein is screened from a STRING database.
17. The data integration-based tumor cross-indication drug combination recommendation system of claim 14, wherein the targeted protein list acquisition submodule comprises:
a score calculation unit for calculating the TSDS score and spatial distribution of the target combination of potential targeting proteins;
and the target protein list acquisition unit is used for selecting potential target proteins with scores larger than zero based on the TSDS score and the spatial distribution to generate a target protein list.
18. The data integration-based tumor cross-indication drug combination recommendation system of claim 14, wherein the drug candidate list generation module comprises:
a preliminary candidate drug list submodule for extracting drugs interacting with the target proteins in the target protein list to generate a preliminary candidate drug list;
a candidate drug list generation sub-module for adding drugs applied to other diseases to the preliminary candidate drug list, generating a candidate drug list.
19. The data integration-based tumor cross-indication drug combination recommendation system of claim 18, wherein the drugs interacting with the target proteins in the target protein list are extracted from the DrugBank database and the CTD repository.
20. The data integration-based tumor cross-indication drug combination recommendation system of claim 18, wherein drugs used for other diseases are added to the preliminary candidate drug list based on a matrix tri-factorization (three-factor) algorithm.
21. The data integration-based tumor cross-indication drug combination recommendation system of claim 20, wherein the three matrices comprise disease-gene, drug-target and protein-protein associations.
22. The data integration-based tumor cross-indication drug combination recommendation system of claim 20, wherein the data sources used by the matrix tri-factorization algorithm include Disease Ontology, DrugBank, STRING, GeneRIF and TTD.
23. The data integration-based tumor cross-indication combination recommendation system of claim 14, wherein the differentially expressed gene is a gene having an FC value greater than 2.
24. The data integration-based tumor cross-indication combination recommendation system of claim 14, further comprising: a gene discarding unit for discarding the differentially expressed gene when the number of times the differentially expressed gene is up-regulated and down-regulated in a plurality of experiments is equal.
25. The data integration-based tumor cross-indication drug combination recommendation system of claim 14, wherein the optimal combination recommendation module comprises:
the conversion submodule is used for calculating values of other nodes of the drug response prediction model and converting the drug response prediction model into ODE;
a combination list generation submodule, configured to analyze the combined effect of drug pairs in the candidate drug list based on the ODE model to generate a drug combination list; and
an optimal combination recommending submodule, configured to define an EFFECT index, score the drug combinations in the drug combination list, and recommend the optimal drug combination.
26. The data integration-based tumor cross-indication drug combination recommendation system of claim 25, wherein the other nodes are nodes other than DP nodes and TP nodes in the drug response prediction model.
CN202211417378.2A 2022-11-14 2022-11-14 Data integration-based cross-adaptive tumor drug combination recommendation method and system Active CN115472216B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211417378.2A CN115472216B (en) 2022-11-14 2022-11-14 Data integration-based cross-adaptive tumor drug combination recommendation method and system

Publications (2)

Publication Number Publication Date
CN115472216A CN115472216A (en) 2022-12-13
CN115472216B true CN115472216B (en) 2023-03-24

Family

ID=84338227

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113373225A (en) * 2021-06-10 2021-09-10 谱天(天津)生物科技有限公司 Combined analysis method for clinical sample gene and protein high-throughput detection result
CN115064213A (en) * 2022-08-18 2022-09-16 神州医疗科技股份有限公司 Multi-omics combined analysis method and system based on tumor sample

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5668002A (en) * 1990-08-31 1997-09-16 The Wistar Institute DNA and polypeptide for tumor-associated antigen CO-029
EP3297566A4 (en) * 2015-05-22 2019-02-20 CSTS Health Care Inc. Biomarker-driven molecularly targeted combination therapies based on knowledge representation pathway analysis
EP3424524A3 (en) * 2017-07-04 2019-02-27 CureVac AG Cancer rna-vaccine
CN111640508B (en) * 2020-05-28 2023-08-01 上海市生物医药技术研究院 Method and application of pan-tumor targeted drug sensitivity state assessment model constructed based on high-throughput sequencing data and clinical phenotypes

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant