CN116564408A - Synthetic lethal gene pair prediction method, device, equipment and medium based on knowledge-graph reasoning - Google Patents


Info

Publication number
CN116564408A
Authority
CN
China
Prior art keywords
graph
gene
knowledge
synthetic lethal
genes
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202310486650.0A
Other languages
Chinese (zh)
Other versions
CN116564408B (en
Inventor
郑杰
张可
刘勇
吴敏
冯艺苗
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
ShanghaiTech University
Original Assignee
ShanghaiTech University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by ShanghaiTech University
Priority to CN202310486650.0A
Publication of CN116564408A
Application granted
Publication of CN116564408B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00 ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B20/50 Mutagenesis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G06N3/0442 Recurrent networks, e.g. Hopfield networks characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 Computing arrangements using knowledge-based models
    • G06N5/02 Knowledge representation; Symbolic representation
    • G06N5/022 Knowledge engineering; Knowledge acquisition
    • G06N5/025 Extracting rules from data
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00 ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00 Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10 Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Medical Informatics (AREA)
  • Mathematical Physics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Biotechnology (AREA)
  • Biomedical Technology (AREA)
  • Databases & Information Systems (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Epidemiology (AREA)
  • Public Health (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Genetics & Genomics (AREA)
  • Analytical Chemistry (AREA)
  • Chemical & Material Sciences (AREA)
  • Animal Behavior & Ethology (AREA)
  • Bioethics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The application provides a synthetic lethal gene pair prediction method, device, equipment and medium based on knowledge-graph reasoning. The method comprises the following steps: acquiring a synthetic lethal knowledge graph and known synthetic lethal gene pairs; combining the synthetic lethal knowledge graph with a synthetic lethal graph formed by the known synthetic lethal gene pairs to generate a corresponding heterogeneous graph, so as to construct, based on the heterogeneous graph and a preset starting gene, a prediction model for predicting a plurality of partner genes of the preset starting gene; and training and optimizing the prediction model based on a multi-class loss function. Without sampling neighbors, the invention makes full use of the structure of the knowledge graph (KG) to predict synthetic lethal (SL) relationships and to interpret the prediction process, and it formulates the SL prediction problem as a partner-gene recommendation problem. Experiments show that the overall performance of KR4SL on NDCG, Precision and Recall is superior to all baseline models in three data-partitioning scenarios.

Description

Synthetic lethal gene pair prediction method, device, equipment and medium based on knowledge-graph reasoning
Technical Field
The application relates to the technical field of bioinformatics, and in particular to a synthetic lethal gene pair prediction method, device, equipment and medium based on knowledge-graph reasoning.
Background
Many important gene interactions are involved in cancer, so identifying gene interactions is critical for discovering anticancer drug targets. Synthetic lethality (SL) is a gene interaction in which the inactivation of either gene alone does not affect the viability of the cell, whereas the simultaneous inactivation of both genes results in cell death. Synthetic lethal relationships between genes provide a promising strategy for cancer treatment: by targeting genes that are not essential in normal cells but that are synthetically lethal with genes carrying cancer-specific alterations, cancer cells can be selectively killed without damaging normal cells. Some wet-lab techniques for large-scale SL screening have been developed, such as RNA interference and CRISPR. However, these techniques suffer from high cost, off-target effects, unsuccessful gene knockout, and similar problems. To address these problems and expedite SL-based drug target discovery, many bioinformatic approaches for SL prediction and analysis have been developed over the last decade.
Existing computational methods for predicting SL can be divided into three categories: statistical inference, network-based methods, and supervised machine learning methods. Statistical methods mine SL gene pairs based on predefined assumptions or rules. Network-based methods predict SL relationships by constructing a biological network and analyzing the topological features of genes in the network. Both types of methods have good interpretability, but the manual selection of assumptions or topological features is relatively subjective, and these methods cannot make use of known SL pairs. Most supervised machine learning approaches lack interpretability, so the mechanism behind a predicted SL relationship tends to remain unclear. Incorporating the prior knowledge in a knowledge graph (KG) into a supervised model may improve its interpretability. Existing KG-based methods generally sample neighbors at random and predict based on the similarity of node embeddings, so they cannot identify the KG features that are really important for prediction. In other words, some important prior knowledge is ignored, so that the structural information of the KG cannot be fully utilized and the predictions of the model cannot be well explained.
Therefore, an interpretable KG-based prediction model is needed that makes full use of the semantic structure of the KG to perform SL prediction and gives an interpretation of the prediction results. Knowledge-graph-reasoning-based methods use the connectivity of the paths between two nodes to infer the relationship between the two nodes, and the important paths can serve as the explanation of the prediction for those two nodes. A relational path is a sequence of edge relations in the KG, and the directed graph composed of all possible relational paths between two nodes is called a relational directed graph.
Disclosure of Invention
In view of the above drawbacks of the prior art, an object of the present application is to provide a knowledge-graph-reasoning-based synthetic lethal gene pair prediction method, device, equipment and medium, which address the problem of how to build an interpretable KG-based prediction model that makes full use of the semantic structure of the KG to perform SL prediction and gives an interpretation of the prediction results.
To achieve the above and other related objects, a first aspect of the present application provides a synthetic lethal gene pair prediction method based on knowledge-graph reasoning, including: acquiring a synthetic lethal knowledge graph and known synthetic lethal gene pairs; combining the synthetic lethal knowledge graph with a synthetic lethal graph formed by the known synthetic lethal gene pairs to generate a corresponding heterogeneous graph, so as to construct, based on the heterogeneous graph and a preset starting gene, a prediction model for predicting a plurality of partner genes of the preset starting gene; and training and optimizing the prediction model based on a multi-class loss function.
In some embodiments of the first aspect of the present application, the method further comprises: after the synthetic lethal knowledge graph and the known synthetic lethal gene pairs are extracted from the SynLethDB synthetic lethality database, selecting, from the synthetic lethal knowledge graph, a plurality of entities and a plurality of edge relations associated with gene regulation mechanisms; and expanding the edge relations based on a preset data set to obtain an expanded synthetic lethal knowledge graph.
In some embodiments of the first aspect of the present application, constructing, based on the heterogeneous graph and the preset starting gene, a prediction model for predicting a plurality of partner genes of the preset starting gene comprises: constructing, based on the synthetic lethal knowledge graph, a relational directed graph for all gene pairs whose starting gene is the preset starting gene; calculating, based on the relational directed graph, the semantic information representation propagated from the starting gene to each node of each layer in the knowledge graph; and, based on that semantic information representation, determining the candidate partner genes among the neighbor nodes of each layer, calculating the pairing likelihood between each candidate partner gene and the starting gene, and selecting a plurality of candidate partner genes with high pairing likelihood as the partner genes of the starting gene.
In some embodiments of the first aspect of the present application, constructing, based on the synthetic lethal knowledge graph, a relational directed graph for all gene pairs whose starting gene is the preset starting gene comprises:
defining a starting gene $g_q$ and a heterogeneous graph $G$, and considering the K-hop relational directed graph between the starting gene $g_q$ and a node $e_o$ in the heterogeneous graph; starting from $g_q$, finding all neighbors of $g_q$ in the heterogeneous graph $G$ and recording them as the first-layer subgraph; based on that subgraph, searching all neighbors of each neighbor node, and repeating this search recursively for K rounds to obtain the K-th-layer subgraph, which is the union of the K-hop relational directed graphs between $g_q$ and all nodes of the K-th layer; and, among all nodes reached in the K-th round, taking all gene nodes as candidate SL partner genes of the starting gene $g_q$.
In some embodiments of the first aspect of the present application, calculating, based on the relational directed graph, the semantic information representation propagated from the starting gene to each node of each layer in the knowledge graph comprises: constructing the relational directed graph of the current layer based on the relational directed graph of the previous layer in the knowledge graph, so as to propagate semantic information from the previous layer to the current layer; aggregating, through an attention mechanism, all messages propagated to the same target node; and optimizing, based on a gated recurrent unit, the sequence information of all edges from the previous layer to the current layer in the knowledge graph.
In some embodiments of the first aspect of the present application, constructing the relational directed graph of the current layer based on the relational directed graph of the previous layer in the knowledge graph, so as to propagate semantic information from the previous layer to the current layer, includes: defining the semantic information propagated from the starting gene $g_q$ to a target node $e_i$; and, for a triple $(e_i, r_{io}, e_o)$ from step (K-1) to step K, computing the semantic information propagated from node $e_i$ to node $e_o$ [formula missing from this text] from the following quantities: the embedded representation of $r_{io}$ at the k-th layer; the text representations $T_i$ and $T_o$ of $e_i$ and $e_o$, respectively; a learnable parameter; and the semantic information propagated from the starting gene $g_q$ to node $e_i$ of layer (K-1).
In some embodiments of the first aspect of the present application, the aggregation, through the attention mechanism, of all messages propagated to the same target node is expressed by a formula [formula missing from this text] over the following quantities: the K-hop relational directed graph between the starting gene $g_q$ and node $e_o$ in the heterogeneous graph; the attention coefficient for the triple $(e_i, r_{io}, e_o)$; and learnable parameters.
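The attention formula itself is not recoverable from this text, so the following is only a minimal sketch of attention-weighted message aggregation at one target node. It assumes a common form in which the coefficient is a sigmoid over a one-hidden-layer network applied to the concatenated source state and relation embedding; the function names, the parameters `w_hidden` and `w_out`, and this functional form are illustrative assumptions, not the formula of the application.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def attention_weight(source_state, relation_emb, w_hidden, w_out):
    """Assumed form: alpha = sigmoid(w_out . ReLU(w_hidden @ [source_state ; relation_emb]))."""
    features = source_state + relation_emb  # list concatenation
    hidden = [max(0.0, sum(w * x for w, x in zip(row, features))) for row in w_hidden]
    return sigmoid(sum(w * h for w, h in zip(w_out, hidden)))


def aggregate(weighted_messages):
    """Weighted sum of (alpha, message_vector) pairs that arrive at one target node."""
    dim = len(weighted_messages[0][1])
    return [sum(alpha * vec[d] for alpha, vec in weighted_messages) for d in range(dim)]


# Toy usage: two incoming triples for the same target node e_o.
alpha = attention_weight([1.0, 0.0], [0.0, 1.0],
                         w_hidden=[[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 1.0]],
                         w_out=[0.0, 0.0])
state_at_target = aggregate([(alpha, [2.0, 4.0]), (1.0, [1.0, 1.0])])
```

Because every incoming edge contributes with its own coefficient, the edges with the largest coefficients after training can be read off as candidate explanations, which is the role the text assigns to high-weight paths.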
In some embodiments of the first aspect of the present application, optimizing, based on the gated recurrent unit, the sequence information of all edges from the previous layer to the current layer in the knowledge graph includes using one gated recurrent unit (GRU) to further strengthen the sequence information from step (K-1) to step K [formulas missing from this text], wherein: the weights of the GRU are all learnable parameters; the output of the GRU represents the semantic information propagated from $g_q$ to $e_o$ through k steps; the input of the GRU is the semantic representation propagated from $g_q$ to $e_o$ before passing through the GRU; $r_k$ and $f_k$ denote the reset gate and the update gate, respectively; and $n_k$ denotes the updated value produced by the GRU.
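As a hedged illustration of the gating described above, the sketch below implements one step of a textbook GRU cell in plain Python, reusing the gate names from the text (`r` for the reset gate, `f` for the update gate, `n` for the candidate value). Which vector plays the role of input and which the role of previous state, as well as the parameter names `W_*` and `U_*`, are assumptions; the exact equations of the application are not present in this text.

```python
import math


def matvec(matrix, vector):
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gru_step(x, h_prev, p):
    """One standard GRU update.

    x      : aggregated (pre-GRU) representation at step k  (assumed input)
    h_prev : representation carried over from step k-1      (assumed state)
    p      : dict of weight matrices W_r, U_r, W_f, U_f, W_n, U_n
    """
    r = [sigmoid(a + b) for a, b in zip(matvec(p["W_r"], x), matvec(p["U_r"], h_prev))]
    f = [sigmoid(a + b) for a, b in zip(matvec(p["W_f"], x), matvec(p["U_f"], h_prev))]
    n = [math.tanh(a + r_i * b)
         for a, r_i, b in zip(matvec(p["W_n"], x), r, matvec(p["U_n"], h_prev))]
    # The update gate f interpolates between the candidate n and the old state.
    return [(1.0 - f_i) * n_i + f_i * h_i for f_i, n_i, h_i in zip(f, n, h_prev)]


# Toy usage with all-zero weights: both gates equal 0.5 and the candidate is 0,
# so the new state is half of the previous state.
zeros = [[0.0, 0.0], [0.0, 0.0]]
params = {name: zeros for name in ("W_r", "U_r", "W_f", "U_f", "W_n", "U_n")}
new_state = gru_step([1.0, 1.0], [2.0, -4.0], params)
```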
In some embodiments of the first aspect of the present application, the multi-class loss function [formula missing from this text] is defined over the following quantities: the set of all gene pairs involved in training; the set of all gene pairs having $g_q$ as the starting gene; and the exponentially transformed score of the gene pair $(g_q, g_p)$.
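The loss formula is not present in this text. The description (exponentially transformed scores, normalized over the gene pairs sharing a starting gene) is consistent with a softmax-style multi-class log loss, so the sketch below assumes that form. The function name and the dictionary layout are hypothetical, and the sketch should not be read as the exact loss of the application.

```python
import math


def multiclass_log_loss(scores, positives):
    """Assumed softmax-style loss for one starting gene.

    scores    : {candidate gene: raw score} over all candidates of the starting gene
    positives : known SL partner genes of the starting gene (keys of `scores`)
    Returns the sum over positives of  log(sum_g exp(score_g)) - score_positive.
    """
    max_score = max(scores.values())  # subtract the maximum for numerical stability
    log_norm = max_score + math.log(sum(math.exp(s - max_score) for s in scores.values()))
    return sum(log_norm - scores[g] for g in positives)


# Toy usage: raising the score of the known partner lowers the loss.
loss_uniform = multiclass_log_loss({"g_a": 0.0, "g_b": 0.0}, ["g_a"])
loss_better = multiclass_log_loss({"g_a": 2.0, "g_b": 0.0}, ["g_a"])
```

Summing this quantity over all starting genes in the training set would give the overall training objective under the same assumption.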
To achieve the above and other related objects, a second aspect of the present application provides a synthetic lethal gene pair prediction apparatus based on knowledge-graph reasoning, comprising: a data acquisition module for acquiring a synthetic lethal knowledge graph and known synthetic lethal gene pairs; a model construction module for combining the synthetic lethal knowledge graph with a synthetic lethal graph formed by the known synthetic lethal gene pairs to generate a corresponding heterogeneous graph, so as to construct, based on the heterogeneous graph and a preset starting gene, a prediction model for predicting a plurality of partner genes of the preset starting gene; and a model training module for training and optimizing the prediction model based on a multi-class loss function.
To achieve the above and other related objects, a third aspect of the present application provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the synthetic lethal gene pair prediction method based on knowledge-graph inference.
To achieve the above and other related objects, a fourth aspect of the present application provides a computer apparatus, comprising: a processor and a memory; the memory is used for storing a computer program, and the processor is used for executing the computer program stored in the memory, so that the computer equipment executes the synthetic lethal gene pair prediction method based on knowledge graph reasoning.
As described above, the synthetic lethal gene pair prediction method, device, equipment and medium based on knowledge-graph reasoning have the following beneficial effects: without sampling neighbors, the invention makes full use of the structure of the KG to predict SL relationships and to explain the prediction process, and it formulates the SL prediction problem as a partner-gene recommendation problem. That is, given the starting gene of an SL gene pair, the model scores all possible genes, and the top-scoring genes are selected as the predicted partner genes. Experiments show that the overall performance of KR4SL on NDCG, Precision and Recall is superior to all baseline models in three data-partitioning scenarios.
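The recommendation formulation above (score every candidate, keep the top-scoring ones, evaluate with Precision, Recall and NDCG) can be sketched as follows. This is a minimal illustration using the usual binary-relevance definitions of the three metrics; the function names and the toy scores are hypothetical, and the sketch assumes at least one known partner gene per starting gene.

```python
import math


def top_n(scores, n):
    """Return the n highest-scoring candidate genes (ties broken by name)."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [gene for gene, _ in ranked[:n]]


def precision_recall_ndcg(ranked, positives, n):
    """Binary-relevance Precision@n, Recall@n and NDCG@n for one starting gene."""
    hits = [1 if gene in positives else 0 for gene in ranked[:n]]
    precision = sum(hits) / n
    recall = sum(hits) / len(positives)
    dcg = sum(h / math.log2(i + 2) for i, h in enumerate(hits))
    idcg = sum(1 / math.log2(i + 2) for i in range(min(n, len(positives))))
    return precision, recall, dcg / idcg


# Toy usage: three candidates, two of which are known partners.
toy_scores = {"g_a": 0.9, "g_b": 0.5, "g_c": 0.1}
ranked = top_n(toy_scores, 2)
precision, recall, ndcg = precision_recall_ndcg(ranked, {"g_a", "g_c"}, 2)
```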
Drawings
FIG. 1 is a schematic flow chart of a synthetic lethal gene pair prediction method based on knowledge-graph reasoning according to an embodiment of the present application.
Fig. 2 is a schematic flow chart of constructing, based on the heterogeneous graph and the preset starting gene, a prediction model for predicting a plurality of partner genes of the preset starting gene in an embodiment of the present application.
FIG. 3A is a schematic diagram of the heterogeneous graph construction during an experiment in an embodiment of the present application.
Fig. 3B is a schematic diagram showing the structure of the semantic information encoder (Semantic information encoder) during experiments in an embodiment of the present application.
Fig. 3C is a schematic diagram of a decoder (scanning decoder) during an experiment in an embodiment of the present application.
FIG. 3D is a schematic representation of the construction of a synthetic lethal gene pair from node ATM and node TP53 in one embodiment of the present application.
Fig. 4A shows the performance on the three types of metrics in the transductive scenario according to an embodiment of the present application.
Fig. 4B shows the performance on the three types of metrics in the inductive scenario according to an embodiment of the present application.
FIG. 5 is a schematic diagram showing the structure of a synthetic lethal gene pair prediction device according to one embodiment of the present application.
Fig. 6 is a schematic structural diagram of a computer device according to an embodiment of the present application.
Detailed Description
Other advantages and effects of the present application will become apparent to those skilled in the art from the present disclosure, when the following description of the embodiments is taken in conjunction with the accompanying drawings. The present application may be embodied or carried out in other specific embodiments, and the details of the present application may be modified or changed from various points of view and applications without departing from the spirit of the present application. It should be noted that the following embodiments and features in the embodiments may be combined with each other without conflict.
It is noted that in the following description, reference is made to the accompanying drawings, which describe several embodiments of the present application. It is to be understood that other embodiments may be utilized and that mechanical, structural, electrical, and operational changes may be made without departing from the spirit and scope of the present application. The following detailed description is not to be taken in a limiting sense, and the scope of embodiments of the present application is defined only by the claims of the issued patent. The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. Spatially relative terms, such as "upper," "lower," "left," "right," "lower," "upper," and the like, may be used herein to facilitate a description of one element or feature as illustrated in the figures as being related to another element or feature.
In this application, unless specifically stated and limited otherwise, the terms "mounted," "connected," "secured," "held," and the like are to be construed broadly, and may be, for example, fixedly connected, detachably connected, or integrally connected; can be mechanically or electrically connected; can be directly connected or indirectly connected through an intermediate medium, and can be communication between two elements. The specific meaning of the terms in this application will be understood by those of ordinary skill in the art as the case may be.
Furthermore, as used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context indicates otherwise. It will be further understood that the terms "comprises," "comprising," "includes," and/or "including" specify the presence of stated features, operations, elements, components, items, categories, and/or groups, but do not preclude the presence, occurrence or addition of one or more other features, operations, elements, components, items, categories, and/or groups. The terms "or" and "and/or" as used herein are to be construed as inclusive, meaning any one or any combination. Thus, "A, B or C" or "A, B and/or C" means any of the following: A; B; C; A and B; A and C; B and C; A, B and C. An exception to this definition occurs only when a combination of elements, functions or operations is in some way inherently mutually exclusive.
To solve the problems described in the Background, the invention provides a knowledge-graph-reasoning-based synthetic lethal gene pair prediction method, system, terminal and medium, which aim to make full use of the structure of the KG to predict SL relationships and to explain the prediction process without sampling neighbors. KR4SL formulates the SL prediction problem as a partner-gene recommendation problem: given the starting gene of an SL gene pair, the model scores all possible genes, and the top-scoring genes are selected as the predicted partner genes. Experiments show that KR4SL outperforms all baseline models on metrics such as NDCG, Precision and Recall in three data-partitioning scenarios.
In short, the invention can efficiently construct relational directed graphs for a plurality of gene pairs, reason over those graphs, predict potential SL partner genes and explain the predictions. Specifically: first, for multiple gene pairs that share the same starting gene, the model constructs the relational directed graphs for those gene pairs simultaneously, without randomly sampling neighbors, and reasons over those graphs starting from the starting gene. Second, in the reasoning process of each layer, the structural information of the relational directed graph and the text semantic information of the entities in the graph are combined as the semantic information to be propagated, and this semantic information is further enhanced by learning the sequence information of the relational paths in the relational directed graph. Finally, information is aggregated with an attention mechanism, and after model training is finished the paths with high weights are selected as the explanation.
In order to make the objects, technical solutions and advantages of the present invention more apparent, further detailed description of the technical solutions in the embodiments of the present invention will be given by the following examples with reference to the accompanying drawings. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
Before explaining the present invention in further detail, terms and terminology involved in the embodiments of the present invention will be explained, and the terms and terminology involved in the embodiments of the present invention are applicable to the following explanation:
<1> Synthetic lethality (SL): the phenomenon in which the simultaneous inactivation of two individually non-lethal genes results in cell death. In other words, when either gene A or gene B alone is mutated the cell remains viable, but when both genes are mutated simultaneously the cell dies.
<2> Wet laboratory (Wet-Lab): refers to traditional experiments based on experimental agents, and the relative concept Dry laboratory (Dry-Lab) refers to computer-based simulation experiments.
The embodiment of the invention provides a synthetic lethal gene pair prediction method based on knowledge graph reasoning, a system of the synthetic lethal gene pair prediction method based on knowledge graph reasoning and a storage medium for storing an executable program for realizing the synthetic lethal gene pair prediction method based on knowledge graph reasoning. In terms of the implementation of the knowledge-based reasoning synthetic lethal gene pair prediction method, the embodiment of the present invention will describe an exemplary implementation scenario of the knowledge-based reasoning synthetic lethal gene pair prediction.
FIG. 1 shows a schematic flow chart of a synthetic lethal gene pair prediction method based on knowledge-graph reasoning in an embodiment of the invention. The method in this embodiment mainly comprises the following steps:
Step S1: obtaining a synthetic lethal knowledge graph and known synthetic lethal gene pairs.
In the embodiment of the invention, the synthetic lethal knowledge graph and the known synthetic lethal gene pairs are extracted from the SynLethDB synthetic lethality database. SynLethDB is a database of synthetic lethality that has been widely used as a gold-standard data source; SynLethDB 2.0 contains a total of 50868 SL gene pairs involving 5 species, together with a knowledge graph (SynLethKG) for the SL gene pairs. The present example uses 35374 gene pairs involving 9746 genes as label data, and SynLethKG contains 11 relationships and 27 entities.
Preferably, after the synthetic lethal knowledge graph and the known synthetic lethal gene pairs are extracted from the SynLethDB synthetic lethality database, a plurality of entities and a plurality of edge relations associated with gene regulation mechanisms are selected from the synthetic lethal knowledge graph, and the edge relations are expanded based on a preset data set to obtain an expanded synthetic lethal knowledge graph.
For example, 3 entities and 4 edge relations associated with gene regulation mechanisms can be selected from the synthetic lethal knowledge graph (SynLethKG), and the 4 edge relations are expanded to 32 edge relations by using the OntoProtein data set, finally yielding a knowledge graph consisting of 42547 nodes of 3 types and 381761 edges of 32 types.
It is understood that the 3 entities are genes, gene ontologies and pathways, respectively. Gene regulation can be divided into 4 levels: level-1 gene regulation is represented by negative feedback regulation and is governed by substrate or product concentration; level-2 gene regulation is chain regulation and is governed by signal molecules; level-3 gene regulation is one-to-many regulation, represented by transcription factors, in which one node regulates dozens to hundreds of targets; level-4 gene regulation is program regulation, a genome-level, time-dependent regulation that controls the transcriptome by changing the expression group.
In some examples, the method further includes extracting a text representation of each node in the knowledge graph based on a pre-trained language model. The specific extraction process is as follows: the text description of each node in the knowledge graph is used as input, and CORR, a BERT-based language model pre-trained on a biomedical corpus, extracts a text representation for each node, so as to enrich the semantic information of the node.
Step S2: combining the synthetic lethal knowledge graph with a synthetic lethal graph formed by the known synthetic lethal gene pairs to generate a corresponding iso-graph, so as to construct a prediction model for predicting a plurality of partner genes of the preset starting gene based on the iso-graph and the preset starting gene.
It should be noted that there may be different types of nodes and edges in the iso-graph, which have independent ID spaces and features. A heterogeneous graph is typically composed of a series of sub-graphs, one sub-graph corresponding to each relationship defined by a string triplet (source node type, edge type, target node type).
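As a minimal sketch of the combination described in Step S2, the snippet below merges knowledge-graph triples and known SL gene pairs into one typed adjacency structure. The relation label `"SL"`, the function name, and the toy node identifiers are hypothetical; storing each SL pair in both directions reflects the symmetry of synthetic lethality and is an assumption about the data layout, not a detail stated in the application.

```python
from collections import defaultdict

SL_RELATION = "SL"  # hypothetical label for the synthetic-lethal edge type


def build_heterogeneous_graph(kg_triples, sl_pairs):
    """Merge (head, relation, tail) KG triples with SL gene pairs.

    Returns {head: [(relation, tail), ...]} with deterministic ordering.
    """
    triples = set(kg_triples)
    for gene_a, gene_b in sl_pairs:
        # SL is symmetric, so the edge is stored in both directions.
        triples.add((gene_a, SL_RELATION, gene_b))
        triples.add((gene_b, SL_RELATION, gene_a))
    out_edges = defaultdict(list)
    for head, relation, tail in sorted(triples):
        out_edges[head].append((relation, tail))
    return dict(out_edges)


# Toy usage: one KG edge plus one known SL pair.
toy_graph = build_heterogeneous_graph([("g1", "in_pathway", "p1")], [("g1", "g2")])
```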
In some examples, the prediction model for predicting a plurality of partner genes of the preset starting gene is constructed based on the heterogeneous graph and the preset starting gene, and the process is shown in Fig. 2:
Step S21: constructing, based on the synthetic lethal knowledge graph, a relational directed graph for all gene pairs whose starting gene is the preset starting gene.
Specifically, the initiation gene g is defined q And an isomerism diagram G; initial gene g q And a node in the heterographK-hop relation directed graph of +. >From the initial gene g q Starting from this, the starting gene G is found in the isomerism map G q Is marked as +.>Based on subgraph->Searching all neighbors for each neighbor node so that K rounds of searching recursively obtain childrenFigure->Subgraph->Is the initiation gene g q And all nodes of the K-th layer->A union of K-hop relationship directed graphs; among all nodes of the K-th round, all gene nodes are taken as the initial gene g q SL candidate partner genes of (c).
Step S22: calculating, based on the relation directed graph, the semantic information representation propagated from the starting gene to each node of each layer in the knowledge graph. The specific process is as follows:
Step S22a: constructing the relation directed graph of the current layer based on the relation directed graph of the previous layer in the knowledge graph, so as to propagate semantic information from the previous layer to the current layer.
Taking the construction of the relation directed graph of step K from the relation directed graph of step (K-1) as an example: define the semantic information propagated from the starting gene g_q to a target node e_i; for a triple (e_i, r_io, e_o) from step (K-1) to step K, the semantic information propagated from node e_i to node e_o is:
where the terms are: the embedded representation of r_io at the k-th layer; T_i and T_o, the text representations of e_i and e_o, respectively; a learnable parameter; and the semantic information propagated from the starting gene g_q to node e_i at layer (K-1).
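The following sketch shows one way the four quantities listed above could be combined into a per-edge message. The combination used here (an element-wise sum followed by one learnable linear map) is purely an assumption made for illustration and is not taken from the patent; the function name `edge_message` and all variable names are likewise hypothetical.

```python
import numpy as np


def edge_message(h_prev_i, rel_emb, text_i, text_o, weight):
    """Message sent along one triple (e_i, r_io, e_o).

    h_prev_i: semantic information of e_i from layer K-1, shape (d,)
    rel_emb:  embedding of relation r_io at the current layer, shape (d,)
    text_i, text_o: text representations T_i and T_o, shape (d,)
    weight:   learnable matrix, shape (d, d)
    Assumption: inputs are summed before the linear map.
    """
    return weight @ (h_prev_i + rel_emb + text_i + text_o)


dim = 4
message = edge_message(
    h_prev_i=np.ones(dim),
    rel_emb=2.0 * np.ones(dim),
    text_i=np.zeros(dim),
    text_o=np.ones(dim),
    weight=np.eye(dim),
)
```

Concatenating the inputs instead of summing them would be an equally plausible design; the sum merely keeps the dimensions fixed in this example.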
Step S22b: all messages propagated to the same target node are aggregated by the attention mechanism.
Specifically, aggregating all messages propagated to the same target node through the attention mechanism is represented as:
where the terms are: the K-hop relation directed graph between the starting gene g_q and node e_o in the heterogeneous graph; the attention coefficient for the triple (e_i, r_io, e_o); and learnable parameters.
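Attention-based aggregation of the messages that share a target node can be sketched as follows. The scoring function (a dot product with a learnable vector) and the softmax normalization over each target's incoming messages are assumptions made for this example; the name `aggregate_by_target` is hypothetical.

```python
import numpy as np


def aggregate_by_target(messages, targets, attn_vector):
    """Attention-weighted sum of the messages arriving at each target node.

    messages:    array of shape (num_edges, d), one row per triple
    targets:     list of length num_edges naming each row's target node
    attn_vector: learnable vector of shape (d,) (assumed scoring function)
    """
    logits = messages @ attn_vector
    aggregated = {}
    for target in sorted(set(targets)):
        rows = [i for i, t in enumerate(targets) if t == target]
        # Softmax over the messages that share this target (assumption).
        weights = np.exp(logits[rows] - logits[rows].max())
        weights /= weights.sum()
        aggregated[target] = weights @ messages[rows]
    return aggregated


messages = np.array([[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]])
targets = ["DNA repair", "DNA repair", "cell cycle"]
aggregated = aggregate_by_target(messages, targets, attn_vector=np.zeros(2))
```

With an all-zero attention vector every incoming message receives the same weight, so the result reduces to a plain average per target node.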
Step S22c: optimizing, based on a gated recurrent unit, the sequence information of all edges from the previous layer to the current layer in the knowledge graph.
Specifically, one GRU (Gated Recurrent Unit) is used to further strengthen the sequence information of all edges from step (K-1) to step K, including:
where the terms are: learnable parameters; the semantic information representation propagated from g_q to e_o through k steps; the semantic representation propagated from g_q to e_o before passing through the GRU; further learnable parameters; r_k and f_k, the reset gate and the update gate, respectively; and n_k, the updated value after the GRU.
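For orientation, a standard GRU cell with a reset gate r, an update gate f and a candidate value n is sketched below. It is assumed here that the aggregated message serves as the GRU input and that the representation from the previous layer serves as the hidden state; the parameter names in `params` are hypothetical and the exact formulation in the patent may differ.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def gru_update(x, h_prev, params):
    """One standard GRU step.

    x:      aggregated message for a node before the GRU (assumed input)
    h_prev: representation carried over from the previous layer (assumed state)
    """
    r = sigmoid(params["W_r"] @ x + params["U_r"] @ h_prev + params["b_r"])  # reset gate
    f = sigmoid(params["W_f"] @ x + params["U_f"] @ h_prev + params["b_f"])  # update gate
    n = np.tanh(params["W_n"] @ x + r * (params["U_n"] @ h_prev) + params["b_n"])  # candidate
    return (1.0 - f) * n + f * h_prev


dim = 3
zero_params = {name: np.zeros((dim, dim)) for name in ("W_r", "U_r", "W_f", "U_f", "W_n", "U_n")}
zero_params.update({name: np.zeros(dim) for name in ("b_r", "b_f", "b_n")})
h_new = gru_update(np.array([1.0, -2.0, 3.0]), np.ones(dim), zero_params)
```

With all parameters set to zero both gates equal 0.5 and the candidate value is 0, so the output is half of the previous state.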
Step S23: based on the semantic information representation propagated from the starting gene to each node of each layer in the knowledge graph, determining the candidate partner genes among the neighbor nodes of each layer, calculating the pairing likelihood between each candidate partner gene and the starting gene, and selecting a plurality of candidate partner genes with high pairing likelihood as partner genes of the starting gene.
After K layers of message passing, all gene nodes among the K-th-layer neighbor nodes can be selected as candidate partner genes. For example, if node g_p is a gene node among the K-th-layer neighbor nodes of g_q, then g_p is a candidate partner gene of g_q, and a final score for g_p can be calculated through a fully connected layer:
where W_ff and b_ff are both learnable parameters. This score reflects the likelihood that g_p is a partner gene of g_q: the higher the score, the higher the likelihood. After the scores of all candidate genes are sorted in descending order, the top N are selected as the partner genes of g_q.
Step S3: the predictive model is trained and optimized based on a multi-class loss function.
The multi-class loss function in the embodiment of the invention is expressed as follows:
where the terms are: the set of all gene pairs involved in training; the set of all gene pairs with g_q as the starting gene; and the score of the gene pair g_q and g_p after exponential transformation.
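Given the quantities listed above (exponentially transformed scores, normalized over all pairs that share the starting gene g_q), one plausible reading is a softmax cross-entropy loss summed over the training pairs. The sketch below assumes that form; it is an illustration rather than the loss as filed, and the function name and toy scores are hypothetical.

```python
import math


def multiclass_loss(scores_by_query, positive_pairs):
    """Softmax cross-entropy summed over the training pairs (assumed form).

    scores_by_query: {g_q: {candidate gene: raw score}}
    positive_pairs:  list of known SL pairs (g_q, g_p) used for training
    """
    total = 0.0
    for g_q, g_p in positive_pairs:
        scores = scores_by_query[g_q]
        max_score = max(scores.values())
        # log of the sum of exponentiated scores, computed stably
        log_norm = max_score + math.log(
            sum(math.exp(s - max_score) for s in scores.values())
        )
        total += log_norm - scores[g_p]
    return total


toy_scores = {"TP53": {"ATM": 0.0, "CDK1": 0.0}}
loss = multiclass_loss(toy_scores, [("TP53", "ATM")])
```

With two candidates that have equal scores, the loss for one positive pair equals log 2, the value expected for a uniform two-class prediction.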
To help those skilled in the art further understand the technical features and technical effects of the present invention, the invention is explained in more detail below with reference to the experimental procedure and experimental results.
FIG. 3A shows the heterogeneous graph generated during an experiment by combining the synthetic lethal graph of known synthetic lethal gene pairs (Known SL graph) with the synthetic lethal knowledge graph (KG). Node DNA damage response refers to the DNA damage response, one of the basic physiological mechanisms of an organism, which serves to protect the organism's genome. Node DNA repair refers to DNA repair, a response of cells to DNA damage; node BRCA1 is a gene directly related to hereditary breast cancer; node cell cycle refers to the cell cycle, the whole process a cell undergoes from the completion of one division to the end of the next division; node apoptotic process refers to the process of apoptosis; node ABL1 is a proto-oncogene; node CDK6 is cell division protein kinase 6; node ATM is the ataxia telangiectasia mutated gene; node CDK1 is cyclin-dependent kinase 1; node TP53 is a tumor suppressor gene.
Fig. 3B shows a schematic diagram of the structure of the semantic information encoder (Semantic information encoder) during an experiment. Starting from a starting gene, multiple layers of neighbor nodes are searched recursively in the heterogeneous graph, and the gene nodes in the last layer are taken as candidate partner genes. In going from step k-1 to step k, the semantic information propagated on each edge is first calculated using the structural information of the heterogeneous graph and the text information of the entities in the KG; attentive aggregation (Attentive Aggregation) is then performed over the triples that share the same target node; finally, the sequence information is strengthened through a GRU to obtain the semantic information representation of the k-th layer.
Fig. 3C shows a schematic diagram of the decoder (scanning decoder) during an experiment. For each candidate gene node of the K-th layer, a fully connected (FF) layer is used to obtain the final score. After these scores are ranked in descending order, the top N are selected as partner genes.
Fig. 3D shows the final explanation, taking DNA repair as an example: node ATM is a new partner gene of node TP53 because node ATM and the known SL partner genes (ABL1 and BRCA1) are all involved in the same biological process (i.e., DNA repair).
The experimental scenario was set as follows: to evaluate the performance of the model, two experimental scenarios were set up.
Transductive scenario: given the known SL graph and the synthetic lethal knowledge graph KG, unknown SL gene pairs (or SL relationships) are inferred. In this case, the dataset is divided by gene pairs, and genes in the test set may also appear in the training set.
Inductive scenario: none of the genes used in testing is seen during training. In this case, the dataset is divided by genes: the gene set involved in the training set and the gene set involved in the test set are disjoint, and the genes in the heterogeneous graph used for training and those in the heterogeneous graph used for testing are likewise disjoint. This setting further tests the generalization ability of the model.
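The difference between the two scenarios can be sketched with two splitting functions. The random splitting, the `test_fraction` parameter and the choice to drop pairs that straddle the training and test gene sets are assumptions made for this illustration; the function names and toy pairs are hypothetical.

```python
import random


def transductive_split(pairs, test_fraction, seed=0):
    """Split by gene pairs; a gene may appear in both training and test sets."""
    rng = random.Random(seed)
    shuffled = list(pairs)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def inductive_split(pairs, test_fraction, seed=0):
    """Split by genes; training and test gene sets are disjoint."""
    rng = random.Random(seed)
    genes = sorted({gene for pair in pairs for gene in pair})
    rng.shuffle(genes)
    test_genes = set(genes[: int(len(genes) * test_fraction)])
    train = [p for p in pairs if p[0] not in test_genes and p[1] not in test_genes]
    test = [p for p in pairs if p[0] in test_genes and p[1] in test_genes]
    # Assumption: pairs with one gene on each side are discarded.
    return train, test


toy_pairs = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E"), ("E", "F"), ("A", "F")]
trans_train, trans_test = transductive_split(toy_pairs, test_fraction=0.5)
ind_train, ind_test = inductive_split(toy_pairs, test_fraction=0.5)
```

In the inductive split no gene of the test pairs occurs in any training pair, whereas the transductive split only guarantees that the pairs themselves do not overlap.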
The experimental comparison results are shown in fig. 4A and 4B: the synthetic lethal gene pair prediction method based on knowledge-graph reasoning provided by the embodiment of the invention (KR4SL for short) outperforms the existing baseline models on three metrics (NDCG@N, Precision@N and Recall@N, with N = 10, 20, 50) in both scenarios, particularly in the inductive scenario. Each value in the table is the result of five training runs; the best result in each column is shown in bold, and "-" indicates that the value is 0.
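For reference, the three ranking metrics can be computed as in the following sketch, which uses the common binary-relevance definitions. Whether the experiments used exactly these variants is not stated in the text, and the ranked list and relevant set are hypothetical example values.

```python
import math


def precision_at_n(ranked, relevant, n):
    """Fraction of the top-n predictions that are true partner genes."""
    return sum(1 for gene in ranked[:n] if gene in relevant) / n


def recall_at_n(ranked, relevant, n):
    """Fraction of the true partner genes recovered in the top-n predictions."""
    if not relevant:
        return 0.0
    return sum(1 for gene in ranked[:n] if gene in relevant) / len(relevant)


def ndcg_at_n(ranked, relevant, n):
    """Normalized discounted cumulative gain with binary relevance."""
    dcg = sum(
        1.0 / math.log2(rank + 2)
        for rank, gene in enumerate(ranked[:n])
        if gene in relevant
    )
    ideal_hits = min(n, len(relevant))
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0


ranked = ["ATM", "CDK1", "BRCA1", "CDK6"]   # predicted partners, best first
relevant = {"ATM", "BRCA1"}                  # known partners in the test set
p3 = precision_at_n(ranked, relevant, 3)
r3 = recall_at_n(ranked, relevant, 3)
n3 = ndcg_at_n(ranked, relevant, 3)
```

In this example two of the top three predictions are correct, so Precision@3 is 2/3 and Recall@3 is 1, while NDCG@3 is below 1 because the second true partner is ranked third rather than second.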
The implementation process and principle of the synthetic lethal gene pair prediction method based on knowledge-graph reasoning have been described in detail above. The synthetic lethal gene pair prediction device, equipment and medium based on knowledge-graph reasoning are further described below.
Fig. 5 shows a schematic structural diagram of a synthetic lethal gene pair prediction device based on knowledge-graph reasoning in the embodiment of the invention. The synthetic lethal gene pair prediction apparatus 500 according to an embodiment of the present invention includes: a data acquisition module 501, a model construction module 502 and a model training module 503.
The data acquisition module 501 is configured to acquire a synthetic lethal knowledge graph and known synthetic lethal gene pairs. The model construction module 502 is configured to combine the synthetic lethal knowledge graph with the synthetic lethal graph formed by the known synthetic lethal gene pairs to generate a corresponding heterogeneous graph, so as to construct, based on the heterogeneous graph and a preset starting gene, a prediction model for predicting a plurality of partner genes of the preset starting gene. The model training module 503 is configured to train and optimize the prediction model based on a multi-class loss function.
In some examples, after extracting the synthetic lethal knowledge graph and the known synthetic lethal gene pairs from the SynLethDB synthetic lethal database, the data acquisition module 501 selects, from the synthetic lethal knowledge graph, several entities and several types of edge relations associated with gene regulation mechanisms, and expands the edge relations based on a preset dataset to obtain an expanded synthetic lethal knowledge graph.
In some examples, the process by which the model construction module 502 constructs, based on the heterogeneous graph and the preset starting gene, the prediction model for predicting a plurality of partner genes of the preset starting gene specifically includes: constructing, based on the synthetic lethal knowledge graph, a relation directed graph for all gene pairs whose starting gene is the preset starting gene; calculating, based on the relation directed graph, the semantic information representation propagated from the starting gene to each node of each layer in the knowledge graph; and, based on that semantic information representation, determining the candidate partner genes among the neighbor nodes of each layer, calculating the pairing likelihood between each candidate partner gene and the starting gene, and selecting a plurality of candidate partner genes with high pairing likelihood as partner genes of the starting gene.
In some examples, constructing, based on the synthetic lethal knowledge graph, the relation directed graph for all gene pairs whose starting gene is the preset starting gene includes: defining the starting gene g_q and the heterogeneous graph G, together with the K-hop relation directed graph between the starting gene g_q and a node in the heterogeneous graph. Starting from the starting gene g_q, the neighbors of g_q are found in the heterogeneous graph G and recorded as a subgraph. Based on this subgraph, all neighbors of each neighbor node are then searched, and K rounds of such recursive searching yield the K-th-layer subgraph. This subgraph is the union of the K-hop relation directed graphs between the starting gene g_q and all nodes of the K-th layer. Among all nodes of the K-th round, all gene nodes are taken as the SL candidate partner genes of the starting gene g_q.
In some examples, calculating, based on the relation directed graph, the semantic information representation propagated from the starting gene to each node of each layer in the knowledge graph includes: constructing the relation directed graph of the current layer based on the relation directed graph of the previous layer in the knowledge graph, so as to propagate semantic information from the previous layer to the current layer; aggregating all messages propagated to the same target node through an attention mechanism; and optimizing, based on a gated recurrent unit, the sequence information of all edges from the previous layer to the current layer in the knowledge graph.
In some examples, constructing the relation directed graph of the current layer based on the relation directed graph of the previous layer in the knowledge graph, so as to propagate semantic information from the previous layer to the current layer, includes: defining the semantic information propagated from the starting gene g_q to a target node e_i; for a triple (e_i, r_io, e_o) from step (K-1) to step K, the semantic information propagated from node e_i to node e_o is given by a formula whose terms are: the embedded representation of r_io at the k-th layer; T_i and T_o, the text representations of e_i and e_o, respectively; and a learnable parameter.
In some examples, the aggregation of all messages propagated to the same target node through the attention mechanism is represented as:
where the terms are: the K-hop relation directed graph between the starting gene g_q and node e_o in the heterogeneous graph; the attention coefficient for the triple (e_i, r_io, e_o); and learnable parameters.
In some examples, optimizing, based on the gated recurrent unit, the sequence information of all edges from the previous layer to the current layer in the knowledge graph includes using one GRU (Gated Recurrent Unit) to further strengthen the sequence information of all edges from step (K-1) to step K, including:
where the terms are: learnable parameters; and the semantic information representation propagated from g_q to e_o through k steps.
In some examples, the multi-class loss function used by the model training module 503 is:
where the terms are: the set of all gene pairs involved in training; the set of all gene pairs with g_q as the starting gene; and the score of the gene pair g_q and g_p after exponential transformation.
It should be noted that: the synthetic lethal gene pair prediction device based on knowledge-graph inference provided in the above embodiment only illustrates the division of each program module when performing the synthetic lethal gene pair prediction based on knowledge-graph inference, and in practical application, the process allocation may be completed by different program modules according to needs, that is, the internal structure of the device is divided into different program modules, so as to complete all or part of the processes described above. In addition, the synthetic lethal gene pair prediction device based on knowledge-graph reasoning provided in the above embodiment and the synthetic lethal gene pair prediction method based on knowledge-graph reasoning belong to the same concept, and the specific implementation process is detailed in the method embodiment, which is not described herein.
The synthetic lethal gene pair prediction method based on knowledge-graph reasoning provided by the embodiment of the invention can be implemented on the terminal side or the server side. Regarding the hardware structure of the prediction terminal, refer to fig. 5, which shows an optional hardware structure of the synthetic lethal gene pair prediction terminal 500 based on knowledge-graph reasoning provided by the embodiment of the invention. The terminal 500 may be a mobile phone, a computer device, a tablet device, a personal digital processing device, a factory background processing device, or the like. The synthetic lethal gene pair prediction terminal 500 based on knowledge-graph reasoning includes: at least one processor 501, a memory 502, at least one network interface 504, and a user interface 506. The components of the device are coupled together by a bus system 505. It can be understood that the bus system 505 is used to enable connection and communication between these components. In addition to a data bus, the bus system 505 includes a power bus, a control bus, and a status signal bus. For clarity of illustration, however, the various buses are all labeled as the bus system in fig. 5.
The user interface 506 may include, among other things, a display, keyboard, mouse, trackball, click gun, keys, buttons, touch pad, or touch screen, etc.
It can be understood that the memory 502 may be a volatile memory or a nonvolatile memory, and may also include both volatile and nonvolatile memory. The nonvolatile memory may be, among others, a read-only memory (ROM) or a programmable read-only memory (PROM). The volatile memory may be a random access memory (RAM), which serves as an external cache. By way of example, and not limitation, many forms of RAM are available, such as static random access memory (SRAM) and synchronous static random access memory (SSRAM). The memory described by embodiments of the present invention is intended to include, without being limited to, these and any other suitable types of memory.
The memory 502 in the embodiment of the present invention is used to store various kinds of data to support the operation of the synthetic lethal gene pair prediction terminal 500 based on knowledge-graph reasoning. Examples of such data include any executable program for operating on the synthetic lethal gene pair prediction terminal 500 based on knowledge-graph reasoning, such as an operating system 5021 and an application 5022. The operating system 5021 contains various system programs, such as a framework layer, a core library layer and a driver layer, for implementing various basic services and processing hardware-based tasks. The application 5022 may include various application programs, such as a media player (MediaPlayer) and a browser (Browser), for implementing various application services. The synthetic lethal gene pair prediction method based on knowledge-graph reasoning provided by the embodiment of the invention can be contained in the application 5022.
The method disclosed in the above embodiment of the present invention may be applied to the processor 501 or implemented by the processor 501. The processor 501 may be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the above method may be performed by integrated logic circuitry in hardware or by instructions in software in the processor 501. The processor 501 may be a general purpose processor, a digital signal processor (DSP, Digital Signal Processor), or another programmable logic device, discrete gate or transistor logic device, discrete hardware component, or the like. The processor 501 may implement or perform the methods, steps and logic blocks disclosed in embodiments of the present invention. The general purpose processor may be a microprocessor or any conventional processor or the like. The steps of the synthetic lethal gene pair prediction method based on knowledge-graph reasoning provided by the embodiment of the invention can be directly embodied as being executed by a hardware decoding processor, or executed by a combination of hardware and software modules in the decoding processor. The software modules may be located in a storage medium; the storage medium is located in the memory, and the processor reads information from the memory and performs the steps of the method in combination with its hardware.
In an exemplary embodiment, the synthetic lethal gene pair prediction terminal 500 based on knowledge-graph reasoning may be implemented by one or more application specific integrated circuits (ASIC, Application Specific Integrated Circuit), DSPs, programmable logic devices (PLD, Programmable Logic Device) or complex programmable logic devices (CPLD, Complex Programmable Logic Device) to perform the aforementioned methods.
Those of ordinary skill in the art will appreciate that: all or part of the steps for implementing the method embodiments described above may be performed by computer program related hardware. The aforementioned computer program may be stored in a computer readable storage medium. The program, when executed, performs steps including the method embodiments described above; and the aforementioned storage medium includes: various media that can store program code, such as ROM, RAM, magnetic or optical disks.
In the embodiments provided herein, the computer-readable storage medium may include read-only memory, random-access memory, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, flash memory, U-disk, removable hard disk, or any other medium that can be used to store desired program code in the form of instructions or data structures and that can be accessed by a computer. In addition, any connection is properly termed a computer-readable medium. For example, if the instructions are transmitted from a website, server, or other remote source using a coaxial cable, fiber optic cable, twisted pair, digital Subscriber Line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium. It should be understood, however, that computer-readable and data storage media do not include connections, carrier waves, signals, or other transitory media, but are intended to be directed to non-transitory, tangible storage media. Disk and disc, as used herein, includes Compact Disc (CD), laser disc, optical disc, digital Versatile Disc (DVD), floppy disk and blu-ray disc where disks usually reproduce data magnetically, while discs reproduce data optically with lasers.
In summary, the present application provides a method, device, terminal and medium for predicting synthetic lethal gene pairs based on knowledge-graph reasoning, which make full use of the structure of the KG to predict SL relationships and explain the prediction process without sampling neighbors. The SL prediction problem is defined as a partner gene recommendation problem: given one starting gene of an SL gene pair, the model scores all possible genes, and the several top-scoring genes are selected as the predicted partner genes. Experiments show that the overall performance of KR4SL on NDCG, Precision and Recall is superior to all baseline models in three data partitioning scenarios. Therefore, the present application effectively overcomes various defects of the prior art and has high industrial utilization value.
The foregoing embodiments are merely illustrative of the principles of the present application and their effectiveness, and are not intended to limit the application. Modifications and variations may be made to the above-described embodiments by those of ordinary skill in the art without departing from the spirit and scope of the present application. Accordingly, it is intended that all equivalent modifications and variations which may be accomplished by persons skilled in the art without departing from the spirit and technical spirit of the disclosure be covered by the claims of this application.

Claims (12)

1. A synthetic lethal gene pair prediction method based on knowledge-graph reasoning is characterized by comprising the following steps:
acquiring a synthetic lethal knowledge graph and a known synthetic lethal gene pair;
combining the synthetic lethal knowledge graph with a synthetic lethal graph formed by the known synthetic lethal gene pairs to generate a corresponding heterogeneous graph, so as to construct, based on the heterogeneous graph and a preset starting gene, a prediction model for predicting a plurality of partner genes of the preset starting gene;
the predictive model is trained and optimized based on a multi-class loss function.
2. The synthetic lethal gene pair prediction method based on knowledge-graph reasoning according to claim 1, wherein the method further comprises: after a synthetic lethal knowledge graph and known synthetic lethal gene pairs are extracted from the SynLethDB synthetic lethal database, selecting, from the synthetic lethal knowledge graph, several entities and several types of edge relations associated with gene regulation mechanisms; and expanding the edge relations based on a preset dataset to obtain an expanded synthetic lethal knowledge graph.
3. The synthetic lethal gene pair prediction method based on knowledge-graph reasoning according to claim 1, wherein constructing, based on the heterogeneous graph and the preset starting gene, the prediction model for predicting a plurality of partner genes of the preset starting gene comprises:
constructing, based on the synthetic lethal knowledge graph, a relation directed graph for all gene pairs whose starting gene is the preset starting gene;
calculating, based on the relation directed graph, the semantic information representation propagated from the starting gene to each node of each layer in the knowledge graph;
based on the semantic information representation propagated from the starting gene to each node of each layer in the knowledge graph, determining the candidate partner genes among the neighbor nodes of each layer, calculating the pairing likelihood between each candidate partner gene and the starting gene, and selecting a plurality of candidate partner genes with high pairing likelihood as partner genes of the starting gene.
4. The synthetic lethal gene pair prediction method based on knowledge-graph reasoning according to claim 3, wherein constructing, based on the synthetic lethal knowledge graph, the relation directed graph for all gene pairs whose starting gene is the preset starting gene comprises:
defining the starting gene g_q and the heterogeneous graph G, together with the K-hop relation directed graph between the starting gene g_q and a node in the heterogeneous graph; starting from the starting gene g_q, finding the neighbors of g_q in the heterogeneous graph G and recording them as a subgraph; based on this subgraph, searching all neighbors of each neighbor node, so that K rounds of recursive searching yield the K-th-layer subgraph, which is the union of the K-hop relation directed graphs between the starting gene g_q and all nodes of the K-th layer; among all nodes of the K-th round, all gene nodes are taken as the SL candidate partner genes of the starting gene g_q.
5. The synthetic lethal gene pair prediction method based on knowledge-graph reasoning according to claim 3, wherein calculating, based on the relation directed graph, the semantic information representation propagated from the starting gene to each node of each layer in the knowledge graph comprises:
constructing the relation directed graph of the current layer based on the relation directed graph of the previous layer in the knowledge graph, so as to propagate semantic information from the previous layer to the current layer;
aggregating all messages propagated to the same target node through an attention mechanism;
optimizing, based on a gated recurrent unit, the sequence information of all edges from the previous layer to the current layer in the knowledge graph.
6. The synthetic lethal gene pair prediction method based on knowledge-graph reasoning according to claim 5, wherein constructing the relation directed graph of the current layer based on the relation directed graph of the previous layer in the knowledge graph, so as to propagate semantic information from the previous layer to the current layer, comprises: defining the semantic information propagated from the starting gene g_q to a target node e_i; for a triple (e_i, r_io, e_o) from step (K-1) to step K, the semantic information propagated from node e_i to node e_o is given by a formula whose terms are: the embedded representation of r_io at the k-th layer; T_i and T_o, the text representations of e_i and e_o, respectively; a learnable parameter; and the semantic information propagated from the starting gene g_q to node e_i at layer (K-1).
7. The synthetic lethal gene pair prediction method based on knowledge-graph reasoning according to claim 5, wherein aggregating all messages propagated to the same target node through the attention mechanism is expressed as:
where the terms are: the K-hop relation directed graph between the starting gene g_q and node e_o in the heterogeneous graph; the attention coefficient for the triple (e_i, r_io, e_o); and learnable parameters.
8. The synthetic lethal gene pair prediction method based on knowledge-graph reasoning according to claim 5, wherein optimizing, based on the gated recurrent unit, the sequence information of all edges from the previous layer to the current layer in the knowledge graph includes using one GRU (Gated Recurrent Unit) to further strengthen the sequence information of all edges from step (K-1) to step K, comprising:
where the terms are: learnable parameters; the semantic information representation propagated from g_q to e_o through k steps; the semantic representation propagated from g_q to e_o before passing through the GRU; further learnable parameters; r_k and f_k, the reset gate and the update gate, respectively; and n_k, the updated value after the GRU.
9. The knowledge-graph-inference-based synthetic lethal gene pair prediction method according to claim 1, wherein said multiclass loss function is:
where the terms are: the set of all gene pairs involved in training; the set of all gene pairs with g_q as the starting gene; and the score of the gene pair g_q and g_p after exponential transformation.
10. A synthetic lethal gene pair prediction device based on knowledge-graph reasoning, characterized by comprising:
the data acquisition module, which is used for acquiring a synthetic lethal knowledge graph and known synthetic lethal gene pairs;
the model construction module, which is used for combining the synthetic lethal knowledge graph with the synthetic lethal graph formed by the known synthetic lethal gene pairs to generate a corresponding heterogeneous graph, so as to construct, based on the heterogeneous graph and a preset starting gene, a prediction model for predicting a plurality of partner genes of the preset starting gene;
The model training module is used for training and optimizing the prediction model based on a multi-classification loss function.
11. A computer readable storage medium having stored thereon a computer program, wherein the computer program when executed by a processor implements the synthetic lethal gene pair prediction method based on knowledge-graph inference of any one of claims 1 to 9.
12. A computer device, comprising: a processor and a memory;
the memory is used for storing a computer program;
the processor is configured to execute the computer program stored in the memory, so that the computer device performs the synthetic lethal gene pair prediction method based on knowledge-graph inference according to any one of claims 1 to 9.
CN202310486650.0A 2023-04-28 2023-04-28 Synthetic lethal gene pair prediction method, device, equipment and medium based on knowledge-graph reasoning Active CN116564408B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310486650.0A CN116564408B (en) 2023-04-28 2023-04-28 Synthetic lethal gene pair prediction method, device, equipment and medium based on knowledge-graph reasoning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310486650.0A CN116564408B (en) 2023-04-28 2023-04-28 Synthetic lethal gene pair prediction method, device, equipment and medium based on knowledge-graph reasoning

Publications (2)

Publication Number Publication Date
CN116564408A (en) 2023-08-08
CN116564408B (en) 2024-03-01

Family

ID=87487277

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310486650.0A Active CN116564408B (en) 2023-04-28 2023-04-28 Synthetic lethal gene pair prediction method, device, equipment and medium based on knowledge-graph reasoning

Country Status (1)

Country Link
CN (1) CN116564408B (en)

Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112288091A (en) * 2020-10-30 2021-01-29 西南电子技术研究所(中国电子科技集团公司第十研究所) Knowledge inference method based on multi-mode knowledge graph
US20210174906A1 (en) * 2019-12-06 2021-06-10 Accenture Global Solutions Limited Systems And Methods For Prioritizing The Selection Of Targeted Genes Associated With Diseases For Drug Discovery Based On Human Data
CN113010691A (en) * 2021-03-30 2021-06-22 电子科技大学 Knowledge graph inference relation prediction method based on graph neural network
CN113626612A (en) * 2021-08-13 2021-11-09 第四范式(北京)技术有限公司 Prediction method and system based on knowledge graph reasoning
EP3913543A2 (en) * 2020-12-21 2021-11-24 Beijing Baidu Netcom Science And Technology Co., Ltd. Method and apparatus for training multivariate relationship generation model, electronic device and medium
CN113987203A (en) * 2021-10-27 2022-01-28 湖南大学 Knowledge graph reasoning method and system based on affine transformation and bias modeling
CN114595344A (en) * 2022-05-09 2022-06-07 北京市农林科学院信息技术研究中心 Crop variety management-oriented knowledge graph construction method and device
US20220207343A1 (en) * 2020-12-22 2022-06-30 International Business Machines Corporation Entity disambiguation using graph neural networks
CN114969369A (en) * 2022-05-30 2022-08-30 大连民族大学 Knowledge graph human cancer lethal prediction method based on mixed network and knowledge graph construction method
CN115240777A (en) * 2022-08-10 2022-10-25 上海科技大学 Synthetic lethal gene prediction method, device, terminal and medium based on graph neural network
CN115240778A (en) * 2022-08-10 2022-10-25 上海科技大学 Synthetic lethal gene partner recommendation method, device, terminal and medium based on comparative learning
WO2022222037A1 (en) * 2021-04-20 2022-10-27 中国科学院深圳先进技术研究院 Interpretable recommendation method based on graph neural network inference
WO2023065545A1 (en) * 2021-10-19 2023-04-27 平安科技(深圳)有限公司 Risk prediction method and apparatus, and device and storage medium

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
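MINCAI LAI et al.: "Predicting Synthetic Lethality in Human Cancers via Multi-Graph Ensemble Neural Network", 2021 43RD ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE & BIOLOGY SOCIETY (EMBC) *
杨瑞达; 林欣; 杨燕; 贺樑; 窦亮: "Research on knowledge graph reasoning technology based on hybrid-augmented intelligence", Computer Applications and Software, no. 06 *
陈德华; 殷苏娜; 乐嘉锦; 王梅; 潘乔; 朱立峰: "A link prediction model for temporal knowledge graphs in the clinical domain", Journal of Computer Research and Development, no. 12 *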

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117079712A (en) * 2023-08-30 2023-11-17 中国农业科学院农业信息研究所 Method, device, equipment and medium for excavating biosynthesis gene cluster
CN117116355A (en) * 2023-08-30 2023-11-24 中国农业科学院农业信息研究所 Method, device, equipment and medium for excavating excellent multiple-effect genes
CN117116355B (en) * 2023-08-30 2024-02-20 中国农业科学院农业信息研究所 Method, device, equipment and medium for excavating excellent multiple-effect genes
CN117079712B (en) * 2023-08-30 2024-02-20 中国农业科学院农业信息研究所 Method, device, equipment and medium for excavating pathway gene cluster

Also Published As

Publication number Publication date
CN116564408B (en) 2024-03-01

Similar Documents

Publication Publication Date Title
CN116564408B (en) Synthetic lethal gene pair prediction method, device, equipment and medium based on knowledge-graph reasoning
Zhao et al. IRWNRLPI: integrating random walk and neighborhood regularized logistic matrix factorization for lncRNA-protein interaction prediction
Zhu et al. Recursively imputed survival trees
CN107391512B (en) Method and device for predicting knowledge graph
US20160321357A1 (en) Discovery informatics system, method and computer program
CN106021990B (en) A method of biological gene is subjected to classification and Urine scent with specific character
US11514498B2 (en) System and method for intelligent guided shopping
Gong et al. Novel heuristic density-based method for community detection in networks
Lagani et al. Structure-based variable selection for survival data
Choi et al. Identifying disease-gene associations using a convolutional neural network-based model by embedding a biological knowledge graph with entity descriptions
US20230197205A1 (en) Bioretrosynthetic method and system based on and-or tree and single-step reaction template prediction
Cannataro et al. Data management of protein interaction networks
Yu et al. DDOT: a Swiss army knife for investigating data-driven biological ontologies
Zhou et al. Summarisation of weighted networks
Price et al. Survey: Enhancing protein complex prediction in PPI networks with GO similarity weighting
CN110837567A (en) Method and system for embedding knowledge graph
CN110610763A (en) KaTZ model-based metabolite and disease association relation prediction method
CN116324810A (en) Potential policy distribution for assumptions in a network
Sun et al. A graph neural network-based interpretable framework reveals a novel DNA fragility–associated chromatin structural unit
Di Mauro et al. Bandit-based Monte-Carlo structure learning of probabilistic logic programs
Razi et al. Identifying gene subnetworks associated with clinical outcome in ovarian cancer using network based coalition game
Ji et al. HAM-FMD: mining functional modules in protein–protein interaction networks using ant colony optimization and multi-agent evolution
CN115080587A (en) Electronic component replacing method, device and medium based on knowledge graph
Chen et al. A community finding method for weighted dynamic online social network based on user behavior
Yoo et al. The Five‐Gene‐Network Data Analysis with Local Causal Discovery Algorithm Using Causal Bayesian Networks

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant