CN115563312A - Medicine-disease-target triple target entity completion method and application - Google Patents
- Publication number
- CN115563312A (application CN202211302748.8A)
- Authority
- CN
- China
- Prior art keywords
- target
- drug
- disease
- hyperbolic
- triple
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/36—Creation of semantic tools, e.g. ontology or thesauri
- G06F16/367—Ontology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B15/00—ICT specially adapted for analysing two-dimensional or three-dimensional molecular structures, e.g. structural or functional relations or structure alignment
- G16B15/30—Drug targeting using structural data; Docking or binding prediction
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02A—TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
- Y02A90/00—Technologies having an indirect contribution to adaptation to climate change
- Y02A90/10—Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation
Abstract
The invention provides a drug-disease-target triple target entity completion method and its application. The method rests on the hypothesis that using hyperbolic space can improve the accuracy of translation-based drug-disease-target triple target entity completion. It introduces the Lorentz space and defines a full Lorentz linear transformation layer in it. On that space it builds a hyperbolic triple decoder that captures the latent hierarchical structure of the heterogeneous drug-disease-target network. It also provides hyperbolic drug and target encoders that bring in domain knowledge to further improve the prediction of candidate targets. Experimental results show that, with confounding factors such as domain knowledge and experimental settings controlled for, the hyperbolic translation-based decoder outperforms the corresponding Euclidean translation-based decoder and predicts better. The experiments also show that domain knowledge further improves the performance of the hyperbolic triple decoder on this task.
Description
Technical Field
The invention relates to the fields of biomedicine and artificial intelligence, and in particular to a drug-disease-target triple target entity completion method and its application.
Background
In recent years, more and more biomedical databases have been developed. They generally contain heterogeneous biological networks that encode complex relationships among many drug, target and disease entities, which makes it possible to explore unknown relationships between these entities at scale with network-embedding-based machine learning. However, most existing methods capture only the relationship between two of the three entity types (drug, target and disease). More importantly, these methods ignore the multi-relational hierarchical topology implied in the heterogeneous network formed by these entities, even though capturing this information may help in exploring the relationships among them.
Prior art typically analyzes relationships between entities in heterogeneous drug-disease-target networks in Euclidean space. For example, Luo et al. collected DTINet, a data set containing drug-disease-target associations, and used a network diffusion algorithm together with an inductive matrix completion strategy to infer unknown interactions between drugs and targets (Luo Y, Zhao X, Zhou J, et al., 2017). Chen et al. used a network-embedding-based candidate ranking algorithm that maps genes, diseases and their associated ontology data into the same vector space in order to screen high-confidence disease-gene pairs (Chen J, Althagafi A, Hoehndorf R., 2021). Moon et al. extracted drug-disease-target triples from a heterogeneous network built on DTINet and proposed DDTE, a Euclidean translation-based knowledge graph triple completion method, to relate candidate targets to a given drug and disease (Moon C, Jin C, Dong X, et al., 2021). However, the heterogeneous networks behind these methods may contain implicit hierarchies embedded in the complex interactions between biological and chemical entities, and these important relationships are difficult to capture with graph learning (network embedding) methods based on Euclidean space.
The prior art therefore still needs to be improved.
Disclosure of Invention
In view of the above defects of the prior art, the invention aims to provide a drug-disease-target triple target entity completion method and its application. It addresses the problem that existing methods, which infer targets from a given drug and disease using a drug-disease-target heterogeneous network in Euclidean space, cannot effectively exploit the implicit hierarchical structure of that network.
The technical scheme of the invention is as follows:
A drug-disease-target triple target entity completion method, comprising the following steps:
constructing a heterogeneous network of drug, disease and target entities, and collecting feature data for the drug and target entities in the network;
splitting the constructed heterogeneous network into a set of drug-disease-target triples according to a triple generation logic;
encoding the collected feature data of the drug and target entities with hyperbolic drug and target encoders, respectively, to generate a hyperbolic drug embedding lookup table and a hyperbolic target embedding lookup table;
feeding the drug-disease-target triples, the hyperbolic drug embedding lookup table, the hyperbolic target embedding lookup table and the disease indices into a hyperbolic triple decoder, and computing, for each triple, the Lorentz distance similarity between the embedding of the target entity and the representation of the drug-disease combination, in order to train the model; and
feeding the hyperbolic embeddings of the drug-disease combination to be queried and of all candidate targets into the hyperbolic triple decoder to obtain a probability ranking of the candidate targets for that drug and disease.
The drug-disease-target triple target entity completion method as described above, wherein constructing the heterogeneous network of drug, disease and target entities specifically comprises: constructing a drug-disease-target heterogeneous network from known data, with the drug, disease and target entities in the data as nodes and the known relationships between entities as edges between the nodes.
The method as described above, wherein Morgan extended-connectivity fingerprint feature data with a radius of 3 are collected for the drug entities in the heterogeneous network, and amino acid sequence similarity feature data and gene ontology-protein-protein interaction network feature data are collected for the target entities.
The method as described above, wherein the triple generation logic is: the known relationships among drug, disease and target entities are taken as the edges of the drug-disease-target network, and a drug, a disease and a target that are connected to one another by edges form a drug-disease-target triple.
The method as described above, wherein the drug-disease-target heterogeneous network is split, according to the triple generation logic, into a set of triples that retains the latent hierarchical information of the original network.
The method as described above, wherein the method is based on hyperbolic space, specifically the Lorentz space, and the hyperbolic drug and target encoders and the hyperbolic triple decoder all operate in a Lorentz space of constant negative curvature.
The method as described above, wherein a full Lorentz linear transformation layer is defined on the Lorentz space as the counterpart of a linear layer in Euclidean space; this layer changes the dimension or position of features/coordinates in hyperbolic space while guaranteeing that both its input and its output remain in hyperbolic space.
The method as described above, wherein the full Lorentz linear transformation layer is given by the following formula:
where λ is a fixed hyperparameter that controls the numerical scale during computation; σ is the Sigmoid activation function; the two weight terms are the weights of the layer's transformation matrix; b is a trainable bias; ε is a fixed constant that keeps the corresponding term greater than 0; c is the negative curvature of the Lorentz space; and x and y are the input and output of the full Lorentz linear transformation layer, respectively.
The method as described above, wherein, after the model is trained, for a drug-disease combination to be queried, the hyperbolic embeddings of the candidate targets to be matched, the hyperbolic embedding of the drug in the combination and the corresponding disease index are obtained and fed into the hyperbolic triple decoder to obtain a similarity probability score for each candidate target.
An application of the drug-disease-target triple target entity completion method, wherein the method is applied to predicting the targets corresponding to given drugs and diseases.
Beneficial effects: the invention introduces the Lorentz space and defines a full Lorentz linear transformation layer in it. On this basis it builds a hyperbolic triple decoder that captures the latent hierarchical structure of the heterogeneous drug-disease-target network, and it adds hyperbolic drug and target encoders that bring in domain knowledge to further improve candidate target prediction. With domain knowledge and experimental settings controlled for, the hyperbolic translation-based decoder outperforms its Euclidean translation-based counterpart, and adding domain knowledge further improves the hyperbolic triple decoder on this task.
Drawings
Fig. 1 is a schematic flow chart of the drug-disease-target triple target entity completion method according to an embodiment of the present invention.
Fig. 2 is a schematic processing diagram of the hyperbolic-space-based drug-disease-target triple target entity completion method according to an embodiment of the present invention.
Detailed Description
The invention provides a drug-disease-target triple target entity completion method and its application. To make the purpose, technical solution and effects of the invention clearer, the invention is described in further detail below. It should be understood that the specific embodiments described here only illustrate the invention and are not intended to limit it.
An embodiment of the invention provides a drug-disease-target triple target entity completion method which, as shown in Fig. 1, comprises the following steps:
S1, constructing a heterogeneous network of drug, disease and target entities, and collecting feature data for the drug and target entities in the network;
S2, splitting the constructed heterogeneous network into a set of drug-disease-target triples according to the triple generation logic;
S3, encoding the collected feature data of the drug and target entities with the hyperbolic drug and target encoders, respectively, to generate a hyperbolic drug embedding lookup table and a hyperbolic target embedding lookup table;
S4, feeding the drug-disease-target triples, the hyperbolic drug and target embedding lookup tables and the disease indices into the hyperbolic triple decoder, and computing, for each triple, the Lorentz distance similarity between the target entity embedding and the drug-disease combination representation, in order to train the model;
S5, feeding the hyperbolic embeddings of the drug-disease combination to be queried and of all candidate targets into the hyperbolic triple decoder to obtain a probability ranking of the candidate targets for that drug and disease.
Existing methods infer targets from a given drug and disease using a drug-disease-target heterogeneous network in Euclidean space, and they cannot effectively exploit the implicit hierarchical structure of that network. To address this, the invention hypothesizes that introducing hyperbolic space, a manifold mathematically shown to have a greater capacity for capturing hierarchical structure, allows targets to be inferred with higher accuracy. To test this hypothesis, the invention first formulates candidate target prediction as a hyperbolic translation-based drug-disease-target triple target entity completion task. A full Lorentz linear transformation, one form of feature transformation in hyperbolic space, is introduced into a hyperbolic triple decoder. With this transformation, the decoder can better learn the drug-disease-target triples split from the heterogeneous network and thereby perceive the implicit hierarchical structure of the network. It measures the similarity between candidate target embeddings and the given drug and disease representation by the Lorentz distance. In addition, optional hyperbolic drug and target encoders inject drug and target domain knowledge into the learning of the hyperbolic triple decoder, further improving its performance. With this scheme, the hierarchical structure that the heterogeneous drug-disease-target network may contain is captured, and additional drug and target domain knowledge is injected in hyperbolic space. The result is better candidate target prediction than the corresponding Euclidean translation-based model achieves.
In some embodiments, in step S1, the heterogeneous network of drug, disease and target entities is constructed as follows: a heterogeneous drug-disease-target network is built from existing data, with the drug, disease and target entities in the data as nodes and the known relationships between entities as edges between nodes.
In some embodiments, in step S1, Morgan extended-connectivity fingerprint features with a radius of 3 (ECFP6) are collected for the drug entities in the heterogeneous network. Amino acid sequence similarity features and gene ontology-protein-protein interaction (GO-PPI) network features are collected for the target entities.
In some embodiments, in step S2, the triple generation logic is: the known relationships among drug, disease and target entities are taken as the edges of the drug-disease-target network, and a drug, a disease and a target connected to one another by edges form a drug-disease-target triple.
In some embodiments, the drug-disease-target heterogeneous network is split, according to this triple generation logic, into a set of triples that retains the latent hierarchical information of the original network.
Specifically, the drug-disease-target data set may contain inferred target-disease and drug-disease associations, i.e., associations further inferred from associations reported in the literature. A triple is therefore formed if and only if associations (edges) exist among the drug, the disease and the target, and triples obtained in this way have higher confidence.
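The triple generation logic can be illustrated with a minimal Python sketch. It is not the patent's code. It assumes the network is given as three lists of known pairwise edges (drug-disease, drug-target, disease-target), and it reads "edges exist among a drug, a disease and a target" as requiring all three pairwise edges. The function name `build_triples` is hypothetical.

```python
def build_triples(drug_disease, drug_target, disease_target):
    """Return sorted (drug, disease, target) triples whose three pairwise edges all exist.

    Each argument is an iterable of known edges, e.g. ("D1", "S1") for a drug-disease edge.
    """
    disease_target_edges = set(disease_target)
    # Index drug-target edges by drug so each drug-disease edge only checks its own targets.
    targets_of = {}
    for drug, target in drug_target:
        targets_of.setdefault(drug, set()).add(target)
    triples = set()
    for drug, disease in set(drug_disease):
        for target in targets_of.get(drug, ()):
            # Keep the triple only if the disease-target edge closes the triangle.
            if (disease, target) in disease_target_edges:
                triples.add((drug, disease, target))
    return sorted(triples)


# Toy network: D1 and D2 are both linked to S1 and to T1, and T1 is linked to S1.
triples = build_triples(
    drug_disease=[("D1", "S1"), ("D2", "S1")],
    drug_target=[("D1", "T1"), ("D1", "T2"), ("D2", "T1")],
    disease_target=[("S1", "T1")],
)
print(triples)  # [('D1', 'S1', 'T1'), ('D2', 'S1', 'T1')]
```

The pair (D1, T2) produces no triple because no S1-T2 edge exists. This requirement for a closed triangle is what makes the resulting triples higher-confidence than any single edge.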
In some embodiments, in step S3, the collected Euclidean feature data of the drugs and targets in the heterogeneous network are fed into the hyperbolic drug encoder and the hyperbolic target encoder, respectively. This yields hyperbolic drug and target embedding lookup tables that contain domain knowledge.
In some embodiments, the method is based on hyperbolic space, specifically the Lorentz space, and the hyperbolic drug and target encoders and the hyperbolic triple decoder all operate in a Lorentz space of constant negative curvature.
In some embodiments, a full Lorentz linear transformation layer is defined on the Lorentz space as the counterpart of a linear layer in Euclidean space. It changes the dimension or position of features/coordinates in hyperbolic space while guaranteeing that both its input and its output remain in hyperbolic space.
Specifically, the Lorentz space in which the encoders and the decoder operate is one of the isomorphic models of hyperbolic space. A full Lorentz linear transformation (layer) can be defined in it, corresponding to a linear layer in Euclidean space. Through this layer, the dimension or position of features/coordinates in hyperbolic space can be changed, while the input and output are guaranteed to remain in hyperbolic space.
Specifically, the full Lorentz linear transformation layer is given by:
where λ is a fixed hyperparameter that controls the numerical scale during computation; σ is the Sigmoid activation function; the two weight terms are the weights of the layer's transformation matrix; b is a trainable bias; ε is a fixed constant that keeps the corresponding term greater than 0; c is the negative curvature of the Lorentz space; and x and y are the input and output of the full Lorentz linear transformation layer, respectively.
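The sketch below shows a layer built from these ingredients: λ, a Sigmoid, weights, a bias b, a small constant ε and the curvature c. It is loosely modelled on the Lorentz linear layer of Chen et al., "Fully Hyperbolic Neural Networks" (ACL 2022), and it is an illustration under stated assumptions, not the patent's exact formula. The assumptions are as follows. Points lie on the hyperboloid ⟨y, y⟩_L = −1/c with c > 0. The time coordinate is λ·σ(v·x) + 1/√c + ε. The spatial part W x + b is rescaled so that the output stays on the hyperboloid. The names `lorentz_linear`, `W`, `v` and `b` are hypothetical.

```python
import math


def lorentz_inner(x, y):
    """Lorentzian inner product: -x_t * y_t + <x_s, y_s>."""
    return -x[0] * y[0] + sum(a * b for a, b in zip(x[1:], y[1:]))


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def lorentz_linear(x, W, v, b, c=1.0, lam=2.0, eps=1e-3):
    """Hypothetical Lorentz linear layer: maps x to a point y with <y, y>_L = -1/c.

    x: input point (time coordinate first), length n + 1
    W: list of m rows, each of length n + 1 (spatial part of the output)
    v: vector of length n + 1 (drives the time coordinate)
    b: bias of length m for the spatial part
    """
    # Time coordinate: lam * sigmoid(.) bounds its scale; 1/sqrt(c) + eps keeps it
    # strictly above 1/sqrt(c), so time**2 - 1/c below is positive.
    time = lam * sigmoid(sum(vi * xi for vi, xi in zip(v, x))) + 1.0 / math.sqrt(c) + eps
    # An ordinary Euclidean affine map gives the spatial direction and the output dim m.
    z = [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]
    z_norm = math.sqrt(sum(zi * zi for zi in z))
    if z_norm == 0.0:
        raise ValueError("spatial direction is zero; cannot rescale")
    # Rescale the direction so that -time**2 + ||space||**2 == -1/c exactly.
    space_norm = math.sqrt(time * time - 1.0 / c)
    return [time] + [zi * space_norm / z_norm for zi in z]


# Map the origin of H^2 (c = 1) to a point of H^4, i.e. change the feature dimension.
x = [1.0, 0.0, 0.0]
W = [[0.5, 0.2, -0.1], [0.3, -0.4, 0.8], [0.0, 0.1, 0.2], [0.7, 0.0, 0.0]]
v = [0.1, 0.3, -0.2]
b = [0.1, -0.2, 0.0, 0.0]
y = lorentz_linear(x, W, v, b, c=1.0)
print(len(y), round(lorentz_inner(y, y), 9))  # 5 -1.0
```

Fixing the time coordinate first and then solving for the spatial norm is what guarantees that the output remains on the hyperboloid, whatever W and b are.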
The hyperbolic encoders that process ECFP6 and target sequence similarity are built on this linear layer. However, the layer takes hyperbolic features/coordinates as input, whereas ECFP6 and target sequence similarity are Euclidean features. A mapping is therefore needed to carry both types of feature data into hyperbolic space. The mapping formula is as follows:
Taking ECFP6 as an example, the hyperbolic ECFP6 feature is expressed in terms of the origin of the Lorentz space, the feature concatenation operation (·,·) and the Lorentzian inner product ⟨·,·⟩_L. The Lorentzian inner product is defined as follows, where the subscript t denotes the first dimension of the input feature and the subscript s denotes all remaining dimensions:
$$\langle x, y \rangle_{\mathcal{L}} = -x_t\, y_t + x_s^{\top} y_s$$
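A Euclidean feature such as an ECFP6 bit vector can be mapped onto the Lorentz space with the standard exponential map at the origin, as in the minimal sketch below. Using this particular map is an assumption, although it uses the same ingredients as the text: the origin, a concatenation and the Lorentzian inner product. The sketch concatenates a leading 0 to the feature to obtain a tangent vector u at the origin o = (1/√c, 0, …, 0). It then measures u with the Lorentzian inner product and applies exp_o(u) = cosh(√c‖u‖_L)·o + sinh(√c‖u‖_L)·u/(√c‖u‖_L) for curvature −c. `expmap_origin` is a hypothetical name.

```python
import math


def lorentz_inner(x, y):
    """Lorentzian inner product: -x_t * y_t + <x_s, y_s>."""
    return -x[0] * y[0] + sum(a * b for a, b in zip(x[1:], y[1:]))


def expmap_origin(feature, c=1.0):
    """Map a Euclidean feature vector onto the hyperboloid <y, y>_L = -1/c."""
    sqrt_c = math.sqrt(c)
    origin = [1.0 / sqrt_c] + [0.0] * len(feature)
    # Concatenate a 0 time coordinate: this is a tangent vector at the origin.
    u = [0.0] + [float(f) for f in feature]
    # For such a vector the Lorentzian norm equals the Euclidean norm of the feature.
    norm_u = math.sqrt(max(lorentz_inner(u, u), 0.0))
    if norm_u == 0.0:
        return origin  # the zero feature maps to the origin
    r = sqrt_c * norm_u
    return [math.cosh(r) * o + (math.sinh(r) / r) * ui for o, ui in zip(origin, u)]


ecfp_bits = [1, 0, 1, 1, 0, 0, 0, 1]  # toy 8-bit fingerprint, not a real ECFP6 vector
h = expmap_origin(ecfp_bits)
print(len(h), round(lorentz_inner(h, h), 9))  # 9 -1.0
```

`math.cosh` overflows once √c‖u‖_L exceeds roughly 710. Very dense or large-valued features may therefore need rescaling before they are mapped.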
Hyperbolic target sequence similarity features are obtained in the same way. The resulting hyperbolic drug and target features are stored in order in lookup tables, forming the corresponding hyperbolic embedding lookup tables. The embodiment also uses a further type of target feature data: the GO-PPI network. To encode it into a GO-PPI-based hyperbolic target embedding lookup table, the invention uses a full Lorentz graph convolutional layer, which likewise keeps the features on the Lorentz space throughout the computation. It is defined as follows:
Specifically, this layer is applied to the GO and PPI networks separately, taking the initial hyperbolic features of the nodes in each network as input. This yields a hyperbolic embedding of each target under the GO network and another under the PPI network. The full Lorentz graph convolutional layer is then applied again to fuse each target's GO and PPI embeddings, producing a hyperbolic target embedding lookup table for the GO-PPI network. All of these hyperbolic embedding lookup tables supply domain knowledge for the corresponding drug and target entities in the hyperbolic triple decoder.
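The minimal sketch below covers only the neighbourhood aggregation inside a Lorentz graph convolution. It assumes that aggregation uses the weighted Lorentzian centroid, i.e. the weighted sum of neighbour points rescaled back onto the hyperboloid, with equal weights over a node and its neighbours. A full layer would normally also apply a Lorentz linear transformation before aggregating, and that step is omitted here. `lorentz_centroid` and `lorentz_graph_aggregate` are hypothetical names.

```python
import math


def lorentz_inner(x, y):
    """Lorentzian inner product: -x_t * y_t + <x_s, y_s>."""
    return -x[0] * y[0] + sum(a * b for a, b in zip(x[1:], y[1:]))


def lorentz_centroid(points, weights, c=1.0):
    """Weighted Lorentzian centroid: the weighted sum rescaled onto <y, y>_L = -1/c."""
    dim = len(points[0])
    m = [sum(w * p[k] for w, p in zip(weights, points)) for k in range(dim)]
    # With positive weights, m is timelike (<m, m>_L < 0), so the rescaling is well defined.
    scale = math.sqrt(c) * math.sqrt(abs(lorentz_inner(m, m)))
    return [mk / scale for mk in m]


def lorentz_graph_aggregate(features, adjacency, c=1.0):
    """One aggregation step: each node becomes the centroid of itself and its neighbours."""
    out = {}
    for node in features:
        group = [node] + list(adjacency.get(node, []))
        weights = [1.0 / len(group)] * len(group)  # equal (mean-style) weights
        out[node] = lorentz_centroid([features[n] for n in group], weights, c)
    return out


# Toy target graph on the c = 1 hyperboloid; T4 has no neighbours.
features = {
    "T1": [math.sqrt(1.25), 0.5, 0.0],
    "T2": [math.sqrt(1.10), 0.1, -0.3],
    "T3": [1.0, 0.0, 0.0],
    "T4": [math.sqrt(1.04), 0.2, 0.0],
}
adjacency = {"T1": ["T2"], "T2": ["T1", "T3"], "T3": ["T2"]}
embeddings = lorentz_graph_aggregate(features, adjacency)
```

The weighted sum is rescaled rather than projected, and this is what keeps every aggregated embedding on the Lorentz space. The same routine could be reused to fuse a target's GO and PPI embeddings by treating them as a two-point group.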
In some embodiments, in step S4, the drug-disease-target triples and the generated hyperbolic drug and target embedding lookup tables are fed into the hyperbolic triple decoder. For each triple, the Lorentz distance similarity between the target entity embedding and the drug-disease combination representation is computed to train the model.
Specifically, the hyperbolic triple decoder is also built on the full Lorentz linear transformation layer defined above. It contains disease-specific full Lorentz linear layers $f_{\mathrm{linear}_{n,n}}(x)$, one for each disease in the triples, which act as trainable hyperbolic translation biases for the individual diseases. Each such layer transforms the hyperbolic drug embedding, and the transformed drug embedding serves as the representation of the current drug-disease combination. The decoder computes the Lorentz distance similarity between this representation and the hyperbolic embedding of each candidate target in order to rank the candidates. For a given drug-disease combination, the similarity score p of a candidate target is computed as follows:
where $D_i$, $D'_k$ and $T_j$ denote the drug, disease and target entities, respectively. The other terms are the representation of the drug-disease combination, a drug-specific bias, a target-specific bias and the margin hyperparameter.
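The scoring and ranking step can be sketched in a few lines. The sketch makes three assumptions. First, the disease-specific layer is passed in as an arbitrary callable `disease_map`. Second, the distance term is the squared Lorentzian distance d²(x, y) = −2/c − 2⟨x, y⟩_L. Third, p is the Sigmoid of −d² plus the drug bias, the target bias and the margin, a form common in hyperbolic knowledge-graph scoring. These choices illustrate the ingredients listed in the text and are not the patent's confirmed formula. All function names are hypothetical.

```python
import math


def lorentz_inner(x, y):
    """Lorentzian inner product: -x_t * y_t + <x_s, y_s>."""
    return -x[0] * y[0] + sum(a * b for a, b in zip(x[1:], y[1:]))


def squared_lorentz_distance(x, y, c=1.0):
    """Squared Lorentzian distance -2/c - 2<x, y>_L; zero when x == y."""
    return -2.0 / c - 2.0 * lorentz_inner(x, y)


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def triple_score(drug_emb, disease_map, target_emb, drug_bias, target_bias, margin, c=1.0):
    """Probability-like score p for one (drug, disease, target) triple."""
    combo = disease_map(drug_emb)  # drug-disease combination representation
    logit = -squared_lorentz_distance(combo, target_emb, c) + drug_bias + target_bias + margin
    return sigmoid(logit)


def rank_targets(drug_emb, disease_map, targets, drug_bias, target_biases, margin, c=1.0):
    """Return candidate target names sorted from highest to lowest score."""
    scores = {
        name: triple_score(drug_emb, disease_map, emb, drug_bias, target_biases[name], margin, c)
        for name, emb in targets.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


def identity(point):
    """Stand-in for a trained disease-specific Lorentz linear layer."""
    return point


drug = [1.0, 0.0, 0.0]
targets = {"far": [math.sqrt(5.0), 2.0, 0.0], "near": [math.sqrt(1.01), 0.1, 0.0]}
print(rank_targets(drug, identity, targets, 0.0, {"far": 0.0, "near": 0.0}, margin=0.0))
# ['near', 'far']
```

Ranking by p is equivalent to ranking by the logit. The Sigmoid only matters when p is also used as a training probability, for example in a binary cross-entropy loss with negative samples.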
In addition, the hyperbolic drug and target encoders are trained jointly, end to end, within this framework. To optimize training, the embodiment uses negative sampling. For each known drug-disease-target triple, N negative samples are generated by replacing the true target with other randomly chosen targets, and these negatives are added to training.
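The negative sampling step can be illustrated with a minimal sketch. It assumes that the replacement target is drawn uniformly from all targets other than the true one. It does not filter out corruptions that happen to be other known triples, and such filtering is a common optional refinement not described in the text. `corrupt_targets` is a hypothetical name.

```python
import random


def corrupt_targets(triples, all_targets, n_neg, rng=None):
    """Build n_neg negatives per known triple by swapping in a random other target.

    Returned negatives are grouped per input triple, in input order.
    """
    rng = rng or random.Random()
    negatives = []
    for drug, disease, target in triples:
        pool = [t for t in all_targets if t != target]  # never reuse the true target
        for _ in range(n_neg):
            negatives.append((drug, disease, rng.choice(pool)))
    return negatives


known = [("D1", "S1", "T1"), ("D2", "S1", "T2")]
negs = corrupt_targets(known, ["T1", "T2", "T3", "T4"], n_neg=3, rng=random.Random(0))
print(len(negs))  # 6
```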
In some embodiments, the hyperbolic embeddings of the drug-disease combination to be queried and of all candidate targets are fed into the hyperbolic triple decoder to obtain a probability ranking of the candidate targets for the given drug and disease. Specifically, after the model is trained, the following are obtained for the drug-disease combination to be queried: the hyperbolic embeddings of the candidate targets to be matched, the hyperbolic embedding of the drug, and the disease (index). This information is fed into the hyperbolic triple decoder to obtain a similarity probability score for each candidate target. The higher the score, the more closely the target is associated with the current drug-disease combination.
Fig. 2 is a schematic processing block diagram of the hyperbolic-space-based drug-disease-target triple target entity completion method according to an embodiment of the present invention. In this method, the drug-disease-target triples and the Euclidean feature data of drugs and targets are processed by the data preprocessing module, the hyperbolic drug encoder module, the hyperbolic target encoder module and the hyperbolic triple decoder module. The final output is the interaction probability of each drug-target pair.
Euclidean translation-based target completion methods cannot effectively capture the hierarchical structure implied by the heterogeneous drug-disease-target network. By contrast, the hyperbolic-space-based drug-disease-target triple target entity completion method of this embodiment builds on existing mathematical results about hyperbolic space. It introduces the Lorentz space, one of the isomorphic models of hyperbolic space, and the full Lorentz linear transformation defined on it, and thereby learns these implicit hierarchies effectively, which significantly improves candidate target prediction. The invention hypothesizes that, in a translation-based drug-disease-target triple target entity completion task, introducing hyperbolic space to capture the latent hierarchical structure of the biological network behind the triples helps improve model performance. In the framework of this embodiment, introducing the Lorentz space into the triple decoder to perceive the structure of the heterogeneous biological network yields a significant improvement in target prediction, which confirms the hypothesis. The embodiment also designs several hyperbolic-space drug and target encoders that inject useful prior knowledge into the hyperbolic triple decoder, further improving its performance.
In some embodiments, other drug and target domain knowledge usable for the drug-disease-target triple target entity completion task can be identified, and suitable hyperbolic encoders can be designed for it. The effect of different combinations of such hyperbolic encoders on the final performance of the model can then be explored.
In some embodiments, a heterogeneous network describing complex interactions between biological and chemical entities may contain not only hierarchical structures but also other substructures, such as cycles. Other non-Euclidean spaces dedicated to capturing such structures, such as spherical space, may therefore also be introduced in an attempt to adaptively capture the corresponding substructures of the heterogeneous network.
An embodiment of the invention also provides an application of the drug-disease-target triple target entity completion method, in which the method is applied to predicting the targets corresponding to given drugs and diseases.
The drug-disease-target triple target entity completion method of the invention is further explained below through a specific example:
Example 1
First, the experimental environment and data are set up. To verify the hypothesis that introducing hyperbolic space is effective for the current task, the neural network framework was built with Python 3.6.13 and PyTorch 1.10.2. Experiments were run on Linux 4.18 with an Intel Xeon Platinum 8360Y CPU, 40 GB of RAM and an NVIDIA A100-SXM GPU, using the Riemannian Adam algorithm with a learning rate of 0.005 as the optimizer during training.
The data set is split by drug-target pair: all triples that share the same drug-target pair are assigned to the same set (training, validation or test). As a result, some drugs and targets in the test set do not appear in the training set, which better tests the model's ability to infer targets. In other words, a given drug-disease combination in the test set may also appear in the training set, but the target to be inferred for it in the test set has never been paired with that drug during training.
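The pair-grouped split can be sketched as follows, using the 60%:20%:20% ratio and five repetitions of this embodiment. It assumes that the ratio applies to the number of distinct drug-target pairs, that the test set takes the remainder after the training and validation cuts, and that each repetition reshuffles with its own seed. `split_by_drug_target_pair` is a hypothetical name.

```python
import random


def split_by_drug_target_pair(triples, train_frac=0.6, val_frac=0.2, seed=0):
    """Assign whole drug-target pair groups to train/validation/test sets.

    All triples sharing a (drug, target) pair land in the same set, so no pair
    seen in training reappears in validation or test. The test set takes the remainder.
    """
    groups = {}
    for drug, disease, target in triples:
        groups.setdefault((drug, target), []).append((drug, disease, target))
    pairs = sorted(groups)  # fixed order, then a seeded shuffle per repetition
    random.Random(seed).shuffle(pairs)
    n_train = int(train_frac * len(pairs))
    n_val = int(val_frac * len(pairs))
    parts = (pairs[:n_train], pairs[n_train:n_train + n_val], pairs[n_train + n_val:])
    return tuple([t for pair in part for t in groups[pair]] for part in parts)


# Toy data: 10 drug-target pairs, each seen with two diseases.
triples = [(f"D{i}", f"S{k}", f"T{i % 3}") for i in range(10) for k in range(2)]
for seed in range(5):  # five independent splits, as in the embodiment
    train, val, test = split_by_drug_target_pair(triples, seed=seed)
```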
In addition, the triples belonging to 60%, 20% and 20% of all drug-target pair types are assigned to the training, validation and test sets, respectively. This process is repeated five times. Before each split the whole data set is randomly shuffled, so that each split assigns different drug-target pair types to each set. The average results of these five independent runs are recorded. In each run the validation set is used to tune the model hyperparameters, and the model is evaluated on the test set to obtain the performance results. Table 1 lists the important parameters of the model in this example:
Table 1 Important parameters in the model
Evaluation metrics: to verify the hypothesis in the translation-based drug-disease-target triple target entity completion task, the experiments mainly use MRR (Mean Reciprocal Rank), Hits@1, Hits@3 and Hits@10, which are commonly used in knowledge graph completion, to evaluate how well different methods predict candidate targets. Larger values of these metrics indicate more accurate predictions. The embodiment uses MRR as the primary metric because it is more robust than the Hits@k metrics.
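The metrics can be computed from the rank of the true target for each test query, as in the minimal sketch below. Ties are handled optimistically: only candidates with a strictly higher score are ranked above the true target. Tie handling is a design choice that the text does not specify. The function names are hypothetical.

```python
def rank_of_true(scores, true_target):
    """1-based rank of the true target; candidates with a strictly higher score rank above it."""
    true_score = scores[true_target]
    return 1 + sum(1 for s in scores.values() if s > true_score)


def mrr_and_hits(ranks, ks=(1, 3, 10)):
    """Mean Reciprocal Rank and Hits@k over a list of 1-based ranks."""
    n = len(ranks)
    mrr = sum(1.0 / r for r in ranks) / n
    hits = {k: sum(1 for r in ranks if r <= k) / n for k in ks}
    return mrr, hits


print(rank_of_true({"T1": 0.9, "T2": 0.7, "T3": 0.95}, "T2"))  # 3
mrr, hits = mrr_and_hits([1, 2, 5, 20])
print(mrr, hits)  # 0.4375 {1: 0.25, 3: 0.5, 10: 0.75}
```

MRR uses the exact position of every rank, whereas Hits@k only records whether a rank falls below a cutoff. This difference is one reason MRR is often preferred as the headline metric.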
To further prove the hypothesis proposed by the present invention, a representative euclidean translation-based drug-disease-target triple entity completion method DDTE is added to the comparison in the examples, and complete euclidean correspondence of the hyperbolic triple decoder in the examples is achieved, more specifically, all hyperbolic operations in the original model are modified into euclidean operations (e.g., modifying a complete lorentz linear layer into a fully connected euclidean layer), and the overall parameters are kept consistent, that is, the two are mainly different in whether hyperbolic operations are used and optimization is performed. In order to further remove the uncontrollable influence of domain knowledge on the model performance, it is first ensured that all methods involved in the comparison do not use a drug and target encoder (i.e. do not add any relevant domain knowledge), and instead, the acquisition of the drug and target embedding in the triplet decoder is modified to use a self-learning drug and target embedding representation contained in the decoder, and this type of embedding is actually obtained by learning the structural information of the heterogeneous network behind the model training directly.
Using two representative entity embedding dimensions, 16 (Table 2) and 128 (Table 3) (16 is a representative hyperbolic embedding dimension and 128 a common Euclidean embedding dimension), this embodiment compares the standard method with the drug and target encoders removed against the two comparison methods above on the compiled heterogeneous drug-disease-target network with hierarchical structure. As Tables 2 and 3 show, the encoder-free standard method of this embodiment generally outperforms both DDTE and its own Euclidean counterpart, neither of which uses domain knowledge either. On the primary metric, MRR, the Euclidean counterpart of the encoder-free standard method is clearly better than DDTE, and the encoder-free standard method gains a further 5.5% (16 dimensions) and 5.8% (128 dimensions) over its Euclidean counterpart, which supports the feasibility and effectiveness of introducing hyperbolic space for the current task.
TABLE 2
TABLE 3
In addition, to demonstrate that the hyperbolic drug and target encoders of this embodiment effectively introduce external domain knowledge into the current task, five variants that add different combinations of hyperbolic encoders to the encoder-free standard method are included in a further comparison, using the same settings as the experiment above:
a. adding a hyperbolic ECFP6 encoder and a target sequence similarity encoder (ECFP-SEQ);
b. adding a hyperbolic ECFP6 encoder and a GO-PPI network encoder (ECFP-NET);
c. adding only a hyperbolic ECFP6 encoder (ECFP-NONE);
d. adding only a hyperbolic target sequence similarity encoder (NON-SEQ);
e. adding only a hyperbolic GO-PPI network encoder (NON-NET).
The performance comparison results are shown in Table 4 (16 dimensions) and Table 5 (128 dimensions).
TABLE 4
TABLE 5
It can be seen that, compared with the encoder-free standard method, adding any of the hyperbolic encoders further improves candidate target prediction in both representative dimensions, and the ECFP-SEQ combination yields the largest improvement.
In summary, the invention provides a drug-disease-target triple target entity completion method and its application. Starting from the hypothesis that appropriately using hyperbolic space improves the accuracy of translation-based drug-disease-target triple target entity completion, the invention introduces Lorentz space, defines a complete Lorentz linear transformation layer in that space, builds on this layer a hyperbolic triple decoder that captures the hidden hierarchical structure of the heterogeneous drug-disease-target network, and designs hyperbolic drug and target encoders that introduce relevant domain knowledge to further improve candidate target prediction. The experimental results show that, once interference from domain knowledge and experimental settings is eliminated, the hyperbolic translation-based decoder outperforms the corresponding Euclidean translation-based decoder and gives better predictions. The experiments also show that domain knowledge genuinely helps to further improve the prediction performance of the hyperbolic triple decoder on the current task.
It is to be understood that the invention is not limited to the examples described above, but that modifications and variations may be effected thereto by those of ordinary skill in the art in light of the foregoing description, and that all such modifications and variations are intended to be within the scope of the invention as defined by the appended claims.
Claims (10)
1. A drug-disease-target triple target entity completion method, characterized by comprising the following steps:
constructing a heterogeneous network of drug, disease and target entities, and collecting feature data for the drug and target entities in the heterogeneous network;
splitting the constructed heterogeneous network into a set of drug-disease-target triples according to a triple generation logic;
encoding the collected feature data of the drug and target entities in the heterogeneous network with the hyperbolic drug and target encoders, respectively, to generate a hyperbolic drug embedding lookup table and a hyperbolic target embedding lookup table;
feeding the drug-disease-target triples, the hyperbolic drug embedding lookup table, the hyperbolic target embedding lookup table and the disease numbers into a hyperbolic triple decoder, and computing, for each triple, the Lorentz-distance similarity between the embedding of the target entity and the embedding of the drug-disease combination, in order to train the model;
and feeding the combined representation of the drug and disease to be tested, together with the hyperbolic embeddings of all candidate targets, into the hyperbolic triple decoder to obtain a probability ranking of the candidate targets for that drug and disease.
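The scoring and ranking steps of claim 1 can be illustrated with NumPy. This sketch assumes that "Lorentz distance" means the squared Lorentzian distance d²(x, y) = -2/c - 2⟨x, y⟩_L between points on the hyperboloid ⟨x, x⟩_L = -1/c, and that scores are turned into probabilities with a softmax. How the decoder builds the drug-disease combination is not specified here, so it is taken as a precomputed point; all helper names are hypothetical.

```python
import numpy as np


def lorentz_inner(x, y):
    """Lorentzian inner product <x, y>_L = -x0*y0 + sum_i xi*yi (last axis)."""
    return -x[..., 0] * y[..., 0] + np.sum(x[..., 1:] * y[..., 1:], axis=-1)


def lift(v, c=1.0):
    """Place Euclidean vector(s) v on the hyperboloid <x, x>_L = -1/c."""
    v = np.asarray(v, dtype=float)
    x0 = np.sqrt(1.0 / c + np.sum(v * v, axis=-1, keepdims=True))
    return np.concatenate([x0, v], axis=-1)


def squared_lorentz_distance(x, y, c=1.0):
    """Squared Lorentzian distance -2/c - 2<x, y>_L (non-negative on the hyperboloid)."""
    return -2.0 / c - 2.0 * lorentz_inner(x, y)


def rank_candidates(query, candidates, c=1.0):
    """Score every candidate target against one drug-disease query point.

    Returns candidate indices sorted best-first and softmax probabilities."""
    scores = -squared_lorentz_distance(query[None, :], candidates, c)
    probs = np.exp(scores - scores.max())   # numerically stable softmax
    probs /= probs.sum()
    order = np.argsort(-probs)
    return order, probs


query = lift([0.1, 0.0])                                # hypothetical drug-disease point
targets = lift([[0.1, 0.0], [1.0, 1.0], [-2.0, 0.5]])   # hypothetical target embeddings
order, probs = rank_candidates(query, targets)
print(order)  # [0 1 2]
```

The closest target (here the one identical to the query) gets the highest probability; during training, the same distance would be used to pull true targets toward the drug-disease point.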
2. The drug-disease-target triple target entity completion method according to claim 1, wherein constructing the heterogeneous network of drug, disease and target entities comprises: constructing a drug-disease-target heterogeneous network from known data, in which the drug, disease and target entities in the known data serve as nodes of the heterogeneous network and the known relations between entities serve as edges between the nodes.
3. The method according to claim 1, wherein Morgan extended-connectivity fingerprint (ECFP) feature data with a radius of 3 are collected for the drug entities in the heterogeneous network, and amino acid sequence similarity feature data and gene ontology-protein interaction (GO-PPI) network feature data are collected for the target entities.
4. The method according to claim 1, wherein the triple generation logic is: the known relations among drug, disease and target entities are taken as edges of the drug-disease-target network, and a drug, a disease and a target that are connected to one another by edges form a drug-disease-target triple.
5. The method according to claim 4, wherein the drug-disease-target heterogeneous network is split, according to the triple generation logic, into a triple set that retains the implicit hierarchical information of the original network.
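Claim 4 can be read as a triangle search over the three edge types. The sketch below assumes that a triple requires all three pairwise edges (drug-disease, drug-target and disease-target) to exist; the edge-list format and the name `generate_triples` are hypothetical.

```python
def generate_triples(drug_disease, drug_target, disease_target):
    """Return (drug, disease, target) triples whose three entities are
    pairwise connected in the heterogeneous network (assumed reading of claim 4)."""
    disease_target_edges = set(disease_target)
    targets_of_drug = {}
    for drug, target in drug_target:
        targets_of_drug.setdefault(drug, set()).add(target)

    triples = []
    for drug, disease in sorted(set(drug_disease)):
        # Only targets already linked to this drug can close the triangle.
        for target in sorted(targets_of_drug.get(drug, ())):
            if (disease, target) in disease_target_edges:
                triples.append((drug, disease, target))
    return triples


triples = generate_triples(
    drug_disease=[("d1", "s1"), ("d1", "s2"), ("d2", "s1")],
    drug_target=[("d1", "t1"), ("d1", "t2"), ("d2", "t1")],
    disease_target=[("s1", "t1"), ("s1", "t2"), ("s2", "t2")],
)
print(triples)
# [('d1', 's1', 't1'), ('d1', 's1', 't2'), ('d1', 's2', 't2'), ('d2', 's1', 't1')]
```

Here ('d1', 's2', 't1') is not produced because there is no s2-t1 edge, even though d1 is linked to both s2 and t1.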
6. The method according to claim 1, wherein the method is based on a hyperbolic space, namely a Lorentz space, and both the hyperbolic drug and target encoders and the hyperbolic triple decoder operate in a Lorentz space with constant negative curvature.
7. The method according to claim 6, wherein a complete Lorentz linear transformation layer is defined in the Lorentz space as the counterpart of the linear transformation layer in Euclidean space; the complete Lorentz linear transformation layer changes the dimension or position of features/coordinates in the hyperbolic space while ensuring that both its input and its output remain in the hyperbolic space.
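The claims do not state how encoder features are placed in the Lorentz space. One common choice, shown here purely as an assumption, is the exponential map at the hyperboloid origin o = (1/√c, 0, ..., 0). It sends a Euclidean vector u to (cosh(√c‖u‖)/√c, sinh(√c‖u‖)·u/(√c‖u‖)), which satisfies -x₀² + ‖x_s‖² = -1/c by construction, so the result has constant negative curvature -c. The name `expmap0` is hypothetical.

```python
import numpy as np


def expmap0(u, c=1.0, eps=1e-12):
    """Map Euclidean vectors u (treated as tangent vectors at the origin)
    onto the hyperboloid {x : -x0^2 + ||x_s||^2 = -1/c} of curvature -c."""
    u = np.asarray(u, dtype=float)
    sqrt_c = np.sqrt(c)
    norm = np.linalg.norm(u, axis=-1, keepdims=True)
    r = sqrt_c * norm
    x0 = np.cosh(r) / sqrt_c
    # sinh(r)/r -> 1 as r -> 0; clamping the denominator avoids 0/0 for u = 0
    xs = np.sinh(r) / np.maximum(r, eps) * u
    return np.concatenate([x0, xs], axis=-1)


X = expmap0([[0.3, -0.4], [1.0, 2.0], [0.0, 0.0]], c=0.5)
# Each entry below is approximately -1/c = -2.0
print(-X[:, 0] ** 2 + np.sum(X[:, 1:] ** 2, axis=1))
```

A zero feature vector maps to the origin of the hyperboloid, and larger feature norms move points further from it.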
8. The drug-disease-target triple target entity completion method according to claim 7, wherein the complete Lorentz linear transformation layer is given by:
wherein λ is a fixed hyper-parameter that controls the numerical scale during computation; σ is the Sigmoid activation function; … and … are the weights of the intra-layer transformation matrix; b is a trainable bias; e is a fixed value ensuring that … is greater than 0; c is the negative curvature of the Lorentz space; and x and y are, respectively, the input and output of the complete Lorentz linear transformation layer.
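The formula of claim 8 is not reproduced in this text, so the sketch below instead follows the fully hyperbolic linear layer of Chen et al. (2022, HyboNet), which uses the same kinds of ingredients listed above: a scale λ, a Sigmoid, two weights, a bias, a small constant and the curvature. It is an assumption, not the claimed formula. The Sigmoid branch sets a bounded time coordinate y₀ = λσ(vᵀx + b_t) + 1/√c + ε, and the linear branch Wx + b is rescaled so that the output satisfies -y₀² + ‖y_s‖² = -1/c exactly; ε keeps y₀² - 1/c strictly positive. The class name and the parameters `v`, `W`, `b_t` are hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


class LorentzLinearSketch:
    """Hypothetical Lorentz linear layer in the style of HyboNet (Chen et al., 2022).
    Maps rows x in R^{in_dim+1} to points y on {y : -y0^2 + ||y_s||^2 = -1/c}
    in R^{out_dim+1}. Not the patent's exact formula."""

    def __init__(self, in_dim, out_dim, c=1.0, lam=10.0, eps=1e-3, seed=0):
        rng = np.random.default_rng(seed)
        self.v = rng.normal(scale=0.1, size=in_dim + 1)              # weight of the time branch
        self.W = rng.normal(scale=0.1, size=(out_dim, in_dim + 1))   # weight of the space branch
        self.b = np.zeros(out_dim)                                   # trainable bias (space branch)
        self.b_t = 0.0                                               # bias of the time branch
        self.c, self.lam, self.eps = c, lam, eps

    def __call__(self, x):
        x = np.atleast_2d(np.asarray(x, dtype=float))               # shape (N, in_dim + 1)
        # Time coordinate: bounded by the Sigmoid and shifted so that y0 > 1/sqrt(c).
        y0 = self.lam * sigmoid(x @ self.v + self.b_t) + 1.0 / np.sqrt(self.c) + self.eps
        u = x @ self.W.T + self.b                                    # unnormalized space part
        norm = np.maximum(np.linalg.norm(u, axis=-1), 1e-12)         # guard against u = 0
        # Rescale u so that ||y_s||^2 = y0^2 - 1/c, i.e. y lies on the hyperboloid.
        scale = np.sqrt(y0 ** 2 - 1.0 / self.c) / norm
        ys = u * scale[:, None]
        return np.concatenate([y0[:, None], ys], axis=-1)


layer = LorentzLinearSketch(in_dim=4, out_dim=3, c=0.5)
y = layer(np.ones((2, 5)))
print(y.shape)  # (2, 4)
```

Because the output is constructed to satisfy the hyperboloid constraint, the layer can change the dimension and position of points without any projection step, which matches the property stated in claim 7.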
9. The method according to claim 1, wherein, after training of the model is completed, for a drug-disease combination to be tested, the hyperbolic embeddings of the candidate targets to be matched, the hyperbolic embedding of the drug and the corresponding disease number are obtained from the model and fed into the hyperbolic triple decoder to obtain a similarity probability score for each candidate target.
10. Use of the drug-disease-target triple target entity completion method according to any one of claims 1 to 9 for predicting the targets corresponding to drugs and diseases.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211302748.8A CN115563312A (en) | 2022-10-24 | 2022-10-24 | Medicine-disease-target triple target entity completion method and application |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115563312A true CN115563312A (en) | 2023-01-03 |
Family
ID=84746567
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211302748.8A Pending CN115563312A (en) | 2022-10-24 | 2022-10-24 | Medicine-disease-target triple target entity completion method and application |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115563312A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116842958A (en) * | 2023-09-01 | 2023-10-03 | 北京邮电大学 | Time sequence knowledge graph completion method, entity prediction method based on time sequence knowledge graph completion method and device thereof |
CN116842958B (en) * | 2023-09-01 | 2024-02-06 | 北京邮电大学 | Time sequence knowledge graph completion method, entity prediction method based on time sequence knowledge graph completion method and device thereof |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||