CN115563312A - Medicine-disease-target triple target entity completion method and application - Google Patents

Publication number
CN115563312A
Authority
CN
China
Prior art keywords
target
drug
disease
hyperbolic
triple
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211302748.8A
Other languages
Chinese (zh)
Inventor
岳杨
雷皇书
石玮琳
石远平
张芳育
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing Kangzhou Big Data Group Co ltd
Chongqing Kangzhou Zhitong Pharmaceutical Technology Co ltd
Original Assignee
Chongqing Kangzhou Big Data Group Co ltd
Chongqing Kangzhou Zhitong Pharmaceutical Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing Kangzhou Big Data Group Co ltd and Chongqing Kangzhou Zhitong Pharmaceutical Technology Co ltd
Priority to CN202211302748.8A
Publication of CN115563312A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B15/00 ICT specially adapted for analysing two-dimensional or three-dimensional molecular structures, e.g. structural or functional relations or structure alignment
    • G16B15/30 Drug targeting using structural data; Docking or binding prediction
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00 Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10 Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Abstract

The invention provides a drug-disease-target triple target entity completion method and an application thereof. On the hypothesis that a suitable use of hyperbolic space can improve the accuracy of the translation-based drug-disease-target triple target entity completion task, a Lorentz space is introduced and a complete Lorentz linear conversion layer is defined on it. A hyperbolic triple decoder is designed on the Lorentz space to capture the hidden hierarchical structure of the heterogeneous drug-disease-target network, and hyperbolic drug and target encoders are designed to introduce relevant domain knowledge and further improve candidate target prediction. Experimental results show that, after controlling for confounding factors including domain knowledge and experimental settings, the decoder based on hyperbolic translation outperforms the corresponding decoder based on Euclidean translation. The experiments also show that domain knowledge further improves the performance of the hyperbolic triple decoder on this task.

Description

Medicine-disease-target triple target entity completion method and application
Technical Field
The invention relates to the fields of biomedicine and artificial intelligence, and in particular to a drug-disease-target triple target entity completion method and an application thereof.
Background
In recent years, more and more biomedical databases have been developed. They generally comprise many heterogeneous biological networks that encode complex relationships among large numbers of drug, target and disease entities, which makes it possible to explore unknown relationships between these entities at scale with network-embedding-based machine learning. However, most existing methods capture the relationship between only two of the three entity types (drug, target and disease). More importantly, these methods ignore the multi-relational hierarchical topology implied in the heterogeneous network formed by these entities, and capturing this information may help in exploring the relationships among them.
The prior art typically analyzes relationships between entities in heterogeneous drug-disease-target networks in Euclidean space. For example, Luo et al. collected a data set, DTINet, containing drug-disease-target associations, and then used a network diffusion algorithm and an inductive matrix completion strategy to infer unknown interactions between drugs and targets (Luo Y, Zhao X, Zhou J, et al., 2017). Chen et al. used a network-embedding-based candidate ranking algorithm to map genes, diseases and their associated ontology data into the same vector space in order to screen high-confidence disease-gene pairs (Chen J, Althagafi A, Hoehndorf R., 2021). Moon et al. extracted drug-disease-target triples from a heterogeneous network built on DTINet, and then proposed DDTE, a knowledge-graph triple completion method based on Euclidean translation, to establish the relationship between candidate targets and a given drug and disease (Moon C, Jin C, Dong X, et al., 2021). However, the heterogeneous networks behind these methods may contain implicit hierarchies embedded in the complex interactions between biological and chemical entities, and these important relationships are difficult to capture with graph learning (network embedding) methods based on Euclidean space.
Accordingly, the prior art is yet to be improved and developed.
Disclosure of Invention
In view of the defects of the prior art, the invention aims to provide a drug-disease-target triple target entity completion method and an application thereof, so as to solve the problem that existing methods, which infer targets for a given drug and disease from a heterogeneous drug-disease-target network in Euclidean space, cannot effectively exploit the implicit hierarchical structure of that network.
The technical scheme of the invention is as follows:
A drug-disease-target triple target entity completion method, comprising the following steps:
constructing a heterogeneous network of drug, disease and target entities, and collecting feature data for the drug and target entities in the heterogeneous network;
splitting the constructed heterogeneous network into a set of drug-disease-target triples according to a triple generation logic;
encoding the collected feature data of the drug and target entities with hyperbolic drug and target encoders, respectively, to generate a hyperbolic drug embedding lookup table and a hyperbolic target embedding lookup table;
feeding the drug-disease-target triples, the hyperbolic drug embedding lookup table, the hyperbolic target embedding lookup table and the disease numbers into a hyperbolic triple decoder, and computing, for each triple, the Lorentz distance similarity between the target entity embedding and the drug-disease combination representation in order to train the model;
and feeding the representation of the drug-disease combination under test and the hyperbolic embeddings of all candidate targets into the hyperbolic triple decoder to obtain a probability ranking of the candidate targets for that drug and disease.
In the drug-disease-target triple target entity completion method, the heterogeneous network of drug, disease and target entities is constructed as follows: a drug-disease-target heterogeneous network is built from known data, wherein the drug, disease and target entities in the known data serve as the nodes of the heterogeneous network, and the known relationships between entities serve as the edges between those nodes.
In the drug-disease-target triple target entity completion method, Morgan extended-connectivity fingerprint feature data with a radius of 3 are collected for the drug entities in the heterogeneous network, and amino acid sequence similarity feature data and Gene Ontology and protein-protein interaction network feature data are collected for the target entities.
In the drug-disease-target triple target entity completion method, the triple generation logic is as follows: the known relationships among drug, disease and target entities are taken as the edges of the drug-disease-target network, and if edges exist among a group consisting of a drug, a disease and a target, that group forms a drug-disease-target triple.
In the drug-disease-target triple target entity completion method, the drug-disease-target heterogeneous network is split, according to the triple generation logic, into a triple set that retains the hidden hierarchical information of the original network.
The drug-disease-target triple target entity completion method is based on a hyperbolic space, the hyperbolic space being a Lorentz space; the hyperbolic drug and target encoders and the hyperbolic triple decoder all operate on the Lorentz space with constant negative curvature.
In the drug-disease-target triple target entity completion method, a complete Lorentz linear conversion layer is defined on the Lorentz space as the counterpart of the linear layer in Euclidean space; the complete Lorentz linear conversion layer changes the dimension or position of features/coordinates in the hyperbolic space while ensuring that both inputs and outputs remain on the hyperbolic space.
In the drug-disease-target triple target entity completion method, the formula of the complete Lorentz linear conversion layer is:
[Formula image BDA0003905465880000031: complete Lorentz linear conversion layer; not reproduced in this text]
wherein λ is a fixed hyper-parameter that controls the numerical scale during computation; σ is the Sigmoid activation function; the symbols shown in images BDA0003905465880000032 and BDA0003905465880000033 are the weights of the intra-layer transformation matrices; b is a trainable bias; ε is a fixed value that ensures the expression shown in image BDA0003905465880000034 is greater than 0; c is the negative curvature of the Lorentz space; and x and y are the input and output, respectively, of the complete Lorentz linear conversion layer.
After the model is trained, for a drug-disease combination under test, the hyperbolic embeddings of the candidate targets to be matched, the hyperbolic embedding of the drug in the combination and the corresponding disease number are obtained, and this information is fed into the hyperbolic triple decoder to obtain a similarity probability score for each candidate target.
The application of the drug-disease-target triple target entity completion method is characterized in that the method is applied to predicting the targets corresponding to drugs and diseases.
Beneficial effects: the invention provides a drug-disease-target triple target entity completion method and an application thereof. On the hypothesis that a suitable use of hyperbolic space can improve the accuracy of the translation-based drug-disease-target triple target entity completion task, a Lorentz space is introduced and a complete Lorentz linear conversion layer is defined on it. A hyperbolic triple decoder is designed on the Lorentz space to capture the hidden hierarchical structure of the heterogeneous drug-disease-target network, and hyperbolic drug and target encoders are designed to introduce relevant domain knowledge and further improve candidate target prediction. Experimental results show that, after controlling for confounding factors including domain knowledge and experimental settings, the decoder based on hyperbolic translation outperforms the corresponding decoder based on Euclidean translation. The experiments also show that domain knowledge further improves the performance of the hyperbolic triple decoder on this task.
Drawings
Fig. 1 is a schematic flow chart of a drug-disease-target triple target entity completion method according to an embodiment of the present invention.
Fig. 2 is a schematic processing diagram of a hyperbolic-space-based drug-disease-target triple target entity completion method according to an embodiment of the present invention.
Detailed Description
The invention provides a drug-disease-target triple target entity completion method and an application thereof. To make the purpose, technical scheme and effect of the invention clearer, the invention is described in further detail below. It should be understood that the specific embodiments described herein merely illustrate the invention and are not intended to limit it.
An embodiment of the invention provides a drug-disease-target triple target entity completion method which, as shown in Fig. 1, comprises the following steps:
S1, constructing a heterogeneous network of drug, disease and target entities, and collecting feature data for the drug and target entities in the heterogeneous network;
S2, splitting the constructed heterogeneous network into a set of drug-disease-target triples according to a triple generation logic;
S3, encoding the collected feature data of the drug and target entities with hyperbolic drug and target encoders, respectively, to generate a hyperbolic drug embedding lookup table and a hyperbolic target embedding lookup table;
S4, feeding the drug-disease-target triples, the hyperbolic drug embedding lookup table, the hyperbolic target embedding lookup table and the disease numbers into a hyperbolic triple decoder, and computing, for each triple, the Lorentz distance similarity between the target entity embedding and the drug-disease combination representation in order to train the model;
S5, feeding the representation of the drug-disease combination under test and the hyperbolic embeddings of all candidate targets into the hyperbolic triple decoder to obtain a probability ranking of the candidate targets for that drug and disease.
Existing methods that infer targets for a given drug and disease from a heterogeneous drug-disease-target network in Euclidean space cannot effectively exploit the implicit hierarchical structure of that network. To address this, the invention hypothesizes that introducing hyperbolic space, a manifold that has been shown mathematically to have greater capacity for capturing hierarchical structure, allows the target inference method to achieve higher accuracy. To test this hypothesis, the invention first formulates candidate target prediction as a hyperbolic-translation-based drug-disease-target triple target entity completion task. A complete Lorentz linear transformation, one form of feature transformation in hyperbolic space, is introduced into the hyperbolic triple decoder. With this transformation the decoder can better learn from the drug-disease-target triples split from the heterogeneous network and perceive the implicit hierarchical structure of the network, and the similarity between a candidate target embedding and the representation of the given drug and disease is measured by the Lorentz distance. In addition, optional hyperbolic drug and target encoders are provided to inject additional drug and target domain knowledge while the hyperbolic triple decoder is learning, which further improves decoder performance. With this scheme, the hierarchical structure that the heterogeneous drug-disease-target network may contain can be captured, and additional drug and target domain knowledge can be injected into the hyperbolic space, yielding better candidate target prediction than the corresponding Euclidean-translation-based model.
In some embodiments, in step S1, the heterogeneous network of drug, disease and target entities is constructed as follows: a drug-disease-target heterogeneous network is built from known data, wherein the drug, disease and target entities in the known data serve as the nodes of the heterogeneous network, and the known relationships between entities serve as the edges between those nodes.
In some embodiments, in step S1, Morgan extended-connectivity fingerprint feature data with a radius of 3 (ECFP6) are collected for the drug entities in the heterogeneous network, and amino acid sequence similarity feature data and Gene Ontology (GO) and protein-protein interaction (PPI) network feature data are collected for the target entities.
In some embodiments, in step S2, the triple generation logic is as follows: the known relationships among drug, disease and target entities are taken as the edges of the drug-disease-target network, and if edges exist among a group consisting of a drug, a disease and a target, that group forms a drug-disease-target triple.
In some embodiments, the drug-disease-target heterogeneous network is split into a set of triplets containing underlying hierarchical information of the original network according to the triplet generation logic.
In particular, the drug-disease-target data set may contain inferred target-disease and drug-disease associations (i.e., associations further inferred from known literature-derived associations). A triple is therefore formed if and only if associations (edges) exist among a group consisting of a drug, a disease and a target, and triples obtained in this manner have higher confidence.
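The triple generation logic above can be sketched as follows. This is a minimal illustration, not the patent's implementation: it assumes that "edges exist among a group" means all three pairwise edges (drug-disease, drug-target and disease-target) are present, and the function name `build_triples` and the entity identifiers are hypothetical.

```python
from collections import defaultdict

def build_triples(drug_disease, drug_target, disease_target):
    """Return sorted (drug, disease, target) triples whose three pairwise edges all exist.

    Assumption: a triple requires drug-disease, drug-target and disease-target edges.
    """
    targets_of_drug = defaultdict(set)
    for drug, target in drug_target:
        targets_of_drug[drug].add(target)
    disease_target = set(disease_target)

    triples = set()
    for drug, disease in drug_disease:
        for target in targets_of_drug[drug]:
            if (disease, target) in disease_target:
                triples.add((drug, disease, target))
    return sorted(triples)

# Hypothetical toy edges, for illustration only.
drug_disease = [("drugA", "disease1"), ("drugA", "disease2")]
drug_target = [("drugA", "targetX"), ("drugA", "targetY")]
disease_target = [("disease1", "targetY"), ("disease2", "targetX"), ("disease2", "targetY")]

triples = build_triples(drug_disease, drug_target, disease_target)
for triple in triples:
    print(triple)
```

In this toy network, (drugA, disease1, targetX) is not produced because the disease1-targetX edge is missing.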
In some embodiments, in step S3, the collected Euclidean feature data of the drugs and targets in the heterogeneous network are fed into the hyperbolic drug encoder and the hyperbolic target encoder, respectively, to obtain hyperbolic drug and target embedding lookup tables that contain domain knowledge.
In some embodiments, the method is based on a hyperbolic space, the hyperbolic space being a Lorentz space; the hyperbolic drug and target encoders and the hyperbolic triple decoder all operate on a Lorentz space with constant negative curvature.
In some embodiments, a complete Lorentz linear conversion layer is defined on the Lorentz space as the counterpart of the linear layer in Euclidean space; the complete Lorentz linear conversion layer changes the dimension or position of features/coordinates in the hyperbolic space while ensuring that both inputs and outputs remain on the hyperbolic space.
Specifically, the hyperbolic drug and target encoders and the hyperbolic triple decoder all operate in a Lorentz space with constant negative curvature. The Lorentz space is one of the isomorphic models of hyperbolic space, and a complete Lorentz linear transformation (layer) can be defined on it as the counterpart of the linear layer in Euclidean space. This layer can change the dimension or position of features/coordinates in the hyperbolic space while guaranteeing that both input and output remain in the hyperbolic space.
Specifically, the formula of the complete Lorentz linear conversion layer is as follows:
[Formula image BDA0003905465880000071: complete Lorentz linear conversion layer; not reproduced in this text]
wherein λ is a fixed hyper-parameter that controls the numerical scale during computation; σ is the Sigmoid activation function; the symbols shown in images BDA0003905465880000072 and BDA0003905465880000073 are the weights of the intra-layer transformation matrices; b is a trainable bias; ε is a fixed value that ensures the expression shown in image BDA0003905465880000074 is greater than 0; c is the negative curvature of the Lorentz space; and x and y are the input and output, respectively, of the complete Lorentz linear conversion layer.
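Because the layer formula itself is not available in this text, the following is only a sketch of one common way to build a linear layer whose output stays on the hyperboloid, modeled on the fully hyperbolic network literature and not confirmed to be the patent's exact formula. It assumes the hyperboloid is the set of points with Lorentzian inner product ⟨x, x⟩ = 1/c (c < 0), that the time coordinate is λ·σ(v·x + b) + ε, and that the space coordinates are a rescaled W·x. The names `lorentz_linear`, `W`, `v`, `lam` and `eps` are illustrative.

```python
import math

def lorentz_inner(x, y):
    """Lorentzian inner product: minus the product of time parts plus the space dot product."""
    return -x[0] * y[0] + sum(a * b for a, b in zip(x[1:], y[1:]))

def lorentz_linear(x, W, v, b, c=-1.0, lam=1.0, eps=1.1):
    """Map a hyperboloid point x to another hyperboloid point (assumed form, see lead-in).

    W has one row per output space dimension and len(x) columns;
    v and b produce the time coordinate through a sigmoid gate.
    """
    gate = 1.0 / (1.0 + math.exp(-(sum(vi * xi for vi, xi in zip(v, x)) + b)))
    time = lam * gate + eps
    radicand = time * time + 1.0 / c  # squared norm the space part must have
    if radicand <= 0.0:
        raise ValueError("eps is too small for this curvature")
    direction = [sum(wij * xj for wij, xj in zip(row, x)) for row in W]
    # Degenerate case: if W.x is exactly zero the direction is undefined.
    norm = max(math.sqrt(sum(d * d for d in direction)), 1e-12)
    space = [math.sqrt(radicand) * d / norm for d in direction]
    return [time] + space

# A point on the c = -1 hyperboloid and small hypothetical weights.
x = [math.cosh(0.5), math.sinh(0.5), 0.0]
W = [[0.2, -0.1, 0.4], [0.3, 0.5, -0.2]]
v = [0.1, 0.2, 0.3]
y = lorentz_linear(x, W, v, b=0.0)
print(y, lorentz_inner(y, y))
```

The design point is that the time coordinate is chosen first and the space part is then rescaled to match it, so the output satisfies the hyperboloid constraint by construction rather than by projection.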
The hyperbolic encoders that process the ECFP6 features and the target sequence similarity features are built on this linear layer. However, the layer takes hyperbolic features/coordinates as input, whereas ECFP6 and target sequence similarity are Euclidean features, so a mapping is needed to project both kinds of feature data onto the hyperbolic space. The mapping formula is as follows:
[Formula images BDA0003905465880000081 and BDA0003905465880000082: mapping from Euclidean features to the Lorentz space; not reproduced in this text]
Taking ECFP6 as an example, the resulting hyperbolic ECFP6 is given by the expression in image BDA0003905465880000083, where the symbol in image BDA0003905465880000084 is the origin of the Lorentz space, (·, ·) is the feature concatenation operation, and the operator in image BDA0003905465880000089 is the Lorentzian inner product, defined as follows, where subscript t denotes the first dimension of the input feature and subscript s denotes all remaining dimensions:
$\langle x, y \rangle_{\mathcal{L}} = -x_t y_t + x_s^{\top} y_s$
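The mapping described above can be illustrated with the exponential map at the origin of the Lorentz model, which is the usual way to lift a Euclidean vector onto the hyperboloid. That the patent's mapping images show exactly this map is an assumption; the curvature convention ⟨x, x⟩ = 1/c with c < 0, the function name `exp_map_origin` and the toy bit vector are also illustrative.

```python
import math

def lorentz_inner(x, y):
    """Lorentzian inner product: -x_t * y_t + x_s . y_s."""
    return -x[0] * y[0] + sum(a * b for a, b in zip(x[1:], y[1:]))

def exp_map_origin(features, c=-1.0):
    """Lift a Euclidean feature vector onto the hyperboloid with curvature c < 0.

    The vector is treated as the tangent vector (0, features) at the origin
    (1/sqrt(-c), 0, ..., 0) and pushed along the geodesic it defines.
    """
    sqrt_k = math.sqrt(-c)
    norm = math.sqrt(sum(f * f for f in features))
    if norm == 0.0:
        return [1.0 / sqrt_k] + [0.0] * len(features)  # the origin itself
    time = math.cosh(sqrt_k * norm) / sqrt_k
    scale = math.sinh(sqrt_k * norm) / (sqrt_k * norm)
    return [time] + [scale * f for f in features]

# Toy 8-bit fingerprint standing in for a real ECFP6 vector (hypothetical values).
ecfp_bits = [1, 0, 1, 1, 0, 0, 1, 0]
hyper_ecfp = exp_map_origin(ecfp_bits)
print(len(hyper_ecfp), lorentz_inner(hyper_ecfp, hyper_ecfp))
```

Note that cosh and sinh grow exponentially with the input norm, so long, dense fingerprints may need to be rescaled before mapping; how the patent handles this is not stated in the text.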
Hyperbolic target sequence similarity features are obtained in the same way, and the generated hyperbolic drug and target features are stored in order in lookup tables to form the corresponding hyperbolic embedding lookup tables. In addition, the embodiment of the invention provides another kind of target feature data, the GO-PPI network. To encode the GO-PPI network into a GO-PPI-based hyperbolic target embedding lookup table, the invention uses a complete Lorentz graph convolutional layer, which likewise guarantees that features do not leave the Lorentz space during computation, and which is defined as follows:
[Formula image BDA0003905465880000086: complete Lorentz graph convolutional layer; not reproduced in this text]
Specifically, the operation shown in image BDA0003905465880000087 is applied to the GO network and the PPI network separately, where the quantity in image BDA0003905465880000088 is the initial hyperbolic feature of each node in the two networks. This yields a hyperbolic embedding of each target under the GO network and under the PPI network. The complete Lorentz graph convolutional layer is then applied again to fuse the two embeddings of each target, giving the hyperbolic target embedding lookup table for the GO-PPI network. All of these hyperbolic embedding lookup tables provide domain knowledge for the corresponding drug and target entities in the hyperbolic triple decoder.
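The definition of the graph convolutional layer is not available in this text. As a hedged illustration of how neighbor aggregation can be done without leaving the Lorentz space, the sketch below computes a weighted Lorentz centroid, an aggregation rule used in the fully hyperbolic network literature. Whether the patent's layer uses this rule is an assumption, and `lorentz_centroid` and the toy points are hypothetical.

```python
import math

def lorentz_inner(x, y):
    """Lorentzian inner product: -x_t * y_t + x_s . y_s."""
    return -x[0] * y[0] + sum(a * b for a, b in zip(x[1:], y[1:]))

def lorentz_centroid(points, weights, c=-1.0):
    """Weighted centroid of hyperboloid points, rescaled back onto the hyperboloid.

    points: list of hyperboloid points with <p, p> = 1/c; weights: non-negative floats.
    """
    dim = len(points[0])
    summed = [sum(w * p[i] for w, p in zip(weights, points)) for i in range(dim)]
    # The weighted sum is time-like, so its Lorentzian "norm" uses the absolute value.
    lorentz_norm = math.sqrt(abs(lorentz_inner(summed, summed)))
    return [s / (math.sqrt(-c) * lorentz_norm) for s in summed]

# Three hypothetical neighbor embeddings on the c = -1 hyperboloid.
neighbors = [
    [math.cosh(0.2), math.sinh(0.2), 0.0],
    [math.cosh(0.7), 0.0, math.sinh(0.7)],
    [math.cosh(0.4), -math.sinh(0.4), 0.0],
]
weights = [1.0 / 3.0] * 3  # e.g. one row of a normalized adjacency matrix
aggregated = lorentz_centroid(neighbors, weights)
print(aggregated, lorentz_inner(aggregated, aggregated))
```

A graph layer in this style would typically apply a Lorentz linear transformation to each node first and then take such a centroid over each node's neighborhood.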
In some embodiments, in step S4, the drug-disease-target triples and the generated hyperbolic drug and target embedding lookup tables are fed into the hyperbolic triple decoder, and the Lorentz distance similarity between the target entity embedding and the drug-disease combination representation in each triple is computed to train the model.
Specifically, the hyperbolic triple decoder is also built on the complete Lorentz linear conversion layer defined above. The decoder contains disease-type-specific layers $f^{\mathrm{linear}}_{n,n}(x)$, equal in number to the diseases in the triples, which act as a trainable hyperbolic translation bias for each disease. This bias transforms the hyperbolic drug embedding, and the transformed drug embedding serves as the representation of the current drug-disease combination. In the decoder, the Lorentz distance similarity between this representation and the hyperbolic embeddings of the candidate targets is computed to rank the candidates. For a given drug-disease combination, the similarity score p of a candidate target is computed as follows:
[Formula images BDA0003905465880000091 and BDA0003905465880000092: similarity score p; not reproduced in this text]
where $D_i$, $D'_k$ and $T_j$ denote the drug, disease and target entities, respectively; the expression in image BDA0003905465880000093 is the representation of a given drug-disease combination; the symbols in images BDA0003905465880000094 and BDA0003905465880000095 are the drug-type-specific and target-type-specific biases, respectively; and the symbol in image BDA0003905465880000096 is the margin hyper-parameter.
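Since the score formula is not available in this text, the following sketch shows one plausible form built from the ingredients named above: a squared Lorentzian distance between the drug-disease representation and the target embedding, a drug bias, a target bias and a margin, passed through a sigmoid to give a probability-like score. The specific combination (margin minus distance plus biases) is an assumption borrowed from hyperbolic knowledge-graph embedding models, and all names and numbers are illustrative.

```python
import math

def lorentz_inner(x, y):
    """Lorentzian inner product: -x_t * y_t + x_s . y_s."""
    return -x[0] * y[0] + sum(a * b for a, b in zip(x[1:], y[1:]))

def squared_lorentz_distance(x, y, c=-1.0):
    """Squared Lorentzian distance <x - y, x - y> for points with <p, p> = 1/c."""
    return 2.0 / c - 2.0 * lorentz_inner(x, y)

def target_score(pair_repr, target_emb, drug_bias, target_bias, margin, c=-1.0):
    """Probability-like similarity score (assumed form, not the patent's exact formula)."""
    raw = margin - squared_lorentz_distance(pair_repr, target_emb, c) + drug_bias + target_bias
    return 1.0 / (1.0 + math.exp(-raw))

# Hypothetical drug-disease representation and two candidate targets (c = -1).
pair_repr = [math.cosh(0.3), math.sinh(0.3), 0.0]
near_target = [math.cosh(0.4), math.sinh(0.4), 0.0]
far_target = [math.cosh(2.0), 0.0, math.sinh(2.0)]

p_near = target_score(pair_repr, near_target, drug_bias=0.0, target_bias=0.0, margin=1.0)
p_far = target_score(pair_repr, far_target, drug_bias=0.0, target_bias=0.0, margin=1.0)
print(p_near, p_far)
```

Under this form, a target whose embedding lies closer to the drug-disease representation receives a higher score, which matches the ranking behavior described in the text.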
In addition, the hyperbolic drug and target encoders are trained jointly, end to end, with the decoder in this framework. To optimize the training process, the embodiment of the invention uses negative sampling: for each known drug-disease-target triple, N negative samples are generated and added to training, each obtained by replacing the true target in the known triple with another, randomly chosen target.
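The negative sampling step can be sketched as below. The text only states that the true target is replaced by a random other target; excluding replacements that would recreate a known positive triple is an added assumption, and `sample_negatives` and the toy identifiers are hypothetical.

```python
import random

def sample_negatives(triple, all_targets, known_triples, n_neg, rng):
    """Create n_neg corrupted copies of a known triple by swapping in random targets.

    Assumption: targets that would yield another known triple are excluded.
    Raises IndexError if no valid replacement target exists.
    """
    drug, disease, _true_target = triple
    candidates = [t for t in all_targets if (drug, disease, t) not in known_triples]
    return [(drug, disease, rng.choice(candidates)) for _ in range(n_neg)]

# Hypothetical toy data.
known_triples = {("drugA", "disease1", "T1"), ("drugA", "disease1", "T2")}
all_targets = ["T1", "T2", "T3", "T4", "T5"]
rng = random.Random(0)  # fixed seed so the sketch is repeatable

negatives = sample_negatives(("drugA", "disease1", "T1"), all_targets, known_triples, 4, rng)
print(negatives)
```

Sampling here is with replacement, so the same negative may appear more than once when the pool of candidate targets is small.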
In some embodiments, the hyperbolic embeddings of all candidate targets and the representation of the drug-disease combination whose targets are to be inferred are fed into the hyperbolic triple decoder to obtain a similarity probability ranking of the candidate targets for the given drug and disease. Specifically, after the model is trained, for a drug-disease combination under test, the hyperbolic embeddings of the candidate targets to be matched, the hyperbolic embedding of the drug in the combination and the disease (number) are obtained, and this information is fed into the hyperbolic triple decoder to obtain a similarity probability score for each candidate target. The higher the score, the more closely the target is associated with the current drug-disease combination.
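The final ranking step amounts to sorting the candidate targets by their decoder scores. A minimal sketch follows, in which the scores are made-up numbers standing in for decoder outputs and `rank_candidates` is a hypothetical helper.

```python
def rank_candidates(scores):
    """Return candidate targets ordered from highest to lowest similarity score."""
    return sorted(scores, key=lambda target: scores[target], reverse=True)

# Made-up decoder scores for one drug-disease combination.
scores = {"T1": 0.12, "T2": 0.87, "T3": 0.45}
ranking = rank_candidates(scores)
print(ranking)
```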
Fig. 2 is a schematic processing block diagram of the hyperbolic-space-based drug-disease-target triple target entity completion method according to an embodiment of the present invention. In this method, the drug-disease-target triples and the Euclidean feature data of the drugs and targets are passed through the data preprocessing block, the hyperbolic drug encoder block, the hyperbolic target encoder block and the hyperbolic triple decoder block, finally yielding the interaction probability of each drug-target pair.
Euclidean-translation-based target completion methods cannot effectively capture the hierarchical structure implied by the heterogeneous drug-disease-target network. In contrast, the hyperbolic-space-based drug-disease-target triple target entity completion method provided by the embodiment of the invention builds on existing mathematical results about hyperbolic space: by introducing the Lorentz space, one of the isomorphic models of hyperbolic space, and the complete Lorentz linear transformation defined on it, the method effectively learns these implicit hierarchical structures and markedly improves candidate target prediction performance. The invention hypothesizes that, in the translation-based drug-disease-target triple target entity completion task, introducing hyperbolic space to capture the hidden hierarchical structure of the biological network behind the triples can improve model performance. In the framework described in the embodiment, introducing the Lorentz space into the triple decoder to perceive the heterogeneous biological network structure yields a clear improvement in target prediction performance, which supports the proposed hypothesis. The embodiment also designs several hyperbolic-space-based drug and target encoders that inject effective prior knowledge into the hyperbolic triple decoder and further improve its performance.
In some embodiments, other drug and target domain knowledge usable for the drug-disease-target triple target entity completion task can be identified and suitable hyperbolic space encoders designed for it, and the effect of different combinations of the hyperbolic encoders corresponding to these kinds of domain knowledge on the final model performance can be explored.
In some embodiments, in a heterogeneous network including complex interactions between biological entities and chemical entities, not only hierarchical structures but also other sub-structures, such as ring structures, may be included, so that other non-euclidean spaces, such as spherical spaces, which may be used exclusively for capturing such structures, may also be introduced in an attempt to adaptively capture corresponding sub-structures of the heterogeneous network.
The embodiment of the invention also provides an application of the medicine-disease-target triple target entity completion method, and the medicine-disease-target triple target entity completion method is applied to prediction of targets corresponding to medicines and diseases.
The drug-disease-target triple target entity completion method of the present invention is further explained below through a specific embodiment:
Example 1
First, the experimental environment and data are set up. To verify the hypothesis that introducing hyperbolic space is effective for the current task, the neural network model framework was built with Python 3.6.13 and PyTorch 1.10.2. The experiments were run on Linux 4.18 with an Intel Xeon Platinum 8360Y, 40G RAM and an NVIDIA A100-SXM, using the Riemannian Adam algorithm with a learning rate of 0.005 as the optimizer during training.
The data set is split by drug-target pair: all triples that share the same drug-target pair are assigned to the same set (training, validation or test). As a result, some of the drugs and targets in the test set do not appear in the training set, which gives a better check of the model's ability to infer targets. In other words, although a given drug-disease combination in the test set may also appear in the training set, the target to be inferred in the test set is always one that has never been seen together with the given drug.
In addition, the triples corresponding to 60%, 20% and 20% of all drug-target pair types are assigned to the training set, the validation set and the test set, respectively. This process is repeated five times; in each run the entire data set is randomly shuffled before splitting, so that the drug-target pair types assigned to each set differ from run to run. Each run uses the validation set to tune the model hyperparameters and evaluates the model on the test set, and the average evaluation results of the five independent runs are recorded. Table 1 lists the important parameters of the example model:
TABLE 1 Important parameters in the model
Figure BDA0003905465880000121
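The pair-grouped 60%:20%:20% split described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the embodiment's actual code: the function name `split_by_drug_target_pair`, the tuple layout (drug, disease, target) and the use of a seed per run are hypothetical, and the sketch shuffles the drug-target pair types rather than the raw triples.

```python
import random
from collections import defaultdict

def split_by_drug_target_pair(triples, ratios=(0.6, 0.2, 0.2), seed=0):
    """Split (drug, disease, target) triples so that all triples sharing a
    drug-target pair end up in the same subset."""
    groups = defaultdict(list)
    for drug, disease, target in triples:
        groups[(drug, target)].append((drug, disease, target))

    pairs = sorted(groups)              # fixed order before shuffling
    random.Random(seed).shuffle(pairs)  # a different seed gives a different split

    n_train = int(ratios[0] * len(pairs))
    n_valid = int(ratios[1] * len(pairs))
    buckets = {
        "train": pairs[:n_train],
        "valid": pairs[n_train:n_train + n_valid],
        "test": pairs[n_train + n_valid:],
    }
    # Expand each bucket of pairs back into its triples.
    return {name: [t for p in ps for t in groups[p]] for name, ps in buckets.items()}

# Toy data: 10 drug-target pairs, each linked to 2 diseases.
toy_triples = [(f"d{i}", f"dis{j}", f"t{i}") for i in range(10) for j in range(2)]
runs = [split_by_drug_target_pair(toy_triples, seed=s) for s in range(5)]
```

Repeating the call with five different seeds mirrors the five independent runs; because whole pairs are assigned to a subset, no drug-target pair in the test set can also occur in the training set.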
The experimental performance metrics are as follows. To verify the effectiveness of the hypothesis in the translation-based drug-disease-target triple target entity completion task, the experiment mainly uses MRR (Mean Reciprocal Rank), Hits@1, Hits@3 and Hits@10, which are commonly used in knowledge graph completion, to evaluate how well different methods predict candidate targets; the larger the value of each metric, the more accurate the model's predictions. The embodiment of the invention selects MRR as the primary evaluation metric because it is more robust than the Hits@k metrics.
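For reference, MRR and Hits@k can be computed from the rank of the true target among all candidates as in the sketch below. The helper names `rank_of_true_target` and `mrr_and_hits` are hypothetical, and the pessimistic handling of tied scores is an assumption; the embodiment does not state how ties or already known targets are treated.

```python
def rank_of_true_target(scores, true_target):
    """1-based rank of the true target when candidates are sorted by descending
    score; tied candidates are counted ahead of the true target (pessimistic)."""
    true_score = scores[true_target]
    return sum(1 for s in scores.values() if s >= true_score)

def mrr_and_hits(ranks, ks=(1, 3, 10)):
    """Mean reciprocal rank and Hits@k over a list of 1-based ranks."""
    n = len(ranks)
    mrr = sum(1.0 / r for r in ranks) / n
    hits = {k: sum(1 for r in ranks if r <= k) / n for k in ks}
    return mrr, hits

# Toy example: three test triples whose true targets were ranked 1st, 2nd and 4th.
example_rank = rank_of_true_target({"t1": 0.9, "t2": 0.5, "t3": 0.1}, "t2")
mrr, hits = mrr_and_hits([1, 2, 4])
```

Because MRR averages the reciprocal of every rank, it reflects the whole ranking rather than a single cut-off, which is consistent with its use as the primary metric.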
To further test the hypothesis proposed by the invention, the comparison in the examples includes DDTE, a representative Euclidean translation-based drug-disease-target triple entity completion method, as well as a fully Euclidean counterpart of the hyperbolic triple decoder of the embodiment. More specifically, all hyperbolic operations in the original model are replaced with Euclidean operations (for example, a complete Lorentz linear layer is replaced with a fully connected Euclidean layer) while the overall parameters are kept the same, so that the two differ mainly in whether hyperbolic operations and hyperbolic optimization are used. To further remove the uncontrollable influence of domain knowledge on model performance, it is first ensured that none of the compared methods uses a drug or target encoder (that is, no domain knowledge is added). Instead, the triple decoder obtains drug and target embeddings from self-learned embedding representations contained in the decoder; such embeddings are in effect learned directly from the structural information of the underlying heterogeneous network during model training.
Based on two representative entity embedding dimensions, 16 (Table 2) and 128 (Table 3) (16 is a representative hyperbolic embedding dimension and 128 is a common Euclidean embedding dimension), the present embodiment compares the standard method with the drug and target encoders removed against the two comparison methods mentioned above on a curated heterogeneous drug-disease-target network with hierarchical structure. As shown in Tables 2 and 3, the encoder-free standard method of this embodiment generally outperforms both DDTE and the Euclidean counterpart of the encoder-free standard method, neither of which uses domain knowledge. On the primary evaluation metric MRR, the Euclidean counterpart of the encoder-free standard method is clearly better than DDTE, and the encoder-free standard method obtains a further performance improvement of 5.5% (16 dimensions) and 5.8% (128 dimensions) over its Euclidean counterpart, which illustrates the feasibility and effectiveness of introducing hyperbolic space in the current task.
TABLE 2
Figure BDA0003905465880000131
TABLE 3
Figure BDA0003905465880000132
In addition, to demonstrate that the hyperbolic drug and target encoders provided in the embodiment are effective in introducing external domain knowledge into the current task, five variants with different combinations of hyperbolic encoders are added to the comparison, using the same settings as the above experiment and building on the encoder-free standard method:
a. adding the hyperbolic ECFP6 encoder and the hyperbolic target sequence similarity encoder (ECFP-SEQ);
b. adding the hyperbolic ECFP6 encoder and the hyperbolic GO-PPI network encoder (ECFP-NET);
c. adding only the hyperbolic ECFP6 encoder (ECFP-NONE);
d. adding only the hyperbolic target sequence similarity encoder (NON-SEQ);
e. adding only the hyperbolic GO-PPI network encoder (NON-NET).
The performance comparison results are shown in Table 4 (16 dimensions) and Table 5 (128 dimensions).
TABLE 4
Figure BDA0003905465880000141
TABLE 5
Figure BDA0003905465880000142
It can be seen that, in this embodiment, for either representative dimension, every hyperbolic encoder further improves candidate target prediction performance relative to the encoder-free standard method, and the ECFP-SEQ combination gives the largest improvement.
In summary, the invention provides a drug-disease-target triple target entity completion method and its application. Under the hypothesis that a suitable use of hyperbolic space improves the accuracy of the translation-based drug-disease-target triple target entity completion task, the invention introduces a Lorentz space, defines a complete Lorentz linear transformation layer in that space, designs a hyperbolic triple decoder based on this layer to capture the hidden hierarchical structure in the heterogeneous drug-disease-target network, and designs hyperbolic drug and target encoders that introduce relevant domain knowledge to further improve candidate target prediction. The experimental results show that, once interference from domain knowledge and experimental settings is excluded, the hyperbolic translation-based decoder performs better than the corresponding Euclidean translation-based decoder and gives better predictions. The experiments also show that domain knowledge does help further improve the prediction performance of the hyperbolic triple decoder in the current task.
It is to be understood that the invention is not limited to the examples described above. Those of ordinary skill in the art may make modifications and variations in light of the foregoing description, and all such modifications and variations are intended to fall within the scope of the invention as defined by the appended claims.

Claims (10)

1. A drug-disease-target triple target entity completion method, characterized by comprising the following steps:
constructing a heterogeneous network of drug, disease and target entities, and collecting feature data for the drug entities and the target entities in the heterogeneous network;
splitting the constructed heterogeneous network into a set of drug-disease-target triples according to triple generation logic;
encoding the collected feature data of the drug entities and the target entities in the heterogeneous network with a hyperbolic drug encoder and a hyperbolic target encoder, respectively, to generate a hyperbolic drug embedding lookup table and a hyperbolic target embedding lookup table;
feeding the drug-disease-target triples, the hyperbolic drug embedding lookup table, the hyperbolic target embedding lookup table and the disease numbers into a hyperbolic triple decoder, and computing, for each triple, the Lorentz distance similarity between the embedding of the target entity and the representation of the drug-disease combination, for training the model;
feeding the representation of the drug-disease combination to be tested and the hyperbolic embeddings of all candidate targets into the hyperbolic triple decoder to obtain a probability ranking of the candidate targets for the drug and disease to be tested.
2. The drug-disease-target triple target entity completion method according to claim 1, wherein constructing the heterogeneous network of drug, disease and target entities comprises: constructing a drug-disease-target heterogeneous network from known data, wherein the drug, disease and target entities in the known data serve as nodes of the heterogeneous network and the known relations between entities serve as edges between the nodes.
3. The method of claim 1, wherein Morgan extended-connectivity fingerprint feature data with a radius of 3 is collected for the drug entities in the heterogeneous network, and amino acid sequence similarity feature data and gene ontology-protein interaction network feature data are collected for the target entities.
4. The method of claim 1, wherein the triple generation logic is: the known relations among drug, disease and target entities are taken as edges of the drug-disease-target network, and if a group consisting of a drug, a disease and a target entity is connected by edges, the group forms a drug-disease-target triple.
5. The method according to claim 4, wherein the drug-disease-target heterogeneous network is split, according to the triple generation logic, into a set of triples that retains the implicit hierarchical information of the original network.
6. The method of claim 1, wherein the method is based on a hyperbolic space, the hyperbolic space is a Lorentz space, and the hyperbolic drug and target encoders and the hyperbolic triple decoder each operate on a Lorentz space with constant negative curvature.
7. The drug-disease-target triple target entity completion method according to claim 6, wherein a complete Lorentz linear transformation layer is defined in the Lorentz space as the counterpart of the linear transformation layer in Euclidean space; the complete Lorentz linear transformation layer changes the dimensions or positions of features/coordinates in the hyperbolic space while ensuring that its inputs and outputs remain in the hyperbolic space.
8. The drug-disease-target triple target entity completion method according to claim 7, wherein the formula of the complete Lorentz linear transformation layer is:
Figure FDA0003905465870000021
wherein λ is a fixed hyperparameter that controls the numerical scale during the operation; σ is the Sigmoid activation function;
Figure FDA0003905465870000022
and
Figure FDA0003905465870000023
are the weights of the transformation matrices within the layer; b is a trainable bias; e is a fixed value that ensures that
Figure FDA0003905465870000024
is greater than 0; c is the negative curvature of the Lorentz space; x and y are the input and output, respectively, of the complete Lorentz linear transformation layer.
9. The method according to claim 1, wherein after training of the model is completed, for a drug-disease combination to be tested, the hyperbolic embeddings of the candidate targets to be matched and the hyperbolic embedding of the drug are obtained from the model together with the corresponding disease number, and this information is fed into the hyperbolic triple decoder to obtain a similarity probability score for each candidate target.
10. Use of the drug-disease-target triple target entity completion method according to any one of claims 1 to 9 for predicting the targets corresponding to drugs and diseases.
CN202211302748.8A 2022-10-24 2022-10-24 Medicine-disease-target triple target entity completion method and application Pending CN115563312A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211302748.8A CN115563312A (en) 2022-10-24 2022-10-24 Medicine-disease-target triple target entity completion method and application

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211302748.8A CN115563312A (en) 2022-10-24 2022-10-24 Medicine-disease-target triple target entity completion method and application

Publications (1)

Publication Number Publication Date
CN115563312A true CN115563312A (en) 2023-01-03

Family

ID=84746567

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211302748.8A Pending CN115563312A (en) 2022-10-24 2022-10-24 Medicine-disease-target triple target entity completion method and application

Country Status (1)

Country Link
CN (1) CN115563312A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116842958A (en) * 2023-09-01 2023-10-03 北京邮电大学 Time sequence knowledge graph completion method, entity prediction method based on time sequence knowledge graph completion method and device thereof
CN116842958B (en) * 2023-09-01 2024-02-06 北京邮电大学 Time sequence knowledge graph completion method, entity prediction method based on time sequence knowledge graph completion method and device thereof

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination