CN116129989A - Method, device, terminal equipment and medium for predicting drug relevance - Google Patents

Info

Publication number
CN116129989A
Authority
CN
China
Prior art keywords
lncrna
feature vector
detected
target drug
drug
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310157524.0A
Other languages
Chinese (zh)
Inventor
邓磊
胡小文
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Central South University
Original Assignee
Central South University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Central South University
Priority to CN202310157524.0A
Publication of CN116129989A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B: BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B15/00: ICT specially adapted for analysing two-dimensional or three-dimensional molecular structures, e.g. structural or functional relations or structure alignment
    • G16B15/30: Drug targeting using structural data; Docking or binding prediction
    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B: BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00: ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A: TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00: Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10: Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Landscapes

  • Life Sciences & Earth Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Medical Informatics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Biotechnology (AREA)
  • General Health & Medical Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • Chemical & Material Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biophysics (AREA)
  • Evolutionary Biology (AREA)
  • Medicinal Chemistry (AREA)
  • Pharmacology & Pharmacy (AREA)
  • Crystallography & Structural Chemistry (AREA)
  • Artificial Intelligence (AREA)
  • Bioethics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Epidemiology (AREA)
  • Evolutionary Computation (AREA)
  • Public Health (AREA)
  • Software Systems (AREA)
  • Medical Treatment And Welfare Office Work (AREA)

Abstract

The application belongs to the technical field of bioinformatics and provides a method, a device, terminal equipment and a medium for predicting drug relevance. A bipartite graph is constructed from association pairs formed by lncRNAs and drugs, and vector representations of the lncRNAs and the drugs are initialized; neighbor node aggregation is performed on the vector representations using a neural network model to obtain initial feature vectors of the lncRNAs and the drugs; local structure neighbors and global semantic neighbors are constructed, and a first contrast learning loss and a second contrast learning loss are constructed from the initial feature vectors; the initial feature vectors are updated according to the contrast learning losses combined with a BPR loss function to obtain intermediate feature vectors of the lncRNAs and the drugs; if the intermediate feature vectors meet the update termination condition, a relevance prediction model is constructed from them and used to predict the relevance between the lncRNAs and the drugs. The method and the device can improve the accuracy of predicting the relevance between lncRNAs and drugs.

Description

Method, device, terminal equipment and medium for predicting drug relevance
Technical Field
The application belongs to the technical field of bioinformatics, and particularly relates to a method, a device, terminal equipment and a medium for predicting drug relevance.
Background
In recent years, there has been increasing evidence that ncRNAs (non-coding RNAs, a class of functional RNA molecules) can affect drug efficacy by modulating genes associated with drug sensitivity, for example by inducing alternative signaling pathways. Long non-coding RNA (lncRNA, a type of ncRNA) is an RNA molecule more than 200 nucleotides in length. lncRNAs play a key role in many biological processes such as epigenetic regulation, cell cycle regulation, cell differentiation, transcriptional and post-transcriptional regulation, and genomic splicing.
Numerous studies have shown that lncRNAs regulate human disease by acting together with a range of biomolecules in organisms. Their mutation and dysfunction are closely related to human diseases such as nervous system diseases, blood diseases, cardiovascular diseases and various cancers. With the development of sequencing technology, more and more lncRNA molecules are being detected and analyzed with increasing sensitivity and depth, especially with respect to their role in drug sensitivity. Studies show that lncRNAs can regulate drug-sensitivity-related genes and induce alternative signaling pathways, thereby influencing drug efficacy. For example, lncRNA NORAD (non-coding RNA activated by DNA damage) inhibits the proliferation of osteosarcoma HOS/DDP cells (a cisplatin-resistant human osteosarcoma cell line) and increases their sensitivity to cisplatin by targeting miR-410-3p. In gallbladder cancer, the lncRNA GBCDRlnc1 is a key regulatory factor of the sensitivity of the cancer cells to chemotherapy. Identifying associations between lncRNAs and drug sensitivity is therefore of great significance for drug development. However, traditional methods based on biological assays tend to consume a great deal of time and labor and are largely undirected, so that associations between lncRNAs and drugs are predicted inaccurately.
Disclosure of Invention
The embodiment of the application provides a method, a device, terminal equipment and a medium for predicting the relevance of a drug, which can solve the problem that the prediction of the relevance of the lncRNA and the drug is inaccurate at present.
In a first aspect, an embodiment of the present application provides a method for predicting drug association, including:
step 1, constructing an association bipartite graph according to association pairs formed by the lncRNA to be detected and the target drug, and randomly initializing the vector representation of the lncRNA to be detected and the vector representation of the target drug respectively;
step 2, running a neural network model on the association bipartite graph, and performing neighbor node aggregation on the vector representation of the lncRNA to be detected and the vector representation of the target drug respectively to obtain an initial feature vector of the lncRNA to be detected and an initial feature vector of the target drug;
step 3, constructing local structure neighbors of the association pairs based on the association bipartite graph, and constructing a first contrast learning loss according to the initial feature vector of the lncRNA to be detected and the initial feature vector of the target drug;
step 4, constructing global semantic neighbors of the association pairs based on the association bipartite graph, and constructing a second contrast learning loss according to the initial feature vector of the lncRNA to be detected and the initial feature vector of the target drug;
step 5, calculating a comprehensive loss according to the first contrast learning loss, the second contrast learning loss and the BPR loss function, and updating the initial feature vector of the lncRNA to be detected and the initial feature vector of the target drug by back propagation using the comprehensive loss to obtain an intermediate feature vector of the lncRNA to be detected and an intermediate feature vector of the target drug;
step 6, if the intermediate feature vector of the lncRNA to be detected and the intermediate feature vector of the target drug meet the preset update termination condition, taking the intermediate feature vector of the lncRNA to be detected as the final feature vector of the lncRNA to be detected and taking the intermediate feature vector of the target drug as the final feature vector of the target drug; otherwise,
taking the intermediate feature vector of the lncRNA to be detected as the vector representation of the lncRNA to be detected in step 2, taking the intermediate feature vector of the target drug as the vector representation of the target drug in step 2, and returning to step 2;
step 7, constructing a relevance prediction model according to the final feature vector of the lncRNA to be detected and the final feature vector of the target drug;
step 8, predicting the relevance between the lncRNA to be detected and the target drug by using the relevance prediction model.
Optionally, the neural network model in step 2 is a graph convolutional neural network model.
Optionally, in step 2, running a neural network model on the association bipartite graph and performing neighbor node aggregation on the vector representation of the lncRNA to be detected and the vector representation of the target drug respectively, to obtain an initial feature vector of the lncRNA to be detected and an initial feature vector of the target drug, includes:
by the calculation formulas
Figure BDA0004092975270000031
Figure BDA0004092975270000032
obtaining the initial feature vector $e_n$ of the lncRNA to be detected and the initial feature vector $e_d$ of the target drug; where $N_n$ denotes the set of neighbor nodes of the lncRNA to be detected, $N_d$ denotes the set of neighbor nodes of the target drug, $e_n^{(l)}$ denotes the embedding of the node vector of the lncRNA to be detected at layer $l$ of the graph convolutional neural network, $e_d^{(l)}$ denotes the embedding of the node vector of the target drug at layer $l$ of the graph convolutional neural network, $L$ denotes the total number of layers of the graph convolutional neural network, $e_n^{(l+1)}$ denotes the embedding of the node vector of the lncRNA to be detected at layer $l+1$ of the graph convolutional neural network, and $e_d^{(l+1)}$ denotes the embedding of the node vector of the target drug at layer $l+1$ of the graph convolutional neural network.
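For orientation, the symbol definitions given here are consistent with a LightGCN-style propagation rule. The following is a sketch under that assumption and not a transcription of the formula images; the symmetric normalization $1/\sqrt{|N_n|\,|N_d|}$ and the uniform average over layers are assumptions of this sketch:

$$
e_n^{(l+1)} = \sum_{d \in N_n} \frac{1}{\sqrt{|N_n|\,|N_d|}}\, e_d^{(l)}, \qquad
e_d^{(l+1)} = \sum_{n \in N_d} \frac{1}{\sqrt{|N_d|\,|N_n|}}\, e_n^{(l)}
$$

$$
e_n = \frac{1}{L+1} \sum_{l=0}^{L} e_n^{(l)}, \qquad
e_d = \frac{1}{L+1} \sum_{l=0}^{L} e_d^{(l)}
$$

Under this reading, each layer mixes a node's embedding with those of its direct neighbors on the bipartite graph, and the initial feature vector combines the outputs of all layers.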
Optionally, in step 3, constructing the first contrast learning loss according to the initial feature vector of the lncRNA to be detected and the initial feature vector of the target drug includes:
by the calculation formula
Figure BDA0004092975270000037
obtaining the local structure neighbor learning loss of the lncRNA to be detected (Figure BDA0004092975270000038) and the local structure neighbor learning loss of the target drug (Figure BDA0004092975270000039); where $e_n^{(k)}$ denotes the output of the initial feature vector $e_n$ of the lncRNA to be detected at the $k$-th layer of the graph convolutional neural network, $e_d^{(k)}$ denotes the output of the initial feature vector $e_d$ of the target drug at the $k$-th layer of the graph convolutional neural network, $k$ is an even number, $\tau$ denotes the hyperparameter of the softmax function, $e_n^{(0)}$ denotes the vector representation of the lncRNA to be detected, $e_d^{(0)}$ denotes the vector representation of the target drug, n_num denotes the total number of lncRNAs obtained in step 1, d_num denotes the total number of drugs obtained in step 1, $e_i^{(0)}$ denotes the vector representation of the $i$-th lncRNA at layer 0 of the graph convolutional neural network, $e_j^{(0)}$ denotes the vector representation of the $j$-th drug at layer 0 of the graph convolutional neural network, $i = 1, 2, \ldots,$ n_num, and $j = 1, 2, \ldots,$ d_num;
by the calculation formula
Figure BDA0004092975270000041
obtaining the first contrast learning loss $L_{local}$; where $\alpha$ denotes a hyperparameter for balancing the weights.
Optionally, in step 4, constructing the second contrast learning loss according to the initial feature vector of the lncRNA to be detected and the initial feature vector of the target drug includes:
by the calculation formula
Figure BDA0004092975270000042
obtaining the global semantic neighbor contrast learning loss of the lncRNA to be detected (Figure BDA0004092975270000043) and the global semantic neighbor contrast learning loss of the target drug (Figure BDA0004092975270000044); where $c_i$ denotes the prototype of the lncRNA to be detected, $c_j$ denotes the prototype of the drug, and $C$ denotes the set of prototypes;
by the calculation formula
Figure BDA0004092975270000045
obtaining the second contrast learning loss $L_{global}$; where $\beta$ denotes a hyperparameter for balancing the weights.
Optionally, step 5 includes:
by the calculation formulas
$L = L_{BPR} + \lambda_1 L_{local} + \lambda_2 L_{global} + \lambda_3 \|\theta\|_2$
Figure BDA0004092975270000046
obtaining the comprehensive loss $L$; where $\lambda_1$, $\lambda_2$ and $\lambda_3$ all denote hyperparameters for balancing the weights, $\theta$ denotes the parameters of the graph convolutional neural network, $\sigma$ denotes a nonlinear activation function, $\tau$ denotes the paired training data,
Figure BDA0004092975270000047
Figure BDA0004092975270000048
Figure BDA0004092975270000049
indicates that there is an association between the lncRNA to be detected $n$ and the target drug $d^+$, and
Figure BDA00040929752700000410
indicates that there is no association between the lncRNA to be detected $n$ and the sampled drug $d^-$;
using the comprehensive loss $L$, updating the initial feature vector $e_n$ of the lncRNA to be detected and the initial feature vector $e_d$ of the target drug by back propagation to obtain the intermediate feature vector $e_n'$ of the lncRNA to be detected and the intermediate feature vector $e_d'$ of the target drug.
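As a minimal sketch of how the comprehensive loss combines its terms, the function below adds the BPR loss, the two weighted contrast learning losses and a weighted regularization term. The function name, the argument names and the reading of the regularizer as the (unsquared) L2 norm of the parameters are assumptions made for illustration; the text does not say whether the norm is squared.

```python
import math


def composite_loss(l_bpr, l_local, l_global, params, lam1, lam2, lam3):
    """Combine the loss terms: L = L_BPR + lam1*L_local + lam2*L_global + lam3*||theta||_2.

    `params` is a flat list of model parameters (hypothetical representation).
    The regularizer is taken as the unsquared L2 norm, which is an assumption.
    """
    l2_norm = math.sqrt(sum(p * p for p in params))
    return l_bpr + lam1 * l_local + lam2 * l_global + lam3 * l2_norm


# Example with made-up numbers: the norm of [3.0, 4.0] is 5.0.
total = composite_loss(1.0, 2.0, 3.0, [3.0, 4.0], lam1=0.5, lam2=0.1, lam3=0.01)
print(total)
```

In an actual training loop, an automatic differentiation framework would compute this quantity from the current embeddings and back-propagate it; the scalar arithmetic above only shows how the weights enter.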
Optionally, the expression of the relevance prediction model in step 7 is as follows:
Figure BDA00040929752700000411
where
Figure BDA0004092975270000051
denotes the association score between the lncRNA to be detected $n$ and the target drug $d$.
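The exact expression of the prediction model is not recoverable from this text. A common choice for embedding-based association prediction is the inner product of the two final feature vectors, and the sketch below assumes that choice. The function names and the example vectors are hypothetical.

```python
def association_score(e_n, e_d):
    """Score an lncRNA-drug pair as the inner product of their feature vectors
    (an assumed scoring function, not confirmed by the source)."""
    return sum(a * b for a, b in zip(e_n, e_d))


def rank_drugs(e_n, drug_embeddings):
    """Return drug ids sorted from highest to lowest association score."""
    return sorted(
        drug_embeddings,
        key=lambda d: association_score(e_n, drug_embeddings[d]),
        reverse=True,
    )


# Hypothetical final feature vectors for one lncRNA and two drugs.
lnc_vector = [1.0, 0.0]
drugs = {"drugA": [0.2, 0.9], "drugB": [0.8, 0.1]}
print(rank_drugs(lnc_vector, drugs))
```

Ranking all candidate drugs for one lncRNA in this way gives a prioritized list that can be checked against known associations.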
Optionally, before performing step 6, the prediction method further includes:
calculating an AUC value and an AUPR value according to the intermediate feature vector of the lncRNA to be detected and the intermediate feature vector of the target drug;
if the AUC value and the AUPR value reach their maximum values, determining that the intermediate feature vector of the lncRNA to be detected and the intermediate feature vector of the target drug meet the preset update termination condition; otherwise,
determining that the intermediate feature vector of the lncRNA to be detected and the intermediate feature vector of the target drug do not meet the preset update termination condition.
In a second aspect, embodiments of the present application provide a device for predicting drug association, including:
the initialization module is used for constructing an association bipartite graph according to pre-obtained association pairs formed by the lncRNA to be detected and the target drug, and randomly initializing the vector representation of the lncRNA to be detected and the vector representation of the target drug respectively;
the aggregation module is used for running a neural network model on the association bipartite graph, and performing neighbor node aggregation on the vector representation of the lncRNA to be detected and the vector representation of the target drug respectively to obtain an initial feature vector of the lncRNA to be detected and an initial feature vector of the target drug;
the first contrast learning loss module is used for constructing local structure neighbors of the association pairs based on the association bipartite graph, and constructing a first contrast learning loss according to the initial feature vector of the lncRNA to be detected and the initial feature vector of the target drug;
the second contrast learning loss module is used for constructing global semantic neighbors of the association pairs based on the association bipartite graph, and constructing a second contrast learning loss according to the initial feature vector of the lncRNA to be detected and the initial feature vector of the target drug;
the intermediate feature vector module is used for calculating a comprehensive loss according to the first contrast learning loss, the second contrast learning loss and the BPR loss function, and updating the initial feature vector of the lncRNA to be detected and the initial feature vector of the target drug using the comprehensive loss to obtain the intermediate feature vector of the lncRNA to be detected and the intermediate feature vector of the target drug;
the final feature vector module is used for judging whether the intermediate feature vector of the lncRNA to be detected and the intermediate feature vector of the target drug meet the preset update termination condition; if they do, the intermediate feature vector of the lncRNA to be detected is used as the final feature vector of the lncRNA to be detected, and the intermediate feature vector of the target drug is used as the final feature vector of the target drug; otherwise,
the intermediate feature vector of the lncRNA to be detected is used as the vector representation of the lncRNA to be detected in the aggregation module, the intermediate feature vector of the target drug is used as the vector representation of the target drug in the aggregation module, and execution returns to the aggregation module;
the prediction model module is used for constructing a relevance prediction model according to the final feature vector of the lncRNA to be detected and the final feature vector of the target drug;
the prediction module is used for predicting the relevance between the lncRNA to be detected and the target drug by using the relevance prediction model.
In a third aspect, an embodiment of the present application provides a terminal device, including a memory, a processor, and a computer program stored in the memory and capable of running on the processor, where the processor implements the method for predicting drug relevance described above when executing the computer program.
In a fourth aspect, embodiments of the present application provide a computer readable storage medium storing a computer program, which when executed by a processor, implements a method for predicting drug relevance as described above.
The scheme of the application has the following beneficial effects:
in some embodiments of the present application, according to an initial feature vector of a lncRNA to be detected and an initial feature vector of a target drug, a first contrast learning loss and a second contrast learning loss are constructed, and then according to the first contrast learning loss, the second contrast learning loss and the BPR loss, the initial feature vector of the lncRNA to be detected and the initial feature vector of the target drug are respectively updated by back propagation, so that a more accurate feature vector can be obtained, thereby improving the accuracy of prediction of relevance between the lncRNA to be detected and the target drug.
Other advantages of the present application will be described in detail in the detailed description section that follows.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the following description will briefly introduce the drawings that are needed in the embodiments or the description of the prior art, it is obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flowchart of a method for predicting drug relevance according to an embodiment of the present disclosure;
FIG. 2a shows ROC curves comparing the performance of the method for predicting drug relevance provided by an embodiment of the present application with that of prior-art methods;
FIG. 2b is a visualization comparing the aggregation of drug node feature vectors obtained by the method for predicting drug relevance provided by an embodiment of the present application with that obtained by prior-art methods;
FIG. 3 is a schematic structural diagram of a device for predicting drug association according to an embodiment of the present disclosure;
FIG. 4 is a schematic structural diagram of a terminal device according to an embodiment of the present application.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system configurations, techniques, etc. in order to provide a thorough understanding of the embodiments of the present application. It will be apparent, however, to one skilled in the art that the present application may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present application with unnecessary detail.
It should be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It should also be understood that the term "and/or" as used in this specification and the appended claims refers to any and all possible combinations of one or more of the associated listed items, and includes such combinations.
As used in this specification and the appended claims, the term "if" may be interpreted as "when" or "once" or "in response to determining" or "in response to detecting", depending on the context. Similarly, the phrase "if it is determined" or "if [a described condition or event] is detected" may be interpreted, depending on the context, as "upon determining" or "in response to determining" or "upon detecting [the described condition or event]" or "in response to detecting [the described condition or event]".
In addition, in the description of the present application and the appended claims, the terms "first," "second," "third," and the like are used merely to distinguish between descriptions and are not to be construed as indicating or implying relative importance.
Reference in the specification to "one embodiment" or "some embodiments" or the like means that a particular feature, structure, or characteristic described in connection with the embodiment is included in one or more embodiments of the application. Thus, appearances of the phrases "in one embodiment," "in some embodiments," "in other embodiments," and the like in the specification are not necessarily all referring to the same embodiment, but mean "one or more but not all embodiments" unless expressly specified otherwise. The terms "comprising," "including," "having," and variations thereof mean "including but not limited to," unless expressly specified otherwise.
Aiming at the problem that the prediction of the relevance between the lncRNA and the drug is inaccurate at present, the application provides a method, a device, a terminal device and a medium for predicting the relevance between the drug, wherein a first contrast learning loss and a second contrast learning loss are constructed according to an initial feature vector of the lncRNA to be detected and an initial feature vector of a target drug, and then the initial feature vector of the lncRNA to be detected and the initial feature vector of the target drug are respectively subjected to back propagation update according to the first contrast learning loss, the second contrast learning loss and the BPR loss, so that more accurate feature vectors can be obtained, and the accuracy of the prediction of the relevance between the lncRNA to be detected and the target drug is improved.
As shown in fig. 1, the method for predicting drug association provided in the present application mainly includes the following steps:
Step 1, constructing an association bipartite graph according to association pairs formed by the lncRNA to be detected and the target drug, and randomly initializing the vector representation of the lncRNA to be detected and the vector representation of the target drug respectively.
In the embodiments of the present application, the association pairs of the lncRNA to be detected and the target drug can be obtained from the RNAactDrug database (which contains RNAs related to drug sensitivity derived from multi-omics data). The lncRNA to be detected is any one of the obtained lncRNAs, and the target drug is any one of the obtained drugs.
It should be noted that, in the embodiments of the present application, the bipartite graph may be constructed by any common bipartite graph construction method.
The vector representation of the lncRNA to be detected and the vector representation of the target drug are randomly initialized so that more accurate feature vectors can be obtained later. Without this operation, the feature vector of the lncRNA to be detected and the feature vector of the target drug would always be zero, and updating the feature vectors would be meaningless.
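Step 1 can be sketched as follows, assuming the association pairs are available as (lncRNA id, drug id) tuples. The function names, the Gaussian initialization, its scale and the example identifiers are illustrative assumptions and are not taken from the source.

```python
import random
from collections import defaultdict


def build_bipartite_graph(pairs):
    """Build neighbor sets for both sides of an lncRNA-drug bipartite graph."""
    lnc_neighbors = defaultdict(set)   # lncRNA id -> associated drug ids
    drug_neighbors = defaultdict(set)  # drug id -> associated lncRNA ids
    for lnc, drug in pairs:
        lnc_neighbors[lnc].add(drug)
        drug_neighbors[drug].add(lnc)
    return dict(lnc_neighbors), dict(drug_neighbors)


def init_embeddings(node_ids, dim, seed=0, scale=0.1):
    """Give every node a small random vector so that updates are not stuck at zero."""
    rng = random.Random(seed)
    return {node: [rng.gauss(0.0, scale) for _ in range(dim)]
            for node in sorted(node_ids)}


# Hypothetical association pairs.
pairs = [("lnc1", "drugA"), ("lnc1", "drugB"), ("lnc2", "drugB")]
lnc_nb, drug_nb = build_bipartite_graph(pairs)
lnc_emb = init_embeddings(lnc_nb, dim=4)
drug_emb = init_embeddings(drug_nb, dim=4, seed=1)
print(lnc_nb, drug_nb)
```

Because edges only connect an lncRNA to a drug, storing one neighbor dictionary per side is enough to represent the whole graph.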
Step 2, running a neural network model on the association bipartite graph, and performing neighbor node aggregation on the vector representation of the lncRNA to be detected and the vector representation of the target drug respectively to obtain an initial feature vector of the lncRNA to be detected and an initial feature vector of the target drug.
In some embodiments of the present application, the neural network model is a graph convolutional neural network model.
Specifically, by the calculation formulas
Figure BDA0004092975270000091
Figure BDA0004092975270000092
the initial feature vector $e_n$ of the lncRNA to be detected and the initial feature vector $e_d$ of the target drug are obtained; where $N_n$ denotes the set of neighbor nodes of the lncRNA to be detected, $N_d$ denotes the set of neighbor nodes of the target drug, $e_n^{(l)}$ denotes the embedding of the node vector of the lncRNA to be detected at layer $l$ of the graph convolutional neural network, $e_d^{(l)}$ denotes the embedding of the node vector of the target drug at layer $l$ of the graph convolutional neural network, $L$ denotes the total number of layers of the graph convolutional neural network, $e_n^{(l+1)}$ denotes the embedding of the node vector of the lncRNA to be detected at layer $l+1$ of the graph convolutional neural network, and $e_d^{(l+1)}$ denotes the embedding of the node vector of the target drug at layer $l+1$ of the graph convolutional neural network.
It should be noted that the initial feature vector $e_n$ of the lncRNA to be detected and the initial feature vector $e_d$ of the target drug are obtained in the layer aggregation stage by aggregating, with a weighted sum, the vector representation $e_n^{(l)}$ of the lncRNA node to be detected and the vector representation $e_d^{(l)}$ of the target drug at each graph convolution layer, which can improve the accuracy of the vector representations.
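The neighbor aggregation and the layer aggregation described in step 2 can be sketched in plain Python as below. This is an illustrative sketch only: the symmetric degree normalization and the uniform weights in the layer-wise sum are assumptions, and `propagate` and its arguments are hypothetical names.

```python
import math


def propagate(lnc_emb, drug_emb, lnc_nb, drug_nb, num_layers):
    """Run num_layers rounds of neighbor aggregation and average all layers.

    lnc_emb / drug_emb map node id -> layer-0 vector; lnc_nb / drug_nb map
    node id -> set of neighbor ids on the bipartite graph.
    """
    def aggregate(targets, target_nb, source_prev, source_nb, dim):
        out = {}
        for t in targets:
            acc = [0.0] * dim
            for s in target_nb.get(t, ()):
                # Symmetric degree normalization (an assumption of this sketch).
                w = 1.0 / math.sqrt(len(target_nb[t]) * len(source_nb[s]))
                acc = [a + w * x for a, x in zip(acc, source_prev[s])]
            out[t] = acc
        return out

    dim = len(next(iter(lnc_emb.values())))
    lnc_layers, drug_layers = [lnc_emb], [drug_emb]
    for _ in range(num_layers):
        # Both sides are computed from the previous layer before either is stored.
        next_lnc = aggregate(lnc_emb, lnc_nb, drug_layers[-1], drug_nb, dim)
        next_drug = aggregate(drug_emb, drug_nb, lnc_layers[-1], lnc_nb, dim)
        lnc_layers.append(next_lnc)
        drug_layers.append(next_drug)

    def average(layers):
        # Uniform weights over layers 0..L (an assumption of this sketch).
        return {k: [sum(layer[k][i] for layer in layers) / len(layers)
                    for i in range(dim)] for k in layers[0]}

    return average(lnc_layers), average(drug_layers), lnc_layers, drug_layers


# Tiny hypothetical graph with one lncRNA linked to one drug.
e_n, e_d, lnc_layers, drug_layers = propagate(
    {"n1": [1.0, 0.0]}, {"d1": [0.0, 2.0]},
    {"n1": {"d1"}}, {"d1": {"n1"}}, num_layers=1)
print(e_n, e_d)
```

The per-layer outputs are returned as well, since the contrast learning losses in the later steps use the outputs of individual layers rather than only the aggregated vectors.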
Step 3, constructing local structure neighbors of the association pairs based on the association bipartite graph, and constructing a first contrast learning loss according to the initial feature vector of the lncRNA to be detected and the initial feature vector of the target drug.
The local structure neighbors are the lncRNA nodes to be detected or the target drug nodes that are adjacent in the spatial structure of the association bipartite graph of step 1.
It should be noted that the local structure neighbors are used to describe the high-order associations in the association bipartite graph.
The feature vector $e^{(l)}$ of layer $l$ in the graph convolutional neural network is the weighted sum of the $l$-hop neighbors of each node (lncRNA node to be detected or target drug node).
Since information propagation over an even number of hops on a bipartite graph naturally aggregates information from homogeneous structural neighbors (structural neighbor nodes of the same type), the embeddings of homogeneous neighbors (nodes of the same type) can be obtained from the outputs of the even layers of the graph convolutional network, and the obtained feature vectors are then used to model the higher-order relationships between local neighbor nodes.
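One way to turn the even-layer outputs into a contrast learning loss is an InfoNCE-style objective in which each node's even-layer output is pulled toward its own layer-0 vector and pushed away from the layer-0 vectors of other nodes of the same type. The sketch below assumes that form, cosine similarity as the similarity measure, and a sum of the lncRNA-side and drug-side terms weighted by `alpha`; all function and argument names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity, returning 0.0 when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0


def structure_contrastive_loss(even_layer, layer0, tau=0.1):
    """InfoNCE-style loss between even-layer outputs and layer-0 vectors.

    Both arguments map node id -> vector for nodes of one type.
    """
    total = 0.0
    for i, z in even_layer.items():
        logits = {j: cosine(z, e0) / tau for j, e0 in layer0.items()}
        m = max(logits.values())
        # Log-sum-exp with the maximum subtracted for numerical stability.
        log_denom = m + math.log(sum(math.exp(x - m) for x in logits.values()))
        total += log_denom - logits[i]
    return total


def local_loss(lnc_even, lnc0, drug_even, drug0, alpha=1.0, tau=0.1):
    """Combine the lncRNA-side and drug-side terms; alpha balances the two."""
    return (structure_contrastive_loss(lnc_even, lnc0, tau)
            + alpha * structure_contrastive_loss(drug_even, drug0, tau))


layer0 = {"a": [1.0, 0.0], "b": [0.0, 1.0]}
aligned = {"a": [1.0, 0.0], "b": [0.0, 1.0]}
swapped = {"a": [0.0, 1.0], "b": [1.0, 0.0]}
print(structure_contrastive_loss(aligned, layer0),
      structure_contrastive_loss(swapped, layer0))
```

The loss is small when each even-layer output is closest to its own layer-0 vector and large when it is closer to another node's vector, which is the behavior the example with `aligned` and `swapped` illustrates.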
Step 4, constructing global semantic neighbors of the association pairs based on the association bipartite graph, and constructing a second contrast learning loss according to the initial feature vector of the lncRNA to be detected and the initial feature vector of the target drug.
The global semantic neighbors are the lncRNA nodes to be detected or the target drug nodes that, in the association bipartite graph of step 1, are not adjacent in spatial structure but may still be related (nodes with similar roles but no direct association on the bipartite graph). The global semantic neighbors are constructed mainly to alleviate the influence of data sparsity on the experimental results and to reduce the influence on the prediction of the noise generated when the local structure neighbors are constructed.
To construct a suitable global semantic neighbor contrast learning objective, in some embodiments of the present application a prototype contrast learning objective is developed in which a latent prototype is learned for each node (lncRNA node to be detected or target drug node) in order to identify its global semantic neighbors. Since similar lncRNA nodes to be detected or target drug nodes are more likely to be located close together in the feature space, a prototype can be defined as the center of a cluster formed by a group of semantic neighbors; that is, the latent prototypes can be learned by a clustering algorithm (e.g., the k-nearest neighbor algorithm).
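As an illustration of learning prototypes as cluster centers, the sketch below runs a small k-means loop over node feature vectors. K-means is used here as an assumed stand-in clustering algorithm and is not named in the text; the function name, parameters and example vectors are hypothetical.

```python
import random


def kmeans(vectors, k, iters=20, seed=0):
    """Cluster node vectors; return (centers, assignment of node id -> cluster).

    Each center plays the role of a prototype for the nodes assigned to it.
    """
    rng = random.Random(seed)
    ids = sorted(vectors)
    centers = [list(vectors[i]) for i in rng.sample(ids, k)]
    assign = {}
    for _ in range(iters):
        # Assignment step: each node goes to its nearest center.
        for i in ids:
            assign[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2
                                  for a, b in zip(vectors[i], centers[c])),
            )
        # Update step: each center moves to the mean of its members.
        for c in range(k):
            members = [vectors[i] for i in ids if assign[i] == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return centers, assign


# Hypothetical feature vectors forming two well-separated groups.
vecs = {"x1": [0.0, 0.0], "x2": [0.0, 1.0], "y1": [10.0, 10.0], "y2": [10.0, 11.0]}
centers, assign = kmeans(vecs, k=2)
print(assign)
```

Nodes that share a cluster would then be treated as global semantic neighbors, and the cluster center would serve as the prototype that the node's feature vector is contrasted against.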
Step 5, calculating a comprehensive loss according to the first contrast learning loss, the second contrast learning loss and the BPR loss function, and back-propagating the comprehensive loss to update the initial feature vector of the lncRNA to be detected and the initial feature vector of the target drug respectively, thereby obtaining the intermediate feature vector of the lncRNA to be detected and the intermediate feature vector of the target drug.
The BPR (Bayesian Personalized Ranking) loss function is expressed as follows:
Figure BDA0004092975270000101
where σ is a nonlinear activation function; the expression shown in Figure BDA0004092975270000102 and Figure BDA0004092975270000103 represents the training data pairs; Figure BDA0004092975270000104 indicates that an association has been observed between the lncRNA to be tested n and the target drug $d^{+}$; and Figure BDA0004092975270000105 indicates that the sampled drug $d^{-}$ (a randomly selected drug node that is not associated with lncRNA n) has no experimentally verified association with the lncRNA to be tested n.
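The formula images are not reproduced in this text. As a reading aid only, the widely used form of the BPR loss that is consistent with the definitions above can be written as follows. The symbols $\mathcal{O}$, $\mathcal{R}^{+}$, $\mathcal{R}^{-}$ and $\hat{y}$ are assumptions borrowed from common usage and may differ from the notation in the filing.

```latex
L_{BPR} = \sum_{(n,\, d^{+},\, d^{-}) \in \mathcal{O}}
          -\log \sigma\!\left( \hat{y}_{n,d^{+}} - \hat{y}_{n,d^{-}} \right),
\qquad
\mathcal{O} = \left\{ (n, d^{+}, d^{-}) \;\middle|\;
              (n, d^{+}) \in \mathcal{R}^{+},\ (n, d^{-}) \in \mathcal{R}^{-} \right\}
```

Here $\hat{y}_{n,d}$ stands for the predicted association score of lncRNA n and drug d, $\mathcal{R}^{+}$ for the observed associations and $\mathcal{R}^{-}$ for the sampled, unverified pairs.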
In some embodiments of the present application, after step 5 is performed, whether the intermediate feature vectors meet the update termination condition is determined as follows:
Step a, calculating an AUC value and an AUPR value respectively according to the intermediate feature vector of the lncRNA to be detected and the intermediate feature vector of the target drug.
The AUC value (the area under the ROC (receiver operating characteristic) curve) and the AUPR value (the area under the PR (precision-recall) curve) are calculated here to determine whether the current intermediate feature vector of the lncRNA to be tested and the current intermediate feature vector of the target drug satisfy the update termination condition (that is, whether they are the optimal intermediate feature vectors). When the update termination condition is not met, the steps are executed repeatedly until the model is fitted (the optimal intermediate feature vector of the lncRNA to be detected and the optimal intermediate feature vector of the target drug are found), which improves the accuracy of the feature vectors and therefore the accuracy of the relevance prediction.
It should be noted that calculating the AUC value and the AUPR value is common general knowledge, and the calculation process is not described here.
Step b, if the AUC value and the AUPR value reach the maximum value, determining that the intermediate feature vector of the lncRNA to be detected and the intermediate feature vector of the target drug meet the preset updating termination condition; otherwise, determining that the intermediate feature vector of the lncRNA to be detected and the intermediate feature vector of the target drug do not meet the preset updating termination condition.
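Although the application treats the metric calculation as common knowledge, a small sketch may help readers reproduce it. The example below computes AUC by pairwise comparison of positive and negative scores and estimates AUPR by average precision, which is one common estimator of the area under the precision-recall curve. The function names and the toy scores are assumptions for illustration only.

```python
def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def average_precision(scores, labels):
    """Average precision: mean of the precision measured at each positive's rank."""
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    return total / hits

# Toy association scores and ground-truth labels (1 = known association).
scores = [0.9, 0.8, 0.7, 0.6]
labels = [1, 0, 1, 0]
auc_value = auc(scores, labels)                 # 3 of 4 pairs ordered correctly
aupr_value = average_precision(scores, labels)  # (1/1 + 2/3) / 2
```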
Step 6, if the intermediate feature vector of the lncRNA to be detected and the intermediate feature vector of the target drug meet the preset update termination condition, taking the intermediate feature vector of the lncRNA to be detected as the final feature vector of the lncRNA to be detected and taking the intermediate feature vector of the target drug as the final feature vector of the target drug; otherwise,
taking the intermediate feature vector of the lncRNA to be detected as the vector representation of the lncRNA to be detected in step 2, taking the intermediate feature vector of the target drug as the vector representation of the target drug in step 2, and returning to step 2.
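The loop formed by steps 2 to 6 can be sketched as follows. The text only says that updating stops when the AUC and AUPR values "reach the maximum value"; the sketch interprets this as stopping when neither metric has improved jointly for a fixed number of rounds. That interpretation, the `patience` parameter and the callback names `update_step` and `evaluate` are assumptions, not part of the application.

```python
def train_until_converged(update_step, evaluate, max_rounds=100, patience=3):
    """Repeat update rounds until (AUC, AUPR) stop improving; return best metrics and round."""
    best, best_round, since_best = None, -1, 0
    for rnd in range(max_rounds):
        update_step()                        # steps 2-5: aggregate, build losses, back-propagate
        auc_value, aupr_value = evaluate()   # step a: compute AUC and AUPR
        if best is None or (auc_value > best[0] and aupr_value > best[1]):
            best, best_round, since_best = (auc_value, aupr_value), rnd, 0
        else:
            since_best += 1
            if since_best >= patience:       # step b: treat the best round as the maximum
                break
    return best, best_round

# Usage with a scripted metric sequence standing in for a real model.
metric_history = iter([(0.60, 0.50), (0.70, 0.60), (0.80, 0.70),
                       (0.79, 0.69), (0.78, 0.70), (0.77, 0.60), (0.90, 0.90)])
best_metrics, best_round = train_until_converged(lambda: None, lambda: next(metric_history))
```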
Step 7, constructing a relevance prediction model according to the final feature vector of the lncRNA to be detected and the final feature vector of the target drug.
Specifically, the constructed relevance prediction model is expressed as follows:
Figure BDA0004092975270000111
where Figure BDA0004092975270000112 represents the association score between the lncRNA to be tested n and the target drug d.
Step 8, predicting the relevance of the lncRNA to be detected and the target drug by using the relevance prediction model.
The final feature vector of the lncRNA to be detected and the final feature vector of the target drug obtained by executing steps 1 to 6 are input into the relevance prediction model constructed in step 7 to obtain the association score of the lncRNA to be detected and the target drug; the higher the association score, the stronger the association between the lncRNA to be detected and the target drug.
The specific process of step 3 (constructing local structural neighbors of the association pairs based on the association bipartite graph, and constructing the first contrast learning loss according to the initial feature vector of the lncRNA to be detected and the initial feature vector of the target drug) is illustrated below.
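The scoring step can be sketched as below. Because the model expression is only available as a formula image, the sketch assumes the score is the inner product of the two final feature vectors, which is a common choice for embedding-based association prediction but is not confirmed by the text. The names `association_score` and `rank_drugs` and the toy vectors are hypothetical.

```python
def association_score(e_n, e_d):
    """Assumed scoring rule: inner product of the two final feature vectors."""
    return sum(a * b for a, b in zip(e_n, e_d))

def rank_drugs(e_n, drug_vectors):
    """Score every candidate drug against one lncRNA and sort by descending score."""
    scored = [(name, association_score(e_n, vec)) for name, vec in drug_vectors.items()]
    return sorted(scored, key=lambda t: t[1], reverse=True)

# Toy final feature vectors (illustrative values only).
lncrna_vector = [1.0, 2.0]
drug_vectors = {"drug_a": [0.5, 0.5], "drug_b": [1.0, 1.0]}
ranking = rank_drugs(lncrna_vector, drug_vectors)
```

A higher score places a drug earlier in `ranking`, matching the statement that a higher association score indicates a stronger association.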
Step 3.1, through the calculation formula
Figure BDA0004092975270000121
obtaining the local structural neighbor learning loss of the lncRNA to be detected (Figure BDA0004092975270000122) and the local structural neighbor learning loss of the target drug (Figure BDA0004092975270000123),
where Figure BDA0004092975270000124 represents the output of the initial feature vector $e_n$ of the lncRNA to be tested at the k-th layer of the graph convolutional neural network; Figure BDA0004092975270000125 represents the output of the initial feature vector $e_d$ of the target drug at the k-th layer of the graph convolutional neural network; k is an even number; τ represents the hyper-parameter of the softmax function; Figure BDA0004092975270000126 represents the vector representation of the lncRNA to be tested; Figure BDA0004092975270000127 represents the vector representation of the target drug; n_num represents the total number of lncRNAs obtained in step 1; d_num represents the total number of drugs obtained in step 1; Figure BDA0004092975270000128 represents the vector representation of the i-th lncRNA at layer 0 of the graph convolutional neural network; and Figure BDA0004092975270000129 represents the vector representation of the j-th drug at layer 0 of the graph convolutional neural network, i=1, 2.
Specifically, the feature vector of a node (an lncRNA node to be detected or a target drug node in the association bipartite graph) and the feature vector output for that node by an even layer of the graph convolutional network model are taken as a positive pair, the feature vectors of other nodes are taken as negative samples, and the InfoNCE (self-supervised contrast learning) loss function is then used to minimize the distance within the positive pair.
Step 3.2, through a calculation formula
Figure BDA00040929752700001210
obtaining the first contrast learning loss $L_{local}$,
Where α represents a hyper-parameter for balancing weights.
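The positive and negative sample construction of step 3 can be sketched with a small InfoNCE computation. The sketch assumes that similarity is a dot product scaled by τ and that the two per-type losses are combined as "lncRNA loss plus α times drug loss"; both are assumptions consistent with the symbol definitions, since the formula images are not reproduced. The function names are hypothetical.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def structure_loss(even_layer, layer0, tau):
    """InfoNCE: each node's even-layer output should match its own layer-0 vector."""
    total = 0.0
    for idx, z_k in enumerate(even_layer):
        logits = [dot(z_k, z0) / tau for z0 in layer0]  # own vector = positive, rest = negatives
        m = max(logits)                                  # subtract the max for numerical stability
        log_denominator = m + math.log(sum(math.exp(v - m) for v in logits))
        total += log_denominator - logits[idx]
    return total

def local_loss(lnc_k, lnc_0, drug_k, drug_0, tau=0.5, alpha=1.0):
    """Assumed combination: lncRNA-side loss plus alpha times drug-side loss."""
    return structure_loss(lnc_k, lnc_0, tau) + alpha * structure_loss(drug_k, drug_0, tau)

layer0 = [[1.0, 0.0], [0.0, 1.0]]
aligned = structure_loss([[1.0, 0.0], [0.0, 1.0]], layer0, tau=0.5)
swapped = structure_loss([[0.0, 1.0], [1.0, 0.0]], layer0, tau=0.5)
```

The loss is small when each even-layer output agrees with its own layer-0 vector (`aligned`) and large when it agrees with another node's vector instead (`swapped`).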
Next, an exemplary description is given to a specific process of constructing a second contrast learning loss according to the initial feature vector of the lncRNA to be detected and the initial feature vector of the target drug in step 4 (constructing a global semantic neighbor of the association pair based on the association bipartite graph, and constructing a second contrast learning loss according to the initial feature vector of the lncRNA to be detected and the initial feature vector of the target drug).
Step 4.1, through the calculation formula
Figure BDA0004092975270000131
obtaining the global semantic neighbor contrast learning loss of the lncRNA to be detected (Figure BDA0004092975270000132) and the global semantic neighbor contrast learning loss of the target drug (Figure BDA0004092975270000133),
where $c_i$ represents the prototype of the lncRNA to be tested, $c_j$ represents the prototype of the drug, and C represents the set of prototypes. A prototype is the center of a cluster in the global semantic neighborhood, within which all the lncRNAs or drugs may be associated with one another.
Step 4.2, through a calculation formula
Figure BDA0004092975270000134
obtaining the second contrast learning loss $L_{global}$,
Where β represents the hyper-parameter for the balance weight.
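A prototype-based contrast loss of this kind can be sketched as follows: each node is pulled towards the prototype of its own cluster and pushed away from the other prototypes. The dot-product similarity, the "lncRNA loss plus β times drug loss" combination and all function names are assumptions made for illustration.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def prototype_loss(embeddings, labels, prototypes, tau):
    """InfoNCE over prototypes: the node's own cluster center is the positive sample."""
    total = 0.0
    for e, lab in zip(embeddings, labels):
        logits = [dot(e, c) / tau for c in prototypes]
        m = max(logits)  # subtract the max for numerical stability
        log_denominator = m + math.log(sum(math.exp(v - m) for v in logits))
        total += log_denominator - logits[lab]
    return total

def global_loss(lnc, lnc_labels, lnc_protos, drug, drug_labels, drug_protos,
                tau=0.5, beta=1.0):
    """Assumed combination: lncRNA-side loss plus beta times drug-side loss."""
    return (prototype_loss(lnc, lnc_labels, lnc_protos, tau)
            + beta * prototype_loss(drug, drug_labels, drug_protos, tau))

nodes = [[1.0, 0.0], [0.0, 1.0]]
centers = [[1.0, 0.0], [0.0, 1.0]]
correct = prototype_loss(nodes, [0, 1], centers, tau=0.5)  # nodes assigned to nearest prototype
wrong = prototype_loss(nodes, [1, 0], centers, tau=0.5)    # nodes assigned to the far prototype
```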
The analysis process of step 4 is exemplarily described below.
Global semantic neighbors are constructed primarily to maximize the following likelihood function (a function of the parameters of a statistical model, representing how likely the observed data are under those parameters):
Figure BDA0004092975270000135
where Θ represents the model parameters and A represents the association matrix; $c_i$ and $c_j$ represent the latent prototypes of the lncRNA to be tested n and the target drug d, respectively; and p(·) represents the likelihood function to be maximized.
In the embodiment of the present application, based on the EM (Expectation-Maximization) optimization algorithm and the InfoNCE loss function, nodes in the same cluster are defined as positive samples, and nodes in different clusters are regarded as negative samples.
In the embodiment of the present application, in order to optimize the contrast learning manner of the global semantic neighbors, the lower bound of the likelihood function is obtained through the Jensen inequality, specifically as follows:
Figure BDA0004092975270000136
where $Q(c_i \mid e_n)$ represents the distribution of $c_i$, and $Q(c_j \mid e_d)$ represents the distribution of $c_j$.
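For readers without access to the formula image, the Jensen step for the lncRNA side can be written in its generic form as below (the drug side is analogous with $e_d$ and $c_j$). This is the standard derivation under the assumption that $Q(c_i \mid e_n)$ sums to one over the prototype set C; the exact notation in the filing may differ.

```latex
\sum_{n} \log p(e_n \mid \Theta, A)
  = \sum_{n} \log \sum_{c_i \in C} Q(c_i \mid e_n)\,
      \frac{p(e_n, c_i \mid \Theta, A)}{Q(c_i \mid e_n)}
  \;\ge\; \sum_{n} \sum_{c_i \in C} Q(c_i \mid e_n)\,
      \log \frac{p(e_n, c_i \mid \Theta, A)}{Q(c_i \mid e_n)}
```

The inequality holds because the logarithm is concave, so the logarithm of an expectation under Q is at least the expectation of the logarithm.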
When $e_n$ ($e_d$) has been observed, the above expression is optimized using the EM optimization algorithm:
In the E-step (one of the steps of the EM optimization algorithm), $e_n$ and $e_d$ are fixed, so the K-means algorithm can be applied to the vector representations of the lncRNA to be tested and the target drug to estimate $Q(c_i \mid e_n)$ and $Q(c_j \mid e_d)$. If the lncRNA to be tested n belongs to cluster i and the target drug d belongs to cluster j, then the corresponding cluster centers $c_i$ and $c_j$ are the prototypes of n and d, respectively.
For the prototypes $c_i$ and $c_j$, the distributions are
Figure BDA0004092975270000141
and
Figure BDA0004092975270000142
and the distributions of the other prototypes are
Figure BDA0004092975270000143
and
Figure BDA0004092975270000144
It then follows that:
Figure BDA0004092975270000145
Assuming that, in all clusters, the lncRNAs to be tested and the target drugs follow isotropic Gaussian distributions, it follows that:
Figure BDA0004092975270000146
where $(e_n - c_i)^2 = 2 - 2\, e_n \cdot c_i$. Assuming that each Gaussian distribution has the same variance, represented by the temperature hyper-parameter σ, we have:
Figure BDA0004092975270000147
The embeddings of the lncRNAs to be detected and the target drugs are clustered using the K-means algorithm to obtain k clusters of the lncRNAs to be detected and k clusters of the target drugs, respectively. After
Figure BDA0004092975270000148
and
Figure BDA0004092975270000149
are obtained, the final contrast learning loss (the second contrast learning loss) can be obtained:
Figure BDA00040929752700001410
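The step from the Gaussian assumption to a softmax over prototypes can be spelled out as follows. This is a sketch under two assumptions that the text implies but does not state: the embeddings and prototypes are L2-normalized (which is what makes the squared-distance identity hold), and all clusters have equal prior probability. The notation may differ from the formula images, which are not reproduced here.

```latex
\lVert e_n - c_i \rVert^2
  = \lVert e_n \rVert^2 + \lVert c_i \rVert^2 - 2\, e_n \cdot c_i
  = 2 - 2\, e_n \cdot c_i
  \qquad (\lVert e_n \rVert = \lVert c_i \rVert = 1)

p(c_i \mid e_n)
  = \frac{\exp\!\left( -\lVert e_n - c_i \rVert^2 / 2\sigma^2 \right)}
         {\sum_{c_t \in C} \exp\!\left( -\lVert e_n - c_t \rVert^2 / 2\sigma^2 \right)}
  = \frac{\exp\!\left( e_n \cdot c_i / \sigma^2 \right)}
         {\sum_{c_t \in C} \exp\!\left( e_n \cdot c_t / \sigma^2 \right)}
```

The constant factor $\exp(-1/\sigma^2)$ appears in every term and cancels, so the shared variance plays the role of a temperature in an InfoNCE-style softmax.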
The following describes an exemplary procedure of step 5 (calculating a comprehensive loss according to the first contrast learning loss, the second contrast learning loss, and the BPR loss function, and updating the initial feature vector of the lncRNA to be measured and the initial feature vector of the target drug by using the comprehensive loss to obtain the intermediate feature vector of the lncRNA to be measured and the intermediate feature vector of the target drug by counter propagation, respectively).
Step 5.1, through the calculation formula
$L = L_{BPR} + \lambda_1 L_{local} + \lambda_2 L_{global} + \lambda_3 \lVert \theta \rVert^2$
Figure BDA0004092975270000151
obtaining the comprehensive loss L; where $\lambda_1$, $\lambda_2$ and $\lambda_3$ each represent a hyper-parameter for balancing weights, θ represents the parameters of the graph convolutional neural network (the kernel function of the graph convolutional network), σ represents a nonlinear activation function, τ represents the training data pairs, the expression shown in Figure BDA0004092975270000152 and Figure BDA0004092975270000153 indicates that there is an association between the lncRNA to be tested n and the target drug $d^{+}$, and Figure BDA0004092975270000154 indicates that there is no association between the lncRNA to be tested n and the sampled drug $d^{-}$.
Step 5.2, using the comprehensive loss L to perform a back-propagation update on the initial feature vector $e_n$ of the lncRNA to be detected and the initial feature vector $e_d$ of the target drug, respectively, to obtain the intermediate feature vector $e_n'$ of the lncRNA to be detected and the intermediate feature vector $e_d'$ of the target drug.
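How the terms of the comprehensive loss fit together can be sketched as below. The BPR term is written in its standard form, -log σ(score⁺ - score⁻) with σ taken to be the sigmoid, which is an assumption because the formula image is not reproduced; the function names, the scoring callback and the toy numbers are likewise hypothetical.

```python
import math

def bpr_loss(triples, score):
    """triples: (n, d_pos, d_neg); score(n, d) returns a predicted association score."""
    total = 0.0
    for n, d_pos, d_neg in triples:
        diff = score(n, d_pos) - score(n, d_neg)
        total += math.log(1.0 + math.exp(-diff))  # equals -log(sigmoid(diff))
    return total

def comprehensive_loss(l_bpr, l_local, l_global, params, lam1, lam2, lam3):
    """L = L_BPR + lam1 * L_local + lam2 * L_global + lam3 * ||theta||^2."""
    l2_penalty = sum(p * p for p in params)
    return l_bpr + lam1 * l_local + lam2 * l_global + lam3 * l2_penalty

# Toy scores: lncRNA 0 scores 2.0 with its associated drug 0 and 0.0 with sampled drug 1.
toy_scores = {(0, 0): 2.0, (0, 1): 0.0}
l_bpr = bpr_loss([(0, 0, 1)], lambda n, d: toy_scores[(n, d)])
total = comprehensive_loss(1.0, 2.0, 3.0, [1.0, 2.0], lam1=0.1, lam2=0.2, lam3=0.01)
```

In a real implementation the gradient of this scalar with respect to the feature vectors would be obtained by an automatic differentiation framework; the sketch only shows the forward computation.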
It should be noted that the back propagation by using the loss is common knowledge, and the process thereof is not described here.
In some embodiments of the present application, the validity of the drug association prediction method is also verified, with the following results:
Method      AUC      AUPR
NDSGCL_N    0.9103   0.9178
NDSGCL_L    0.9285   0.9444
NDSGCL_G    0.9549   0.9267
NDSGCL      0.9734   0.9800
In particular, to verify the impact of local structural neighbor contrast learning and global semantic neighbor contrast learning on model performance, different variants are constructed in embodiments of the present application: NDSGCL_N uses no contrast learning method, NDSGCL_L uses only local structural neighbors, NDSGCL_G uses only global semantic neighbors, and NDSGCL uses both local structural neighbors and global semantic neighbors. The results are shown in the table above. It can be inferred from the table that using local structural neighbors alone effectively extracts higher-order relations between nodes, but the improvement in AUC is limited because noise is introduced when the positive and negative samples are constructed; likewise, using global semantic neighbors alone significantly alleviates data sparsity, but the improvement in AUPR is limited because higher-order associations between nodes cannot be fully exploited. Combining the two methods is critical to improving the prediction accuracy of NDSGCL.
To further evaluate the performance of the drug association prediction method provided herein, in some embodiments of the present application it is compared with other state-of-the-art methods, as shown in fig. 2a and fig. 2b. The ordinate of fig. 2a represents the true positive rate and the abscissa of fig. 2a represents the false positive rate. In fig. 2a, GCN denotes the graph convolutional network method (transferred to the lncRNA and drug association prediction problem); LightGCN denotes an optimization of the graph convolutional network that abandons the feature transformation and nonlinear activation of the traditional graph convolutional network and keeps only its node aggregation; GCL-ED denotes a contrast learning method whose data augmentation is based on randomly dropping edges; GCL-ND denotes a contrast learning method whose data augmentation is based on randomly dropping disease nodes; MLRDFM integrates the similarity of four miRNAs and two diseases and predicts the association of miRNAs and diseases by a deep factorization machine method; and NDSGCL denotes the drug association prediction method provided herein.
In fig. 2b, Init denotes the feature representation of the original drug nodes; LightGCN, GCL-ED, GCL-ND and NDSGCL have the same meanings as in fig. 2a.
As can be seen from fig. 2a and fig. 2b, the drug association prediction method provided in the present application outperforms the other state-of-the-art methods.
As can be seen from the above steps, the method for predicting the drug relevance provided by the present application constructs a first contrast learning loss and a second contrast learning loss according to the initial feature vector of the lncRNA to be detected and the initial feature vector of the target drug, and then performs back propagation update on the initial feature vector of the lncRNA to be detected and the initial feature vector of the target drug according to the first contrast learning loss, the second contrast learning loss and the BPR loss, so that a more accurate feature vector can be obtained, thereby improving the accuracy of predicting the relevance of the lncRNA to be detected and the target drug.
The drug association prediction device provided by the application is exemplified in the following in connection with specific embodiments.
As shown in fig. 3, an embodiment of the present application provides a device for predicting drug association, where the device 300 for predicting drug association includes:
the initialization module 301 is configured to construct a correlation bipartite graph according to a pre-acquired correlation pair formed by the lncRNA to be tested and the target drug, and randomly initialize vector representation of the lncRNA to be tested and vector representation of the target drug, respectively;
the aggregation module 302 is configured to run a neural network model on the associated bipartite graph, and perform neighbor node aggregation on the vector representation of the lncRNA to be detected and the vector representation of the target drug to obtain an initial feature vector of the lncRNA to be detected and an initial feature vector of the target drug;
the first contrast learning loss module 303 is configured to construct a local structure neighbor of the association pair based on the association bipartite graph, and construct a first contrast learning loss according to an initial feature vector of the lncRNA to be detected and an initial feature vector of the target drug;
the second contrast learning loss module 304 is configured to construct a global semantic neighbor of the association pair based on the association bipartite graph, and construct a second contrast learning loss according to the initial feature vector of the lncRNA to be detected and the initial feature vector of the target drug;
The intermediate feature vector module 305 is configured to calculate a comprehensive loss according to the first contrast learning loss, the second contrast learning loss and the BPR loss function, and update the initial feature vector of the lncRNA to be detected and the initial feature vector of the target drug by using the comprehensive loss to obtain an intermediate feature vector of the lncRNA to be detected and an intermediate feature vector of the target drug;
the final feature vector module 306 is configured to determine whether the intermediate feature vector of the lncRNA to be detected and the intermediate feature vector of the target drug meet a preset update termination condition; if the intermediate feature vector of the lncRNA to be detected and the intermediate feature vector of the target drug meet the preset update termination condition, the intermediate feature vector of the lncRNA to be detected is used as the final feature vector of the lncRNA to be detected, and the intermediate feature vector of the target drug is used as the final feature vector of the target drug; otherwise,
the intermediate feature vector of the lncRNA to be detected is taken as the vector representation of the lncRNA to be detected in the aggregation module, the intermediate feature vector of the target drug is taken as the vector representation of the target drug in the aggregation module, and execution returns to the aggregation module;
the prediction model module 307 is configured to construct a relevance prediction model according to the final feature vector of the lncRNA to be detected and the final feature vector of the target drug;
The prediction module 308 is configured to predict the relevance of the lncRNA to be detected and the target drug by using a relevance prediction model.
It should be noted that, because the content of information interaction and execution process between the above devices/units is based on the same concept as the method embodiment of the present application, specific functions and technical effects thereof may be referred to in the method embodiment section, and will not be described herein again.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-described division of the functional units and modules is illustrated, and in practical application, the above-described functional distribution may be performed by different functional units and modules according to needs, i.e. the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-described functions. The functional units and modules in the embodiment may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit, where the integrated units may be implemented in a form of hardware or a form of a software functional unit. In addition, specific names of the functional units and modules are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present application. The specific working process of the units and modules in the above system may refer to the corresponding process in the foregoing method embodiment, which is not described herein again.
As shown in fig. 4, an embodiment of the present application provides a terminal device. The terminal device D10 of this embodiment includes: at least one processor D100 (only one processor is shown in fig. 4), a memory D101, and a computer program D102 stored in the memory D101 and executable on the at least one processor D100, the processor D100 implementing the steps in any of the method embodiments described above when executing the computer program D102.
Specifically, when the processor D100 executes the computer program D102, an association bipartite graph is constructed from the association pairs formed by the lncRNA to be detected and the target drug, and the vector representation of the lncRNA to be detected and the vector representation of the target drug are each randomly initialized. A neural network model is then run on the association bipartite graph, and neighbor node aggregation is performed on the vector representation of the lncRNA to be detected and the vector representation of the target drug to obtain the initial feature vector of the lncRNA to be detected and the initial feature vector of the target drug. Next, local structural neighbors of the association pairs are constructed based on the association bipartite graph and a first contrast learning loss is constructed according to the initial feature vectors; global semantic neighbors of the association pairs are likewise constructed based on the association bipartite graph and a second contrast learning loss is constructed according to the initial feature vectors. A comprehensive loss is calculated according to the first contrast learning loss, the second contrast learning loss and the BPR loss function, and is back-propagated to update the initial feature vector of the lncRNA to be detected and the initial feature vector of the target drug, yielding the intermediate feature vector of the lncRNA to be detected and the intermediate feature vector of the target drug. Once the intermediate feature vectors meet the preset update termination condition, they are taken as the final feature vectors, a relevance prediction model is constructed according to the final feature vectors, and the relevance of the lncRNA to be detected and the target drug is predicted by using the relevance prediction model.
Because the first contrast learning loss and the second contrast learning loss are constructed according to the initial feature vector of the lncRNA to be detected and the initial feature vector of the target drug, and the initial feature vectors are then updated by back propagation according to the first contrast learning loss, the second contrast learning loss and the BPR loss function, more accurate feature vectors can be obtained, thereby improving the accuracy of predicting the relevance of the lncRNA to be detected and the target drug.
The processor D100 may be a central processing unit (CPU, Central Processing Unit); the processor D100 may also be another general purpose processor, a digital signal processor (DSP, Digital Signal Processor), an application specific integrated circuit (ASIC, Application Specific Integrated Circuit), a field-programmable gate array (FPGA, Field-Programmable Gate Array) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
The memory D101 may in some embodiments be an internal storage unit of the terminal device D10, for example a hard disk or a memory of the terminal device D10. The memory D101 may also be an external storage device of the terminal device D10 in other embodiments, for example, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card) or the like, which are provided on the terminal device D10. Further, the memory D101 may also include both an internal storage unit and an external storage device of the terminal device D10. The memory D101 is used for storing an operating system, an application program, a boot loader (BootLoader), data, other programs, etc., such as program codes of the computer program. The memory D101 may also be used to temporarily store data that has been output or is to be output.
Embodiments of the present application also provide a computer readable storage medium storing a computer program which, when executed by a processor, implements the steps of the method embodiments described above.
Embodiments of the present application also provide a computer program product which, when run on a terminal device, causes the terminal device to perform the steps of the method embodiments described above.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer readable storage medium. Based on such understanding, all or part of the flow of the method of the above embodiments may be implemented by a computer program instructing related hardware; the computer program may be stored in a computer readable storage medium and, when executed by a processor, implements the steps of each of the method embodiments described above. The computer program comprises computer program code, which may be in source code form, object code form, an executable file, some intermediate form, etc. The computer readable medium may include at least: any entity or device capable of carrying the computer program code to the drug association prediction device/terminal device, a recording medium, a computer memory, a read-only memory (ROM, Read-Only Memory), a random access memory (RAM, Random Access Memory), an electrical carrier signal, a telecommunication signal, and a software distribution medium, for example a USB flash drive, a removable hard disk, a magnetic disk or an optical disk. In some jurisdictions, in accordance with legislation and patent practice, computer readable media may not include electrical carrier signals and telecommunication signals.
In the foregoing embodiments, each embodiment is described with its own emphasis; for parts that are not described or illustrated in detail in a particular embodiment, reference may be made to the related descriptions of other embodiments.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus/network device and method may be implemented in other manners. For example, the apparatus/network device embodiments described above are merely illustrative, e.g., the division of the modules or units is merely a logical functional division, and there may be additional divisions in actual implementation, e.g., multiple units or components may be combined or integrated into another system, or some features may be omitted, or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection via interfaces, devices or units, which may be in electrical, mechanical or other forms.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
The application has the following advantages:
1. A general computing framework for predicting the relevance of lncRNAs and drugs is provided. In this framework, local structural neighbors are constructed to extract high-order relevance between association pairs, and global semantic neighbors are constructed to mitigate the influence of data sparsity on experimental results.
2. Compared with the prior art, the influence of data sparsity on experimental results is relieved by introducing a contrast learning idea, and the prediction accuracy is effectively improved. And moreover, the correlation data of the large-scale lncRNA and the drug can be predicted in a very short time, so that blindness and cost of a biological experiment are reduced.
While the foregoing is directed to the preferred embodiments of the present application, it should be noted that modifications and adaptations to those embodiments may occur to one skilled in the art and that such modifications and adaptations are intended to be comprehended within the scope of the present application without departing from the principles set forth herein.

Claims (10)

1. A method for predicting drug association, comprising:
step 1, constructing a correlation bipartite graph according to a correlation pair formed by a to-be-detected lncRNA and a target drug, and randomly initializing vector representation of the to-be-detected lncRNA and vector representation of the target drug respectively;
step 2, running a neural network model on the associated bipartite graph, and respectively carrying out neighbor node aggregation on the vector representation of the lncRNA to be detected and the vector representation of the target drug to obtain an initial feature vector of the lncRNA to be detected and an initial feature vector of the target drug;
step 3, constructing a local structural neighbor of the association pair based on the association bipartite graph, and constructing a first contrast learning loss according to the initial feature vector of the lncRNA to be detected and the initial feature vector of the target drug;
step 4, constructing a global semantic neighbor of the association pair based on the association bipartite graph, and constructing a second contrast learning loss according to the initial feature vector of the lncRNA to be detected and the initial feature vector of the target drug;
step 5, calculating a comprehensive loss according to the first contrast learning loss, the second contrast learning loss and a BPR loss function, and using the comprehensive loss to update, by back propagation, the initial feature vector of the lncRNA to be detected and the initial feature vector of the target drug respectively, to obtain an intermediate feature vector of the lncRNA to be detected and an intermediate feature vector of the target drug;
step 6, if the intermediate feature vector of the lncRNA to be detected and the intermediate feature vector of the target drug meet a preset update termination condition, taking the intermediate feature vector of the lncRNA to be detected as a final feature vector of the lncRNA to be detected, and taking the intermediate feature vector of the target drug as a final feature vector of the target drug; otherwise,
taking the intermediate feature vector of the lncRNA to be detected as the vector representation of the lncRNA to be detected in step 2, taking the intermediate feature vector of the target drug as the vector representation of the target drug in step 2, and returning to step 2;
step 7, constructing a relevance prediction model according to the final feature vector of the lncRNA to be detected and the final feature vector of the target drug;
and step 8, predicting the relevance of the lncRNA to be detected and the target drug by using the relevance prediction model.
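As an illustration of step 1 only, the following minimal Python sketch builds a bipartite graph from association pairs and randomly initializes one vector per node. The function names, the Gaussian initialization, the vector dimension and the sample identifiers are assumptions made for this example and are not taken from the claims.

```python
import random

def build_bipartite_graph(pairs):
    """Return neighbor sets for both sides of a lncRNA-drug bipartite graph."""
    lnc_neighbors, drug_neighbors = {}, {}
    for lnc, drug in pairs:
        lnc_neighbors.setdefault(lnc, set()).add(drug)
        drug_neighbors.setdefault(drug, set()).add(lnc)
    return lnc_neighbors, drug_neighbors

def init_embeddings(nodes, dim, seed=0):
    """Randomly initialize one vector per node (Gaussian init is an assumption)."""
    rng = random.Random(seed)
    return {node: [rng.gauss(0.0, 0.1) for _ in range(dim)] for node in sorted(nodes)}

# Hypothetical association pairs (lncRNA identifier, drug identifier).
pairs = [("lnc1", "drugA"), ("lnc1", "drugB"), ("lnc2", "drugB")]
lnc_nb, drug_nb = build_bipartite_graph(pairs)
lnc_emb = init_embeddings(lnc_nb, dim=4, seed=0)
drug_emb = init_embeddings(drug_nb, dim=4, seed=1)
```

The neighbor sets play the role of the neighbor node sets used in step 2, and the two dictionaries of vectors correspond to the randomly initialized vector representations.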
2. The prediction method according to claim 1, wherein the neural network model in step 2 is a graph convolutional neural network model;
in step 2, running a neural network model on the association bipartite graph, and performing neighbor node aggregation on the vector representation of the lncRNA to be detected and the vector representation of the target drug respectively to obtain an initial feature vector of the lncRNA to be detected and an initial feature vector of the target drug, includes:
by the calculation formulas
Figure FDA0004092975260000021
Figure FDA0004092975260000022
obtaining the initial feature vector e_n of the lncRNA to be detected and the initial feature vector e_d of the target drug; wherein N_n denotes the neighbor node set of the lncRNA to be detected, N_d denotes the neighbor node set of the target drug, the symbol in Figure FDA0004092975260000023 denotes the node vector embedding of the lncRNA to be detected at layer l of the graph convolutional neural network, the symbol in Figure FDA0004092975260000024 denotes the node vector embedding of the target drug at layer l of the graph convolutional neural network, L denotes the total number of layers of the graph convolutional neural network, the symbol in Figure FDA0004092975260000025 denotes the node vector embedding of the lncRNA to be detected at layer l+1 of the graph convolutional neural network, and the symbol in Figure FDA0004092975260000026 denotes the node vector embedding of the target drug at layer l+1 of the graph convolutional neural network.
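The formula images of this claim are not reproduced in the text, so the following Python sketch is only one plausible reading of the neighbor node aggregation. It assumes a symmetric normalization weight 1/sqrt(|N_n| * |N_d|) for each edge and an average over layers 0 to L as the final readout; both choices, and all function names, are assumptions rather than the claimed formulas.

```python
import math

def propagate(lnc_emb, drug_emb, lnc_nb, drug_nb):
    """One propagation layer over the bipartite graph (assumed symmetric normalization)."""
    def gather(own_nb, other_nb, other_emb, node, dim):
        vec = [0.0] * dim
        for nb in own_nb.get(node, ()):
            weight = 1.0 / math.sqrt(len(own_nb[node]) * len(other_nb[nb]))
            for i in range(dim):
                vec[i] += weight * other_emb[nb][i]
        return vec
    dim = len(next(iter(lnc_emb.values())))
    new_lnc = {n: gather(lnc_nb, drug_nb, drug_emb, n, dim) for n in lnc_emb}
    new_drug = {d: gather(drug_nb, lnc_nb, lnc_emb, d, dim) for d in drug_emb}
    return new_lnc, new_drug

def aggregate(lnc0, drug0, lnc_nb, drug_nb, num_layers):
    """Average the layer-0..L embeddings of every node (assumed readout)."""
    lnc_layers, drug_layers = [lnc0], [drug0]
    for _ in range(num_layers):
        nxt_lnc, nxt_drug = propagate(lnc_layers[-1], drug_layers[-1], lnc_nb, drug_nb)
        lnc_layers.append(nxt_lnc)
        drug_layers.append(nxt_drug)
    def mean(layers, node):
        dim = len(layers[0][node])
        return [sum(layer[node][i] for layer in layers) / len(layers) for i in range(dim)]
    e_n = {n: mean(lnc_layers, n) for n in lnc0}
    e_d = {d: mean(drug_layers, d) for d in drug0}
    return e_n, e_d

# Toy graph: two lncRNAs both associated with one drug (hypothetical data).
lnc_nb = {"n1": {"d1"}, "n2": {"d1"}}
drug_nb = {"d1": {"n1", "n2"}}
lnc0 = {"n1": [1.0, 0.0], "n2": [0.0, 1.0]}
drug0 = {"d1": [1.0, 1.0]}
e_n, e_d = aggregate(lnc0, drug0, lnc_nb, drug_nb, num_layers=1)
```

Each layer contains no trainable weight matrix in this sketch, so the only parameters are the layer-0 vectors themselves; a model with per-layer weights would need an additional transformation inside `propagate`.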
3. The prediction method according to claim 2, wherein the constructing a first contrast learning loss according to the initial feature vector of the lncRNA to be detected and the initial feature vector of the target drug in the step 3 includes:
by the calculation formula
Figure FDA0004092975260000027
obtaining the local structural neighbor learning loss of the lncRNA to be detected (Figure FDA0004092975260000031) and the local structural neighbor learning loss of the target drug (Figure FDA0004092975260000032); wherein the symbol in Figure FDA0004092975260000033 denotes the output of the initial feature vector e_n of the lncRNA to be detected at the k-th layer of the graph convolutional neural network, the symbol in Figure FDA0004092975260000034 denotes the output of the initial feature vector e_d of the target drug at the k-th layer of the graph convolutional neural network, k is an even number, τ denotes the hyper-parameter of the softmax function, the symbol in Figure FDA0004092975260000035 denotes the vector representation of the lncRNA to be detected, the symbol in Figure FDA0004092975260000036 denotes the vector representation of the target drug, n_num denotes the total number of lncRNAs obtained in step 1, d_num denotes the total number of drugs obtained in step 1, the symbol in Figure FDA0004092975260000037 denotes the vector representation of the i-th lncRNA at layer 0 of the graph convolutional neural network, the symbol in Figure FDA0004092975260000038 denotes the vector representation of the j-th drug at layer 0 of the graph convolutional neural network, i = 1, 2, ..., n_num, and j = 1, 2, ..., d_num;
by the calculation formula
Figure FDA0004092975260000039
obtaining the first contrast learning loss L_local, where α denotes a hyper-parameter for balancing the weights.
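Because the loss formulas of this claim are only available as images, the sketch below shows one common way such a local structural neighbor loss can be written: an InfoNCE-style softmax loss that pulls the layer-k output of each node toward its own layer-0 vector and away from the layer-0 vectors of the other nodes of the same type. The use of cosine similarity, the averaging over nodes, and all names are assumptions; only the roles of τ and α follow the claim text.

```python
import math

def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def info_nce(anchors, candidates, tau):
    """Mean softmax contrastive loss; anchors[i] is paired with candidates[i]."""
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [cosine(anchor, c) / tau for c in candidates]
        peak = max(logits)  # subtract the maximum for numerical stability
        log_denominator = peak + math.log(sum(math.exp(x - peak) for x in logits))
        total += log_denominator - logits[i]
    return total / len(anchors)

def local_loss(lnc_k, lnc_0, drug_k, drug_0, tau, alpha):
    """lncRNA-side loss plus alpha times the drug-side loss."""
    return info_nce(lnc_k, lnc_0, tau) + alpha * info_nce(drug_k, drug_0, tau)

# Hypothetical layer-0 vectors and layer-k outputs (k even) for two nodes per side.
layer0 = [[1.0, 0.0], [0.0, 1.0]]
aligned = info_nce(layer0, layer0, tau=0.5)
misaligned = info_nce([[0.0, 1.0], [1.0, 0.0]], layer0, tau=0.5)
combined = local_loss(layer0, layer0, layer0, layer0, tau=0.5, alpha=0.5)
```

In this toy case the loss is small when every layer-k output matches its own layer-0 vector and large when the outputs are swapped, which is the behavior a contrastive objective is meant to reward.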
4. The prediction method according to claim 3, wherein the constructing a second contrast learning loss according to the initial feature vector of the lncRNA to be detected and the initial feature vector of the target drug in the step 4 includes:
by the calculation formula
Figure FDA00040929752600000310
obtaining the global semantic neighbor contrast learning loss of the lncRNA to be detected (Figure FDA00040929752600000311) and the global semantic neighbor contrast learning loss of the target drug (Figure FDA00040929752600000312); wherein c_i denotes the prototype of the lncRNA to be detected, c_j denotes the prototype of the target drug, and C denotes the set of prototypes;
by the calculation formula
Figure FDA00040929752600000313
obtaining the second contrast learning loss L_global, where β denotes a hyper-parameter for balancing the weights.
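The claim text does not state how the prototypes are obtained, and the loss formulas are only available as images. The following sketch therefore assumes that the prototypes are given (for example as cluster centroids of the embeddings), that each node is contrasted against the set C using its most similar prototype as the positive, and that cosine similarity with a temperature τ is used. All names and these modeling choices are assumptions.

```python
import math

def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def nearest_prototype(vec, prototypes):
    """Index of the prototype most similar to vec (assumed assignment rule)."""
    return max(range(len(prototypes)), key=lambda i: cosine(vec, prototypes[i]))

def prototype_loss(embeddings, prototypes, tau):
    """Mean softmax loss of each embedding against the whole prototype set C."""
    total = 0.0
    for vec in embeddings:
        logits = [cosine(vec, c) / tau for c in prototypes]
        positive = logits[nearest_prototype(vec, prototypes)]
        peak = max(logits)  # subtract the maximum for numerical stability
        log_denominator = peak + math.log(sum(math.exp(x - peak) for x in logits))
        total += log_denominator - positive
    return total / len(embeddings)

def global_loss(lnc_vecs, lnc_protos, drug_vecs, drug_protos, tau, beta):
    """lncRNA-side loss plus beta times the drug-side loss."""
    return (prototype_loss(lnc_vecs, lnc_protos, tau)
            + beta * prototype_loss(drug_vecs, drug_protos, tau))

# Hypothetical prototypes and embeddings.
prototypes = [[1.0, 0.0], [0.0, 1.0]]
tight = prototype_loss([[1.0, 0.0]], prototypes, tau=0.5)
ambiguous = prototype_loss([[1.0, 1.0]], prototypes, tau=0.5)
```

An embedding that sits on one prototype yields a small loss, while an embedding equally similar to both prototypes yields log 2, so minimizing the loss pushes nodes toward a clear semantic group.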
5. The prediction method according to claim 4, wherein the step 5 includes:
by the calculation formulas
L = L_BPR + λ_1 L_local + λ_2 L_global + λ_3 ||θ||_2
Figure FDA00040929752600000314
obtaining the comprehensive loss L; wherein λ_1, λ_2 and λ_3 all denote hyper-parameters for balancing the weights, θ denotes the parameters of the graph convolutional neural network, σ denotes a nonlinear activation function, τ denotes the pairwise training data,
Figure FDA0004092975260000041
Figure FDA0004092975260000042 indicates that the lncRNA to be detected n and the target drug d+ are associated, and Figure FDA0004092975260000043 indicates that the lncRNA to be detected n and the sampled drug d- are not associated;
using the comprehensive loss L to update, by back propagation, the initial feature vector e_n of the lncRNA to be detected and the initial feature vector e_d of the target drug respectively, to obtain the intermediate feature vector e_n' of the lncRNA to be detected and the intermediate feature vector e_d' of the target drug.
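The comprehensive loss of this claim combines a BPR term with the two contrast learning losses and a regularization term. The sketch below assumes the usual BPR form, the negative sum of log σ(score(n, d+) - score(n, d-)) with σ taken as the sigmoid, an inner-product score, and an L2 norm of the parameters as the regularizer. These choices and all names are assumptions, since the BPR formula itself is only available as an image.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))

def bpr_loss(triples, lnc_emb, drug_emb):
    """triples: (lncRNA n, associated drug d+, sampled non-associated drug d-)."""
    loss = 0.0
    for n, d_pos, d_neg in triples:
        margin = dot(lnc_emb[n], drug_emb[d_pos]) - dot(lnc_emb[n], drug_emb[d_neg])
        loss -= log_sigmoid(margin)
    return loss

def comprehensive_loss(l_bpr, l_local, l_global, params, lam1, lam2, lam3):
    """L = L_BPR + lam1 * L_local + lam2 * L_global + lam3 * ||theta||_2 (assumed norm)."""
    l2_norm = math.sqrt(sum(p * p for p in params))
    return l_bpr + lam1 * l_local + lam2 * l_global + lam3 * l2_norm

# Hypothetical embeddings: the associated drug scores 1.0, the sampled drug 0.0.
lnc_emb = {"n": [1.0, 0.0]}
drug_emb = {"d_pos": [1.0, 0.0], "d_neg": [0.0, 1.0]}
l_bpr = bpr_loss([("n", "d_pos", "d_neg")], lnc_emb, drug_emb)
total = comprehensive_loss(0.5, 0.2, 0.1, [3.0, 4.0], lam1=0.1, lam2=0.1, lam3=0.01)
```

In a real training loop the gradient of this scalar with respect to the embeddings would be obtained with an automatic differentiation library and used for the back propagation update; that step is omitted here.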
6. The prediction method according to claim 5, wherein the expression of the relevance prediction model in step 7 is as follows:
Figure FDA0004092975260000044
wherein the symbol in Figure FDA0004092975260000045 denotes the association score between the lncRNA to be detected n and the target drug d.
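The expression of the relevance prediction model is only available as an image. A frequent choice for models of this kind is the inner product of the two final feature vectors, and the sketch below uses that as an explicit assumption; the function names and sample vectors are hypothetical.

```python
def association_score(lnc_vec, drug_vec):
    """Assumed score: inner product of the final feature vectors."""
    return sum(a * b for a, b in zip(lnc_vec, drug_vec))

def rank_drugs(lnc_vec, drug_emb, known=()):
    """Rank candidate drugs for one lncRNA, skipping already known associations."""
    candidates = [(association_score(lnc_vec, vec), name)
                  for name, vec in drug_emb.items() if name not in known]
    candidates.sort(key=lambda item: -item[0])
    return [name for _, name in candidates]

# Hypothetical final feature vectors.
final_lnc = [1.0, 0.5]
final_drugs = {"A": [1.0, 0.0], "B": [0.0, 1.0], "C": [1.0, 1.0]}
ranking = rank_drugs(final_lnc, final_drugs)
novel = rank_drugs(final_lnc, final_drugs, known={"C"})
```

Sorting by score gives a candidate list in which the highest-scoring unknown drugs are the first to consider for experimental validation.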
7. The prediction method according to claim 1, characterized in that before performing said step 6, said prediction method further comprises:
according to the intermediate feature vector of the lncRNA to be detected and the intermediate feature vector of the target drug, calculating an AUC value and an AUPR value respectively;
if the AUC value and the AUPR value reach the maximum value, determining that the intermediate feature vector of the lncRNA to be detected and the intermediate feature vector of the target drug meet the preset update termination condition; otherwise,
determining that the intermediate feature vector of the lncRNA to be detected and the intermediate feature vector of the target drug do not meet the preset update termination condition.
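The claim does not say how it is decided that the AUC and AUPR values have reached their maximum. The sketch below computes AUC by pairwise comparison, approximates AUPR by average precision, and treats the maximum as reached when neither metric has exceeded its earlier best for a number of evaluation rounds. The patience rule, the average-precision approximation and all names are assumptions.

```python
def auc(labels, scores):
    """Probability that a positive outranks a negative (ties count one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def average_precision(labels, scores):
    """Average precision, used here as an approximation of AUPR."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    hits, total = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            hits += 1
            total += hits / rank
    return total / hits

def reached_maximum(history, patience=3):
    """history: list of (auc, aupr) per evaluation round, oldest first."""
    if len(history) <= patience:
        return False
    earlier, recent = history[:-patience], history[-patience:]
    best_auc = max(a for a, _ in earlier)
    best_aupr = max(p for _, p in earlier)
    return all(a <= best_auc and p <= best_aupr for a, p in recent)

# Hypothetical labels (1 = associated) and predicted scores.
labels = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.1]
auc_value = auc(labels, scores)
aupr_value = average_precision(labels, scores)
stop = reached_maximum([(0.7, 0.6), (0.8, 0.7), (0.79, 0.69), (0.78, 0.7)], patience=2)
```

When `reached_maximum` returns true, the intermediate feature vectors of the best round would be kept as the final feature vectors; otherwise training returns to step 2.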
8. A device for predicting drug association, comprising:
the initialization module is used for constructing a correlation bipartite graph according to a correlation pair which is obtained in advance and consists of the lncRNA to be detected and the target drug, and randomly initializing vector representation of the lncRNA to be detected and vector representation of the target drug respectively;
the aggregation module is used for running a neural network model on the associated bipartite graph, and respectively carrying out neighbor node aggregation on the vector representation of the lncRNA to be detected and the vector representation of the target drug to obtain an initial feature vector of the lncRNA to be detected and an initial feature vector of the target drug;
The first contrast learning loss module is used for constructing a local structure neighbor of the association pair based on the association bipartite graph, and constructing first contrast learning loss according to the initial feature vector of the lncRNA to be detected and the initial feature vector of the target drug;
the second contrast learning loss module is used for constructing a global semantic neighbor of the association pair based on the association bipartite graph, and constructing a second contrast learning loss according to the initial feature vector of the lncRNA to be detected and the initial feature vector of the target drug;
the intermediate feature vector module is used for calculating a comprehensive loss according to the first contrast learning loss, the second contrast learning loss and the BPR loss function, and using the comprehensive loss to update, by back propagation, the initial feature vector of the lncRNA to be detected and the initial feature vector of the target drug respectively, to obtain the intermediate feature vector of the lncRNA to be detected and the intermediate feature vector of the target drug;
the final feature vector module is used for judging whether the intermediate feature vector of the lncRNA to be detected and the intermediate feature vector of the target drug meet a preset update termination condition; if the intermediate feature vector of the lncRNA to be detected and the intermediate feature vector of the target drug meet the preset update termination condition, taking the intermediate feature vector of the lncRNA to be detected as a final feature vector of the lncRNA to be detected and taking the intermediate feature vector of the target drug as a final feature vector of the target drug; otherwise,
taking the intermediate feature vector of the lncRNA to be detected as the vector representation of the lncRNA to be detected in the aggregation module, taking the intermediate feature vector of the target drug as the vector representation of the target drug in the aggregation module, and returning to execute the aggregation module;
the prediction model module is used for constructing a relevance prediction model according to the final feature vector of the lncRNA to be detected and the final feature vector of the target drug;
and the prediction module is used for predicting the relevance of the lncRNA to be detected and the target drug by using the relevance prediction model.
9. A terminal device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the method of predicting drug relevance according to any one of claims 1 to 7 when executing the computer program.
10. A computer readable storage medium storing a computer program, wherein the computer program when executed by a processor implements the method of predicting drug relevance according to any one of claims 1 to 7.
CN202310157524.0A 2023-02-23 2023-02-23 Method, device, terminal equipment and medium for predicting drug relevance Pending CN116129989A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310157524.0A CN116129989A (en) 2023-02-23 2023-02-23 Method, device, terminal equipment and medium for predicting drug relevance

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310157524.0A CN116129989A (en) 2023-02-23 2023-02-23 Method, device, terminal equipment and medium for predicting drug relevance

Publications (1)

Publication Number Publication Date
CN116129989A true CN116129989A (en) 2023-05-16

Family

ID=86306261

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310157524.0A Pending CN116129989A (en) 2023-02-23 2023-02-23 Method, device, terminal equipment and medium for predicting drug relevance

Country Status (1)

Country Link
CN (1) CN116129989A (en)

Similar Documents

Publication Publication Date Title
Fu et al. A deep ensemble model to predict miRNA-disease association
Zhang et al. An efficient feature selection strategy based on multiple support vector machine technology with gene expression data
CN110767263B (en) Non-coding RNA and disease associated prediction method based on sparse subspace learning
CN110556184B (en) Non-coding RNA and disease relation prediction method based on Hessian regular nonnegative matrix decomposition
CN111370073B (en) Medicine interaction rule prediction method based on deep learning
Belciug Logistic regression paradigm for training a single-hidden layer feedforward neural network. Application to gene expression datasets for cancer research
CN113299338A (en) Knowledge graph-based synthetic lethal gene pair prediction method, system, terminal and medium
Moteghaed et al. Biomarker discovery based on hybrid optimization algorithm and artificial neural networks on microarray data for cancer classification
Zhang et al. A novel graph attention adversarial network for predicting disease-related associations
CN112164426A (en) Drug small molecule target activity prediction method and device based on TextCNN
Wang et al. Predicting Protein Interactions Using a Deep Learning Method‐Stacked Sparse Autoencoder Combined with a Probabilistic Classification Vector Machine
CN105808976A (en) Recommendation model based miRNA target gene prediction method
CN116230077A (en) Antiviral drug screening method based on restarting hypergraph double random walk
CN116343927A (en) miRNA-disease association prediction method based on enhanced hypergraph convolution self-coding algorithm
Saheed et al. Microarray gene expression data classification via Wilcoxon sign rank sum and novel Grey Wolf optimized ensemble learning models
Hu et al. Cancer gene selection with adaptive optimization spiking neural p systems and hybrid classifiers
Guo et al. Likelihood-based feature representation learning combined with neighborhood information for predicting circRNA–miRNA associations
CN110739028B (en) Cell line drug response prediction method based on K-nearest neighbor constraint matrix decomposition
CN113539479A (en) Similarity constraint-based miRNA-disease association prediction method and system
CN116453585A (en) mRNA and drug association prediction method, device, terminal equipment and medium
CN109920478B (en) Microorganism-disease relation prediction method based on similarity and low-rank matrix filling
CN114141306B (en) Distant metastasis identification method based on gene interaction mode optimization graph representation
CN116129989A (en) Method, device, terminal equipment and medium for predicting drug relevance
Wu On biological validity indices for soft clustering algorithms for gene expression data
CN115083511A (en) Peripheral gene regulation and control feature extraction method based on graph representation learning and attention

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination