CN116129989A - Method, device, terminal equipment and medium for predicting drug relevance
- Publication number: CN116129989A
- Application number: CN202310157524.0A
- Authority: CN (China)
- Prior art keywords: lncRNA, feature vector, detected, target drug, drug
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G16B15/30 — ICT specially adapted for analysing two-dimensional or three-dimensional molecular structures; drug targeting using structural data; docking or binding prediction
- G16B40/00 — ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
- Y02A90/10 — Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation
Abstract
The application is applicable to the technical field of biological information and provides a method, a device, terminal equipment and a medium for predicting drug relevance. An association bipartite graph is constructed according to association pairs formed by lncRNAs and drugs, and the vector representations of the lncRNAs and the drugs are initialized; neighbor node aggregation is performed on the vector representations by using a neural network model to obtain initial feature vectors of the lncRNAs and the drugs; local structure neighbors and global semantic neighbors are constructed, and a first contrast learning loss and a second contrast learning loss are constructed according to the initial feature vectors; the initial feature vectors are updated according to the contrast learning losses in combination with a BPR loss function to obtain intermediate feature vectors of the lncRNAs and the drugs; if the intermediate feature vectors meet the update termination condition, a relevance prediction model is constructed from the intermediate feature vectors and used to predict the relevance of the lncRNA and the drug. The method and the device can improve the accuracy of prediction of the relevance of lncRNAs and drugs.
Description
Technical Field
The application belongs to the technical field of biological information, and particularly relates to a method, a device, terminal equipment and a medium for predicting drug relevance.
Background
In recent years, increasing evidence has shown that ncRNAs (non-coding RNAs, functional RNA molecules) can affect drug efficacy by modulating genes associated with drug sensitivity, for example by inducing alternative signaling pathways. Long non-coding RNA (lncRNA, a class of ncRNA) is an RNA molecule of more than 200 nucleotides in length. lncRNAs play key roles in many biological processes such as epigenetic regulation, cell cycle regulation, cell differentiation, transcriptional and post-transcriptional regulation, and genomic splicing.
Numerous related studies have shown that lncRNAs regulate human diseases through the combined action of a range of biomolecules in organisms. Their mutations and dysfunctions are closely related to human diseases such as nervous system diseases, blood diseases, cardiovascular diseases and various cancers. With the development of sequencing technology, more and more lncRNA molecules are being detected and analyzed, especially with respect to their role in drug sensitivity. Studies show that lncRNAs can regulate drug-sensitivity-related genes and induce alternative signaling pathways, thereby influencing drug efficacy. For example, the lncRNA NORAD (non-coding RNA activated by DNA damage) inhibits the proliferation of human osteosarcoma HOS/DDP cells and increases their sensitivity to cisplatin by targeting miR-410-3p, and the chemosensitivity of gallbladder cancer cells is regulated by the key lncRNA GBCDrlnc1; identifying associations between lncRNAs and drug sensitivity is therefore of great significance for drug development. However, traditional methods based on biological assays tend to consume significant amounts of time and labor and are highly blind, resulting in inaccurate predictions of lncRNA-drug associations.
Disclosure of Invention
The embodiment of the application provides a method, a device, terminal equipment and a medium for predicting the relevance of a drug, which can solve the problem that the prediction of the relevance of the lncRNA and the drug is inaccurate at present.
In a first aspect, an embodiment of the present application provides a method for predicting drug association, including:
step 1, constructing a correlation bipartite graph according to a correlation pair formed by the lncRNA to be detected and a target drug, and randomly initializing the vector representation of the lncRNA to be detected and the vector representation of the target drug respectively;

step 2, running a neural network model on the correlation bipartite graph, and respectively performing neighbor node aggregation on the vector representation of the lncRNA to be detected and the vector representation of the target drug to obtain an initial feature vector of the lncRNA to be detected and an initial feature vector of the target drug;

step 3, constructing a local structure neighbor of the association pair based on the correlation bipartite graph, and constructing a first contrast learning loss according to the initial feature vector of the lncRNA to be detected and the initial feature vector of the target drug;

step 4, constructing a global semantic neighbor of the association pair based on the correlation bipartite graph, and constructing a second contrast learning loss according to the initial feature vector of the lncRNA to be detected and the initial feature vector of the target drug;

step 5, calculating a comprehensive loss according to the first contrast learning loss, the second contrast learning loss and a BPR loss function, and updating, by back propagation, the initial feature vector of the lncRNA to be detected and the initial feature vector of the target drug with the comprehensive loss to obtain an intermediate feature vector of the lncRNA to be detected and an intermediate feature vector of the target drug;

step 6, judging whether the intermediate feature vector of the lncRNA to be detected and the intermediate feature vector of the target drug meet a preset update termination condition; if so, taking the intermediate feature vector of the lncRNA to be detected as a final feature vector of the lncRNA to be detected and taking the intermediate feature vector of the target drug as a final feature vector of the target drug; otherwise, taking the intermediate feature vector of the lncRNA to be detected as the vector representation of the lncRNA to be detected in step 2, taking the intermediate feature vector of the target drug as the vector representation of the target drug in step 2, and returning to step 2;

step 7, constructing a relevance prediction model according to the final feature vector of the lncRNA to be detected and the final feature vector of the target drug;

and step 8, predicting the relevance of the lncRNA to be detected and the target drug by using the relevance prediction model.
Optionally, the neural network model in step 2 is a graph convolutional neural network model.
Optionally, in step 2, running a neural network model on the associated bipartite graph, and respectively performing neighbor node aggregation on the vector representation of the lncRNA to be detected and the vector representation of the target drug to obtain an initial feature vector of the lncRNA to be detected and an initial feature vector of the target drug, includes:

obtaining the initial feature vector $e_n$ of the lncRNA to be detected and the initial feature vector $e_d$ of the target drug by the calculation formulas

$$e_{n}^{(l+1)}=\sum_{d\in N_{n}}\frac{1}{\sqrt{\left|N_{n}\right|}\sqrt{\left|N_{d}\right|}}e_{d}^{(l)},\qquad e_{d}^{(l+1)}=\sum_{n\in N_{d}}\frac{1}{\sqrt{\left|N_{d}\right|}\sqrt{\left|N_{n}\right|}}e_{n}^{(l)}$$

$$e_{n}=\frac{1}{L+1}\sum_{l=0}^{L}e_{n}^{(l)},\qquad e_{d}=\frac{1}{L+1}\sum_{l=0}^{L}e_{d}^{(l)};$$

wherein $N_n$ represents the neighbor node set of the lncRNA to be detected, $N_d$ represents the neighbor node set of the target drug, $e_n^{(l)}$ represents the node vector embedding of the lncRNA to be detected at layer $l$ of the graph convolutional neural network, $e_d^{(l)}$ represents the node vector embedding of the target drug at layer $l$ of the graph convolutional neural network, $L$ represents the total number of layers of the graph convolutional neural network, $e_n^{(l+1)}$ represents the node vector embedding of the lncRNA to be detected at layer $l+1$ of the graph convolutional neural network, and $e_d^{(l+1)}$ represents the node vector embedding of the target drug at layer $l+1$ of the graph convolutional neural network.
Optionally, in step 3, constructing a first contrast learning loss according to the initial feature vector of the lncRNA to be detected and the initial feature vector of the target drug includes:

obtaining the local structure neighbor contrast learning loss $L_{local}^{n}$ of the lncRNA to be detected and the local structure neighbor contrast learning loss $L_{local}^{d}$ of the target drug by the calculation formulas

$$L_{local}^{n}=\sum_{i=1}^{n\_num}-\log\frac{\exp\left(e_{n_{i}}^{(k)}\cdot e_{n_{i}}^{(0)}/\tau\right)}{\sum_{i'=1}^{n\_num}\exp\left(e_{n_{i}}^{(k)}\cdot e_{n_{i'}}^{(0)}/\tau\right)},\qquad L_{local}^{d}=\sum_{j=1}^{d\_num}-\log\frac{\exp\left(e_{d_{j}}^{(k)}\cdot e_{d_{j}}^{(0)}/\tau\right)}{\sum_{j'=1}^{d\_num}\exp\left(e_{d_{j}}^{(k)}\cdot e_{d_{j'}}^{(0)}/\tau\right)};$$

wherein $e_{n_i}^{(k)}$ represents the output of the initial feature vector of the i-th lncRNA to be detected at the k-th layer of the graph convolutional neural network, $e_{d_j}^{(k)}$ represents the output of the initial feature vector of the j-th target drug at the k-th layer of the graph convolutional neural network, k is an even number, $\tau$ represents the hyper-parameter of the softmax function, $e_{n_i}^{(0)}$ represents the vector representation of the i-th lncRNA at layer 0 of the graph convolutional neural network, $e_{d_j}^{(0)}$ represents the vector representation of the j-th drug at layer 0 of the graph convolutional neural network, n_num represents the total number of lncRNAs obtained in step 1, d_num represents the total number of drugs obtained in step 1, i=1,2,...,n_num, and j=1,2,...,d_num;

obtaining the first contrast learning loss $L_{local}$ by the calculation formula
$$L_{local}=L_{local}^{n}+\alpha L_{local}^{d};$$
where $\alpha$ represents a hyper-parameter for balancing weights.
Optionally, in step 4, constructing a second contrast learning loss according to the initial feature vector of the lncRNA to be detected and the initial feature vector of the target drug includes:

obtaining the global semantic neighbor contrast learning loss $L_{global}^{n}$ of the lncRNA to be detected and the global semantic neighbor contrast learning loss $L_{global}^{d}$ of the target drug by the calculation formulas

$$L_{global}^{n}=\sum_{i=1}^{n\_num}-\log\frac{\exp\left(e_{n_{i}}\cdot c_{i}/\tau\right)}{\sum_{c_{j'}\in C}\exp\left(e_{n_{i}}\cdot c_{j'}/\tau\right)},\qquad L_{global}^{d}=\sum_{j=1}^{d\_num}-\log\frac{\exp\left(e_{d_{j}}\cdot c_{j}/\tau\right)}{\sum_{c_{j'}\in C}\exp\left(e_{d_{j}}\cdot c_{j'}/\tau\right)};$$

wherein $c_i$ represents the prototype of the lncRNA to be detected, $c_j$ represents the prototype of the drug, and C represents the set of prototypes;

obtaining the second contrast learning loss $L_{global}$ by the calculation formula
$$L_{global}=L_{global}^{n}+\beta L_{global}^{d};$$
where $\beta$ represents a hyper-parameter for balancing weights.
Optionally, step 5 includes:
obtaining the comprehensive loss L by the calculation formula

$$L=L_{BPR}+\lambda_{1}L_{local}+\lambda_{2}L_{global}+\lambda_{3}\left\|\theta\right\|^{2};$$

wherein $\lambda_1$, $\lambda_2$, $\lambda_3$ all represent hyper-parameters for balancing weights, $\theta$ represents the parameters of the graph convolutional neural network, and $L_{BPR}$ represents the BPR loss, in which $\sigma$ represents a nonlinear activation function and the paired training data $(n, d^{+}, d^{-})$ are such that an association exists between the lncRNA to be detected n and the target drug $d^{+}$, and no association exists between the lncRNA to be detected n and the sampled drug $d^{-}$;

performing back-propagation updates on the initial feature vector $e_n$ of the lncRNA to be detected and the initial feature vector $e_d$ of the target drug respectively by using the comprehensive loss L, to obtain the intermediate feature vector $e_n'$ of the lncRNA to be detected and the intermediate feature vector $e_d'$ of the target drug.
Optionally, the expression of the relevance prediction model in step 7 is as follows:

$$\hat{y}_{n,d}=e_{n}^{*}\cdot e_{d}^{*}$$

wherein $e_{n}^{*}$ and $e_{d}^{*}$ represent the final feature vector of the lncRNA to be detected and the final feature vector of the target drug respectively, and $\hat{y}_{n,d}$ represents the relevance score of the lncRNA to be detected and the target drug.
Optionally, before performing step 6, the prediction method further includes:
according to the intermediate feature vector of the lncRNA to be detected and the intermediate feature vector of the target drug, calculating an AUC value and an AUPR value respectively;
If the AUC value and the AUPR value reach the maximum value, determining that the intermediate feature vector of the lncRNA to be detected and the intermediate feature vector of the target drug meet the preset update termination condition; otherwise,
determining that the intermediate feature vector of the lncRNA to be detected and the intermediate feature vector of the target drug do not meet the preset update termination condition.
In a second aspect, embodiments of the present application provide a device for predicting drug association, including:
the initialization module is used for constructing a correlation bipartite graph according to a correlation pair which is obtained in advance and consists of the lncRNA to be detected and the target drug, and randomly initializing vector representation of the lncRNA to be detected and vector representation of the target drug respectively;
the aggregation module is used for running a neural network model on the associated bipartite graph, and respectively carrying out neighbor node aggregation on the vector representation of the lncRNA to be detected and the vector representation of the target drug to obtain an initial feature vector of the lncRNA to be detected and an initial feature vector of the target drug;
the first contrast learning loss module is used for constructing a local structure neighbor of the association pair based on the association bipartite graph, and constructing first contrast learning loss according to the initial feature vector of the lncRNA to be detected and the initial feature vector of the target drug;
the second contrast learning loss module is used for constructing a global semantic neighbor of the association pair based on the association bipartite graph, and constructing a second contrast learning loss according to the initial feature vector of the lncRNA to be detected and the initial feature vector of the target drug;
The intermediate feature vector module is used for calculating comprehensive loss according to the first contrast learning loss, the second contrast learning loss and the BPR loss function, and updating the initial feature vector of the lncRNA to be detected and the initial feature vector of the target drug by utilizing the comprehensive loss to obtain the intermediate feature vector of the lncRNA to be detected and the intermediate feature vector of the target drug;
the final feature vector module is used for judging whether the intermediate feature vector of the lncRNA to be detected and the intermediate feature vector of the target drug meet the preset update termination condition; if the intermediate feature vector of the lncRNA to be detected and the intermediate feature vector of the target drug meet the preset update termination condition, the intermediate feature vector of the lncRNA to be detected is used as the final feature vector of the lncRNA to be detected, and the intermediate feature vector of the target drug is used as the final feature vector of the target drug; otherwise,
taking the intermediate feature vector of the lncRNA to be detected as the vector representation of the lncRNA to be detected in the aggregation module, taking the intermediate feature vector of the target drug as the vector representation of the target drug in the aggregation module, and returning to execute the aggregation module;
the prediction model module is used for constructing a relevance prediction model according to the final feature vector of the lncRNA to be detected and the final feature vector of the target drug;
And the prediction module is used for predicting the relevance of the lncRNA to be detected and the target drug by using the relevance prediction model.
In a third aspect, an embodiment of the present application provides a terminal device, including a memory, a processor, and a computer program stored in the memory and capable of running on the processor, where the processor implements the method for predicting drug relevance described above when executing the computer program.
In a fourth aspect, embodiments of the present application provide a computer readable storage medium storing a computer program, which when executed by a processor, implements a method for predicting drug relevance as described above.
The scheme of the application has the following beneficial effects:
in some embodiments of the present application, according to an initial feature vector of a lncRNA to be detected and an initial feature vector of a target drug, a first contrast learning loss and a second contrast learning loss are constructed, and then according to the first contrast learning loss, the second contrast learning loss and the BPR loss, the initial feature vector of the lncRNA to be detected and the initial feature vector of the target drug are respectively updated by back propagation, so that a more accurate feature vector can be obtained, thereby improving the accuracy of prediction of relevance between the lncRNA to be detected and the target drug.
Other advantages of the present application will be described in detail in the detailed description section that follows.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings needed in the embodiments or in the description of the prior art are briefly introduced below. It is obvious that the drawings in the following description show only some embodiments of the present application, and that other drawings may be obtained from these drawings without inventive effort by a person skilled in the art.
FIG. 1 is a flowchart of a method for predicting drug relevance according to an embodiment of the present disclosure;
FIG. 2a is an ROC curve comparing the performance of the method for predicting drug association provided by an embodiment of the present application with other prior art methods;
FIG. 2b is a visualization comparing the drug node feature vector aggregation of the method for predicting drug association provided by an embodiment of the present application with other prior art methods;
FIG. 3 is a schematic structural diagram of a device for predicting drug association according to an embodiment of the present disclosure;
fig. 4 is a schematic structural diagram of a terminal device according to an embodiment of the present application.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system configurations, techniques, etc. in order to provide a thorough understanding of the embodiments of the present application. It will be apparent, however, to one skilled in the art that the present application may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present application with unnecessary detail.
It should be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It should also be understood that the term "and/or" as used in this specification and the appended claims refers to any and all possible combinations of one or more of the associated listed items, and includes such combinations.
As used in this specification and the appended claims, the term "if" may be interpreted as "when..once" or "in response to a determination" or "in response to detection" depending on the context. Similarly, the phrase "if a determination" or "if a [ described condition or event ] is detected" may be interpreted in the context of meaning "upon determination" or "in response to determination" or "upon detection of a [ described condition or event ]" or "in response to detection of a [ described condition or event ]".
In addition, in the description of the present application and the appended claims, the terms "first," "second," "third," and the like are used merely to distinguish between descriptions and are not to be construed as indicating or implying relative importance.
Reference in the specification to "one embodiment" or "some embodiments" or the like means that a particular feature, structure, or characteristic described in connection with the embodiment is included in one or more embodiments of the application. Thus, appearances of the phrases "in one embodiment," "in some embodiments," "in other embodiments," and the like in the specification are not necessarily all referring to the same embodiment, but mean "one or more but not all embodiments" unless expressly specified otherwise. The terms "comprising," "including," "having," and variations thereof mean "including but not limited to," unless expressly specified otherwise.
Aiming at the problem that the prediction of the relevance between the lncRNA and the drug is inaccurate at present, the application provides a method, a device, a terminal device and a medium for predicting the relevance between the drug, wherein a first contrast learning loss and a second contrast learning loss are constructed according to an initial feature vector of the lncRNA to be detected and an initial feature vector of a target drug, and then the initial feature vector of the lncRNA to be detected and the initial feature vector of the target drug are respectively subjected to back propagation update according to the first contrast learning loss, the second contrast learning loss and the BPR loss, so that more accurate feature vectors can be obtained, and the accuracy of the prediction of the relevance between the lncRNA to be detected and the target drug is improved.
As shown in fig. 1, the method for predicting drug association provided in the present application mainly includes the following steps:
step 1, constructing a correlation bipartite graph according to a correlation pair formed by the lncRNA to be detected and the target drug, and randomly initializing vector representation of the lncRNA to be detected and vector representation of the target drug respectively.
In the embodiments of the present application, the above-mentioned association pairs of the lncRNA to be detected and the target drug can be obtained from the RNAactDrug database (which contains RNAs related to drug sensitivity from multi-omics data). The lncRNA to be detected is any one of the obtained lncRNAs, and the target drug is any one of the obtained drugs.
It should be noted that, in the embodiment of the present application, the bipartite graph may be constructed by a common bipartite graph construction method.
The vector representation of the lncRNA to be detected and the vector representation of the target drug are randomly initialized to obtain more accurate feature vectors later, and if the operation is not performed, the feature vectors of the lncRNA to be detected and the feature vectors of the target drug are always zero, so that the update of the feature vectors is meaningless.
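As an illustrative sketch (not the application's reference implementation), the association bipartite graph can be stored as a degree-normalized adjacency matrix, with the node embeddings initialized randomly; the embedding dimension, tensor names and toy association pairs below are assumptions made for the example.

```python
import torch

def build_bipartite_graph(pairs, n_num, d_num):
    """Build the symmetric, degree-normalized adjacency of the lncRNA-drug bipartite graph.

    pairs: list of (lncRNA_index, drug_index) association pairs.
    """
    # Binary association matrix A of shape (n_num, d_num)
    A = torch.zeros(n_num, d_num)
    for n, d in pairs:
        A[n, d] = 1.0
    # Full bipartite adjacency [[0, A], [A^T, 0]]
    adj = torch.zeros(n_num + d_num, n_num + d_num)
    adj[:n_num, n_num:] = A
    adj[n_num:, :n_num] = A.t()
    # Symmetric normalization D^{-1/2} A D^{-1/2}
    deg = adj.sum(dim=1).clamp(min=1.0)
    d_inv_sqrt = deg.pow(-0.5)
    return d_inv_sqrt.unsqueeze(1) * adj * d_inv_sqrt.unsqueeze(0)

# Random initialization of the vector representations (embedding dimension 64 assumed)
emb_dim = 64
pairs = [(0, 1), (1, 0), (2, 1)]          # toy association pairs for illustration only
n_num, d_num = 3, 2
norm_adj = build_bipartite_graph(pairs, n_num, d_num)
emb = torch.nn.Parameter(torch.randn(n_num + d_num, emb_dim) * 0.01)
```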
Step 2, running a neural network model on the associated bipartite graph, and respectively carrying out neighbor node aggregation on the vector representation of the lncRNA to be detected and the vector representation of the target drug to obtain an initial feature vector of the lncRNA to be detected and an initial feature vector of the target drug.
In some embodiments of the present application, the neural network model is a graph convolutional neural network model.
Specifically, the initial feature vector $e_n$ of the lncRNA to be detected and the initial feature vector $e_d$ of the target drug are obtained by the calculation formulas

$$e_{n}^{(l+1)}=\sum_{d\in N_{n}}\frac{1}{\sqrt{\left|N_{n}\right|}\sqrt{\left|N_{d}\right|}}e_{d}^{(l)},\qquad e_{d}^{(l+1)}=\sum_{n\in N_{d}}\frac{1}{\sqrt{\left|N_{d}\right|}\sqrt{\left|N_{n}\right|}}e_{n}^{(l)}$$

$$e_{n}=\frac{1}{L+1}\sum_{l=0}^{L}e_{n}^{(l)},\qquad e_{d}=\frac{1}{L+1}\sum_{l=0}^{L}e_{d}^{(l)};$$

wherein $N_n$ represents the neighbor node set of the lncRNA to be detected, $N_d$ represents the neighbor node set of the target drug, $e_n^{(l)}$ represents the node vector embedding of the lncRNA to be detected at layer $l$ of the graph convolutional neural network, $e_d^{(l)}$ represents the node vector embedding of the target drug at layer $l$ of the graph convolutional neural network, $L$ represents the total number of layers of the graph convolutional neural network, $e_n^{(l+1)}$ represents the node vector embedding of the lncRNA to be detected at layer $l+1$ of the graph convolutional neural network, and $e_d^{(l+1)}$ represents the node vector embedding of the target drug at layer $l+1$ of the graph convolutional neural network.

It should be noted that, in the layer aggregation stage, the initial feature vector $e_n$ of the lncRNA to be detected and the initial feature vector $e_d$ of the target drug are obtained by a weighted sum of the vector representations $e_n^{(l)}$ of the lncRNA node to be detected and the vector representations $e_d^{(l)}$ of the target drug node output by each graph convolution layer, which can improve the accuracy of the vector representations.
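The neighbor aggregation and layer-wise weighted sum described above can be sketched as follows. This assumes a LightGCN-style propagation (no feature transformation or nonlinear activation) with uniform layer weights, which is one plausible reading of the formulas rather than the application's exact implementation; `norm_adj`, `emb` and `n_num` come from the earlier sketch.

```python
import torch

def propagate(norm_adj: torch.Tensor, emb: torch.Tensor, num_layers: int):
    """Neighbor-node aggregation over the association bipartite graph.

    Returns the per-layer embeddings and their (uniformly weighted) sum,
    i.e. the initial feature vectors e_n / e_d of lncRNAs and drugs.
    """
    layer_embs = [emb]                                      # layer 0: randomly initialized vectors
    for _ in range(num_layers):
        layer_embs.append(norm_adj @ layer_embs[-1])        # aggregate neighbor embeddings
    final = torch.stack(layer_embs, dim=0).mean(dim=0)      # weighted sum over layers (uniform weights assumed)
    return layer_embs, final

# The first n_num rows are the lncRNA vectors, the remaining rows are the drug vectors.
layer_embs, init_feats = propagate(norm_adj, emb, num_layers=3)
e_lnc, e_drug = init_feats[:n_num], init_feats[n_num:]
```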
Step 3, constructing a local structure neighbor of the association pair based on the association bipartite graph, and constructing a first contrast learning loss according to the initial feature vector of the lncRNA to be detected and the initial feature vector of the target drug.
The above-mentioned local structure neighbors represent the adjacent lncRNA nodes to be measured or adjacent target drug nodes in the spatial structure in the associated bipartite graph in step 1.
It should be noted that, the above-mentioned local structure neighbors are used to describe the high-order association in the association bipartite graph.
The feature vector $e^{(l)}$ output by layer $l$ of the graph convolutional neural network is the weighted sum of the $l$-hop neighbors of each node (lncRNA node to be detected or target drug node).
Since even-numbered rounds of information propagation on the bipartite graph naturally aggregate information from homogeneous structural neighbors (structural neighbor nodes of the same type), the embeddings of homogeneous neighbors (nodes of the same type) can be obtained from the outputs of the even layers of the graph convolutional network, and the obtained feature vectors are then used to model higher-order relationships between local neighbor nodes.
Step 4, constructing a global semantic neighbor of the association pair based on the association bipartite graph, and constructing a second contrast learning loss according to the initial feature vector of the lncRNA to be detected and the initial feature vector of the target drug.
The global semantic neighbors refer to lncRNA nodes to be detected or target drug nodes that are not adjacent in the spatial structure of the association bipartite graph of step 1 but may still be related (nodes with similar roles that have no direct association on the bipartite graph). The global semantic neighbors are constructed mainly to alleviate the influence of data sparsity on the experimental results and to reduce the influence of the noise generated in the construction of local structure neighbors on the prediction effect.
To construct an appropriate global semantic neighbor contrast learning objective, in some embodiments of the present application a prototype contrast learning objective is developed by learning a potential prototype for each node (lncRNA node to be detected or target drug node) to identify global semantic neighbors. Since similar lncRNA nodes to be detected or target drug nodes are more likely to lie in adjacent regions of the feature space, a prototype can be defined as the center of a cluster consisting of a set of semantic neighbors, i.e., potential prototypes can be learned by a clustering algorithm (e.g., the K-means algorithm).
Step 5, calculating a comprehensive loss according to the first contrast learning loss, the second contrast learning loss and the BPR loss function, and respectively updating, by back propagation, the initial feature vector of the lncRNA to be detected and the initial feature vector of the target drug with the comprehensive loss to obtain the intermediate feature vector of the lncRNA to be detected and the intermediate feature vector of the target drug.
The expression of the above BPR (Bayesian Personalized Ranking) loss function is as follows:

$$L_{BPR}=\sum_{\left(n,d^{+},d^{-}\right)\in O}-\ln\sigma\left(\hat{y}_{n,d^{+}}-\hat{y}_{n,d^{-}}\right)$$

wherein $\sigma$ is a nonlinear activation function and $O$ represents the set of training data triples $(n, d^{+}, d^{-})$, in which an observed association exists between the lncRNA to be detected n and the target drug $d^{+}$, while the sampled drug $d^{-}$ (a randomly selected drug not associated with the lncRNA n) has no experimentally verified association with the lncRNA to be detected n.
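A minimal sketch of this BPR loss under the definitions above; the inner-product score and the mean reduction over sampled triples are assumptions made for the example.

```python
import torch
import torch.nn.functional as F

def bpr_loss(e_lnc: torch.Tensor, e_drug: torch.Tensor, triples: torch.Tensor):
    """triples: LongTensor of (n, d_pos, d_neg) rows, where (n, d_pos) is an observed
    association and d_neg is a sampled drug with no observed association to n."""
    n, d_pos, d_neg = triples[:, 0], triples[:, 1], triples[:, 2]
    pos_scores = (e_lnc[n] * e_drug[d_pos]).sum(dim=1)   # y_hat(n, d+)
    neg_scores = (e_lnc[n] * e_drug[d_neg]).sum(dim=1)   # y_hat(n, d-)
    return -F.logsigmoid(pos_scores - neg_scores).mean()
```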
In some embodiments of the present application, after step 5 is performed, the intermediate vector is determined as follows:
and a step a, calculating an AUC value and an AUPR value respectively according to the intermediate feature vector of the lncRNA to be detected and the intermediate feature vector of the target drug.
The AUC value (the area under the ROC curve (receiver operating characteristic curve)) and the AUPR value (the area under the PR curve (precision-recall curve)) are calculated here to determine whether the intermediate feature vector of the lncRNA to be detected and the intermediate feature vector of the target drug already meet the update termination condition (i.e., are the optimal intermediate feature vectors of the lncRNA to be detected and of the target drug). When the update termination condition is not met, the steps are executed repeatedly until the model is fitted (the optimal intermediate feature vector of the lncRNA to be detected and the optimal intermediate feature vector of the target drug are obtained), which improves the accuracy of the feature vectors and thereby the accuracy of the relevance prediction.
It should be noted that, calculating the AUC value and the AUPR value belongs to common general knowledge, and the calculation process is not described herein.
Step b, if the AUC value and the AUPR value reach the maximum value, determining that the intermediate feature vector of the lncRNA to be detected and the intermediate feature vector of the target drug meet the preset updating termination condition; otherwise, determining that the intermediate feature vector of the lncRNA to be detected and the intermediate feature vector of the target drug do not meet the preset updating termination condition.
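The termination check can be sketched with scikit-learn's metrics; treating "reach the maximum value" as keeping the best-scoring epoch with a patience counter is an assumption about how it is detected in practice.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, average_precision_score

def evaluate(e_lnc, e_drug, test_pairs, test_labels):
    """Score each test (lncRNA, drug) pair by inner product and compute AUC / AUPR."""
    scores = np.array([float(e_lnc[n] @ e_drug[d]) for n, d in test_pairs])
    return roc_auc_score(test_labels, scores), average_precision_score(test_labels, scores)

best_auc = best_aupr = 0.0
patience, bad_epochs = 10, 0
# inside the training loop (sketch):
# auc, aupr = evaluate(e_lnc.detach(), e_drug.detach(), test_pairs, test_labels)
# if auc > best_auc or aupr > best_aupr:
#     best_auc, best_aupr, bad_epochs = max(auc, best_auc), max(aupr, best_aupr), 0
# else:
#     bad_epochs += 1   # stop updating once AUC and AUPR no longer improve for `patience` epochs
```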
Step 6, judging whether the intermediate feature vector of the lncRNA to be detected and the intermediate feature vector of the target drug meet the preset update termination condition; if so, taking the intermediate feature vector of the lncRNA to be detected as the final feature vector of the lncRNA to be detected and taking the intermediate feature vector of the target drug as the final feature vector of the target drug; otherwise, taking the intermediate feature vector of the lncRNA to be detected as the vector representation of the lncRNA to be detected in step 2, taking the intermediate feature vector of the target drug as the vector representation of the target drug in step 2, and returning to step 2.
Step 7, constructing a relevance prediction model according to the final feature vector of the lncRNA to be detected and the final feature vector of the target drug.
Specifically, the expression of the constructed relevance prediction model is as follows:

$$\hat{y}_{n,d}=e_{n}^{*}\cdot e_{d}^{*}$$

wherein $e_{n}^{*}$ and $e_{d}^{*}$ represent the final feature vector of the lncRNA to be detected and the final feature vector of the target drug respectively, and $\hat{y}_{n,d}$ represents the relevance score of the lncRNA to be detected and the target drug.
Step 8, predicting the relevance of the lncRNA to be detected and the target drug by using the relevance prediction model.
The final feature vector of the lncRNA to be detected and the final feature vector of the target drug obtained after steps 1-6 are input into the relevance prediction model constructed in step 7 to obtain the relevance score of the lncRNA to be detected and the target drug; the higher the relevance score, the stronger the association between the lncRNA to be detected and the target drug.
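A sketch of step 8: scoring candidate (lncRNA, drug) pairs with the final feature vectors. The inner-product score and the ranking step are assumptions consistent with the description above, not a disclosed implementation.

```python
import torch

def predict_associations(final_lnc: torch.Tensor, final_drug: torch.Tensor, top_k: int = 10):
    """Relevance scores for all lncRNA-drug pairs; a higher score means a stronger predicted association."""
    scores = final_lnc @ final_drug.t()                     # score matrix, shape (n_num, d_num)
    k = min(top_k, final_drug.size(0))
    top_scores, top_drugs = scores.topk(k, dim=1)           # top-k candidate drugs per lncRNA
    return scores, top_scores, top_drugs
```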
The specific process of constructing the first contrast learning loss according to the initial feature vector of the lncRNA to be detected and the initial feature vector of the target drug in step 3 (constructing the local structural neighbor of the association pair based on the association bipartite graph and constructing the first contrast learning loss according to the initial feature vector of the lncRNA to be detected and the initial feature vector of the target drug) is illustrated as follows.
Step 3.1, obtaining the local structure neighbor contrast learning loss $L_{local}^{n}$ of the lncRNA to be detected and the local structure neighbor contrast learning loss $L_{local}^{d}$ of the target drug by the calculation formulas

$$L_{local}^{n}=\sum_{i=1}^{n\_num}-\log\frac{\exp\left(e_{n_{i}}^{(k)}\cdot e_{n_{i}}^{(0)}/\tau\right)}{\sum_{i'=1}^{n\_num}\exp\left(e_{n_{i}}^{(k)}\cdot e_{n_{i'}}^{(0)}/\tau\right)},\qquad L_{local}^{d}=\sum_{j=1}^{d\_num}-\log\frac{\exp\left(e_{d_{j}}^{(k)}\cdot e_{d_{j}}^{(0)}/\tau\right)}{\sum_{j'=1}^{d\_num}\exp\left(e_{d_{j}}^{(k)}\cdot e_{d_{j'}}^{(0)}/\tau\right)};$$

wherein $e_{n_i}^{(k)}$ represents the output of the initial feature vector of the i-th lncRNA to be detected at the k-th layer of the graph convolutional neural network, $e_{d_j}^{(k)}$ represents the output of the initial feature vector of the j-th target drug at the k-th layer of the graph convolutional neural network, k is an even number, $\tau$ represents the hyper-parameter of the softmax function, $e_{n_i}^{(0)}$ represents the vector representation of the i-th lncRNA at layer 0 of the graph convolutional neural network, $e_{d_j}^{(0)}$ represents the vector representation of the j-th drug at layer 0 of the graph convolutional neural network, n_num represents the total number of lncRNAs obtained in step 1, d_num represents the total number of drugs obtained in step 1, i=1,2,...,n_num, and j=1,2,...,d_num.

Specifically, the feature vector of each node (lncRNA node to be detected or target drug node in the association bipartite graph) and the feature vector output for that node by the even layer of the graph convolutional network model are taken as a positive sample pair, the other feature vectors are taken as negative samples, and the distance between positive samples is minimized by using the InfoNCE (self-supervised contrast learning) loss function.

Step 3.2, obtaining the first contrast learning loss $L_{local}$ by the calculation formula
$$L_{local}=L_{local}^{n}+\alpha L_{local}^{d}$$
where $\alpha$ represents a hyper-parameter for balancing weights.
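A sketch of this local structure neighbor contrast learning loss as an InfoNCE objective between the layer-0 embeddings and an even-layer output, following the description above; the L2 normalization, the mean reduction and the value of α are choices made for the example. `layer_embs` and `n_num` come from the earlier sketches.

```python
import torch
import torch.nn.functional as F

def structure_contrast_loss(layer_embs, n_num, k=2, tau=0.1):
    """InfoNCE between each node's even-layer (k-th) output and its layer-0 embedding.

    Positive pair: the same node at layer k and layer 0; negatives: other nodes of the
    same type at layer 0. Returns the lncRNA part and the drug part separately.
    """
    def info_nce(z_k, z_0):
        z_k, z_0 = F.normalize(z_k, dim=1), F.normalize(z_0, dim=1)
        pos = (z_k * z_0).sum(dim=1) / tau                  # similarity of the positive pairs
        logits = z_k @ z_0.t() / tau                        # similarity to all layer-0 vectors
        return (torch.logsumexp(logits, dim=1) - pos).mean()

    z_k, z_0 = layer_embs[k], layer_embs[0]
    loss_lnc = info_nce(z_k[:n_num], z_0[:n_num])           # lncRNA nodes
    loss_drug = info_nce(z_k[n_num:], z_0[n_num:])          # drug nodes
    return loss_lnc, loss_drug

# First contrast learning loss, with alpha balancing the two parts (alpha = 0.5 assumed)
loss_n, loss_d = structure_contrast_loss(layer_embs, n_num)
L_local = loss_n + 0.5 * loss_d
```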
Next, an exemplary description is given to a specific process of constructing a second contrast learning loss according to the initial feature vector of the lncRNA to be detected and the initial feature vector of the target drug in step 4 (constructing a global semantic neighbor of the association pair based on the association bipartite graph, and constructing a second contrast learning loss according to the initial feature vector of the lncRNA to be detected and the initial feature vector of the target drug).
Step 4.1, obtaining the global semantic neighbor contrast learning loss $L_{global}^{n}$ of the lncRNA to be detected and the global semantic neighbor contrast learning loss $L_{global}^{d}$ of the target drug by the calculation formulas

$$L_{global}^{n}=\sum_{i=1}^{n\_num}-\log\frac{\exp\left(e_{n_{i}}\cdot c_{i}/\tau\right)}{\sum_{c_{j'}\in C}\exp\left(e_{n_{i}}\cdot c_{j'}/\tau\right)},\qquad L_{global}^{d}=\sum_{j=1}^{d\_num}-\log\frac{\exp\left(e_{d_{j}}\cdot c_{j}/\tau\right)}{\sum_{c_{j'}\in C}\exp\left(e_{d_{j}}\cdot c_{j'}/\tau\right)};$$

wherein $c_i$ represents the prototype of the lncRNA to be detected, $c_j$ represents the prototype of the drug, and C represents the set of prototypes. A prototype represents the center of a cluster in the global semantic structure neighborhood, within which all the lncRNAs or drugs have possible associations.

Step 4.2, obtaining the second contrast learning loss $L_{global}$ by the calculation formula
$$L_{global}=L_{global}^{n}+\beta L_{global}^{d}$$
where $\beta$ represents a hyper-parameter for balancing weights.
The analysis process of step 4 is exemplarily described below.
Global semantic neighbors are constructed primarily to maximize the following likelihood function (a function of the parameters of a statistical model, expressing the likelihood of the model parameters); an analogous objective is defined for the target drugs:

$$\sum_{n}\log p\left(e_{n}\mid\Theta,A\right)=\sum_{n}\log\sum_{c_{i}\in C}p\left(e_{n},c_{i}\mid\Theta,A\right)$$

where $\Theta$ represents the model parameters and A represents the incidence (association) matrix; $c_i$ and $c_j$ respectively represent the potential prototypes of the lncRNA to be detected n and of the target drug d, and $p(\cdot)$ represents the likelihood function to be maximized.
In the embodiment of the present application, based on an EM (Expectation-Maximization algorithm) optimization algorithm and an InfoNCE loss function, nodes in the same cluster are defined as positive samples, and nodes of different clusters are regarded as negative samples.
In the embodiments of the present application, in order to optimize the contrast learning of the global semantic neighbors, a lower bound of the likelihood function is obtained through the Jensen inequality, specifically as follows:

$$\sum_{n}\log p\left(e_{n}\mid\Theta,A\right)\geq\sum_{n}\sum_{c_{i}\in C}Q\left(c_{i}\mid e_{n}\right)\log\frac{p\left(e_{n},c_{i}\mid\Theta,A\right)}{Q\left(c_{i}\mid e_{n}\right)}$$

wherein $Q(c_i\mid e_n)$ represents the distribution of the prototype $c_i$ given $e_n$, and $Q(c_j\mid e_d)$ represents the distribution of the prototype $c_j$ given $e_d$.
After $e_n$ (respectively $e_d$) is observed, the above formula is optimized using the EM optimization algorithm.

In the E-step (one of the steps of the EM optimization algorithm), $e_n$ and $e_d$ are fixed, so the K-means algorithm can be applied to the vector representations of the lncRNAs to be detected and of the target drugs to estimate $Q(c_i\mid e_n)$ and $Q(c_j\mid e_d)$. If the lncRNA to be detected n belongs to cluster i and the target drug d belongs to cluster j, the corresponding cluster centers $c_i$ and $c_j$ are the prototypes of n and d, respectively; the estimated distribution $\hat{Q}(c_i\mid e_n)$ equals 1 for the node's own prototype and 0 for the other prototypes (and analogously for $\hat{Q}(c_j\mid e_d)$).

Across all clusters, assuming that the distributions of the lncRNAs to be detected and of the target drugs around their prototypes are isotropic Gaussian distributions, the lower bound to be maximized can be rewritten as

$$\sum_{n}\sum_{c_{i}\in C}\hat{Q}\left(c_{i}\mid e_{n}\right)\log\frac{\exp\left(-\left(e_{n}-c_{i}\right)^{2}/2\sigma_{i}^{2}\right)}{\sum_{c_{i'}\in C}\exp\left(-\left(e_{n}-c_{i'}\right)^{2}/2\sigma_{i'}^{2}\right)}$$

wherein, for normalized embeddings, $(e_{n}-c_{i})^{2}=2-2\,e_{n}\cdot c_{i}$. Assuming that each Gaussian distribution has the same deviation, represented by the temperature hyper-parameter $\tau$, the embeddings of the lncRNAs to be detected and of the target drugs are clustered by the K-means algorithm to obtain K clusters of lncRNAs and K clusters of drugs respectively, and the final contrast learning loss (the second contrast learning loss) given in step 4.1 is obtained.
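A sketch of the E-step (K-means clustering to obtain prototypes) and of the resulting prototype-level contrast learning loss; the use of scikit-learn's KMeans, the cluster count and the mean reduction are assumptions made for the example.

```python
import torch
import torch.nn.functional as F
from sklearn.cluster import KMeans

def prototype_contrast_loss(embeddings: torch.Tensor, num_clusters: int = 8, tau: float = 0.1):
    """E-step: cluster the (detached) embeddings with K-means to obtain prototypes.
    Loss: InfoNCE between each node and its own prototype versus all prototypes."""
    z = F.normalize(embeddings, dim=1)
    k = min(num_clusters, z.size(0))                       # guard for small toy inputs
    km = KMeans(n_clusters=k, n_init=10).fit(z.detach().cpu().numpy())
    protos = F.normalize(torch.tensor(km.cluster_centers_, dtype=z.dtype), dim=1)
    assign = torch.tensor(km.labels_, dtype=torch.long)    # each node's own prototype index
    logits = z @ protos.t() / tau                          # similarity to every prototype c_j
    pos = logits[torch.arange(z.size(0)), assign]          # similarity to the node's prototype c_i
    return (torch.logsumexp(logits, dim=1) - pos).mean()

# Second contrast learning loss, with beta balancing the two parts (beta = 0.5 assumed)
L_global = prototype_contrast_loss(e_lnc) + 0.5 * prototype_contrast_loss(e_drug)
```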
The following describes an exemplary procedure of step 5 (calculating a comprehensive loss according to the first contrast learning loss, the second contrast learning loss, and the BPR loss function, and updating the initial feature vector of the lncRNA to be measured and the initial feature vector of the target drug by using the comprehensive loss to obtain the intermediate feature vector of the lncRNA to be measured and the intermediate feature vector of the target drug by counter propagation, respectively).
Step 5.1, obtaining the comprehensive loss L by the calculation formula

$$L=L_{BPR}+\lambda_{1}L_{local}+\lambda_{2}L_{global}+\lambda_{3}\left\|\theta\right\|^{2}$$

wherein $\lambda_1$, $\lambda_2$, $\lambda_3$ each represent a hyper-parameter for balancing weights, $\theta$ represents the parameters of the graph convolutional neural network (the kernel of the graph convolutional network), and $L_{BPR}$ is the BPR loss defined above, constructed from the paired training data $(n, d^{+}, d^{-})$, in which an association exists between the lncRNA to be detected n and the target drug $d^{+}$, and no association exists between the lncRNA to be detected n and the sampled drug $d^{-}$.

Step 5.2, performing back-propagation updates on the initial feature vector $e_n$ of the lncRNA to be detected and the initial feature vector $e_d$ of the target drug respectively by using the comprehensive loss L, to obtain the intermediate feature vector $e_n'$ of the lncRNA to be detected and the intermediate feature vector $e_d'$ of the target drug.
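Putting the pieces together, one training step under the comprehensive loss might look like the sketch below; the optimizer, learning rate and the lambda values are assumptions, not values disclosed by the application, and `emb`, `propagate`, `bpr_loss`, `structure_contrast_loss` and `prototype_contrast_loss` come from the earlier sketches.

```python
import torch

optimizer = torch.optim.Adam([emb], lr=1e-3)
lambda1, lambda2, lambda3 = 1e-6, 1e-6, 1e-4    # assumed balancing hyper-parameters

def train_step(norm_adj, triples, n_num):
    optimizer.zero_grad()
    layer_embs, feats = propagate(norm_adj, emb, num_layers=3)
    e_lnc, e_drug = feats[:n_num], feats[n_num:]

    loss_bpr = bpr_loss(e_lnc, e_drug, triples)
    loss_n, loss_d = structure_contrast_loss(layer_embs, n_num)
    l_local = loss_n + 0.5 * loss_d
    l_global = prototype_contrast_loss(e_lnc) + 0.5 * prototype_contrast_loss(e_drug)
    reg = emb.pow(2).sum()                      # ||theta||^2 regularization on the embeddings

    loss = loss_bpr + lambda1 * l_local + lambda2 * l_global + lambda3 * reg
    loss.backward()                             # back-propagation update of the feature vectors
    optimizer.step()
    return e_lnc.detach(), e_drug.detach()      # intermediate feature vectors e_n', e_d'
```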
It should be noted that the back propagation by using the loss is common knowledge, and the process thereof is not described here.
In some embodiments of the present application, there is also provided validity verification of a prediction method of drug association, the result being as follows:
| Variant | AUC | AUPR |
| --- | --- | --- |
| NDSGCL_N | 0.9103 | 0.9178 |
| NDSGCL_L | 0.9285 | 0.9444 |
| NDSGCL_G | 0.9549 | 0.9267 |
| NDSGCL | 0.9734 | 0.9800 |
In particular, to verify the impact of local structure neighbor contrast learning and global semantic neighbor contrast learning on model performance, different variants are constructed in the embodiments of the present application, where NDSGCL_N indicates that no contrast learning method is used, NDSGCL_L indicates that only local structure neighbors are used, NDSGCL_G indicates that only global semantic neighbors are used, and NDSGCL indicates that both local structure neighbors and global semantic neighbors are used; the results are shown in the table above. It can be deduced from the table that local structure neighbors used alone can effectively extract higher-order relations between nodes, but the improvement in AUC is limited because noise is introduced in the process of constructing positive and negative samples; likewise, using global semantic neighbors alone can significantly alleviate data sparsity, but the boost to AUPR is not significant because higher-order associations between nodes cannot be fully exploited. Combining the two methods is critical to improving the prediction accuracy of NDSGCL.
To further evaluate the performance of the method for predicting drug association provided herein, in some embodiments of the present application it is compared with other current state-of-the-art methods, as shown in FIG. 2a and FIG. 2b. The ordinate of FIG. 2a represents the true positive rate and the abscissa represents the false positive rate. In FIG. 2a, GCN denotes the graph convolutional network method (migrated to the lncRNA-drug association prediction problem); LightGCN denotes an optimization of the graph convolutional network that abandons the feature transformation and nonlinear activation of the traditional graph convolutional network and keeps only node aggregation; GCL-ED denotes a contrast learning method with a data augmentation strategy based on random dropping of edges; GCL-ND denotes a contrast learning method with a data augmentation strategy based on random dropping of disease nodes; MLRDFM denotes a method that integrates four miRNA similarities and two disease similarities and predicts miRNA-disease associations with a deep factorization machine; NDSGCL refers to the method for predicting drug association provided herein.

Init in FIG. 2b represents the feature representation of the original drug nodes; LightGCN, GCL-ED and GCL-ND are as defined above; NDSGCL refers to the method for predicting drug association provided herein.

As can be seen from FIG. 2a and FIG. 2b, the method for predicting drug association provided in the present application is superior to the other current state-of-the-art methods.
As can be seen from the above steps, the method for predicting the drug relevance provided by the present application constructs a first contrast learning loss and a second contrast learning loss according to the initial feature vector of the lncRNA to be detected and the initial feature vector of the target drug, and then performs back propagation update on the initial feature vector of the lncRNA to be detected and the initial feature vector of the target drug according to the first contrast learning loss, the second contrast learning loss and the BPR loss, so that a more accurate feature vector can be obtained, thereby improving the accuracy of predicting the relevance of the lncRNA to be detected and the target drug.
The drug association prediction device provided by the application is exemplified in the following in connection with specific embodiments.
As shown in fig. 3, an embodiment of the present application provides a device for predicting drug association, where the device 300 for predicting drug association includes:
the initialization module 301 is configured to construct a correlation bipartite graph according to a pre-acquired correlation pair formed by the lncRNA to be tested and the target drug, and randomly initialize vector representation of the lncRNA to be tested and vector representation of the target drug, respectively;
the aggregation module 302 is configured to run a neural network model on the associated bipartite graph, and perform neighbor node aggregation on the vector representation of the lncRNA to be detected and the vector representation of the target drug to obtain an initial feature vector of the lncRNA to be detected and an initial feature vector of the target drug;
the first contrast learning loss module 303 is configured to construct a local structure neighbor of the association pair based on the association bipartite graph, and construct a first contrast learning loss according to an initial feature vector of the lncRNA to be detected and an initial feature vector of the target drug;
the second contrast learning loss module 304 is configured to construct a global semantic neighbor of the association pair based on the association bipartite graph, and construct a second contrast learning loss according to the initial feature vector of the lncRNA to be detected and the initial feature vector of the target drug;
The intermediate feature vector module 305 is configured to calculate a comprehensive loss according to the first contrast learning loss, the second contrast learning loss and the BPR loss function, and update the initial feature vector of the lncRNA to be detected and the initial feature vector of the target drug by using the comprehensive loss to obtain an intermediate feature vector of the lncRNA to be detected and an intermediate feature vector of the target drug;
the final feature vector module 306 is configured to determine whether the intermediate feature vector of the lncRNA to be detected and the intermediate feature vector of the target drug meet the preset update termination condition; if so, to take the intermediate feature vector of the lncRNA to be detected as the final feature vector of the lncRNA to be detected and the intermediate feature vector of the target drug as the final feature vector of the target drug; otherwise,
to take the intermediate feature vector of the lncRNA to be detected as the vector representation of the lncRNA to be detected in the aggregation module, take the intermediate feature vector of the target drug as the vector representation of the target drug in the aggregation module, and return to execute the aggregation module;
the prediction model module 307 is configured to construct a relevance prediction model according to the final feature vector of the lncRNA to be detected and the final feature vector of the target drug;
The prediction module 308 is configured to predict the relevance of the lncRNA to be detected and the target drug by using a relevance prediction model.
It should be noted that, because the content of information interaction and execution process between the above devices/units is based on the same concept as the method embodiment of the present application, specific functions and technical effects thereof may be referred to in the method embodiment section, and will not be described herein again.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-described division of the functional units and modules is illustrated, and in practical application, the above-described functional distribution may be performed by different functional units and modules according to needs, i.e. the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-described functions. The functional units and modules in the embodiment may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit, where the integrated units may be implemented in a form of hardware or a form of a software functional unit. In addition, specific names of the functional units and modules are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present application. The specific working process of the units and modules in the above system may refer to the corresponding process in the foregoing method embodiment, which is not described herein again.
As shown in fig. 4, an embodiment of the present application provides a terminal device. The terminal device D10 of this embodiment includes: at least one processor D100 (only one processor is shown in fig. 4), a memory D101 and a computer program D102 stored in the memory D101 and executable on the at least one processor D100, the processor D100 implementing the steps of any of the method embodiments described above when executing the computer program D102.
Specifically, when the processor D100 executes the computer program D102, an association bipartite graph is constructed from the association pairs formed by the lncRNA to be detected and the target drug, and the vector representation of the lncRNA to be detected and the vector representation of the target drug are respectively initialized randomly; a neural network model is then run on the association bipartite graph, and neighbor node aggregation is respectively performed on the vector representation of the lncRNA to be detected and the vector representation of the target drug to obtain the initial feature vector of the lncRNA to be detected and the initial feature vector of the target drug; a local structure neighbor and a global semantic neighbor of the association pair are then constructed based on the association bipartite graph, and a first contrast learning loss and a second contrast learning loss are constructed from the initial feature vector of the lncRNA to be detected and the initial feature vector of the target drug; a comprehensive loss is calculated from the first contrast learning loss, the second contrast learning loss and the BPR loss function, and the initial feature vectors are updated by back propagation to obtain the intermediate feature vector of the lncRNA to be detected and the intermediate feature vector of the target drug; when the intermediate feature vectors meet the update termination condition, they are taken as the final feature vectors, a relevance prediction model is constructed from the final feature vector of the lncRNA to be detected and the final feature vector of the target drug, and the relevance of the lncRNA to be detected and the target drug is finally predicted by the relevance prediction model. By constructing a first contrast learning loss and a second contrast learning loss according to the initial feature vector of the lncRNA to be detected and the initial feature vector of the target drug, and then updating these initial feature vectors by back propagation according to the first contrast learning loss, the second contrast learning loss and the BPR loss function, more accurate feature vectors can be obtained, thereby improving the accuracy of prediction of the relevance of the lncRNA to be detected and the target drug.
The processor D100 may be a central processing unit (CPU); it may also be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
The memory D101 may in some embodiments be an internal storage unit of the terminal device D10, for example a hard disk or a memory of the terminal device D10. The memory D101 may also be an external storage device of the terminal device D10 in other embodiments, for example, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card) or the like, which are provided on the terminal device D10. Further, the memory D101 may also include both an internal storage unit and an external storage device of the terminal device D10. The memory D101 is used for storing an operating system, an application program, a boot loader (BootLoader), data, other programs, etc., such as program codes of the computer program. The memory D101 may also be used to temporarily store data that has been output or is to be output.
Embodiments of the present application also provide a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the various method embodiments described above.
The present embodiments further provide a computer program product which, when run on a terminal device, causes the terminal device to perform the steps of the various method embodiments described above.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer-readable storage medium. Based on this understanding, the present application implements all or part of the flow of the methods of the above embodiments by instructing the relevant hardware through a computer program, which may be stored in a computer-readable storage medium; when executed by a processor, the computer program implements the steps of each of the method embodiments described above. The computer program comprises computer program code, which may be in source code form, object code form, an executable file, some intermediate form, or the like. The computer-readable medium may include at least: any entity or device capable of carrying the computer program code to the drug-relevance predicting apparatus/terminal device, a recording medium, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunication signal, and a software distribution medium, such as a USB flash drive, a removable hard disk, a magnetic disk or an optical disk. In some jurisdictions, in accordance with legislation and patent practice, computer-readable media may not include electrical carrier signals and telecommunication signals.
In the foregoing embodiments, each embodiment is described with its own emphasis; for parts that are not described or detailed in a particular embodiment, reference may be made to the related descriptions of the other embodiments.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus/network device and method may be implemented in other manners. For example, the apparatus/network device embodiments described above are merely illustrative, e.g., the division of the modules or units is merely a logical functional division, and there may be additional divisions in actual implementation, e.g., multiple units or components may be combined or integrated into another system, or some features may be omitted, or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection via interfaces, devices or units, which may be in electrical, mechanical or other forms.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
The application has the following advantages:
1. A general computing framework for predicting the relevance of lncRNA and drugs is provided. Within this framework, local structure neighbors are constructed to extract the high-order relevance between association pairs, and global semantic neighbors are constructed to alleviate the influence of data sparsity on the experimental results.
2. Compared with the prior art, introducing the idea of contrast learning alleviates the influence of data sparsity on the experimental results and effectively improves prediction accuracy. Moreover, large-scale lncRNA-drug association data can be predicted in a very short time, reducing the blindness and cost of biological experiments.
While the foregoing is directed to the preferred embodiments of the present application, it should be noted that modifications and adaptations to those embodiments may occur to one skilled in the art and that such modifications and adaptations are intended to be comprehended within the scope of the present application without departing from the principles set forth herein.
Claims (10)
1. A method for predicting drug association, comprising:
step 1, constructing an association bipartite graph according to an association pair formed by the lncRNA to be detected and the target drug, and randomly initializing a vector representation of the lncRNA to be detected and a vector representation of the target drug, respectively;
step 2, running a neural network model on the associated bipartite graph, and respectively carrying out neighbor node aggregation on the vector representation of the lncRNA to be detected and the vector representation of the target drug to obtain an initial feature vector of the lncRNA to be detected and an initial feature vector of the target drug;
step 3, constructing a local structural neighbor of the association pair based on the association bipartite graph, and constructing a first contrast learning loss according to the initial feature vector of the lncRNA to be detected and the initial feature vector of the target drug;
step 4, constructing a global semantic neighbor of the association pair based on the association bipartite graph, and constructing a second contrast learning loss according to the initial feature vector of the lncRNA to be detected and the initial feature vector of the target drug;
step 5, calculating a comprehensive loss according to the first contrast learning loss, the second contrast learning loss and a BPR loss function, and using the comprehensive loss to update, by back propagation, the initial feature vector of the lncRNA to be detected and the initial feature vector of the target drug respectively, so as to obtain an intermediate feature vector of the lncRNA to be detected and an intermediate feature vector of the target drug;
step 6, if the intermediate feature vector of the lncRNA to be detected and the intermediate feature vector of the target drug meet a preset update termination condition, taking the intermediate feature vector of the lncRNA to be detected as a final feature vector of the lncRNA to be detected, and taking the intermediate feature vector of the target drug as a final feature vector of the target drug; otherwise,
taking the intermediate feature vector of the lncRNA to be detected as the vector representation of the lncRNA to be detected in the step 2, taking the intermediate feature vector of the target drug as the vector representation of the target drug in the step 2, and returning to the step 2;
step 7, constructing a relevance prediction model according to the final feature vector of the lncRNA to be detected and the final feature vector of the target drug;
and step 8, predicting the relevance of the lncRNA to be detected and the target drug by using the relevance prediction model.
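By way of illustration of steps 7 and 8 above, the sketch below scores one lncRNA-drug pair from its final feature vectors. The inner-product-plus-sigmoid scoring rule is an assumption; the claim only requires that some relevance prediction model be built from the final feature vectors.

```python
import numpy as np

def predict_association(e_lnc_final: np.ndarray, e_drug_final: np.ndarray) -> float:
    """Relevance score for one lncRNA-drug pair from its final feature vectors.

    An inner product squashed through a sigmoid is assumed here; the claim does
    not fix the exact form of the relevance prediction model.
    """
    raw = float(e_lnc_final @ e_drug_final)
    return 1.0 / (1.0 + np.exp(-raw))

# Usage with stand-in vectors:
rng = np.random.default_rng(0)
print(predict_association(rng.normal(size=16), rng.normal(size=16)))
```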
2. The prediction method according to claim 1, wherein the neural network model in step 2 is a graph convolutional neural network model;
in the step 2, a neural network model is run on the associated bipartite graph, and neighbor node aggregation is performed on the vector representation of the lncRNA to be detected and the vector representation of the target drug respectively to obtain an initial feature vector of the lncRNA to be detected and an initial feature vector of the target drug, including:
By calculation formula
obtaining an initial feature vector e_n of the lncRNA to be detected and an initial feature vector e_d of the target drug; wherein N_n represents the neighbor node set of the lncRNA to be detected, N_d represents the neighbor node set of the target drug, and the formula further involves the node vector embedding of the lncRNA to be detected at the l-th layer of the graph convolutional neural network, the node vector embedding of the target drug at the l-th layer of the graph convolutional neural network, L, the total number of layers of the graph convolutional neural network, the node vector embedding of the lncRNA to be detected at the (l+1)-th layer of the graph convolutional neural network, and the node vector embedding of the target drug at the (l+1)-th layer of the graph convolutional neural network.
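The aggregation formula itself appears as an image in the original filing and is not reproduced above. The following sketch therefore assumes a LightGCN-style, degree-normalized propagation with layer averaging, which is one common way to realize neighbor node aggregation on an association bipartite graph; it is not necessarily the patent's exact update rule.

```python
import numpy as np

def aggregate(adj: np.ndarray, E_lnc: np.ndarray, E_drug: np.ndarray, num_layers: int = 2):
    """Neighbour aggregation over the association bipartite graph.

    Assumes symmetric degree-normalised propagation followed by averaging the
    embeddings of all layers, as in LightGCN-style models.
    """
    deg_l = np.maximum(adj.sum(axis=1, keepdims=True), 1.0)   # lncRNA node degrees
    deg_d = np.maximum(adj.sum(axis=0, keepdims=True), 1.0)   # drug node degrees
    norm_adj = adj / np.sqrt(deg_l) / np.sqrt(deg_d)          # D_l^-1/2 A D_d^-1/2

    layers_l, layers_d = [E_lnc], [E_drug]
    for _ in range(num_layers):
        new_l = norm_adj @ layers_d[-1]       # aggregate drug neighbours into lncRNA nodes
        new_d = norm_adj.T @ layers_l[-1]     # aggregate lncRNA neighbours into drug nodes
        layers_l.append(new_l)
        layers_d.append(new_d)

    # Initial feature vectors: mean over the embeddings of all layers.
    return np.mean(layers_l, axis=0), np.mean(layers_d, axis=0)

# Example usage with a toy 3x3 biadjacency matrix:
adj = np.array([[0., 1., 1.], [1., 0., 0.], [0., 0., 1.]])
rng = np.random.default_rng(0)
e_lnc0, e_drug0 = aggregate(adj, rng.normal(size=(3, 8)), rng.normal(size=(3, 8)))
```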
3. The prediction method according to claim 2, wherein the constructing a first contrast learning loss according to the initial feature vector of the lncRNA to be detected and the initial feature vector of the target drug in the step 3 includes:
by calculation formula
obtaining the local structure neighbor contrast learning loss of the lncRNA to be detected and the local structure neighbor contrast learning loss of the target drug; wherein the losses are defined in terms of: the output of the initial feature vector e_n of the lncRNA to be detected at the k-th layer of the graph convolutional neural network; the output of the initial feature vector e_d of the target drug at the k-th layer of the graph convolutional neural network; k, which is an even number; τ, the hyper-parameter of the softmax function; the vector representation of the lncRNA to be detected; the vector representation of the target drug; n_num, the total number of lncRNAs obtained in step 1; d_num, the total number of drugs obtained in step 1; the vector representation of the i-th lncRNA at layer 0 of the graph convolutional neural network; and the vector representation of the j-th drug at layer 0 of the graph convolutional neural network, with i = 1, 2, ..., n_num and j = 1, 2, ..., d_num;
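The formulas of claim 3 are likewise images in the original filing. One plausible reading, following neighborhood-enriched contrastive learning, contrasts each node's layer-k output with its own layer-0 representation (positive pair) against all other nodes of the same type (negatives). The sketch below implements that assumed InfoNCE form; it is an interpretation, not the patent's exact loss.

```python
import numpy as np

def info_nce(anchor: np.ndarray, positive: np.ndarray, tau: float = 0.2) -> float:
    """Cross-layer InfoNCE: row i of `anchor` (layer-k output) is pulled towards
    row i of `positive` (layer-0 representation) and pushed away from all other rows."""
    a = anchor / np.linalg.norm(anchor, axis=1, keepdims=True)
    p = positive / np.linalg.norm(positive, axis=1, keepdims=True)
    logits = a @ p.T / tau                       # pairwise cosine similarities / tau
    pos = np.diag(logits)                        # matching (i, i) pairs are the positives
    log_denom = np.log(np.exp(logits).sum(axis=1))
    return float(np.mean(-(pos - log_denom)))

# Toy usage: contrast layer-k outputs with layer-0 embeddings of the same nodes.
rng = np.random.default_rng(0)
layer0 = rng.normal(size=(5, 8))
layer_k = layer0 + 0.1 * rng.normal(size=(5, 8))   # stand-in for the k-th layer output
print(info_nce(layer_k, layer0))

# Under this assumption, the first contrast learning loss would sum an lncRNA
# term and a drug term: L_local = info_nce(lnc_k, lnc_0) + info_nce(drug_k, drug_0).
```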
4. The prediction method according to claim 3, wherein the constructing a second contrast learning loss according to the initial feature vector of the lncRNA to be detected and the initial feature vector of the target drug in the step 4 includes:
by calculation formula
obtaining the global semantic neighbor contrast learning loss of the lncRNA to be detected and the global semantic neighbor contrast learning loss of the target drug; wherein c_i represents a prototype of the lncRNA to be detected, c_j represents a prototype of the target drug, and C represents the set of prototypes;
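Claim 4's formulas are also images in the original filing. A prototype-based, ProtoNCE-style loss is assumed below, with prototypes obtained by k-means clustering over the node embeddings; the clustering choice, number of prototypes and temperature are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

def proto_nce(embeddings: np.ndarray, n_prototypes: int = 2, tau: float = 0.2) -> float:
    """Prototype-based (global semantic neighbour) contrastive loss.

    Each node embedding is pulled towards the prototype of the cluster it belongs
    to and pushed away from the other prototypes in the prototype set C.
    """
    km = KMeans(n_clusters=n_prototypes, n_init=10, random_state=0).fit(embeddings)
    protos = km.cluster_centers_                       # the prototype set C
    z = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    c = protos / np.linalg.norm(protos, axis=1, keepdims=True)
    logits = z @ c.T / tau
    pos = logits[np.arange(len(z)), km.labels_]        # similarity to a node's own prototype
    log_denom = np.log(np.exp(logits).sum(axis=1))
    return float(np.mean(-(pos - log_denom)))

# Toy usage on random lncRNA embeddings:
rng = np.random.default_rng(0)
print(proto_nce(rng.normal(size=(10, 8)), n_prototypes=2))
```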
5. The prediction method according to claim 4, wherein the step 5 includes:
by calculation formula
L = L_BPR + λ1·L_local + λ2·L_global + λ3·‖θ‖²
obtaining the comprehensive loss L; wherein λ1, λ2 and λ3 all represent hyper-parameters balancing the weights of the respective terms, θ represents the parameters of the graph convolutional neural network, σ represents a nonlinear activation function, and the paired training data consist of triples (n, d+, d-) in which an association exists between the lncRNA to be detected n and the target drug d+, and no association exists between the lncRNA to be detected n and the sampled drug d-;
and using the comprehensive loss L to perform a back-propagation update on the initial feature vector e_n of the lncRNA to be detected and the initial feature vector e_d of the target drug respectively, to obtain an intermediate feature vector e_n' of the lncRNA to be detected and an intermediate feature vector e_d' of the target drug.
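The composite objective of claim 5 can be read directly from the visible formula. The sketch below combines a BPR term for one (n, d+, d-) triple with the two contrastive losses and an L2 regularization term; the λ values are placeholders, not the patent's settings.

```python
import numpy as np

def bpr_loss(e_n: np.ndarray, e_d_pos: np.ndarray, e_d_neg: np.ndarray) -> float:
    """BPR loss for one (lncRNA, positive drug, sampled negative drug) triple:
    -log sigma(score_pos - score_neg), with inner-product scores assumed."""
    diff = float(e_n @ e_d_pos - e_n @ e_d_neg)
    return float(-np.log(1.0 / (1.0 + np.exp(-diff))))

def total_loss(l_bpr: float, l_local: float, l_global: float, theta_sq_norm: float,
               lam1: float = 0.1, lam2: float = 0.1, lam3: float = 1e-4) -> float:
    """Composite objective of claim 5: L = L_BPR + λ1·L_local + λ2·L_global + λ3·||θ||²."""
    return l_bpr + lam1 * l_local + lam2 * l_global + lam3 * theta_sq_norm

# Toy usage with random stand-in vectors:
rng = np.random.default_rng(0)
e_n, e_dp, e_dn = rng.normal(size=(3, 8))
print(total_loss(bpr_loss(e_n, e_dp, e_dn), l_local=0.5, l_global=0.4, theta_sq_norm=1.0))
```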
7. The prediction method according to claim 1, characterized in that before performing said step 6, said prediction method further comprises:
according to the intermediate feature vector of the lncRNA to be detected and the intermediate feature vector of the target drug, calculating an AUC value and an AUPR value respectively;
if the AUC value and the AUPR value reach their maximum values, determining that the intermediate feature vector of the lncRNA to be detected and the intermediate feature vector of the target drug meet the preset update termination condition; otherwise, determining that the intermediate feature vector of the lncRNA to be detected and the intermediate feature vector of the target drug do not meet the preset update termination condition.
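One way to operationalize the AUC/AUPR-based termination condition of claim 7 is sketched below, using scikit-learn's metrics on held-out association labels. Interpreting "reach the maximum value" as "no further improvement over the best value seen so far" is an assumption.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, average_precision_score

def should_stop(scores: np.ndarray, labels: np.ndarray, best_so_far: float):
    """Early-stopping check: compute AUC and AUPR on held-out association labels
    and stop once their sum no longer improves on the best value seen so far."""
    auc = roc_auc_score(labels, scores)
    aupr = average_precision_score(labels, scores)
    current = auc + aupr
    return current <= best_so_far, max(current, best_so_far)

# Toy usage: scores for six candidate pairs with known association labels.
scores = np.array([0.9, 0.8, 0.3, 0.7, 0.2, 0.1])
labels = np.array([1, 1, 0, 1, 0, 0])
stop, best = should_stop(scores, labels, best_so_far=0.0)
print(stop, best)
```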
8. A device for predicting drug association, comprising:
the initialization module is used for constructing an association bipartite graph according to a pre-obtained association pair consisting of the lncRNA to be detected and the target drug, and randomly initializing a vector representation of the lncRNA to be detected and a vector representation of the target drug, respectively;
the aggregation module is used for running a neural network model on the associated bipartite graph, and respectively carrying out neighbor node aggregation on the vector representation of the lncRNA to be detected and the vector representation of the target drug to obtain an initial feature vector of the lncRNA to be detected and an initial feature vector of the target drug;
The first contrast learning loss module is used for constructing a local structure neighbor of the association pair based on the association bipartite graph, and constructing first contrast learning loss according to the initial feature vector of the lncRNA to be detected and the initial feature vector of the target drug;
the second contrast learning loss module is used for constructing a global semantic neighbor of the association pair based on the association bipartite graph, and constructing a second contrast learning loss according to the initial feature vector of the lncRNA to be detected and the initial feature vector of the target drug;
the intermediate feature vector module is used for calculating a comprehensive loss according to the first contrast learning loss, the second contrast learning loss and the BPR loss function, and using the comprehensive loss to update, by back propagation, the initial feature vector of the lncRNA to be detected and the initial feature vector of the target drug respectively, to obtain the intermediate feature vector of the lncRNA to be detected and the intermediate feature vector of the target drug;
the final feature vector module is used for judging whether the intermediate feature vector of the lncRNA to be detected and the intermediate feature vector of the target drug meet a preset update termination condition; if so, taking the intermediate feature vector of the lncRNA to be detected as a final feature vector of the lncRNA to be detected and taking the intermediate feature vector of the target drug as a final feature vector of the target drug; otherwise,
taking the intermediate feature vector of the lncRNA to be detected as the vector representation of the lncRNA to be detected in the aggregation module, taking the intermediate feature vector of the target drug as the vector representation of the target drug in the aggregation module, and returning to execute the aggregation module;
the prediction model module is used for constructing a relevance prediction model according to the final feature vector of the lncRNA to be detected and the final feature vector of the target drug;
and the prediction module is used for predicting the relevance of the lncRNA to be detected and the target drug by using the relevance prediction model.
9. A terminal device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the method of predicting drug relevance according to any one of claims 1 to 7 when executing the computer program.
10. A computer readable storage medium storing a computer program, wherein the computer program when executed by a processor implements the method of predicting drug relevance according to any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310157524.0A CN116129989A (en) | 2023-02-23 | 2023-02-23 | Method, device, terminal equipment and medium for predicting drug relevance |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116129989A true CN116129989A (en) | 2023-05-16 |
Family
ID=86306261
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310157524.0A Pending CN116129989A (en) | 2023-02-23 | 2023-02-23 | Method, device, terminal equipment and medium for predicting drug relevance |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116129989A (en) |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||