CN117976139A

CN117976139A - Drug repositioning method and system based on deviation correcting mechanism and contrast learning

Info

Publication number: CN117976139A
Application number: CN202410371658.7A
Authority: CN
Inventors: 孟亚洁; 王毅; 许俊林; 唐贤方; 卢长城; 郭程; 刘芊蕊; 朱强; 胡新荣; 彭涛
Original assignee: Wuhan Textile University
Current assignee: Wuhan Textile University
Priority date: 2024-03-29
Filing date: 2024-03-29
Publication date: 2024-05-03

Abstract

The invention discloses a drug repositioning method and system based on a deviation-correcting mechanism and contrastive learning, in the field of drug repositioning technology. The method comprises the following steps: based on prior knowledge of disease information and drug information, aggregating the disease information and the drug information with LightGCN combined with a deviation-correcting mechanism, so as to obtain the heterogeneous features and neighbor features of the drug nodes and the disease nodes; generating, from these heterogeneous features and neighbor features, a heterogeneous feature view and a neighbor feature view of the drug nodes and disease nodes, and optimizing the model by contrastive learning between the two views combined with a weighted binary cross-entropy loss; and taking the dot product of the optimized embedding vectors of the drug nodes and disease nodes to predict drug-disease associations, from which potential drugs for a disease are identified. By introducing contrastive learning to capture additional supervision signals, the invention alleviates the data sparsity problem.

Description

Drug repositioning method and system based on deviation correcting mechanism and contrast learning

Technical Field

The invention relates to the technical field of drug repositioning, and in particular to a drug repositioning method and system based on a deviation-correcting mechanism and contrastive learning.

Background

Drug repositioning refers to discovering candidate drugs for diseases that are rare or have no effective therapeutic drugs, and deep learning has become one of the dominant techniques for this task. A deep learning-based drug repositioning model generally aims to integrate the information of several networks effectively, learn a high-quality representation for each disease and each drug, and use these representations for prediction.

Drugs and diseases typically form three networks: a drug-drug network and a disease-disease network, which are homogeneous and contain rich structural information, and a drug-disease association network, which is heterogeneous. However, not all of this information is equally important; some of it is unimportant and can even be regarded as noise. There is therefore an urgent need for a new drug repositioning technique that learns reliable representations by distinguishing useful information from noise.

Disclosure of Invention

To solve the above problems, the invention provides a drug repositioning method based on a deviation-correcting mechanism and contrastive learning, which comprises the following three stages:

Data processing stage: based on prior knowledge of disease information and drug information, aggregating the disease information and the drug information with LightGCN combined with a deviation-correcting mechanism, so as to obtain the heterogeneous features and neighbor features of the drug nodes and of the disease nodes;

Data optimization stage: generating, from the heterogeneous features and neighbor features of the drug nodes and disease nodes, a heterogeneous feature view and a neighbor feature view of these nodes, performing contrastive learning between the two views, and optimizing the model in combination with a weighted binary cross-entropy loss;

Prediction stage: performing weighted fusion of the optimized heterogeneous features and neighbor features of the drug nodes and disease nodes to obtain their final embedding vectors, and taking the dot product of these vectors to predict drug-disease associations, from which potential drugs for a disease are identified.
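The prediction stage can be illustrated with a minimal pure-Python sketch. It assumes a single scalar fusion weight `alpha` (hypothetical; the source does not state how the fusion weights are chosen) and passes the dot product through a sigmoid so that the score lies in (0, 1); the sigmoid is an assumption consistent with the binary cross-entropy loss, not something the source specifies. The function names `fuse` and `association_score` are illustrative only.

```python
import math
from typing import List, Sequence


def fuse(hetero: Sequence[float], neighbor: Sequence[float], alpha: float = 0.5) -> List[float]:
    """Weighted fusion of a node's heterogeneous and neighbor features.

    alpha is a hypothetical fusion weight, not a value taken from the source.
    """
    if len(hetero) != len(neighbor):
        raise ValueError("feature vectors must have the same dimension")
    return [alpha * h + (1.0 - alpha) * n for h, n in zip(hetero, neighbor)]


def association_score(drug: Sequence[float], disease: Sequence[float]) -> float:
    """Dot product of the final drug and disease embeddings, mapped to (0, 1) by a sigmoid."""
    logit = sum(d * s for d, s in zip(drug, disease))
    return 1.0 / (1.0 + math.exp(-logit))


drug = fuse([1.0, 0.0], [0.0, 1.0])      # [0.5, 0.5]
disease = fuse([1.0, 1.0], [1.0, 1.0])   # [1.0, 1.0]
print(association_score(drug, disease))  # sigmoid(1.0), about 0.731
```

Ranking all candidate drugs for a given disease by this score yields the list of potential drugs for that disease.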

Preferably, in the data processing stage, the relationships between drugs and diseases, between different drugs, and between different diseases are used as the prior knowledge of the disease information and drug information.
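One common way to hold these three kinds of prior knowledge together is a single block adjacency matrix over all drug and disease nodes. The sketch below assumes that the drug-drug and disease-disease relations are given as similarity matrices and the drug-disease relation as a 0/1 association matrix; the helper `build_hetero_adjacency` is hypothetical, and the source does not say whether the method materialises such a matrix.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def build_hetero_adjacency(drug_sim: Sequence[Sequence[float]],
                           dis_sim: Sequence[Sequence[float]],
                           assoc: Sequence[Sequence[float]]) -> Matrix:
    """Stack the three relations into one (n_drug + n_dis) square block matrix:

        [[drug_sim,  assoc ],
         [assoc^T,   dis_sim]]
    """
    n_drug, n_dis = len(drug_sim), len(dis_sim)
    if len(assoc) != n_drug or any(len(row) != n_dis for row in assoc):
        raise ValueError("assoc must be an n_drug x n_dis matrix")
    top = [list(drug_sim[i]) + list(assoc[i]) for i in range(n_drug)]
    bottom = [[assoc[i][j] for i in range(n_drug)] + list(dis_sim[j]) for j in range(n_dis)]
    return top + bottom


# Two drugs, one disease; only drug 0 is known to be associated with the disease.
adj = build_hetero_adjacency([[1.0, 0.5], [0.5, 1.0]], [[1.0]], [[1], [0]])
for row in adj:
    print(row)
```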

Preferably, during information aggregation in the data processing stage, the heterogeneous information of each node is aggregated over the drug-disease relationships by LightGCN combined with the deviation-correcting mechanism;

the homogeneous information of each node is aggregated from the top-K neighbor nodes of the drug node or disease node.
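The heterogeneous part of this aggregation can be sketched with standard LightGCN propagation on the bipartite drug-disease graph. This is plain LightGCN (symmetric degree normalisation, no feature transformation or non-linearity, layer outputs averaged with equal weights); the deviation-correcting mechanism that the invention combines with it is not included, and the function name, layer count, and list-based representation are assumptions made to keep the sketch dependency-free.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def lightgcn_bipartite(assoc: List[List[int]], drug_emb: Matrix, dis_emb: Matrix,
                       num_layers: int = 2) -> Tuple[Matrix, Matrix]:
    """Plain LightGCN propagation over a drug-disease association matrix.

    assoc[i][j] == 1 means drug i is known to be associated with disease j.
    Layers have no weight matrix and no activation; the output is the mean of
    the layer-0..K embeddings.
    """
    n_drug, n_dis, dim = len(assoc), len(assoc[0]), len(drug_emb[0])
    drug_deg = [sum(row) for row in assoc]
    dis_deg = [sum(assoc[i][j] for i in range(n_drug)) for j in range(n_dis)]

    cur_drug, cur_dis = drug_emb, dis_emb
    acc_drug = [list(row) for row in drug_emb]  # running sum over layers
    acc_dis = [list(row) for row in dis_emb]
    for _ in range(num_layers):
        nxt_drug = [[0.0] * dim for _ in range(n_drug)]
        nxt_dis = [[0.0] * dim for _ in range(n_dis)]
        for i in range(n_drug):
            for j in range(n_dis):
                if assoc[i][j]:
                    # symmetric normalisation 1 / sqrt(|N(i)| * |N(j)|)
                    w = 1.0 / math.sqrt(drug_deg[i] * dis_deg[j])
                    for d in range(dim):
                        nxt_drug[i][d] += w * cur_dis[j][d]
                        nxt_dis[j][d] += w * cur_drug[i][d]
        cur_drug, cur_dis = nxt_drug, nxt_dis
        for acc, cur in ((acc_drug, cur_drug), (acc_dis, cur_dis)):
            for row_acc, row_cur in zip(acc, cur):
                for d in range(dim):
                    row_acc[d] += row_cur[d]

    k = num_layers + 1
    return ([[v / k for v in row] for row in acc_drug],
            [[v / k for v in row] for row in acc_dis])


drug_out, dis_out = lightgcn_bipartite([[1, 0], [1, 1]], [[1.0], [0.0]], [[0.0], [1.0]], num_layers=1)
print(drug_out, dis_out)
```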

Preferably, when generating embedding vectors in the data processing stage, the embedding vectors of the different nodes are generated from the aggregated heterogeneous and homogeneous information, so that each embedding vector reflects the drug-disease relationships, the drug-drug relationships, and the disease-disease relationships.

Preferably, in the data processing stage, the deviation-correcting mechanism adaptively assigns different inverse deviation scores to the nodes being aggregated before information aggregation is performed, thereby alleviating the bias caused by popular nodes and long-tail nodes.
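The source does not give the formula for the inverse deviation scores. Purely to illustrate the idea, the sketch below assumes a score proportional to `degree ** -beta`, normalised over the neighbors being aggregated, so that popular (high-degree) neighbors contribute less and long-tail neighbors contribute more. `beta` is a hypothetical fixed hyperparameter; in the invention the scores are assigned adaptively, which this fixed rule does not capture.

```python
from typing import List, Sequence


def inverse_deviation_scores(neighbor_degrees: Sequence[int], beta: float = 1.0) -> List[float]:
    """Hypothetical inverse deviation scores: proportional to degree**(-beta), summing to 1."""
    if any(deg < 1 for deg in neighbor_degrees):
        raise ValueError("every neighbor must have degree >= 1")
    raw = [deg ** (-beta) for deg in neighbor_degrees]
    total = sum(raw)
    return [r / total for r in raw]


def debiased_aggregate(neighbor_embs: Sequence[Sequence[float]],
                       neighbor_degrees: Sequence[int], beta: float = 1.0) -> List[float]:
    """Score-weighted sum of neighbor embeddings in place of a plain mean."""
    scores = inverse_deviation_scores(neighbor_degrees, beta)
    dim = len(neighbor_embs[0])
    return [sum(s * emb[d] for s, emb in zip(scores, neighbor_embs)) for d in range(dim)]


# A long-tail neighbor (degree 1) and a popular neighbor (degree 4):
print(inverse_deviation_scores([1, 4]))                      # [0.8, 0.2]
print(debiased_aggregate([[1.0, 0.0], [0.0, 1.0]], [1, 4]))  # [0.8, 0.2]
```

With `beta = 0` every neighbor receives the same score, so the rule reduces to an ordinary mean; larger `beta` shifts more weight toward long-tail neighbors.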

Preferably, in the data optimization stage, a dual view is constructed from the heterogeneous features and neighbor features of each node for contrastive learning, and the contrastive loss is combined with a weighted binary cross-entropy loss as the optimization objective.
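The dual-view contrastive objective can be illustrated with an InfoNCE-style loss, a common choice in graph contrastive learning. The source does not specify the contrastive loss, similarity function, or temperature, so cosine similarity and `tau` below are assumptions: each node's heterogeneous-view embedding is pulled toward its own neighbor-view embedding and pushed away from the neighbor-view embeddings of the other nodes.

```python
import math
from typing import List, Sequence


def _unit(v: Sequence[float]) -> List[float]:
    """L2-normalise a vector so that dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def info_nce(hetero_view: Sequence[Sequence[float]],
             neighbor_view: Sequence[Sequence[float]], tau: float = 0.2) -> float:
    """Mean InfoNCE loss; node i in one view is the positive for node i in the other."""
    a = [_unit(v) for v in hetero_view]
    b = [_unit(v) for v in neighbor_view]
    total = 0.0
    for i, za in enumerate(a):
        logits = [sum(x * y for x, y in zip(za, zb)) / tau for zb in b]
        shift = max(logits)  # log-sum-exp shift for numerical stability
        log_denom = shift + math.log(sum(math.exp(z - shift) for z in logits))
        total += log_denom - logits[i]
    return total / len(a)


aligned = info_nce([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]], tau=1.0)
swapped = info_nce([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]], tau=1.0)
print(aligned < swapped)  # True: matching views give the lower loss
```

In practice such a loss is often symmetrised (computed in both directions) and evaluated separately for drug nodes and disease nodes; that is also an assumption here rather than a detail from the source.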

The invention further provides a drug repositioning system based on a deviation-correcting mechanism and contrastive learning, comprising:

The data acquisition module is configured to acquire drug information and disease information from a dataset;

The data preprocessing module is configured to aggregate the disease information and the drug information with LightGCN combined with a deviation-correcting mechanism, based on prior knowledge of the disease information and drug information, so as to obtain the heterogeneous features and neighbor features of the drug nodes and of the disease nodes;

The data optimization module is configured to generate a heterogeneous feature view and a neighbor feature view of the drug nodes and disease nodes from their heterogeneous features and neighbor features, perform contrastive learning between the two views, and optimize the model in combination with a weighted binary cross-entropy loss;

The prediction module is configured to perform weighted fusion of the optimized heterogeneous features and neighbor features of the drug nodes and disease nodes to obtain their final embedding vectors, take the dot product of these vectors to predict drug-disease associations, and thereby identify potential drugs for a disease.

Preferably, the data preprocessing module is configured to use the relationships between drugs and diseases, between different drugs, and between different diseases as prior knowledge of the disease information and drug information; to aggregate the heterogeneous information of the nodes over the drug-disease relationships by LightGCN combined with the deviation-correcting mechanism; and to aggregate the homogeneous information of each node from the top-K neighbor nodes of the drug node or disease node.

Preferably, the data preprocessing module is further configured to generate embedding vectors of the different nodes from the aggregated heterogeneous and homogeneous information, so that each embedding vector reflects the drug-disease relationships, the drug-drug relationships, and the disease-disease relationships.

Preferably, the data optimization module is further configured to construct a dual view from the heterogeneous features and neighbor features of each node for contrastive learning, and to combine the contrastive loss with a weighted binary cross-entropy loss as the optimization objective.

The invention provides the following technical effects:

The invention is the first to propose and introduce a deviation-correcting mechanism to alleviate the bias caused by popular nodes and long-tail nodes in drug repositioning, thereby obtaining more expressive node features;

In the invention, contrastive learning is introduced in the model optimization stage to capture additional supervision signals, thereby alleviating the data sparsity problem.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are needed in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.

FIG. 1 is a schematic flow chart of the method of the invention.

Detailed Description

For the purpose of making the objects, technical solutions and advantages of the embodiments of the present application more apparent, the technical solutions of the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present application, and it is apparent that the described embodiments are only some embodiments of the present application, not all embodiments. The components of the embodiments of the present application generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the application, as presented in the figures, is not intended to limit the scope of the application, as claimed, but is merely representative of selected embodiments of the application. All other embodiments, which can be made by a person skilled in the art without making any inventive effort, are intended to be within the scope of the present application.

As shown in FIG. 1, the invention provides a drug repositioning method based on a deviation-correcting mechanism and contrastive learning, which comprises three stages: a data processing stage, a data optimization stage, and a prediction stage.

Preferably, in the data processing stage of the drug repositioning method provided by the invention, the relationships between drugs and diseases, between different drugs, and between different diseases are used as prior knowledge of the disease information and drug information.

Further preferably, during information aggregation in the data processing stage, the heterogeneous information of each node is aggregated over the drug-disease relationships by LightGCN combined with the deviation-correcting mechanism, and the homogeneous information of each node is aggregated from the top-K neighbor nodes of the drug node or disease node.

Further preferably, when generating embedding vectors in the data processing stage, the embedding vectors of the different nodes are generated from the aggregated heterogeneous and homogeneous information, so that each embedding vector reflects the drug-disease, drug-drug, and disease-disease relationships.

Further preferably, in the data processing stage, the deviation-correcting mechanism adaptively assigns different inverse deviation scores to the nodes being aggregated, and information aggregation is then performed, thereby alleviating the bias caused by popular nodes and long-tail nodes.

Further preferably, in the data optimization stage, dual views are constructed from the heterogeneous features and neighbor features of each node for contrastive learning, and the contrastive loss is combined with a weighted binary cross-entropy loss as the optimization objective.
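The supervised part of this objective can be sketched as a weighted binary cross-entropy over drug-disease pairs, added to the contrastive term. The positive-class weight (here the ratio of unknown to known pairs, a common way to counter the scarcity of known associations) and the trade-off coefficient `lam` are assumptions; the source names the loss but does not give its weighting scheme or coefficients.

```python
import math
from typing import Sequence


def positive_weight(labels: Sequence[int]) -> float:
    """Ratio of unknown (negative) to known (positive) pairs; an assumed weighting rule."""
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one known association")
    return (len(labels) - n_pos) / n_pos


def weighted_bce(labels: Sequence[int], probs: Sequence[float],
                 pos_weight: float, eps: float = 1e-12) -> float:
    """Mean binary cross-entropy with positive pairs up-weighted by pos_weight."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)  # clip to keep log() finite
        total -= pos_weight * y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(labels)


def joint_objective(bce: float, contrastive: float, lam: float = 0.1) -> float:
    """Hypothetical combination: supervised loss plus lam times the contrastive loss."""
    return bce + lam * contrastive


labels = [1, 0, 0, 0]
w = positive_weight(labels)                # 3.0
loss = weighted_bce(labels, [0.5] * 4, w)  # 1.5 * ln 2
print(joint_objective(loss, contrastive=0.3))
```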

Preferably, the data preprocessing module of the drug repositioning system provided by the invention is configured to use the relationships between drugs and diseases, between different drugs, and between different diseases as prior knowledge of the disease information and drug information; to aggregate the heterogeneous information of the nodes over the drug-disease relationships by LightGCN combined with the deviation-correcting mechanism; and to aggregate the homogeneous information of each node from the top-K neighbor nodes of the drug node or disease node.

Preferably, the data preprocessing module of the system is further configured to generate embedding vectors of the different nodes from the aggregated heterogeneous and homogeneous information, so that each embedding vector reflects the drug-disease, drug-drug, and disease-disease relationships.

Further preferably, the data optimization module of the system is further configured to construct dual views from the heterogeneous features and neighbor features of each node for contrastive learning, and to combine the contrastive loss with a weighted binary cross-entropy loss as the optimization objective.

Example 1: This example addresses the bias caused by popular nodes and long-tail nodes in drug repositioning. If each drug and each disease is regarded as a node, the goal becomes adaptively assigning different inverse deviation scores to different nodes and then aggregating information according to the magnitudes of these scores, which alleviates the bias and yields more expressive features.

As shown in FIG. 1, the invention provides a novel drug repositioning method based on a deviation-correcting mechanism and contrastive learning. Specifically, a drug-drug similarity network, a disease-disease similarity network, and a known drug-disease association network are first constructed, and heterogeneous information and neighbor information are aggregated over them to obtain more complete node information. During this aggregation, a deviation-correcting mechanism adaptively assigns different inverse deviation scores to different nodes, which alleviates bias while still capturing global signals. In the model optimization stage, contrastive learning is introduced to capture additional supervision signals, which alleviates the data sparsity problem. The method specifically comprises the following steps:

(1) Disease modeling. For each disease, disease modeling learns a latent vector representation by aggregating two kinds of interactions: drug-disease interactions and disease-disease interactions. Specifically, heterogeneous information is aggregated from the drug nodes associated with the disease node, and neighbor information is aggregated from the top-K neighbor nodes (disease nodes) of the disease node.

(2) Drug modeling. For each drug, drug modeling learns a latent vector representation by aggregating two kinds of interactions: drug-disease interactions and drug-drug interactions. Specifically, heterogeneous information is aggregated from the disease nodes associated with the drug node, and neighbor information is aggregated from the top-K neighbor nodes (drug nodes) of the drug node.

(3) During information aggregation, the deviation-correcting mechanism adaptively assigns different inverse deviation scores to different nodes.

(4) Contrastive learning is introduced and combined with a weighted binary cross-entropy loss to form the joint model optimization objective.

(5) The final features of each node are obtained by weighted fusion of its heterogeneous information and neighbor information, and the final prediction score is obtained as the dot product of the drug and disease features.
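Steps (1) and (2) both build a neighbor feature from the top-K most similar nodes of the same type. A minimal sketch, assuming a precomputed similarity matrix, excluding the node itself, and using an unweighted mean of the K neighbor embeddings (the source does not state how the K neighbors are combined, and K is a free hyperparameter here):

```python
from typing import List, Sequence


def top_k_neighbors(sim_row: Sequence[float], self_index: int, k: int) -> List[int]:
    """Indices of the k most similar other nodes, most similar first (ties keep index order)."""
    candidates = [j for j in range(len(sim_row)) if j != self_index]
    candidates.sort(key=lambda j: sim_row[j], reverse=True)
    return candidates[:k]


def neighbor_features(sim: Sequence[Sequence[float]], emb: Sequence[Sequence[float]],
                      k: int) -> List[List[float]]:
    """Mean embedding of each node's top-k neighbors in a drug-drug or disease-disease network."""
    if k < 1 or len(sim) < 2:
        raise ValueError("need k >= 1 and at least two nodes")
    dim = len(emb[0])
    out = []
    for i, row in enumerate(sim):
        nbrs = top_k_neighbors(row, i, k)
        out.append([sum(emb[j][d] for j in nbrs) / len(nbrs) for d in range(dim)])
    return out


disease_sim = [[1.0, 0.9, 0.1],
               [0.9, 1.0, 0.3],
               [0.1, 0.3, 1.0]]
disease_emb = [[1.0], [2.0], [4.0]]
print(neighbor_features(disease_sim, disease_emb, k=1))  # [[2.0], [1.0], [2.0]]
```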

To optimize this joint loss function, the invention uses the Adam optimizer with a cyclical learning rate during model training.
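The source does not specify which cyclical schedule is used. As an illustration, the widely used triangular policy moves the learning rate linearly from a base value up to a maximum and back once every `2 * step_size` iterations; the values of `base_lr`, `max_lr`, and `step_size` below are placeholders. In PyTorch, the same policy is available as `torch.optim.lr_scheduler.CyclicLR` with `mode="triangular"`, which can wrap a `torch.optim.Adam` optimizer.

```python
import math


def triangular_clr(iteration: int, base_lr: float = 1e-4,
                   max_lr: float = 1e-3, step_size: int = 100) -> float:
    """Triangular cyclical learning rate: base -> max -> base every 2 * step_size iterations."""
    cycle = math.floor(1 + iteration / (2 * step_size))
    x = abs(iteration / step_size - 2 * cycle + 1)  # 1 at cycle edges, 0 at the peak
    return base_lr + (max_lr - base_lr) * max(0.0, 1.0 - x)


# The rate rises from 1e-4 at iteration 0 to 1e-3 at iteration 100 and falls back by 200.
for it in (0, 50, 100, 150, 200):
    print(it, triangular_clr(it))
```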

TABLE 1

Metric  Dataset    SCMFDD       SCPMF        DRGBCN       GLGMPNN      DRHGCN       DRDM
AUROC   Fdataset   0.776±0.001  0.893±0.001  0.930±0.001  0.942±0.001  0.945±0.002  0.951±0.002
AUROC   Cdataset   0.793±0.001  0.913±0.002  0.945±0.002  0.955±0.001  0.962±0.001  0.965±0.001
AUROC   LRSSL      0.768±0.001  0.895±0.001  0.944±0.001  0.949±0.001  0.953±0.001  0.955±0.001
AUROC   Average    0.768        0.901        0.939        0.948        0.953        0.957
AUPR    Fdataset   0.005±0.000  0.349±0.006  0.408±0.001  0.517±0.005  0.567±0.006  0.582±0.005
AUPR    Cdataset   0.005±0.000  0.423±0.004  0.442±0.003  0.601±0.006  0.642±0.005  0.651±0.004
AUPR    LRSSL      0.004±0.000  0.271±0.002  0.255±0.002  0.406±0.004  0.408±0.005  0.412±0.002
AUPR    Average    0.004        0.348        0.368        0.508        0.539        0.548

To demonstrate the superiority of the proposed model, it was compared with five advanced models (SCMFDD, SCPMF, DRGBCN, GLGMPNN, and DRHGCN) on three datasets: Fdataset, Cdataset, and LRSSL. The area under the ROC curve (AUROC) and the area under the precision-recall curve (AUPR) are widely used in bioinformatics research and were therefore used to evaluate overall model performance. Table 1 compares the proposed model (DRDM) with the other models under 10-fold cross-validation. The proposed model outperforms all comparison models on both metrics across all three datasets, with average AUROC and AUPR of 0.957 and 0.548, respectively, which are 0.004 and 0.009 (0.4 and 0.9 percentage points) higher than those of the second-best model, DRHGCN.
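For reference, the two metrics can be computed from known labels and predicted scores as sketched below. AUROC is computed as the probability that a randomly chosen known association is scored above a randomly chosen unknown pair (ties count one half). AUPR is approximated here by average precision, one of several common estimators (trapezoidal integration of the precision-recall curve is another); the source does not say which estimator was used. scikit-learn provides `roc_auc_score` and `average_precision_score` for the same purpose.

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Pairwise (Mann-Whitney) AUROC: share of positive/negative pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else (0.5 if p == n else 0.0)
    return wins / (len(pos) * len(neg))


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Average precision: mean of precision@rank at each positive (ties are not averaged)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive")
    hits, total = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            hits += 1
            total += hits / rank
    return total / n_pos


labels = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.1]
print(auroc(labels, scores))              # 0.75
print(average_precision(labels, scores))  # (1/1 + 2/3) / 2
```

In a 10-fold cross-validation setting, such metrics are typically computed on each held-out fold and then averaged.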

The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

In the description of the present invention, it should be understood that the terms "first," "second," and the like are used for descriptive purposes only and are not to be construed as indicating or implying a relative importance or number of technical features indicated. Thus, a feature defining "a first" or "a second" may explicitly or implicitly include one or more such feature. In the description of the present invention, the meaning of "a plurality" is two or more, unless explicitly defined otherwise.

It will be apparent to those skilled in the art that various modifications and variations can be made to the present invention without departing from the spirit or scope of the invention. Thus, it is intended that the present invention also include such modifications and alterations insofar as they come within the scope of the appended claims or the equivalents thereof.

Claims

1. A drug repositioning method based on a deviation-correcting mechanism and contrastive learning, characterized by comprising the following three stages:

2. The drug repositioning method based on a deviation-correcting mechanism and contrastive learning according to claim 1, characterized in that:

In the data processing stage, the relationships between drugs and diseases, between different drugs, and between different diseases are used as the prior knowledge of the disease information and drug information.

3. The drug repositioning method based on a deviation-correcting mechanism and contrastive learning according to claim 2, characterized in that:

during information aggregation in the data processing stage, the heterogeneous information of each node is aggregated over the drug-disease relationships by LightGCN combined with the deviation-correcting mechanism;

and the homogeneous information of the different nodes is aggregated from the top-K neighbor nodes of the drug nodes and disease nodes, respectively.

4. The drug repositioning method based on a deviation-correcting mechanism and contrastive learning according to claim 3, characterized in that:

When generating embedding vectors in the data processing stage, the embedding vectors of the different nodes are generated from the aggregated heterogeneous and homogeneous information, so that each embedding vector reflects the drug-disease relationships, the drug-drug relationships, and the disease-disease relationships.

5. The drug repositioning method based on a deviation-correcting mechanism and contrastive learning according to claim 4, characterized in that:

In the data processing stage, the deviation-correcting mechanism adaptively assigns different inverse deviation scores to the nodes being aggregated before information aggregation is performed, thereby alleviating the bias caused by popular nodes and long-tail nodes.

6. The drug repositioning method based on a deviation-correcting mechanism and contrastive learning according to claim 5, characterized in that:

In the data optimization stage, dual views are constructed from the heterogeneous features and neighbor features of each node for contrastive learning, and the contrastive loss is combined with a weighted binary cross-entropy loss as the optimization objective.

7. A drug repositioning system based on a deviation-correcting mechanism and contrastive learning, comprising:

The data optimization module is configured to generate a heterogeneous feature view and a neighbor feature view of the drug nodes and disease nodes from their heterogeneous features and neighbor features, perform contrastive learning between the two views, and optimize the model in combination with a weighted binary cross-entropy loss;

8. The drug repositioning system based on a deviation-correcting mechanism and contrastive learning according to claim 7, characterized in that:

The data preprocessing module is configured to use the relationships between drugs and diseases, between different drugs, and between different diseases as prior knowledge of the disease information and drug information; to aggregate the heterogeneous information of the nodes over the drug-disease relationships by LightGCN combined with the deviation-correcting mechanism; and to aggregate the homogeneous information of each node from the top-K neighbor nodes of the drug node or disease node.

9. The drug repositioning system based on a deviation-correcting mechanism and contrastive learning according to claim 8, characterized in that:

The data preprocessing module is further configured to generate embedding vectors of the different nodes from the aggregated heterogeneous and homogeneous information, so that each embedding vector reflects the drug-disease relationships, the drug-drug relationships, and the disease-disease relationships.

10. The drug repositioning system based on a deviation-correcting mechanism and contrastive learning according to claim 9, characterized in that:

The data optimization module is further configured to construct dual views from the heterogeneous features and neighbor features of each node for contrastive learning, and to combine the contrastive loss with a weighted binary cross-entropy loss as the optimization objective.