CN111477344B - Drug side effect identification method based on self-weighted multi-core learning - Google Patents
Drug side effect identification method based on self-weighted multi-core learning
- Publication number
- CN111477344B (application number CN202010280936.XA)
- Authority
- CN
- China
- Prior art keywords
- drug
- matrix
- side effect
- core
- kernel
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H70/00—ICT specially adapted for the handling or processing of medical references
- G16H70/40—ICT specially adapted for the handling or processing of medical references relating to drugs, e.g. their side effects or intended usage
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B15/00—ICT specially adapted for analysing two-dimensional or three-dimensional molecular structures, e.g. structural or functional relations or structure alignment
- G16B15/30—Drug targeting using structural data; Docking or binding prediction
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B40/00—ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16C—COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
- G16C20/00—Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
- G16C20/50—Molecular design, e.g. of drugs
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16C—COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
- G16C20/00—Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
- G16C20/70—Machine learning, data mining or chemometrics
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16C—COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
- G16C20/00—Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
- G16C20/90—Programming languages; Computing architectures; Database systems; Data warehousing
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02A—TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
- Y02A90/00—Technologies having an indirect contribution to adaptation to climate change
- Y02A90/10—Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation
Abstract
The invention discloses a drug side effect identification method based on self-weighted multi-kernel learning. It addresses two problems of existing multi-kernel-learning methods for identifying drug side effects: incomplete representation of drug features and unreasonable weight allocation. The method comprises data acquisition, construction of drug kernel matrices and side effect kernel matrices, and further steps. The method describes drug features from multiple angles and constructs kernel matrices for the drug and side effect features using four similarity measures, which reduces the influence of missing features on the prediction result. A self-weighting method constructs the optimal kernel matrices of the drugs and the side effects, and the weights it computes adapt better to the different kernel matrices. Expanding the kernel matrices with a nearest neighbor method captures the local structure of the drug-side effect relationship.
Description
Technical Field
The invention relates to the field of multi-kernel learning, and in particular to a drug side effect identification method based on self-weighted multi-kernel learning.
Background
In recent years, drug safety problems caused by side effects have attracted growing attention. Side effects have become an important cause of clinical trial failure and a major public health problem. Research on drug side effects mainly covers the following: computing the similarity between drugs and predicting drug targets from drug-side effect relationships; drug repositioning based on the similarity between side effects; predicting the side effects a drug may cause from information such as its chemical structure; and predicting drug side effects using disease networks. Drug side effect identification plays an important role in drug research, and predicting side effects promptly and accurately has become an active research topic in China and abroad.
The conventional way to predict and evaluate the potential side effects of a drug is to conduct clinical trials on patients before the drug is marketed and to observe the adverse reactions that occur after patients take it.
In recent years, the accumulation of large amounts of drug side effect data, for example in the SIDER database, has given researchers a data source for exploring drug side effects at the molecular level. Advances in computing techniques such as complex networks and data mining offer new approaches to identifying drug side effects, and a growing number of studies use computational methods to mine potential drug-side effect associations from large-scale biological data. Current approaches to drug side effect identification fall into classification algorithms and recommendation algorithms. Classification algorithms identify side effects from the chemical and biological features of drugs; the key issue in this approach is extracting effective features from drugs and side effects. Typical classifiers include support vector machines and decision trees. Recommendation methods can also identify drug-side effect associations; they include matrix factorization (MF), label propagation algorithms (LPA), collaborative filtering (CF), and bipartite local models. These methods are also applicable to drug-target interaction prediction, drug-side effect association identification, and miRNA-disease association prediction.
Kernel methods are one family of classification algorithms. Because a single kernel is often insufficient for complex problems, many kernel-based machine learning algorithms combine multiple kernels to obtain a better similarity measure. Multi-kernel learning is one such kernel method: it combines several base kernels in place of a single kernel. Because it combines kernel functions defined on different input data sources, it suits data sets whose features are irregular and heterogeneous, and it offers greater flexibility. Multi-kernel learning has already been applied to drug side effect identification. The CKA-MKL model [Y. Ding, J. Tang, F. Guo. Identification of drug-side effect association via multiple information integration with centered kernel alignment [J]. Neurocomputing, 2019] constructs multiple kernels in the drug space and in the side effect space, linearly weights the kernels within each space using a multi-kernel learning algorithm based on centered kernel alignment, and finally identifies drug side effects by fusing the drug kernel and the side effect kernel with Kronecker regularized least squares (Kronecker RLS).
Although side effect identification based on multi-kernel learning has made some progress, the following problems remain:
Existing methods do not consider data sources such as drug-target associations. The drug and side effect features they consider are incomplete, so drugs and side effects cannot be represented accurately, which reduces prediction accuracy.
Most existing multi-kernel learning methods assume that the optimal kernel is a linear combination of the base kernels. This assumption may not hold, which leads to improper weight allocation and reduces the accuracy of the prediction result.
Disclosure of Invention
The technical problem to be solved by the invention is as follows: the invention provides a drug side effect identification method based on self-weighted multi-kernel learning that addresses the incomplete representation of drug features and the unreasonable weight allocation that arises when kernel functions are weighted.
The method describes drug features more fully from three aspects: side effects, targets, and substructures. To capture how similarity relationships among drugs and among side effects influence drug side effect identification, that is, the local structure of the drug-side effect relationship, the kernel matrices are expanded using a nearest neighbor method, which improves the accuracy of the prediction result.
The invention is realized by the following technical scheme:
A drug side effect identification method based on self-weighted multi-kernel learning comprises the following steps:
Step 1: data acquisition: collect information from a database;
Step 2: construction of the drug kernel matrices and side effect kernel matrices: construct a data set representing the drugs, construct a data set representing the side effects, and construct a relation matrix between drugs and side effects;
compute four kinds of similarity from the relation matrix, namely the Gaussian interaction profile (GIP) kernel, the correlation coefficient (Corr) kernel, the cosine similarity (COS) kernel, and the mutual information (MI) kernel, and from these generate the kernel matrices of the drug attribute space and the kernel matrices of the side effect attribute space;
Step 3: from the kernel matrices of the drug attribute space and of the side effect attribute space obtained in step 2, establish the self-weighted multi-kernel learning objective function and obtain the drug optimal kernel matrix and the side effect optimal kernel matrix by iterative updating; expand the kernel matrices of the drug attribute space and of the side effect attribute space using a nearest neighbor method; then minimize the objective function using the Gaussian fields and harmonic functions (GFHF) method, and obtain the predicted drug-side effect relation matrix by continued iterative updating.
Further, step 2 also includes generating four drug attribute kernels from the drug-substructure relation matrix and four drug attribute kernels from the drug-target relation matrix; these eight drug attribute kernels, together with the four drug attribute kernels generated from the drug-side effect relation matrix, are passed to step 3 for calculation.
Further, in step 1, the information collected from the database includes drug information, drug-protein interaction information, target protein information, drug-side effect relationship information, and side effect information, where the drugs have both target protein and side effect information.
Further, the chemical structure of each drug is encoded as a molecular fingerprint composed of the chemical substructures defined in the PubChem database.
Further, the step 2 includes the following detailed steps:
let $D = \{d_1, d_2, \dots, d_n\}$ denote the set of $n$ drugs, where $d$ denotes a drug, and let $S = \{s_1, s_2, \dots, s_m\}$ denote the set of $m$ side effects, where $s$ denotes a side effect;
the $n \times m$ adjacency matrix $F$ is the relation matrix between drugs and side effects, with elements $F_{i,j}$ ($1 \le i \le n$, $1 \le j \le m$): $F_{i,j} = 1$ when drug $d_i$ has side effect $s_j$, and $F_{i,j} = 0$ otherwise; drug $d_i$ is represented by its side effect profile $F_{d_i}$, a binary vector of length $m$ whose elements are each 1 or 0;
the Gaussian interaction profile (GIP) kernel is specifically expressed as:
where $F_{d_i}$ and $F_{d_k}$ are the side effect profiles of drug $d_i$ and drug $d_k$, respectively, and $\gamma$ is the bandwidth of the Gaussian kernel;
the correlation coefficient (Corr) kernel is expressed as:
the cosine similarity (COS) kernel is expressed as:
the mutual information (MI) kernel is expressed as:
where $u \in \{0, 1\}$ and $v \in \{0, 1\}$ are the values of the drug variables in the side effect space, 0 indicating that the drug does not have the side effect and 1 indicating that it does; $f(u)$ is the observed frequency of $u$ in $F_{d_i}$, $f(v)$ is the observed frequency of $v$ in $F_{d_k}$, and $f(u, v)$ is the relative observed frequency of the pair $(u, v)$.
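The four formulas referenced above are not reproduced in this text. As a hedged reconstruction, these kernels are commonly written as follows for two drugs $d_i$ and $d_k$ with side effect profiles $F_{d_i}$ and $F_{d_k}$. The kernel names on the left-hand side and the logarithm base are assumptions, and the patent's own formulas may differ in normalization:

```latex
\begin{align}
K_{\mathrm{GIP}}(d_i, d_k) &= \exp\!\left(-\gamma \,\lVert F_{d_i} - F_{d_k} \rVert^{2}\right) \\
K_{\mathrm{Corr}}(d_i, d_k) &= \frac{\mathrm{Cov}(F_{d_i}, F_{d_k})}{\sqrt{\mathrm{Var}(F_{d_i})\,\mathrm{Var}(F_{d_k})}} \\
K_{\mathrm{COS}}(d_i, d_k) &= \frac{F_{d_i}^{\top} F_{d_k}}{\lVert F_{d_i} \rVert \,\lVert F_{d_k} \rVert} \\
K_{\mathrm{MI}}(d_i, d_k) &= \sum_{u \in \{0,1\}} \sum_{v \in \{0,1\}} f(u, v)\,\log \frac{f(u, v)}{f(u)\, f(v)}
\end{align}
```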
Further, the step 3 includes the following detailed steps:
Given the kernel matrices of the drug attribute space and the kernel matrices of the side effect attribute space, with $C_d$ denoting the number of kernels in the drug space and $C_s$ the number of kernels in the side effect space, the objective function of self-weighted multi-kernel learning is as follows:
where $\omega_i$ is the weight of the $i$-th drug kernel matrix. An initial value of $\omega_i$ is given first, with $C_d$ denoting the number of drug kernels; once the initial $\omega_i$ is available, the drug optimal kernel matrix is computed. $\omega_i$ changes dynamically with the optimal kernel matrix and is updated continuously until the drug optimal kernel matrix is finally obtained. The side effect optimal kernel matrix is obtained by the same learning procedure.
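The objective function itself is not reproduced in this text. A form commonly used in self-weighted multiple kernel learning, and consistent with the description above, is sketched below. The symbols $K_d^{(i)}$ (the $i$-th drug kernel matrix) and $K_d^{*}$ (the drug optimal kernel matrix), the uniform initial weight, and the specific weight formula are assumptions rather than the patent's confirmed definitions:

```latex
\begin{align}
\min_{K_d^{*}} \; & \sum_{i=1}^{C_d} \omega_i \,\lVert K_d^{(i)} - K_d^{*} \rVert_F^{2}, \\
\omega_i &= \frac{1}{2\,\lVert K_d^{(i)} - K_d^{*} \rVert_F}, \qquad \omega_i^{(0)} = \frac{1}{C_d}, \\
K_d^{*} &= \frac{\sum_{i=1}^{C_d} \omega_i K_d^{(i)}}{\sum_{i=1}^{C_d} \omega_i}.
\end{align}
```

With the weights held fixed, the third line is the minimizer of the first; alternating it with the weight update gives an iterative scheme of the kind described above.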
Further, the nearest neighbor method is specifically as follows: the $k$ nearest neighbor drugs of drug $d_i$ are denoted $N(d_i) \subseteq D$, and the elements of the $k$-nearest-neighbor graph $N_d \in \mathbb{R}^{n \times n}$ are as follows:
$N_d$ is used to sparsify the drug kernel matrix; sparsifying the drug kernel matrix with $N_d$ yields the expanded drug kernel matrix:
where $\odot$ denotes the Hadamard (element-wise) product of matrices; for the side effect information, the $k$ nearest neighbor side effects of side effect $s_j$ are denoted $N(s_j) \subseteq S$, and the $k$-nearest-neighbor graph $N_s$ of the side effects is used to sparsify the side effect kernel matrix, which yields the expanded side effect kernel matrix.
Further, the following objective function is minimized using the Gaussian fields and harmonic functions (GFHF) method:
where $\mathrm{tr}(\cdot)$ denotes the trace of a matrix, $\mu$ and $\sigma$ are non-negative parameters, $E_l(F^*)$ is the loss function, $E_d(F^*)$ is the graph regularization term on the drug feature space, $E_s(F^*)$ is the graph regularization term on the side effect feature space, and $F_{train}$ is the part of the drug-side effect relation matrix used as training data; a diagonal matrix is also defined, in which:
$L_d \in \mathbb{R}^{n \times n}$ and $L_s \in \mathbb{R}^{m \times m}$ are Laplacian matrices:
$D_d$ and $D_s$ are diagonal matrices:
$I_d \in \mathbb{R}^{n \times n}$ is an identity matrix; the matrix $F^*$ is updated continuously until the predicted drug-side effect relation matrix is finally obtained.
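The objective is not reproduced in this text. Based on the terms named above, it plausibly has the structure sketched below. The additive combination weighted by $\mu$ and $\sigma$ is an assumption; the two regularization terms use the trace forms that appear in claim 7; the loss $E_l$ and the Laplacian definitions are left unspecified because their formulas are missing:

```latex
\begin{align}
\min_{F^{*}} \; & E_l(F^{*}) + \mu\, E_d(F^{*}) + \sigma\, E_s(F^{*}), \\
E_d(F^{*}) &= \mathrm{tr}\!\left(F^{*\top} L_d F^{*}\right), \\
E_s(F^{*}) &= \mathrm{tr}\!\left(F^{*} L_s F^{*\top}\right).
\end{align}
```

Since $F^{*}$ is $n \times m$, $L_d$ is $n \times n$ and $L_s$ is $m \times m$, both trace arguments are square matrices, so the two terms are well defined.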
The invention has the following advantages and beneficial effects:
The method describes drug features from multiple angles and constructs kernel matrices for the drug and side effect features using four similarity measures, which reduces the influence of missing features on the prediction result. A self-weighting method constructs the optimal kernel matrices of the drugs and the side effects, and the weights it computes adapt better to the different kernel matrices. Expanding the kernel matrices with a nearest neighbor method captures the local structure of the drug-side effect relationship. On this basis the invention can identify drug side effects more accurately.
Drawings
The accompanying drawings, which are included to provide a further understanding of embodiments of the invention and are incorporated in and constitute a part of this application, illustrate embodiments of the invention. In the drawings:
FIG. 1 is a schematic diagram of a process flow according to the present invention.
FIG. 2 is a schematic diagram of a multi-core learning model of the present invention.
Detailed Description
Before any embodiments of the invention are explained in detail, it is to be understood that the invention is not limited in its application to the details of construction set forth in the following description or illustrated in the drawings. The invention is capable of other embodiments and of being practiced or of being carried out in various ways. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive improvements, are intended to fall within the scope of the invention.
A drug side effect identification method based on self-weighted multi-kernel learning, as shown in Figure 1, comprises the following steps:
Step 1: data acquisition: collect information from a database;
Step 2: construction of the drug kernel matrices and side effect kernel matrices: construct a data set representing the drugs, construct a data set representing the side effects, and construct a relation matrix between drugs and side effects;
compute four kinds of similarity from the relation matrix, namely the Gaussian interaction profile (GIP) kernel, the correlation coefficient (Corr) kernel, the cosine similarity (COS) kernel, and the mutual information (MI) kernel, and from these generate the kernel matrices of the drug attribute space and the kernel matrices of the side effect attribute space;
Step 3: from the kernel matrices of the drug attribute space and of the side effect attribute space obtained in step 2, establish the self-weighted multi-kernel learning objective function and obtain the drug optimal kernel matrix and the side effect optimal kernel matrix by iterative updating; expand the kernel matrices of the drug attribute space and of the side effect attribute space using a nearest neighbor method; then minimize the objective function using the Gaussian fields and harmonic functions (GFHF) method, and obtain the predicted drug-side effect relation matrix by continued iterative updating.
Example 1:
Data acquisition:
The data used in the technical scheme of the invention are derived from the Mizutani database, which contains 658 drugs that have both target protein and side effect information, 5074 drug-protein interactions, 1368 target proteins, 49051 drug-side effect associations, and 1339 side effects. The chemical structure of each drug is encoded as a molecular fingerprint composed of the 881 chemical substructures defined in the PubChem database.
Construction of the drug kernels and side effect kernels:
Let $D = \{d_1, d_2, \dots, d_n\}$ denote the set of $n$ drugs and $S = \{s_1, s_2, \dots, s_m\}$ the set of $m$ side effects. The $n \times m$ adjacency matrix $F$ is the relation matrix between drugs and side effects, with elements $F_{i,j}$ ($1 \le i \le n$, $1 \le j \le m$): $F_{i,j} = 1$ when drug $d_i$ has side effect $s_j$, and $F_{i,j} = 0$ otherwise. Drug $d_i$ is represented by its side effect profile $F_{d_i}$, a binary vector of length $m$ whose elements are each 1 or 0.
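As a minimal sketch of this construction, the relation matrix can be filled from a list of known drug-side effect pairs. The function name `build_relation_matrix` and the toy identifiers are hypothetical and are not taken from the Mizutani data:

```python
import numpy as np

def build_relation_matrix(drugs, side_effects, pairs):
    """Return the n x m binary matrix F with F[i, j] = 1 iff drug i has side effect j."""
    drug_index = {d: i for i, d in enumerate(drugs)}
    effect_index = {s: j for j, s in enumerate(side_effects)}
    F = np.zeros((len(drugs), len(side_effects)), dtype=int)
    for drug, effect in pairs:
        F[drug_index[drug], effect_index[effect]] = 1
    return F

# Hypothetical toy data for illustration only.
drugs = ["d1", "d2", "d3"]
side_effects = ["s1", "s2", "s3", "s4"]
pairs = [("d1", "s1"), ("d1", "s3"), ("d2", "s2"), ("d3", "s1"), ("d3", "s4")]

F = build_relation_matrix(drugs, side_effects, pairs)
profile_d1 = F[0]  # side effect profile of drug d1 (row 0 of F)
```

Row $i$ of `F` is the side effect profile of drug $d_i$; column $j$ plays the same role for side effect $s_j$ when the side effect kernels are built.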
Four similarity measures are used to construct the kernel matrices: the Gaussian interaction profile (GIP) kernel, the correlation coefficient (Corr) kernel, the cosine similarity (COS) kernel, and the mutual information (MI) kernel.
The Gaussian interaction profile kernel is constructed from the topology of the known drug-side effect network. It realizes a nonlinear mapping of the drugs, under which the drug vectors are highly distinguishable, and is specifically expressed as:
where $F_{d_i}$ and $F_{d_k}$ are the side effect profiles of drug $d_i$ and drug $d_k$, respectively, and $\gamma$ is the bandwidth of the Gaussian kernel.
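The formula is not reproduced in this text. The sketch below assumes the standard Gaussian form $\exp(-\gamma \lVert F_{d_i} - F_{d_k} \rVert^2)$ and takes $\gamma$ as given; the function name `gip_kernel` and the toy matrix are hypothetical, and any bandwidth normalization used in the patent is not modeled:

```python
import numpy as np

def gip_kernel(F, gamma=1.0):
    """Gaussian interaction profile kernel between the rows of F (assumed standard form)."""
    F = np.asarray(F, dtype=float)
    sq_norms = np.sum(F * F, axis=1)
    # Squared Euclidean distance between every pair of row profiles.
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (F @ F.T)
    sq_dists = np.maximum(sq_dists, 0.0)  # guard against tiny negative round-off
    return np.exp(-gamma * sq_dists)

# Hypothetical 3 drugs x 4 side effects relation matrix.
F = np.array([[1, 0, 1, 0],
              [0, 1, 0, 0],
              [1, 0, 0, 1]])
K_gip = gip_kernel(F, gamma=0.5)
```

Applying the same function to the transpose `F.T` gives a kernel between side effects, each described by its drug profile.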
The correlation coefficient kernel measures the linear relationship between drug vectors and is expressed as:
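The formula is not reproduced in this text. The sketch below assumes the Pearson correlation coefficient between side effect profiles; the function name `corr_kernel`, the toy matrix, and the handling of constant profiles (correlation set to 0) are assumptions:

```python
import numpy as np

def corr_kernel(F):
    """Pearson correlation between the rows of F (assumed form of the Corr kernel)."""
    F = np.asarray(F, dtype=float)
    with np.errstate(invalid="ignore", divide="ignore"):
        K = np.corrcoef(F)  # rows are treated as variables
    K = np.nan_to_num(K, nan=0.0)  # constant profiles have undefined correlation
    np.fill_diagonal(K, 1.0)
    return K

# Hypothetical 3 drugs x 4 side effects relation matrix.
F = np.array([[1, 0, 1, 0],
              [0, 1, 0, 0],
              [1, 0, 0, 1]])
K_corr = corr_kernel(F)
```

Unlike the other three measures, correlation values can be negative, which matters if a later step assumes non-negative similarities.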
The cosine similarity kernel treats each drug as a vector in the $m$-dimensional side effect space and evaluates the similarity of two drugs by the cosine of the angle between their vectors, which captures how the two drug variables differ in the side effect space: the more closely the directions of the two vectors agree, the higher the similarity. It is expressed as:
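The formula is not reproduced in this text; cosine similarity has the standard definition of the dot product divided by the product of the vector norms. A minimal sketch follows, in which the function name `cosine_kernel`, the toy matrix, and the treatment of all-zero profiles are assumptions:

```python
import numpy as np

def cosine_kernel(F):
    """Cosine of the angle between every pair of row profiles of F."""
    F = np.asarray(F, dtype=float)
    norms = np.linalg.norm(F, axis=1)
    norms[norms == 0] = 1.0  # an all-zero profile gets similarity 0 with everything
    unit = F / norms[:, None]
    return unit @ unit.T

# Hypothetical 3 drugs x 4 side effects relation matrix.
F = np.array([[1, 0, 1, 0],
              [0, 1, 0, 0],
              [1, 0, 0, 1]])
K_cos = cosine_kernel(F)
```

For binary profiles the result lies in [0, 1]: drugs that share no side effect score 0 and drugs with identical profiles score 1.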
The mutual information kernel measures the degree of mutual dependence between two discrete random variables, here the observed frequencies of two drugs, and is expressed as:
where $u \in \{0, 1\}$ and $v \in \{0, 1\}$ are the values of the drug variables in the side effect space; 0 indicates that the drug does not have the side effect and 1 indicates that it does. $f(u)$ is the observed frequency of $u$ in $F_{d_i}$; for example, when $u = 1$, $f(u)$ is the frequency of 1 in the drug vector $F_{d_i}$. $f(v)$ is the observed frequency of $v$ in $F_{d_k}$, and $f(u, v)$ is the relative observed frequency of the pair $(u, v)$.
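The formula is not reproduced in this text. The sketch below assumes the usual definition of mutual information over the four value pairs $(u, v)$, with the natural logarithm and with empty cells contributing zero; the function names and the toy matrix are hypothetical:

```python
import numpy as np

def mutual_information(x, y):
    """Mutual information of two binary vectors from their observed frequencies."""
    mi = 0.0
    for u in (0, 1):
        for v in (0, 1):
            f_uv = np.mean((x == u) & (y == v))  # relative frequency of the pair (u, v)
            f_u = np.mean(x == u)
            f_v = np.mean(y == v)
            if f_uv > 0:  # an empty cell contributes nothing
                mi += f_uv * np.log(f_uv / (f_u * f_v))
    return mi

def mi_kernel(F):
    """Pairwise mutual information between the row profiles of F."""
    F = np.asarray(F)
    n = F.shape[0]
    K = np.zeros((n, n))
    for i in range(n):
        for k in range(i, n):
            K[i, k] = K[k, i] = mutual_information(F[i], F[k])
    return K

# Hypothetical 3 drugs x 4 side effects relation matrix.
F = np.array([[1, 0, 1, 0],
              [0, 1, 0, 0],
              [1, 0, 0, 1]])
K_mi = mi_kernel(F)
```

In this sketch the diagonal entries equal the entropy of each profile rather than 1, so the values are not on the same scale as the cosine or correlation kernels.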
The above describes the drug attribute kernels built from side effects. Similarly, the drug attribute kernels built from substructures are $K_{\text{GIP-chem},d}$, $K_{\text{Corr-chem},d}$, $K_{\text{Cos-chem},d}$ and $K_{\text{MI-chem},d}$; the drug attribute kernels built from targets are $K_{\text{GIP-target},d}$, $K_{\text{Corr-target},d}$, $K_{\text{Cos-target},d}$ and $K_{\text{MI-target},d}$; and the side effect attribute kernels built from drugs are $K_{\text{GIP-link},s}$, $K_{\text{Corr-link},s}$, $K_{\text{Cos-link},s}$ and $K_{\text{MI-link},s}$.
Multi-kernel learning generates the optimal kernels:
As shown in Figure 2, one set of kernel matrices represents the drug attribute space and another set represents the side effect attribute space. $C_d$ denotes the number of kernels in the drug space; in this scheme $C_d = 12$. $C_s$ denotes the number of kernels in the side effect space; in this scheme $C_s = 4$.
Taking the generation of the drug optimal kernel as an example: because the optimal kernel should be close to each attribute kernel of the drug (or of the side effect), the objective function of self-weighted multi-kernel learning is as follows:
where $\omega_i$ is the weight of the $i$-th drug kernel matrix. Because $\omega_i$ depends on the target variable, the drug optimal kernel matrix, which is not yet known, $\omega_i$ cannot be calculated directly. An initial value of $\omega_i$ is therefore given first, with $C_d$ denoting the number of drug kernels. With the initial $\omega_i$, the drug optimal kernel matrix is computed; $\omega_i$ then changes dynamically with it and is updated continuously until the drug optimal kernel matrix is finally obtained.
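The alternating procedure described above can be sketched as follows. This is not the patent's confirmed algorithm: it assumes the objective is a weighted sum of squared Frobenius distances between each base kernel and the optimal kernel, that each weight is the reciprocal of twice the corresponding distance, and that the weights start uniform at $1/C_d$. The function name `self_weighted_mkl` and the toy kernels are hypothetical:

```python
import numpy as np

def self_weighted_mkl(kernels, n_iter=50, tol=1e-8, eps=1e-12):
    """Alternate between the optimal kernel and the self-determined weights (assumed form)."""
    kernels = [np.asarray(K, dtype=float) for K in kernels]
    weights = np.full(len(kernels), 1.0 / len(kernels))  # assumed uniform start
    K_opt = None
    for _ in range(n_iter):
        # With the weights fixed, the minimizer is the weighted average of the base kernels.
        K_new = sum(w * K for w, K in zip(weights, kernels)) / weights.sum()
        # Each weight follows from the distance between its kernel and the current optimum.
        dists = np.array([np.linalg.norm(K - K_new, "fro") for K in kernels])
        weights = 1.0 / (2.0 * np.maximum(dists, eps))
        if K_opt is not None and np.linalg.norm(K_new - K_opt, "fro") < tol:
            K_opt = K_new
            break
        K_opt = K_new
    return K_opt, weights

# Hypothetical base kernels: two agree with each other, one is an outlier.
base_kernels = [np.eye(3), np.eye(3), np.ones((3, 3))]
K_star, omega = self_weighted_mkl(base_kernels)
```

In this toy case the outlier kernel receives a much smaller weight than the two agreeing kernels, which illustrates why no manually tuned weights are needed under these assumptions.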
Graph-based semi-supervised learning:
Semi-supervised learning can capture the global structure of the drug-side effect relationship, but it ignores the influence of similarity relationships among drugs and among side effects on drug side effect identification. The present scheme therefore expands the kernel matrices using a nearest neighbor method. The $k$ nearest neighbor drugs of drug $d_i$ are denoted $N(d_i) \subseteq D$, and the elements of the $k$-nearest-neighbor graph $N_d \in \mathbb{R}^{n \times n}$ are as follows:
$N_d$ is used to sparsify the drug kernel matrix; sparsifying the drug kernel matrix with $N_d$ yields the expanded drug kernel matrix:
where $\odot$ denotes the Hadamard (element-wise) product of matrices.
For the side effect information, the $k$ nearest neighbor side effects of side effect $s_j$ are denoted $N(s_j) \subseteq S$, and the $k$-nearest-neighbor graph $N_s$ of the side effects is used to sparsify the side effect kernel matrix, which yields the expanded side effect kernel matrix.
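The element-wise definition of the neighbor graph is not reproduced in this text. The sketch below assumes a simple symmetric rule: an entry is 1 if either item is among the other's $k$ most similar items, and 0 otherwise. The function names `knn_graph` and `sparsify_kernel` and the toy kernel are hypothetical, and the patent's actual rule may assign different values:

```python
import numpy as np

def knn_graph(K, k):
    """Binary k-nearest-neighbor mask derived from a similarity matrix K (assumed rule)."""
    n = K.shape[0]
    sim = np.array(K, dtype=float)
    np.fill_diagonal(sim, -np.inf)              # exclude self when ranking neighbors
    neighbors = np.argsort(-sim, axis=1)[:, :k]  # indices of the k most similar items
    N = np.zeros((n, n))
    N[np.repeat(np.arange(n), k), neighbors.ravel()] = 1.0
    N = np.maximum(N, N.T)                       # keep an edge if either side selects it
    np.fill_diagonal(N, 1.0)                     # keep self-similarity
    return N

def sparsify_kernel(K, k):
    """Hadamard product of the neighbor mask with the kernel matrix."""
    return knn_graph(K, k) * np.asarray(K, dtype=float)

# Hypothetical 4 x 4 drug kernel with two tight pairs.
K_d = np.array([[1.0, 0.9, 0.1, 0.2],
                [0.9, 1.0, 0.3, 0.1],
                [0.1, 0.3, 1.0, 0.8],
                [0.2, 0.1, 0.8, 1.0]])
K_d_sparse = sparsify_kernel(K_d, k=1)
```

The same two functions apply unchanged to an $m \times m$ side effect kernel.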
To find the optimal predicted drug-side effect relation matrix $F^*$, the following objective function is minimized using the Gaussian fields and harmonic functions (GFHF) method:
where $\mathrm{tr}(\cdot)$ denotes the trace of a matrix. $\mu$ and $\sigma$ are non-negative parameters. $E_l(F^*)$ is the loss function, $E_d(F^*)$ is the graph regularization term on the drug feature space, and $E_s(F^*)$ is the graph regularization term on the side effect feature space. $F_{train}$ is the part of the drug-side effect relation matrix used as training data. A diagonal matrix is also defined, in which:
$L_d \in \mathbb{R}^{n \times n}$ and $L_s \in \mathbb{R}^{m \times m}$ are Laplacian matrices:
$D_d$ and $D_s$ are diagonal matrices:
$I_d \in \mathbb{R}^{n \times n}$ is an identity matrix. The matrix $F^*$ is updated continuously until the predicted drug-side effect relation matrix is finally obtained.
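Because the loss term, the diagonal matrix, and the Laplacian definitions are not reproduced in this text, the sketch below substitutes a simplified stand-in rather than the patent's procedure. It assumes a plain squared-error loss $\lVert F^* - F_{train} \rVert_F^2$, unnormalized Laplacians $L = D - K$ built from symmetric non-negative kernels, and it solves the resulting stationarity condition $(I + \mu L_d) F^* + \sigma F^* L_s = F_{train}$ in closed form instead of by iterative updating. All function names and toy matrices are hypothetical:

```python
import numpy as np

def graph_laplacian(K):
    """Unnormalized Laplacian D - K, with D the diagonal matrix of row sums (assumed form)."""
    K = np.asarray(K, dtype=float)
    return np.diag(K.sum(axis=1)) - K

def predict_relations(F_train, K_d, K_s, mu=1.0, sigma=1.0):
    """Minimize ||F - F_train||_F^2 + mu*tr(F^T L_d F) + sigma*tr(F L_s F^T)."""
    F_train = np.asarray(F_train, dtype=float)
    n, m = F_train.shape
    A = np.eye(n) + mu * graph_laplacian(K_d)  # n x n, symmetric
    B = sigma * graph_laplacian(K_s)           # m x m, symmetric
    # Solve A F + F B = F_train by diagonalizing both symmetric matrices.
    lam, P = np.linalg.eigh(A)
    gam, Q = np.linalg.eigh(B)
    Y = P.T @ F_train @ Q
    X = Y / (lam[:, None] + gam[None, :])
    return P @ X @ Q.T

# Hypothetical toy inputs: 3 drugs, 2 side effects.
K_d = np.array([[1.0, 0.5, 0.0],
                [0.5, 1.0, 0.2],
                [0.0, 0.2, 1.0]])
K_s = np.array([[1.0, 0.3],
                [0.3, 1.0]])
F_train = np.array([[1.0, 0.0],
                    [0.0, 1.0],
                    [1.0, 0.0]])
F_pred = predict_relations(F_train, K_d, K_s, mu=0.5, sigma=0.5)
```

The entries of `F_pred` are real-valued scores; higher scores for pairs that are 0 in `F_train` would be read as candidate drug-side effect associations. With `mu = sigma = 0` the function returns `F_train` unchanged, which is a quick sanity check on the regularization terms.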
The foregoing description of the embodiments is provided to illustrate the general principles of the invention and is not intended to limit the scope of the invention to the particular embodiments; any modifications, equivalent substitutions, and improvements made within the spirit and principles of the invention are intended to be included within its scope.
Claims (7)
1. A drug side effect identification method based on self-weighted multi-kernel learning, characterized by comprising the following steps:
Step 1: data acquisition: collect information from a database; the information collected from the database comprises drug information, drug-protein interaction information, target protein information, drug-side effect relationship information, and side effect information, where the drugs have both target protein and side effect information; the data are derived from the Mizutani database, which contains 658 drugs having both target protein and side effect information, 5074 drug-protein interactions, 1368 target proteins, 49051 drug-side effect relationships, and 1339 side effects;
Step 2: construct the drug kernel matrices and side effect kernel matrices from the data of step 1: construct a data set representing the drugs, construct a data set representing the side effects, and construct a relation matrix between drugs and side effects;
compute four kinds of similarity from the relation matrix, namely the Gaussian interaction profile (GIP) kernel, the correlation coefficient (Corr) kernel, the cosine similarity (COS) kernel, and the mutual information (MI) kernel, and from these generate the kernel matrices of the drug attribute space and the kernel matrices of the side effect attribute space;
Step 3: from the kernel matrices of the drug attribute space and of the side effect attribute space obtained in step 2, establish the self-weighted multi-kernel learning objective function and obtain the drug optimal kernel matrix and the side effect optimal kernel matrix by iterative updating; expand the kernel matrices of the drug attribute space and of the side effect attribute space using a nearest neighbor method; then minimize the objective function using the Gaussian fields and harmonic functions (GFHF) method, and obtain the predicted drug-side effect relation matrix by continued iterative updating.
2. The drug side effect identification method based on self-weighted multi-kernel learning according to claim 1, wherein step 2 further comprises generating four drug attribute kernels from the drug-substructure relation matrix and four drug attribute kernels from the drug-target relation matrix, and passing these eight drug attribute kernels, together with the four drug attribute kernels generated from the drug-side effect relation matrix, to step 3 for calculation.
3. The drug side effect identification method based on self-weighted multi-kernel learning according to claim 1, wherein the chemical structure of each drug is encoded as a molecular fingerprint composed of a plurality of chemical substructures defined in the PubChem database.
4. The drug side effect identification method based on self-weighted multi-kernel learning according to claim 3, wherein step 2 comprises the following detailed steps:
let $D = \{d_1, d_2, \dots, d_n\}$ denote the set of $n$ drugs, where $d$ denotes a drug, and let $S = \{s_1, s_2, \dots, s_m\}$ denote the set of $m$ side effects, where $s$ denotes a side effect;
the $n \times m$ adjacency matrix $F$ is the relation matrix between drugs and side effects, with elements $F_{i,j}$ ($1 \le i \le n$, $1 \le j \le m$): $F_{i,j} = 1$ when drug $d_i$ has side effect $s_j$, and $F_{i,j} = 0$ otherwise; drug $d_i$ is represented by its side effect profile $F_{d_i}$, a binary vector of length $m$ whose elements are each 1 or 0;
the Gaussian interaction profile (GIP) kernel is specifically expressed as:
where $F_{d_i}$ and $F_{d_k}$ are the side effect profiles of drug $d_i$ and drug $d_k$, respectively, and $\gamma$ is the bandwidth of the Gaussian kernel;
the correlation coefficient (Corr) kernel is expressed as:
the cosine similarity (COS) kernel is expressed as:
the mutual information (MI) kernel is expressed as:
where $u \in \{0, 1\}$ and $v \in \{0, 1\}$ are the values of the drug variables in the side effect space, 0 indicating that the drug does not have the side effect and 1 indicating that it does; $f(u)$ is the observed frequency of $u$ in $F_{d_i}$, $f(v)$ is the observed frequency of $v$ in $F_{d_k}$, and $f(u, v)$ is the relative observed frequency of the pair $(u, v)$.
5. A method for identifying side effects of drugs based on self-weighted multi-core learning as claimed in claim 3, wherein said step 3 comprises the following detailed steps:
given the kernel matrices of the drug attribute space and the kernel matrices of the side effect attribute space, with $C_d$ representing the number of kernels of the drug space and $C_s$ representing the number of kernels of the side effect space, the objective function of the self-weighted multi-kernel learning is as follows:
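The objective function is not reproduced in this text, so the following is not the patent's formulation. It is a minimal sketch of one self-weighting scheme of the general kind the claim names, assuming a consensus kernel `K` that minimizes a weighted sum of squared Frobenius distances to the base kernels, with each weight reset to `1 / (2 * ||K_i - K||_F)` on every pass. The function name, iteration count, and toy kernels are hypothetical.

```python
import numpy as np

def self_weighted_fusion(kernels, n_iter=20, eps=1e-12):
    """Iteratively reweighted consensus of several kernel matrices.

    Assumed scheme (not taken from the patent): alternate between
    K   = sum_i w_i K_i / sum_i w_i
    w_i = 1 / (2 * ||K_i - K||_F)
    so kernels far from the consensus receive small weights automatically.
    """
    kernels = [np.asarray(Ki, dtype=float) for Ki in kernels]
    w = np.full(len(kernels), 1.0 / len(kernels))
    for _ in range(n_iter):
        K = sum(wi * Ki for wi, Ki in zip(w, kernels)) / w.sum()
        dist = np.array([np.linalg.norm(Ki - K, "fro") for Ki in kernels])
        w = 1.0 / (2.0 * np.maximum(dist, eps))  # eps guards against dist == 0
    K = sum(wi * Ki for wi, Ki in zip(w, kernels)) / w.sum()
    return K, w / w.sum()

# Two agreeing kernels and one outlier (hypothetical data).
A = np.array([[1.0, 0.8], [0.8, 1.0]])
B = np.array([[1.0, 0.0], [0.0, 1.0]])
K_fused, weights = self_weighted_fusion([A, A, B])
```

In this toy run the outlier kernel `B` ends up with a much smaller weight than the two copies of `A`, which is the behaviour the term "self-weighted" suggests: no weight hyperparameters are tuned by hand.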
6. The method for identifying side effects of drugs based on self-weighted multi-core learning according to claim 3, wherein the nearest neighbor method specifically comprises: the k nearest-neighbor drugs most similar to drug $d_i$ are denoted $N(d_i) \in D$, and the elements of the k-nearest-neighbor graph $N_d \in \mathbb{R}^{n \times n}$ are as follows:
$N_d$ is used to sparsify the drug kernel matrix; sparsifying the kernel matrix with $N_d$ yields the extended drug kernel matrix:
where the operator denotes the Hadamard (element-wise) product of matrices; for the side effect information, the k nearest-neighbor side effects most similar to side effect $s_j$ are denoted $N(s_j) \in S$, and the k-nearest-neighbor graph $N_s$ of the side effects is used to sparsify the side effect kernel matrix, yielding the extended side effect kernel matrix.
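The sparsification in claim 6 can be illustrated with a short sketch. The element-wise definition of the neighbor graph is not reproduced in this text, so the code assumes one common convention: an entry is 1 when two items are each among the other's k nearest neighbors, 0.5 when only one direction holds, and 0 otherwise. It also assumes that an item is not counted as its own neighbor. The function names and the toy kernel are hypothetical.

```python
import numpy as np

def knn_graph(K, k):
    """k-nearest-neighbor graph built from a similarity (kernel) matrix.

    Assumed convention: 1 for mutual neighbors, 0.5 for one-sided
    neighbors, 0 otherwise. Self-similarity is excluded from the search.
    """
    K = np.asarray(K, dtype=float)
    n = K.shape[0]
    S = K.copy()
    np.fill_diagonal(S, -np.inf)             # never pick an item as its own neighbor
    idx = np.argsort(-S, axis=1)[:, :k]      # indices of the k most similar items per row
    A = np.zeros((n, n))
    A[np.repeat(np.arange(n), k), idx.ravel()] = 1.0
    return (A + A.T) / 2.0

def sparsify(K, k):
    """Hadamard product of the neighbor graph with the kernel matrix."""
    return knn_graph(K, k) * np.asarray(K, dtype=float)

# Toy drug kernel for 3 drugs (hypothetical values).
K_d = np.array([[1.0, 0.9, 0.1],
                [0.9, 1.0, 0.2],
                [0.1, 0.2, 1.0]])
N_d = knn_graph(K_d, k=1)
K_d_sparse = sparsify(K_d, k=1)
```

The same two functions would apply unchanged to an m x m side effect kernel to obtain $N_s$ and the sparsified side effect kernel.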
7. A method for identifying side effects of drugs based on self-weighted multi-kernel learning according to claim 3, characterized in that the following objective function is minimized using the Gaussian Fields and Harmonic Functions (GFHF) method:
$E_d(F^*) = \mathrm{tr}(F^{*T} L_d F^*)$
$E_s(F^*) = \mathrm{tr}(F^* L_s F^{*T})$
where $\mathrm{tr}(\cdot)$ represents the trace of a matrix, $\mu$ and $\sigma$ are non-negative parameters, $E_l(F^*)$ is a loss function, $E_d(F^*)$ is a graph regularization term on the drug feature space, $E_s(F^*)$ is a graph regularization term on the side effect feature space, $F_{train}$ represents the portion of the drug-side effect relationship matrix used as training data, and a diagonal matrix is used in which:
$L_d \in \mathbb{R}^{n \times n}$ and $L_s \in \mathbb{R}^{m \times m}$ are Laplacian matrices:
$D_d$ and $D_s$ are diagonal matrices:
$I_d \in \mathbb{R}^{n \times n}$ is an identity matrix; the matrix $F^*$ is updated iteratively until the predicted drug-side effect relationship matrix is finally obtained.
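The two graph regularization terms $E_d$ and $E_s$ of claim 7 can be evaluated with a few lines of NumPy. The definitions of the Laplacian and degree matrices are not reproduced in this text, so this sketch assumes the unnormalized form `L = D - K`, with `D` the diagonal matrix of row sums of the kernel; the patent may use a normalized variant. The function names and the random toy data are hypothetical, and the loss term and the update rule for $F^*$ are not sketched because they are not given here.

```python
import numpy as np

def laplacian(K):
    """Unnormalized graph Laplacian L = D - K (an assumed form)."""
    K = np.asarray(K, dtype=float)
    return np.diag(K.sum(axis=1)) - K

def drug_energy(F_star, L_d):
    """E_d(F*) = tr(F*^T L_d F*): smoothness of predictions across similar drugs."""
    return float(np.trace(F_star.T @ L_d @ F_star))

def side_effect_energy(F_star, L_s):
    """E_s(F*) = tr(F* L_s F*^T): smoothness across similar side effects."""
    return float(np.trace(F_star @ L_s @ F_star.T))

# Hypothetical toy problem: 4 drugs, 3 side effects, symmetric kernels.
rng = np.random.default_rng(0)
A = rng.random((4, 4))
K_d = (A + A.T) / 2.0
B = rng.random((3, 3))
K_s = (B + B.T) / 2.0
F_star = rng.random((4, 3))

L_d, L_s = laplacian(K_d), laplacian(K_s)
E_d = drug_energy(F_star, L_d)
E_s = side_effect_energy(F_star, L_s)
```

With a symmetric kernel and this Laplacian, `E_d` equals half the kernel-weighted sum of squared distances between the prediction rows of every pair of drugs, so minimizing it pushes similar drugs toward similar predicted side effect profiles; `E_s` plays the same role for the columns.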
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010280936.XA CN111477344B (en) | 2020-04-10 | 2020-04-10 | Drug side effect identification method based on self-weighted multi-core learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111477344A (en) | 2020-07-31 |
CN111477344B (en) | 2023-06-09 |
Family
ID=71751948
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010280936.XA Active CN111477344B (en) | 2020-04-10 | 2020-04-10 | Drug side effect identification method based on self-weighted multi-core learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111477344B (en) |
Families Citing this family (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112071439B (en) * | 2020-08-19 | 2024-01-02 | 中南大学 | Drug side effect relationship prediction method, system, computer device, and storage medium |
CN112863693B (en) * | 2021-02-04 | 2021-09-28 | 东北林业大学 | Drug target interaction prediction method based on multi-channel graph convolution network |
CN115910382A (en) * | 2022-07-26 | 2023-04-04 | 苏州科技大学 | Method for predicting side effects of drugs by using restricted Boltzmann machine based on penalty regular term |
CN116504331A (en) * | 2023-04-28 | 2023-07-28 | 东北林业大学 | Frequency score prediction method for drug side effects based on multiple modes and multiple tasks |
CN116705148B (en) * | 2023-07-24 | 2023-10-27 | 中国人民解放军总医院 | Antiviral drug screening method and system based on Laplace least square method |
CN117079835B (en) * | 2023-08-21 | 2024-02-20 | 广东工业大学 | Multi-view-based medicine-medicine interaction prediction method and system |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2005027012A2 (en) * | 2003-09-16 | 2005-03-24 | Pfizer Inc. | System and method for the computer-assisted identification of drugs and indications |
WO2015054266A1 (en) * | 2013-10-08 | 2015-04-16 | The Regents Of The University Of California | Predictive optimization of network system response |
WO2016191340A1 (en) * | 2015-05-22 | 2016-12-01 | Georgetown University | Discovery and analysis of drug-related side effects |
CN108647484A (en) * | 2018-05-17 | 2018-10-12 | 中南大学 | A kind of drug relationship prediction technique integrated based on multiple information with least square method |
CN110188812A (en) * | 2019-05-24 | 2019-08-30 | 长沙理工大学 | A kind of multicore clustering method of quick processing missing isomeric data |
Family Cites Families (17)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20130179187A1 (en) * | 2012-01-06 | 2013-07-11 | Molecular Health | Systems and methods for de-risking patient treatment |
EP2801047B1 (en) * | 2012-01-06 | 2022-02-23 | Molecular Health GmbH | Systems and methods for multivariate analysis of adverse event data |
US9530095B2 (en) * | 2013-06-26 | 2016-12-27 | International Business Machines Corporation | Method and system for exploring the associations between drug side-effects and therapeutic indications |
US10803144B2 (en) * | 2014-05-06 | 2020-10-13 | International Business Machines Corporation | Predicting drug-drug interactions based on clinical side effects |
US20160092793A1 (en) * | 2014-09-26 | 2016-03-31 | Thomson Reuters Global Resources | Pharmacovigilance systems and methods utilizing cascading filters and machine learning models to classify and discern pharmaceutical trends from social media posts |
US11037684B2 (en) * | 2014-11-14 | 2021-06-15 | International Business Machines Corporation | Generating drug repositioning hypotheses based on integrating multiple aspects of drug similarity and disease similarity |
US20180011977A1 (en) * | 2015-03-13 | 2018-01-11 | Ubic, Inc. | Data analysis system, data analysis method, and data analysis program |
US10783997B2 (en) * | 2016-08-26 | 2020-09-22 | International Business Machines Corporation | Personalized tolerance prediction of adverse drug events |
CN106529205B (en) * | 2016-11-03 | 2019-03-26 | 中南大学 | It is a kind of based on drug minor structure, the drug targets Relationship Prediction method of molecule character description information |
US11289178B2 (en) * | 2017-04-21 | 2022-03-29 | International Business Machines Corporation | Identifying chemical substructures associated with adverse drug reactions |
CN106960131A (en) * | 2017-05-05 | 2017-07-18 | 华东师范大学 | A kind of drug side-effect Forecasting Methodology based on multi-feature fusion |
KR101953762B1 (en) * | 2017-09-25 | 2019-03-04 | (주)신테카바이오 | Drug indication and response prediction systems and method using AI deep learning based on convergence of different category data |
US20190206537A1 (en) * | 2018-01-04 | 2019-07-04 | Chioma Cynthia Nwaubani | Method and system for customizing, aggregating, prioritizing, and displaying medication adverse effects |
US11164678B2 (en) * | 2018-03-06 | 2021-11-02 | International Business Machines Corporation | Finding precise causal multi-drug-drug interactions for adverse drug reaction analysis |
KR20200023689A (en) * | 2018-08-20 | 2020-03-06 | 아주대학교산학협력단 | The method of artificial intelligence(AI)-based adverse drug reactions detection and the system thereof |
CN110246550B (en) * | 2019-06-12 | 2022-12-06 | 西安电子科技大学 | Drug combination prediction method based on drug similarity network data |
CN110957002B (en) * | 2019-12-17 | 2023-04-28 | 电子科技大学 | Drug target interaction relation prediction method based on synergistic matrix decomposition |
Also Published As
Publication number | Publication date |
---|---|
CN111477344A (en) | 2020-07-31 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111477344B (en) | Drug side effect identification method based on self-weighted multi-core learning | |
Lan et al. | A survey of data mining and deep learning in bioinformatics | |
Askr et al. | Deep learning in drug discovery: an integrative review and future challenges | |
Tang et al. | Unified one-step multi-view spectral clustering | |
Bhatti et al. | Deep learning with graph convolutional networks: An overview and latest applications in computational intelligence | |
Pashaei et al. | Binary black hole algorithm for feature selection and classification on biological data | |
Shi et al. | Graph temporal ensembling based semi-supervised convolutional neural network with noisy labels for histopathology image analysis | |
CN110347932B (en) | Cross-network user alignment method based on deep learning | |
Ghadge et al. | Intelligent heart attack prediction system using big data | |
Malondkar et al. | Spark-GHSOM: growing hierarchical self-organizing map for large scale mixed attribute datasets | |
López-Cruz et al. | Bayesian network modeling of the consensus between experts: An application to neuron classification | |
CN105117618B (en) | It is a kind of based on the drug targets of guilt by association principle and network topology structure feature interact recognition methods | |
Lin et al. | Patient similarity via joint embeddings of medical knowledge graph and medical entity descriptions | |
CN112382411A (en) | Drug-protein targeting effect prediction method based on heterogeneous graph | |
Sarwar et al. | A survey of big data analytics in healthcare | |
Pouyan et al. | Clustering single-cell expression data using random forest graphs | |
Zhao et al. | A multi-graph deep learning model for predicting drug-disease associations | |
Zhang et al. | Line graph contrastive learning for link prediction | |
Luo et al. | Towards semi-supervised universal graph classification | |
Lynn et al. | Data independent acquisition based bi-directional deep networks for biometric ECG authentication | |
Ding et al. | Boosting few-shot hyperspectral image classification using pseudo-label learning | |
Simić et al. | A hybrid clustering approach for diagnosing medical diseases | |
Hedar et al. | K-means cloning: adaptive spherical k-means clustering | |
Zhang et al. | Domain-specific topic model for knowledge discovery in computational and data-intensive scientific communities | |
CN115394348A (en) | lncRNA subcellular localization prediction method, equipment and medium based on graph convolution network |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||