CN111477344B - Drug side effect identification method based on self-weighted multi-kernel learning - Google Patents

Drug side effect identification method based on self-weighted multi-kernel learning

Info

Publication number
CN111477344B
Authority
CN
China
Prior art keywords
drug
matrix
side effect
core
kernel
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010280936.XA
Other languages
Chinese (zh)
Other versions
CN111477344A (en
Inventor
刘勇国
李杨
杨尚明
李巧勤
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Electronic Science and Technology of China
Priority to CN202010280936.XA
Publication of CN111477344A
Application granted
Publication of CN111477344B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H70/00 ICT specially adapted for the handling or processing of medical references
    • G16H70/40 ICT specially adapted for the handling or processing of medical references relating to drugs, e.g. their side effects or intended usage
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B15/00 ICT specially adapted for analysing two-dimensional or three-dimensional molecular structures, e.g. structural or functional relations or structure alignment
    • G16B15/30 Drug targeting using structural data; Docking or binding prediction
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00 ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16C COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
    • G16C20/00 Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
    • G16C20/50 Molecular design, e.g. of drugs
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16C COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
    • G16C20/00 Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
    • G16C20/70 Machine learning, data mining or chemometrics
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16C COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
    • G16C20/00 Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
    • G16C20/90 Programming languages; Computing architectures; Database systems; Data warehousing
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00 Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10 Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Abstract

The invention discloses a drug side effect identification method based on self-weighted multi-kernel learning. It addresses two problems of existing multi-kernel methods for identifying drug side effects: incomplete representation of drug features and unreasonable weight assignment. The method comprises data acquisition, construction of drug kernel matrices and side effect kernel matrices, and further steps. Drug features are described from multiple perspectives, and four kernel construction methods are applied to both the drug features and the side effect features, which reduces the influence of missing features on the prediction result. The optimal kernel matrices of the drugs and of the side effects are constructed by a self-weighting method, whose computed weights adapt better to the different kernel matrices. Expanding the kernel matrices with a nearest neighbor method captures the local structure of the drug-side effect relationship.

Description

Drug side effect identification method based on self-weighted multi-kernel learning
Technical Field
The invention relates to the field of multi-kernel learning, and in particular to a drug side effect identification method based on self-weighted multi-kernel learning.
Background
In recent years, drug safety problems caused by drug side effects have attracted growing attention. Side effects have become an important cause of failure in clinical drug trials and are also a major problem affecting public health. Related studies on drug side effects fall into several lines of work: computing the similarity between drugs and predicting drug targets from drug-side effect relationships; drug repositioning based on the similarity between side effects; predicting the side effects a drug may cause from information such as its chemical structure; and predicting drug side effects with disease networks. Drug side effect identification plays an important role in drug research, and predicting side effects in a timely and accurate manner has become a research hotspot in China and abroad.
The conventional method for predicting and evaluating potential side effects of drugs is generally to carry out clinical experiments on patients before the drugs are marketed and observe adverse reactions generated after the patients take the drugs.
In recent years, the accumulation of large amounts of drug side effect data, for example in the SIDER database, has given researchers a data source for exploring drug side effects at the molecular level. Advances in computer technologies such as complex networks and data mining offer a new approach to identifying drug side effects, and a growing number of studies mine potential drug-side effect relationships from massive biological data by computational means. Current methods for drug side effect identification can be divided into classification algorithms and recommendation algorithms. Side effects can be identified by combining chemical and biological information on drugs with classification algorithms; the most important issue in this approach is extracting effective features from drugs and side effects. The classification algorithms used include support vector machines and decision trees. Recommendation methods can also identify drug-side effect associations; they include matrix factorization (MF), label propagation algorithms (LPA), collaborative filtering (CF), and bipartite local models. These methods are also applicable to drug-target interaction prediction, drug-side effect association identification, and miRNA-disease association prediction.
Kernel methods are one family of classification algorithms. Because a single kernel is often insufficient for complex problems, many kernel-based machine learning algorithms combine multiple kernels to obtain a better similarity measure. Multi-kernel learning is one such kernel method: it replaces a single kernel with a combination of several base kernels. Because it combines kernel functions defined on different input data sources, multi-kernel learning suits sample data sets with irregular, heterogeneous features and offers greater flexibility. Multi-kernel learning has already been used to identify drug side effects. The CKA-MKL model [Y. Ding, J. Tang, F. Guo. Identification of drug-side effect association via multiple information integration with centered kernel alignment [J]. Neurocomputing, 2019] constructs multiple kernels in the drug space and in the side effect space, linearly weights the kernels of each space with a multi-kernel learning algorithm based on centered kernel alignment, and finally identifies drug side effects by fusing the drug kernel and the side effect kernel with Kronecker RLS.
Although side effect identification based on multi-kernel learning has made some progress, the following problems remain:
Existing methods do not use data sources such as drug-target associations, so the drug and side effect features they consider are incomplete. Drugs and side effects cannot be represented accurately, which reduces prediction accuracy.
Most existing multi-kernel learning methods assume that the optimal kernel is a linear combination of the base kernels. This assumption may not hold, which leads to improper weight assignment and reduces the accuracy of the prediction result.
Disclosure of Invention
The technical problem to be solved by the invention is as follows: in existing methods, drug features are represented incompletely and the weights assigned to the kernel functions are unreasonable. The invention provides a drug side effect identification method based on self-weighted multi-kernel learning that solves these problems.
The method describes drug features more fully from three aspects: side effects, targets, and substructures. To capture the influence of similarity relationships among drugs and among side effects on drug side effect identification, that is, the local structure of the drug-side effect relationship, the kernel matrices are expanded with a nearest neighbor method, which improves the accuracy of the prediction result.
The invention is realized by the following technical scheme:
a drug side effect identification method based on self-weighted multi-kernel learning comprises the following steps:
step 1: data acquisition: collecting information from a database;
step 2: construction of the drug kernel matrices and the side effect kernel matrices: constructing a data set representing the drug types, constructing a data set of the side effect types, and constructing a relationship matrix between drugs and side effects;
calculating four kinds of similarity data from the relationship matrix, namely the Gaussian interaction profile (GIP) kernel, the correlation coefficient kernel (Corr), the cosine similarity kernel (COS), and the mutual information kernel (MI), and generating the kernel matrices of the drug attribute space and the kernel matrices of the side effect attribute space from the four kinds of similarity data;
step 3: establishing a self-weighted multi-kernel learning objective function from the kernel matrices of the drug attribute space and of the side effect attribute space obtained in step 2; obtaining the optimal drug kernel matrix and the optimal side effect kernel matrix by iterative updating; expanding the kernel matrix of the drug attribute space and the kernel matrix of the side effect attribute space by a nearest neighbor method; then minimizing the objective function with the Gaussian fields and harmonic functions (GFHF) method; and finally obtaining the predicted drug-side effect relationship matrix by continued iterative updating.
Further, step 2 also includes generating four drug attribute kernels from a drug-substructure relationship matrix and four drug attribute kernels from a drug-target relationship matrix, and passing these eight drug attribute kernels, together with the four drug attribute kernels generated from the drug-side effect relationship matrix, to step 3 for calculation.
Further, in step 1, the information collected from the database includes drug information, drug-protein interaction information, target protein information, drug-side effect relationship information, and side effect information, for drugs that have both target protein and side effect information.
Further, the chemical structure of a drug is encoded as a molecular fingerprint, which consists of the chemical substructures defined in the PubChem database.
Further, step 2 includes the following detailed steps:
let $D = \{d_1, d_2, \ldots, d_n\}$ denote the set of $n$ drugs, where $d$ denotes a drug, and let $S = \{s_1, s_2, \ldots, s_m\}$ denote the set of $m$ side effects, where $s$ denotes a side effect;
an $n \times m$ adjacency matrix $F$ represents the relationship between drugs and side effects, with elements $F_{i,j}$ ($1 \le i \le n$, $1 \le j \le m$): $F_{i,j} = 1$ when drug $d_i$ has side effect $s_j$, and $F_{i,j} = 0$ otherwise. Drug $d_i$ is represented by its side effects as $F_{d_i}$, the $i$-th row of $F$, a binary vector of length $m$ whose elements are each 1 or 0;
the Gaussian interaction profile (GIP) kernel is expressed as:
$$K_{\mathrm{GIP}}(d_i, d_k) = \exp\left(-\gamma \left\| F_{d_i} - F_{d_k} \right\|^2\right)$$
where $F_{d_i}$ and $F_{d_k}$ are drugs $d_i$ and $d_k$ represented by their side effects, and $\gamma$ is the bandwidth of the Gaussian kernel;
the correlation coefficient kernel (Corr) is expressed as:
$$K_{\mathrm{Corr}}(d_i, d_k) = \frac{\mathrm{Cov}(F_{d_i}, F_{d_k})}{\sqrt{\mathrm{Var}(F_{d_i})\,\mathrm{Var}(F_{d_k})}}$$
where $\mathrm{Cov}(F_{d_i}, F_{d_k})$ is the covariance of $F_{d_i}$ and $F_{d_k}$, $\mathrm{Var}(F_{d_i})$ is the variance of $F_{d_i}$, and $\mathrm{Var}(F_{d_k})$ is the variance of $F_{d_k}$;
the cosine similarity kernel (COS) is expressed as:
$$K_{\mathrm{COS}}(d_i, d_k) = \frac{F_{d_i} F_{d_k}^{T}}{\left\| F_{d_i} \right\| \left\| F_{d_k} \right\|}$$
the mutual information kernel (MI) is expressed as:
$$K_{\mathrm{MI}}(d_i, d_k) = \sum_{u \in \{0,1\}} \sum_{v \in \{0,1\}} f(u, v) \log \frac{f(u, v)}{f(u)\, f(v)}$$
where $u \in \{0, 1\}$ and $v \in \{0, 1\}$ are the values of the drug variables in the side effect space (0 indicates that the drug does not have the side effect, 1 indicates that it does), $f(u)$ is the observed frequency of $u$ in $F_{d_i}$, $f(v)$ is the observed frequency of $v$ in $F_{d_k}$, and $f(u, v)$ is the relative observed frequency of the pair.
Further, step 3 includes the following detailed steps:
let $K_d^{i}$ ($i = 1, \ldots, C_d$) denote the kernel matrices of the drug attribute space and $K_s^{j}$ ($j = 1, \ldots, C_s$) the kernel matrices of the side effect attribute space, where $C_d$ is the number of kernels of the drug space and $C_s$ is the number of kernels of the side effect space. The objective function of self-weighted multi-kernel learning is as follows:
[equation image BDA0002446548040000041]
where $\omega_i$ denotes the weight of $K_d^{i}$ [equation image BDA0002446548040000043]. An initial value of $\omega_i$ is given first [equation image BDA0002446548040000044], where $C_d$ is the number of drug kernels. After the initial value of $\omega_i$ is obtained, the optimal kernel $K_d^{*}$ is calculated; $\omega_i$ changes dynamically with $K_d^{*}$ and is updated continuously, and the optimal drug kernel $K_d^{*}$ is finally obtained. The optimal side effect kernel $K_s^{*}$ is obtained by the same learning method.
Further, the nearest neighbor method is as follows: the $k$ nearest-neighbor drugs of drug $d_i$ are denoted $N(d_i) \subseteq D$, and the elements of the $k$-nearest-neighbor graph $N_d \in R^{n \times n}$ are:
[equation image BDA0002446548040000049]
$N_d$ is used to sparsify the drug kernel matrix $K_d^{*}$, which yields the expanded drug kernel matrix $\tilde{K}_d$:
$$\tilde{K}_d = N_d \odot K_d^{*}$$
where $\odot$ denotes the Hadamard product of matrices. For side effect information, the $k$ nearest-neighbor side effects of side effect $s_j$ are denoted $N(s_j) \subseteq S$, and the $k$-nearest-neighbor graph $N_s$ of the side effects is used to sparsify the side effect kernel matrix $K_s^{*}$, which yields the expanded side effect kernel matrix $\tilde{K}_s$:
$$\tilde{K}_s = N_s \odot K_s^{*}$$
Further, the following objective function is minimized using the Gaussian fields and harmonic functions (GFHF) method:
[equation images BDA00024465480400000416 to BDA00024465480400000419]
where $\mathrm{tr}(\cdot)$ denotes the trace of a matrix, $\mu$ and $\sigma$ are non-negative parameters, $E_l(F^*)$ is the loss function, $E_d(F^*)$ is the graph regularization term on the drug feature space, $E_s(F^*)$ is the graph regularization term on the side effect feature space, and $F_{train}$ is the part of the drug-side effect relationship matrix used as training data. [symbol image BDA00024465480400000420] is a diagonal matrix in which:
[equation image BDA00024465480400000421]
$L_d \in R^{n \times n}$ and $L_s \in R^{m \times m}$ are Laplacian matrices:
[equation images BDA0002446548040000051 and BDA0002446548040000052]
$D_d$ and $D_s$ are diagonal matrices:
[equation images BDA0002446548040000053 and BDA0002446548040000054]
To find $F^*$, let
[equation image BDA0002446548040000055]
The objective function can then be rewritten as:
[equation images BDA0002446548040000056 to BDA0002446548040000058]
where $I_d \in R^{n \times n}$ is an identity matrix. The matrix $F^*$ is updated continuously until the predicted drug-side effect relationship matrix is obtained.
The invention has the following advantages and beneficial effects:
The method describes drug features from multiple perspectives and applies four kernel construction methods to both the drug features and the side effect features, which reduces the influence of missing features on the prediction result. The optimal kernel matrices of the drugs and of the side effects are constructed by a self-weighting method, whose computed weights adapt better to the different kernel matrices. Expanding the kernel matrices with a nearest neighbor method captures the local structure of the drug-side effect relationship. On this basis, the invention identifies drug side effects more accurately.
Drawings
The accompanying drawings, which are included to provide a further understanding of embodiments of the invention and are incorporated in and constitute a part of this application, illustrate embodiments of the invention. In the drawings:
FIG. 1 is a schematic diagram of a process flow according to the present invention.
FIG. 2 is a schematic diagram of a multi-core learning model of the present invention.
Detailed Description
Before any embodiments of the invention are explained in detail, it is to be understood that the invention is not limited in its application to the details of construction set forth in the following description or illustrated in the drawings. The invention is capable of other embodiments and of being practiced or of being carried out in various ways. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive improvements, are intended to fall within the scope of the invention.
A drug side effect identification method based on self-weighted multi-kernel learning, as shown in FIG. 1, comprises the following steps:
step 1: data acquisition: collecting information from a database;
step 2: construction of the drug kernel matrices and the side effect kernel matrices: constructing a data set representing the drug types, constructing a data set of the side effect types, and constructing a relationship matrix between drugs and side effects;
calculating four kinds of similarity data from the relationship matrix, namely the Gaussian interaction profile (GIP) kernel, the correlation coefficient kernel (Corr), the cosine similarity kernel (COS), and the mutual information kernel (MI), and generating the kernel matrices of the drug attribute space and the kernel matrices of the side effect attribute space from the four kinds of similarity data;
step 3: establishing a self-weighted multi-kernel learning objective function from the kernel matrices of the drug attribute space and of the side effect attribute space obtained in step 2; obtaining the optimal drug kernel matrix and the optimal side effect kernel matrix by iterative updating; expanding the kernel matrix of the drug attribute space and the kernel matrix of the side effect attribute space by a nearest neighbor method; then minimizing the objective function with the Gaussian fields and harmonic functions (GFHF) method; and finally obtaining the predicted drug-side effect relationship matrix by continued iterative updating.
Example 1:
Data acquisition:
The data used in the technical scheme of the invention come from the Mizutani database. The Mizutani database contains 658 drugs that have both target protein and side effect information, 5074 drug-protein interactions, 1368 target proteins, 49051 drug-side effect relationships, and 1339 side effects. The chemical structure of a drug is encoded as a molecular fingerprint, which consists of 881 chemical substructures defined in the PubChem database.
Construction of the drug kernels and the side effect kernels:
Let $D = \{d_1, d_2, \ldots, d_n\}$ denote the set of $n$ drugs and $S = \{s_1, s_2, \ldots, s_m\}$ the set of $m$ side effects. The $n \times m$ adjacency matrix $F$ represents the relationship between drugs and side effects. Its element $F_{i,j}$ ($1 \le i \le n$, $1 \le j \le m$) equals 1 when drug $d_i$ has side effect $s_j$ and 0 otherwise. Drug $d_i$ is represented by its side effects as $F_{d_i}$, the $i$-th row of $F$, a binary vector of length $m$ whose elements are each 1 or 0.
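As a minimal sketch of the relationship matrix described above, the following Python snippet builds a binary drug-side effect matrix from a list of known pairs. The function name `build_relation_matrix` and the toy index pairs are illustrative assumptions, not data from the Mizutani database.

```python
import numpy as np


def build_relation_matrix(n_drugs, n_side_effects, known_pairs):
    """Return the n x m binary matrix F with F[i, j] = 1 for each known pair (i, j)."""
    F = np.zeros((n_drugs, n_side_effects), dtype=int)
    for i, j in known_pairs:
        F[i, j] = 1  # drug i is known to cause side effect j
    return F


# Toy example (hypothetical indices): 3 drugs, 4 side effects.
F_toy = build_relation_matrix(3, 4, [(0, 0), (0, 2), (1, 2), (2, 3)])
drug_0_profile = F_toy[0]  # side effect representation of the first drug
```

Each row of the matrix is the side effect profile of one drug, and each column is the drug profile of one side effect, so the same matrix serves both attribute spaces.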
Four similarity measures are used to construct the kernel matrices: the Gaussian interaction profile (GIP) kernel, the correlation coefficient kernel (Corr), the cosine similarity kernel (COS), and the mutual information kernel (MI).
The Gaussian interaction profile kernel is constructed from the topology of the known drug-side effect network. It realizes a nonlinear mapping: drugs are mapped to a nonlinear representation in which the drug vectors are highly distinguishable. It is expressed as:
$$K_{\mathrm{GIP}}(d_i, d_k) = \exp\left(-\gamma \left\| F_{d_i} - F_{d_k} \right\|^2\right)$$
where $F_{d_i}$ and $F_{d_k}$ are the side effect representations (rows of $F$) of drugs $d_i$ and $d_k$, and $\gamma$ is the bandwidth of the Gaussian kernel.
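The GIP kernel can be sketched as follows, assuming the common form exp(-γ‖F_di − F_dk‖²) over the rows of the relationship matrix. The function name `gip_kernel`, the default `gamma=1.0`, and the toy matrix are assumptions made for illustration. Some implementations additionally rescale γ by the mean squared row norm, which is omitted here.

```python
import numpy as np


def gip_kernel(F, gamma=1.0):
    """Gaussian interaction profile kernel between the rows of F."""
    F = np.asarray(F, dtype=float)
    sq_norms = np.sum(F * F, axis=1)
    # Squared Euclidean distance between every pair of rows.
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (F @ F.T)
    sq_dists = np.maximum(sq_dists, 0.0)  # guard against tiny negative round-off
    return np.exp(-gamma * sq_dists)


F_toy = np.array([[1, 0, 1, 0],
                  [0, 0, 1, 0],
                  [0, 0, 0, 1]])
K_gip = gip_kernel(F_toy, gamma=1.0)
```

Passing the transpose of the matrix gives the corresponding kernel between side effects.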
The correlation coefficient kernel measures the linear relationship between drug vectors and is expressed as:
$$K_{\mathrm{Corr}}(d_i, d_k) = \frac{\mathrm{Cov}(F_{d_i}, F_{d_k})}{\sqrt{\mathrm{Var}(F_{d_i})\,\mathrm{Var}(F_{d_k})}}$$
where $\mathrm{Cov}(F_{d_i}, F_{d_k})$ is the covariance of $F_{d_i}$ and $F_{d_k}$, $\mathrm{Var}(F_{d_i})$ is the variance of $F_{d_i}$, and $\mathrm{Var}(F_{d_k})$ is the variance of $F_{d_k}$.
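A minimal sketch of the correlation coefficient kernel, computed as covariance divided by the product of standard deviations of the drug vectors. The function name `corr_kernel` is hypothetical, and mapping zero-variance rows (drugs with all-0 or all-1 profiles) to a similarity of 0 is an assumption of this sketch, since the text does not say how that case is handled.

```python
import numpy as np


def corr_kernel(F):
    """Pearson correlation between the rows of F; zero-variance rows map to 0."""
    F = np.asarray(F, dtype=float)
    centered = F - F.mean(axis=1, keepdims=True)
    cov = centered @ centered.T / F.shape[1]   # pairwise covariances
    std = np.sqrt(np.diag(cov))                # per-row standard deviations
    denom = np.outer(std, std)
    K = np.zeros_like(cov)
    np.divide(cov, denom, out=K, where=denom > 0)
    return K


F_toy = np.array([[1, 0, 1, 0],
                  [1, 1, 0, 0],
                  [0, 1, 0, 1],
                  [1, 1, 1, 1]])
K_corr = corr_kernel(F_toy)
```

Correlation values range from -1 to 1, so this matrix can contain negative entries, unlike the GIP kernel.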
The cosine similarity kernel treats each drug as a vector in the $m$-dimensional side effect space and evaluates the similarity of two drugs by the cosine of the angle between their vectors. This measures how two drug variables differ in the side effect space: the more closely the directions of the two vectors agree, the higher the similarity. It is expressed as:
$$K_{\mathrm{COS}}(d_i, d_k) = \frac{F_{d_i} F_{d_k}^{T}}{\left\| F_{d_i} \right\| \left\| F_{d_k} \right\|}$$
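The cosine similarity kernel can be sketched in a few lines. The function name `cosine_kernel` is hypothetical, and assigning a similarity of 0 to all-zero drug vectors is an assumption of this sketch.

```python
import numpy as np


def cosine_kernel(F):
    """Cosine of the angle between every pair of rows of F."""
    F = np.asarray(F, dtype=float)
    norms = np.linalg.norm(F, axis=1)
    denom = np.outer(norms, norms)
    K = np.zeros((F.shape[0], F.shape[0]))
    # Rows with zero norm keep similarity 0 instead of producing NaN.
    np.divide(F @ F.T, denom, out=K, where=denom > 0)
    return K


F_toy = np.array([[1, 0, 1, 0],
                  [0, 0, 1, 0],
                  [0, 0, 0, 1]])
K_cos = cosine_kernel(F_toy)
```

For binary profiles the values lie between 0 and 1, with 1 for identical profiles and 0 for drugs that share no side effect.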
The mutual information kernel measures the degree of interdependence between two discrete random variables, here between the observed frequencies of two drugs, and is expressed as:
$$K_{\mathrm{MI}}(d_i, d_k) = \sum_{u \in \{0,1\}} \sum_{v \in \{0,1\}} f(u, v) \log \frac{f(u, v)}{f(u)\, f(v)}$$
where $u \in \{0, 1\}$ and $v \in \{0, 1\}$ are the values of the drug variables in the side effect space; 0 indicates that the drug does not have the side effect and 1 indicates that it does. $f(u)$ is the observed frequency of $u$ in $F_{d_i}$; for example, when $u = 1$, $f(u)$ is the frequency of 1 in the drug vector $F_{d_i}$. $f(v)$ is the observed frequency of $v$ in $F_{d_k}$, and $f(u, v)$ is the relative observed frequency of the pair.
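The mutual information kernel described above can be sketched by counting how often each value pair occurs across the side effect positions of two drugs. The names `mutual_information` and `mi_kernel` are hypothetical, and the use of the natural logarithm is an assumption, since the text does not state the base.

```python
import numpy as np


def mutual_information(x, y):
    """Mutual information (natural log) between two binary vectors of equal length."""
    x = np.asarray(x)
    y = np.asarray(y)
    mi = 0.0
    for u in (0, 1):
        f_u = np.mean(x == u)                      # frequency of u in x
        for v in (0, 1):
            f_v = np.mean(y == v)                  # frequency of v in y
            f_uv = np.mean((x == u) & (y == v))    # joint frequency of (u, v)
            if f_uv > 0:                           # 0 * log(0) is taken as 0
                mi += f_uv * np.log(f_uv / (f_u * f_v))
    return float(mi)


def mi_kernel(F):
    """Pairwise mutual information between the rows of the binary matrix F."""
    F = np.asarray(F)
    n = F.shape[0]
    K = np.zeros((n, n))
    for i in range(n):
        for k in range(i, n):
            K[i, k] = K[k, i] = mutual_information(F[i], F[k])
    return K


F_toy = np.array([[1, 0, 1, 0],
                  [1, 0, 1, 0],
                  [1, 1, 0, 0]])
K_mi = mi_kernel(F_toy)
```

In the toy matrix the first two drugs have identical profiles, so their mutual information equals the entropy of the profile, while the first and third profiles are statistically independent and give 0.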
The above describes the attribute kernels of drugs represented by side effects. Similarly, the attribute kernels of drugs represented by substructures are $K_{\mathrm{GIP\text{-}chem},d}$, $K_{\mathrm{Corr\text{-}chem},d}$, $K_{\mathrm{Cos\text{-}chem},d}$, and $K_{\mathrm{MI\text{-}chem},d}$; the attribute kernels of drugs represented by targets are $K_{\mathrm{GIP\text{-}target},d}$, $K_{\mathrm{Corr\text{-}target},d}$, $K_{\mathrm{Cos\text{-}target},d}$, and $K_{\mathrm{MI\text{-}target},d}$; and the attribute kernels of side effects represented by drugs are $K_{\mathrm{GIP\text{-}link},s}$, $K_{\mathrm{Corr\text{-}link},s}$, $K_{\mathrm{Cos\text{-}link},s}$, and $K_{\mathrm{MI\text{-}link},s}$.
Multi-kernel learning generates an optimal kernel:
As shown in FIG. 2, $K_d^{i}$ ($i = 1, \ldots, C_d$) denotes a kernel of the drug attribute space and $K_s^{j}$ ($j = 1, \ldots, C_s$) denotes a kernel of the side effect attribute space. $C_d$ is the number of kernels of the drug space, and $C_d = 12$ in this scheme; $C_s$ is the number of kernels of the side effect space, and $C_s = 4$ in this scheme.
Take the generation of the optimal drug kernel as an example. Because the optimal kernel $K_d^{*}$ should be close to each attribute kernel of the drug (or of the side effect), the objective function of self-weighted multi-kernel learning is as follows:
[equation image BDA0002446548040000079]
where $\omega_i$ denotes the weight of the $i$-th drug kernel $K_d^{i}$:
[equation image BDA00024465480400000711]
Because $\omega_i$ depends on the target variable $K_d^{*}$, which is not yet known, $\omega_i$ cannot be computed directly. An initial value is therefore given first:
[equation image BDA00024465480400000714]
where $C_d$ is the number of drug kernels. After the initial value of $\omega_i$ is obtained, $K_d^{*}$ is calculated; $\omega_i$ changes dynamically with $K_d^{*}$ and is updated continuously, and the optimal drug kernel $K_d^{*}$ is finally obtained. The optimal side effect kernel $K_s^{*}$ is obtained by the same method.
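The alternating update described above can be sketched as follows. The formulas themselves appear only as equation images in this text, so every formula in the sketch is an assumption: the objective is taken to be a weighted sum of squared Frobenius distances between the optimal kernel and each base kernel, each weight is taken to be 1 / (2‖K* − K_i‖_F), and the weights start at 1 / C. The function name `self_weighted_kernel` and its parameters are hypothetical.

```python
import numpy as np


def self_weighted_kernel(kernels, n_iter=50, tol=1e-8, eps=1e-12):
    """Alternate between updating the consensus kernel and the kernel weights."""
    kernels = np.asarray(kernels, dtype=float)       # shape (C, n, n)
    n_kernels = kernels.shape[0]
    w = np.full(n_kernels, 1.0 / n_kernels)          # assumed initial weights 1 / C
    # With fixed weights, the assumed objective is minimized by the weighted mean.
    K_opt = np.tensordot(w, kernels, axes=1) / w.sum()
    for _ in range(n_iter):
        dists = np.array([np.linalg.norm(K_opt - K) for K in kernels])
        w = 1.0 / (2.0 * np.maximum(dists, eps))     # assumed self-weighting rule
        K_new = np.tensordot(w, kernels, axes=1) / w.sum()
        converged = np.linalg.norm(K_new - K_opt) < tol
        K_opt = K_new
        if converged:
            break
    return K_opt, w / w.sum()


# Toy example: two identical base kernels and one that disagrees with them.
base = [np.eye(2), np.eye(2), np.ones((2, 2))]
K_star, weights = self_weighted_kernel(base)
```

Under these assumptions, base kernels that lie far from the current consensus receive small weights, so no weight hyperparameter has to be tuned by hand. In the toy example the consensus moves toward the two agreeing kernels.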
Graph-based semi-supervised learning:
Semi-supervised learning can capture the global structure of the drug-side effect relationship, but it ignores the influence that similarity relationships among drugs and among side effects have on drug side effect identification. The present scheme therefore expands the kernel matrices with a nearest neighbor method. The $k$ nearest-neighbor drugs of drug $d_i$ are denoted $N(d_i) \subseteq D$, and the elements of the $k$-nearest-neighbor graph $N_d \in R^{n \times n}$ are:
[equation image BDA0002446548040000081]
$N_d$ is used to sparsify the drug kernel matrix $K_d^{*}$, which yields the expanded drug kernel matrix $\tilde{K}_d$:
$$\tilde{K}_d = N_d \odot K_d^{*}$$
where $\odot$ denotes the Hadamard product of matrices.
For side effect information, the $k$ nearest-neighbor side effects of side effect $s_j$ are denoted $N(s_j) \subseteq S$, and the $k$-nearest-neighbor graph $N_s$ of the side effects is used to sparsify the side effect kernel matrix $K_s^{*}$, which yields the expanded side effect kernel matrix $\tilde{K}_s$:
$$\tilde{K}_s = N_s \odot K_s^{*}$$
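The nearest neighbor sparsification can be sketched as below. The element-wise definition of the neighbor graph is only available as an equation image, so the rule used here is an assumption: an entry is 1 when two items are each among the other's k nearest neighbors, 0.5 when only one direction holds, and 0 otherwise. Self-similarity is excluded from the neighbor search, which is also an assumption. The names `knn_graph` and `sparsify_kernel` are hypothetical.

```python
import numpy as np


def knn_graph(K, k):
    """Symmetric k-nearest-neighbor weight matrix derived from a similarity matrix K."""
    K = np.asarray(K, dtype=float)
    n = K.shape[0]
    S = K.copy()
    np.fill_diagonal(S, -np.inf)                   # an item is not its own neighbor
    neighbors = np.argsort(-S, axis=1)[:, :k]      # indices of the k most similar items
    member = np.zeros((n, n))
    member[np.arange(n)[:, None], neighbors] = 1.0
    # 1 if mutual neighbors, 0.5 if one-directional, 0 if neither.
    return 0.5 * (member + member.T)


def sparsify_kernel(K, k):
    """Hadamard product of the neighbor graph with the kernel matrix."""
    return knn_graph(K, k) * np.asarray(K, dtype=float)


K_toy = np.array([[1.0, 0.9, 0.1],
                  [0.9, 1.0, 0.8],
                  [0.1, 0.8, 1.0]])
N_toy = knn_graph(K_toy, k=1)
K_sparse = sparsify_kernel(K_toy, k=1)
```

The element-wise product keeps similarities only between items that are close in the kernel, which is how the local structure enters the later graph regularization.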
To find the optimal predicted drug-side effect relationship matrix $F^*$, the following objective function is minimized using the Gaussian fields and harmonic functions (GFHF) method:
[equation images BDA0002446548040000088 to BDA00024465480400000811]
where $\mathrm{tr}(\cdot)$ denotes the trace of a matrix, and $\mu$ and $\sigma$ are non-negative parameters. $E_l(F^*)$ is the loss function, $E_d(F^*)$ is the graph regularization term on the drug feature space, and $E_s(F^*)$ is the graph regularization term on the side effect feature space. $F_{train}$ is the part of the drug-side effect relationship matrix used as training data. [symbol image BDA00024465480400000812] is a diagonal matrix in which:
[equation image BDA00024465480400000813]
$L_d \in R^{n \times n}$ and $L_s \in R^{m \times m}$ are Laplacian matrices:
[equation images BDA00024465480400000814 and BDA00024465480400000815]
$D_d$ and $D_s$ are diagonal matrices:
[equation images BDA00024465480400000816 and BDA00024465480400000817]
To find $F^*$, let
[equation image BDA00024465480400000818]
The objective function can then be rewritten as:
[equation images BDA00024465480400000819, BDA00024465480400000820 and BDA0002446548040000091]
where $I_d \in R^{n \times n}$ is an identity matrix. The matrix $F^*$ is updated continuously until the predicted drug-side effect relationship matrix is obtained.
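The exact objective and update rule are given only as equation images in this text, so the sketch below is a simplified stand-in for illustration, not the patented procedure. It assumes a squared-error loss ‖F* − F_train‖², symmetric normalized graph Laplacians built from symmetric non-negative kernel matrices, and regularizers μ·tr(F*ᵀ L_d F*) and σ·tr(F* L_s F*ᵀ). Under those assumptions the minimizer satisfies (I + μL_d)F* + σF*L_s = F_train, which is solved here in closed form from the eigendecompositions of the two Laplacians rather than iteratively. All function and parameter names are hypothetical.

```python
import numpy as np


def normalized_laplacian(K):
    """Symmetric normalized Laplacian I - D^{-1/2} K D^{-1/2} of a similarity matrix."""
    K = np.asarray(K, dtype=float)
    deg = K.sum(axis=1)
    inv_sqrt = np.zeros_like(deg)
    positive = deg > 0
    inv_sqrt[positive] = 1.0 / np.sqrt(deg[positive])
    return np.eye(K.shape[0]) - inv_sqrt[:, None] * K * inv_sqrt[None, :]


def graph_regularized_scores(F_train, K_d, K_s, mu=1.0, sigma=1.0):
    """Solve (I + mu*L_d) F + sigma * F L_s = F_train for the score matrix F."""
    F_train = np.asarray(F_train, dtype=float)
    lam, U = np.linalg.eigh(normalized_laplacian(K_d))   # drug-side spectrum
    gam, V = np.linalg.eigh(normalized_laplacian(K_s))   # side-effect-side spectrum
    Q = U.T @ F_train @ V
    # In the eigenbases the linear system decouples entry by entry.
    Y = Q / (1.0 + mu * lam[:, None] + sigma * gam[None, :])
    return U @ Y @ V.T


K_d_toy = np.array([[1.0, 0.8, 0.1],
                    [0.8, 1.0, 0.3],
                    [0.1, 0.3, 1.0]])
K_s_toy = np.array([[1.0, 0.5],
                    [0.5, 1.0]])
F_train_toy = np.array([[1.0, 0.0],
                        [1.0, 0.0],
                        [0.0, 1.0]])
F_hat = graph_regularized_scores(F_train_toy, K_d_toy, K_s_toy, mu=0.5, sigma=0.2)
```

Larger values of `mu` and `sigma` smooth the predicted scores more strongly across similar drugs and similar side effects, and setting both to 0 returns the training matrix unchanged.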
The foregoing embodiments illustrate the general principles of the invention and are not intended to limit its scope to the particular embodiments described. Any modifications, equivalents, and improvements that fall within the spirit and principles of the invention are intended to be included within the scope of the invention.

Claims (7)

1. A drug side effect identification method based on self-weighted multi-core learning, characterized by comprising the following steps:
step 1: data acquisition: collecting information from a database; the collected information comprises drug information, drug-protein interaction information, target protein information, drug-side-effect relationship information and side-effect information; the data are derived from the Mizutani database, which contains 658 drugs, 5074 drug-protein interactions, 1368 target proteins, 49051 drug-side-effect relationships and 1339 side effects, the drugs having both target protein information and side-effect information;
step 2: constructing drug kernel matrices and side-effect kernel matrices based on the data of step 1: constructing a data set representing the drug types, constructing a data set representing the side-effect types, and constructing a relationship matrix between the drugs and the side effects;
calculating four kinds of similarity data from the relationship matrix, namely the Gaussian interaction profile kernel (GIP), the correlation coefficient kernel (Corr), the cosine similarity kernel (COS) and the mutual information kernel (MI), and generating the kernel matrices of the drug attribute space and the kernel matrices of the side-effect attribute space from the calculated similarity data;
step 3: establishing a self-weighted multi-kernel learning objective function from the kernel matrices of the drug attribute space and of the side-effect attribute space obtained in step 2, and obtaining the optimal drug kernel matrix and the optimal side-effect kernel matrix by iterative updating; extending the kernel matrix of the drug attribute space and the kernel matrix of the side-effect attribute space by a nearest-neighbor method; then minimizing the objective function using the Gaussian fields and harmonic functions method, and finally obtaining the predicted drug-side-effect relationship matrix by continuous iterative updating.
2. The drug side effect identification method based on self-weighted multi-core learning according to claim 1, wherein step 2 further comprises: generating four drug attribute kernels from a drug-substructure relationship matrix and four drug attribute kernels from a drug-target relationship matrix, and substituting these eight drug attribute kernels, together with the four drug attribute kernels generated from the drug-side-effect relationship matrix, into step 3 for calculation.
3. The drug side effect identification method based on self-weighted multi-core learning according to claim 1, wherein the chemical structure of a drug is encoded as a molecular fingerprint, the molecular fingerprint being composed of a plurality of chemical substructures defined in the PubChem database.
4. The drug side effect identification method based on self-weighted multi-core learning according to claim 3, wherein step 2 comprises the following detailed steps:
$D = \{d_1, d_2, \ldots, d_n\}$ denotes the set of n drugs, where d denotes a drug, and $S = \{s_1, s_2, \ldots, s_m\}$ denotes the set of m side effects, where s denotes a side effect;
an n×m adjacency matrix F represents the relationship matrix between the drugs and the side effects, where $F_{i,j}$ ($1 \le i \le n$, $1 \le j \le m$) is an element of F: $F_{i,j} = 1$ when drug $d_i$ has side effect $s_j$, and $F_{i,j} = 0$ otherwise; the side effects of drug $d_i$ are denoted $F_{d_i}$, a binary vector of length m whose elements each take the value 1 or 0;
the Gaussian interaction profile kernel (GIP) is expressed as:
Figure FDA0004071988600000021
where $F_{d_i}$ and $F_{d_k}$ denote drug $d_i$ and drug $d_k$, respectively, as represented by their side effects, and γ denotes the bandwidth of the Gaussian kernel;
the correlation coefficient kernel (Corr) is expressed as:
Figure FDA0004071988600000024
where Figure FDA0004071988600000025 denotes the covariance of $F_{d_i}$ and $F_{d_k}$, Figure FDA0004071988600000028 denotes the variance of $F_{d_i}$, and Figure FDA00040719886000000210 denotes the variance of $F_{d_k}$;
cosine similarity kernel (COS) is expressed as:
Figure FDA00040719886000000212
the mutual information kernel (MI) is expressed as:
Figure FDA00040719886000000213
where $u \in \{0, 1\}$ and $v \in \{0, 1\}$ are the values taken by the drug variables in the side-effect space, 0 indicating that the drug does not have the side effect and 1 indicating that it does; f(u) denotes the observed frequency of u in Figure FDA00040719886000000214, f(v) denotes the observed frequency of v in Figure FDA00040719886000000215, and f(u, v) denotes the relative observed frequency.
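The four similarity kernels of claim 4 can be sketched in NumPy as follows. Because the claimed formulas survive only as figure placeholders, this sketch assumes the textbook forms: a Gaussian of the squared Euclidean distance for GIP, the Pearson coefficient for Corr, the normalized dot product for COS, and the discrete mutual information over {0, 1} for MI. The function names and the default `gamma` are hypothetical.

```python
import numpy as np

def gip_kernel(F, gamma=1.0):
    """Gaussian interaction profile kernel between the rows of F (assumed form)."""
    F = np.asarray(F, dtype=float)
    sq = np.sum(F ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (F @ F.T), 0.0)
    return np.exp(-gamma * d2)

def corr_kernel(F):
    """Pearson correlation: covariance divided by the product of standard deviations."""
    F = np.asarray(F, dtype=float)
    Fc = F - F.mean(axis=1, keepdims=True)
    cov = Fc @ Fc.T / F.shape[1]
    std = np.sqrt(np.diag(cov))
    denom = np.outer(std, std)
    # Rows with zero variance get similarity 0 instead of NaN.
    return np.divide(cov, denom, out=np.zeros_like(cov), where=denom > 0)

def cosine_kernel(F):
    """Cosine similarity between the rows of F."""
    F = np.asarray(F, dtype=float)
    G = F @ F.T
    norms = np.linalg.norm(F, axis=1)
    denom = np.outer(norms, norms)
    return np.divide(G, denom, out=np.zeros_like(G), where=denom > 0)

def mi_kernel(F):
    """Mutual information between binary rows, from observed frequencies."""
    B = np.asarray(F).astype(bool)
    n = B.shape[0]
    K = np.zeros((n, n))
    for i in range(n):
        for k in range(i, n):
            mi = 0.0
            for u in (False, True):
                for v in (False, True):
                    f_uv = np.mean((B[i] == u) & (B[k] == v))  # joint frequency
                    f_u = np.mean(B[i] == u)                   # marginal of u
                    f_v = np.mean(B[k] == v)                   # marginal of v
                    if f_uv > 0:                               # 0 * log 0 := 0
                        mi += f_uv * np.log(f_uv / (f_u * f_v))
            K[i, k] = K[k, i] = mi
    return K
```

Each function takes the n×m binary relationship matrix F and returns an n×n drug kernel. Passing the transpose of F would give the corresponding m×m side-effect kernel.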
5. The drug side effect identification method based on self-weighted multi-core learning according to claim 3, wherein step 3 comprises the following detailed steps:
Figure FDA0004071988600000031 denotes a kernel matrix of the drug attribute space, Figure FDA0004071988600000032 denotes a kernel matrix of the side-effect attribute space, $C_d$ denotes the number of kernels of the drug space, and $C_s$ denotes the number of kernels of the side-effect space; the objective function of the self-weighted multi-kernel learning is as follows:
Figure FDA0004071988600000033
where $\omega_i$ denotes the weight of Figure FDA0004071988600000034,
Figure FDA0004071988600000035
given Figure FDA0004071988600000036, the initial value of $\omega_i$ is obtained, and Figure FDA0004071988600000037 is then calculated; $\omega_i$ changes dynamically with Figure FDA0004071988600000038 and is updated continuously until the optimal drug kernel Figure FDA0004071988600000039 is finally obtained; the optimal side-effect kernel Figure FDA00040719886000000310 is obtained by the same learning method.
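The alternating update of claim 5 (weights from the current consensus kernel, then a new consensus kernel from the weights) might look like the NumPy sketch below. The claimed objective and weight formula are not recoverable from the figure placeholders. The sketch therefore assumes one common self-weighting scheme, in which the consensus kernel is the weighted mean of the base kernels and each weight is ω_i = 1 / (2‖K_i − K*‖_F). The function name `self_weighted_mkl` and its parameters are hypothetical.

```python
import numpy as np

def self_weighted_mkl(kernels, n_iter=50, tol=1e-8, eps=1e-12):
    """Alternate between weight and consensus-kernel updates (assumed scheme)."""
    K = np.stack([np.asarray(k, dtype=float) for k in kernels])  # shape (C, n, n)
    C = K.shape[0]
    w = np.full(C, 1.0 / C)                    # start from uniform weights
    K_star = np.tensordot(w, K, axes=1)        # initial consensus: plain average
    for _ in range(n_iter):
        # Frobenius distance of every base kernel to the current consensus.
        dist = np.linalg.norm(K - K_star, axis=(1, 2))
        w = 1.0 / (2.0 * np.maximum(dist, eps))   # closer kernels get larger weights
        w = w / w.sum()                           # normalise for readability
        K_new = np.tensordot(w, K, axes=1)        # weighted mean of base kernels
        converged = np.linalg.norm(K_new - K_star) < tol
        K_star = K_new
        if converged:
            break
    return K_star, w
```

Under these assumptions a base kernel that disagrees strongly with the others receives a small weight automatically, without a separate hyperparameter for the weights.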
6. The drug side effect identification method based on self-weighted multi-core learning according to claim 3, wherein the nearest-neighbor method specifically comprises: the k nearest-neighbor drugs similar to drug $d_i$ are denoted $N(d_i) \in D$, and the elements of the k-nearest-neighbor graph $N_d \in \mathbb{R}^{n \times m}$ are as follows:
Figure FDA00040719886000000311
$N_d$ is used to sparsify the drug kernel matrix Figure FDA00040719886000000312; sparsifying the kernel matrix with $N_d$ yields the extended drug kernel matrix Figure FDA00040719886000000313:
Figure FDA00040719886000000314
where the product operator in the formula denotes the Hadamard product of matrices; for the side-effect information, the k nearest-neighbor side effects similar to side effect $s_j$ are denoted $N(s_j) \in S$, and the k-nearest-neighbor graph $N_s$ of the side effects is used to sparsify the side-effect kernel matrix Figure FDA00040719886000000315, yielding the extended side-effect kernel matrix Figure FDA00040719886000000316:
Figure FDA00040719886000000317
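The nearest-neighbor sparsification of claim 6 can be sketched as a binary mask multiplied element-wise with the kernel. The element definition of the neighbor graph is only available as a figure placeholder. The sketch therefore assumes the simplest rule: the mask entry is 1 when the column item is among the k most similar items of the row item (self excluded), and 0 otherwise. It also assumes a square mask of the same shape as the kernel, so that the Hadamard product is defined. `knn_mask` and `sparsify_kernel` are hypothetical names.

```python
import numpy as np

def knn_mask(K, k):
    """Binary k-nearest-neighbour graph built from a similarity matrix K."""
    K = np.asarray(K, dtype=float)
    n = K.shape[0]
    S = K.copy()
    np.fill_diagonal(S, -np.inf)              # an item is not its own neighbour
    idx = np.argsort(-S, axis=1)[:, :k]       # k most similar items per row
    N = np.zeros((n, n))
    rows = np.repeat(np.arange(n), k)
    N[rows, idx.ravel()] = 1.0
    return N

def sparsify_kernel(K, k):
    """Hadamard (element-wise) product of the neighbour mask and the kernel."""
    return knn_mask(K, k) * np.asarray(K, dtype=float)
```

The mask built this way is generally not symmetric. If a symmetric matrix is needed downstream, for example to build a graph Laplacian, averaging the result with its transpose is one option.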
7. The drug side effect identification method based on self-weighted multi-kernel learning according to claim 3, wherein the following objective function is minimized using the Gaussian fields and harmonic functions (GFHF) method:
Figure FDA00040719886000000318
Figure FDA00040719886000000319
$$E_d(F^*) = \mathrm{tr}(F^{*T} L_d F^*)$$
$$E_s(F^*) = \mathrm{tr}(F^* L_s F^{*T})$$
where tr(·) denotes the trace of a matrix, μ and σ are non-negative parameters, $E_l(F^*)$ is the loss function, $E_d(F^*)$ is the graph regularization term on the drug feature space, $E_s(F^*)$ is the graph regularization term on the side-effect feature space, and $F_{train}$ denotes the portion of the drug-side-effect relationship matrix used as training data,
Figure FDA0004071988600000041
is a diagonal matrix in which:
Figure FDA0004071988600000042
$L_d \in \mathbb{R}^{n \times n}$ and $L_s \in \mathbb{R}^{m \times m}$ are Laplacian matrices:
Figure FDA0004071988600000043
Figure FDA0004071988600000044
$D_d$ and $D_s$ are diagonal matrices:
Figure FDA0004071988600000045
Figure FDA0004071988600000046
to solve for $F^*$, let
Figure FDA0004071988600000047
The objective function may be rewritten as:
Figure FDA0004071988600000048
Figure FDA0004071988600000049
Figure FDA00040719886000000410
$I_d \in \mathbb{R}^{n \times n}$ is an identity matrix, and the matrix $F^*$ is updated continuously until the predicted drug-side-effect relationship matrix is finally obtained.
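As a worked step for the two graph regularization terms written out in claim 7, their gradients with respect to $F^*$ follow from standard matrix calculus. The only assumption is that $L_d$ and $L_s$ are symmetric, which holds for a Laplacian built from a symmetric kernel matrix. The gradient of the loss term is not shown because its formula is not recoverable here.

```latex
\frac{\partial E_d}{\partial F^{*}}
  = \frac{\partial}{\partial F^{*}} \operatorname{tr}\!\left(F^{*T} L_d F^{*}\right)
  = \left(L_d + L_d^{T}\right) F^{*} = 2 L_d F^{*},
\qquad
\frac{\partial E_s}{\partial F^{*}}
  = \frac{\partial}{\partial F^{*}} \operatorname{tr}\!\left(F^{*} L_s F^{*T}\right)
  = F^{*} \left(L_s + L_s^{T}\right) = 2 F^{*} L_s .
```

Setting the full gradient of the objective to zero therefore gives a condition that is linear in $F^*$, with $L_d$ acting on the drug (row) side and $L_s$ on the side-effect (column) side.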
CN202010280936.XA 2020-04-10 2020-04-10 Drug side effect identification method based on self-weighted multi-core learning Active CN111477344B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010280936.XA CN111477344B (en) 2020-04-10 2020-04-10 Drug side effect identification method based on self-weighted multi-core learning

Publications (2)

Publication Number Publication Date
CN111477344A (en) 2020-07-31
CN111477344B (en) 2023-06-09

Family

ID=71751948

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010280936.XA Active CN111477344B (en) 2020-04-10 2020-04-10 Drug side effect identification method based on self-weighted multi-core learning

Country Status (1)

Country Link
CN (1) CN111477344B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112071439B (en) * 2020-08-19 2024-01-02 中南大学 Drug side effect relationship prediction method, system, computer device, and storage medium
CN112863693B (en) * 2021-02-04 2021-09-28 东北林业大学 Drug target interaction prediction method based on multi-channel graph convolution network
CN115910382A (en) * 2022-07-26 2023-04-04 苏州科技大学 Method for predicting side effects of drugs by using restricted Boltzmann machine based on penalty regular term
CN116504331A (en) * 2023-04-28 2023-07-28 东北林业大学 Frequency score prediction method for drug side effects based on multiple modes and multiple tasks
CN116705148B (en) * 2023-07-24 2023-10-27 中国人民解放军总医院 Antiviral drug screening method and system based on Laplace least square method
CN117079835B (en) * 2023-08-21 2024-02-20 广东工业大学 Multi-view-based medicine-medicine interaction prediction method and system

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2005027012A2 (en) * 2003-09-16 2005-03-24 Pfizer Inc. System and method for the computer-assisted identification of drugs and indications
WO2015054266A1 (en) * 2013-10-08 2015-04-16 The Regents Of The University Of California Predictive optimization of network system response
WO2016191340A1 (en) * 2015-05-22 2016-12-01 Georgetown University Discovery and analysis of drug-related side effects
CN108647484A (en) * 2018-05-17 2018-10-12 中南大学 A kind of drug relationship prediction technique integrated based on multiple information with least square method
CN110188812A (en) * 2019-05-24 2019-08-30 长沙理工大学 A kind of multicore clustering method of quick processing missing isomeric data

Family Cites Families (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130179187A1 (en) * 2012-01-06 2013-07-11 Molecular Health Systems and methods for de-risking patient treatment
EP2801047B1 (en) * 2012-01-06 2022-02-23 Molecular Health GmbH Systems and methods for multivariate analysis of adverse event data
US9530095B2 (en) * 2013-06-26 2016-12-27 International Business Machines Corporation Method and system for exploring the associations between drug side-effects and therapeutic indications
US10803144B2 (en) * 2014-05-06 2020-10-13 International Business Machines Corporation Predicting drug-drug interactions based on clinical side effects
US20160092793A1 (en) * 2014-09-26 2016-03-31 Thomson Reuters Global Resources Pharmacovigilance systems and methods utilizing cascading filters and machine learning models to classify and discern pharmaceutical trends from social media posts
US11037684B2 (en) * 2014-11-14 2021-06-15 International Business Machines Corporation Generating drug repositioning hypotheses based on integrating multiple aspects of drug similarity and disease similarity
US20180011977A1 (en) * 2015-03-13 2018-01-11 Ubic, Inc. Data analysis system, data analysis method, and data analysis program
US10783997B2 (en) * 2016-08-26 2020-09-22 International Business Machines Corporation Personalized tolerance prediction of adverse drug events
CN106529205B (en) * 2016-11-03 2019-03-26 中南大学 It is a kind of based on drug minor structure, the drug targets Relationship Prediction method of molecule character description information
US11289178B2 (en) * 2017-04-21 2022-03-29 International Business Machines Corporation Identifying chemical substructures associated with adverse drug reactions
CN106960131A (en) * 2017-05-05 2017-07-18 华东师范大学 A kind of drug side-effect Forecasting Methodology based on multi-feature fusion
KR101953762B1 (en) * 2017-09-25 2019-03-04 (주)신테카바이오 Drug indication and response prediction systems and method using AI deep learning based on convergence of different category data
US20190206537A1 (en) * 2018-01-04 2019-07-04 Chioma Cynthia Nwaubani Method and system for customizing, aggregating, prioritizing, and displaying medication adverse effects
US11164678B2 (en) * 2018-03-06 2021-11-02 International Business Machines Corporation Finding precise causal multi-drug-drug interactions for adverse drug reaction analysis
KR20200023689A (en) * 2018-08-20 2020-03-06 아주대학교산학협력단 The method of artificial intelligence(AI)-based adverse drug reactions detection and the system thereof
CN110246550B (en) * 2019-06-12 2022-12-06 西安电子科技大学 Drug combination prediction method based on drug similarity network data
CN110957002B (en) * 2019-12-17 2023-04-28 电子科技大学 Drug target interaction relation prediction method based on synergistic matrix decomposition

Also Published As

Publication number Publication date
CN111477344A (en) 2020-07-31

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant