CN111477344A - Drug side effect identification method based on self-weighted multi-core learning - Google Patents


Publication number
CN111477344A
Authority
CN
China
Prior art keywords
drug
matrix
side effect
kernel
core
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010280936.XA
Other languages
Chinese (zh)
Other versions
CN111477344B (en)
Inventor
刘勇国
李杨
杨尚明
李巧勤
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Electronic Science and Technology of China
Priority to CN202010280936.XA
Publication of CN111477344A
Application granted
Publication of CN111477344B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H70/00 ICT specially adapted for the handling or processing of medical references
    • G16H70/40 ICT specially adapted for the handling or processing of medical references relating to drugs, e.g. their side effects or intended usage
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B15/00 ICT specially adapted for analysing two-dimensional or three-dimensional molecular structures, e.g. structural or functional relations or structure alignment
    • G16B15/30 Drug targeting using structural data; Docking or binding prediction
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00 ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16C COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
    • G16C20/00 Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
    • G16C20/50 Molecular design, e.g. of drugs
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16C COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
    • G16C20/00 Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
    • G16C20/70 Machine learning, data mining or chemometrics
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16C COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
    • G16C20/00 Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
    • G16C20/90 Programming languages; Computing architectures; Database systems; Data warehousing
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00 Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10 Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Chemical & Material Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Medical Informatics (AREA)
  • General Health & Medical Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Crystallography & Structural Chemistry (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Medicinal Chemistry (AREA)
  • Pharmacology & Pharmacy (AREA)
  • Software Systems (AREA)
  • Biotechnology (AREA)
  • Public Health (AREA)
  • Biophysics (AREA)
  • Epidemiology (AREA)
  • Evolutionary Biology (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Primary Health Care (AREA)
  • Toxicology (AREA)
  • Bioethics (AREA)
  • Medical Treatment And Welfare Office Work (AREA)

Abstract

The invention discloses a drug side effect identification method based on self-weighted multi-kernel learning. It addresses two problems of existing drug side effect identification methods based on multi-kernel learning: incomplete representation of drug features and unreasonable weight assignment when weighting the kernel functions. The method comprises data acquisition, construction of drug kernel matrices and side effect kernel matrices, and related steps. The method describes drug features from multiple angles and constructs kernel matrices for the drug and side effect features using four similarity measures, which reduces the influence of missing features on the prediction result; optimal kernel matrices for drugs and side effects are constructed by a self-weighting method, whose computed weights adapt better to the different kernel matrices; and the kernel matrices are expanded by a nearest neighbor method, so that the local structure of the drug-side effect relationship can be captured.

Description

Drug side effect identification method based on self-weighted multi-core learning
Technical Field
The invention relates to the field of multi-kernel learning, and in particular to a drug side effect identification method based on self-weighted multi-kernel learning.
Background
In recent years, drug safety problems caused by drug side effects have received much attention. Side effects are an important cause of failure in clinical drug trials and a major problem affecting public health. Research on drug side effects mainly includes: calculating the similarity between drugs and predicting drug targets from the relationship between drugs and side effects; repositioning drugs using the side effect similarity between drugs; predicting the side effects a drug may cause from information such as its chemical structure; and predicting drug side effects using disease networks. Identifying drug side effects plays an important role in drug research, and timely, accurate prediction of drug side effects has become a research hotspot at home and abroad.
The conventional way to predict and evaluate potential drug side effects is to conduct clinical trials on patients before a drug is marketed and to observe the adverse reactions that occur after the patients take the drug.
In recent years, the accumulation of large amounts of drug side effect data, for example in the SIDER database, has given researchers a data source for exploring drug side effects at the molecular level. The development of computer technologies such as complex networks and data mining has provided new ideas for drug side effect identification, and more and more studies mine potential drug-side effect relationships from massive biological data by computational methods.
Multi-kernel learning combines several kernels, is suited to sample data sets whose features are irregular and heterogeneous, and offers greater flexibility.
Although side effect identification methods based on multi-kernel learning have made some progress, the following problems remain:
Existing methods do not consider data sources such as the association between drugs and targets, so the features they use for drugs and side effects are incomplete; drugs and side effects cannot be represented accurately, which reduces prediction accuracy.
Most existing multi-kernel learning methods assume that the optimal kernel is a linear combination of the base kernels. This assumption may not hold, in which case the weights are assigned inappropriately and the accuracy of the prediction result suffers.
Disclosure of Invention
The technical problem to be solved by the invention is as follows: existing drug side effect identification methods based on multi-kernel learning suffer from incomplete representation of drug features and unreasonable weight assignment when weighting the kernel functions. The invention provides a drug side effect identification method based on self-weighted multi-kernel learning that solves these problems.
The invention provides a side effect identification method based on self-weighted multi-kernel learning in which drug features are described more comprehensively from three aspects: side effects, targets, and substructures. To capture the influence that the drug-side effect relationships of similar drugs have on identifying a drug's side effects, that is, the local structure of the drug-side effect relationship, a nearest neighbor method is used to expand the kernel matrices, which improves the accuracy of the prediction result.
The invention is realized by the following technical scheme:
a drug side effect identification method based on self-weighted multi-kernel learning comprises the following steps:
step 1: data acquisition: collecting information from a database;
step 2: constructing drug kernel matrices and side effect kernel matrices: constructing a data set representing the drugs, constructing a data set representing the side effects, and constructing a relationship matrix between the drugs and the side effects;
calculating four kinds of similarity data from the relationship matrix, namely a Gaussian interaction attribute kernel (GIP), a correlation coefficient kernel (Corr), a cosine similarity kernel (COS) and a mutual information kernel (MI), and generating the kernel matrices of the drug attribute space and the kernel matrices of the side effect attribute space from the four kinds of similarity data;
step 3: establishing a self-weighted multi-kernel learning objective function from the kernel matrices of the drug attribute space and of the side effect attribute space obtained in step 2, and iteratively updating to obtain an optimal drug kernel matrix and an optimal side effect kernel matrix; expanding the kernel matrix of the drug attribute space and the kernel matrix of the side effect attribute space by a nearest neighbor method; and minimizing the objective function by the Gaussian field and harmonic function method with continued iterative updating, finally obtaining the predicted drug-side effect relationship matrix.
Further, step 2 also comprises: generating four drug attribute kernels from the drug-substructure relationship matrix, generating four drug attribute kernels from the drug-target relationship matrix, and substituting these eight drug attribute kernels, together with the four drug attribute kernels generated from the drug-side effect relationship matrix, into step 3 for calculation.
Further, in step 1, the information collected from the database comprises information on drugs that have both target protein and side effect information, drug-protein interaction information, target protein information, drug-side effect relationship information, and side effect information.
Further, the chemical structure of a drug is encoded as a molecular fingerprint consisting of a plurality of chemical substructures defined in the PubChem database.
Further, the step 2 includes the following detailed steps:
with $D = \{d_1, d_2, \dots, d_n\}$ denoting a set of n drugs, where d denotes a drug, and $S = \{s_1, s_2, \dots, s_m\}$ denoting a set of m side effects, where s denotes a side effect;
$F \in \mathbb{R}^{n \times m}$ denotes the relationship matrix between drugs and side effects, and $F_{i,j}$ ($1 \le i \le n$, $1 \le j \le m$) is an element of the adjacency matrix F: when drug $d_i$ has side effect $s_j$, $F_{i,j} = 1$; otherwise, $F_{i,j} = 0$; the side-effect representation of drug $d_i$,
Figure BDA0002446548040000031
is a binary vector of length m, each element of which is 1 or 0;
the Gaussian interaction attribute kernel (GIP) is specifically expressed as:
Figure BDA0002446548040000032
where
Figure BDA0002446548040000033
and
Figure BDA0002446548040000034
are the binary vectors that represent drug $d_i$ and drug $d_k$, respectively, by their side effects, and γ denotes the bandwidth of the Gaussian kernel;
the correlation coefficient kernel (Corr) is expressed as:
Figure BDA0002446548040000035
in which
Figure BDA0002446548040000036
denotes the covariance of
Figure BDA0002446548040000037
and
Figure BDA0002446548040000038
while
Figure BDA0002446548040000039
denotes the variance of
Figure BDA00024465480400000310
and
Figure BDA00024465480400000311
denotes the variance of
Figure BDA00024465480400000312
the cosine similarity kernel (COS) is expressed as:
Figure BDA00024465480400000313
the mutual information kernel (MI) is expressed as:
Figure BDA00024465480400000314
where $u \in \{0, 1\}$ and $v \in \{0, 1\}$ are the values a drug variable takes on the side effect space, 0 meaning that the drug does not have the side effect and 1 meaning that it does; f(u) denotes the observed frequency of u in
Figure BDA00024465480400000315
f(v) denotes the observed frequency of v in
Figure BDA00024465480400000316
and f(u, v) denotes the relative observed frequency of u and v.
Further, the step 3 includes the following detailed steps:
Figure BDA00024465480400000317
denotes the kernel matrices of the drug attribute space, and
Figure BDA00024465480400000318
denotes the kernel matrices of the side effect attribute space; $C_d$ denotes the number of kernels in the drug space and $C_s$ the number of kernels in the side effect space; the objective function of self-weighted multi-kernel learning is as follows:
Figure BDA0002446548040000041
where $\omega_i$ denotes the weight of
Figure BDA0002446548040000042
as defined by
Figure BDA0002446548040000043
first,
Figure BDA0002446548040000044
is given, where $C_d$ denotes the number of drug kernels; once the initial value of $\omega_i$ is obtained,
Figure BDA0002446548040000045
is calculated; $\omega_i$ changes dynamically with
Figure BDA0002446548040000046
and is updated continuously until the optimal drug kernel
Figure BDA0002446548040000047
is finally obtained; the optimal side effect kernel is obtained by the same learning method:
Figure BDA0002446548040000048
Further, the nearest neighbor method is specifically as follows: the k nearest-neighbor drugs similar to drug $d_i$ are denoted $N(d_i) \subseteq D$, and the elements of the k-nearest-neighbor graph $N_d \in \mathbb{R}^{n \times n}$ are set as:
Figure BDA0002446548040000049
$N_d$ is used to sparsify the drug kernel matrix
Figure BDA00024465480400000410
and sparsifying the kernel matrix with $N_d$ yields the expanded drug kernel matrix
Figure BDA00024465480400000411
Figure BDA00024465480400000412
where the product is the Hadamard (element-wise) product of matrices; for the side effect information, the k nearest-neighbor side effects similar to side effect $s_j$ are denoted $N(s_j) \subseteq S$, and the k-nearest-neighbor graph of side effects, $N_s$, is used to sparsify the side effect kernel matrix
Figure BDA00024465480400000413
which yields the expanded side effect kernel matrix
Figure BDA00024465480400000414
Figure BDA00024465480400000415
Further, the following objective function is minimized using the Gaussian Field and Harmonic Functions (GFHF) method:
Figure BDA00024465480400000416
Figure BDA00024465480400000417
Figure BDA00024465480400000418
Figure BDA00024465480400000419
where tr(·) denotes the trace of a matrix; μ and σ are non-negative parameters; $E_l(F^*)$ is the loss function, $E_d(F^*)$ is the graph regularization term on the drug feature space, and $E_s(F^*)$ is the graph regularization term on the side effect feature space; $F_{train}$ denotes the part of the drug-side effect relationship matrix that is used as training data;
Figure BDA00024465480400000420
is a diagonal matrix, where:
Figure BDA00024465480400000421
$L_d \in \mathbb{R}^{n \times n}$ and $L_s \in \mathbb{R}^{m \times m}$ are the Laplacian matrices:
Figure BDA0002446548040000051
Figure BDA0002446548040000052
$D_d$ and $D_s$ are diagonal matrices:
Figure BDA0002446548040000053
Figure BDA0002446548040000054
to solve for $F^*$, let
Figure BDA0002446548040000055
so that the objective function can be rewritten as:
Figure BDA0002446548040000056
Figure BDA0002446548040000057
Figure BDA0002446548040000058
$I_d \in \mathbb{R}^{n \times n}$ is the identity matrix; the matrix $F^*$ is updated continuously until the predicted drug-side effect relationship matrix is finally obtained.
The invention has the following advantages and beneficial effects:
The method describes drug features from multiple angles and constructs kernel matrices for the drug and side effect features using four similarity measures, which reduces the influence of missing features on the prediction result; optimal kernel matrices for drugs and side effects are constructed by a self-weighting method, whose computed weights adapt better to the different kernel matrices; and the kernel matrices are expanded by a nearest neighbor method, so that the local structure of the drug-side effect relationship can be captured. On this basis, the invention can identify drug side effects more accurately.
Drawings
The accompanying drawings, which are included to provide a further understanding of the embodiments of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the principles of the invention. In the drawings:
FIG. 1 is a schematic process flow diagram of the present invention.
FIG. 2 is a diagram of a multi-core learning model according to the present invention.
Detailed Description
Before any embodiments of the invention are explained in detail, it is to be understood that the invention is not limited in its application to the details of construction and the arrangements of components set forth in the following description or illustrated in the drawings. The invention is capable of other embodiments and of being practiced or of being carried out in various ways. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any inventive changes, are within the scope of the present invention.
A drug side effect identification method based on self-weighted multi-kernel learning, as shown in FIG. 1, comprises the following steps:
step 1: data acquisition: collecting information from a database;
step 2: constructing drug kernel matrices and side effect kernel matrices: constructing a data set representing the drugs, constructing a data set representing the side effects, and constructing a relationship matrix between the drugs and the side effects;
calculating four kinds of similarity data from the relationship matrix, namely a Gaussian interaction attribute kernel (GIP), a correlation coefficient kernel (Corr), a cosine similarity kernel (COS) and a mutual information kernel (MI), and generating the kernel matrices of the drug attribute space and the kernel matrices of the side effect attribute space from the four kinds of similarity data;
step 3: establishing a self-weighted multi-kernel learning objective function from the kernel matrices of the drug attribute space and of the side effect attribute space obtained in step 2, and iteratively updating to obtain an optimal drug kernel matrix and an optimal side effect kernel matrix; expanding the kernel matrix of the drug attribute space and the kernel matrix of the side effect attribute space by a nearest neighbor method; and minimizing the objective function by the Gaussian field and harmonic function method with continued iterative updating, finally obtaining the predicted drug-side effect relationship matrix.
Example 1:
data acquisition:
The data used in this embodiment come from the Mizutani database, which contains 658 drugs that have both target protein and side effect information, 5074 drug-protein interactions, 1368 target proteins, 49051 drug-side effect relationships, and 1339 side effects. To encode the chemical structure of a drug, a molecular fingerprint consisting of 881 chemical substructures defined in the PubChem database is used.
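As a quick check of scale, the counts above imply the size and sparsity of the drug-side effect relationship matrix. The short sketch below only does that arithmetic; the variable names are illustrative and not taken from the patent.

```python
# Counts quoted in the data-acquisition paragraph.
n_drugs = 658
n_side_effects = 1339
n_known_pairs = 49051

# The relationship matrix has one cell per (drug, side effect) pair.
n_cells = n_drugs * n_side_effects
density = n_known_pairs / n_cells

print(n_cells)             # total number of cells
print(round(density, 4))   # fraction of cells equal to 1
```

Roughly 5.6% of the cells are known associations, so the matrix is sparse and most entries are unobserved rather than confirmed negatives.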
Construction of drug kernels and side effect kernels:
Let $D = \{d_1, d_2, \dots, d_n\}$ denote a set of n drugs and $S = \{s_1, s_2, \dots, s_m\}$ a set of m side effects. $F \in \mathbb{R}^{n \times m}$ denotes the relationship matrix between drugs and side effects, and $F_{i,j}$ ($1 \le i \le n$, $1 \le j \le m$) is an element of the adjacency matrix F: when drug $d_i$ has side effect $s_j$, $F_{i,j} = 1$; otherwise, $F_{i,j} = 0$. The side-effect representation of drug $d_i$,
Figure BDA0002446548040000061
is a binary vector of length m, each element of which is 1 or 0.
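The notation above can be sketched as follows: a minimal example, assuming NumPy and zero-based indices, that builds the binary relationship matrix F from a list of known (drug, side effect) pairs. The function name and the toy pairs are hypothetical.

```python
import numpy as np

def build_relation_matrix(pairs, n_drugs, n_side_effects):
    """Return the n x m binary matrix F with F[i, j] = 1 for each known pair."""
    F = np.zeros((n_drugs, n_side_effects), dtype=int)
    for i, j in pairs:
        F[i, j] = 1
    return F

# Toy data: 3 drugs, 3 side effects (illustrative only).
pairs = [(0, 0), (0, 2), (1, 1), (2, 0), (2, 1), (2, 2)]
F = build_relation_matrix(pairs, n_drugs=3, n_side_effects=3)

profile_d0 = F[0]  # side-effect vector of the first drug: length m, entries 0 or 1
print(F)
```

Each row of F is the binary side-effect vector of one drug; each column is the drug vector of one side effect, which is what the side effect kernels are computed from.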
The kernel matrix is constructed using four similarity measures: a gaussian interaction property kernel (GIP), a correlation coefficient kernel (Corr), a cosine similarity kernel (COS), and a mutual information kernel (MI).
The Gaussian interaction attribute kernel is constructed from the topology of the known drug-side effect network. It realizes a nonlinear mapping: drugs are mapped into a nonlinear representation in which each drug vector is highly distinguishable. It is expressed as:
Figure BDA0002446548040000062
where
Figure BDA0002446548040000063
and
Figure BDA0002446548040000064
are the binary vectors that represent drug $d_i$ and drug $d_k$, respectively, by their side effects, and γ denotes the bandwidth of the Gaussian kernel.
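A Gaussian kernel on binary interaction profiles is commonly written as exp(-γ‖x_i - x_k‖²). The sketch below assumes that standard form, since the formula image is not reproduced here; how γ is chosen or normalized is likewise an assumption, so it is simply passed in as a parameter.

```python
import numpy as np

def gip_kernel(F, gamma=1.0):
    """Gaussian kernel between the rows of F (assumed form: exp(-gamma * squared distance))."""
    sq_norms = np.sum(F * F, axis=1)
    # Squared Euclidean distance between every pair of rows.
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (F @ F.T)
    sq_dists = np.maximum(sq_dists, 0.0)  # guard against tiny negative round-off
    return np.exp(-gamma * sq_dists)

F = np.array([[1, 0, 1],
              [0, 1, 0],
              [1, 1, 1]])
K_gip = gip_kernel(F, gamma=0.5)
print(np.round(K_gip, 3))
```

Drugs 0 and 2 differ in one side effect and drugs 0 and 1 in three, so the first pair receives the larger similarity.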
The correlation coefficient kernel measures the linear relationship between drug vectors and is expressed as:
Figure BDA0002446548040000065
in which
Figure BDA0002446548040000066
denotes the covariance of
Figure BDA0002446548040000067
and
Figure BDA0002446548040000068
while
Figure BDA0002446548040000069
denotes the variance of
Figure BDA00024465480400000610
and
Figure BDA00024465480400000611
denotes the variance of
Figure BDA00024465480400000612
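The covariance-over-variances description matches the Pearson correlation coefficient, which is what the sketch below computes between the rows of F. Treating zero-variance rows as having similarity 0 is an assumption made here to avoid division by zero, and whether the patent rescales negative values is not stated.

```python
import numpy as np

def corr_kernel(F):
    """Pearson correlation between the rows of F; zero-variance rows map to 0."""
    X = F - F.mean(axis=1, keepdims=True)      # center each drug vector
    cov = X @ X.T / F.shape[1]                 # pairwise covariances
    std = np.sqrt(np.diag(cov))                # per-row standard deviations
    denom = np.outer(std, std)
    K = np.zeros_like(cov)
    np.divide(cov, denom, out=K, where=denom > 0)
    return K

F = np.array([[1, 0, 1, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 1, 0, 0]])
K_corr = corr_kernel(F)
print(np.round(K_corr, 3))
```

Identical profiles score 1, complementary profiles score -1, and unrelated profiles score 0.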
The cosine similarity kernel treats each drug as a vector in the m-dimensional side effect space and evaluates the similarity of two drugs by the cosine of the angle between their vectors. It therefore measures the difference in direction of the two drug variables on the side effect space: the more consistent the directions, the higher the similarity. It is expressed as:
Figure BDA0002446548040000071
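The cosine of the angle between two vectors is their dot product divided by the product of their norms. A minimal NumPy sketch follows; mapping all-zero rows to similarity 0 is an assumption made here to keep the division well defined.

```python
import numpy as np

def cosine_kernel(F):
    """Cosine similarity between the rows of F; all-zero rows map to 0."""
    norms = np.linalg.norm(F, axis=1)
    denom = np.outer(norms, norms)
    K = np.zeros((F.shape[0], F.shape[0]))
    np.divide(F @ F.T, denom, out=K, where=denom > 0)
    return K

F = np.array([[1, 0, 1],
              [0, 1, 0],
              [1, 1, 1]])
K_cos = cosine_kernel(F)
print(np.round(K_cos, 3))
```

Drugs 0 and 1 share no side effect, so their vectors are orthogonal and the similarity is 0.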
The mutual information kernel measures the degree of interdependence between two discrete random variables, here the interdependence between the observed frequencies of two drugs, and is expressed as:
Figure BDA0002446548040000072
where $u \in \{0, 1\}$ and $v \in \{0, 1\}$ are the values a drug variable takes on the side effect space, 0 meaning that the drug does not have the side effect and 1 meaning that it does. f(u) denotes the observed frequency of u in
Figure BDA0002446548040000073
For example, when u is 1, f(u) denotes the observed frequency of 1 in the drug vector
Figure BDA0002446548040000074
Likewise, f(v) denotes the observed frequency of v in
Figure BDA0002446548040000075
and f(u, v) denotes the relative observed frequency of u and v.
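Mutual information between two binary vectors is usually the sum over (u, v) of f(u, v) · log(f(u, v) / (f(u) f(v))). The sketch below assumes that standard definition with natural logarithms; the log base and any normalization used in the patent are not visible in the text.

```python
import numpy as np

def mutual_information(x, y):
    """Mutual information (in nats) between two binary vectors of equal length."""
    x = np.asarray(x)
    y = np.asarray(y)
    mi = 0.0
    for u in (0, 1):
        for v in (0, 1):
            f_uv = np.mean((x == u) & (y == v))  # joint relative frequency
            f_u = np.mean(x == u)                # marginal frequency of u in x
            f_v = np.mean(y == v)                # marginal frequency of v in y
            if f_uv > 0:                         # 0 * log(0) is taken as 0
                mi += f_uv * np.log(f_uv / (f_u * f_v))
    return mi

def mi_kernel(F):
    """Pairwise mutual information between the rows of F."""
    n = F.shape[0]
    K = np.zeros((n, n))
    for i in range(n):
        for k in range(i, n):
            K[i, k] = K[k, i] = mutual_information(F[i], F[k])
    return K

F = np.array([[1, 0, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 0]])
K_mi = mi_kernel(F)
print(np.round(K_mi, 3))
```

Two identical balanced profiles share log 2 nats of information, while the first and third profiles are statistically independent and score 0.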
The description above uses side effects to represent the attribute kernels of a drug. Similarly, the attribute kernels of a drug represented by its substructures are $K_{\text{GIP-chem},d}$, $K_{\text{Corr-chem},d}$, $K_{\text{Cos-chem},d}$ and $K_{\text{MI-chem},d}$; the attribute kernels of a drug represented by its targets are $K_{\text{GIP-target},d}$, $K_{\text{Corr-target},d}$, $K_{\text{Cos-target},d}$ and $K_{\text{MI-target},d}$; and the attribute kernels of a side effect represented by its drugs are $K_{\text{GIP-link},s}$, $K_{\text{Corr-link},s}$, $K_{\text{Cos-link},s}$ and $K_{\text{MI-link},s}$.
Multi-kernel learning generates the optimal kernel:
As shown in FIG. 2,
Figure BDA0002446548040000076
denotes the kernels of the drug attribute space, and
Figure BDA0002446548040000077
denotes the kernels of the side effect attribute space. $C_d$ denotes the number of kernels in the drug space; in the present case, $C_d = 12$. $C_s$ denotes the number of kernels in the side effect space; in the present case, $C_s = 4$.
Taking the generation of the optimal drug kernel as an example, a kernel
Figure BDA0002446548040000078
that is closer to the final optimal drug kernel receives a higher weight. So that the optimal kernel is close to each attribute kernel of the drug or side effect, the objective function of self-weighted multi-kernel learning is as follows:
Figure BDA0002446548040000079
where $\omega_i$ denotes the weight of
Figure BDA00024465480400000710
as defined by
Figure BDA00024465480400000711
Because $\omega_i$ depends on the target variable
Figure BDA00024465480400000712
and
Figure BDA00024465480400000713
cannot be determined directly at the start of the algorithm, $\omega_i$ cannot be calculated at that point. Therefore,
Figure BDA00024465480400000714
is given first, where $C_d$ denotes the number of drug kernels. Once the initial value of $\omega_i$ is obtained,
Figure BDA00024465480400000715
is calculated. $\omega_i$ changes dynamically with
Figure BDA00024465480400000716
and is updated continuously until the optimal drug kernel
Figure BDA00024465480400000717
is finally obtained. The optimal side effect kernel is obtained by the same method:
Figure BDA00024465480400000718
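The alternating update described above can be sketched as follows. Because the formula images are not available, this sketch assumes one common self-weighted formulation: the optimal kernel minimizes the weighted sum of squared Frobenius distances to the base kernels, the weights start at 1/C, and each weight is then recomputed as 1 / (2‖K_i - K*‖_F). These update rules, the function name, and the toy kernels are assumptions, not the patent's confirmed formulas.

```python
import numpy as np

def self_weighted_kernel(kernels, n_iter=50, eps=1e-12):
    """Alternate between the consensus kernel and the per-kernel weights (assumed rules)."""
    C = len(kernels)
    weights = np.full(C, 1.0 / C)  # assumed uniform initial weights
    for _ in range(n_iter):
        # With fixed weights, the minimizer of sum_i w_i * ||K_i - K||_F^2
        # is the weighted average of the base kernels.
        K_opt = sum(w * K for w, K in zip(weights, kernels)) / weights.sum()
        # Kernels closer to the consensus receive larger weights.
        dists = np.array([np.linalg.norm(K - K_opt, "fro") for K in kernels])
        weights = 1.0 / (2.0 * np.maximum(dists, eps))
    return K_opt, weights

# Toy example: two identical kernels and one that disagrees with them.
K_a = np.eye(2)
K_b = np.ones((2, 2))
K_opt, weights = self_weighted_kernel([K_a, K_a, K_b])
print(np.round(K_opt, 3), weights)
```

In this toy run the consensus moves toward the two agreeing kernels and the disagreeing kernel ends up with the smallest weight, which is the behaviour the text describes: kernels nearer the final optimal kernel are weighted more heavily.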
Semi-supervised learning based on graphs:
Semi-supervised learning can capture the global structure of drug-side effect relationships, but it neglects the influence that the drug-side effect relationships of similar drugs have on identifying a drug's side effects. This scheme therefore expands the kernel matrices with a nearest neighbor method. The k nearest-neighbor drugs similar to drug $d_i$ are denoted $N(d_i) \subseteq D$, and the elements of the k-nearest-neighbor graph $N_d \in \mathbb{R}^{n \times n}$ are set as:
Figure BDA0002446548040000081
$N_d$ is used to sparsify the drug kernel matrix
Figure BDA0002446548040000082
Sparsifying the kernel matrix with $N_d$ yields the expanded drug kernel matrix
Figure BDA0002446548040000083
Figure BDA0002446548040000084
where the product is the Hadamard (element-wise) product of matrices.
For the side effect information, the k nearest-neighbor side effects similar to side effect $s_j$ are denoted $N(s_j) \subseteq S$, and the k-nearest-neighbor graph of side effects, $N_s$, is used to sparsify the side effect kernel matrix
Figure BDA0002446548040000085
which yields the expanded side effect kernel matrix
Figure BDA0002446548040000086
Figure BDA0002446548040000087
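The sparsification step can be sketched as an element-wise product between a kernel matrix and a 0/1 neighbor mask. The rule used below for the mask (an entry is 1 when either item is among the other's k most similar items, and on the diagonal) is an assumption, since the element-setting formula is only available as an image; the function name is hypothetical.

```python
import numpy as np

def knn_sparsify(K, k):
    """Keep only similarities between k-nearest neighbors (assumed symmetric mask)."""
    n = K.shape[0]
    N = np.zeros_like(K, dtype=float)
    for i in range(n):
        sims = K[i].astype(float).copy()
        sims[i] = -np.inf                    # exclude the item itself from its neighbors
        neighbors = np.argsort(-sims)[:k]    # indices of the k most similar items
        N[i, neighbors] = 1.0
        N[i, i] = 1.0                        # keep self-similarity
    N = np.maximum(N, N.T)                   # symmetrize the neighbor graph
    return K * N, N                          # Hadamard product, plus the mask

K = np.array([[1.0, 0.9, 0.1, 0.2],
              [0.9, 1.0, 0.3, 0.1],
              [0.1, 0.3, 1.0, 0.8],
              [0.2, 0.1, 0.8, 1.0]])
K_sparse, N_d = knn_sparsify(K, k=1)
print(K_sparse)
```

With k = 1, only the strongest link of each item survives, so the toy matrix splits into two blocks and weak cross-block similarities are set to zero.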
To find the optimal predicted drug-side effect relationship matrix $F^*$, the following objective function is minimized using the Gaussian Field and Harmonic Functions (GFHF) method:
Figure BDA0002446548040000088
Figure BDA0002446548040000089
Figure BDA00024465480400000810
Figure BDA00024465480400000811
Here tr(·) denotes the trace of a matrix, and μ and σ are non-negative parameters. $E_l(F^*)$ is the loss function, $E_d(F^*)$ is the graph regularization term on the drug feature space, and $E_s(F^*)$ is the graph regularization term on the side effect feature space. $F_{train}$ denotes the part of the drug-side effect relationship matrix that is used as training data.
Figure BDA00024465480400000812
is a diagonal matrix, where:
Figure BDA00024465480400000813
$L_d \in \mathbb{R}^{n \times n}$ and $L_s \in \mathbb{R}^{m \times m}$ are the Laplacian matrices:
Figure BDA00024465480400000814
Figure BDA00024465480400000815
$D_d$ and $D_s$ are diagonal matrices:
Figure BDA00024465480400000816
Figure BDA00024465480400000817
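A graph Laplacian built from a similarity matrix is typically L = D - K, where D is the diagonal matrix of row sums. The sketch below assumes this unnormalized form (the patent's exact definition is in the formula images) and checks the standard identity that makes tr(FᵀLF) a smoothness penalty.

```python
import numpy as np

def graph_laplacian(K):
    """Unnormalized Laplacian L = D - K, with D the diagonal matrix of row sums (assumed form)."""
    D = np.diag(K.sum(axis=1))
    return D - K

K = np.array([[1.0, 0.5, 0.0],
              [0.5, 1.0, 0.2],
              [0.0, 0.2, 1.0]])
L = graph_laplacian(K)

# tr(F^T L F) equals half the similarity-weighted sum of squared row differences,
# so it is small when similar items have similar rows in F.
F = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
smoothness = np.trace(F.T @ L @ F)
pairwise = 0.5 * sum(K[i, j] * np.sum((F[i] - F[j]) ** 2)
                     for i in range(3) for j in range(3))
print(smoothness, pairwise)
```

This is why the regularization terms on the drug and side effect spaces encourage similar drugs, and similar side effects, to receive similar predicted associations.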
To solve for $F^*$, let
Figure BDA00024465480400000818
so that the objective function can be rewritten as:
Figure BDA00024465480400000819
Figure BDA00024465480400000820
Figure BDA0002446548040000091
$I_d \in \mathbb{R}^{n \times n}$ is the identity matrix. The matrix $F^*$ is updated continuously until the predicted drug-side effect relationship matrix is finally obtained.
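For intuition only, the sketch below solves a simplified stand-in for this kind of objective directly rather than iteratively: it assumes the loss is ‖F - F_train‖²_F (the diagonal weighting matrix is taken as the identity) plus μ·tr(FᵀL_d F) + σ·tr(F L_s Fᵀ). Setting the gradient to zero gives the linear matrix equation (I + μL_d)F + F(σL_s) = F_train, solved here through a Kronecker-product system. This is a swapped-in direct solve under stated assumptions, not the patent's update rule, and it is practical only for small matrices.

```python
import numpy as np

def laplacian(K):
    """Unnormalized graph Laplacian (assumed form)."""
    return np.diag(K.sum(axis=1)) - K

def solve_smoothed_scores(F_train, L_d, L_s, mu=1.0, sigma=1.0):
    """Solve (I + mu*L_d) F + F (sigma*L_s) = F_train for F."""
    n, m = F_train.shape
    A = np.eye(n) + mu * L_d
    B = sigma * L_s
    # Row-major vectorization: vec(A F) = kron(A, I_m) vec(F), vec(F B) = kron(I_n, B^T) vec(F).
    M = np.kron(A, np.eye(m)) + np.kron(np.eye(n), B.T)
    f = np.linalg.solve(M, F_train.reshape(-1).astype(float))
    return f.reshape(n, m)

K_d = np.array([[1.0, 0.5], [0.5, 1.0]])                          # toy drug kernel
K_s = np.array([[1.0, 0.2, 0.0], [0.2, 1.0, 0.4], [0.0, 0.4, 1.0]])  # toy side effect kernel
F_train = np.array([[1, 0, 1], [0, 1, 0]])
L_d, L_s = laplacian(K_d), laplacian(K_s)

F_star = solve_smoothed_scores(F_train, L_d, L_s, mu=0.5, sigma=0.5)
print(np.round(F_star, 3))
```

The output is a real-valued score matrix: known associations are pulled down slightly and unknown cells pick up score from similar drugs and similar side effects, which is how candidate side effects would be ranked.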
The above-mentioned embodiments are intended to illustrate the objects, technical solutions and advantages of the present invention in further detail, and it should be understood that the above-mentioned embodiments are merely exemplary embodiments of the present invention, and are not intended to limit the scope of the present invention, and any modifications, equivalent substitutions, improvements and the like made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (8)

1. A drug side effect identification method based on self-weighted multi-kernel learning, characterized by comprising the following steps:
step 1: data acquisition: collecting information from a database;
step 2: constructing drug kernel matrices and side effect kernel matrices: constructing a data set representing the drugs, constructing a data set representing the side effects, and constructing a relationship matrix between the drugs and the side effects;
calculating four kinds of similarity data from the relationship matrix, namely a Gaussian interaction attribute kernel (GIP), a correlation coefficient kernel (Corr), a cosine similarity kernel (COS) and a mutual information kernel (MI), and generating the kernel matrices of the drug attribute space and the kernel matrices of the side effect attribute space from the four kinds of similarity data;
step 3: establishing a self-weighted multi-kernel learning objective function from the kernel matrices of the drug attribute space and of the side effect attribute space obtained in step 2, and iteratively updating to obtain an optimal drug kernel matrix and an optimal side effect kernel matrix; expanding the kernel matrix of the drug attribute space and the kernel matrix of the side effect attribute space by a nearest neighbor method; and minimizing the objective function by the Gaussian field and harmonic function method with continued iterative updating, finally obtaining the predicted drug-side effect relationship matrix.
2. The method for identifying the side effect of the drug based on self-weighted multi-kernel learning according to claim 1, wherein step 2 further comprises: generating four drug attribute kernels from the drug-substructure relationship matrix, generating four drug attribute kernels from the drug-target relationship matrix, and substituting these eight drug attribute kernels, together with the four drug attribute kernels generated from the drug-side effect relationship matrix, into step 3 for calculation.
3. The method for identifying the side effect of the drug based on self-weighted multi-kernel learning according to claim 1, wherein in step 1 the information collected from the database includes information on drugs that have both target protein information and side effect information, drug-protein interaction information, target protein information, drug-side effect relationship information, and side effect information.
4. The method for identifying the side effect of the drug based on self-weighted multi-kernel learning according to claim 3, wherein the chemical structure of each drug is encoded as a molecular fingerprint composed of a plurality of chemical substructures defined in the PubChem database.
5. The method for identifying the side effect of the drug based on self-weighted multi-kernel learning according to claim 4, wherein step 2 comprises the following detailed steps:
$D = \{d_1, d_2, \dots, d_n\}$ denotes the set of n drugs, where d denotes a drug, and $S = \{s_1, s_2, \dots, s_m\}$ denotes the set of m side effects, where s denotes a side effect;
the n × m matrix F represents the relationship between the drugs and the side effects; $F_{i,j}$ ($1 \le i \le n$, $1 \le j \le m$) is an element of the adjacency matrix F: when drug $d_i$ has side effect $s_j$, $F_{i,j} = 1$; otherwise $F_{i,j} = 0$; the side effects of drug $d_i$ are expressed as $F_{d_i}$, a binary vector of length m in which each element is 1 or 0;
the Gaussian interaction profile kernel (GIP) is expressed as:
$$K_{GIP}(d_i, d_k) = \exp\left(-\gamma \lVert F_{d_i} - F_{d_k} \rVert^2\right)$$
where $F_{d_i}$ and $F_{d_k}$ are the binary side effect vectors of drug $d_i$ and drug $d_k$ respectively, and $\gamma$ is the bandwidth of the Gaussian kernel;
the correlation coefficient kernel (Corr) is expressed as:
$$K_{Corr}(d_i, d_k) = \frac{\mathrm{cov}(F_{d_i}, F_{d_k})}{\sqrt{\mathrm{var}(F_{d_i})}\,\sqrt{\mathrm{var}(F_{d_k})}}$$
where $\mathrm{cov}(F_{d_i}, F_{d_k})$ is the covariance of $F_{d_i}$ and $F_{d_k}$, $\mathrm{var}(F_{d_i})$ is the variance of $F_{d_i}$, and $\mathrm{var}(F_{d_k})$ is the variance of $F_{d_k}$;
the cosine similarity kernel (COS) is expressed as:
$$K_{COS}(d_i, d_k) = \frac{F_{d_i} \cdot F_{d_k}}{\lVert F_{d_i} \rVert \, \lVert F_{d_k} \rVert}$$
the mutual information kernel (MI) is expressed as:
$$K_{MI}(d_i, d_k) = \sum_{u \in \{0,1\}} \sum_{v \in \{0,1\}} f(u, v) \log \frac{f(u, v)}{f(u)\, f(v)}$$
where $u \in \{0, 1\}$ and $v \in \{0, 1\}$; for a drug variable in the side effect space, 0 indicates that the drug does not have the side effect and 1 indicates that it does; $f(u)$ denotes the observed frequency of u in $F_{d_i}$, $f(v)$ denotes the observed frequency of v in $F_{d_k}$, and $f(u, v)$ denotes the observed relative joint frequency of u and v.
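As a non-limiting illustration, the four similarity kernels named in claim 5 may be computed from a binary drug-side effect matrix roughly as sketched below. The function names, the default `gamma`, the handling of constant or all-zero rows, and the toy matrix are assumptions made for this sketch and are not taken from the patent.

```python
import numpy as np

def gip_kernel(F, gamma=1.0):
    """Gaussian interaction profile kernel between the rows of F."""
    sq_norms = np.sum(F * F, axis=1)
    # Pairwise squared Euclidean distances between row profiles.
    sq_dist = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (F @ F.T)
    return np.exp(-gamma * np.maximum(sq_dist, 0.0))

def corr_kernel(F):
    """Pearson correlation between rows; undefined entries are set to 0 (assumption)."""
    with np.errstate(invalid="ignore", divide="ignore"):
        K = np.corrcoef(F)
    return np.nan_to_num(K, nan=0.0)

def cos_kernel(F):
    """Cosine similarity between rows; all-zero rows are left as zero vectors (assumption)."""
    norms = np.linalg.norm(F, axis=1)
    norms[norms == 0] = 1.0
    Fn = F / norms[:, None]
    return Fn @ Fn.T

def mi_kernel(F):
    """Mutual information between binary rows, from observed relative frequencies."""
    n = F.shape[0]
    K = np.zeros((n, n))
    for i in range(n):
        for k in range(i, n):
            mi = 0.0
            for u in (0, 1):
                for v in (0, 1):
                    f_uv = np.mean((F[i] == u) & (F[k] == v))
                    f_u = np.mean(F[i] == u)
                    f_v = np.mean(F[k] == v)
                    if f_uv > 0:  # terms with zero joint frequency contribute nothing
                        mi += f_uv * np.log(f_uv / (f_u * f_v))
            K[i, k] = K[k, i] = mi
    return K

# Toy relationship matrix: 3 drugs (rows) x 4 side effects (columns).
F_toy = np.array([[1, 0, 1, 0],
                  [1, 0, 1, 0],
                  [0, 1, 0, 1]], dtype=float)
drug_kernels = [gip_kernel(F_toy), corr_kernel(F_toy), cos_kernel(F_toy), mi_kernel(F_toy)]
# Side effect kernels follow by applying the same functions to the transpose.
side_effect_kernels = [gip_kernel(F_toy.T), corr_kernel(F_toy.T),
                       cos_kernel(F_toy.T), mi_kernel(F_toy.T)]
```

Passing the transposed matrix reuses the same code for the side effect attribute space, since each column of F is the drug profile of one side effect.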
6. The method for identifying the side effect of the drug based on self-weighted multi-kernel learning as claimed in claim 4, wherein step 3 comprises the following detailed steps:
[Figure FDA00024465480300000215] denotes the kernel matrices of the drug attribute space and [Figure FDA00024465480300000216] denotes the kernel matrices of the side effect attribute space, where $C_d$ is the number of kernels of the drug space and $C_s$ is the number of kernels of the side effect space; the objective function of self-weighted multi-kernel learning is:
[Figure FDA00024465480300000217]
where $\omega_i$ denotes the weight of [Figure FDA00024465480300000218]:
[Figure FDA00024465480300000219]
given [Figure FDA00024465480300000220], an initial value of $\omega_i$ is obtained; $\omega_i$ is then recalculated, changing dynamically with [Figure FDA00024465480300000222], and is updated continuously until the optimal drug kernel [Figure FDA00024465480300000223] is finally obtained; the optimal side effect kernel [Figure FDA00024465480300000224] is obtained by the same learning method.
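The alternating update described in claim 6 (weights recomputed from the current optimal kernel, then the optimal kernel recomputed from the weights) might look like the following sketch. The equation images of the claim are not reproduced in this text, so the specific rule used here, a weight inversely proportional to twice the Frobenius distance between a base kernel and the current consensus kernel, followed by normalization, is an assumption borrowed from common self-weighted multiple kernel learning formulations. The function name `self_weighted_fusion` and its parameters are likewise hypothetical.

```python
import numpy as np

def self_weighted_fusion(kernels, n_iter=20, eps=1e-12):
    """Fuse base kernels into one consensus kernel with self-adjusting weights.

    Assumed scheme (not confirmed by the claim text):
      w_i  proportional to 1 / (2 * ||K_i - K_opt||_F)
      K_opt = sum_i w_i * K_i, with the weights normalized to sum to 1.
    """
    stack = np.stack([np.asarray(K, dtype=float) for K in kernels])
    K_opt = stack.mean(axis=0)  # initial consensus: plain average
    weights = np.full(len(stack), 1.0 / len(stack))
    for _ in range(n_iter):
        # Distance of every base kernel to the current consensus kernel.
        dists = np.array([np.linalg.norm(K - K_opt, "fro") for K in stack])
        weights = 1.0 / (2.0 * (dists + eps))  # eps guards against division by zero
        weights = weights / weights.sum()
        # Weighted combination minimizes sum_i w_i * ||K_i - K||_F^2 for fixed weights.
        K_opt = np.tensordot(weights, stack, axes=1)
    return K_opt, weights

# Toy example: two agreeing kernels and one outlier.
A = np.array([[1.0, 0.8], [0.8, 1.0]])
B = np.array([[1.0, 0.1], [0.1, 1.0]])
K_star, w = self_weighted_fusion([A, A, B])
```

In this toy run the outlying kernel receives a smaller weight than the two agreeing kernels, which is the intended effect of letting the weights depend on the distance to the consensus.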
7. The method for identifying the side effect of the drug based on self-weighted multi-kernel learning according to claim 4, wherein the nearest neighbor method is specifically as follows: the k nearest-neighbor drugs similar to drug $d_i$ are denoted $N(d_i) \subseteq D$, and the elements of the k-nearest-neighbor graph $N_d \in R^{n \times n}$ are set as:
[Figure FDA0002446548030000031]
$N_d$ is used to sparsify the drug kernel matrix [Figure FDA0002446548030000032]; after the kernel matrix is sparsified with $N_d$, the extended drug kernel matrix [Figure FDA0002446548030000033] is obtained:
[Figure FDA0002446548030000034]
where the operator in the expression denotes the Hadamard (element-wise) product of matrices; for the side effect information, the k nearest-neighbor side effects similar to side effect $s_j$ are denoted $N(s_j) \subseteq S$, and the k-nearest-neighbor graph of side effects $N_s$ is used to sparsify the side effect kernel matrix [Figure FDA0002446548030000035] to obtain the extended side effect kernel matrix [Figure FDA0002446548030000036]:
[Figure FDA0002446548030000037]
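The nearest neighbor step of claim 7 (build a k-nearest-neighbor graph, then take its Hadamard product with the kernel matrix) can be illustrated as follows. The element rule of the graph is given only as an equation image, so the rule used here is an assumption: an entry is 1 if either item is among the other's k most similar items, the diagonal is kept, and everything else is 0. The mask is taken to be square so that it matches the shape of the kernel matrix it sparsifies. The names `knn_mask` and `sparsify_kernel` are hypothetical.

```python
import numpy as np

def knn_mask(K, k):
    """Binary k-nearest-neighbor graph built from a similarity (kernel) matrix K."""
    n = K.shape[0]
    S = np.array(K, dtype=float)
    np.fill_diagonal(S, -np.inf)  # an item is not counted as its own neighbor
    # Indices of the k most similar items for each row.
    nearest = np.argsort(-S, axis=1)[:, :k]
    N = np.zeros((n, n))
    N[np.repeat(np.arange(n), k), nearest.ravel()] = 1.0
    N = np.maximum(N, N.T)        # assumed rule: keep the edge if either side selects it
    np.fill_diagonal(N, 1.0)      # keep self-similarity
    return N

def sparsify_kernel(K, k):
    """Hadamard (element-wise) product of the neighbor mask with the kernel matrix."""
    return knn_mask(K, k) * K

K_drug = np.array([[1.0, 0.9, 0.1, 0.2],
                   [0.9, 1.0, 0.3, 0.1],
                   [0.1, 0.3, 1.0, 0.8],
                   [0.2, 0.1, 0.8, 1.0]])
K_sparse = sparsify_kernel(K_drug, k=1)
```

The same two functions apply unchanged to a side effect kernel matrix.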
8. The method for identifying the side effect of the drug based on self-weighted multi-kernel learning according to claim 4, wherein the following objective function is minimized using the Gaussian fields and harmonic functions (GFHF) method:
[Figure FDA0002446548030000038]
[Figure FDA0002446548030000039]
$$E_d(F^*) = \mathrm{tr}\left(F^{*T} L_d F^*\right)$$
$$E_s(F^*) = \mathrm{tr}\left(F^* L_s F^{*T}\right)$$
where $\mathrm{tr}(\cdot)$ denotes the trace of a matrix, $\mu$ and $\sigma$ are non-negative parameters, $E_l(F^*)$ is a loss function, $E_d(F^*)$ is a graph regularization term on the drug feature space, $E_s(F^*)$ is a graph regularization term on the side effect feature space, $F_{train}$ denotes the part of the drug-side effect relationship matrix used as training data, and [Figure FDA00024465480300000310] is a diagonal matrix, wherein:
[Figure FDA00024465480300000311]
$L_d \in R^{n \times n}$ and $L_s \in R^{m \times m}$ are the Laplacian matrices:
[Figure FDA00024465480300000312]
[Figure FDA00024465480300000313]
$D_d$ and $D_s$ are diagonal matrices:
[Figure FDA00024465480300000314]
[Figure FDA00024465480300000315]
to solve for $F^*$, let
[Figure FDA00024465480300000316]
the objective function can then be rewritten as:
[Figure FDA00024465480300000317]
[Figure FDA0002446548030000041]
[Figure FDA0002446548030000042]
where $I_d \in R^{n \times n}$ is the identity matrix; the matrix $F^*$ is updated continuously, finally yielding the predicted drug-side effect relationship matrix.
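The two graph regularization terms stated in claim 8, tr(F*ᵀ L_d F*) and tr(F* L_s F*ᵀ), can be evaluated as in the sketch below. The Laplacian definitions in the claim are equation images that are not reproduced here, so the unnormalized form L = D − K (with D the diagonal matrix of row sums) is an assumption of this sketch; the loss term and the full GFHF update are deliberately left out for the same reason. All function names are hypothetical.

```python
import numpy as np

def graph_laplacian(K):
    """Unnormalized graph Laplacian L = D - K (assumed form; D holds the row sums of K)."""
    K = np.asarray(K, dtype=float)
    return np.diag(K.sum(axis=1)) - K

def drug_regularizer(F, L_d):
    """E_d(F) = tr(F^T L_d F): penalizes similar drugs with different side effect rows."""
    return float(np.trace(F.T @ L_d @ F))

def side_effect_regularizer(F, L_s):
    """E_s(F) = tr(F L_s F^T): penalizes similar side effects with different drug columns."""
    return float(np.trace(F @ L_s @ F.T))

# Toy data: 2 drugs, 3 side effects.
K_d = np.array([[0.0, 1.0],
                [1.0, 0.0]])
K_s = np.ones((3, 3)) - np.eye(3)
F_pred = np.array([[1.0, 0.0, 1.0],
                   [0.0, 0.0, 1.0]])

E_d = drug_regularizer(F_pred, graph_laplacian(K_d))
E_s = side_effect_regularizer(F_pred, graph_laplacian(K_s))
```

With an unnormalized Laplacian, each term equals a similarity-weighted sum of squared differences between rows (for drugs) or columns (for side effects) of the prediction matrix, so a smoother prediction yields a smaller value.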
CN202010280936.XA 2020-04-10 2020-04-10 Drug side effect identification method based on self-weighted multi-core learning Active CN111477344B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010280936.XA CN111477344B (en) 2020-04-10 2020-04-10 Drug side effect identification method based on self-weighted multi-core learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010280936.XA CN111477344B (en) 2020-04-10 2020-04-10 Drug side effect identification method based on self-weighted multi-core learning

Publications (2)

Publication Number Publication Date
CN111477344A true CN111477344A (en) 2020-07-31
CN111477344B CN111477344B (en) 2023-06-09

Family

ID=71751948

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010280936.XA Active CN111477344B (en) 2020-04-10 2020-04-10 Drug side effect identification method based on self-weighted multi-core learning

Country Status (1)

Country Link
CN (1) CN111477344B (en)

Patent Citations (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2005027012A2 (en) * 2003-09-16 2005-03-24 Pfizer Inc. System and method for the computer-assisted identification of drugs and indications
US20130179181A1 (en) * 2012-01-06 2013-07-11 Molecular Health Systems and methods for personalized de-risking based on patient genome data
EP2801047A2 (en) * 2012-01-06 2014-11-12 Molecular Health AG Systems and methods for multivariate analysis of adverse event data
US20150006438A1 (en) * 2013-06-26 2015-01-01 International Business Machines Corporation Method and system for exploring the associations between drug side-effects and therapeutic indications
WO2015054266A1 (en) * 2013-10-08 2015-04-16 The Regents Of The University Of California Predictive optimization of network system response
US20150324693A1 (en) * 2014-05-06 2015-11-12 International Business Machines Corporation Predicting drug-drug interactions based on clinical side effects
US20160092793A1 (en) * 2014-09-26 2016-03-31 Thomson Reuters Global Resources Pharmacovigilance systems and methods utilizing cascading filters and machine learning models to classify and discern pharmaceutical trends from social media posts
US20160140312A1 (en) * 2014-11-14 2016-05-19 International Business Machines Corporation Generating drug repositioning hypotheses based on integrating multiple aspects of drug similarity and disease similarity
US20180011977A1 (en) * 2015-03-13 2018-01-11 Ubic, Inc. Data analysis system, data analysis method, and data analysis program
WO2016191340A1 (en) * 2015-05-22 2016-12-01 Georgetown University Discovery and analysis of drug-related side effects
US20180166175A1 (en) * 2015-05-22 2018-06-14 Georgetown University Discovery and analysis of drug-related side effects
US20180060508A1 (en) * 2016-08-26 2018-03-01 International Business Machines Corporation Personalized tolerance prediction of adverse drug events
CN106529205A (en) * 2016-11-03 2017-03-22 中南大学 Drug target relation prediction method based on drug substructure and molecule character description information
US20180307804A1 (en) * 2017-04-21 2018-10-25 International Business Machines Corporation Identifying chemical substructures associated with adverse drug reactions
CN106960131A (en) * 2017-05-05 2017-07-18 华东师范大学 A kind of drug side-effect Forecasting Methodology based on multi-feature fusion
KR101953762B1 (en) * 2017-09-25 2019-03-04 (주)신테카바이오 Drug indication and response prediction systems and method using AI deep learning based on convergence of different category data
US20190206537A1 (en) * 2018-01-04 2019-07-04 Chioma Cynthia Nwaubani Method and system for customizing, aggregating, prioritizing, and displaying medication adverse effects
US20190279775A1 (en) * 2018-03-06 2019-09-12 International Business Machines Corporation Finding Precise Causal Multi-Drug-Drug Interactions for Adverse Drug Reaction Analysis
CN108647484A (en) * 2018-05-17 2018-10-12 中南大学 A kind of drug relationship prediction technique integrated based on multiple information with least square method
KR20200023689A (en) * 2018-08-20 2020-03-06 아주대학교산학협력단 The method of artificial intelligence(AI)-based adverse drug reactions detection and the system thereof
CN110188812A (en) * 2019-05-24 2019-08-30 长沙理工大学 A kind of multicore clustering method of quick processing missing isomeric data
CN110246550A (en) * 2019-06-12 2019-09-17 西安电子科技大学 Pharmaceutical composition prediction technique based on drug similitude network data
CN110957002A (en) * 2019-12-17 2020-04-03 电子科技大学 Drug target interaction relation prediction method based on collaborative matrix decomposition

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Fan Xinyue et al.: "Research on text-based knowledge discovery of drug side effects" *

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112071439A (en) * 2020-08-19 2020-12-11 中南大学 Method, system, computer device and storage medium for predicting side effect relationship of drug
CN112071439B (en) * 2020-08-19 2024-01-02 中南大学 Drug side effect relationship prediction method, system, computer device, and storage medium
CN112863693A (en) * 2021-02-04 2021-05-28 东北林业大学 Drug target interaction prediction method based on multi-channel graph convolution network
WO2024021368A1 (en) * 2022-07-26 2024-02-01 苏州科技大学 Drug side effect prediction method based on restricted boltzmann machine with penalty regularization term
CN116504331A (en) * 2023-04-28 2023-07-28 东北林业大学 Frequency score prediction method for drug side effects based on multiple modes and multiple tasks
CN116705148A (en) * 2023-07-24 2023-09-05 中国人民解放军总医院 Antiviral drug screening method and system based on Laplace least square method
CN116705148B (en) * 2023-07-24 2023-10-27 中国人民解放军总医院 Antiviral drug screening method and system based on Laplace least square method
CN117079835A (en) * 2023-08-21 2023-11-17 广东工业大学 Multi-view-based medicine-medicine interaction prediction method and system
CN117079835B (en) * 2023-08-21 2024-02-20 广东工业大学 Multi-view-based medicine-medicine interaction prediction method and system

Also Published As

Publication number Publication date
CN111477344B (en) 2023-06-09

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant