CN111477344A - Drug side effect identification method based on self-weighted multi-core learning - Google Patents
- Publication number
- CN111477344A
- Authority
- CN
- China
- Prior art keywords
- drug
- matrix
- side effect
- kernel
- core
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H70/00—ICT specially adapted for the handling or processing of medical references
- G16H70/40—ICT specially adapted for the handling or processing of medical references relating to drugs, e.g. their side effects or intended usage
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B15/00—ICT specially adapted for analysing two-dimensional or three-dimensional molecular structures, e.g. structural or functional relations or structure alignment
- G16B15/30—Drug targeting using structural data; Docking or binding prediction
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B40/00—ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16C—COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
- G16C20/00—Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
- G16C20/50—Molecular design, e.g. of drugs
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16C—COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
- G16C20/00—Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
- G16C20/70—Machine learning, data mining or chemometrics
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16C—COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
- G16C20/00—Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
- G16C20/90—Programming languages; Computing architectures; Database systems; Data warehousing
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02A—TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
- Y02A90/00—Technologies having an indirect contribution to adaptation to climate change
- Y02A90/10—Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation
Landscapes
- Engineering & Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Chemical & Material Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Medical Informatics (AREA)
- General Health & Medical Sciences (AREA)
- Bioinformatics & Computational Biology (AREA)
- Spectroscopy & Molecular Physics (AREA)
- Crystallography & Structural Chemistry (AREA)
- Computing Systems (AREA)
- Databases & Information Systems (AREA)
- Medicinal Chemistry (AREA)
- Pharmacology & Pharmacy (AREA)
- Software Systems (AREA)
- Biotechnology (AREA)
- Public Health (AREA)
- Biophysics (AREA)
- Epidemiology (AREA)
- Evolutionary Biology (AREA)
- Artificial Intelligence (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- Primary Health Care (AREA)
- Toxicology (AREA)
- Bioethics (AREA)
- Medical Treatment And Welfare Office Work (AREA)
Abstract
The invention discloses a drug side effect identification method based on self-weighted multiple kernel learning. It addresses two problems of existing drug side effect identification methods based on multiple kernel learning: drug features are represented incompletely, and the weights assigned when combining kernel functions are poorly chosen. The method comprises data acquisition, construction of drug kernel matrices and side effect kernel matrices, and related steps. Drugs are described from multiple perspectives, and four methods are used to build kernel matrices for drug and side effect features, which reduces the impact of missing features on the prediction. Optimal drug and side effect kernel matrices are built with a self-weighting scheme whose weights adapt better to the individual kernel matrices, and the kernel matrices are expanded with a nearest-neighbor method to capture the local structure of the drug-side effect relationship.
Description
Technical Field
The invention relates to the field of multiple kernel learning, and in particular to a drug side effect identification method based on self-weighted multiple kernel learning.
Background
In recent years, drug safety problems caused by side effects have received considerable attention. Side effects are a major cause of failed clinical trials and a leading public-health concern. The main lines of research on drug side effects include: computing drug-drug similarity from drug-side effect relationships to predict drug targets; drug repositioning based on the side effect similarity between drugs; predicting the side effects a drug may cause from information such as its chemical structure; and predicting side effects with disease networks. Identifying drug side effects plays an important role in drug research, and timely, accurate side effect prediction has become a research focus both in China and abroad.
Conventionally, potential drug side effects are predicted and evaluated through clinical trials before a drug is marketed, by observing the adverse reactions patients experience after taking it.
In recent years, the accumulation of large amounts of drug side effect data, for example in the SIDER database, has given researchers a data source for exploring side effects at the molecular level, and advances in computing techniques such as complex networks and data mining have opened new approaches to side effect identification. A growing number of studies use computational methods to mine potential drug-side effect associations from massive biological data.
Multiple kernel learning is well suited to sample data sets whose features are irregular and heterogeneous, and it offers high flexibility.
Although side effect identification methods based on multiple kernel learning have made some progress, the following problems remain:
existing methods do not consider data sources such as drug-target associations, so the drug and side effect features they use are incomplete; drugs and side effects therefore cannot be represented accurately, which lowers prediction accuracy.
Most existing multiple kernel learning methods assume that the optimal kernel is a linear combination of base kernels. This assumption may not hold, which leads to inappropriate weight assignment and lowers the accuracy of the prediction.
Disclosure of Invention
The technical problem to be solved by the invention is that existing drug side effect identification methods based on multiple kernel learning represent drug features incompletely and assign unreasonable weights when combining kernel functions. To solve these problems, the invention provides a drug side effect identification method based on self-weighted multiple kernel learning.
The invention describes drug features more comprehensively from three perspectives: side effects, targets, and chemical substructures. To capture how the side effect relationships of similar drugs influence the identification of a given drug's side effects, that is, the local structure of the drug-side effect relationship, a nearest-neighbor method is used to expand the kernel matrices, which improves prediction accuracy.
The invention is realized by the following technical scheme:
a drug side effect identification method based on self-weighted multiple kernel learning comprises the following steps:
step 1: data acquisition: collecting information from a database;
step 2: constructing the drug kernel matrices and side effect kernel matrices: constructing a data set of drugs, a data set of side effects, and a relationship matrix between drugs and side effects;
computing four kinds of similarity from the relationship matrix, namely the Gaussian interaction profile kernel (GIP), the correlation coefficient kernel (Corr), the cosine similarity kernel (COS) and the mutual information kernel (MI), and generating kernel matrices of the drug attribute space and of the side effect attribute space from the four computed similarities;
step 3: establishing a self-weighted multiple kernel learning objective function from the kernel matrices of the drug attribute space and of the side effect attribute space obtained in step 2, and updating iteratively to obtain an optimal drug kernel matrix and an optimal side effect kernel matrix; expanding the drug and side effect kernel matrices with a nearest-neighbor method; and minimizing an objective function with the Gaussian fields and harmonic functions method, updating iteratively until a predicted drug-side effect relationship matrix is obtained.
Further, step 2 also generates four drug attribute kernels from the drug-substructure relationship matrix and four drug attribute kernels from the drug-target relationship matrix; these eight kernels, together with the four drug attribute kernels generated from the drug-side effect relationship matrix, are used in the calculation of step 3.
Further, in step 1, the information collected from the database includes information on drugs that have both target protein and side effect information, drug-protein interaction information, target protein information, drug-side effect relationship information, and side effect information.
Further, the chemical structure of each drug is encoded as a molecular fingerprint composed of chemical substructures defined in the PubChem database.
Further, step 2 includes the following detailed steps:
let $D = \{d_1, d_2, \ldots, d_n\}$ denote the set of $n$ drugs, where $d$ denotes a drug, and let $S = \{s_1, s_2, \ldots, s_m\}$ denote the set of $m$ side effects, where $s$ denotes a side effect;
$F \in \{0,1\}^{n \times m}$ is the relationship (adjacency) matrix between drugs and side effects, with elements $F_{i,j}$ ($1 \le i \le n$, $1 \le j \le m$): $F_{i,j} = 1$ if drug $d_i$ has side effect $s_j$, and $F_{i,j} = 0$ otherwise. The side effect profile of drug $d_i$ is the binary vector $y_{d_i}$ of length $m$, each element of which is 1 or 0;
the Gaussian interaction profile kernel (GIP) is expressed as:
$$K_{\mathrm{GIP}}(d_i, d_k) = \exp\left(-\gamma \lVert y_{d_i} - y_{d_k} \rVert^2\right)$$
where $y_{d_i}$ and $y_{d_k}$ are the binary side effect profiles of drugs $d_i$ and $d_k$, and $\gamma$ is the bandwidth of the Gaussian kernel;
the correlation coefficient kernel (Corr) is expressed as:
$$K_{\mathrm{Corr}}(d_i, d_k) = \frac{\operatorname{Cov}(y_{d_i}, y_{d_k})}{\sqrt{\operatorname{Var}(y_{d_i})\,\operatorname{Var}(y_{d_k})}}$$
where $\operatorname{Cov}(y_{d_i}, y_{d_k})$ is the covariance of $y_{d_i}$ and $y_{d_k}$, and $\operatorname{Var}(y_{d_i})$ and $\operatorname{Var}(y_{d_k})$ are their variances;
the cosine similarity kernel (COS) is expressed as:
$$K_{\mathrm{COS}}(d_i, d_k) = \frac{y_{d_i} \cdot y_{d_k}}{\lVert y_{d_i} \rVert\,\lVert y_{d_k} \rVert}$$
the mutual information kernel (MI) is expressed as:
$$K_{\mathrm{MI}}(d_i, d_k) = \sum_{u \in \{0,1\}} \sum_{v \in \{0,1\}} f(u, v) \log \frac{f(u, v)}{f(u)\, f(v)}$$
where $u \in \{0,1\}$ and $v \in \{0,1\}$ are element values of $y_{d_i}$ and $y_{d_k}$ (for a drug variable in the side effect space, 0 means the drug does not have the side effect and 1 means it does), $f(u)$ is the relative frequency of $u$ in $y_{d_i}$, $f(v)$ is the relative frequency of $v$ in $y_{d_k}$, and $f(u, v)$ is the relative frequency with which the pair $(u, v)$ is observed at the same position of the two vectors.
Further, step 3 includes the following detailed steps:
$K_d^i$ ($i = 1, \ldots, C_d$) denotes a kernel matrix of the drug attribute space and $K_s^j$ ($j = 1, \ldots, C_s$) a kernel matrix of the side effect attribute space, where $C_d$ is the number of drug-space kernels and $C_s$ the number of side-effect-space kernels. The objective function of self-weighted multiple kernel learning is:
$$\min_{K_d^*} \sum_{i=1}^{C_d} \omega_i \lVert K_d^* - K_d^i \rVert_F^2$$
where $\omega_i$ is the weight of $K_d^i$, $\omega_i = \frac{1}{2\lVert K_d^* - K_d^i \rVert_F}$. Given the initial value $K_d^* = \frac{1}{C_d} \sum_{i=1}^{C_d} K_d^i$, where $C_d$ is the number of drug kernels, the initial $\omega_i$ is obtained and $K_d^* = \frac{\sum_{i} \omega_i K_d^i}{\sum_{i} \omega_i}$ is computed; since $\omega_i$ changes dynamically with $K_d^*$, $\omega_i$ is updated repeatedly until the optimal drug kernel $K_d^*$ is obtained. The optimal side effect kernel $K_s^*$ is obtained by the same learning method.
Further, the nearest-neighbor method is as follows: the $k$ drugs most similar to drug $d_i$ are denoted $N(d_i) \subseteq D$, and the elements of the $k$-nearest-neighbor graph $N_d \in \mathbb{R}^{n \times n}$ are set from these neighbor sets;
$N_d$ is used to sparsify the optimal drug kernel matrix $K_d^*$, giving the extended drug kernel matrix
$$\tilde{K}_d = N_d \odot K_d^*$$
where $\odot$ denotes the Hadamard (element-wise) product of matrices. Likewise, for side effects, the $k$ side effects most similar to side effect $s_j$ are denoted $N(s_j) \subseteq S$, and the $k$-nearest-neighbor graph of side effects $N_s \in \mathbb{R}^{m \times m}$ sparsifies the optimal side effect kernel matrix $K_s^*$, giving the extended side effect kernel matrix $\tilde{K}_s = N_s \odot K_s^*$.
Further, the following objective function is minimized with the Gaussian fields and harmonic functions (GFHF) method:
$$\min_{F^*}\; E_l(F^*) + \mu E_d(F^*) + \sigma E_s(F^*)$$
$$E_d(F^*) = \operatorname{tr}(F^{*T} L_d F^*), \qquad E_s(F^*) = \operatorname{tr}(F^* L_s F^{*T})$$
where $\operatorname{tr}(\cdot)$ denotes the trace of a matrix, $\mu$ and $\sigma$ are non-negative parameters, $E_l(F^*)$ is the loss function, $E_d(F^*)$ is the graph regularization term on the drug feature space, $E_s(F^*)$ is the graph regularization term on the side effect feature space, and $F_{train}$ is the part of the drug-side effect relationship matrix used as training data;
$L_d \in \mathbb{R}^{n \times n}$ and $L_s \in \mathbb{R}^{m \times m}$ are Laplacian matrices;
$D_d$ and $D_s$ are diagonal matrices;
$I_d \in \mathbb{R}^{n \times n}$ is the identity matrix. The matrix $F^*$ is updated iteratively until the predicted drug-side effect relationship matrix is obtained.
The invention has the following advantages and beneficial effects:
The method describes drug features from multiple perspectives and builds kernel matrices for drug and side effect features with four methods, which reduces the impact of missing features on the prediction; it builds optimal drug and side effect kernel matrices with a self-weighting scheme whose weights adapt better to the different kernel matrices; and it expands the kernel matrices with a nearest-neighbor method to capture the local structure of the drug-side effect relationship. As a result, the invention can identify drug side effects more accurately.
Drawings
The accompanying drawings, which are included to provide a further understanding of the embodiments of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the principles of the invention. In the drawings:
FIG. 1 is a schematic process flow diagram of the present invention.
FIG. 2 is a diagram of the multiple kernel learning model according to the present invention.
Detailed Description
Before any embodiments of the invention are explained in detail, it is to be understood that the invention is not limited in its application to the details of construction and the arrangements of components set forth in the following description or illustrated in the drawings. The invention is capable of other embodiments and of being practiced or of being carried out in various ways. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any inventive changes, are within the scope of the present invention.
A drug side effect identification method based on self-weighted multiple kernel learning, as shown in FIG. 1, comprises the following steps:
step 1: data acquisition: collecting information from a database;
step 2: constructing the drug kernel matrices and side effect kernel matrices: constructing a data set of drugs, a data set of side effects, and a relationship matrix between drugs and side effects;
computing four kinds of similarity from the relationship matrix, namely the Gaussian interaction profile kernel (GIP), the correlation coefficient kernel (Corr), the cosine similarity kernel (COS) and the mutual information kernel (MI), and generating kernel matrices of the drug attribute space and of the side effect attribute space from the four computed similarities;
step 3: establishing a self-weighted multiple kernel learning objective function from the kernel matrices of the drug attribute space and of the side effect attribute space obtained in step 2, and updating iteratively to obtain an optimal drug kernel matrix and an optimal side effect kernel matrix; expanding the drug and side effect kernel matrices with a nearest-neighbor method; and minimizing an objective function with the Gaussian fields and harmonic functions method, updating iteratively until a predicted drug-side effect relationship matrix is obtained.
Example 1:
data acquisition:
The data used in the technical solution of the invention come from the Mizutani data set, which contains 658 drugs having both target protein and side effect information, 5074 drug-protein interactions, 1368 target proteins, 49051 drug-side effect relationships, and 1339 side effects. To encode the chemical structure of each drug, a molecular fingerprint consisting of the 881 chemical substructures defined in the PubChem database was used.
Construction of the drug kernels and side effect kernels:
Let $D = \{d_1, d_2, \ldots, d_n\}$ denote the set of $n$ drugs and $S = \{s_1, s_2, \ldots, s_m\}$ the set of $m$ side effects, and let $F \in \{0,1\}^{n \times m}$ be the relationship matrix between drugs and side effects, with elements $F_{i,j}$ ($1 \le i \le n$, $1 \le j \le m$): $F_{i,j} = 1$ if drug $d_i$ has side effect $s_j$, and $F_{i,j} = 0$ otherwise. The side effect profile of drug $d_i$ is the binary vector $y_{d_i}$ of length $m$, each element of which is 1 or 0.
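As an illustration only, the relationship matrix $F$ and a drug's side effect profile can be assembled from a list of known (drug index, side effect index) pairs as below. The function name `build_relation_matrix` and the toy pairs are hypothetical and are not taken from the patent or the Mizutani data set.

```python
import numpy as np

def build_relation_matrix(pairs, n_drugs, n_side_effects):
    """Return the n x m binary matrix F with F[i, j] = 1 if drug i has side effect j."""
    F = np.zeros((n_drugs, n_side_effects), dtype=np.int8)
    for i, j in pairs:
        F[i, j] = 1
    return F

# Hypothetical toy data: 3 drugs, 4 side effects, 0-based indices.
known_pairs = [(0, 0), (0, 2), (1, 2), (2, 1), (2, 3)]
F = build_relation_matrix(known_pairs, n_drugs=3, n_side_effects=4)

# Row i of F is the side effect profile of drug d_i: a binary vector of length m.
profile_d0 = F[0]
print(profile_d0)  # [1 0 1 0]
```

Every unobserved pair is stored as 0, so in this sketch a 0 means "not known", not "confirmed absent".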
The kernel matrices are constructed with four similarity measures: the Gaussian interaction profile kernel (GIP), the correlation coefficient kernel (Corr), the cosine similarity kernel (COS), and the mutual information kernel (MI).
The Gaussian interaction profile kernel is constructed from the topology of the known drug-side effect network. It realizes a nonlinear mapping of drugs into a representation in which the drug vectors are highly distinguishable, and is expressed as:
$$K_{\mathrm{GIP}}(d_i, d_k) = \exp\left(-\gamma \lVert y_{d_i} - y_{d_k} \rVert^2\right)$$
where $y_{d_i}$ and $y_{d_k}$ are the binary side effect profiles of drugs $d_i$ and $d_k$, and $\gamma$ is the bandwidth of the Gaussian kernel.
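A minimal NumPy sketch of the GIP kernel over the rows of a binary profile matrix, offered as an illustration rather than the patent's implementation. The default bandwidth (1 divided by the mean squared profile norm) is an assumption borrowed from common GIP practice; the text only states that $\gamma$ is the bandwidth.

```python
import numpy as np

def gip_kernel(Y, gamma=None):
    """Gaussian interaction profile kernel between the rows of a binary profile matrix Y."""
    Y = np.asarray(Y, dtype=float)
    sq_norms = (Y ** 2).sum(axis=1)
    if gamma is None:
        # Assumption: bandwidth normalised by the mean squared profile norm,
        # a common GIP convention; the source only says gamma is the bandwidth.
        gamma = 1.0 / max(sq_norms.mean(), 1e-12)
    # Pairwise squared Euclidean distances ||y_i - y_k||^2.
    d2 = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (Y @ Y.T)
    return np.exp(-gamma * np.maximum(d2, 0.0))

Y = np.array([[1, 0, 1, 0],
              [0, 0, 1, 0],
              [0, 1, 0, 1]])
K = gip_kernel(Y, gamma=1.0)
```

Passing the substructure fingerprint matrix, the drug-target matrix, or the transpose of $F$ instead of $F$ would yield the other GIP kernels named in this example.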
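The correlation coefficient kernel measures the linear relationship between drug vectors and is expressed as:
$$K_{\mathrm{Corr}}(d_i, d_k) = \frac{\operatorname{Cov}(y_{d_i}, y_{d_k})}{\sqrt{\operatorname{Var}(y_{d_i})\,\operatorname{Var}(y_{d_k})}}$$
where $\operatorname{Cov}(y_{d_i}, y_{d_k})$ is the covariance of $y_{d_i}$ and $y_{d_k}$, $\operatorname{Var}(y_{d_i})$ is the variance of $y_{d_i}$, and $\operatorname{Var}(y_{d_k})$ is the variance of $y_{d_k}$.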
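The correlation coefficient kernel can be sketched the same way. This assumes the Pearson correlation of the two binary profiles and, as a labeled assumption, assigns 0 to any pair involving a constant profile (all 0s or all 1s), where the correlation is undefined.

```python
import numpy as np

def corr_kernel(Y):
    """Pearson correlation coefficient kernel between the rows of Y."""
    Y = np.asarray(Y, dtype=float)
    centered = Y - Y.mean(axis=1, keepdims=True)
    cov = centered @ centered.T            # covariance up to a common 1/m factor
    std = np.sqrt(np.diag(cov))            # standard deviations, same factor (cancels)
    denom = np.outer(std, std)
    # Assumption: zero-variance rows get correlation 0 instead of NaN.
    return np.divide(cov, denom, out=np.zeros_like(cov), where=denom > 0)

Y = np.array([[1, 0, 1, 0],
              [0, 0, 1, 0],
              [0, 1, 0, 1]])
K = corr_kernel(Y)
```

For rows with non-zero variance the result matches `np.corrcoef(Y)`; unlike the GIP and cosine kernels, its entries can be negative.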
The cosine similarity kernel treats each drug as a vector in the $m$-dimensional side effect space and evaluates the similarity of two drugs by the cosine of the angle between their vectors. It therefore measures how the directions of the two drug vectors differ in the side effect space: the more closely their directions agree, the higher the similarity. It is expressed as:
$$K_{\mathrm{COS}}(d_i, d_k) = \frac{y_{d_i} \cdot y_{d_k}}{\lVert y_{d_i} \rVert\,\lVert y_{d_k} \rVert}$$
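A short sketch of the cosine similarity kernel over the rows of a profile matrix. Giving an all-zero profile a similarity of 0 to every drug is an assumption, since the cosine is undefined for a zero vector.

```python
import numpy as np

def cos_kernel(Y):
    """Cosine similarity between the rows of Y (drug vectors in side effect space)."""
    Y = np.asarray(Y, dtype=float)
    norms = np.linalg.norm(Y, axis=1)
    dots = Y @ Y.T
    denom = np.outer(norms, norms)
    # Assumption: a drug with an all-zero profile gets similarity 0 to every drug.
    return np.divide(dots, denom, out=np.zeros_like(dots), where=denom > 0)

Y = np.array([[1, 0, 1, 0],
              [0, 0, 1, 0],
              [0, 1, 0, 1]])
K = cos_kernel(Y)
```

Drugs 0 and 2 share no side effect, so their vectors are orthogonal and their similarity is 0; drugs 0 and 1 share one of drug 0's two side effects, giving $1/\sqrt{2}$.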
The mutual information kernel measures the degree of interdependence between two discrete random variables, here the observed values of two drug vectors, and is expressed as:
$$K_{\mathrm{MI}}(d_i, d_k) = \sum_{u \in \{0,1\}} \sum_{v \in \{0,1\}} f(u, v) \log \frac{f(u, v)}{f(u)\, f(v)}$$
where $u \in \{0,1\}$ and $v \in \{0,1\}$; for a drug variable in the side effect space, 0 indicates that the drug does not have the side effect and 1 indicates that it does. $f(u)$ is the relative frequency of value $u$ in $y_{d_i}$ (for example, when $u = 1$, $f(u)$ is the proportion of 1s in $y_{d_i}$), $f(v)$ is the relative frequency of value $v$ in $y_{d_k}$, and $f(u, v)$ is the relative observed frequency of the pair $(u, v)$.
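The mutual information kernel can be sketched directly from relative frequencies. The natural logarithm is an assumption (the base is not stated), and terms with $f(u, v) = 0$ are treated as 0, the usual convention for $0 \log 0$.

```python
import numpy as np

def mutual_information(a, b):
    """Mutual information between two binary vectors, from relative frequencies."""
    a = np.asarray(a).astype(int)
    b = np.asarray(b).astype(int)
    mi = 0.0
    for u in (0, 1):
        f_u = np.mean(a == u)
        for v in (0, 1):
            f_v = np.mean(b == v)
            f_uv = np.mean((a == u) & (b == v))
            if f_uv > 0:  # 0 * log 0 := 0; f_uv > 0 also guarantees f_u, f_v > 0
                mi += f_uv * np.log(f_uv / (f_u * f_v))  # natural log (assumption)
    return mi

def mi_kernel(Y):
    """Symmetric matrix of pairwise mutual information between the rows of Y."""
    Y = np.asarray(Y)
    n = Y.shape[0]
    K = np.zeros((n, n))
    for i in range(n):
        for k in range(i, n):
            K[i, k] = K[k, i] = mutual_information(Y[i], Y[k])
    return K
```

Two independent profiles such as `[1, 1, 0, 0]` and `[1, 0, 1, 0]` give 0, while a balanced profile paired with itself (or with its complement) gives $\log 2$.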
The above description uses side effects to represent the attribute kernels of a drug. Similarly, the attribute kernels of a drug represented by chemical substructures are $K_{\text{GIP-chem},d}$, $K_{\text{Corr-chem},d}$, $K_{\text{Cos-chem},d}$ and $K_{\text{MI-chem},d}$; the attribute kernels of a drug represented by targets are $K_{\text{GIP-target},d}$, $K_{\text{Corr-target},d}$, $K_{\text{Cos-target},d}$ and $K_{\text{MI-target},d}$; and the attribute kernels of side effects represented by drugs are $K_{\text{GIP-link},s}$, $K_{\text{Corr-link},s}$, $K_{\text{Cos-link},s}$ and $K_{\text{MI-link},s}$.
Multiple kernel learning generates the optimal kernels:
As shown in FIG. 2, $K_d^i$ ($i = 1, \ldots, C_d$) denotes a kernel of the drug attribute space and $K_s^j$ ($j = 1, \ldots, C_s$) a kernel of the side effect attribute space. $C_d$ is the number of drug-space kernels, with $C_d = 12$ in this example; $C_s$ is the number of side-effect-space kernels, with $C_s = 4$ in this example.
Taking the generation of the optimal drug kernel as an example, a base kernel closer to the final optimal drug kernel $K_d^*$ receives a higher weight. To make the optimal kernel close to each attribute kernel of the drug (or side effect), the objective function of self-weighted multiple kernel learning is:
$$\min_{K_d^*} \sum_{i=1}^{C_d} \omega_i \lVert K_d^* - K_d^i \rVert_F^2$$
where $\omega_i$ is the weight of $K_d^i$, $\omega_i = \frac{1}{2\lVert K_d^* - K_d^i \rVert_F}$. Because $\omega_i$ depends on the target variable $K_d^*$, which cannot be determined at the start of the algorithm, $\omega_i$ cannot be computed directly. Therefore $K_d^*$ is first initialized as $K_d^* = \frac{1}{C_d} \sum_{i=1}^{C_d} K_d^i$, where $C_d$ is the number of drug kernels. After the initial $\omega_i$ is obtained, $K_d^* = \frac{\sum_{i} \omega_i K_d^i}{\sum_{i} \omega_i}$ is computed; since $\omega_i$ changes dynamically with $K_d^*$, $\omega_i$ is updated repeatedly until the optimal drug kernel $K_d^*$ is obtained.
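A sketch of the iterative self-weighted update, assuming the inverse-distance weight rule $\omega_i = 1/(2\lVert K^* - K_i\rVert_F)$, so that base kernels closer to the current optimum receive higher weights, as described above. The function name, iteration cap, tolerance, and toy kernels are hypothetical.

```python
import numpy as np

def self_weighted_mkl(kernels, n_iter=100, tol=1e-9, eps=1e-12):
    """Combine base kernels into one optimal kernel with self-adjusting weights.

    Assumed weight rule: w_i = 1 / (2 * ||K_opt - K_i||_F), so base kernels that lie
    closer to the current optimal kernel receive larger weights.
    """
    Ks = [np.asarray(K, dtype=float) for K in kernels]
    K_opt = sum(Ks) / len(Ks)  # initial value: plain average of the C base kernels
    for _ in range(n_iter):
        dists = np.array([np.linalg.norm(K_opt - K, "fro") for K in Ks])
        w = 1.0 / (2.0 * np.maximum(dists, eps))   # eps guards against division by 0
        K_new = sum(wi * K for wi, K in zip(w, Ks)) / w.sum()
        converged = np.linalg.norm(K_new - K_opt, "fro") < tol
        K_opt = K_new
        if converged:
            break
    return K_opt, w / w.sum()

# Two identical base kernels and one outlier (hypothetical toy values).
I2 = np.eye(2)
K_opt, weights = self_weighted_mkl([I2, I2, 5.0 * np.ones((2, 2))])
```

In this toy run the outlier's weight shrinks toward 0 on each pass and the optimal kernel converges to the two agreeing kernels; the same function applied to the $C_s$ side effect kernels would give $K_s^*$.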
Graph-based semi-supervised learning:
Semi-supervised learning can capture the global structure of drug-side effect relationships, but it ignores the influence that similar drugs and their side effect relationships have on identifying a drug's side effects. This scheme therefore expands the kernel matrices with a nearest-neighbor method. The $k$ drugs most similar to drug $d_i$ are denoted $N(d_i) \subseteq D$, and the elements of the $k$-nearest-neighbor graph $N_d \in \mathbb{R}^{n \times n}$ are set from these neighbor sets;
$N_d$ is used to sparsify the optimal drug kernel matrix $K_d^*$, giving the extended drug kernel matrix
$$\tilde{K}_d = N_d \odot K_d^*$$
where $\odot$ denotes the Hadamard (element-wise) product of matrices.
For side effects, the $k$ side effects most similar to side effect $s_j$ are denoted $N(s_j) \subseteq S$, and the $k$-nearest-neighbor graph of side effects $N_s \in \mathbb{R}^{m \times m}$ sparsifies the optimal side effect kernel matrix $K_s^*$, giving the extended side effect kernel matrix $\tilde{K}_s = N_s \odot K_s^*$.
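A sketch of the $k$-nearest-neighbor sparsification. The element rule for the neighbor graph is an assumption, since the text does not spell it out: 1 when two items are each other's neighbors, 0.5 when only one lists the other, 0 otherwise, with self-loops excluded. The function names and toy kernel are hypothetical.

```python
import numpy as np

def knn_graph(K, k):
    """k-nearest-neighbour graph N built from a similarity (kernel) matrix K.

    Assumed element rule: 1 if i and j are each other's neighbours, 0.5 if only one
    is a neighbour of the other, 0 otherwise; self-loops are not included.
    """
    K = np.asarray(K, dtype=float)
    n = K.shape[0]
    S = K.copy()
    np.fill_diagonal(S, -np.inf)                 # a node is not its own neighbour
    nbrs = np.argsort(-S, axis=1)[:, :k]         # indices of the k most similar nodes
    A = np.zeros((n, n))
    A[np.arange(n)[:, None], nbrs] = 1.0         # A[i, j] = 1 if j is in N(i)
    return 0.5 * (A + A.T)

def extend_kernel(K, k):
    """Sparsify K with its kNN graph via the Hadamard (element-wise) product."""
    return knn_graph(K, k) * np.asarray(K, dtype=float)

K_opt = np.array([[1.0, 0.9, 0.1, 0.2],
                  [0.9, 1.0, 0.3, 0.1],
                  [0.1, 0.3, 1.0, 0.8],
                  [0.2, 0.1, 0.8, 1.0]])
K_ext = extend_kernel(K_opt, k=1)
```

With $k = 1$ the toy kernel splits into two mutual-neighbor pairs, so only the similarities 0.9 and 0.8 survive. The symmetrized graph keeps the extended kernel symmetric, which the Laplacian in the next step relies on.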
To find the optimal predicted drug-side effect relationship matrix $F^*$, the following objective function is minimized with the Gaussian fields and harmonic functions (GFHF) method:
$$\min_{F^*}\; E_l(F^*) + \mu E_d(F^*) + \sigma E_s(F^*)$$
$$E_d(F^*) = \operatorname{tr}(F^{*T} L_d F^*), \qquad E_s(F^*) = \operatorname{tr}(F^* L_s F^{*T})$$
where $\operatorname{tr}(\cdot)$ denotes the trace of a matrix, $\mu$ and $\sigma$ are non-negative parameters, $E_l(F^*)$ is the loss function, $E_d(F^*)$ is the graph regularization term on the drug feature space, and $E_s(F^*)$ is the graph regularization term on the side effect feature space. $F_{train}$ is the part of the drug-side effect relationship matrix used as training data.
$L_d \in \mathbb{R}^{n \times n}$ and $L_s \in \mathbb{R}^{m \times m}$ are Laplacian matrices;
$D_d$ and $D_s$ are diagonal matrices;
$I_d \in \mathbb{R}^{n \times n}$ is the identity matrix. The matrix $F^*$ is updated iteratively until the predicted drug-side effect relationship matrix is obtained.
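A closed-form sketch of this step under stated assumptions, not the patent's exact procedure: the loss is taken as $E_l(F^*) = \lVert F^* - F_{train} \rVert_F^2$, and each Laplacian as the symmetric normalized form $I - D^{-1/2} \tilde{K} D^{-1/2}$ with $D$ the diagonal degree matrix. Setting the gradient $2(F^* - F_{train}) + 2\mu L_d F^* + 2\sigma F^* L_s$ to zero gives the Sylvester equation $(I_d + \mu L_d) F^* + F^* (\sigma L_s) = F_{train}$, solved here with SciPy's `scipy.linalg.solve_sylvester` instead of iterative updates. Function names and toy matrices are hypothetical.

```python
import numpy as np
from scipy.linalg import solve_sylvester

def normalized_laplacian(K):
    """Symmetric normalised Laplacian I - D^{-1/2} K D^{-1/2}, D = diag(row sums of K)."""
    K = np.asarray(K, dtype=float)
    deg = K.sum(axis=1)
    inv_sqrt = np.where(deg > 0, 1.0 / np.sqrt(np.maximum(deg, 1e-12)), 0.0)
    return np.eye(K.shape[0]) - inv_sqrt[:, None] * K * inv_sqrt[None, :]

def gfhf_predict(F_train, K_d, K_s, mu=1.0, sigma=1.0):
    """Minimise ||F - F_train||_F^2 + mu*tr(F^T L_d F) + sigma*tr(F L_s F^T).

    A zero gradient gives the Sylvester equation (I + mu*L_d) F + F (sigma*L_s) = F_train.
    """
    L_d = normalized_laplacian(K_d)
    L_s = normalized_laplacian(K_s)
    A = np.eye(L_d.shape[0]) + mu * L_d
    B = sigma * L_s
    return solve_sylvester(A, B, np.asarray(F_train, dtype=float))

F_train = np.array([[1, 0, 1],
                    [0, 0, 1],
                    [0, 1, 0]])
K_d = np.array([[1.0, 0.8, 0.1],
                [0.8, 1.0, 0.2],
                [0.1, 0.2, 1.0]])
K_s = np.array([[1.0, 0.3, 0.5],
                [0.3, 1.0, 0.1],
                [0.5, 0.1, 1.0]])
F_pred = gfhf_predict(F_train, K_d, K_s, mu=0.5, sigma=0.5)
```

For non-negative symmetric kernels, the eigenvalues of $I_d + \mu L_d$ are at least 1 and those of $-\sigma L_s$ are at most 0, so the equation has a unique solution. Entries of `F_pred` at pairs absent from `F_train` could then be read as candidate association scores.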
The above-mentioned embodiments are intended to illustrate the objects, technical solutions and advantages of the present invention in further detail, and it should be understood that the above-mentioned embodiments are merely exemplary embodiments of the present invention, and are not intended to limit the scope of the present invention, and any modifications, equivalent substitutions, improvements and the like made within the spirit and principle of the present invention should be included in the scope of the present invention.
Claims (8)
1. A drug side effect identification method based on self-weighted multiple kernel learning, characterized by comprising the following steps:
step 1: data acquisition: collecting information from a database;
step 2: constructing the drug kernel matrices and side effect kernel matrices: constructing a data set of drugs, a data set of side effects, and a relationship matrix between drugs and side effects;
computing four kinds of similarity from the relationship matrix, namely the Gaussian interaction profile kernel (GIP), the correlation coefficient kernel (Corr), the cosine similarity kernel (COS) and the mutual information kernel (MI), and generating kernel matrices of the drug attribute space and of the side effect attribute space from the four computed similarities;
step 3: establishing a self-weighted multiple kernel learning objective function from the kernel matrices of the drug attribute space and of the side effect attribute space obtained in step 2, and updating iteratively to obtain an optimal drug kernel matrix and an optimal side effect kernel matrix; expanding the drug and side effect kernel matrices with a nearest-neighbor method; and minimizing an objective function with the Gaussian fields and harmonic functions method, updating iteratively until a predicted drug-side effect relationship matrix is obtained.
2. The drug side effect identification method based on self-weighted multiple kernel learning according to claim 1, wherein step 2 further comprises generating four drug attribute kernels from the drug-substructure relationship matrix and four drug attribute kernels from the drug-target relationship matrix, and using these eight drug attribute kernels, together with the four drug attribute kernels generated from the drug-side effect relationship matrix, in the calculation of step 3.
3. The drug side effect identification method based on self-weighted multiple kernel learning according to claim 1, wherein in step 1 the information collected from the database includes information on drugs having both target protein and side effect information, drug-protein interaction information, target protein information, drug-side effect relationship information, and side effect information.
4. The drug side effect identification method based on self-weighted multiple kernel learning according to claim 3, wherein the chemical structure of each drug is encoded as a molecular fingerprint composed of chemical substructures defined in the PubChem database.
5. The drug side effect identification method based on self-weighted multiple kernel learning according to claim 4, wherein step 2 comprises the following detailed steps:
let $D = \{d_1, d_2, \ldots, d_n\}$ denote the set of $n$ drugs, where $d$ denotes a drug, and let $S = \{s_1, s_2, \ldots, s_m\}$ denote the set of $m$ side effects, where $s$ denotes a side effect;
$F \in \{0,1\}^{n \times m}$ is the relationship (adjacency) matrix between drugs and side effects, with elements $F_{i,j}$ ($1 \le i \le n$, $1 \le j \le m$): $F_{i,j} = 1$ if drug $d_i$ has side effect $s_j$, and $F_{i,j} = 0$ otherwise; the side effect profile of drug $d_i$ is the binary vector $y_{d_i}$ of length $m$, each element of which is 1 or 0;
the Gaussian interaction profile kernel (GIP) is expressed as:
$$K_{\mathrm{GIP}}(d_i, d_k) = \exp\left(-\gamma \lVert y_{d_i} - y_{d_k} \rVert^2\right)$$
where $y_{d_i}$ and $y_{d_k}$ are the binary side effect profiles of drugs $d_i$ and $d_k$, and $\gamma$ is the bandwidth of the Gaussian kernel;
the correlation coefficient kernel (Corr) is expressed as:
$$K_{\mathrm{Corr}}(d_i, d_k) = \frac{\operatorname{Cov}(y_{d_i}, y_{d_k})}{\sqrt{\operatorname{Var}(y_{d_i})\,\operatorname{Var}(y_{d_k})}}$$
where $\operatorname{Cov}(y_{d_i}, y_{d_k})$ is the covariance of $y_{d_i}$ and $y_{d_k}$, and $\operatorname{Var}(y_{d_i})$ and $\operatorname{Var}(y_{d_k})$ are their variances;
the cosine similarity kernel (COS) is expressed as:
$$K_{\mathrm{COS}}(d_i, d_k) = \frac{y_{d_i} \cdot y_{d_k}}{\lVert y_{d_i} \rVert\,\lVert y_{d_k} \rVert}$$
the mutual information kernel (MI) is expressed as:
$$K_{\mathrm{MI}}(d_i, d_k) = \sum_{u \in \{0,1\}} \sum_{v \in \{0,1\}} f(u, v) \log \frac{f(u, v)}{f(u)\, f(v)}$$
6. The drug side effect identification method based on self-weighted multiple kernel learning according to claim 4, wherein step 3 comprises the following detailed steps:
$K_d^i$ ($i = 1, \ldots, C_d$) denotes a kernel matrix of the drug attribute space and $K_s^j$ ($j = 1, \ldots, C_s$) a kernel matrix of the side effect attribute space, $C_d$ is the number of drug-space kernels and $C_s$ the number of side-effect-space kernels, and the objective function of self-weighted multiple kernel learning is:
$$\min_{K_d^*} \sum_{i=1}^{C_d} \omega_i \lVert K_d^* - K_d^i \rVert_F^2$$
where $\omega_i$ is the weight of $K_d^i$, $\omega_i = \frac{1}{2\lVert K_d^* - K_d^i \rVert_F}$; given the initial value $K_d^* = \frac{1}{C_d} \sum_{i=1}^{C_d} K_d^i$, the initial $\omega_i$ is obtained and $K_d^* = \frac{\sum_{i} \omega_i K_d^i}{\sum_{i} \omega_i}$ is computed; since $\omega_i$ changes dynamically with $K_d^*$, $\omega_i$ is updated repeatedly until the optimal drug kernel $K_d^*$ is obtained, and the optimal side effect kernel $K_s^*$ is obtained by the same learning method.
7. The drug side effect identification method based on self-weighted multiple kernel learning according to claim 4, wherein the nearest-neighbor method specifically comprises: denoting the $k$ drugs most similar to drug $d_i$ as $N(d_i) \subseteq D$ and setting the elements of the $k$-nearest-neighbor graph $N_d \in \mathbb{R}^{n \times n}$ from these neighbor sets;
using $N_d$ to sparsify the optimal drug kernel matrix $K_d^*$, giving the extended drug kernel matrix
$$\tilde{K}_d = N_d \odot K_d^*$$
where $\odot$ denotes the Hadamard product of matrices; and, for side effects, denoting the $k$ side effects most similar to side effect $s_j$ as $N(s_j) \subseteq S$ and using the $k$-nearest-neighbor graph of side effects $N_s \in \mathbb{R}^{m \times m}$ to sparsify the optimal side effect kernel matrix $K_s^*$, giving the extended side effect kernel matrix $\tilde{K}_s = N_s \odot K_s^*$.
8. The method for identifying adverse drug reactions based on self-weighted multi-core learning according to claim 4, wherein the following objective function is minimized by using Gaussian Fields and Harmonic Functions (GFHF) method:
$E_d(F^{*}) = \operatorname{tr}\left(F^{*\top} L_d F^{*}\right)$
$E_s(F^{*}) = \operatorname{tr}\left(F^{*} L_s F^{*\top}\right)$
where $\operatorname{tr}(\cdot)$ denotes the trace of a matrix; $\mu$ and $\sigma$ are non-negative parameters; $E_l(F^{*})$ is the loss function; $E_d(F^{*})$ is the graph regularization term on the drug feature space; $E_s(F^{*})$ is the graph regularization term on the side-effect feature space; $F_{train}$ denotes part of the drug-side effect relationship matrix and is used as training data; and … is a diagonal matrix, wherein:
$L_d \in \mathbb{R}^{n \times n}$ and $L_s \in \mathbb{R}^{m \times m}$ are the Laplacian matrices:
$D_d$ and $D_s$ are diagonal matrices:
$I_d \in \mathbb{R}^{n \times n}$ is the identity matrix. The matrix $F^{*}$ is updated iteratively until the predicted drug-side effect relationship matrix is obtained.
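This text omits the update rule for $F^{*}$, as well as the formulas for $E_l$ and the Laplacians. The Python sketch below therefore rests on several assumptions:

* The loss is $E_l(F^{*}) = \lVert F^{*} - F_{train} \rVert_F^2$, ignoring the diagonal weighting matrix.
* The full objective is $E_l + \mu E_d + \sigma E_s$.
* The Laplacians are normalized as $L = I - S$, with $S = D^{-1/2}\hat{K}D^{-1/2}$ and degree matrix $D_{ii} = \sum_j \hat{K}_{ij}$.
* The kernels are nonnegative and symmetric.

Setting the gradient to zero gives $(1+\mu+\sigma)F^{*} = F_{train} + \mu S_d F^{*} + \sigma F^{*} S_s$. The sketch solves this equation by fixed-point iteration. This is a label-propagation-style solver, not necessarily the patent's exact update, and all function names are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(A: Matrix, B: Matrix) -> Matrix:
    return [[sum(A[i][t] * B[t][j] for t in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def transpose(A: Matrix) -> Matrix:
    return [list(row) for row in zip(*A)]


def normalized_affinity(K: Matrix) -> Matrix:
    """S = D^{-1/2} K D^{-1/2}, with D_ii = sum_j K_ij (assumed normalization)."""
    inv_sqrt = [1.0 / math.sqrt(d) if d > 0 else 0.0 for d in (sum(row) for row in K)]
    n = len(K)
    return [[inv_sqrt[i] * K[i][j] * inv_sqrt[j] for j in range(n)] for i in range(n)]


def laplacian(K: Matrix) -> Matrix:
    """Normalized graph Laplacian L = I - S (assumption)."""
    S = normalized_affinity(K)
    n = len(K)
    return [[(1.0 if i == j else 0.0) - S[i][j] for j in range(n)] for i in range(n)]


def trace(A: Matrix) -> float:
    return sum(A[i][i] for i in range(len(A)))


def objective(F: Matrix, F_train: Matrix, L_d: Matrix, L_s: Matrix,
              mu: float, sigma: float) -> float:
    """Assumed objective ||F - F_train||_F^2 + mu * E_d(F) + sigma * E_s(F)."""
    loss = sum((f - t) ** 2 for rf, rt in zip(F, F_train) for f, t in zip(rf, rt))
    e_d = trace(matmul(matmul(transpose(F), L_d), F))  # tr(F^T L_d F)
    e_s = trace(matmul(matmul(F, L_s), transpose(F)))  # tr(F L_s F^T)
    return loss + mu * e_d + sigma * e_s


def propagate(F_train: Matrix, K_d: Matrix, K_s: Matrix,
              mu: float = 0.5, sigma: float = 0.5, iters: int = 200) -> Matrix:
    """Iterate F <- (F_train + mu * S_d F + sigma * F S_s) / (1 + mu + sigma)."""
    S_d = normalized_affinity(K_d)
    S_s = normalized_affinity(K_s)
    F = [row[:] for row in F_train]
    scale = 1.0 + mu + sigma
    for _ in range(iters):
        left = matmul(S_d, F)    # smoothing over similar drugs
        right = matmul(F, S_s)   # smoothing over similar side effects
        F = [[(F_train[i][j] + mu * left[i][j] + sigma * right[i][j]) / scale
              for j in range(len(F[0]))] for i in range(len(F))]
    return F
```

Under these assumptions each $S$ has spectral norm at most 1. Each step therefore contracts by a factor of at most $(\mu+\sigma)/(1+\mu+\sigma)$, and the iteration converges for any non-negative $\mu$ and $\sigma$. Because the normalized Laplacians are positive semidefinite, the limit is the global minimizer of the assumed objective.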
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010280936.XA CN111477344B (en) | 2020-04-10 | 2020-04-10 | Drug side effect identification method based on self-weighted multi-core learning |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010280936.XA CN111477344B (en) | 2020-04-10 | 2020-04-10 | Drug side effect identification method based on self-weighted multi-core learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111477344A (en) | 2020-07-31 |
CN111477344B (en) | 2023-06-09 |
Family ID: 71751948
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010280936.XA Active CN111477344B (en) | 2020-04-10 | 2020-04-10 | Drug side effect identification method based on self-weighted multi-core learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111477344B (en) |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112071439A (en) * | 2020-08-19 | 2020-12-11 | 中南大学 | Method, system, computer device and storage medium for predicting side effect relationship of drug |
CN112863693A (en) * | 2021-02-04 | 2021-05-28 | 东北林业大学 | Drug target interaction prediction method based on multi-channel graph convolution network |
CN116504331A (en) * | 2023-04-28 | 2023-07-28 | 东北林业大学 | Frequency score prediction method for drug side effects based on multiple modes and multiple tasks |
CN116705148A (en) * | 2023-07-24 | 2023-09-05 | 中国人民解放军总医院 | Antiviral drug screening method and system based on Laplace least square method |
CN117079835A (en) * | 2023-08-21 | 2023-11-17 | 广东工业大学 | Multi-view-based medicine-medicine interaction prediction method and system |
WO2024021368A1 (en) * | 2022-07-26 | 2024-02-01 | 苏州科技大学 | Drug side effect prediction method based on restricted boltzmann machine with penalty regularization term |
Citations (22)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2005027012A2 (en) * | 2003-09-16 | 2005-03-24 | Pfizer Inc. | System and method for the computer-assisted identification of drugs and indications |
US20130179181A1 (en) * | 2012-01-06 | 2013-07-11 | Molecular Health | Systems and methods for personalized de-risking based on patient genome data |
EP2801047A2 (en) * | 2012-01-06 | 2014-11-12 | Molecular Health AG | Systems and methods for multivariate analysis of adverse event data |
US20150006438A1 (en) * | 2013-06-26 | 2015-01-01 | International Business Machines Corporation | Method and system for exploring the associations between drug side-effects and therapeutic indications |
WO2015054266A1 (en) * | 2013-10-08 | 2015-04-16 | The Regents Of The University Of California | Predictive optimization of network system response |
US20150324693A1 (en) * | 2014-05-06 | 2015-11-12 | International Business Machines Corporation | Predicting drug-drug interactions based on clinical side effects |
US20160092793A1 (en) * | 2014-09-26 | 2016-03-31 | Thomson Reuters Global Resources | Pharmacovigilance systems and methods utilizing cascading filters and machine learning models to classify and discern pharmaceutical trends from social media posts |
US20160140312A1 (en) * | 2014-11-14 | 2016-05-19 | International Business Machines Corporation | Generating drug repositioning hypotheses based on integrating multiple aspects of drug similarity and disease similarity |
WO2016191340A1 (en) * | 2015-05-22 | 2016-12-01 | Georgetown University | Discovery and analysis of drug-related side effects |
CN106529205A (en) * | 2016-11-03 | 2017-03-22 | 中南大学 | Drug target relation prediction method based on drug substructure and molecule character description information |
CN106960131A (en) * | 2017-05-05 | 2017-07-18 | 华东师范大学 | A kind of drug side-effect Forecasting Methodology based on multi-feature fusion |
US20180011977A1 (en) * | 2015-03-13 | 2018-01-11 | Ubic, Inc. | Data analysis system, data analysis method, and data analysis program |
US20180060508A1 (en) * | 2016-08-26 | 2018-03-01 | International Business Machines Corporation | Personalized tolerance prediction of adverse drug events |
CN108647484A (en) * | 2018-05-17 | 2018-10-12 | 中南大学 | A kind of drug relationship prediction technique integrated based on multiple information with least square method |
US20180307804A1 (en) * | 2017-04-21 | 2018-10-25 | International Business Machines Corporation | Identifying chemical substructures associated with adverse drug reactions |
KR101953762B1 (en) * | 2017-09-25 | 2019-03-04 | (주)신테카바이오 | Drug indication and response prediction systems and method using AI deep learning based on convergence of different category data |
US20190206537A1 (en) * | 2018-01-04 | 2019-07-04 | Chioma Cynthia Nwaubani | Method and system for customizing, aggregating, prioritizing, and displaying medication adverse effects |
CN110188812A (en) * | 2019-05-24 | 2019-08-30 | 长沙理工大学 | A kind of multicore clustering method of quick processing missing isomeric data |
US20190279775A1 (en) * | 2018-03-06 | 2019-09-12 | International Business Machines Corporation | Finding Precise Causal Multi-Drug-Drug Interactions for Adverse Drug Reaction Analysis |
CN110246550A (en) * | 2019-06-12 | 2019-09-17 | 西安电子科技大学 | Pharmaceutical composition prediction technique based on drug similitude network data |
KR20200023689A (en) * | 2018-08-20 | 2020-03-06 | 아주대학교산학협력단 | The method of artificial intelligence(AI)-based adverse drug reactions detection and the system thereof |
CN110957002A (en) * | 2019-12-17 | 2020-04-03 | 电子科技大学 | Drug target interaction relation prediction method based on collaborative matrix decomposition |
2020
- 2020-04-10: CN application CN202010280936.XA (CN111477344B), status: Active
Patent Citations (23)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2005027012A2 (en) * | 2003-09-16 | 2005-03-24 | Pfizer Inc. | System and method for the computer-assisted identification of drugs and indications |
US20130179181A1 (en) * | 2012-01-06 | 2013-07-11 | Molecular Health | Systems and methods for personalized de-risking based on patient genome data |
EP2801047A2 (en) * | 2012-01-06 | 2014-11-12 | Molecular Health AG | Systems and methods for multivariate analysis of adverse event data |
US20150006438A1 (en) * | 2013-06-26 | 2015-01-01 | International Business Machines Corporation | Method and system for exploring the associations between drug side-effects and therapeutic indications |
WO2015054266A1 (en) * | 2013-10-08 | 2015-04-16 | The Regents Of The University Of California | Predictive optimization of network system response |
US20150324693A1 (en) * | 2014-05-06 | 2015-11-12 | International Business Machines Corporation | Predicting drug-drug interactions based on clinical side effects |
US20160092793A1 (en) * | 2014-09-26 | 2016-03-31 | Thomson Reuters Global Resources | Pharmacovigilance systems and methods utilizing cascading filters and machine learning models to classify and discern pharmaceutical trends from social media posts |
US20160140312A1 (en) * | 2014-11-14 | 2016-05-19 | International Business Machines Corporation | Generating drug repositioning hypotheses based on integrating multiple aspects of drug similarity and disease similarity |
US20180011977A1 (en) * | 2015-03-13 | 2018-01-11 | Ubic, Inc. | Data analysis system, data analysis method, and data analysis program |
WO2016191340A1 (en) * | 2015-05-22 | 2016-12-01 | Georgetown University | Discovery and analysis of drug-related side effects |
US20180166175A1 (en) * | 2015-05-22 | 2018-06-14 | Georgetown University | Discovery and analysis of drug-related side effects |
US20180060508A1 (en) * | 2016-08-26 | 2018-03-01 | International Business Machines Corporation | Personalized tolerance prediction of adverse drug events |
CN106529205A (en) * | 2016-11-03 | 2017-03-22 | 中南大学 | Drug target relation prediction method based on drug substructure and molecule character description information |
US20180307804A1 (en) * | 2017-04-21 | 2018-10-25 | International Business Machines Corporation | Identifying chemical substructures associated with adverse drug reactions |
CN106960131A (en) * | 2017-05-05 | 2017-07-18 | 华东师范大学 | A kind of drug side-effect Forecasting Methodology based on multi-feature fusion |
KR101953762B1 (en) * | 2017-09-25 | 2019-03-04 | (주)신테카바이오 | Drug indication and response prediction systems and method using AI deep learning based on convergence of different category data |
US20190206537A1 (en) * | 2018-01-04 | 2019-07-04 | Chioma Cynthia Nwaubani | Method and system for customizing, aggregating, prioritizing, and displaying medication adverse effects |
US20190279775A1 (en) * | 2018-03-06 | 2019-09-12 | International Business Machines Corporation | Finding Precise Causal Multi-Drug-Drug Interactions for Adverse Drug Reaction Analysis |
CN108647484A (en) * | 2018-05-17 | 2018-10-12 | 中南大学 | A kind of drug relationship prediction technique integrated based on multiple information with least square method |
KR20200023689A (en) * | 2018-08-20 | 2020-03-06 | 아주대학교산학협력단 | The method of artificial intelligence(AI)-based adverse drug reactions detection and the system thereof |
CN110188812A (en) * | 2019-05-24 | 2019-08-30 | 长沙理工大学 | A kind of multicore clustering method of quick processing missing isomeric data |
CN110246550A (en) * | 2019-06-12 | 2019-09-17 | 西安电子科技大学 | Pharmaceutical composition prediction technique based on drug similitude network data |
CN110957002A (en) * | 2019-12-17 | 2020-04-03 | 电子科技大学 | Drug target interaction relation prediction method based on collaborative matrix decomposition |
Non-Patent Citations (1)
Title |
---|
Fan Xinyue et al. (范馨月等): "Research on text-based knowledge discovery of drug side effects" (基于文本药物副作用知识发现研究) * |
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112071439A (en) * | 2020-08-19 | 2020-12-11 | 中南大学 | Method, system, computer device and storage medium for predicting side effect relationship of drug |
CN112071439B (en) * | 2020-08-19 | 2024-01-02 | 中南大学 | Drug side effect relationship prediction method, system, computer device, and storage medium |
CN112863693A (en) * | 2021-02-04 | 2021-05-28 | 东北林业大学 | Drug target interaction prediction method based on multi-channel graph convolution network |
WO2024021368A1 (en) * | 2022-07-26 | 2024-02-01 | 苏州科技大学 | Drug side effect prediction method based on restricted boltzmann machine with penalty regularization term |
CN116504331A (en) * | 2023-04-28 | 2023-07-28 | 东北林业大学 | Frequency score prediction method for drug side effects based on multiple modes and multiple tasks |
CN116705148A (en) * | 2023-07-24 | 2023-09-05 | 中国人民解放军总医院 | Antiviral drug screening method and system based on Laplace least square method |
CN116705148B (en) * | 2023-07-24 | 2023-10-27 | 中国人民解放军总医院 | Antiviral drug screening method and system based on Laplace least square method |
CN117079835A (en) * | 2023-08-21 | 2023-11-17 | 广东工业大学 | Multi-view-based medicine-medicine interaction prediction method and system |
CN117079835B (en) * | 2023-08-21 | 2024-02-20 | 广东工业大学 | Multi-view-based medicine-medicine interaction prediction method and system |
Also Published As
Publication number | Publication date |
---|---|
CN111477344B (en) | 2023-06-09 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111477344A (en) | Drug side effect identification method based on self-weighted multi-core learning | |
Ghazal et al. | Feature optimization and identification of ovarian cancer using internet of medical things | |
Ravì et al. | Deep learning for health informatics | |
US7062508B2 (en) | Method and computer-based system for non-probabilistic hypothesis generation and verification | |
Bahi et al. | Deep learning for ligand-based virtual screening in drug discovery | |
CN108198621A (en) | A kind of database data synthesis dicision of diagnosis and treatment method based on neural network | |
Lv et al. | Meta learning with graph attention networks for low-data drug discovery | |
Qian et al. | Identification of drug-side effect association via restricted Boltzmann machines with penalized term | |
Song et al. | DNMG: Deep molecular generative model by fusion of 3D information for de novo drug design | |
Urteaga et al. | A machine learning model for the prognosis of pulseless electrical activity during out-of-hospital cardiac arrest | |
Wagstaff et al. | Multiple-instance regression with structured data | |
Wang et al. | scGMAAE: Gaussian mixture adversarial autoencoders for diversification analysis of scRNA-seq data | |
CN115169067A (en) | Brain network model construction method and device, electronic equipment and medium | |
Lee et al. | Benchmarking community detection methods on social media data | |
Yao et al. | Chemical property relation guided few-shot molecular property prediction | |
WO2023240720A1 (en) | Drug screening model construction method and apparatus, screening method, device, and medium | |
Wang et al. | Medical tumor image classification based on Few-shot learning | |
Wei et al. | Shape description and recognition method inspired by the primary visual cortex | |
Gundogdu et al. | SigPrimedNet: a signaling-informed neural network for scRNA-seq annotation of known and unknown cell types | |
Dash et al. | Handbook of research on computational intelligence applications in bioinformatics | |
Xie et al. | A deep learning approach based on feature reconstruction and multi-dimensional attention mechanism for drug-drug interaction prediction | |
Jiji et al. | Decision support techniques for dermatology using case-based reasoning | |
Zheng et al. | Clustering the prevalence of pediatric chronic conditions in the United States using distributed computing | |
Chen et al. | [Retracted] Intelligent Fuzzy Optimization Algorithm for Data Set Information Clustering Patterns Based on Data Mining and IoT | |
Krenek et al. | Artificial neural networks in biomedicine applications |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant |