CN112435720A - Prediction method based on self-attention mechanism and multi-drug characteristic combination - Google Patents

Info

Publication number
CN112435720A
Authority
CN
China
Prior art keywords
drug
protein
medicine
features
node
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011403977.XA
Other languages
Chinese (zh)
Other versions
CN112435720B (en)
Inventor
宋晓宁
华阳
於东军
冯振华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ditu Suzhou Biotechnology Co ltd
Original Assignee
Shanghai Litu Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Litu Information Technology Co ltd filed Critical Shanghai Litu Information Technology Co ltd
Priority to CN202011403977.XA priority Critical patent/CN112435720B/en
Publication of CN112435720A publication Critical patent/CN112435720A/en
Application granted granted Critical
Publication of CN112435720B publication Critical patent/CN112435720B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16C COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
    • G16C20/00 Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
    • G16C20/50 Molecular design, e.g. of drugs
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B15/00 ICT specially adapted for analysing two-dimensional or three-dimensional molecular structures, e.g. structural or functional relations or structure alignment
    • G16B15/30 Drug targeting using structural data; Docking or binding prediction
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00 ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16C COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
    • G16C20/00 Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
    • G16C20/70 Machine learning, data mining or chemometrics

Landscapes

  • Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Chemical & Material Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Theoretical Computer Science (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Crystallography & Structural Chemistry (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Medicinal Chemistry (AREA)
  • Biotechnology (AREA)
  • Biophysics (AREA)
  • Pharmacology & Pharmacy (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Databases & Information Systems (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Bioethics (AREA)
  • Epidemiology (AREA)
  • Public Health (AREA)
  • Investigating Or Analysing Biological Materials (AREA)

Abstract

The invention discloses a prediction method based on a self-attention mechanism and a combination of multiple drug features. Drug molecules are encoded into two embedded features, via extended connectivity fingerprints and Mol2Vec vectors, and drug features are extracted by a bidirectional gated recurrent unit and neighborhood convolution. After feature embedding of the protein sequence, protein features are extracted by one-dimensional convolution and enhanced by attention conditioned on the drug features. The drug features and protein features are concatenated, and a self-attention mechanism enhances the extraction of protein-drug interaction information. The concatenated features are fed into a bidirectional gated recurrent unit to predict the protein-drug interaction. Combining Morgan fingerprint encoding with Mol2Vec vector embedding makes the extracted drug feature information richer. Combining a convolutional network with a gated recurrent unit to extract protein and drug features, together with an attention mechanism that enhances the extraction of relational features between protein and drug, effectively improves the performance of the model.

Description

Prediction method based on self-attention mechanism and multi-drug characteristic combination
Technical Field
The invention relates to the technical field of protein-drug interaction prediction, and in particular to a prediction method based on a self-attention mechanism and a combination of multiple drug features.
Background
Predicting protein-drug interactions is crucial in early drug screening. According to statistics from the American association of pharmaceutical research and manufacturers, 75% of the entire pharmaceutical industry is devoted to new drug research. In addition, fewer than 5% of the compounds obtained by primary screening can be used in clinical experiments. Traditional large-scale experimental screening usually takes 2-3 years and consumes a large amount of researchers' time and energy, whereas computer-based virtual screening of drugs is fast and accurate, so it can effectively reduce the cost of drug screening. However, virtual drug screening presupposes that the interactions between different proteins and drugs (Protein-Drug Interaction, PDI) can be predicted.
Existing methods mostly use an MLP model to predict protein-drug interactions, but an MLP cannot highlight locally important information in the drug features, nor can it bring the prediction performance of the whole model to its optimum. A method using a deep long short-term memory network (Deep LSTM) was therefore proposed, which achieved the best results in predicting the action of enzymes and G protein-coupled receptors. Although that method still cannot predict protein-drug interactions on a large scale, it shows that introducing sequential information captures more effective discriminative features of the interaction between protein and drug. Further research is therefore needed to improve model effectiveness in large-scale protein-drug interaction prediction.
Disclosure of Invention
This section is for the purpose of summarizing some aspects of embodiments of the invention and to briefly introduce some preferred embodiments. In this section, as well as in the abstract and the title of the invention of this application, simplifications or omissions may be made to avoid obscuring the purpose of the section, the abstract and the title, and such simplifications or omissions are not intended to limit the scope of the invention.
The present invention has been made in view of the above-mentioned problems with the prediction of the existing protein-drug interactions.
Therefore, the technical problems solved by the invention are as follows. Although traditional methods effectively describe the detailed features of molecules, the lack of structural information about drug molecules often directly degrades the performance of protein-drug interaction prediction. As the number of drug types increases, the ability of existing graph convolution models to discriminate molecular structures gradually declines, which reduces overall model performance. Moreover, existing methods mostly use an MLP model to predict protein-drug interactions, which cannot highlight locally important information in the drug features and cannot bring the prediction performance of the whole model to its optimum.
To solve these technical problems, the invention provides the following technical scheme: drug molecules are encoded into two embedded features via extended connectivity fingerprints and Mol2Vec vectors, and drug features are extracted by a bidirectional gated recurrent unit and neighborhood convolution; after feature embedding of the protein sequence, protein features are extracted by one-dimensional convolution and enhanced by attention conditioned on the drug features; the drug features and protein features are concatenated, and a self-attention mechanism enhances the extraction of protein-drug interaction information; the concatenated features are fed into a bidirectional gated recurrent unit to predict the protein-drug interaction.
In a preferred embodiment of the prediction method based on the combination of the self-attention mechanism and multiple drug features: extracting the drug features comprises embedding the drug by combining two encodings, extended connectivity fingerprints and Mol2Vec vectors; first extracting features from each embedding with a bidirectional gated recurrent unit; concatenating the drug features obtained from the two encodings; then further extracting drug features with a one-dimensional convolutional neural network; and finally feeding the resulting drug features, together with the protein features, into a classifier.
In a preferred embodiment of the prediction method based on the combination of the self-attention mechanism and multiple drug features: the extended connectivity fingerprint is a circular fingerprint, and encoding the molecular formula of the drug with the extended connectivity fingerprint comprises: analyzing the environment and connectivity of each atom within a given radius, hash-coding all possible structures, and finally compressing the encoded information to a preset length with a hash algorithm.
In a preferred embodiment of the prediction method based on the combination of the self-attention mechanism and multiple drug features: Mol2Vec vector encoding is derived from natural language processing; it learns vector representations of molecular substructures such that chemically related substructures point in similar directions, and a compound is finally encoded as a vector by summing the vectors of its substructures.
In a preferred embodiment of the prediction method based on the combination of the self-attention mechanism and multiple drug features: extracting the protein features comprises preprocessing the protein sequence by dividing the 22 amino acids into 6 classes according to their biochemical properties: A = {H, R, K}, B = {D, E, N, Q}, C = {C, X}, D = {S, T, P, A, G, U}, E = {M, I, L, V} and F = {F, Y, W}, so that the sequence "MSPLNQSAEGLPQEASNRSLN" is converted into "EDDEBBDDBDEDBBDDBADEB". The number of combinations obtained by this method is 6 × 6 × 6 = 216, giving a feature matrix with significantly reduced dimensionality. Meanwhile, the protein and drug features are extracted with a one-dimensional convolutional network, where the convolution is given by:
$$x(t) * q(t) = \int_{-\infty}^{+\infty} x(p)\, q(t - p)\, dp$$
wherein: the functions $x(t)$ and $q(t)$ are the operands of the convolution, $p$ is the integration variable, $t$ is the amount by which the function $q(-p)$ is shifted, and $*$ denotes convolution. The protein sequence passes through feature embedding, one-dimensional convolution, max pooling and a fully connected layer to obtain a 128-dimensional feature, which is fed into the classifier together with the drug features.
In a preferred embodiment of the prediction method based on the combination of the self-attention mechanism and multiple drug features: enhancing attention with respect to the drug features comprises letting the drug molecular feature vector be $F_{drug}$ and the protein subsequence feature vectors be $P = \{P_1, P_2, \dots, P_i\}$, and constructing an attention matrix from $F_{drug}$. By assigning larger weights to protein subsequences, the attention matrix identifies which subsequences are more important to the drug molecule. The formulas are as follows:
$$W_{attention} = f(W_{inter} F_{drug} + B_{inter})$$
$$P'_i = \sigma(W_{attention} P_i)$$
wherein: $f$ is a function that can be learned by gradient descent, $W_{inter}$ and $B_{inter}$ are trainable weights and biases in the model, $W_{attention}$ is the attention matrix, and $P'_i$ is the protein feature after attention learning.
In a preferred embodiment of the prediction method based on the combination of the self-attention mechanism and multiple drug features: enhancing the extraction of protein-drug interaction information with the self-attention mechanism comprises, given the concatenated PDI feature vector $c_{interaction}$, constructing a self-attention matrix $W_{self\text{-}atten}$ that emphasizes learning of the regions carrying interaction information, expressed as follows:
$$W_{self\text{-}atten} = f(W_{inter}\, c_{interaction} + B_{inter})$$
$$c'_{interaction} = W_{self\text{-}atten}\, c_{interaction}$$
In a preferred embodiment of the prediction method based on the combination of the self-attention mechanism and multiple drug features: extracting the drug features further comprises providing additional drug features with a message passing neural network (MPNN). The message passing network, which is used for predicting quantum chemical properties and performs notably well on small-sample models, consists of three main steps: message passing, in which for each atom the features (atoms or bonds) of its neighboring elements are propagated along the graph structure into a message vector; update, in which the embedded atom features are updated by the message vectors; and readout, in which the atom features in the molecule are aggregated to obtain a molecular feature vector.
In a preferred embodiment of the prediction method based on the combination of the self-attention mechanism and multiple drug features: the specific algorithm of the message passing network comprises: first constructing a set of initial states, one for each node in the graph; then letting each node exchange information with its neighbors, so that after one round of message passing the state of each node contains information about its direct neighbors; repeating this step, so that each node obtains information about its second-order neighborhood, and so on until the desired number of message rounds is reached; and finally collecting all the contextualized node states and converting them into a feature representing the whole graph. The node update formulas are as follows:
$$m_v^{t+1} = \sum_{w \in N(v)} M_t(g_v^t, g_w^t)$$
$$g_v^{t+1} = U_t(g_v^t, m_v^{t+1})$$
wherein: $M_t$ is the message function, $U_t$ is the node update function, $N(v)$ is the set of neighbors of node $v$ in the graph, $g_v^t$ is the hidden state of node $v$ at time $t$, and $m_v^{t+1}$ is the corresponding message vector. For each atom, messages are passed from its neighbors and aggregated into the message vector $m_v^{t+1}$, and the atom hidden state $g_v$ is finally updated by the message vector.
The invention has the following beneficial effects. A composite method for extracting drug features is provided: combining Morgan fingerprint encoding with Mol2Vec vector embedding expresses the details of a drug while also providing its substructure information in detail, so the extracted drug feature information is richer. Amino acids are classified according to biological activity, which effectively reduces the sparsity of the protein features. Meanwhile, combining a convolutional network with a gated recurrent unit to extract protein and drug features, together with an attention mechanism that enhances the extraction of relational features between protein and drug, effectively improves the performance of the model. In addition, an easy-to-operate GUI is designed and a usage method is provided, which enhances the usability of the model in practical work.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without inventive exercise. Wherein:
FIG. 1 is a flow chart of the prediction method based on the combination of a self-attention mechanism and multiple drug features according to a first embodiment of the present invention;
FIG. 2 is a diagram of the Morgan fingerprint encoding of a drug in the prediction method based on the combination of a self-attention mechanism and multiple drug features according to the first embodiment of the present invention;
FIG. 3 is a diagram of the Mol2Vec vector encoding of a drug in the prediction method based on the combination of a self-attention mechanism and multiple drug features according to the first embodiment of the present invention;
FIG. 4 is a diagram of the drug feature extraction model of the prediction method based on the combination of a self-attention mechanism and multiple drug features according to the first embodiment of the present invention;
FIG. 5 is a block diagram of the algorithm of the prediction method based on the combination of a self-attention mechanism and multiple drug features according to the first embodiment of the present invention;
FIG. 6 is a diagram of a protein-drug interaction simulation for the prediction method based on the combination of a self-attention mechanism and multiple drug features according to a third embodiment of the present invention;
FIG. 7 is a diagram of the protein-drug interaction test results for the prediction method based on the combination of a self-attention mechanism and multiple drug features according to the third embodiment of the present invention.
Detailed Description
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, specific embodiments accompanied with figures are described in detail below, and it is apparent that the described embodiments are a part of the embodiments of the present invention, not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without making creative efforts based on the embodiments of the present invention, shall fall within the protection scope of the present invention.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention, but the present invention may be practiced in other ways than those specifically described and will be readily apparent to those of ordinary skill in the art without departing from the spirit of the present invention, and therefore the present invention is not limited to the specific embodiments disclosed below.
Furthermore, reference herein to "one embodiment" or "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one implementation of the invention. The appearances of the phrase "in one embodiment" in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments.
The present invention will be described in detail with reference to the drawings, wherein the cross-sectional views illustrating the structure of the device are not enlarged partially in general scale for convenience of illustration, and the drawings are only exemplary and should not be construed as limiting the scope of the present invention. In addition, the three-dimensional dimensions of length, width and depth should be included in the actual fabrication.
Meanwhile, in the description of the present invention, it should be noted that the terms "upper, lower, inner and outer" and the like indicate orientations or positional relationships based on the orientations or positional relationships shown in the drawings, and are only for convenience of describing the present invention and simplifying the description, but do not indicate or imply that the referred device or element must have a specific orientation, be constructed in a specific orientation and operate, and thus, cannot be construed as limiting the present invention. Furthermore, the terms first, second, or third are used for descriptive purposes only and are not to be construed as indicating or implying relative importance.
The terms "mounted, connected and connected" in the present invention are to be understood broadly, unless otherwise explicitly specified or limited, for example: can be fixedly connected, detachably connected or integrally connected; they may be mechanically, electrically, or directly connected, or indirectly connected through intervening media, or may be interconnected between two elements. The specific meanings of the above terms in the present invention can be understood in specific cases to those skilled in the art.
Example 1
Referring to FIG. 1 to FIG. 5, a first embodiment of the present invention provides a prediction method based on the combination of a self-attention mechanism and multiple drug features, including:
S1: Drug molecules are encoded into two embedded features via extended connectivity fingerprints and Mol2Vec vectors, and drug features are extracted by a bidirectional gated recurrent unit and neighborhood convolution. It should be noted that:
The purpose of drug feature extraction is to extract discriminative features of drugs so that the classifier can better understand drug properties and distinguish between different drugs. Good drug features therefore need to be discriminative, representative and information-rich, so that the classifier can better fit a separating hyperplane and the accuracy of the model improves. The drug is embedded by combining two encodings, extended connectivity fingerprints (Morgan fingerprints) and Mol2Vec vectors; features are first extracted from each embedding with a bidirectional gated recurrent unit; the drug features obtained from the two encodings are concatenated; drug features are then further extracted with a one-dimensional convolutional neural network; and finally the resulting drug features are fed into the classifier together with the protein features.
Further, the Morgan fingerprint is a circular fingerprint. Referring to FIG. 2, encoding the molecular formula of the drug with the Morgan fingerprint comprises: analyzing the environment and connectivity of each atom within a given radius, hash-coding all possible structures, and compressing the encoded information to a preset length with a hash algorithm. Because this fingerprint encoding is broadly representative and can be obtained directly from databases, many protein-drug interaction prediction networks use Morgan fingerprints as the feature representation of drugs. However, Morgan fingerprints are too discrete and relatively large, which makes it difficult to represent the substructure information of a drug well.
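The hashing-and-folding procedure described above can be illustrated with a minimal sketch. It is a toy, standard-library-only illustration, not the patent's implementation and not a substitute for a cheminformatics toolkit: the function names `stable_hash` and `circular_fingerprint`, the atom and bond list input format, and the 64-bit length are assumptions made for this example, and bond orders, charges and other atom invariants are ignored.

```python
import hashlib


def stable_hash(text: str) -> int:
    """Deterministic 32-bit hash (Python's built-in hash() of str is salted per process)."""
    return int.from_bytes(hashlib.sha256(text.encode("utf-8")).digest()[:4], "big")


def circular_fingerprint(atoms, bonds, radius=2, n_bits=64):
    """Toy circular fingerprint (hypothetical helper, not a toolkit API).

    atoms: list of element symbols, e.g. ["C", "C", "O"]
    bonds: list of (i, j) atom-index pairs; bond orders are ignored in this sketch
    """
    neighbors = {i: [] for i in range(len(atoms))}
    for i, j in bonds:
        neighbors[i].append(j)
        neighbors[j].append(i)

    # Radius 0: each atom's identifier depends only on the atom itself.
    identifiers = [stable_hash(symbol) for symbol in atoms]
    features = set(identifiers)

    # Each iteration widens every atom's environment by one bond.
    for _ in range(radius):
        updated = []
        for i in range(len(atoms)):
            env = sorted(identifiers[j] for j in neighbors[i])
            updated.append(stable_hash(f"{identifiers[i]}|{env}"))
        identifiers = updated
        features.update(identifiers)

    # Fold the hashed environments into a fixed-length bit vector.
    bits = [0] * n_bits
    for feature in features:
        bits[feature % n_bits] = 1
    return bits


# Heavy atoms of ethanol: C-C-O
fp = circular_fingerprint(["C", "C", "O"], [(0, 1), (1, 2)], radius=2, n_bits=64)
```

Because each identifier is built from the atom's own identifier plus the sorted identifiers of its neighbors, the result does not depend on how the atoms are numbered. Folding with a modulo can map different environments to the same bit, which is one reason such fingerprints are sparse and lossy.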
Referring to FIG. 3, Mol2Vec vector encoding is derived from Word2Vec in natural language processing (NLP). It learns vector representations of molecular substructures such that chemically related substructures point in similar directions, and finally encodes a compound as a vector by summing the vectors of its substructures. This encoding clearly expresses the substructure features of a drug, is strongly representative, and is an important complement to the Morgan features. To obtain richer and more discriminative drug features, the invention combines the two encodings to embed the drug; for the model, refer to the black area in FIG. 4.
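The summation step of the Mol2Vec-style encoding can be sketched as follows. This is only an illustration under stated assumptions: the function name `mol2vec_embedding`, the substructure identifiers and the three-dimensional vectors are made up for the example (a trained model would supply the real vector table), and skipping unseen substructures is a simplification chosen here.

```python
def mol2vec_embedding(substructure_ids, vector_table, dim):
    """Encode a compound as the sum of its substructure vectors (hypothetical helper)."""
    total = [0.0] * dim
    for sid in substructure_ids:
        vec = vector_table.get(sid)
        if vec is None:
            # Assumption for this sketch: substructures without a learned vector are skipped.
            continue
        total = [a + b for a, b in zip(total, vec)]
    return total


# Made-up vectors standing in for a trained substructure embedding table.
table = {
    "C_r0": [0.5, 0.0, 1.0],
    "O_r0": [0.0, 1.0, 0.5],
    "CO_r1": [1.0, 1.0, 0.0],
}
vec = mol2vec_embedding(["C_r0", "C_r0", "O_r0", "CO_r1"], table, dim=3)
# vec == [2.0, 2.0, 2.5]
```

Unlike the folded bit vector of a hashed fingerprint, the summed vector is dense and low-dimensional, which is the sense in which the two encodings complement each other.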
Further, extracting the drug features also comprises providing additional drug features with a message passing neural network (MPNN). The message passing network, which is used for predicting quantum chemical properties and performs notably well on small-sample models, consists of three main steps: message passing, in which for each atom the features (atoms or bonds) of its neighboring elements are propagated along the graph structure into a message vector; update, in which the embedded atom features are updated by the message vectors; and readout, in which the atom features in the molecule are aggregated to obtain a molecular feature vector. The specific algorithm of the message passing network comprises: first constructing a set of initial states, one for each node in the graph; then letting each node exchange information with its neighbors, so that after one round of message passing the state of each node contains information about its direct neighbors; repeating this step, so that each node obtains information about its second-order neighborhood, and so on until the desired number of message rounds is reached; and finally collecting all the contextualized node states and converting them into a feature representing the whole graph. The node update formulas are as follows:
$$m_v^{t+1} = \sum_{w \in N(v)} M_t(g_v^t, g_w^t)$$
$$g_v^{t+1} = U_t(g_v^t, m_v^{t+1})$$
wherein: $M_t$ is the message function, $U_t$ is the node update function, $N(v)$ is the set of neighbors of node $v$ in the graph, $g_v^t$ is the hidden state of node $v$ at time $t$, and $m_v^{t+1}$ is the corresponding message vector. For each atom, messages are passed from its neighbors and aggregated into the message vector $m_v^{t+1}$, and the atom hidden state $g_v$ is finally updated by the message vector.
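The three steps (message passing, update, readout) can be traced in a small sketch. It is not the patent's network: the message function (pass the neighbor state unchanged), the update function (average of old state and message) and the sum readout are fixed, non-learned stand-ins assumed here so that the arithmetic can be followed by hand, and the name `message_passing` is hypothetical.

```python
def message_passing(node_states, edges, rounds=2):
    """Toy message passing on an undirected graph (hypothetical helper).

    node_states: list of equal-length feature vectors, one per atom
    edges: list of (v, w) index pairs
    """
    neighbors = {v: [] for v in range(len(node_states))}
    for v, w in edges:
        neighbors[v].append(w)
        neighbors[w].append(v)

    states = [list(s) for s in node_states]
    dim = len(states[0])
    for _ in range(rounds):
        new_states = []
        for v, g_v in enumerate(states):
            # Message step: sum over neighbors N(v); the message function is
            # assumed to pass each neighbor state through unchanged.
            m_v = [sum(states[w][k] for w in neighbors[v]) for k in range(dim)]
            # Update step: assumed to be the average of old state and message.
            new_states.append([0.5 * (a + b) for a, b in zip(g_v, m_v)])
        states = new_states

    # Readout step: sum all node states into one molecule-level vector.
    return [sum(s[k] for s in states) for k in range(dim)]


# Three-atom chain 0-1-2 with one-hot initial states.
initial = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
graph_vector = message_passing(initial, [(0, 1), (1, 2)], rounds=2)
# graph_vector == [1.25, 1.75, 1.25]
```

After the first round, atom 0 only knows about atom 1; after the second round its state also carries a contribution from atom 2, its second-order neighbor. In a trained network both functions would have learnable parameters.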
S2: after the protein sequence in the medicine is embedded with the characteristics, the protein characteristics are extracted by utilizing one-dimensional convolution and the attention of the protein sequences is enhanced relative to the medicine characteristics. In which it is to be noted that,
extracting protein features includes pre-treating protein sequence, classifying 22 kinds of amino acids into 6 kinds based on their biochemical features, including: a ═ H, R, K }, B ═ D, E, N, Q }, C ═ C, X }, D ═ S, T, P, a, G, U }, E ═ M, I, L, V } and F ═ F, Y, W }, so that the sequence "MSPLNQSAEGLPQEASNRSLN" can be converted into "eddebddbdedbbddbadeb", the method yields a combined number of 6 × 6 ═ 216 feature matrices with significantly reduced dimensionality; meanwhile, the protein and medicine features are extracted by utilizing a one-dimensional convolution network, and the formula of the convolution extracted features is as follows:
Figure BDA0002818025270000081
wherein: the functions x (t) and q (t) are convolution variables, p is an integral variable, t is an amount for shifting the function q (-p), and is convolution, and the protein sequence is subjected to feature embedding, one-dimensional convolution, maximum pooling and full connection to obtain 128-dimensional features, and is put into a classifier together with the drug features.
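The amino-acid grouping and a discrete counterpart of the convolution can be sketched as follows. Several points are assumptions of this example rather than statements of the patent: the 216 combinations are read as overlapping triplets of class letters, the helper names (`encode_sequence`, `triplet_ids`, `conv1d_maxpool`) are hypothetical, and the kernel values are arbitrary rather than learned.

```python
from itertools import product

# The six classes listed in the text, keyed by class letter.
GROUPS = {
    "A": "HRK",
    "B": "DENQ",
    "C": "CX",
    "D": "STPAGU",
    "E": "MILV",
    "F": "FYW",
}
RESIDUE_TO_CLASS = {res: cls for cls, members in GROUPS.items() for res in members}


def encode_sequence(sequence):
    """Replace every residue by the letter of its class."""
    return "".join(RESIDUE_TO_CLASS[res] for res in sequence.upper())


# Assumption: 6 x 6 x 6 = 216 refers to triplets of class letters.
TRIPLETS = ["".join(t) for t in product("ABCDEF", repeat=3)]
TRIPLET_INDEX = {t: i for i, t in enumerate(TRIPLETS)}


def triplet_ids(encoded):
    """Map each overlapping class triplet to an integer in [0, 215]."""
    return [TRIPLET_INDEX[encoded[i:i + 3]] for i in range(len(encoded) - 2)]


def conv1d_maxpool(signal, kernel):
    """Valid 1D convolution (kernel flipped, as in the formula), then global max pooling."""
    k = len(kernel)
    flipped = kernel[::-1]
    outputs = [
        sum(signal[i + j] * flipped[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]
    return max(outputs)


encoded = encode_sequence("MSPLNQSAEGLPQEASNRSLN")
ids = triplet_ids(encoded)
pooled = conv1d_maxpool([1, 2, 3, 4], [1, 0, -1])
```

Applying the grouping residue by residue yields one class letter per residue, so the 21-residue example maps to a 21-letter string and 19 overlapping triplets. Note that convolution layers in most deep learning libraries compute cross-correlation, that is, they do not flip the kernel; with learned kernels the distinction does not matter.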
S3: the drug characteristics and the protein characteristics are spliced, and the extraction of protein drug interaction information is enhanced by utilizing a self-attention mechanism. In which it is to be noted that,
the attention enhancement related to the drug characteristics comprises setting the molecular feature vector of the drug as FdrugThe protein proton sequence feature vector is P ═ { P ═ P1,P2,…,PiAnd construct a structure about FdrugThe attention matrix of (a) can be used to calculate which of the sub-sequences are more important to the drug molecule by assigning more weight to the protein proton sequence, and the formula is as follows:
Wattention=f(WinterFdrug+Binter)
P′i=σ(WattentionPi)
wherein: f is a function that can be learned by gradient descent, WinterAnd BinterFor trainable weights and biases in the model, WattentionIs an attention matrix, P'iTo focus on the learned protein characteristics.
Enhancing the extraction of protein-drug interaction information using a self-attention mechanism includes giving a spliced PDI feature vector cinteractionConstructing a self-attention matrix Wself-attenEmphasis is given to the interaction information region learning, whose formula is expressed as follows:
Wself-atten=f(Wintercinteraction+Binter)
c′interaction=Wself-attencinteraction
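Both attention formulas can be sketched in a few lines. The text leaves the form of f and the shapes of the matrices open, so this example makes explicit assumptions: f is taken as tanh for the drug-guided attention and as a sigmoid for the self-attention, the attention matrix is treated as diagonal (an elementwise gate), and the function names are hypothetical.

```python
import math


def matvec(matrix, vector):
    """Plain matrix-vector product on nested lists."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def drug_guided_attention(f_drug, subsequences, w_inter, b_inter):
    """W_attention = f(W_inter F_drug + B_inter); P'_i = sigma(W_attention P_i)."""
    pre = [a + b for a, b in zip(matvec(w_inter, f_drug), b_inter)]
    # Assumption: f = tanh, and W_attention acts as a diagonal matrix.
    w_attention = [math.tanh(x) for x in pre]
    return [[sigmoid(w * p) for w, p in zip(w_attention, p_i)] for p_i in subsequences]


def self_attention(c_interaction, w_inter, b_inter):
    """W_self-atten = f(W_inter c + B_inter); c' = W_self-atten c."""
    pre = [a + b for a, b in zip(matvec(w_inter, c_interaction), b_inter)]
    # Assumption: f = sigmoid, and W_self-atten acts as a diagonal matrix.
    gate = [sigmoid(x) for x in pre]
    return [g * c for g, c in zip(gate, c_interaction)]


# With zero weights and biases every gate is neutral, which makes the output easy to check.
zeros = [[0.0, 0.0], [0.0, 0.0]]
attended = drug_guided_attention([1.0, 2.0], [[3.0, -1.0]], zeros, [0.0, 0.0])
reweighted = self_attention([2.0, 4.0], zeros, [0.0, 0.0])
# attended == [[0.5, 0.5]], reweighted == [1.0, 2.0]
```

In the first function the gate depends only on the drug vector, so the same drug-specific weighting is applied to every protein subsequence. In the second, the gate is computed from the concatenated vector itself, which is what makes it self-attention.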
s4: the spliced features were placed into a two-way gated circulation unit and the protein and drug interactions predicted. In which it is to be noted that,
c 'is spliced characteristic'interact′dxp``11ionPutting the two-way gating circulation unit for training and inputting the layer of characteristics into a classifier to predict a final result; the invention uses binary cross entropy as a loss function of network training, and the formula is expressed as follows:
Figure BDA0002818025270000082
wherein: θ denotes the weights of the entire model, $y_i$ is the label of the i-th training sample, and
Figure BDA0002818025270000083
is the network output for the i-th training sample. To prevent overfitting, the invention constrains network optimization using the L2 norm as a penalty term:
Figure BDA0002818025270000091
wherein: w and b are the weights and biases of each layer of the model, and λ is the penalty factor. Dropout layers are additionally embedded in the last two layers of the model to mitigate overfitting. To balance training efficiency and classification results, the Adam optimizer is used to optimize the weights of the deep network.
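As a minimal sketch of the training objective described above, the following computes binary cross-entropy plus an L2 penalty. The exact normalization in the patent's formula images is not recoverable from the text, so two assumptions are made here: the cross-entropy is averaged over samples, and the penalty is λ times the sum of squared weights. The function names are illustrative.

```python
import math
from typing import Sequence


def binary_cross_entropy(y_true: Sequence[int], y_pred: Sequence[float], eps: float = 1e-12) -> float:
    """Mean binary cross-entropy (averaging over samples is an assumption)."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / len(y_true)


def l2_penalty(weights: Sequence[float], lam: float) -> float:
    """L2 penalty lam * sum(w^2); some formulations use lam / 2 instead."""
    return lam * sum(w * w for w in weights)


def regularized_loss(y_true, y_pred, weights, lam: float) -> float:
    return binary_cross_entropy(y_true, y_pred) + l2_penalty(weights, lam)


# Toy labels, predictions, and weights (illustrative values only).
labels = [1, 0, 1]
outputs = [0.9, 0.2, 0.6]
toy_weights = [0.5, -0.3]
loss = regularized_loss(labels, outputs, toy_weights, lam=0.0001)
```

In a framework such as Keras, the same effect is normally obtained by combining a built-in loss with per-layer regularizers rather than by hand-written functions like these.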
Example 2
As a second embodiment of the invention, in order to better verify and explain the technical effects of the method, three datasets are selected for testing in this embodiment, and the test results are compared to verify the actual effect of the method.
Before the experiments, three datasets, BindingDB, Kinase, and Human, are selected to verify the effect of the model. The BindingDB dataset is divided into a training set, a validation set, and a test set according to the scheme shown in Table 1 below. The validation set and the test set contain PDI samples whose ligands or proteins are not observed in the training set, so the BindingDB dataset can be used to evaluate how well the model generalizes to unknown drugs and proteins.
Table 1: BindingDB dataset distribution.
Dataset Protein Drug Positive Negative
Train 758 43160 28240 21915
Dev 472 5077 2831 2776
Test 466 5016 2706 2802
The Kinase dataset is constructed from the KIBA dataset and comprises 229 protein samples and 1644 drug samples. KIBA was developed to integrate multiple bioactivity measures, such as IC50, Ki, and Kd; by combining these bioactivity scores, the Kinase dataset greatly reduces bias in the data. The numbers of positive and negative samples in the Kinase dataset are extremely unbalanced, as shown in Table 2 below:
Table 2: Kinase dataset distribution.
Dataset Positive Negative
Train 19183 72282
Test 3990 15695
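The degree of imbalance in Table 2 can be made concrete with a short calculation. The counts below are copied from Table 2; the helper name `negative_to_positive_ratio` is illustrative.

```python
def negative_to_positive_ratio(positive: int, negative: int) -> float:
    """Number of negative samples per positive sample."""
    return negative / positive


# (positive, negative) counts per split, taken from Table 2.
kinase_counts = {"Train": (19183, 72282), "Test": (3990, 15695)}
ratios = {
    split: negative_to_positive_ratio(pos, neg)
    for split, (pos, neg) in kinase_counts.items()
}
```

Both splits come out at a little under four negatives per positive, which is why AUPR is reported alongside AUC for this dataset.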
The Human dataset contains 852 human proteins and 1052 drug molecules, with 3369 positive samples and 2843 highly reliable negative samples. Because the dataset is not divided into a training set and a test set, the model is evaluated on it by cross-validation.
In the experiments of this embodiment, the hardware is an Intel Core i7-8700K CPU and an NVIDIA GeForce RTX 2060s graphics card, and the operating system is Windows 10. The model is trained with the Keras deep learning framework and evaluated with the sklearn machine learning toolkit in a Python 3 environment. When training the network model, different parameters affect weight optimization very differently, so the learning rate is determined first; with the learning rate fixed, the other parameters are tuned by grid search. After multiple rounds of experiments, the hyper-parameter settings shown in Table 3 below were determined:
Table 3: Hyper-parameter settings.
Name Value
Learning rate 0.0001
Learning decay 0.001
Cnn filters 128
Cnn stride 10,15
Dropout 0.05
Regularizer 0.0001
This embodiment evaluates each model using two metrics: the area under the ROC curve (AUC) and the area under the PR curve (AUPR). Each point on the ROC curve has as its coordinates the values of two indices, the true positive rate (TPR):
$$TPR = \frac{TP}{TP + FN}$$
and the false positive rate (FPR):
$$FPR = \frac{FP}{FP + TN}$$
Each point on the PR curve has as its coordinates the values of two indices, precision (P):
$$P = \frac{TP}{TP + FP}$$
and recall (R):
$$R = \frac{TP}{TP + FN}$$
wherein: TP is the number of positive samples correctly predicted as positive, FP is the number of negative samples incorrectly predicted as positive, TN is the number of negative samples correctly predicted as negative, and FN is the number of positive samples incorrectly predicted as negative.
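The four indices can be computed from a confusion matrix as in the following sketch, which uses the standard definitions TPR = TP / (TP + FN), FPR = FP / (FP + TN), precision = TP / (TP + FP), and recall = TP / (TP + FN). The function names and the convention of returning 0.0 for an empty denominator are choices made for this illustration.

```python
from typing import Dict, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (TP, FP, TN, FN) for binary labels in {0, 1}."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def _ratio(numerator: int, denominator: int) -> float:
    # Convention for this sketch: an undefined ratio is reported as 0.0.
    return numerator / denominator if denominator else 0.0


def rates(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    return {
        "TPR": _ratio(tp, tp + fn),
        "FPR": _ratio(fp, fp + tn),
        "precision": _ratio(tp, tp + fp),
        "recall": _ratio(tp, tp + fn),
    }


# Toy labels and thresholded predictions (illustrative values only).
truth = [1, 1, 0, 0, 1]
predicted = [1, 0, 0, 1, 1]
metrics = rates(*confusion_counts(truth, predicted))
```

AUC and AUPR are obtained by sweeping the decision threshold, recomputing these rates at each threshold, and integrating the resulting curves.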
The experiments initially adopted the drug feature extraction method of DeepConvDTI, that is, encoding drug molecules with Morgan vectors only, combined with the model training of the invention; the best test AUC was 0.954. The experiments showed that this way of extracting drug features ignores information about the overall molecular substructure, which is very important for predicting protein-drug interactions. The invention therefore proposes splicing the Mol2Vec vector, which covers drug substructure information, with the Morgan vector, and then extracting features with a convolutional network. Furthermore, the study by Withnall et al. on graph networks that actively learn implicit structural features of molecules mentions that a message passing neural network (MPNN) can give a model the ability to learn molecular structure. On this basis, the invention hypothesized that adding features extracted by a graph network could further improve the predictive ability of the model. However, adding drug molecular features extracted by the MPNN model on top of the original features did not meet expectations: when the sample size is relatively large, the advantage of the graph network is relatively limited. As the results in Table 4 below show, adding the features encoded by the MPNN model reduced the recognition rate of the model to 0.951, which is not ideal. The combination of the Morgan fingerprint and the Mol2Vec vector is therefore finally used as the drug feature information.
After multiple experiments finally determined the drug feature extraction method and the use of a bidirectional gated recurrent unit combined with softmax as the classifier, the usage scheme of the attention modules was also examined, as shown in Table 4 below. In the table, O indicates that the module named in the column header is used and X indicates that it is not. When no attention module is used, the AUC of the trained model on the test set is 0.954. Because the invention expects the extracted protein features to be more relevant to the corresponding drug features, an attention module is added between the protein features and the drug features, which improves AUPR by 0.3% over the original result. On this basis, a self-attention module is further added between the combined feature layer and the bidirectional gated recurrent unit layer, and the AUC of the trained model on the test set reaches 0.961. A supplementary experiment adding only the self-attention module reaches a test AUC of 0.960, which is 1% higher than the initial result, a very obvious effect.
Table 4: Ablation experiments on the BindingDB dataset.
Figure BDA0002818025270000111
Figure BDA0002818025270000121
The experiments evaluated the models on the above three datasets and ran comparative experiments against several kinds of conventional models, including the k-nearest neighbor model (KNN), random forest (RF), L2 logistic regression, support vector machine, and the CPI-GNN model. Because the parameter details of the models other than CPI-GNN are not given, the results of the first four kinds of models are discussed only on the Human dataset. In addition, the GraphDTA model, dedcepi model, GCN model, and TransformerCPI model are discussed; as the most typical models used for PDI prediction in the last two years, they are important references.
GraphDTA, GCN, CPI-GNN, TransformerCPI, DeepConvDTI, and the model proposed here were compared in turn on the BindingDB dataset. The results of the first four models are existing reported values, while the results of the DeepConvDTI model and of the model of the invention are the best values obtained by adjusting parameters during the experiments. The BindingDB dataset contains a large number of protein and drug samples that do not appear in the training set. The results are shown in Table 5 below:
Table 5: Comparative experiments on the BindingDB dataset.
Method AUC AUPR
GraphDTA 0.929 0.917
GCN 0.927 0.913
CPI-GNN 0.603 0.543
TransformerCPI 0.951 0.949
DeepConvDTI 0.944 0.947
Ours 0.961 0.962
The table shows that the proposed model outperforms the other state-of-the-art models: its AUC is 1.5% higher than the baseline and 1% higher than the previous best value. In practical prediction applications the number of negative samples is far greater than the number of positive samples, so the model must maintain its performance on unbalanced data. To verify the effect of the model when positive and negative samples are unbalanced, the experiments compared existing models on the Kinase dataset, as shown in Table 6 below. Compared with the other four models, the model of the invention still performs excellently on the unbalanced dataset, and its performance does not degrade as the number of negative samples increases.
Table 6: comparative experiments on Kinase data set.
Method AUC AUPR
GraphDTA 0.934 0.935
GCN 0.928 0.930
CPI-GNN 0.922 0.922
TransformerCPI 0.926 0.923
Ours 0.937 0.962
Finally, the performance of the model is verified again on the more commonly used Human dataset. Because this dataset is not divided into a training set and a test set, the model is evaluated by cross-validation. To keep the experiments comparable, the data are divided at the same 4:1 ratio used in previous work, and the evaluation protocol is consistent with previous methods. After ten different divisions, the mean and variance of the best values obtained are shown in Table 7 below. It is easy to see that the proposed model is superior to other similar models in both accuracy and stability.
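The evaluation protocol on the Human dataset (ten different 4:1 divisions, then mean and variance) can be sketched as below. The source does not say how the divisions were drawn, so seeded random shuffling is an assumption, as is the use of population variance; `evaluate` is a hypothetical callback standing in for training and scoring the model on one division.

```python
import random
import statistics
from typing import Callable, List, Tuple


def split_4_to_1(n_samples: int, seed: int) -> Tuple[List[int], List[int]]:
    """Shuffle sample indices with a fixed seed and split them 4:1."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    cut = (4 * n_samples) // 5
    return indices[:cut], indices[cut:]


def repeated_evaluation(
    n_samples: int,
    evaluate: Callable[[List[int], List[int]], float],
    n_repeats: int = 10,
) -> Tuple[float, float]:
    """Run `evaluate` on n_repeats different divisions; return (mean, variance)."""
    scores = [evaluate(*split_4_to_1(n_samples, seed)) for seed in range(n_repeats)]
    return statistics.mean(scores), statistics.pvariance(scores)


# The Human dataset has 3369 + 2843 = 6212 samples according to the text.
train_idx, test_idx = split_4_to_1(6212, seed=0)
```

A real `evaluate` would train the model on the training indices and return, for example, the AUC on the test indices.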
Table 7: comparative experiments on Human data sets.
Figure BDA0002818025270000131
Figure BDA0002818025270000141
Example 3
Referring to FIGS. 6 and 7, in order to better verify and explain the practicability of the method, the method of the invention is applied to screening drugs for treating Alzheimer's disease.
Dementia is one of the notable problems in public health management, and more than 80% of dementia cases are Alzheimer's disease (AD). Currently available therapies only help to relieve symptoms temporarily; they neither cure the disease nor reverse the neuropathological disease process, so a new treatment that delays or arrests disease progression remains an urgent medical need. According to a widely accepted theory of AD, the loss of cholinergic neurons leads to a decrease in the neurotransmitter acetylcholine (ACh), so inhibiting acetylcholinesterase (AChE) can raise the ACh level and thereby improve cognitive ability. Meanwhile, studies show that in the late stage of the disease the level of butyrylcholinesterase (BuChE) remains unchanged or even increases, and BuChE can hydrolyze ACh, thereby substituting for AChE and offsetting the reduced AChE activity in the late stage. Mouse experiments in which the acetylcholinesterase gene was knocked out support this hypothesis and further show that selective inhibition of BuChE is positively correlated with improved cognitive performance and memory. In other words, inhibiting acetylcholinesterase and butyrylcholinesterase is an important means of treating AD. Based on the proposed model, the invention therefore designs a practical drug screening tool to screen for drugs that inhibit acetylcholinesterase and butyrylcholinesterase: the protein to be tested and a sufficient number of drug molecules are input into the system, and the system outputs the identifiers of the Top 15 drugs and a histogram, ordered from high to low, of the predicted interaction value between the protein and each drug. So that the model can be validated meaningfully, the test data selected in this example are not in the dataset used to train the model. It is worth noting, however, that acetylcholinesterase exists in the training dataset, while butyrylcholinesterase does not, although its amino acid sequence has 65% similarity to that of acetylcholinesterase. The principle of a PDI deep model is to infer unknown interactions from known ones, so the information being tested is usually related to known information; otherwise the test result would have no basis.
In this embodiment, the drug data provided by Rajnish Kumar et al. are used as the test target. The test set consists of 35 compounds that Rajnish Kumar et al. selected from the Asinex library by manual screening, for which the two-dimensional structural formulas and Asinex numbers of the drug molecules are given. The drug molecular formulas used to predict PDI are obtained from PubChem according to the corresponding numbers and structural formulas. Rajnish Kumar et al. also give the inhibition rate (IR) of each drug molecule against AChE and BuChE. According to the criterion defined here, IR < 0.5 is recorded as no interaction and IR > 0.5 as interaction, yielding the test data shown in Table 8 below.
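The labeling rule for the test set amounts to thresholding the inhibition rate at 0.5, as in this small sketch. The text does not say how an inhibition rate of exactly 0.5 is treated; mapping it to "no interaction" here is an assumption, and the function name and sample values are hypothetical.

```python
def label_interaction(inhibition_rate: float, threshold: float = 0.5) -> int:
    """Return 1 (interaction) if IR exceeds the threshold, else 0 (no interaction)."""
    # Assumption: IR exactly equal to the threshold counts as no interaction.
    return 1 if inhibition_rate > threshold else 0


# Hypothetical inhibition rates for one compound against the two enzymes.
example_rates = {"AChE": 0.73, "BuChE": 0.12}
example_labels = {enzyme: label_interaction(ir) for enzyme, ir in example_rates.items()}
```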
Table 8: drug test set.
CID AChE BuChE CID AChE BuChE
1148028(A1) 0 0 135644857(B6) 0 0
1120622(A2) 0 1 6489641(B7) 1 0
709041(A3) 0 0 1292545(C1) 1 0
1153034(A4) 0 0 6498716(C2) 0 1
135411325(A5) 0 0 6498728(C3) 0 0
1453054(A6) 0 0 6498729(C4) 0 1
43817564(A7) 0 0 5305813(C5) 1 1
3228454(B1) 0 0 3123873(C6) 0 0
651119(B2) 0 0 3201566(C7) 0 0
2684623(B3) 0 0 1448624(D1) 0 0
1096744(B4) 0 0 1439318(D2) 0 0
1071391(B5) 0 0 6411211(D3) 1 0
1126267(D4) 1 0 3149085(E2) 0 0
807832(D5) 0 0 1171875(E3) 0 0
1166551(D6) 0 0 3146341(E4) 0 0
651744(D7) 0 0 72030131(E5) 0 0
715450(E1) 0 0 6496850(E6) 0 0
It should be noted that none of the tested drugs exists in the training set, although similar structures and identical functional groups cannot be ruled out. The system ranks the predicted interaction values from high to low and produces the Top 15 histogram shown in FIG. 7. It can be seen that the interacting protein-drug combinations in Table 8 mostly fall within the predicted Top 15 range, which indicates the practical applicability of the proposed model.
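The ranking step behind the Top 15 histogram can be sketched as a sort over predicted interaction values, followed by a check of which known interacting compounds fall inside the selected range. The function names, compound identifiers, and scores below are hypothetical placeholders, not values from the patent.

```python
from typing import Dict, List, Set, Tuple


def top_k(predictions: Dict[str, float], k: int = 15) -> List[Tuple[str, float]]:
    """Rank compounds by predicted interaction value, highest first, and keep k."""
    return sorted(predictions.items(), key=lambda item: item[1], reverse=True)[:k]


def hits_in_top_k(ranked: List[Tuple[str, float]], known_positives: Set[str]) -> List[str]:
    """Return the known interacting compounds that appear in the ranked list."""
    return [compound for compound, _ in ranked if compound in known_positives]


# Hypothetical predicted values for one protein and three placeholder compounds.
scores = {"drug_a": 0.9, "drug_b": 0.2, "drug_c": 0.7}
ranked = top_k(scores, k=2)
found = hits_in_top_k(ranked, {"drug_b", "drug_c"})
```

Counting how many experimentally confirmed interactions appear in `found` gives a simple summary of how well the ranking agrees with a table of measured labels.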
It should be noted that the above-mentioned embodiments are only for illustrating the technical solutions of the present invention and not for limiting, and although the present invention has been described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications or equivalent substitutions may be made on the technical solutions of the present invention without departing from the spirit and scope of the technical solutions of the present invention, which should be covered by the claims of the present invention.

Claims (9)

1. A prediction method based on a self-attention mechanism and multi-drug feature combination, characterized by comprising:
encoding drug molecules into two embedded features by means of an extended connectivity fingerprint and a Mol2Vec vector, and extracting drug features through a bidirectional gated recurrent unit and neighborhood convolution;
after feature embedding of the protein sequence, extracting protein features by one-dimensional convolution and applying attention enhancement related to the drug features;
splicing the drug features and the protein features, and enhancing the extraction of protein-drug interaction information using an attention mechanism;
feeding the spliced features into a bidirectional gated recurrent unit and predicting the protein-drug interaction.
2. The prediction method according to claim 1, characterized in that extracting the drug features comprises:
embedding the drug features by combining two encodings, the extended connectivity fingerprint and the Mol2Vec vector; first extracting features from each embedding through a bidirectional gated recurrent unit, splicing the drug features obtained from the two encodings, and then further extracting drug features using a one-dimensional convolutional neural network; and finally feeding the resulting drug features, together with the protein features, into a classifier.
3. The prediction method according to claim 2, characterized in that:
the extended connectivity fingerprint is a circular fingerprint, and encoding the drug molecular formula with the extended connectivity fingerprint comprises: analyzing the environment and connectivity of each atom within a given radius, hash-encoding all possible substructures, and finally compressing the encoded information to a preset length using a hash algorithm.
4. The prediction method according to claim 2 or 3, characterized in that:
the Mol2Vec vector encoding is derived from natural language processing; it learns vectors for molecular substructures such that chemically related substructures point in similar directions, and finally encodes a compound as a vector by summing the vectors of its individual substructures.
5. The prediction method according to any one of claims 1 to 3, characterized in that extracting the protein features comprises:
preprocessing the protein sequence by dividing the 22 amino acids into 6 classes according to their biochemical characteristics: A = {H, R, K}, B = {D, E, N, Q}, C = {C, X}, D = {S, T, P, A, G, U}, E = {M, I, L, V}, and F = {F, Y, W}, so that the sequence "MSPLNQSAEGLPQEASNRSLN" is converted into "EDDEBBDDBDEDBBDDBADEB"; this yields a feature matrix with 6 × 6 × 6 = 216 combinations and significantly reduced dimensionality; meanwhile, the protein and drug features are extracted using a one-dimensional convolutional network, and the convolution used to extract features is given by:
Figure FDA0002818025260000021
wherein: x(t) and q(t) are the functions being convolved, p is the integration variable, t is the amount by which the function q(-p) is shifted, and * denotes convolution. The protein sequence passes through feature embedding, one-dimensional convolution, max pooling, and a fully connected layer to obtain a 128-dimensional feature, which is fed into the classifier together with the drug features.
6. The prediction method according to claim 1, characterized in that the attention enhancement related to the drug features comprises:
setting the drug molecular feature vector as $F_{drug}$ and the protein subsequence feature vectors as $P = \{P_1, P_2, \dots, P_i\}$, and constructing an attention matrix with respect to $F_{drug}$; by assigning larger weights to protein subsequences, this matrix indicates which subsequences are more important to the drug molecule. The formulas are as follows:
$$W_{attention} = f(W_{inter} F_{drug} + B_{inter})$$
$$P'_i = \sigma(W_{attention} P_i)$$
wherein: f is a function that can be learned by gradient descent, $W_{inter}$ and $B_{inter}$ are trainable weights and biases in the model, $W_{attention}$ is the attention matrix, and $P'_i$ is the protein feature after attention learning.
7. The prediction method according to claim 1 or 6, characterized in that enhancing the extraction of protein-drug interaction information using the self-attention mechanism comprises:
given the spliced PDI feature vector $c_{interaction}$, constructing a self-attention matrix $W_{self\text{-}atten}$ so that learning focuses on the regions that carry interaction information. The formulas are as follows:
$$W_{self\text{-}atten} = f(W_{inter} c_{interaction} + B_{inter})$$
$$c'_{interaction} = W_{self\text{-}atten}\, c_{interaction}$$
8. The prediction method according to claim 1 or 2, characterized in that extracting the drug features further comprises:
additional drug features may also be provided by a message passing network originally used for quantum chemistry prediction, which performs very well on small-sample models and consists mainly of three steps: message passing, in which for each atom the features of its neighboring elements (atoms or bonds) are propagated into a message vector according to the graph structure; update, in which the embedded atomic features are updated by the message vector; and readout aggregation, in which the atomic features in the molecule are aggregated to obtain the molecular feature vector.
9. The prediction method according to claim 8, characterized in that:
the specific algorithm of the message passing network comprises: first constructing a set of initial states, one for each node in the graph; then letting each node exchange information with its neighbors through message passing, so that after one step the state of each node includes awareness of its direct neighbors; repeating this step, each node obtains information about its second-order neighborhood, and so on until the desired number of message rounds is reached; the states of all nodes are then collected and converted into a feature representing the whole graph. The node update formulas are as follows:
Figure FDA0002818025260000031
Figure FDA0002818025260000032
wherein: mtAs a function of the message, utFor the node update function, N (v) is the set of neighbors of the node in the graph,
Figure FDA0002818025260000033
is the hidden state of the node at time t,
Figure FDA0002818025260000034
for each node, messages are passed from its neighbors and aggregated from its surroundings into a message vector for the corresponding message vector
Figure FDA0002818025260000035
Finally updating the atom hidden state g by the message vectorv
CN202011403977.XA 2020-12-04 2020-12-04 Prediction method based on self-attention mechanism and multi-drug characteristic combination Active CN112435720B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011403977.XA CN112435720B (en) 2020-12-04 2020-12-04 Prediction method based on self-attention mechanism and multi-drug characteristic combination

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011403977.XA CN112435720B (en) 2020-12-04 2020-12-04 Prediction method based on self-attention mechanism and multi-drug characteristic combination

Publications (2)

Publication Number Publication Date
CN112435720A true CN112435720A (en) 2021-03-02
CN112435720B CN112435720B (en) 2021-10-26

Family

ID=74691194

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011403977.XA Active CN112435720B (en) 2020-12-04 2020-12-04 Prediction method based on self-attention mechanism and multi-drug characteristic combination

Country Status (1)

Country Link
CN (1) CN112435720B (en)

Patent Citations (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090298162A1 (en) * 2005-02-16 2009-12-03 Michel Bouvier Biosensors for monitoring receptor-mediated g-protein activation
US20100056622A1 (en) * 2008-08-27 2010-03-04 Lauterbach Edward C Methods of Using Ramelteon to Treat Patients Suffering from a Variety of Neurodegenerative Diseases
WO2013180636A1 (en) * 2012-06-01 2013-12-05 Ridgeview Diagnostics Ab Method for the selection of compounds
CN104239751A (en) * 2014-09-05 2014-12-24 南京理工大学 GPCR(G Protein-Coupled Receptor)-drug interaction prediction method based on postprocessing study
CN106529205A (en) * 2016-11-03 2017-03-22 中南大学 Drug target relation prediction method based on drug substructure and molecule character description information
CN106778032A (en) * 2016-12-14 2017-05-31 南京邮电大学 Ligand molecular magnanimity Feature Selection method in drug design
CN111919258A (en) * 2017-12-01 2020-11-10 韩国科学技术院 Method for predicting drug-to-drug or drug-to-food interaction using structural information of drug
CN108959841A (en) * 2018-04-16 2018-12-07 华南农业大学 A kind of drug targeting albumen effect prediction technique based on DBN algorithm
CN109712678A (en) * 2018-12-12 2019-05-03 中国人民解放军军事科学院军事医学研究院 Relationship Prediction method, apparatus and electronic equipment
CN109887541A (en) * 2019-02-15 2019-06-14 张海平 A kind of target point protein matter prediction technique and system in conjunction with small molecule
CN110289050A (en) * 2019-05-30 2019-09-27 湖南大学 A kind of drug based on figure convolution sum term vector-target interaction prediction method
CN110322962A (en) * 2019-07-03 2019-10-11 重庆邮电大学 A kind of method automatically generating diagnostic result, system and computer equipment
CN110910951A (en) * 2019-11-19 2020-03-24 江苏理工学院 Method for predicting protein and ligand binding free energy based on progressive neural network
CN111222338A (en) * 2020-01-08 2020-06-02 大连理工大学 Biomedical relation extraction method based on pre-training model and self-attention mechanism
CN111081316A (en) * 2020-03-25 2020-04-28 元码基因科技(北京)股份有限公司 Method and device for screening new coronary pneumonia candidate drugs
CN111667884A (en) * 2020-06-12 2020-09-15 天津大学 Convolutional neural network model for predicting protein interactions using protein primary sequences based on attention mechanism
CN111785320A (en) * 2020-06-28 2020-10-16 西安电子科技大学 Drug target interaction prediction method based on multilayer network representation learning
CN111882044A (en) * 2020-08-05 2020-11-03 四川大学 Eutectic prediction method and deep learning framework based on graph neural network
CN111985245A (en) * 2020-08-21 2020-11-24 江南大学 Attention cycle gating graph convolution network-based relation extraction method and system

Non-Patent Citations (11)

* Cited by examiner, † Cited by third party
Title
ALEXANDRE VARNEK等: ""Application of the mol2vec Technology to Large-size Data Visualization and Analysis"", 《MOLECULAR INFORMATICS》 *
BOWEN TANG等: ""A self-attention based message passing neural network for predicting molecular lipophilicity and aqueous solubility"", 《JOURNAL OF CHEMINFORMATICS》 *
JEONGHEE JO等: ""The Message Passing Neural Networks for Chemical Property Prediction on SMILES"", 《METHODS》 *
SABRINA JAEGER等: ""Mol2Vec:Unsupervised Machine Learning Approach with Chemical Intuition"", 《JOURNAL OF CHEMICAL INFORMATION & MODELING》 *
ZHENG S等: ""Predicting drug-protein interaction using quasi-visual question answering system"", 《NATURE MACHINE INTELLIGENCE》 *
ZHENG XIA等: ""Semi-supervised drug-protein interaction prediction from heterogeneous biological spaces"", 《BMC SYSTEMS BIOLOGY》 *
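DING LINSONG et al.: "Research on sequence-based protein-drug interaction prediction", China Master's Theses Full-text Database (Medicine and Health Sciences) *
ZHANG SHENGDONG: "Drug-drug interaction extraction using BiLSTM-CRF with an attention mechanism", China Master's Theses Full-text Database (Medicine and Health Sciences) *
LI WEI et al.: "Applications of deep learning in drug design and discovery", Acta Pharmaceutica Sinica *
LI XUTONG et al.: "Application of artificial intelligence algorithms in predicting drug sensitivity of cells", Chinese Science Bulletin *
CHEN XIN et al.: "Research progress in drug representation learning", Journal of Tsinghua University (Science and Technology) *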

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113066539A (en) * 2021-03-22 2021-07-02 Shanghai SenseTime Intelligent Technology Co., Ltd. Prediction method and related device and equipment
CN113241128A (en) * 2021-04-29 2021-08-10 Tianjin University Molecular property prediction method based on molecular space position coding attention neural network model
CN113241128B (en) * 2021-04-29 2022-05-13 Tianjin University Molecular property prediction method based on molecular space position coding attention neural network model
CN113299354A (en) * 2021-05-14 2021-08-24 Sun Yat-sen University Small molecule representation learning method based on Transformer and enhanced interactive MPNN neural network
CN113299354B (en) * 2021-05-14 2023-06-30 Sun Yat-sen University Small molecule representation learning method based on Transformer and enhanced interactive MPNN neural network
CN114530205A (en) * 2021-08-31 2022-05-24 Tiangong University Organ chip database vectorization scheme for artificial intelligence algorithm
CN113744799A (en) * 2021-09-06 2021-12-03 Central South University End-to-end learning-based compound and protein interaction and affinity prediction method
CN113744799B (en) * 2021-09-06 2023-10-13 Central South University Method for predicting interaction and affinity of compound and protein based on end-to-end learning
CN114792573A (en) * 2022-04-18 2022-07-26 Beijing Baidu Netcom Science and Technology Co., Ltd. Drug combination effect prediction method, model training method, device and equipment
WO2023233396A1 (en) * 2022-05-29 2023-12-07 B. G. Negev Technologies And Applications Ltd., At Ben-Gurion University System and method of predicting efficacy of treatment

Also Published As

Publication number Publication date
CN112435720B (en) 2021-10-26

Similar Documents

Publication Publication Date Title
CN112435720B (en) Prediction method based on self-attention mechanism and multi-drug characteristic combination
Ching et al. Opportunities and obstacles for deep learning in biology and medicine
Wang et al. A computational-based method for predicting drug–target interactions by using stacked autoencoder deep neural network
Hu et al. Large-scale prediction of drug-target interactions from deep representations
Tan et al. Evolutionary computing for knowledge discovery in medical diagnosis
Espejo et al. A survey on the application of genetic programming to classification
Uzma et al. Gene encoder: a feature selection technique through unsupervised deep learning-based clustering for large gene expression data
Velu et al. Visual data mining techniques for classification of diabetic patients
CN113936735A (en) Method for predicting binding affinity of drug molecules and target protein
CN110853756B (en) Esophagus cancer risk prediction method based on SOM neural network and SVM
Huang et al. Drug–drug similarity measure and its applications
CN113571125A (en) Drug target interaction prediction method based on multilayer network and graph coding
CN114882970B (en) Medicine interaction effect prediction method based on pre-training model and molecular diagram
Abasabadi et al. Hybrid feature selection based on SLI and genetic algorithm for microarray datasets
Zhong et al. Clinical charge profiles prediction for patients diagnosed with chronic diseases using Multi-level Support Vector Machine
CN115985503B (en) Cancer prediction system based on ensemble learning
CN114999566B (en) Drug repositioning method and system based on word vector characterization and attention mechanism
CN117198408A (en) Multimode comprehensive integrated drug repositioning system and method
Hu et al. Cancer gene selection with adaptive optimization spiking neural P systems and hybrid classifiers
Dweekat et al. Addressing cancer readmission prediction model drift: A case study
Arteta Albert et al. Intelligent Indexing—Boosting Performance in Database Applications by Recognizing Index Patterns
Alzubaidi et al. A multivariate feature selection framework for high dimensional biomedical data classification
Bonetta Valentino et al. Machine learning using neural networks for metabolomic pathway analyses
Shiuh et al. Prediction of Thyroid Disease using Machine Learning Approaches and Featurewiz Selection
Zhang et al. scCompressSA: dual-channel self-attention based deep autoencoder model for single-cell clustering by compressing gene–gene interactions

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right
Effective date of registration: 2023-05-05
Address after: Room 2871, Building 1, No. 388 Huqiu Road, Huqiu Street, Gusu District, Suzhou City, Jiangsu Province, 215008
Patentee after: Ditu (Suzhou) Biotechnology Co., Ltd.
Address before: Room 6037, Building 3, 112-118 Gaoyi Road, Baoshan District, Shanghai
Patentee before: Shanghai Litu Information Technology Co., Ltd.