CN114743590A - Drug-target affinity prediction system based on graph convolution neural network, computer device and storage medium - Google Patents

Drug-target affinity prediction system based on graph convolution neural network, computer device and storage medium

Info

Publication number
CN114743590A
Authority
CN
China
Prior art keywords
drug
neural network
medicine
extracting
graph
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110028123.6A
Other languages
Chinese (zh)
Inventor
宋弢
田庆雨
刘嘉丽
刘大岩
杜珍珍
钟悦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China University of Petroleum East China
Original Assignee
China University of Petroleum East China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China University of Petroleum East China
Priority to CN202110028123.6A
Publication of CN114743590A
Legal status: Pending

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B15/00 ICT specially adapted for analysing two-dimensional or three-dimensional molecular structures, e.g. structural or functional relations or structure alignment
    • G16B15/30 Drug targeting using structural data; Docking or binding prediction
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/049 Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B15/00 ICT specially adapted for analysing two-dimensional or three-dimensional molecular structures, e.g. structural or functional relations or structure alignment
    • G16B15/20 Protein or domain folding
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00 ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Data Mining & Analysis (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • Medical Informatics (AREA)
  • Computational Linguistics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Biotechnology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Chemical & Material Sciences (AREA)
  • Crystallography & Structural Chemistry (AREA)
  • Epidemiology (AREA)
  • Public Health (AREA)
  • Databases & Information Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioethics (AREA)
  • Medicinal Chemistry (AREA)
  • Pharmacology & Pharmacy (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a system for predicting drug-target affinity based on a graph convolutional neural network, and belongs to the technical field of drug repositioning. The system comprises three channels, which respectively extract a feature vector of the two-dimensional representation of the drug, a feature vector of the contextual relationships within the drug's SMILES expression, and a feature vector of the contextual relationships within the protein sequence. The three feature vectors are then concatenated and input into a fully connected neural network to obtain the predicted drug-target affinity. The inputs of the model are the two-dimensional representation of the drug, the SMILES expression of the drug and the protein sequence of the target; the output is the predicted affinity between the drug and the target.

Description

Drug-target affinity prediction system based on graph convolution neural network, computer device and storage medium
Technical Field
The invention relates to the technical field of drug repositioning prediction, and in particular to a drug-target affinity prediction system based on a graph convolutional neural network, a computer device and a storage medium.
Background
Experimentally confirming new drug-target interactions (DTIs) is not easy, since in vitro experiments are laborious and time-consuming. Even when a validated DTI is used to develop new drugs (including non-approved drugs), the approval of such drugs for human use may take many years, with estimated costs that may exceed $10 billion. Furthermore, developing new drugs requires huge investment yet often fails. Indeed, according to a Thomson Reuters Life Science Consulting report, 108 new and repurposed drugs failed in Phase II trials between 2008 and 2010, 51% of them due to insufficient efficacy. This observation highlights the need for: (1) new and more suitable drug targets, and (2) in silico methods that can improve the efficiency of drug discovery by screening large numbers of drugs at the initial stage of the discovery process, leading to drugs that may exhibit better efficacy. In this regard, methods for predicting DTIs, and in particular for predicting the binding affinity of a drug to its target, are of great interest.
Most methods developed to date use binary classification to predict whether an interaction exists between a drug and its target. However, predicting the strength of binding between a drug and its target is both more informative and more challenging: if the binding strength is insufficient, such a DTI may not be effective. Developing methods that predict drug-target binding affinity is therefore of significant value.
Disclosure of Invention
Embodiments of the present invention provide a drug-target affinity prediction system based on a graph convolutional neural network, a computer device and a storage medium. To provide a basic understanding of some aspects of the disclosed embodiments, a brief summary is given below. This summary is not an extensive overview and is intended neither to identify key or critical elements nor to delineate the scope of such embodiments. Its sole purpose is to present some concepts in a simplified form as a prelude to the more detailed description that is presented later.
According to a first aspect of embodiments of the present invention, a drug-target affinity prediction system based on a graph convolutional neural network is provided.
In some alternative embodiments, the system includes a bidirectional gated recurrent unit (BiGRU) model, a sequence-processing model built from two gated recurrent units (GRUs), one reading the input in the forward direction and the other in the reverse direction. It is a bidirectional recurrent neural network whose units have only two gates, a reset gate and an update gate. The input to the model is the SMILES expression of the drug, and the final output is a 200-dimensional vector representing the SMILES expression.
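A SMILES expression or a protein sequence has to be turned into numbers before a recurrent network can consume it, and the embodiment mentions converting SMILES to vector form against a dictionary. The following is a minimal sketch of such a lookup. The alphabets, the reserved padding index 0 and the helper names `build_vocab` and `encode` are illustrative assumptions; the patent does not list its dictionary or its maximum lengths.

```python
def build_vocab(symbols):
    """Map each symbol to a positive integer; 0 is reserved for padding."""
    return {ch: i + 1 for i, ch in enumerate(symbols)}


def encode(sequence, vocab, max_len):
    """Look up each character, truncate to max_len, and right-pad with zeros."""
    ids = [vocab[ch] for ch in sequence[:max_len]]
    return ids + [0] * (max_len - len(ids))


# Toy alphabets (assumptions): a real SMILES dictionary is much larger.
SMILES_VOCAB = build_vocab("CNO()=c1")
PROTEIN_VOCAB = build_vocab("ACDEFGHIKLMNPQRSTVWY")  # 20 standard amino acids

smiles_ids = encode("CC(=O)N", SMILES_VOCAB, max_len=10)   # acetamide
protein_ids = encode("MKV", PROTEIN_VOCAB, max_len=5)
```

This sketch tokenizes character by character, so two-letter atoms such as Cl or Br would need a dedicated tokenizer, and a character missing from the dictionary raises a `KeyError`.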
Optionally, the gated recurrent unit (GRU) performs thorough feature extraction on a multivariate time series and continuously learns its long-term dependencies. Specifically: first, two gating signals are computed from the state passed on from the previous step and the input of the current node, namely a reset gate, which controls resetting, and an update gate, which controls updating. Once the gating signals are obtained, the reset gate is applied to produce the reset data, which is concatenated with the input of the current node and squashed into the range -1 to 1 by a hyperbolic tangent function. Finally, the state is updated through the update gate, whose gating signal lies in the range 0 to 1; the closer the gating signal is to 1, the more data is memorized.
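The gate sequence described above can be sketched in plain Python as follows. This is an illustrative reading, not the patented implementation: the weight names, the random initialisation, the toy sizes and the convention that an update-gate value near 1 favours the new candidate state are all assumptions. The bidirectional wrapper simply runs one GRU forward and one backward and concatenates their final states.

```python
import math
import random


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def affine(weights, bias, vec):
    """Compute weights @ vec + bias, with weights stored as a list of rows."""
    return [sum(w * v for w, v in zip(row, vec)) + b
            for row, b in zip(weights, bias)]


def init_params(n_in, n_hidden, rng):
    """Random toy parameters for one GRU direction (hypothetical initialisation)."""
    def matrix():
        return [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden + n_in)]
                for _ in range(n_hidden)]
    zeros = [0.0] * n_hidden
    return {"Wr": matrix(), "Wz": matrix(), "Wh": matrix(),
            "br": list(zeros), "bz": list(zeros), "bh": list(zeros)}


def gru_step(x, h_prev, p):
    """One GRU update: gates, reset, tanh candidate, then gated state update."""
    joined = h_prev + x                                        # previous state ++ input
    r = [sigmoid(v) for v in affine(p["Wr"], p["br"], joined)]  # reset gate in (0, 1)
    z = [sigmoid(v) for v in affine(p["Wz"], p["bz"], joined)]  # update gate in (0, 1)
    reset_h = [ri * hi for ri, hi in zip(r, h_prev)]           # data after reset
    cand = [math.tanh(v) for v in affine(p["Wh"], p["bh"], reset_h + x)]
    # Assumed convention: z near 1 takes the candidate, z near 0 keeps the old state.
    return [(1.0 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h_prev, cand)]


def bigru(sequence, p_fwd, p_bwd, n_hidden):
    """Run one GRU forward and one backward, then concatenate the final states."""
    h_f = [0.0] * n_hidden
    for x in sequence:
        h_f = gru_step(x, h_f, p_fwd)
    h_b = [0.0] * n_hidden
    for x in reversed(sequence):
        h_b = gru_step(x, h_b, p_bwd)
    return h_f + h_b


rng = random.Random(0)
tokens = [[0.1], [0.4], [-0.3]]            # three time steps, one feature each
summary = bigru(tokens, init_params(1, 2, rng), init_params(1, 2, rng), n_hidden=2)
```

With a hidden size of 2 per direction the summary has 4 entries; a hidden size of 100 per direction would give a 200-dimensional summary, although the text does not say how its 200 dimensions are obtained.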
Optionally, the system includes a three-layer long short-term memory (LSTM) network, in which each LSTM unit is a neural network model composed of a forget gate, an input gate and an output gate. The input to the model is a protein sequence, and the output is a 192-dimensional vector representing the protein sequence.
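To make the role of the three gates concrete, here is a minimal scalar sketch of the standard LSTM update. It is a textbook formulation offered as an assumption about what the embodiment uses; the weight layout `(weight_h, weight_x, bias)` and the example values are hypothetical, and a real model would use vectors and stack three such layers.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def lstm_step(x, h_prev, c_prev, w):
    """One scalar LSTM update; each w[name] is (weight_h, weight_x, bias)."""
    def pre(name):
        wh, wx, b = w[name]
        return wh * h_prev + wx * x + b

    f = sigmoid(pre("forget"))     # how much of the old cell state to keep
    i = sigmoid(pre("input"))      # how much of the new candidate to write
    o = sigmoid(pre("output"))     # how much of the cell state to expose
    g = math.tanh(pre("cell"))     # candidate cell content in (-1, 1)
    c = f * c_prev + i * g
    h = o * math.tanh(c)
    return h, c


def run_lstm(sequence, w):
    """Feed a sequence of scalars through the cell, starting from zero state."""
    h, c = 0.0, 0.0
    for x in sequence:
        h, c = lstm_step(x, h, c, w)
    return h, c


# Hypothetical weights: forget gate saturated open, input gate saturated shut.
weights = {"forget": (0.0, 0.0, 20.0), "input": (0.0, 0.0, -20.0),
           "output": (0.5, 0.5, 0.0), "cell": (0.5, 0.5, 0.0)}
h_keep, c_keep = lstm_step(x=1.0, h_prev=0.0, c_prev=0.7, w=weights)
h_last, c_last = run_lstm([0.2, -0.1, 0.5], weights)
```

With the forget gate near 1 and the input gate near 0, the cell state is carried through almost unchanged, which is the mechanism that lets an LSTM retain information across long protein sequences.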
Optionally, the system includes four graph neural network (GNN) models: a graph convolutional network (GCN), a graph attention network (GAT), a graph isomorphism network (GIN), and a combined graph convolutional and graph attention network (GCN_GAT). The input of the model is a two-dimensional molecular graph converted from a SMILES string, and the final output is a 128-dimensional vector representing the two-dimensional molecular graph.
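As an illustration of how a molecular graph becomes a fixed-length vector, the sketch below applies one graph convolution layer in its widely used symmetric-normalised form, ReLU(D^-1/2 (A + I) D^-1/2 H W), followed by mean pooling over atoms. The patent does not state its propagation rule or pooling, so both are assumptions, as are the hand-written ethanol graph, the one-hot atom features and the weight matrix.

```python
import math


def normalized_adjacency(n_atoms, bonds):
    """Build D^-1/2 (A + I) D^-1/2 from an undirected bond list."""
    a = [[1.0 if i == j else 0.0 for j in range(n_atoms)] for i in range(n_atoms)]
    for i, j in bonds:
        a[i][j] = a[j][i] = 1.0
    deg = [sum(row) for row in a]                      # degree including self-loop
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n_atoms)]
            for i in range(n_atoms)]


def matmul(p, q):
    return [[sum(p[i][k] * q[k][j] for k in range(len(q)))
             for j in range(len(q[0]))] for i in range(len(p))]


def gcn_layer(a_hat, features, weight):
    """Aggregate neighbour features, project them, and apply ReLU."""
    out = matmul(matmul(a_hat, features), weight)
    return [[max(0.0, v) for v in row] for row in out]


def mean_pool(node_features):
    """Average over atoms to get one vector for the whole molecule."""
    n = len(node_features)
    return [sum(col) / n for col in zip(*node_features)]


# Ethanol, C-C-O, written by hand (a real pipeline would parse the SMILES string).
bonds = [(0, 1), (1, 2)]
atom_features = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]   # one-hot: [carbon, oxygen]
weight = [[1.0, 0.5, 0.0], [0.0, 1.0, 2.0]]            # hypothetical 2 -> 3 projection

a_hat = normalized_adjacency(3, bonds)
graph_vec = mean_pool(gcn_layer(a_hat, atom_features, weight))
```

Here the pooled vector has 3 entries; in the described system the corresponding output would have 128.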
Optionally, the system includes a two-layer fully connected neural network. Its input is the vector formed by concatenating the three output vectors above, and its output is the drug-target affinity.
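The concatenation step fixes the input width of the prediction head: 128 + 200 + 192 = 520. The sketch below shows that arithmetic and a two-layer fully connected head in plain Python. The hidden width of 64, the ReLU activation, the random initialisation and the function names are assumptions for illustration; the text gives only the three input dimensions and the number of layers.

```python
import random


def linear(vec, weights, bias):
    """Compute weights @ vec + bias, with weights stored as a list of rows."""
    return [sum(w * v for w, v in zip(row, vec)) + b
            for row, b in zip(weights, bias)]


def relu(vec):
    return [max(0.0, v) for v in vec]


def init_head(n_in, n_hidden, rng):
    """Random parameters for a two-layer head (hypothetical initialisation)."""
    scale = 1.0 / n_in ** 0.5
    return {
        "W1": [[rng.uniform(-scale, scale) for _ in range(n_in)]
               for _ in range(n_hidden)],
        "b1": [0.0] * n_hidden,
        "W2": [[rng.uniform(-scale, scale) for _ in range(n_hidden)]],
        "b2": [0.0],
    }


def predict_affinity(graph_vec, smiles_vec, protein_vec, head):
    """Concatenate the three channel outputs and regress a single affinity value."""
    joint = graph_vec + smiles_vec + protein_vec       # list concatenation
    hidden = relu(linear(joint, head["W1"], head["b1"]))
    return linear(hidden, head["W2"], head["b2"])[0]


rng = random.Random(0)
graph_vec = [rng.random() for _ in range(128)]     # stand-in for the GNN channel
smiles_vec = [rng.random() for _ in range(200)]    # stand-in for the Bi-GRU channel
protein_vec = [rng.random() for _ in range(192)]   # stand-in for the LSTM channel
head = init_head(n_in=128 + 200 + 192, n_hidden=64, rng=rng)
score = predict_affinity(graph_vec, smiles_vec, protein_vec, head)
```

The score from an untrained head is meaningless; the point is only that the three channels meet in a single 520-dimensional vector before regression.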
Optionally, the model is trained on an existing drug-target affinity data set, which is divided into an 80% training set and a 20% test set; the parameters of the model are refined through the training process.
According to a second aspect of embodiments of the present invention, a computer device is provided.
In some optional embodiments, the computer device includes a memory, a graphics card, a central processing unit, and an executable program that is stored in the memory and can be processed in parallel by the central processing unit and the graphics card. When executing the program, the central processing unit implements the following steps: constructing a drug-target affinity prediction model based on a graph convolutional neural network, wherein the model comprises the following steps: extracting drug features, extracting target features, extracting two-dimensional drug features, and predicting drug-target affinity. First, the contextual relationships of the drug are extracted with a Bi-GRU network; next, the two-dimensional representation features of the drug are extracted with a graph convolutional neural network; at the same time, the contextual relationships of the protein are extracted with an LSTM network; finally, the drug-target affinity is predicted with a fully connected neural network.
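A minimal sketch of the 80%/20% division follows. It assumes a seeded random shuffle over individual drug-target pairs; the text does not say whether pairs are assigned at random or grouped by drug or target, and the record layout and the name `split_dataset` are hypothetical.

```python
import random


def split_dataset(records, test_fraction=0.2, seed=0):
    """Shuffle a copy of the records and return (training set, test set)."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical records: (drug id, target id, measured affinity).
pairs = [("drug%d" % i, "target%d" % (i % 3), 5.0 + 0.1 * i) for i in range(10)]
train_set, test_set = split_dataset(pairs)
```

Fixing the seed keeps the split reproducible, so that later evaluation runs are scored on the same held-out pairs.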
The technical solution provided by the embodiments of the present invention has the following beneficial effects:
The traditional drug development process has a long development cycle and high cost, and its capital consumption keeps increasing. To address this, the invention provides a system for predicting drug-target affinity, which can reduce the time and material costs of drug development and narrow the range of candidate drugs.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention, as claimed.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and together with the description, serve to explain the principles of the invention.
FIG. 1 is a block diagram illustrating a system for predicting drug target affinity based on a graph-convolution neural network, according to an exemplary embodiment.
Detailed Description
The technical solution of the present invention is further described below with reference to specific embodiments.
As shown in FIG. 1, the system for predicting drug-target affinity based on the graph convolutional neural network described in this embodiment specifically includes:
1) The Davis and KIBA data sets are selected, and each is divided so that 80% is used as the training set and 20% as the test set.
2) Using the RDKit tool, each SMILES string is converted into a two-dimensional matrix representation; each SMILES string is also converted into vector form by lookup against a dictionary; the protein sequences are encoded; and all data are saved in .pt files.
3) The data in the .pt files are loaded. The two-dimensional matrix representation of the SMILES string is input into one of the four graph neural networks, selectable by the user, to obtain a 128-dimensional feature vector; the vector representation of the SMILES string is input into the Bi-GRU network to obtain a 200-dimensional feature vector; and the vector representation of the protein sequence is input into the LSTM network to obtain a 192-dimensional feature vector.
4) The three feature vectors are concatenated and input into a two-layer fully connected neural network to obtain the predicted drug-target affinity. The difference between the predicted value and the true value is then measured with a mean squared error (MSE) loss function, and the concordance index (CI) and MSE are used to characterize the relationship between predicted and true values.
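The two evaluation quantities named in step 4) can be computed as sketched below. The CI shown is the common pairwise definition, in which a pair of samples with different true affinities counts as concordant when the predictions are ordered the same way and a tied prediction counts one half; the patent does not spell out its exact formula, so this is an assumption, and the sample values are invented for illustration.

```python
def mse(y_true, y_pred):
    """Mean squared error between true and predicted affinities."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def concordance_index(y_true, y_pred):
    """Fraction of comparable pairs whose predictions are ranked correctly."""
    concordant = 0.0
    comparable = 0
    for i in range(len(y_true)):
        for j in range(i + 1, len(y_true)):
            if y_true[i] == y_true[j]:
                continue                      # equal true values are not comparable
            comparable += 1
            hi, lo = (i, j) if y_true[i] > y_true[j] else (j, i)
            if y_pred[hi] > y_pred[lo]:
                concordant += 1.0
            elif y_pred[hi] == y_pred[lo]:
                concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Invented values: only the last two predictions are in the wrong order.
y_true = [5.0, 6.2, 7.1, 8.3]
y_pred = [5.1, 6.0, 7.5, 7.4]
loss = mse(y_true, y_pred)
ci = concordance_index(y_true, y_pred)
```

MSE penalises the size of each error, whereas CI looks only at ranking, so the two together describe both how close and how well-ordered the predictions are. The pairwise loop is quadratic in the number of samples, which is adequate for a sketch but slow on large test sets.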
Those of skill in the art would appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the technical solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention. It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the embodiments disclosed herein, it should be understood that the disclosed methods, articles of manufacture (including but not limited to devices, apparatuses, etc.) may be implemented in other ways. For example, the above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form. The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment. In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
It should be understood that the flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions. The present invention is not limited to the procedures and structures that have been described above and shown in the drawings, and various modifications and changes may be made without departing from the scope thereof. The scope of the invention is limited only by the appended claims.

Claims (7)

1. A system for predicting drug-target affinity based on a graph convolutional neural network, belonging to the technical field of drug repositioning. The system comprises three channels, which respectively extract a feature vector of the two-dimensional representation of the drug, a feature vector of the contextual relationships within the drug's SMILES expression, and a feature vector of the contextual relationships within the protein sequence. The three feature vectors are then concatenated and input into a fully connected neural network to obtain the predicted drug-target affinity. The inputs of the model are the two-dimensional representation of the drug, the SMILES expression of the drug and the protein sequence of the target; the output is the predicted affinity between the drug and the target.
2. The system of claim 1, wherein features are extracted by a three-channel method in which each channel is different. Specifically, to extract the feature vector of the two-dimensional drug representation, four graph neural networks are used, namely a graph convolutional network, a graph attention network, a graph isomorphism network, and a combined graph convolutional and graph attention network; to extract the contextual feature vector of the drug SMILES representation, the SMILES string is input into a three-layer Bi-GRU network; and to extract the contextual feature vector of the protein sequence representation, the protein sequence is input into a three-layer LSTM network.
3. The system of claim 2, wherein the system input is a vector representation of a SMILES sequence and a vector representation of a protein sequence.
4. The system of claim 1, wherein the three-channel model is trained using existing drug-target affinity score data to obtain refined model parameters.
5. The system of claim 1, wherein, after the three feature vectors are concatenated, they are input into a fully connected network to predict drug-target affinity, and the parameters are optimized by back-propagation.
6. A computer device comprising a memory, a graphics card, a central processing unit, and an executable program that is stored in the memory and can be processed in parallel by the central processing unit and the graphics card, the central processing unit implementing the following steps when executing the program: constructing a drug-target affinity prediction model based on a graph convolutional neural network, wherein the model comprises the following steps: extracting drug features, extracting target features, extracting two-dimensional drug features, and predicting drug-target affinity. First, the contextual relationships of the drug are extracted with a Bi-GRU network; next, the two-dimensional representation features of the drug are extracted with a graph convolutional neural network; at the same time, the contextual relationships of the protein are extracted with an LSTM network; finally, the drug-target affinity is predicted with a fully connected neural network.
7. The computer device of claim 6, wherein, during the training phase, the back-propagation algorithm is used to continuously optimize the parameters of the model and improve its fitting ability.
CN202110028123.6A 2021-01-07 2021-01-07 Drug-target affinity prediction system based on graph convolution neural network, computer device and storage medium Pending CN114743590A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110028123.6A CN114743590A (en) 2021-01-07 2021-01-07 Drug-target affinity prediction system based on graph convolution neural network, computer device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110028123.6A CN114743590A (en) 2021-01-07 2021-01-07 Drug-target affinity prediction system based on graph convolution neural network, computer device and storage medium

Publications (1)

Publication Number Publication Date
CN114743590A (en) 2022-07-12

Family

ID=82274242

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110028123.6A Pending CN114743590A (en) 2021-01-07 2021-01-07 Drug-target affinity prediction system based on graph convolution neural network, computer device and storage medium

Country Status (1)

Country Link
CN (1) CN114743590A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115713965A (en) * 2022-10-28 2023-02-24 兰州大学 Computing method for predicting compound-protein affinity based on GECo model
CN117435995A (en) * 2023-12-20 2024-01-23 福建理工大学 Biological medicine classification method based on residual map network
CN117435995B (en) * 2023-12-20 2024-03-19 福建理工大学 Biological medicine classification method based on residual map network

Similar Documents

Publication Publication Date Title
CN110689920B (en) Protein-ligand binding site prediction method based on deep learning
US20160350649A1 (en) Method and apparatus of learning neural network via hierarchical ensemble learning
US9984323B2 (en) Compositional prototypes for scalable neurosynaptic networks
CN111507768A (en) Determination method of potential user, model training method and related device
CN114743590A (en) Drug-target affinity prediction system based on graph convolution neural network, computer device and storage medium
CN112070277A (en) Hypergraph neural network-based drug-target interaction prediction method
Keceli et al. Deep learning-based multi-task prediction system for plant disease and species detection
CN112951328B (en) MiRNA-gene relation prediction method and system based on deep learning heterogeneous information network
CN112562791A (en) Drug target action depth learning prediction system based on knowledge graph, computer equipment and storage medium
Tavakoli Modeling genome data using bidirectional LSTM
CN112652358A (en) Drug recommendation system, computer equipment and storage medium for regulating and controlling disease target based on three-channel deep learning
CN112530515A (en) Novel deep learning model for predicting protein affinity of compound, computer equipment and storage medium
CN112562781A (en) Novel coding scheme, computer device and storage medium for predicting compound protein affinity based on deep learning
CN112530514A (en) Novel depth model, computer device, storage medium for predicting compound protein interaction based on deep learning method
CN112542211A (en) Method for predicting protein affinity of compound based on single attention mechanism, computer device and storage medium
CN112582020A (en) Method for predicting compound protein affinity based on edge attention mechanism, computer device and storage medium
CN109214515A (en) A kind of deep neural network inference method and calculate equipment
Koeppe et al. Explainable artificial intelligence for mechanics: physics-informing neural networks for constitutive models
CN105659260A (en) Dynamically assigning and examining synaptic delay
CN112132269A (en) Model processing method, device, equipment and storage medium
CN117012303A (en) DTI prediction method, system, storage medium and device based on reinforcement learning
KR20180087069A (en) A method for predicting drug-target interactions via self-training
CN117037917A (en) Cell type prediction model training method, cell type prediction method and device
CN116403657A (en) Drug response prediction method and device, storage medium and electronic device
Fan et al. A lightweight multiscale convolutional neural network for garbage sorting

Legal Events

Date Code Title Description
PB01 Publication