CN115910196A - Method and system for predicting drug-target protein interaction - Google Patents

Publication number
CN115910196A
Authority
CN
China
Prior art keywords
target protein
drug
molecule
molecules
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211322181.0A
Other languages
Chinese (zh)
Inventor
张越
胡玉晴
刘晓勇
赵慧民
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Polytechnic Normal University
Original Assignee
Guangdong Polytechnic Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Polytechnic Normal University
Priority to CN202211322181.0A
Publication of CN115910196A

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A: TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00: Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10: Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Landscapes

  • Investigating Or Analysing Biological Materials (AREA)

Abstract

The invention discloses a method and a system for predicting drug-target protein interaction. The method comprises the following steps: acquiring molecular data of a drug to be predicted and molecular data of a target protein to be predicted, and preprocessing the molecular data; extracting features from the preprocessed drug molecule data and target protein molecule data; screening and concatenating the latent features of the drug molecules and the target protein molecules; and predicting the interaction relationship between the drug molecule and the target protein molecule. The method extracts the latent features of the drug molecules and the target protein molecules to be predicted with a variational autoencoder, screens these latent features with an attention mechanism, and predicts the interaction relationship between the drug molecules and the target protein molecules with a multilayer perceptron, so that the prediction accuracy is improved and the interaction relationship between the drug and the target protein can be better predicted.

Description

Method and system for predicting drug-target protein interaction
Technical Field
The invention relates to the technical field of biological medicines, in particular to a method and a system for predicting drug-target protein interaction.
Background
A drug-target interaction is the binding between a drug and its target, and it is very important for the discovery of new drugs and the repurposing of existing drugs. As new drugs are discovered, the field of drug development keeps expanding, and there is increasing interest in the repositioning of existing drugs and in new interactions of approved drugs.
At present, methods for predicting drug-target protein interactions can be divided into three categories: molecular docking-based methods, machine learning-based methods and deep learning-based methods. Molecular docking-based methods, which search for the best binding site on the protein structure, are time-consuming, and many data sets lack the three-dimensional structure of the protein, which makes the final drug-protein interaction prediction less accurate. Machine learning-based methods usually require manually annotated features, which demand relevant experience, professional knowledge and time-consuming work; manual operation may also introduce errors, so that the final prediction deviates from the true value. Deep learning-based methods are currently popular and are applied in various fields of bioinformatics; through the deep learning architecture and network parameters they can further improve drug-protein interaction prediction performance, and they can learn high-dimensional features of drugs and proteins and obtain more accurate information, thereby achieving higher accuracy.
Disclosure of Invention
The invention aims to overcome the defects of the prior art, and provides a method and a system for predicting the interaction between a drug and a target protein.
The invention provides a method for predicting drug-target protein interaction, which comprises the following steps:
acquiring drug molecule data of a drug to be predicted and target protein molecule data of a target protein to be predicted, and preprocessing both to obtain preprocessed drug molecule data and preprocessed target protein molecule data;
extracting features from the preprocessed drug molecule data and target protein molecule data to obtain latent (implicit) features of the drug molecules and latent features of the target protein molecules;
screening and concatenating the latent features of the drug molecules and the latent features of the target protein molecules to obtain the feature relationship between the drug molecules and the target protein molecules;
predicting the interaction relationship between the drug molecule and the target protein molecule.
The preprocessing of the drug molecule data of the drug to be predicted and the target protein molecule data of the target protein to be predicted comprises the following steps:
acquiring a drug molecular structure sequence of the drug to be predicted from the drug molecule data of the drug to be predicted, wherein the drug molecular structure sequence is written according to the Simplified Molecular Input Line Entry System (SMILES);
and performing a numeric vector conversion on the drug molecular structure sequence of the drug to be predicted based on an embedding layer to obtain the drug molecular structure sequence vector of the drug to be predicted.
The preprocessing of the drug molecule data of the drug to be predicted and the target protein molecule data of the target protein to be predicted further comprises:
obtaining a protein sequence of a target protein molecule of the target protein to be predicted from the target protein molecule data of the target protein to be predicted;
and performing a numeric vector conversion on the protein sequence of the target protein molecule of the target protein to be predicted based on the embedding layer to obtain the target protein sequence vector of the target protein to be predicted.
The feature extraction on the preprocessed drug molecule data and target protein molecule data comprises the following steps:
performing feature extraction on the preprocessed drug molecule data and target protein molecule data based on a variational autoencoder.
The feature extraction on the preprocessed drug molecule data and target protein molecule data based on the variational autoencoder comprises the following steps:
performing feature extraction on the preprocessed drug molecule data and target protein molecule data based on a first gated convolutional network, a first dropout function, a first activation function, a second gated convolutional network, a second dropout function, a second activation function and a third gated convolutional network of the encoder of the variational autoencoder to obtain features of the drug molecules and features of the target protein molecules;
compressing the features of the drug molecules and the features of the target protein molecules based on a pooling layer, and performing a Gaussian distribution calculation based on a first fully connected layer and a second fully connected layer to obtain the latent features of the drug molecules and the latent features of the target protein molecules.
The feature extraction on the preprocessed drug molecule data and target protein molecule data based on the variational autoencoder further comprises the following steps:
reconstructing the latent features of the drug molecules and the latent features of the target protein molecules with the decoder of the variational autoencoder to obtain original feature data of the drug molecule data and the target protein molecule data;
and performing feature extraction on the original feature data of the drug molecule data and the target protein molecule data with the encoder of the variational autoencoder to obtain more accurate latent features of the drug molecules and the target protein molecules.
The reconstructing of the latent features of the drug molecules and the latent features of the target protein molecules with the decoder of the variational autoencoder comprises the following steps:
converting the latent features of the drug molecules and the latent features of the target protein molecules into a new dimension based on a third fully connected layer;
deconvolving the latent features of the drug molecules and the latent features of the target protein molecules in the new dimension based on a first deconvolution layer, a third activation function, a second deconvolution layer, a fourth activation function, a third deconvolution layer and a fifth activation function, and outputting the deconvolution results;
and performing dimension conversion on the deconvolution results based on a fourth fully connected layer to obtain the original feature data of the drug molecule data and the target protein molecule data.
The screening and concatenating of the latent features of the drug molecules and the latent features of the target protein molecules comprises the following steps:
screening and concatenating the latent features of the drug molecules and the latent features of the target protein molecules based on an attention mechanism to obtain more refined latent features of the drug molecules and the target protein molecules.
The predicting of the interaction relationship between the drug molecule and the target protein molecule comprises:
predicting the feature relationship between the drug molecules and the target protein molecules based on a multilayer perceptron.
The invention also provides a drug-target protein interaction prediction system, which comprises a memory, a processor and a computer program stored on the memory, wherein the processor executes the computer program to realize the drug-target protein interaction prediction method.
The method converts the acquired drug molecule data of the drug to be predicted and the target protein molecule data of the target protein to be predicted into numeric vectors of a unified dimension, which facilitates computer processing and the subsequent feature extraction. Repeated feature extraction and reconstruction with a variational autoencoder yield more accurate latent features of the drug molecules and the target protein molecules to be predicted. Screening these latent features with an attention mechanism gives a more refined feature representation. Predicting the interaction relationship between the drug molecules and the target protein molecules with a multilayer perceptron improves the prediction accuracy, so that the interaction relationship between the drug and the target protein can be better predicted.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
FIG. 1 is a flow chart of a method for predicting drug-target protein interactions in an embodiment of the invention;
FIG. 2 is a structural flow diagram for prediction of drug-target protein interactions in an example of the invention;
FIG. 3 is a flowchart illustrating the pre-processing of drug molecule data of the drug to be predicted and target protein molecule data of the target protein to be predicted according to an embodiment of the present invention;
FIG. 4 is a flowchart of feature extraction performed on the preprocessed drug molecule data and target protein molecule data based on the variational autoencoder in an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In the present invention, it is to be understood that terms such as "including" or "having," etc., are intended to indicate the presence of the disclosed features, numbers, steps, actions, components, parts, or combinations thereof, and are not intended to preclude the possibility that one or more other features, numbers, steps, actions, components, parts, or combinations thereof may be present or added.
It should be noted that the embodiments and features of the embodiments may be combined with each other without conflict. The present invention will be described in detail below with reference to the embodiments with reference to the attached drawings.
The embodiment of the invention relates to a method for predicting drug-target protein interaction, which comprises the following steps: acquiring drug molecule data of a drug to be predicted and target protein molecule data of a target protein to be predicted, and preprocessing both to obtain preprocessed drug molecule data and preprocessed target protein molecule data; extracting features from the preprocessed drug molecule data and target protein molecule data to obtain latent features of the drug molecules and latent features of the target protein molecules; screening and concatenating the latent features of the drug molecules and the latent features of the target protein molecules to obtain the feature relationship between the drug molecules and the target protein molecules; and predicting the interaction relationship between the drug molecule and the target protein molecule.
In an optional implementation of this embodiment, FIG. 1 shows a flowchart of the method for predicting drug-target protein interaction in this embodiment of the present invention, and FIG. 2 shows a structural flow diagram of the drug-target protein interaction prediction. The method includes the following steps:
S101, acquiring drug molecule data of a drug to be predicted and target protein molecule data of a target protein to be predicted, and preprocessing both to obtain preprocessed drug molecule data and preprocessed target protein molecule data;
In an optional implementation of this embodiment, FIG. 3 shows a flowchart of preprocessing the drug molecule data of the drug to be predicted and the target protein molecule data of the target protein to be predicted, which includes the following steps:
S301, acquiring the drug molecular structure sequence of the drug to be predicted and the protein sequence of the target protein molecule of the target protein to be predicted from the drug molecule data of the drug to be predicted and the target protein molecule data of the target protein to be predicted;
In an optional implementation of this embodiment, the drug molecular structure sequence of the drug to be predicted is obtained from the drug molecule data of the drug to be predicted, and is then converted into integer form according to an established drug character dictionary.
Specifically, the drug molecular structure sequence of the drug to be predicted is a character string, for example: CC1=C2C=C(C=CC2=NN1)C3=…. The sequence is written in SMILES (Simplified Molecular Input Line Entry System), a specification for explicitly describing a molecular structure with an ASCII (American Standard Code for Information Interchange) character string, which is used in the embodiment of the present invention to represent the drug molecular structure sequence of the drug to be predicted.
Taking the drug sequence CC1=C2C=C(C=CC2=NN1)C3=… as an example, "C" represents a carbon atom, "N" represents a nitrogen atom, and "=" represents a double bond. A digit marks a ring in the structure: the ring is opened by breaking one bond, and the two atoms at the break are labeled with the same digit to indicate that they are bonded to each other. In addition, an atom whose valence is not filled is completed with hydrogen atoms, which are usually omitted in actual writing.
In an optional implementation of this embodiment, the drug molecular structure sequence of the drug to be predicted is converted into integer form according to an established drug character dictionary.
Specifically, each character in the drug molecular structure sequence of the drug to be predicted is looked up in the established drug character dictionary and converted into the corresponding integer; the size of the drug character dictionary is 64.
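The dictionary lookup described above can be sketched as follows. This is a minimal illustration and not the dictionary of the invention: the text states only that the drug character dictionary has 64 entries, so the vocabulary string SMILES_CHARS (which is smaller than 64), the index assignment, the padding index 0 and the function name encode_sequence are all assumptions made for the example.

```python
# Hypothetical character vocabulary; the text only gives the dictionary size (64),
# not its contents. Index 0 is reserved here for padding (an assumption).
SMILES_CHARS = "#%()+-./0123456789=@ABCFHIKLMNOPRSTVZ[\\]abcdeilnorstu"
DRUG_DICT = {ch: i + 1 for i, ch in enumerate(SMILES_CHARS)}


def encode_sequence(sequence, dictionary, max_len=None):
    """Map each character to its integer index; optionally pad or truncate to max_len."""
    ids = [dictionary[ch] for ch in sequence]  # raises KeyError on unknown characters
    if max_len is not None:
        ids = ids[:max_len] + [0] * max(0, max_len - len(ids))
    return ids


smiles_prefix = "CC1=C2C=C"  # first characters of the example string in the text
drug_ids = encode_sequence(smiles_prefix, DRUG_DICT, max_len=12)
print(drug_ids)
```

The same function would apply to a protein sequence if it were given a protein character dictionary instead.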
In an optional implementation of this embodiment, the protein sequence of the target protein molecule of the target protein to be predicted is obtained from the target protein molecule data of the target protein to be predicted.
The protein sequence of the target protein molecule of the target protein to be predicted is then converted into integer form according to an established protein character dictionary.
Specifically, each character in the protein sequence of the target protein molecule of the target protein to be predicted represents an amino acid.
In an optional implementation of this embodiment, the protein sequence of the target protein molecule of the target protein to be predicted is converted into integer form according to the established protein character dictionary.
Specifically, each character in the protein sequence of the target protein molecule of the target protein to be predicted is looked up in the established protein character dictionary and converted into the corresponding integer; the size of the protein character dictionary is 25.
Specifically, in the protein sequence of the target protein molecule of the target protein to be predicted, "A" represents alanine and is converted to "1"; "C" represents cysteine and is converted to "2"; "E" represents glutamic acid (glutamate) and is converted to "4"; and so on, so that the protein sequence is converted into integer form.
S302, performing a numeric vector conversion on the drug molecular structure sequence of the drug to be predicted and the protein sequence of the target protein molecule of the target protein to be predicted based on the embedding layer to obtain the drug molecular structure sequence vector of the drug to be predicted and the target protein sequence vector of the target protein to be predicted.
In an optional implementation of this embodiment, the drug molecular structure sequence of the drug to be predicted and the protein sequence of the target protein molecule of the target protein to be predicted are each converted into 128-dimensional numeric vectors by an embedding layer (Embedding). For the drug molecular structure sequence, the Embedding parameters are num_embeddings (the total number of words in the dictionary) = 64 and embedding_dim (the dimension of the vector created for each word) = 128; for the protein sequence of the target protein molecule, num_embeddings = 25 and embedding_dim = 128.
The drug molecular structure sequence of the drug to be predicted and the protein sequence of the target protein molecule of the target protein to be predicted are thus first converted into integer form and then mapped by the embedding layer to a unified dimension of 128, which facilitates the subsequent feature extraction and improves data processing efficiency.
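As a rough illustration of what the embedding step does, the sketch below implements a lookup table in plain Python rather than with a deep learning library. The class name EmbeddingTable, the random initialisation and the example token ids are assumptions; only the sizes (64 and 25 entries, 128 dimensions) come from the text.

```python
import random


class EmbeddingTable:
    """Lookup table mapping integer token ids to dense vectors (randomly initialised)."""

    def __init__(self, num_embeddings, embedding_dim, seed=0):
        rng = random.Random(seed)
        self.weight = [
            [rng.gauss(0.0, 1.0) for _ in range(embedding_dim)]
            for _ in range(num_embeddings)
        ]

    def __call__(self, token_ids):
        # One row per token: the result has shape [len(token_ids)][embedding_dim].
        return [self.weight[i] for i in token_ids]


# Sizes taken from the text: 64 drug symbols, 25 protein symbols, 128 dimensions each.
drug_embedding = EmbeddingTable(num_embeddings=64, embedding_dim=128)
protein_embedding = EmbeddingTable(num_embeddings=25, embedding_dim=128, seed=1)

drug_vectors = drug_embedding([5, 5, 12, 7])    # hypothetical integer-encoded SMILES
protein_vectors = protein_embedding([1, 2, 4])  # the ids given in the text for A, C, E
print(len(drug_vectors), len(drug_vectors[0]))
print(len(protein_vectors), len(protein_vectors[0]))
```

In a trained model the table entries would be learned together with the rest of the network; here they are fixed random values.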
S102, extracting features from the preprocessed drug molecule data and the preprocessed target protein molecule data to obtain the latent features of the drug molecules and the latent features of the target protein molecules;
The method performs feature extraction on the preprocessed drug molecule data and target protein molecule data based on a variational autoencoder to obtain the latent features of the drug molecules and the latent features of the target protein molecules.
In an optional implementation of this embodiment, FIG. 4 shows a flowchart of the feature extraction performed on the preprocessed drug molecule data and target protein molecule data based on the variational autoencoder, which includes the following steps:
S401, performing feature extraction with the encoder of the variational autoencoder;
In an optional implementation of this embodiment, feature extraction is performed on the preprocessed drug molecule data and target protein molecule data based on a first gated convolutional network, a first dropout function, a first activation function, a second gated convolutional network, a second dropout function, a second activation function and a third gated convolutional network of the encoder of the variational autoencoder, so as to obtain features of the drug molecules and features of the target protein molecules;
Specifically, the first, second and third gated convolutional networks are convolutional neural networks with a gating mechanism (Gated Convolutional Networks), and they are used to extract features from the preprocessed drug molecule data and target protein molecule data.
Further, the first dropout function is located after the first gated convolutional network, and the second dropout function is located after the second gated convolutional network. During training of a deep neural network, the dropout function randomly and temporarily discards a portion of the neuron nodes with a certain probability, which has a regularizing effect, prevents the deep neural network from overfitting, and improves the generalization ability, generality and robustness of the model.
Furthermore, the first activation function is located after the first dropout function, and the second activation function is located after the second dropout function. The activation function is ReLU (rectified linear unit, also called the linear rectification function); it increases the nonlinearity between the layers of the deep neural network, reduces the amount of computation, reduces the interdependence among parameters, and alleviates overfitting.
It should be noted that the number of filters in the second gated convolutional network is 2 times that of the first gated convolutional network, and the number of filters in the third gated convolutional network is 3 times that of the first gated convolutional network.
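The encoder stack described above (gated convolution, dropout, ReLU, repeated, followed by a final gated convolution) can be sketched in plain Python as follows. This is an illustrative sketch only: the gating form conv_a(x) * sigmoid(conv_b(x)) is the common gated-linear-unit formulation and is assumed here, since the text does not give the exact gate, and the kernel size, filter counts, dropout rate and all function names are hypothetical. Only the layer order and the 1x, 2x, 3x filter ratio come from the text.

```python
import math
import random


def conv1d(seq, kernels, bias):
    """Valid 1-D convolution. seq: [length][in_ch]; kernels: [out_ch][k][in_ch]."""
    k = len(kernels[0])
    return [
        [
            bias[f] + sum(kern[j][c] * seq[t + j][c]
                          for j in range(k) for c in range(len(seq[0])))
            for f, kern in enumerate(kernels)
        ]
        for t in range(len(seq) - k + 1)
    ]


def gated_conv1d(seq, params):
    """Gated convolution (assumed form): conv_a(x) * sigmoid(conv_b(x)), element-wise."""
    a = conv1d(seq, params["w_a"], params["b_a"])
    g = conv1d(seq, params["w_b"], params["b_b"])
    return [[x / (1.0 + math.exp(-y)) for x, y in zip(ra, rg)] for ra, rg in zip(a, g)]


def dropout(seq, p, rng, training):
    """Inverted dropout: zero each value with probability p, during training only."""
    if not training:
        return seq
    return [[0.0 if rng.random() < p else v / (1.0 - p) for v in row] for row in seq]


def relu(seq):
    return [[max(0.0, v) for v in row] for row in seq]


def init_params(in_ch, out_ch, k, rng):
    def kernels():
        return [[[rng.uniform(-0.1, 0.1) for _ in range(in_ch)] for _ in range(k)]
                for _ in range(out_ch)]
    return {"w_a": kernels(), "b_a": [0.0] * out_ch,
            "w_b": kernels(), "b_b": [0.0] * out_ch}


def encode(seq, layers, rng, p=0.1, training=False):
    """Gated conv -> dropout -> ReLU for all but the last layer, then a final gated conv."""
    h = seq
    for i, params in enumerate(layers):
        h = gated_conv1d(h, params)
        if i < len(layers) - 1:
            h = relu(dropout(h, p, rng, training))
    return h


rng = random.Random(0)
embed_dim, base_filters, kernel_size = 8, 4, 3  # hypothetical sizes, kept small
filters = [base_filters, 2 * base_filters, 3 * base_filters]  # 1x, 2x, 3x as in the text
layers, in_ch = [], embed_dim
for out_ch in filters:
    layers.append(init_params(in_ch, out_ch, kernel_size, rng))
    in_ch = out_ch

x = [[rng.gauss(0.0, 1.0) for _ in range(embed_dim)] for _ in range(20)]
features = encode(x, layers, rng)
print(len(features), len(features[0]))
```

With no padding, each convolution of kernel size 3 shortens the sequence by 2 positions, while the channel count follows the filter counts.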
In an optional implementation of this embodiment, the features of the drug molecules and the features of the target protein molecules are compressed by the pooling layer, and a Gaussian distribution calculation is performed based on the first fully connected layer and the second fully connected layer to obtain the latent features of the drug molecules and the latent features of the target protein molecules.
Specifically, the pooling layer (AdaptiveAvgPool1d) performs a form of down-sampling, and using the pooling layer also effectively helps to prevent overfitting.
Further, the output of the pooling layer is fed into the first fully connected layer and the second fully connected layer respectively, and a calculation combining their outputs with a Gaussian distribution with an interval of 0-1 yields the latent features of the drug molecules and the latent features of the target protein molecules.
Specifically, the first fully connected layer and the second fully connected layer are fully connected (FC) layers, in which each node is connected to all nodes of the previous layer, and which integrate the extracted features. In the embodiment of the present invention, their outputs are combined with the Gaussian distribution calculation to obtain the latent features of the drug molecules and the latent features of the target protein molecules.
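One common way to combine two fully connected layers with a Gaussian distribution in a variational autoencoder is the reparameterization trick, sketched below. The text does not state what each layer outputs, so it is an assumption here that the first layer produces a mean and the second a log-variance, that the Gaussian noise is standard normal, and that pooling averages over the whole sequence; all sizes and names are hypothetical.

```python
import math
import random


def global_avg_pool(features):
    """Average over the sequence axis: [length][channels] -> [channels]."""
    length = len(features)
    return [sum(row[c] for row in features) / length for c in range(len(features[0]))]


def linear(vec, weight, bias):
    """Fully connected layer: weight has shape [out_dim][in_dim]."""
    return [b + sum(w * v for w, v in zip(row, vec)) for row, b in zip(weight, bias)]


def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps, with eps drawn from a standard normal N(0, 1)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]


rng = random.Random(0)
channels, latent_dim = 12, 6  # hypothetical sizes


def rand_matrix(rows, cols):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


fc1_w, fc1_b = rand_matrix(latent_dim, channels), [0.0] * latent_dim  # assumed: mean
fc2_w, fc2_b = rand_matrix(latent_dim, channels), [0.0] * latent_dim  # assumed: log-variance

features = [[rng.gauss(0.0, 1.0) for _ in range(channels)] for _ in range(14)]
pooled = global_avg_pool(features)
mu = linear(pooled, fc1_w, fc1_b)
log_var = linear(pooled, fc2_w, fc2_b)
z = reparameterize(mu, log_var, rng)  # latent feature vector
print(len(pooled), len(z))
```

Sampling through mu and log_var keeps the latent vector differentiable with respect to both layers, which is why this construction is widely used when training variational autoencoders.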
Performing feature extraction on the preprocessed drug molecule data and target protein molecule data with the encoder of the variational autoencoder effectively helps to prevent overfitting and improves the efficiency of feature extraction.
S402, performing the reconstruction operation with the decoder of the variational autoencoder;
In an optional implementation of this embodiment, in order to obtain more accurate latent features, the decoder of the variational autoencoder reconstructs the latent features of the drug molecules and the latent features of the target protein molecules to obtain original feature data of the drug molecule data and the target protein molecule data.
Specifically, the third fully connected layer, first deconvolution layer, third activation function, second deconvolution layer, fourth activation function, third deconvolution layer, fifth activation function and fourth fully connected layer of the decoder of the variational autoencoder reconstruct the latent features of the drug molecules and the latent features of the target protein molecules to obtain the original feature data of the drug molecule data and the target protein molecule data.
Specifically, the third fully connected layer converts the latent features of the drug molecules and the latent features of the target protein molecules into a new dimension, which facilitates the subsequent deconvolution operation.
Furthermore, the first, second and third deconvolution layers are deconvolution layers (ConvTranspose1d); when the decoder performs the decoding and reconstruction operation, they convert the latent features of the drug molecules and the latent features of the target protein molecules in the new dimension into the corresponding dimensions.
Furthermore, the third activation function is located after the first deconvolution layer, the fourth activation function after the second deconvolution layer, and the fifth activation function after the third deconvolution layer; the third, fourth and fifth activation functions are ReLU functions.
Further, the fourth fully connected layer converts the deconvolution results of the latent features of the drug molecules and the latent features of the target protein molecules output by the deconvolution layers into the original dimensions of the drug molecule data and the target protein molecule data, so as to obtain the drug molecular structure sequence of the source sample and the protein sequence of the target protein molecule of the source sample.
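To make the effect of the deconvolution layers concrete, the sketch below applies three single-channel transposed 1-D convolutions, each followed by ReLU. It is a simplified illustration: the fully connected layers of the decoder are omitted, and the kernel values, stride and function names are hypothetical. Without padding, a transposed convolution maps an input of length L to an output of length (L - 1) * stride + kernel_size.

```python
def conv_transpose1d(x, kernel, stride=1):
    """Single-channel transposed 1-D convolution (no padding)."""
    out = [0.0] * ((len(x) - 1) * stride + len(kernel))
    for i, value in enumerate(x):
        for j, weight in enumerate(kernel):
            # Each input value spreads a scaled copy of the kernel into the output.
            out[i * stride + j] += value * weight
    return out


def relu(values):
    return [max(0.0, v) for v in values]


# Three deconvolution layers, each followed by ReLU, in the order given in the text.
# Kernel values and the stride are hypothetical.
h = [1.0, 2.0, 3.0]
lengths = [len(h)]
for kernel in ([1.0, 1.0], [0.5, -0.5, 0.5], [1.0, 0.0, 1.0]):
    h = relu(conv_transpose1d(h, kernel, stride=1))
    lengths.append(len(h))
print(lengths)
```

Each layer lengthens the sequence, which is how the decoder moves from the compact latent representation back toward the dimensions of the input.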
S403, outputting the latent features of the drug molecules and the latent features of the target protein molecules.
Here, steps S401 to S402 are repeated, and the latent features of the drug molecules and the latent features of the target protein molecules are output.
The variational autoencoder extracts features from the drug molecule data and the target protein molecule data, and its encoder and decoder perform feature extraction and reconstruction multiple times, so that the error between the latent features and the original features is continuously reduced; more accurate latent features are thereby obtained and the accuracy of feature extraction is improved.
S103, screening and splicing the implicit features of the drug molecules and the implicit features of the target protein molecules to obtain more precise implicit features of the drug molecules and the target protein molecules.
In an optional implementation of this embodiment, the implicit features of the drug molecules and the implicit features of the target protein molecules are screened based on an attention mechanism and then spliced to obtain more precise implicit features of the drug molecules and the target protein molecules.
It should be noted that the reconstruction operation involves convolution and deconvolution operations, which can only capture local features, so some of the implicit feature information of the drug molecules and the target protein molecules may be lost. Screening based on the attention mechanism therefore focuses on the key information in the implicit features of the drug molecules and the target protein molecules while ignoring part of the unnecessary implicit feature information, which improves the accuracy and precision of feature extraction.
In an optional implementation of this embodiment, the screened implicit features of the drug molecules and of the target protein molecules are spliced (concatenated) and passed through a sigmoid activation function to obtain more precise implicit features of the drug molecules and the target protein molecules.
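The screening and splicing step can be sketched in plain Python as below. The text does not specify the form of the attention mechanism, so this sketch assumes a simple element-wise scoring with hypothetical learned weights (`drug_w`, `protein_w`), a softmax over those scores, and re-weighting of each feature by its attention weight. Only the splice followed by a sigmoid activation is taken directly from the description.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_screen(features, score_weights):
    """Re-weight each feature by a softmax attention weight (assumed form)."""
    scores = [f * w for f, w in zip(features, score_weights)]
    weights = softmax(scores)
    return [f * a for f, a in zip(features, weights)]


def sigmoid(x):
    """Sigmoid written to avoid overflow for large negative inputs."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def screen_and_splice(drug, protein, drug_w, protein_w):
    """Screen both feature vectors, splice them, then apply sigmoid."""
    spliced = attention_screen(drug, drug_w) + attention_screen(protein, protein_w)
    return [sigmoid(v) for v in spliced]


if __name__ == "__main__":
    fused = screen_and_splice([0.5, 1.5, -0.2], [2.0, 0.1],
                              [1.0, 1.0, 1.0], [0.5, 0.5])
    print(fused)
```

Features with larger scores receive larger weights, so less informative entries are attenuated rather than removed outright; the spliced vector has the combined length of the two inputs.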
S104, predicting the interaction relation between the drug molecules and the target protein molecules.
In an optional implementation of this embodiment, the feature relationship between the drug molecules and the target protein molecules is predicted based on a multilayer perceptron.
Specifically, the Multilayer Perceptron (MLP) is a neural network composed of fully-connected layers including at least one hidden layer, and the output of each hidden layer is transformed by an activation function.
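A multilayer perceptron with one hidden layer can be sketched as follows. The layer sizes, the weights, the choice of ReLU for the hidden layer, and the sigmoid on the output (to read the result as an interaction probability) are all assumptions for illustration; the text only states that the network consists of fully connected layers with at least one hidden layer whose output passes through an activation function.

```python
import math


def linear(x, weights, bias):
    """Fully connected layer: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def relu(values):
    """Element-wise ReLU activation for the hidden layer."""
    return [max(0.0, v) for v in values]


def sigmoid(x):
    """Sigmoid written to avoid overflow for large negative inputs."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def mlp_predict(x, hidden_w, hidden_b, out_w, out_b):
    """Hidden layer + ReLU, then a single output unit + sigmoid (assumed)."""
    hidden = relu(linear(x, hidden_w, hidden_b))
    logit = linear(hidden, [out_w], [out_b])[0]
    return sigmoid(logit)


if __name__ == "__main__":
    # Hypothetical weights; a trained model would learn these from labeled pairs.
    score = mlp_predict([0.6, 0.2, 0.9],
                        hidden_w=[[0.5, -0.4, 0.3], [0.1, 0.8, -0.2]],
                        hidden_b=[0.0, 0.1],
                        out_w=[1.2, -0.7],
                        out_b=0.05)
    print("interaction score:", score)
```

Here the input `x` would be the spliced implicit feature vector of a drug and target protein pair, and a score closer to 1 would suggest a predicted interaction under this assumed output convention.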
In summary, with the drug-target protein interaction prediction method of this embodiment of the invention, the acquired drug molecule data of the drug to be predicted and target protein molecule data of the target protein to be predicted are converted into numerical vectors of uniform dimension, which facilitates computer processing and the subsequent feature extraction; feature extraction and reconstruction are performed multiple times by a variational autoencoder, so that more accurate implicit features of the drug molecules of the drug to be predicted and of the target protein molecules of the target protein to be predicted are obtained; these implicit features are screened based on an attention mechanism to obtain a more precise feature representation; and the interaction relationship between the drug molecules of the drug to be predicted and the target protein molecules of the target protein to be predicted is predicted by a multilayer perceptron, which improves prediction accuracy and better realizes prediction of the interaction relationship between the drug and the target protein.
The embodiment of the invention also relates to a drug-target protein interaction prediction system, which comprises: the system comprises a memory, a processor and a computer program stored on the memory, wherein the processor executes the computer program, and the memory stores data generated in the operation of the system.
It should be noted that the processor executes the computer program for implementing the method for predicting drug-target protein interaction according to the present invention.
Those skilled in the art will appreciate that all or part of the steps in the methods of the above embodiments may be implemented by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium, and the storage medium may include a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, and the like.
In summary, with the drug-target protein interaction prediction system of this embodiment of the invention, the acquired drug molecule data of the drug to be predicted and target protein molecule data of the target protein to be predicted are converted into numerical vectors of uniform dimension, which facilitates computer processing and the subsequent feature extraction; feature extraction and reconstruction are performed multiple times by a variational autoencoder, so that more accurate implicit features of the drug molecules of the drug to be predicted and of the target protein molecules of the target protein to be predicted are obtained; these implicit features are screened based on an attention mechanism to obtain a more precise feature representation; and the interaction relationship between the drug molecules of the drug to be predicted and the target protein molecules of the target protein to be predicted is predicted by a multilayer perceptron, which improves prediction accuracy and better realizes prediction of the interaction relationship between the drug and the target protein.
The embodiments of the present invention have been described in detail above. Specific examples are used herein to explain the principle and implementation of the present invention, and the above description of the embodiments is only intended to help in understanding the method of the present invention and its core idea. Meanwhile, for a person skilled in the art, there may be variations in the specific implementation and the scope of application according to the idea of the present invention. In summary, the content of this specification should not be construed as limiting the present invention.

Claims (10)

1. A method for predicting drug-target protein interactions, comprising:
acquiring drug molecule data of a drug to be predicted and target protein molecule data of a target protein to be predicted, and preprocessing the drug molecule data of the drug to be predicted and the target protein molecule data of the target protein to be predicted to obtain preprocessed drug molecule data and target protein molecule data;
performing feature extraction on the preprocessed drug molecule data and target protein molecule data to obtain the implicit features of the drug molecules and the implicit features of the target protein molecules;
screening and splicing the implicit features of the drug molecules and the implicit features of the target protein molecules to obtain the feature relationship between the drug molecules and the target protein molecules;
predicting the interaction relationship between the drug molecule and the target protein molecule.
2. The method for predicting drug-target protein interactions of claim 1, wherein said preprocessing the drug molecule data of the drug to be predicted and the target protein molecule data of the target protein to be predicted comprises:
acquiring a drug molecular structure sequence of the drug to be predicted according to the drug molecule data of the drug to be predicted, wherein the drug molecular structure sequence of the drug to be predicted is written according to the simplified molecular-input line-entry system (SMILES);
and performing a numerical vector conversion operation on the drug molecular structure sequence of the drug to be predicted based on an embedding layer to obtain the drug molecular structure sequence vector of the drug to be predicted.
3. The method for predicting drug-target protein interactions of claim 2, wherein said preprocessing the drug molecule data of the drug to be predicted and the target protein molecule data of the target protein to be predicted further comprises:
obtaining a protein sequence of a target protein molecule of the target protein to be predicted according to the target protein molecule data of the target protein to be predicted;
and performing a numerical vector conversion operation on the protein sequence of the target protein molecule of the target protein to be predicted based on the embedding layer to obtain the target protein sequence vector of the target protein to be predicted.
4. The method of drug-target protein interaction prediction according to claim 1, wherein the performing feature extraction on the preprocessed drug molecule data and target protein molecule data comprises:
and performing feature extraction on the preprocessed drug molecule data and target protein molecule data based on a variational autoencoder.
5. The method of predicting a drug-target protein interaction of claim 4, wherein the performing feature extraction on the preprocessed drug molecule data and target protein molecule data based on the variational autoencoder comprises:
performing feature extraction on the preprocessed drug molecule data and target protein molecule data based on a first gated convolution network, a first dropout function, a first activation function, a second gated convolution network, a second dropout function, a second activation function, and a third gated convolution network of the encoder of the variational autoencoder to obtain features of the drug molecules and features of the target protein molecules;
compressing the features of the drug molecules and the features of the target protein molecules based on a pooling layer, and performing Gaussian distribution calculation based on a first fully connected layer and a second fully connected layer to obtain the implicit features of the drug molecules and the implicit features of the target protein molecules.
6. The method of drug-target protein interaction prediction according to claim 4, wherein the performing feature extraction on the preprocessed drug molecule data and target protein molecule data based on the variational autoencoder further comprises:
reconstructing the implicit features of the drug molecules and the implicit features of the target protein molecules based on the decoder of the variational autoencoder to obtain original feature data of the drug molecule data and the target protein molecule data;
and performing feature extraction on the original feature data of the drug molecule data and the target protein molecule data based on the encoder of the variational autoencoder to obtain more accurate implicit features of the drug molecules and the target protein molecules.
7. The method of claim 6, wherein the reconstructing the implicit features of the drug molecules and the implicit features of the target protein molecules based on the decoder of the variational autoencoder comprises:
converting the implicit features of the drug molecules and the implicit features of the target protein molecules into a new dimension based on a third fully connected layer;
performing deconvolution processing on the implicit features of the drug molecules and the implicit features of the target protein molecules in the new dimension based on a first deconvolution layer, a third activation function, a second deconvolution layer, a fourth activation function, a third deconvolution layer, and a fifth activation function, and outputting a deconvolution result for the implicit features of the drug molecules and the implicit features of the target protein molecules in the new dimension;
and performing dimension conversion processing on the deconvolution result for the implicit features of the drug molecules and the implicit features of the target protein molecules in the new dimension based on a fourth fully connected layer to obtain original feature data of the drug molecule data and the target protein molecule data.
8. The method of predicting a drug-target protein interaction of claim 1, wherein said screening and splicing the implicit features of the drug molecules and the implicit features of the target protein molecules comprises:
and screening and splicing the implicit features of the drug molecules and the implicit features of the target protein molecules based on an attention mechanism to obtain more precise implicit features of the drug molecules and the target protein molecules.
9. The method of predicting a drug-target protein interaction of claim 1, wherein predicting the interaction relationship between the drug molecule and the target protein molecule comprises:
and predicting the feature relationship between the drug molecules and the target protein molecules based on a multilayer perceptron.
10. A drug-target protein interaction prediction system comprising a memory, a processor, and a computer program stored on the memory, wherein the processor executes the computer program to implement the drug-target protein interaction prediction method of any one of claims 1-9.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211322181.0A CN115910196A (en) 2022-10-26 2022-10-26 Method and system for predicting drug-target protein interaction

Publications (1)

Publication Number Publication Date
CN115910196A (en) 2023-04-04

Family

ID=86482596

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211322181.0A Pending CN115910196A (en) 2022-10-26 2022-10-26 Method and system for predicting drug-target protein interaction

Country Status (1)

Country Link
CN (1) CN115910196A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116246697A (en) * 2023-05-11 2023-06-09 上海微观纪元数字科技有限公司 Target protein prediction method and device for medicines, equipment and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110689965A (en) * 2019-10-10 2020-01-14 电子科技大学 Drug target affinity prediction method based on deep learning
CN113160894A (en) * 2021-04-23 2021-07-23 平安科技(深圳)有限公司 Method, device, equipment and storage medium for predicting interaction between medicine and target
CN114783514A (en) * 2022-05-18 2022-07-22 上海天鹜科技有限公司 Method for predicting binding affinity of drug molecules and target protein
CN114974409A (en) * 2022-05-31 2022-08-30 浙江大学 Zero-sample learning-based drug virtual screening system for newly discovered target

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
YUE ZHANG ET AL.: "Drug-protein interaction prediction via variational autoencoders and attention mechanisms", FRONTIERS, pages 1-9 *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination