CN115966249B - protein-ATP binding site prediction method and device based on fractional order neural network - Google Patents

protein-ATP binding site prediction method and device based on fractional order neural network

Info

Publication number
CN115966249B
Authority
CN
China
Prior art keywords
protein
prediction
fractional
training set
neural network
Prior art date
Legal status
Active
Application number
CN202310115169.0A
Other languages
Chinese (zh)
Other versions
CN115966249A (en)
Inventor
王艺舒 (Wang Yishu)
陈晓敏 (Chen Xiaomin)
郭梦瑶 (Guo Mengyao)
Current Assignee
University of Science and Technology Beijing USTB
Original Assignee
University of Science and Technology Beijing USTB
Priority date
Filing date
Publication date
Application filed by University of Science and Technology Beijing (USTB)
Priority to CN202310115169.0A
Publication of CN115966249A
Application granted
Publication of CN115966249B

Classifications

    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02ATECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Abstract

The invention provides a protein-ATP binding site prediction method and device based on a fractional order neural network, and relates to the technical field of protein-ligand binding site prediction. The method comprises the following steps: the features required by the model are extracted from the digitized information of the protein and integrated into a feature matrix used as the input. Then, the parameter updating process of the back propagation of the convolutional neural network is modified into a fractional-order gradient iteration; test data show that the prediction performance of the convolutional neural network modified with the fractional-order gradient is superior to that of existing machine learning and integer-order deep learning models. By combining deep learning with fractional differentiation, a protein-ATP binding site prediction method with improved accuracy is provided. The invention focuses on adding the fractional-order gradient under the Caputo definition to the fully connected layer of the single-start predictor, improving the performance of the predictor while ensuring convergence and the validity of the chain rule.

Description

protein-ATP binding site prediction method and device based on fractional order neural network
Technical Field
The invention relates to the technical field of protein-ligand binding site prediction, in particular to a protein-ATP binding site prediction method and device based on fractional order neural network.
Background
Protein is an important substance constituting life, and research on it has never stopped. Initially, the composition of proteins was an elusive problem; today, with the rapid development of computer technology, scientists have determined more and more primary structures of proteins with the aid of computers and have built specialized databases for querying and use, for example the PDB protein database [H.M. Berman, J. Westbrook, Z. Feng, G. Gilliland, T.N. Bhat, H. Weissig, I.N. Shindyalov, P.E. Bourne (2000). The Protein Data Bank. Nucleic Acids Research, 28:235-242]. However, obtaining other information about proteins, such as the tertiary structure and the binding sites with other substances, is not an easy matter.
The prediction of protein-ligand interaction sites is of great significance for the determination of drug targeting sites, and the determination of protein structures and of binding sites with other compounds is important for promoting drug action and improving the rate and efficiency of in vivo biochemical reactions, such as enzymatic reactions and ATP binding. Protein-ligand interactions are critical for various biological processes such as membrane transport, cell movement, muscle contraction, signal transduction, and the transcription and replication of DNA [Liu Guixia, Pei Zhiyao, Song Jiazhi. Protein-ATP binding site prediction based on deep learning [J]. Journal of Jilin University (Engineering Edition), 2022, 52(01): 187-194]. In the process of drug discovery, protein-ligand interaction is an important basis for determining drug targeting sites and has guiding significance for developing new drugs for cancer, diabetes, Alzheimer's disease and other diseases. Thus, accurate recognition of protein binding sites is of great importance both for protein function annotation and for the determination of drug targets.
Among these ligands, ATP (adenosine triphosphate) is a small-molecule compound that can function as a coenzyme in cells and also plays an important role in various metabolic processes [Hu Jun, Li Yang, Zhang Yang, et al. ATPbind: Accurate protein-ATP binding site prediction by combining sequence-profiling and structure-based comparisons [J]. Journal of Chemical Information & Modeling, 2018, 58:501-510]. ATP binding sites are important drug targets for antibacterial and anticancer chemotherapy. However, identification of protein-ligand binding sites by wet-laboratory experimental techniques is generally costly and time-consuming; as of June 2019, 7055 proteins in the Protein Data Bank (PDB) were labeled as ATP-binding, accounting for about 4.62% of all records [Song Jiazhi, Liang Yanchun, Liu Guixia, et al. A Novel Prediction Method for ATP-Binding Sites From Protein Primary Sequences Based on Fusion of Deep Convolutional Neural Network and Ensemble Learning [J]. IEEE Access, 2020, 8:21485-21495], and the number of known ATP-binding proteins is far from adequate in the face of the large-scale protein sequences of the post-genomic era. At present, machine learning and related algorithms are developing rapidly, computational methods for determining binding sites on proteins keep appearing, and bioinformatics continues to advance; however, traditional computational methods suffer from relatively low accuracy and a high false-positive rate in their prediction results [Hong Jiajun. Research on protein function prediction and drug target discovery based on deep learning [D]. Hangzhou: Zhejiang University, 2020]. To reveal the intrinsic mechanism of protein-ligand interactions, a large amount of wet-laboratory work has been performed, and thousands of protein-ligand interaction structural complexes have been deposited in the PDB. However, identification of protein-ligand binding sites by wet-laboratory experimental techniques is often costly and time-consuming. Because of the importance of protein-ligand interactions and the difficulty of experimentally identifying binding sites, developing efficient, automated computational methods to rapidly predict protein-ligand binding sites has become an increasingly important issue in bioinformatics, especially when faced with the large-scale protein sequences of the post-genomic era.
AI techniques, such as the well-known machine learning and deep learning methods, can be used for the determination of protein-ligand interaction sites and greatly improve the experimental rate (compared to wet laboratories); they are good methods that can be selected and continue to be explored at present. Training and validating a model on a suitable data set greatly reduces the number of wet experiments required and the experimental cost. However, these methods still suffer from limited prediction accuracy and a high misprediction rate, and how to improve the prediction accuracy while further reducing the time cost remains a valuable problem.
In biomedicine, understanding the interaction of proteins with ATP facilitates protein function annotation and drug development. Accurate identification of protein-ATP binding residues is an important but challenging task for gaining knowledge of protein-ATP interactions, especially when only protein sequence information is provided. With the development of deep learning algorithms, convolutional neural networks (CNNs) have been widely used in a variety of bioinformatics fields. However, convolutional neural networks usually improve classifier performance only by increasing the depth of the stacked convolutional layers; on the other hand, for the gradient algorithm used in a convolutional neural network, the traditional integer-order gradient descent may converge slowly or suffer from gradient explosion, and may fail to converge stably to the true extreme point of the objective function.
Disclosure of Invention
Aiming at the problems of the convolutional neural network models applied to the existing protein-ATP prediction task, such as a low convergence rate, prediction performance in need of improvement, and unbalanced data distribution, the invention provides a protein-ATP binding site prediction method and device based on a fractional order neural network.
In order to solve the technical problems, the invention provides the following technical scheme:
in one aspect, a method for predicting a protein-ATP binding site based on fractional order neural networks is provided, the method being applied to an electronic device, comprising the steps of:
s1: constructing an initial prediction model, acquiring a training set based on a PDB protein database, collecting characteristics of target residues and adjacent residues of the target residues in the training set through a sliding window technology, and integrating the characteristics into a characteristic matrix;
s2: using the weighted cross entropy as a loss function of the prediction model, and based on the loss function, adjusting the prediction iterative algorithm of each amino acid type by giving different weights to obtain an adjusted prediction iterative algorithm;
s3: constructing a fractional derivative based on the Caputo definition, and modifying the adjusted prediction iteration algorithm based on the fractional derivative;
S4: replacing a parameter updating process of a back propagation process of the convolutional neural network in the initial prediction model with a modified prediction iteration algorithm to construct a new prediction model; inputting the feature matrix into a new prediction model, outputting a prediction result, and finishing the prediction of the protein-ATP binding site based on the fractional order neural network.
Alternatively, the training set is the untreated original protein sequence ATP-227.
Optionally, in S1, an initial prediction model is constructed, a training set is obtained based on a PDB protein database, features of target residues and residues adjacent to the target residues in the training set are collected through a sliding window technology, and the features are integrated into a feature matrix, including:
s11: acquiring a training set based on a PDB protein database, and determining the size of a sliding window; the sliding window comprises target residues, and adjacent residues of the target residues are respectively arranged at the left side and the right side of the target residues;
s12: operating psi-blast in the annotated protein sequence Swissprot database through a search tool blast based on a local alignment algorithm, and inputting a training set to obtain a PSSM matrix of the training set;
s13: acquiring a protein secondary structure in a training set, and representing the protein secondary structure by a 3-state secondary structure representation method to obtain a protein secondary structure vector;
S14: performing One-hot coding on amino acids in a training set to obtain One-hot coding vectors of each amino acid; wherein, the coding mode is one-hot coding according to the dipole and the amino acid classification mode of the coil side chain;
s15: and extracting features of the PSSM matrix, the protein secondary structure vector and the One-hot coding vector of each amino acid through a sliding window to obtain the features of the target residues and the adjacent residues of the target residues in the training set, and integrating the features into a feature matrix.
Optionally, in S2, using weighted cross entropy as a loss function of the prediction model, adjusting a prediction iteration algorithm of each amino acid class by giving different weights based on the loss function, to obtain an adjusted prediction iteration algorithm, including:
the cross entropy of the i-th sample is defined as shown in the following equation (1):

$$CE_i=-\sum_{j} y_{ij}\,\log\bigl(\hat{y}_{ij}\bigr) \qquad (1)$$

wherein $y_{ij}\in\{0,1\}$; if the i-th sample belongs to the p-th class, then $y_{ip}=1$, and $\hat{y}_{ip}$ represents the prediction probability that the i-th sample belongs to the p-th class;

the weighted cross entropy is defined as shown in the following equation (2):

$$L=-\frac{1}{N}\sum_{i=1}^{N}\sum_{j} w_j\,y_{ij}\,\log\bigl(\hat{y}_{ij}\bigr) \qquad (2)$$

wherein $w_j$ is the weight of each class, $y_{ij}$ is the One-hot encoded value, $N$ represents the number of samples, and $\hat{y}_{ij}$ represents the corresponding prediction probability.
Optionally, in S3, the fractional derivative under the Caputo definition is given by the following formula (3):

$${}^{C}_{t_0}\!D^{\alpha}_{t}\,f(t)=\frac{1}{\Gamma(m-\alpha)}\int_{t_0}^{t}\frac{f^{(m)}(\tau)}{(t-\tau)^{\alpha-m+1}}\,d\tau \qquad (3)$$

wherein $f(t)$ is the objective function, $\alpha$ is the order with $0<\alpha<1$ and $m-1<\alpha<m$ ($m$ a positive integer), $\Gamma(\cdot)$ is the gamma function, $t_0$ is the initial value, $f^{(m)}$ denotes the m-th order derivative of $f$, and $\tau$ is the integration variable.
Optionally, modifying the adjusted prediction iteration algorithm based on the fractional derivative in step S3 includes:

the fractional gradient method is shown in the following formula (4):

$$x_{k+1}=x_k-\mu\;{}^{C}_{x_0}\!D^{\alpha}_{x}\,f(x)\Big|_{x=x_k} \qquad (4)$$

where $\mu$ is the iteration step or learning rate, $k$ is the iteration index, and $x_0$ denotes the initial iterate (the value at step 0);

replacing $x_0$ in equation (4) with $x_{k-1}$ gives the modified fractional gradient method of the following equation (5):

$$x_{k+1}=x_k-\mu\;{}^{C}_{x_{k-1}}\!D^{\alpha}_{x}\,f(x)\Big|_{x=x_k} \qquad (5)$$

substituting the above equation (5) into equation (3) and simplifying gives the modified prediction iterative algorithm of the following equation (6):

$$x_{k+1}=x_k-\frac{\mu}{\Gamma(2-\alpha)}\,f'(x_k)\,\bigl|x_k-x_{k-1}\bigr|^{\,1-\alpha} \qquad (6)$$

the prediction iterative algorithm of the above formula (6) converges to the true extreme point $x^{*}$.
Optionally, in step S4, replacing the parameter updating process of the back propagation process of the convolutional neural network in the initial prediction model with a modified prediction iterative algorithm, and constructing a new prediction model, including:
replacing the parameter updating process of the back propagation process of the convolutional neural network in the initial prediction model with the modified prediction iterative algorithm to construct the fully connected layer of the convolutional neural network in the new prediction model, wherein the back propagation gradient of the fully connected layer adopts a mixture of fractional order and integer order; the fully connected layer involves two types of gradients: the transfer gradient connecting the nodes between two layers, and the update gradient for the in-layer parameters.
In one aspect, there is provided a fractional neural network-based protein-ATP binding site prediction apparatus for use in an electronic device, the apparatus comprising:
the feature extraction module is used for constructing an initial prediction model, acquiring a training set based on a PDB protein database, collecting features of target residues and adjacent residues of the target residues in the training set through a sliding window technology, and integrating the features into a feature matrix;
the function modification module is used for utilizing the weighted cross entropy as a loss function of the prediction model, and based on the loss function, adjusting the prediction iteration algorithm of each amino acid type by giving different weights to obtain an adjusted prediction iteration algorithm;
the algorithm modification module is used for constructing a fractional derivative defined based on the Caputo and modifying the adjusted prediction iterative algorithm based on the fractional derivative;
the result output module is used for replacing the parameter updating process of the back propagation process of the convolutional neural network in the initial prediction model with a modified prediction iteration algorithm to construct a new prediction model; inputting the feature matrix into a new prediction model, outputting a prediction result, and finishing the prediction of the protein-ATP binding site based on the fractional order neural network.
Alternatively, the training set is the untreated original protein sequence ATP-227.
Optionally, the feature extraction module is further configured to obtain a training set based on the PDB protein database, determine a sliding window size, and include a target residue in the sliding window, where adjacent residues of the target residue are respectively located on left and right sides of the target residue;
operating psi-blast in the annotated protein sequence Swissprot database through a search tool blast based on a local alignment algorithm, and inputting a training set to obtain a PSSM matrix of the training set;
acquiring a protein secondary structure in a training set, and representing the protein secondary structure by a 3-state secondary structure representation method to obtain a protein secondary structure vector;
performing One-hot coding on amino acids in a training set to obtain One-hot coding vectors of each amino acid; wherein, the coding mode is one-hot coding according to the dipole and the amino acid classification mode of the coil side chain;
and extracting features of the PSSM matrix, the protein secondary structure vector and the One-hot coding vector of each amino acid through a sliding window to obtain the features of the target residues and the adjacent residues of the target residues in the training set, and integrating the features into a feature matrix.
In one aspect, an electronic device is provided that includes a processor and a memory having at least one instruction stored therein that is loaded and executed by the processor to implement a fractional-order-neural-network-based protein-ATP binding site prediction method as described above.
In one aspect, a computer-readable storage medium having stored therein at least one instruction that is loaded and executed by a processor to implement a fractional neural network-based protein-ATP binding site prediction method as described above is provided.
The technical scheme provided by the embodiment of the invention has at least the following beneficial effects:
in this scheme, a method combining deep learning with fractional differentiation is provided for predicting protein-ATP binding sites, and the accuracy is improved. The invention focuses on adding the fractional-order gradient under the Caputo definition to the fully connected layer of the single-start predictor, improving the performance of the predictor while ensuring convergence and the validity of the chain rule.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of a method for predicting protein-ATP binding sites based on fractional order neural networks according to an embodiment of the invention;
FIG. 2 is a flow chart of a method for predicting protein-ATP binding sites based on fractional order neural networks according to an embodiment of the invention;
FIG. 3 is a schematic diagram of a forward propagation algorithm of a protein-ATP binding site prediction method based on fractional order neural networks according to an embodiment of the present invention;
FIG. 4 is a graph showing an update procedure of a protein-ATP binding site prediction method based on fractional order neural networks according to an embodiment of the present invention;
FIG. 5 is a graph of predicted outcome of a fractional neural network-based protein-ATP binding site prediction method according to an embodiment of the present invention;
FIG. 6 is a block diagram of a protein-ATP binding site predicting device based on fractional order neural networks according to an embodiment of the present invention;
fig. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Description of the embodiments
In order to make the technical problems, technical solutions and advantages to be solved more apparent, the following detailed description will be given with reference to the accompanying drawings and specific embodiments.
The embodiment of the invention provides a protein-ATP binding site prediction method based on a fractional order neural network, which can be realized by an electronic device, wherein the electronic device may be a terminal or a server. A flowchart of the fractional-order-neural-network-based protein-ATP binding site prediction method is shown in FIG. 1; the process flow of the method may include the following steps:
S101: constructing an initial prediction model, acquiring a training set based on a PDB protein database, collecting characteristics of target residues and adjacent residues of the target residues in the training set through a sliding window technology, and integrating the characteristics into a characteristic matrix;
s102: using the weighted cross entropy as a loss function of the prediction model, and based on the loss function, adjusting the prediction iterative algorithm of each amino acid type by giving different weights to obtain an adjusted prediction iterative algorithm;
s103: constructing a fractional derivative based on the Caputo definition, and modifying the adjusted prediction iteration algorithm based on the fractional derivative;
s104: replacing a parameter updating process of a back propagation process of the convolutional neural network in the initial prediction model with a modified prediction iteration algorithm to construct a new prediction model; inputting the feature matrix into a new prediction model, outputting a prediction result, and finishing the prediction of the protein-ATP binding site based on the fractional order neural network.
Alternatively, the training set is the untreated original protein sequence ATP-227.
Optionally, in S101, an initial prediction model is constructed, a training set is obtained based on a PDB protein database, features of target residues and residues adjacent to the target residues in the training set are collected through a sliding window technology, and the features are integrated into a feature matrix, including:
S111: acquiring a training set based on a PDB protein database, and determining the size of a sliding window; the sliding window comprises target residues, and adjacent residues of the target residues are respectively arranged at the left side and the right side of the target residues;
s112: operating psi-blast in the annotated protein sequence Swissprot database through a search tool blast based on a local alignment algorithm, and inputting a training set to obtain a PSSM matrix of the training set;
s113: acquiring a protein secondary structure in a training set, and representing the protein secondary structure by a 3-state secondary structure representation method to obtain a protein secondary structure vector;
s114: performing One-hot coding on amino acids in a training set to obtain One-hot coding vectors of each amino acid; wherein, the coding mode is one-hot coding according to the dipole and the amino acid classification mode of the coil side chain;
s115: and extracting features of the PSSM matrix, the protein secondary structure vector and the One-hot coding vector of each amino acid through a sliding window to obtain the features of the target residues and the adjacent residues of the target residues in the training set, and integrating the features into a feature matrix.
Optionally, in S102, using weighted cross entropy as a loss function of the prediction model, based on the loss function, adjusting a prediction iteration algorithm of each amino acid class by giving different weights, to obtain an adjusted prediction iteration algorithm, including:
The cross entropy of the i-th sample is defined as shown in the following equation (1):

$$CE_i=-\sum_{j} y_{ij}\,\log\bigl(\hat{y}_{ij}\bigr) \qquad (1)$$

wherein $y_{ij}\in\{0,1\}$; if the i-th sample belongs to the p-th class, then $y_{ip}=1$, and $\hat{y}_{ip}$ represents the prediction probability that the i-th sample belongs to the p-th class;

the weighted cross entropy is defined as shown in the following equation (2):

$$L=-\frac{1}{N}\sum_{i=1}^{N}\sum_{j} w_j\,y_{ij}\,\log\bigl(\hat{y}_{ij}\bigr) \qquad (2)$$

wherein $w_j$ is the weight of each class, $y_{ij}$ is the One-hot encoded value, $N$ represents the number of samples, and $\hat{y}_{ij}$ represents the corresponding prediction probability.
optionally, in S103, the fractional derivative under the Caputo definition is given by the following formula (3):

$${}^{C}_{t_0}\!D^{\alpha}_{t}\,f(t)=\frac{1}{\Gamma(m-\alpha)}\int_{t_0}^{t}\frac{f^{(m)}(\tau)}{(t-\tau)^{\alpha-m+1}}\,d\tau \qquad (3)$$

wherein $f(t)$ is the objective function, $\alpha$ is the order with $0<\alpha<1$ and $m-1<\alpha<m$ ($m$ a positive integer), $\Gamma(\cdot)$ is the gamma function, $t_0$ is the initial value, $f^{(m)}$ denotes the m-th order derivative of $f$, and $\tau$ is the integration variable.
Optionally, modifying the adjusted prediction iteration algorithm based on the fractional derivative in step S103 includes:

the fractional gradient method is shown in the following formula (4):

$$x_{k+1}=x_k-\mu\;{}^{C}_{x_0}\!D^{\alpha}_{x}\,f(x)\Big|_{x=x_k} \qquad (4)$$

wherein $\mu$ is the iteration step length or learning rate, $k$ is the iteration index, and $x_0$ denotes the initial iterate;

replacing $x_0$ in equation (4) with $x_{k-1}$ gives the modified fractional gradient method of the following equation (5):

$$x_{k+1}=x_k-\mu\;{}^{C}_{x_{k-1}}\!D^{\alpha}_{x}\,f(x)\Big|_{x=x_k} \qquad (5)$$

substituting the above equation (5) into equation (3) and simplifying gives the modified prediction iterative algorithm of the following equation (6):

$$x_{k+1}=x_k-\frac{\mu}{\Gamma(2-\alpha)}\,f'(x_k)\,\bigl|x_k-x_{k-1}\bigr|^{\,1-\alpha} \qquad (6)$$

the prediction iterative algorithm of the above formula (6) converges to the true extreme point $x^{*}$.
Optionally, in step S104, a parameter updating process of a back propagation process of the convolutional neural network in the initial prediction model is replaced by a modified prediction iterative algorithm, so as to construct a new prediction model, which includes:
replacing the parameter updating process of the back propagation process of the convolutional neural network in the initial prediction model with the modified prediction iterative algorithm to construct the fully connected layer of the convolutional neural network in the new prediction model, wherein the back propagation gradient of the fully connected layer adopts a mixture of fractional order and integer order; the fully connected layer involves two types of gradients: the transfer gradient connecting the nodes between two layers, and the update gradient for the in-layer parameters.
In the embodiment of the invention, a protein-ATP binding site prediction method is provided by combining a deep learning method and fractional differential, and the accuracy is improved. Firstly, data sets ATP-227 and ATP-14 are selected as training sets and test sets, and the required characteristics of the model are extracted from the digitized information of the protein and integrated into a characteristic matrix to be used as input. Then, the parameter updating process of the back propagation process of the convolutional neural network is modified into fractional gradient iteration, and test data show that the prediction effect of the convolutional neural network modified by fractional gradient is superior to that of the prior machine learning and integer-order deep learning model. The invention focuses on adding the fractional order gradient defined by Caputo to the full connection layer of the single-start predictor, and improving the performance of the predictor on the premise of ensuring convergence and chain rule.
The embodiment of the invention provides a protein-ATP binding site prediction method based on fractional order neural networks,
the method may be implemented by an electronic device, which may be a terminal or a server. A flowchart of the fractional-order-neural-network-based protein-ATP binding site prediction method is shown in FIG. 2; the process flow of the method may include the following steps:
s201: acquiring a training set based on a PDB protein database, and determining the size of a sliding window; the sliding window comprises target residues, and adjacent residues of the target residues are respectively arranged at the left side and the right side of the target residues;
in one possible embodiment, the training set is the untreated original protein sequence ATP-227. The present invention utilizes two commonly used classical data sets in protein-ATP binding site prediction, in which the untreated original protein sequence is selected: ATP-227 and ATP-14. ATP-227 is 227 protein chains bound to ATP, which were published in the PDB protein database 3/10/2010. Together, these 227 chains contain 3393 ATP-binding residues, and 80409 non-ATP-binding residues. Meanwhile, 14 protein chains are selected from ATP-17 (the other three protein sequences cannot find the corresponding fasta file in the PDB database according to the protein ID), and the protein chains are named as ATP-14, and as an independent test set, the similarity of any one chain of ATP-14 and ATP-227 is ensured to be less than 41 percent. Fasta sequence files of the dataset were downloaded in bulk from the PDB protein database, ATP-227 as training set and ATP-14 as test set.
In one possible embodiment, the number of amino acids in each protein sequence is large and the ratio of non-binding to binding residues is high; studies have shown that the binding properties of a target residue are affected by its neighboring residues, so the sliding window technique is used to collect the characteristics of the target residue and its neighboring residues. A sliding window of size L contains the target residue and (L-1)/2 adjacent residues on each of its left and right sides. In this embodiment, L = 15 was finally selected by comparing the performance of different window sizes; that is, one sliding window takes a value such as 000000010000000 (15 positions, with the central position corresponding to the target residue).
s202: and running psi-blast in the annotated protein sequence Swissprot database through a search tool blast based on a local alignment algorithm, inputting a training set, and obtaining a PSSM matrix of the training set.
In a possible implementation, the PSSM file also contains other information; in this embodiment, only the first 20 columns are retained.
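As an illustration of this step, a minimal Python sketch is given below: it runs PSI-BLAST against Swiss-Prot and keeps only the first 20 log-odds columns of the resulting ASCII PSSM. The database name, the number of iterations and the exact column layout of the PSSM file are assumptions rather than values fixed by this embodiment.

```python
import subprocess
import numpy as np

def compute_pssm(fasta_path, pssm_path, db="swissprot", iterations=3):
    # Run PSI-BLAST and export the ASCII PSSM (database name and iteration
    # count are assumptions, not fixed by the patent).
    subprocess.run(
        ["psiblast", "-query", fasta_path, "-db", db,
         "-num_iterations", str(iterations), "-out_ascii_pssm", pssm_path],
        check=True,
    )

def read_pssm_first20(pssm_path):
    # Keep only the first 20 log-odds columns, as described in this embodiment.
    rows = []
    with open(pssm_path) as fh:
        for line in fh:
            parts = line.split()
            # Data rows start with a numeric position index followed by the residue letter.
            if len(parts) >= 22 and parts[0].isdigit():
                rows.append([float(v) for v in parts[2:22]])
    return np.array(rows)  # shape: (sequence length, 20)
```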
S203: and obtaining a protein secondary structure in the training set, and representing the protein secondary structure by a 3-state secondary structure representation method to obtain a protein secondary structure vector.
In a possible embodiment, the present invention selects the 3-state secondary structure representation, i.e., coil (C), helix (H) and strand (E), for the protein secondary structure, obtained by running PSIPRED 4.02 in the BLAST environment. Solvent accessibility is obtained using ASAquick. The extraction of the above three features is based on the fasta sequence files.
S204: performing One-hot coding on amino acids in a training set to obtain One-hot coding vectors of each amino acid; wherein the coding mode is one-hot coding according to the amino acid classification modes of dipoles and coil side chains.
In one possible embodiment, there are a number of ways of classifying amino acids for One-hot encoding; here they are encoded according to the dipoles and roll side chains, and each amino acid is represented by a 1×7 vector. For example, alanine (Ala) belongs to the first class, so its One-hot encoding is [0,0,0,0,0,0,1]; tyrosine (Tyr) belongs to the fourth class, so its One-hot encoding is [0,0,0,1,0,0,0], as sketched below.
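A minimal sketch of this encoding is given below; only alanine (class 1) and tyrosine (class 4) are fixed by the example above, and the class membership of the remaining amino acids in the dictionary is purely illustrative.

```python
# Hypothetical grouping of the 20 amino acids into the 7 classes;
# only Ala (class 1) and Tyr (class 4) are fixed by the text above.
AA_CLASS = {
    "A": 1, "G": 1, "V": 1,
    "I": 2, "L": 2, "F": 2, "P": 2,
    "M": 3, "T": 3, "S": 3,
    "Y": 4, "H": 4, "N": 4, "Q": 4, "W": 4,
    "R": 5, "K": 5,
    "D": 6, "E": 6,
    "C": 7,
}

def one_hot_7(residue):
    vec = [0] * 7
    # Class 1 is encoded at the last position, e.g. Ala -> [0,0,0,0,0,0,1],
    # and class 4 at the fourth position, e.g. Tyr -> [0,0,0,1,0,0,0].
    vec[7 - AA_CLASS[residue]] = 1
    return vec
```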
S205: and extracting features of the PSSM matrix, the protein secondary structure vector and the One-hot coding vector of each amino acid through a sliding window to obtain a target residue in a training set and features of the target residue, and integrating the features into a feature matrix.
In a possible implementation, the feature extraction is performed through a sliding window, so that in this embodiment, a PSSM matrix of 15×20, a protein secondary structure vector of 15×3, a solvent accessibility vector of 15×1, and an One-hot encoding vector of 15×7 are obtained. In this embodiment, the data sets ATP-227 and ATP-14 are used as training and testing sets, and the required features of the model are extracted from the digitized information of the protein and integrated into a feature matrix as input of a new predictive model.
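The assembly of the window feature matrix described above might be sketched as follows; the concatenation order of the feature blocks and the zero-padding at the sequence ends are assumptions.

```python
import numpy as np

def window_features(pssm, ss3, asa, onehot7, center, L=15):
    """Stack per-residue features (20 + 3 + 1 + 7 = 31 columns) for a window of
    length L centered on the target residue. Expected shapes: pssm (n, 20),
    ss3 (n, 3), asa (n, 1), onehot7 (n, 7). Positions outside the sequence are
    zero-padded (the padding scheme is an assumption)."""
    n = pssm.shape[0]
    half = (L - 1) // 2
    rows = []
    for pos in range(center - half, center + half + 1):
        if 0 <= pos < n:
            rows.append(np.concatenate([pssm[pos], ss3[pos], asa[pos], onehot7[pos]]))
        else:
            rows.append(np.zeros(31))
    return np.stack(rows)  # shape: (15, 31)
```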
S206: using weighted cross entropy as a loss function of a prediction model, and based on the loss function, adjusting a prediction iteration algorithm of each amino acid type by giving different weights to obtain an adjusted prediction iteration algorithm;
in a possible embodiment, the invention solves the data imbalance problem by modifying the loss function, i.e., the cross entropy: weighted cross entropy is used as the loss function, and the prediction for each class is adjusted by assigning different weights, as follows:
the cross entropy of the i-th sample is defined as shown in the following equation (1):

$$CE_i=-\sum_{j} y_{ij}\,\log\bigl(\hat{y}_{ij}\bigr) \qquad (1)$$

wherein $y_{ij}\in\{0,1\}$; if the i-th sample belongs to the p-th class, then $y_{ip}=1$, and $\hat{y}_{ip}$ represents the prediction probability that the i-th sample belongs to the p-th class;

the weighted cross entropy is defined as shown in the following equation (2):

$$L=-\frac{1}{N}\sum_{i=1}^{N}\sum_{j} w_j\,y_{ij}\,\log\bigl(\hat{y}_{ij}\bigr) \qquad (2)$$

wherein $w_j$ is the weight of each class and $y_{ij}$ is the One-hot encoded value.
In a possible embodiment, the invention uses weighted cross entropy as the loss function and adjusts the prediction of each class by giving different weights, so as to solve the unbalanced learning problem. Class weights are calculated by Scikit-learn; the balanced class weights are determined by the following formula:

$$w_j=\frac{n_{\text{samples}}}{n_{\text{classes}}\cdot \mathrm{bincount}(y)_j}$$

wherein $\mathrm{bincount}(y)_j$ represents the number of samples in each class, $n_{\text{classes}}$ represents the number of categories (here $n_{\text{classes}}=2$), and $n_{\text{samples}}$ is the total number of samples; bincount(y) is a function of the numpy library in Python that gives the number of occurrences of each element value in y. We choose the threshold that maximizes the MCC value.
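A minimal sketch of this weighting scheme is given below, assuming Scikit-learn's compute_class_weight and a plain NumPy form of formula (2); the toy label vector is illustrative only.

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Toy residue labels: 1 = ATP-binding residue, 0 = non-binding residue.
y = np.array([0, 0, 0, 0, 0, 0, 0, 1, 0, 0])

# Balanced weights, equivalent to n_samples / (n_classes * np.bincount(y)).
weights = compute_class_weight(class_weight="balanced", classes=np.array([0, 1]), y=y)

def weighted_cross_entropy(y_true_onehot, y_pred_prob, w, eps=1e-12):
    # Formula (2): average over samples of -sum_j w_j * y_ij * log(p_ij).
    return -np.mean(np.sum(w * y_true_onehot * np.log(y_pred_prob + eps), axis=1))
```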
S207: constructing fractional derivatives defined based on Caputo, and modifying the adjusted prediction iteration algorithm based on the fractional derivatives;
in a possible embodiment, the invention chooses to study the fractional gradient under this definition, since the fractional derivative of the Caputo definition has very good properties, i.e. the derivative of the constant is equal to 0.
The fractional derivative under the Caputo definition is given by the following equation (3):

$${}^{C}_{t_0}\!D^{\alpha}_{t}\,f(t)=\frac{1}{\Gamma(m-\alpha)}\int_{t_0}^{t}\frac{f^{(m)}(\tau)}{(t-\tau)^{\alpha-m+1}}\,d\tau \qquad (3)$$

wherein $f(t)$ is the objective function, $\alpha$ is the order with $0<\alpha<1$ and $m-1<\alpha<m$ ($m$ a positive integer), $\Gamma(\cdot)$ is the gamma function, $t_0$ is the initial value, $f^{(m)}$ denotes the m-th order derivative of $f$, and $\tau$ is the integration variable.
In a possible implementation, let $f(x)$ be a smooth convex function and $x^{*}$ be the unique extreme point of $f(x)$. Each iteration step of the conventional integer-order gradient method is:

$$x_{k+1}=x_k-\mu\,f'(x_k)$$

where $\mu$ is the iteration step or learning rate, $k$ is the iteration index, and $x_k$ denotes the iterate at step $k$ (with $x_0$ the initial value). The fractional-order gradient method can be written as:

$$x_{k+1}=x_k-\mu\;{}^{C}_{x_0}\!D^{\alpha}_{x}\,f(x)\Big|_{x=x_k} \qquad (4)$$

In a possible embodiment, if the fractional derivative is applied directly, the fractional gradient method above cannot converge to the true extreme point $x^{*}$ of $f(x)$, but only to an extreme point in the sense of the Caputo fractional derivative, which depends on the initial value $x_0$ and on the order $\alpha$ and in most cases is not equal to $x^{*}$.

To ensure that the algorithm converges to the true extreme point, another fractional gradient method is considered in the subsequent iteration process, i.e., $x_0$ is replaced with $x_{k-1}$: replacing $x_0$ in equation (4) with $x_{k-1}$ gives the modified fractional gradient method of the following equation (5):

$$x_{k+1}=x_k-\mu\;{}^{C}_{x_{k-1}}\!D^{\alpha}_{x}\,f(x)\Big|_{x=x_k} \qquad (5)$$

wherein $0<\alpha<1$.

Substituting the above equation (5) into equation (3) yields:

$$x_{k+1}=x_k-\mu\sum_{i=1}^{\infty}\frac{f^{(i)}(x_{k-1})}{\Gamma(i+1-\alpha)}\,\bigl(x_k-x_{k-1}\bigr)^{\,i-\alpha}$$

When only the first term is retained and its absolute value is introduced, the fractional gradient method for $0<\alpha<2$ is reduced to the modified iterative algorithm of the following equation (6):

$$x_{k+1}=x_k-\frac{\mu}{\Gamma(2-\alpha)}\,f'(x_k)\,\bigl|x_k-x_{k-1}\bigr|^{\,1-\alpha} \qquad (6)$$

The iterative algorithm of the above formula (6) converges to the true extreme point $x^{*}$.
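A toy sketch of the modified iteration of formula (6) on a one-dimensional convex function is given below; the order α, the step size μ and the handling of the very first step are assumptions.

```python
import math

def frac_gradient_descent(grad, x0, alpha=0.9, mu=0.1, steps=100, eps=1e-12):
    # Modified fractional gradient iteration of formula (6):
    #   x_{k+1} = x_k - mu / Gamma(2 - alpha) * f'(x_k) * |x_k - x_{k-1}|^(1 - alpha)
    x_prev, x = x0, x0 - mu * grad(x0)  # first step: plain gradient step (assumption)
    for _ in range(steps):
        step = mu / math.gamma(2 - alpha) * grad(x) * (abs(x - x_prev) + eps) ** (1 - alpha)
        x_prev, x = x, x - step
    return x

# Toy check on f(x) = (x - 3)^2, whose true extreme point is x* = 3.
print(frac_gradient_descent(lambda x: 2 * (x - 3), x0=0.0))
```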
S208: replacing a parameter updating process of a back propagation process of the convolutional neural network in the initial prediction model with a modified prediction iteration algorithm to construct a new prediction model; inputting the feature matrix into a new prediction model, outputting a prediction result, and finishing the prediction of the protein-ATP binding site based on the fractional order neural network.
In one possible embodiment, a fully connected layer of the convolutional neural network is constructed, wherein the back-propagation gradient of the fully connected layer adopts a mixture of fractional order and integer order to ensure that the chain rule holds. Two types of gradients are involved: one is the transfer gradient connecting the nodes between two layers, and the other is the update gradient for the in-layer parameters.

In one possible implementation, a schematic diagram of the forward propagation algorithm is shown in FIG. 3. Let $y_j^{(l)}$ denote the output of the $j$-th node in the $l$-th layer:

$$y_j^{(l)}=\sigma\!\left(z_j^{(l)}\right),\qquad z_j^{(l)}=\sum_i w_{ji}^{(l)}\,y_i^{(l-1)}+b_j^{(l)}$$

Here $w_{ji}^{(l)}$ denotes the weights of the $l$-th layer, $b_j^{(l)}$ denotes the bias, $y_i^{(l-1)}$ denotes the output of the previous layer, and the function $\sigma(\cdot)$ is the activation function.

To ensure that the chain rule holds, the propagation gradient is still an integer-order gradient:

$$\frac{\partial E}{\partial y_i^{(l-1)}}=\sum_j \frac{\partial E}{\partial y_j^{(l)}}\,\sigma'\!\bigl(z_j^{(l)}\bigr)\,w_{ji}^{(l)}$$

but in updating the in-layer parameters we use the fractional-order update of formula (6):

$$w_{k+1}=w_k-\frac{\mu}{\Gamma(2-\alpha)}\,\frac{\partial E}{\partial w}\bigg|_{w=w_k}\,\bigl|w_k-w_{k-1}\bigr|^{\,1-\alpha}$$

where $E$ denotes the loss function. The update process is shown in FIG. 4.
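The mixed integer-order/fractional-order treatment of the fully connected layer described above might be sketched as follows; the ReLU activation, the initialisation and the small epsilon that keeps the very first fractional step from vanishing are assumptions rather than details fixed by the embodiment.

```python
import numpy as np
from math import gamma

class FracDense:
    """Fully connected layer: the integer-order gradient is passed back to the
    previous layer (so the chain rule holds), while the layer's own parameters
    are updated with the Caputo-type fractional rule of formula (6)."""

    def __init__(self, n_in, n_out, alpha=0.9, mu=0.01, eps=1e-8):
        self.W = np.random.randn(n_out, n_in) * 0.01
        self.b = np.zeros(n_out)
        self.W_prev, self.b_prev = self.W.copy(), self.b.copy()
        self.alpha, self.mu, self.eps = alpha, mu, eps

    def forward(self, x):
        self.x = x
        self.z = self.W @ x + self.b
        return np.maximum(self.z, 0.0)        # ReLU activation (assumption)

    def backward(self, grad_out):
        grad_z = grad_out * (self.z > 0)      # integer-order chain rule
        grad_W = np.outer(grad_z, self.x)
        grad_b = grad_z
        grad_in = self.W.T @ grad_z           # integer-order gradient passed back
        self._frac_update(grad_W, grad_b)
        return grad_in

    def _frac_update(self, grad_W, grad_b):
        # Fractional update of formula (6); eps avoids a zero step when the
        # current and previous parameters coincide (e.g. at the first step).
        c = self.mu / gamma(2 - self.alpha)
        step_W = c * grad_W * (np.abs(self.W - self.W_prev) + self.eps) ** (1 - self.alpha)
        step_b = c * grad_b * (np.abs(self.b - self.b_prev) + self.eps) ** (1 - self.alpha)
        self.W_prev, self.b_prev = self.W.copy(), self.b.copy()
        self.W -= step_W
        self.b -= step_b
```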
In a possible implementation, ATP-17 is used as the test set to evaluate the model. The model outputs a one-dimensional matrix of prediction probabilities for each site of a protein sequence; according to the criterion of maximizing the MCC, the threshold is set to 0.80, i.e., when the prediction probability of a site is greater than 0.8 it is judged to be a binding site and denoted by 1, and otherwise by 0. We performed 15 repeated experiments on the test set, selected accuracy (Acc), sensitivity (Sen), specificity (Spe) and the Matthews correlation coefficient (MCC) as evaluation indexes, and compared with the traditional convolutional neural network; the averages over the repeated experiments are given in the following table:
Table 1 Evaluation index table (the table values are provided as an image in the original document)
Then, compared with several protein-ATP binding site predictors that perform well in the prior art, namely NsitePred, TargetATPsite, TargetS and ATPseq, the prediction results on ATP-17 are shown in the following table:
Table 2 Comparison of the results of the existing predictors with the predictor of the present invention (the table values are provided as an image in the original document)
The predicted result of the protein 2YAA sequence is shown in FIG. 5. The invention can accurately predict the binding site.
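A minimal sketch of the evaluation step described above, thresholding the per-residue probabilities at 0.80 and computing Acc, Sen, Spe and MCC with scikit-learn, is given below.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, matthews_corrcoef

def evaluate(y_true, y_prob, threshold=0.80):
    # A residue is predicted as an ATP-binding site when its probability exceeds the threshold.
    y_pred = (np.asarray(y_prob) > threshold).astype(int)
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    acc = (tp + tn) / (tp + tn + fp + fn)
    sen = tp / (tp + fn) if (tp + fn) else 0.0   # sensitivity (recall on binding residues)
    spe = tn / (tn + fp) if (tn + fp) else 0.0   # specificity
    mcc = matthews_corrcoef(y_true, y_pred)
    return acc, sen, spe, mcc
```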
In the embodiment of the invention, a protein-ATP binding site prediction method is provided by combining a deep learning method and fractional differential, and the accuracy is improved. Firstly, data sets ATP-227 and ATP-14 are selected as training sets and test sets, and the required characteristics of the model are extracted from the digitized information of the protein and integrated into a characteristic matrix to be used as input. Then, the parameter updating process of the back propagation process of the convolutional neural network is modified into fractional gradient iteration, and test data show that the prediction effect of the convolutional neural network modified by fractional gradient is superior to that of the prior machine learning and integer-order deep learning model. The invention focuses on adding the fractional order gradient defined by Caputo to the full connection layer of the single-start predictor, and improving the performance of the predictor on the premise of ensuring convergence and chain rule.
FIG. 6 is a block diagram illustrating a fractional neural network based protein-ATP binding site predicting device according to an example embodiment. Referring to fig. 6, the apparatus 300 includes:
The feature extraction module 310 is configured to construct an initial prediction model, acquire a training set based on a PDB protein database, collect features of target residues and adjacent residues of the target residues in the training set through a sliding window technology, and integrate the features into a feature matrix;
the function modification module 320 is configured to use the weighted cross entropy as a loss function of the prediction model, and adjust a prediction iteration algorithm of each amino acid type by giving different weights based on the loss function, so as to obtain an adjusted prediction iteration algorithm;
the algorithm modification module 330 is configured to construct a fractional derivative defined based on the Caputo, and modify the adjusted prediction iterative algorithm based on the fractional derivative;
the result output module 340 is configured to replace a parameter update process of a back propagation process of the convolutional neural network in the initial prediction model with a modified prediction iterative algorithm to construct a new prediction model; inputting the feature matrix into a new prediction model, outputting a prediction result, and finishing the prediction of the protein-ATP binding site based on the fractional order neural network.
Alternatively, the training set is the untreated original protein sequence ATP-227.
Optionally, the feature extraction module 310 is further configured to obtain a training set based on the PDB protein database, and determine a sliding window size; the sliding window comprises target residues, and adjacent residues of the target residues are respectively arranged at the left side and the right side of the target residues;
Operating psi-blast in the annotated protein sequence Swissprot database through a search tool blast based on a local alignment algorithm, and inputting a training set to obtain a PSSM matrix of the training set;
acquiring a protein secondary structure in a training set, and representing the protein secondary structure by a 3-state secondary structure representation method to obtain a protein secondary structure vector;
performing One-hot coding on amino acids in a training set to obtain One-hot coding vectors of each amino acid; wherein, the coding mode is one-hot coding according to the dipole and the amino acid classification mode of the coil side chain;
and extracting features of the PSSM matrix, the protein secondary structure vector and the One-hot coding vector of each amino acid through a sliding window to obtain the features of the target residues and the adjacent residues of the target residues in the training set, and integrating the features into a feature matrix.
Optionally, the function modifying module 320 is configured to define the cross entropy of the i-th sample as shown in the following formula (1):

$$CE_i=-\sum_{j} y_{ij}\,\log\bigl(\hat{y}_{ij}\bigr) \qquad (1)$$

wherein $y_{ij}\in\{0,1\}$; if the i-th sample belongs to the p-th class, then $y_{ip}=1$, and $\hat{y}_{ip}$ represents the prediction probability that the i-th sample belongs to the p-th class;

the weighted cross entropy is defined as shown in the following equation (2):

$$L=-\frac{1}{N}\sum_{i=1}^{N}\sum_{j} w_j\,y_{ij}\,\log\bigl(\hat{y}_{ij}\bigr) \qquad (2)$$

wherein $w_j$ is the weight of each class and $y_{ij}$ is the One-hot encoded value.
Optionally, the algorithm modification module 330 is configured to use the fractional derivative under the Caputo definition as shown in the following formula (3):

$${}^{C}_{t_0}\!D^{\alpha}_{t}\,f(t)=\frac{1}{\Gamma(m-\alpha)}\int_{t_0}^{t}\frac{f^{(m)}(\tau)}{(t-\tau)^{\alpha-m+1}}\,d\tau \qquad (3)$$

wherein $f(t)$ is the objective function, $\alpha$ is the order, $m-1<\alpha<m$ with $m$ a positive integer, $\Gamma(\cdot)$ is the gamma function, and $t_0$ is the initial value.
Optionally, the algorithm modification module 330 is configured to modify the iterative algorithm so that it converges to the true extreme point, as follows:

the fractional gradient method is shown in the following formula (4):

$$x_{k+1}=x_k-\mu\;{}^{C}_{x_0}\!D^{\alpha}_{x}\,f(x)\Big|_{x=x_k} \qquad (4)$$

wherein $\mu$ is the iteration step length or learning rate, $k$ is the iteration index, and $x_0$ denotes the initial iterate;

replacing $x_0$ in equation (4) with $x_{k-1}$ gives the modified fractional gradient method of the following equation (5):

$$x_{k+1}=x_k-\mu\;{}^{C}_{x_{k-1}}\!D^{\alpha}_{x}\,f(x)\Big|_{x=x_k} \qquad (5)$$

substituting the above equation (5) into equation (3) and simplifying gives the modified iterative algorithm of the following equation (6):

$$x_{k+1}=x_k-\frac{\mu}{\Gamma(2-\alpha)}\,f'(x_k)\,\bigl|x_k-x_{k-1}\bigr|^{\,1-\alpha} \qquad (6)$$

The iterative algorithm of the above formula (6) converges to the true extreme point $x^{*}$.
Optionally, the result output module 340 is configured to replace the parameter updating process of the back propagation process of the convolutional neural network in the initial prediction model with the modified prediction iterative algorithm and to construct the fully connected layer of the convolutional neural network in the new prediction model, wherein the back propagation gradient of the fully connected layer adopts a mixture of fractional order and integer order; the fully connected layer involves two types of gradients: the transfer gradient connecting the nodes between two layers, and the update gradient for the in-layer parameters.
In the embodiment of the invention, a protein-ATP binding site prediction method is provided by combining a deep learning method and fractional differential, and the accuracy is improved. Firstly, data sets ATP-227 and ATP-14 are selected as training sets and test sets, and the required characteristics of the model are extracted from the digitized information of the protein and integrated into a characteristic matrix to be used as input. Then, the parameter updating process of the back propagation process of the convolutional neural network is modified into fractional gradient iteration, and test data show that the prediction effect of the convolutional neural network modified by fractional gradient is superior to that of the prior machine learning and integer-order deep learning model. The invention focuses on adding the fractional order gradient defined by Caputo to the full connection layer of the single-start predictor, and improving the performance of the predictor on the premise of ensuring convergence and chain rule.
Fig. 7 is a schematic structural diagram of an electronic device 400 according to an embodiment of the present invention, where the electronic device 400 may have a relatively large difference due to different configurations or performances, and may include one or more processors (central processing units, CPU) 401 and one or more memories 402, where at least one instruction is stored in the memories 402, and the at least one instruction is loaded and executed by the processors 401 to implement the following steps of a fractional-order neural network-based protein-ATP binding site prediction method:
S1: constructing an initial prediction model, acquiring a training set based on a PDB protein database, collecting characteristics of target residues and adjacent residues of the target residues in the training set through a sliding window technology, and integrating the characteristics into a characteristic matrix;
s2: using the weighted cross entropy as a loss function of the prediction model, and based on the loss function, adjusting the prediction iterative algorithm of each amino acid type by giving different weights to obtain an adjusted prediction iterative algorithm;
s3: constructing a fractional derivative based on the Caputo definition, and modifying the adjusted prediction iteration algorithm based on the fractional derivative;
s4: replacing a parameter updating process of a back propagation process of the convolutional neural network in the initial prediction model with a modified prediction iteration algorithm to construct a new prediction model; inputting the feature matrix into a new prediction model, outputting a prediction result, and finishing the prediction of the protein-ATP binding site based on the fractional order neural network.
In an exemplary embodiment, a computer-readable storage medium is also provided, such as a memory comprising instructions executable by a processor in a terminal to perform the above fractional-order-neural-network-based protein-ATP binding site prediction method. For example, the computer-readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, or the like.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program for instructing relevant hardware, where the program may be stored in a computer readable storage medium, and the storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.
The foregoing description of the preferred embodiments of the invention is not intended to limit the invention to the precise form disclosed, and any such modifications, equivalents, and alternatives falling within the spirit and scope of the invention are intended to be included within the scope of the invention.

Claims (2)

1. A fractional neural network-based protein-ATP binding site prediction method, the method steps comprising:
s1: constructing an initial prediction model, acquiring a training set based on a PDB protein database, collecting characteristics of target residues and adjacent residues of the target residues in the training set through a sliding window technology, and integrating the characteristics into a characteristic matrix;
s2: using weighted cross entropy as a loss function of the initial prediction model, and based on the loss function, adjusting a prediction iteration algorithm of each amino acid type by giving different weights to obtain an adjusted prediction iteration algorithm;
S3: constructing fractional derivatives defined based on Caputo, and modifying the adjusted prediction iteration algorithm based on the fractional derivatives;
s4: replacing a parameter updating process of a backward propagation process of the convolutional neural network in the initial prediction model with a modified prediction iterative algorithm to construct a new prediction model; inputting the feature matrix into the new prediction model, outputting a prediction result, and finishing protein-ATP binding site prediction based on a fractional order neural network;
in the step S1, the training set is an unprocessed original protein sequence ATP-227;
in the step S1, an initial prediction model is constructed, a training set is obtained based on a PDB protein database, features of target residues and adjacent residues of the target residues in the training set are collected through a sliding window technology, and the features are integrated into a feature matrix, wherein the method comprises the following steps:
s11: acquiring a training set based on a PDB protein database, and determining the size of a sliding window; the sliding window comprises target residues, and adjacent residues of the target residues are respectively arranged at the left side and the right side of the target residues;
s12: operating psi-blast in the annotated protein sequence Swissprot database through a search tool blast based on a local alignment algorithm, and inputting a training set to obtain a PSSM matrix of the training set;
S13: acquiring a protein secondary structure in a training set, and representing the protein secondary structure by a 3-state secondary structure representation method to obtain a protein secondary structure vector;
s14: carrying out one-hot coding on amino acids in a training set to obtain one-hot coding vectors of each amino acid; wherein, the coding mode is one-hot coding according to the dipole and the amino acid classification mode of the coil side chain;
s15: performing feature extraction on a PSSM matrix, a protein secondary structure vector and a one-hot coding vector of each amino acid through a sliding window to obtain features of target residues in a training set and adjacent residues of the target residues, and integrating the features into a feature matrix;
in the step S2, a weighted cross entropy is used as a loss function of the initial prediction model, and based on the loss function, a prediction iteration algorithm of each amino acid type is adjusted by giving different weights, so as to obtain an adjusted prediction iteration algorithm, which includes:
the cross entropy of the i-th sample is defined as shown in the following equation (1):

$$CE_i=-\sum_{j} y_{ij}\,\log\bigl(\hat{y}_{ij}\bigr) \qquad (1)$$

wherein $CE_i$ represents the cross entropy, $y_{ij}\in\{0,1\}$, $j=1,2,\dots$; if the i-th sample belongs to the p-th class, then $y_{ip}=1$, and $\hat{y}_{ip}$ represents the prediction probability that the i-th sample belongs to the p-th class;

the weighted cross entropy is defined as shown in the following equation (2):

$$L=-\frac{1}{N}\sum_{i=1}^{N}\sum_{j} w_j\,y_{ij}\,\log\bigl(\hat{y}_{ij}\bigr) \qquad (2)$$

wherein $w_j$ denotes the weight of each class, $y_{ij}$, $j=1,2,\dots,7$, is the one-hot encoded value, $N$ represents the number of samples, and $\hat{y}_{ij}$ represents the prediction probability of the i-th sample;
in S3, the fractional derivative under the Caputo definition is represented by the following formula (3):

$${}^{C}_{t_0}\!D^{\alpha}_{t}\,f(t)=\frac{1}{\Gamma(m-\alpha)}\int_{t_0}^{t}\frac{f^{(m)}(\tau)}{(t-\tau)^{\alpha-m+1}}\,d\tau \qquad (3)$$

wherein $f(t)$ is the objective function, $\alpha$ is the order with $0<\alpha<1$ and $m-1<\alpha<m$ ($m$ a positive integer), $\Gamma(\cdot)$ is the gamma function, $t_0$ is the initial value, $f^{(m)}$ denotes the m-th order derivative of $f$, and $\tau$ is the integration variable;
in step S3, modifying the adjusted prediction iteration algorithm based on the fractional derivative includes:

the fractional gradient method is shown in the following formula (4):

$$x_{k+1}=x_k-\mu\;{}^{C}_{x_0}\!D^{\alpha}_{x}\,f(x)\Big|_{x=x_k} \qquad (4)$$

wherein $\mu$ is the iteration step length, $k$ is the iteration index, and $x_0$ denotes the initial iterate;

replacing $x_0$ in equation (4) with $x_{k-1}$ gives the modified fractional gradient method of the following equation (5):

$$x_{k+1}=x_k-\mu\;{}^{C}_{x_{k-1}}\!D^{\alpha}_{x}\,f(x)\Big|_{x=x_k} \qquad (5)$$

substituting the above equation (5) into equation (3) and simplifying gives the modified prediction iterative algorithm of the following equation (6):

$$x_{k+1}=x_k-\frac{\mu}{\Gamma(2-\alpha)}\,f'(x_k)\,\bigl|x_k-x_{k-1}\bigr|^{\,1-\alpha} \qquad (6)$$

the prediction iterative algorithm of the formula (6) converges to the true extreme point $x^{*}$;
in the step S4, a parameter updating process of the back propagation process of the convolutional neural network in the initial prediction model is replaced by a modified prediction iterative algorithm, and a new prediction model is constructed, including:
replacing the parameter updating process of the back propagation process of the convolutional neural network in the initial prediction model with the modified prediction iterative algorithm, so as to construct the fully connected layer of the convolutional neural network in the new prediction model, wherein the back propagation gradient of the fully connected layer uses a mixture of fractional order and integer order; the fully connected layer involves two types of gradients: the transfer gradient, which connects the nodes between two adjacent layers, and the update gradient, which updates the layer parameters.
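To make the mixed fractional/integer-order update concrete, here is a hypothetical PyTorch sketch: the transfer gradient between layers is still computed by ordinary (integer-order) autograd, while the update gradient of the fully connected layer's parameters follows the fractional rule of equation (6), and all remaining parameters are updated with plain SGD; the layer sizes, α and the learning rate are illustrative and not taken from the patent.

import math
import torch
import torch.nn as nn

class FractionalSGD:
    """Applies equation (6) to the parameters of selected (fully connected) layers
    and ordinary SGD to everything else.  The transfer gradient between layers is
    left to autograd; only the parameter-update gradient is made fractional."""
    def __init__(self, frac_params, other_params, lr=0.01, alpha=0.7):
        self.frac_params = list(frac_params)
        self.other_params = list(other_params)
        self.lr, self.alpha = lr, alpha
        self.coef = lr / math.gamma(2 - alpha)
        # previous parameter values, needed for |w_k - w_{k-1}|^(1 - alpha)
        self.prev = [p.detach().clone() + 1e-3 for p in self.frac_params]

    @torch.no_grad()
    def step(self):
        for p, p_prev in zip(self.frac_params, self.prev):
            new_prev = p.detach().clone()
            p -= self.coef * p.grad * (p - p_prev).abs().pow(1 - self.alpha)
            p_prev.copy_(new_prev)
        for p in self.other_params:
            p -= self.lr * p.grad

    def zero_grad(self):
        for p in self.frac_params + self.other_params:
            if p.grad is not None:
                p.grad.zero_()

# toy model: convolutional feature extractor + fully connected classifier head
model = nn.Sequential(nn.Conv1d(30, 16, 3, padding=1), nn.ReLU(),
                      nn.Flatten(), nn.Linear(16 * 17, 2))
fc = model[3]
opt = FractionalSGD(fc.parameters(),
                    [p for m in model[:3] for p in m.parameters()],
                    lr=0.01, alpha=0.7)
x, y = torch.randn(8, 30, 17), torch.randint(0, 2, (8,))
loss = nn.CrossEntropyLoss()(model(x), y)
loss.backward()
opt.step()
opt.zero_grad()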
2. A fractional order neural network-based protein-ATP binding site prediction device, wherein the device is applied to the method of claim 1, the device comprising:
the feature extraction module is used for constructing an initial prediction model, acquiring a training set based on a PDB protein database, collecting features of target residues and adjacent residues of the target residues in the training set through a sliding window technology, and integrating the features into a feature matrix;
the function modification module is configured to use weighted cross entropy as the loss function of the initial prediction model and, based on this loss function, to adjust the prediction iterative algorithm by assigning a different weight to each amino acid class, so as to obtain the adjusted prediction iterative algorithm;
The algorithm modification module is used for constructing fractional derivatives defined based on Caputo and modifying the adjusted prediction iterative algorithm based on the fractional derivatives;
the result output module is used for replacing the parameter updating process of the back propagation process of the convolutional neural network in the initial prediction model with a modified prediction iterative algorithm to construct a new prediction model; inputting the feature matrix into the new prediction model, outputting a prediction result, and finishing protein-ATP binding site prediction based on a fractional order neural network;
the training set is the unprocessed original protein sequence data set ATP-227;
the feature extraction module is further configured to acquire the training set based on the PDB protein database and to determine the size of the sliding window, wherein the sliding window contains a target residue, with adjacent residues of the target residue arranged on its left and right sides;
to run psi-blast, via the search tool blast based on a local alignment algorithm, against the annotated protein sequence database Swissprot with the training set as input, to obtain the PSSM matrix of the training set;
to acquire the protein secondary structures in the training set and represent them with the 3-state secondary structure representation method to obtain protein secondary structure vectors;
to perform one-hot encoding on the amino acids in the training set to obtain a one-hot encoding vector for each amino acid, wherein the encoding follows an amino acid classification based on the dipoles and side chains of the amino acids;
to perform feature extraction on the PSSM matrix, the protein secondary structure vectors and the one-hot encoding vector of each amino acid through the sliding window to obtain the features of the target residues and of the adjacent residues of the target residues in the training set, and to integrate the features into a feature matrix;
the function modifying module is further configured to define the cross entropy of the ith sample as shown in the following formula (1):
$$CE_i = -\sum_{j} y_{ij}\,\log\left(p_{ij}\right) \qquad (1)$$

wherein $CE_i$ represents the cross entropy of the ith sample; $y_{ij}$, j = 1, 2, …, is the one-hot indicator: if the ith sample belongs to the p-th class, then $y_{ij} = \delta_{jp}$ (equal to 1 for j = p and 0 otherwise); and $p_{ij}$ represents the prediction probability that the ith sample belongs to the j-th class;

the weighted cross entropy is defined as shown in the following equation (2):

$$L = -\frac{1}{N}\sum_{i=1}^{N}\sum_{j} w_j\, y_{ij}\,\log\left(p_{ij}\right) \qquad (2)$$

wherein $w_j$ denotes the weight of each class; $y_{ij}$, j = 1, 2, …, 7, is the one-hot encoded value; N represents the number of samples; and $p_{ij}$ represents the prediction probability of the ith sample;
the algorithm modification module is further configured to construct the fractional derivative defined by Caputo as shown in the following formula (3):

$${}^{C}_{t_0}D^{\alpha}_{t}\,f(t) = \frac{1}{\Gamma(m-\alpha)}\int_{t_0}^{t}\frac{f^{(m)}(\tau)}{(t-\tau)^{\alpha-m+1}}\,\mathrm{d}\tau \qquad (3)$$

wherein f(t) is the objective function; α is the order with 0 < α < 1; m − 1 < α < m with m a positive integer (so that m = 1 here); Γ(·) is the gamma function; $t_0$ is the initial value; $f^{(m)}$ denotes the m-th order derivative of f; and τ is the integration variable;
the algorithm modification module is further configured to modify the adjusted prediction iterative algorithm so that it converges to the true extreme point, which includes:
the fractional gradient method is shown in the following formula (4):

$$x_{k+1} = x_k - \mu\,{}^{C}_{x_0}D^{\alpha}_{x_k}\,f(x), \qquad k = 0, 1, \dots, K \qquad (4)$$

wherein μ is the iteration step length, K is the iteration number, $x_k$ is the value of the parameter at the k-th iteration, and $x_0$ is the initial value serving as the lower limit of the Caputo derivative;

replacing the lower limit $x_0$ in equation (4) with the previous iterate $x_{k-1}$ gives the modified fractional gradient method of the following equation (5):

$$x_{k+1} = x_k - \mu\,{}^{C}_{x_{k-1}}D^{\alpha}_{x_k}\,f(x) \qquad (5)$$

substituting the Caputo definition of formula (3) into equation (5) and simplifying gives the modified prediction iterative algorithm of equation (6):

$$x_{k+1} = x_k - \frac{\mu}{\Gamma(2-\alpha)}\,f'(x_k)\,\left(x_k - x_{k-1}\right)^{1-\alpha} \qquad (6)$$

the prediction iterative algorithm of equation (6) converges to the true extreme point $x^{*}$;
the result output module is further configured to replace the parameter updating process of the back propagation process of the convolutional neural network in the initial prediction model with the modified prediction iterative algorithm, so as to construct the fully connected layer of the convolutional neural network in the new prediction model, wherein the back propagation gradient of the fully connected layer uses a mixture of fractional order and integer order; the fully connected layer involves two types of gradients: the transfer gradient, which connects the nodes between two adjacent layers, and the update gradient, which updates the layer parameters.
CN202310115169.0A 2023-02-15 2023-02-15 protein-ATP binding site prediction method and device based on fractional order neural network Active CN115966249B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310115169.0A CN115966249B (en) 2023-02-15 2023-02-15 protein-ATP binding site prediction method and device based on fractional order neural network

Publications (2)

Publication Number Publication Date
CN115966249A CN115966249A (en) 2023-04-14
CN115966249B (en) 2023-05-26

Family

ID=85888059

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310115169.0A Active CN115966249B (en) 2023-02-15 2023-02-15 protein-ATP binding site prediction method and device based on fractional order neural network

Country Status (1)

Country Link
CN (1) CN115966249B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020167667A1 (en) * 2019-02-11 2020-08-20 Flagship Pioneering Innovations Vi, Llc Machine learning guided polypeptide analysis
CN112214222A (en) * 2020-10-27 2021-01-12 华中科技大学 Sequential structure for realizing feedforward neural network in COStream and compiling method thereof
CN114882945A (en) * 2022-07-11 2022-08-09 鲁东大学 Ensemble learning-based RNA-protein binding site prediction method

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110970090B (en) * 2019-11-18 2021-06-29 华中科技大学 Method for judging similarity between polypeptide to be processed and positive data set peptide fragment
US20210174903A1 (en) * 2019-12-10 2021-06-10 Protein Evolution Inc. Enhanced protein structure prediction using protein homolog discovery and constrained distograms
CN112767997B (en) * 2021-02-04 2023-04-25 齐鲁工业大学 Protein secondary structure prediction method based on multi-scale convolution attention neural network
CN113593631B (en) * 2021-08-09 2022-11-29 山东大学 Method and system for predicting protein-polypeptide binding site

Also Published As

Publication number Publication date
CN115966249A (en) 2023-04-14

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant