CN115966249A - Fractional order neural network-based protein-ATP binding site prediction method and device - Google Patents

Publication number
CN115966249A
Authority
CN
China
Prior art keywords
protein
prediction
training set
neural network
prediction model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202310115169.0A
Other languages
Chinese (zh)
Other versions
CN115966249B (en)
Inventor
王艺舒
陈晓敏
郭梦瑶
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Science and Technology Beijing USTB
Original Assignee
University of Science and Technology Beijing USTB
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Science and Technology Beijing USTB
Priority to CN202310115169.0A
Publication of CN115966249A
Application granted
Publication of CN115966249B
Active
Anticipated expiration

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A: TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00: Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10: Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention provides a fractional order neural network-based protein-ATP binding site prediction method and device, and relates to the technical field of protein-ligand binding site prediction. The method comprises the following steps: the features required by the model are extracted from the digitized information of the protein and integrated into a feature matrix that serves as the input; a convolutional neural network is then selected, and the parameter update in its back propagation process is modified into a fractional order gradient iteration. Test data show that the fractional order convolutional neural network predicts better than existing machine learning models and integer order deep learning models. By combining a deep learning method with fractional order differentiation, the invention provides a protein-ATP binding site prediction method with improved accuracy. The invention is characterized in that the fractional order gradient defined by Caputo is added to the fully connected layer of the single-start predictor, improving the performance of the predictor while preserving convergence and the chain rule.

Description

Fractional order neural network-based protein-ATP binding site prediction method and device
Technical Field
The invention relates to the technical field of prediction of protein-ligand binding sites, in particular to a fractional order neural network-based protein-ATP binding site prediction method and device.
Background
As an important substance constituting life, protein has been studied without interruption. Initially, even the composition of proteins was an elusive problem. Today, with the rapid development of computer technology, scientists use computers to determine the primary structure of more and more proteins and build specialized databases for querying and use, for example the PDB protein database [H.M. Berman, J. Westbrook, Z. Feng, G. Gilliland, T.N. Bhat, H. Weissig, I.N. Shindyalov, P.E. Bourne (2000) The Protein Data Bank. Nucleic Acids Research, 28: 235-242]. However, determining other information about a protein, such as its tertiary structure and its binding sites for other substances, is not easy.
The prediction of protein-ligand interaction sites is of great significance for determining the sites at which drugs act on their targets. Determining protein structures and their binding sites with other compounds helps drugs take effect and improves the rate and efficiency of biochemical reactions in vivo, such as enzymatic reactions and ATP binding. Protein-ligand interactions are critical for various biological processes, such as membrane trafficking, cell motility, muscle contraction, signal transduction, and the transcription and replication of DNA [ Liugui nephelia, bright jelly, songzhi. 187-194]. In drug discovery, protein-ligand interaction is an important basis for determining the site of drug action and guides the research and development of new drugs for diseases such as cancer, diabetes and Alzheimer's disease. Therefore, accurate identification of protein binding sites is of great importance for the functional annotation of proteins and for determining drug targets.
Among these ligands, ATP (adenosine triphosphate) is a nucleoside triphosphate, a small-molecule compound that functions as a coenzyme in cells and plays an important role in various metabolic processes [Hu Jun, Li Yang, Zhang Yang, et al. ATPbind: accurate protein-ATP binding site prediction by combining sequence-profiling and structure-based comparisons [J]. Journal of Chemical Information & Modeling, 2018, 58: 501-510]. ATP binding sites are important drug targets for antibacterial and anticancer chemotherapy. To reveal the intrinsic mechanism of protein-ligand interaction, a great deal of wet-laboratory work has been undertaken, and thousands of protein-ligand complex structures have been deposited in the PDB. However, identifying protein-ligand binding sites by wet-laboratory experimental techniques is often costly and time consuming. As of June 2019, 7055 proteins in the Protein Data Bank (PDB) were labeled as ATP binding, accounting for approximately 4.62% of all records [Liang Yanchun, Liu Guixia, et al. A Novel Prediction Method for ATP-Binding Sites From Protein Sequences Based on Fusion of Deep ..., IEEE Access, 2020, 8: 21485-21495]; the number of known ATP-binding proteins is far from sufficient in the face of the large-scale protein sequences of the post-genomic era. With the rapid development of algorithms such as machine learning, computational methods for determining binding sites on proteins continue to emerge and bioinformatics continues to advance; however, conventional computational methods suffer from low accuracy and a high false positive rate [ honjiajun. Zhejiang university, 2020].
Because of the importance of protein-ligand interactions and the difficulty of identifying binding sites experimentally, developing efficient, automated computational methods that rapidly predict protein-ligand binding sites has become an increasingly important issue in bioinformatics, particularly in the face of the large-scale protein sequences of the post-genomic era.
Well-known AI techniques such as machine learning and deep learning can be used to determine protein-ligand interaction sites and are far faster than wet-laboratory experiments, so they are good methods to adopt and to keep exploring. Training and validating a model on a suitable data set greatly reduces the number of wet experiments and the experimental cost. However, these methods still have problems: prediction accuracy is not good enough and the misprediction rate is high. How to improve prediction accuracy and further reduce time cost is therefore a valuable research problem.
In biomedicine, understanding the interaction of proteins with ATP helps protein functional annotation and drug development. Accurate identification of protein-ATP binding residues is an important but challenging task for understanding protein-ATP interactions, especially when only protein sequence information is available. With the development of deep learning algorithms, convolutional neural networks (CNNs) have been widely used in various fields of bioinformatics. However, to improve classifier performance, a convolutional neural network can only stack more and more convolutional layers; on the other hand, the gradient algorithm in a convolutional neural network may suffer from exploding gradients and may fail to converge to the true extreme point of the objective function.
Disclosure of Invention
Aiming at the problems in the prior art that the convolutional neural network models currently applied to protein-ATP prediction converge slowly, leave room for improvement in prediction performance, and face unbalanced data distribution, the invention provides a method and a device for predicting protein-ATP binding sites based on a fractional order neural network.
In order to solve the technical problems, the invention provides the following technical scheme:
In one aspect, a fractional order neural network-based protein-ATP binding site prediction method is provided. The method is applied to an electronic device and includes the following steps:
S1: constructing an initial prediction model, acquiring a training set based on the PDB protein database, collecting the features of each target residue and its adjacent residues in the training set by a sliding window technique, and integrating the features into a feature matrix;
S2: using the weighted cross entropy as the loss function of the prediction model, and adjusting the prediction iterative algorithm for each amino acid type by assigning different weights on the basis of the loss function, to obtain an adjusted prediction iterative algorithm;
S3: constructing a fractional order derivative based on the Caputo definition, and modifying the adjusted prediction iterative algorithm based on the fractional order derivative;
S4: replacing the parameter update of the back propagation process of the convolutional neural network in the initial prediction model with the modified prediction iterative algorithm to construct a new prediction model; and inputting the feature matrix into the new prediction model and outputting a prediction result, thereby completing the fractional order neural network-based protein-ATP binding site prediction.
Optionally, the training set is the unprocessed raw protein sequence data set ATP-227.
Optionally, in S1, constructing an initial prediction model, acquiring a training set based on the PDB protein database, collecting the features of each target residue and its adjacent residues in the training set by a sliding window technique, and integrating the features into a feature matrix includes:
S11: acquiring a training set based on the PDB protein database, and determining the size of the sliding window; the sliding window contains a target residue together with the adjacent residues on the left and right sides of the target residue;
S12: running PSI-BLAST against the annotated protein sequence database Swiss-Prot by means of BLAST, a search tool based on a local alignment algorithm, with the training set as input, to obtain the PSSM matrix of the training set;
S13: acquiring the protein secondary structure in the training set, and representing it with a 3-state secondary structure representation to obtain a protein secondary structure vector;
S14: performing One-hot encoding on the amino acids in the training set to obtain the One-hot encoding vector of each amino acid, wherein the amino acids are One-hot encoded according to a classification of amino acids by dipole and side-chain volume;
S15: performing feature extraction on the PSSM matrix, the protein secondary structure vector and the One-hot encoding vector of each amino acid through the sliding window to obtain the features of each target residue and its adjacent residues in the training set, and integrating the features into a feature matrix.
Optionally, in S2, using the weighted cross entropy as the loss function of the prediction model, and adjusting the prediction iterative algorithm for each amino acid type by assigning different weights on the basis of the loss function to obtain an adjusted prediction iterative algorithm, includes:
defining the cross entropy of the i-th sample as shown in the following formula (1):
[Figure SMS_1]    (1)
wherein [Figure SMS_2] and [Figure SMS_3]; if the i-th sample belongs to the p-th class, then [Figure SMS_4]; and [Figure SMS_5] represents the predicted probability that the i-th sample belongs to the p-th class;
the weighted cross entropy is defined as shown in the following formula (2):
[Figure SMS_6]    (2)
wherein [Figure SMS_7] is the weight of each class, [Figure SMS_8] is the value after One-hot encoding, and N represents the number of samples; [Figure SMS_9].
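The formula images are not reproduced in this text. As a hedged reconstruction only, the standard multi-class cross entropy and its class-weighted variant, which agree with the verbal definitions above, can be written as follows. The symbols $y_{ip}$ (One-hot label), $\hat{y}_{ip}$ (predicted probability), $w_p$ (class weight) and $C$ (number of classes), as well as the normalization by $N$, are assumed and are not taken from the original figures.

```latex
% (1) cross entropy of the i-th sample (assumed notation)
L_i = -\sum_{p=1}^{C} y_{ip}\,\log \hat{y}_{ip}

% (2) weighted cross entropy over N samples (assumed notation)
L_w = -\frac{1}{N}\sum_{i=1}^{N}\sum_{p=1}^{C} w_p\, y_{ip}\,\log \hat{y}_{ip}
```

Under this form, a larger $w_p$ for the rare binding class makes each misclassified binding residue contribute more to the loss.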
Optionally, in S3, the fractional order derivative defined by Caputo is given by the following formula (3):
$$ {}^{C}_{t_0}D^{\alpha}_{t} f(t) = \frac{1}{\Gamma(m-\alpha)} \int_{t_0}^{t} \frac{f^{(m)}(\tau)}{(t-\tau)^{\alpha-m+1}} \, d\tau \qquad (3) $$
wherein $f(t)$ is the objective function, $\alpha$ is the order with $m-1 < \alpha < m$ for a positive integer $m$ (here $0 < \alpha < 1$), $\Gamma(\cdot)$ is the gamma function, $t_0$ is the initial value, $f^{(m)}$ denotes the $m$-th order derivative of $f$, and $\tau$ is the integration variable.
Optionally, modifying the adjusted prediction iterative algorithm based on the fractional order derivative in step S3 includes:
the fractional order gradient method is shown in the following formula (4):
[Figure SMS_12]    (4)
where μ is the iteration step size or learning rate, K is the number of iterations, and [Figure SMS_13] denotes the iteration step length at the initial value x_0;
replacing [Figure SMS_14] in formula (4) with [Figure SMS_15] yields the modified fractional order gradient method shown in formula (5) below:
[Figure SMS_16]    (5)
substituting formula (5) into formula (3) and simplifying yields the modified prediction iterative algorithm shown in formula (6) below:
[Figure SMS_17]    (6)
the prediction iterative algorithm of formula (6) converges, and the point it converges to is the true extreme point x.
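Since the images of formulas (4) to (6) are not reproduced here, the following is only a sketch of how these steps are commonly written for a Caputo fractional order gradient method. The operator notation, the choice of $x_{K-1}$ as the replacement for the initial value $x_0$, the truncation to the leading term, and the evaluation of the first derivative at $x_K$ are all assumptions and may differ from the original figures.

```latex
% (4) fractional order gradient step with fixed lower terminal x_0
x_{K+1} = x_K - \mu \, {}^{C}_{x_0}D^{\alpha}_{x_K} f(x)

% (5) lower terminal x_0 replaced by the previous iterate x_{K-1}
x_{K+1} = x_K - \mu \, {}^{C}_{x_{K-1}}D^{\alpha}_{x_K} f(x)

% (6) leading term after inserting the Caputo definition, 0 < \alpha < 1
x_{K+1} = x_K - \mu \, \frac{f^{(1)}(x_K)}{\Gamma(2-\alpha)} \, \lvert x_K - x_{K-1} \rvert^{1-\alpha}
```

In this form the update vanishes only where $f^{(1)}$ vanishes or the iterates stop moving, which is consistent with the statement that the iteration converges to the true extreme point.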
Optionally, in step S4, replacing the parameter update of the back propagation process of the convolutional neural network in the initial prediction model with the modified prediction iterative algorithm to construct a new prediction model includes:
replacing the parameter update of the back propagation process of the convolutional neural network in the initial prediction model with the modified prediction iterative algorithm, and constructing the fully connected layer of the convolutional neural network in the new prediction model, wherein the back propagation gradient of the fully connected layer is a mixture of fractional order and integer order; the fully connected layer involves two types of gradients: the transfer gradient, which links the nodes between two layers, and the update gradient, which updates the parameters.
In one aspect, a fractional order neural network-based protein-ATP binding site prediction apparatus is provided. The apparatus is applied to an electronic device and includes:
a feature extraction module, configured to construct an initial prediction model, acquire a training set based on the PDB protein database, collect the features of each target residue and its adjacent residues in the training set by a sliding window technique, and integrate the features into a feature matrix;
a function modification module, configured to use the weighted cross entropy as the loss function of the prediction model and adjust the prediction iterative algorithm for each amino acid type by assigning different weights on the basis of the loss function, to obtain an adjusted prediction iterative algorithm;
an algorithm modification module, configured to construct a fractional order derivative based on the Caputo definition and modify the adjusted prediction iterative algorithm based on the fractional order derivative;
a result output module, configured to replace the parameter update of the back propagation process of the convolutional neural network in the initial prediction model with the modified prediction iterative algorithm to construct a new prediction model, input the feature matrix into the new prediction model, and output a prediction result, thereby completing the fractional order neural network-based protein-ATP binding site prediction.
Optionally, the training set is the unprocessed raw protein sequence data set ATP-227.
Optionally, the feature extraction module is further configured to acquire a training set based on the PDB protein database and determine the size of the sliding window, where the sliding window contains a target residue together with the adjacent residues on the left and right sides of the target residue;
run PSI-BLAST against the annotated protein sequence database Swiss-Prot by means of BLAST, a search tool based on a local alignment algorithm, with the training set as input, to obtain the PSSM matrix of the training set;
acquire the protein secondary structure in the training set, and represent it with a 3-state secondary structure representation to obtain a protein secondary structure vector;
perform One-hot encoding on the amino acids in the training set to obtain the One-hot encoding vector of each amino acid, wherein the amino acids are One-hot encoded according to a classification of amino acids by dipole and side-chain volume;
and perform feature extraction on the PSSM matrix, the protein secondary structure vector and the One-hot encoding vector of each amino acid through the sliding window to obtain the features of each target residue and its adjacent residues in the training set, and integrate the features into a feature matrix.
In one aspect, an electronic device is provided, which includes a processor and a memory, where at least one instruction is stored in the memory, and the at least one instruction is loaded and executed by the processor to implement the above-mentioned fractional order neural network-based protein-ATP binding site prediction method.
In one aspect, a computer-readable storage medium having stored therein at least one instruction which is loaded and executed by a processor to implement a fractional order neural network-based protein-ATP binding site prediction method as described above is provided.
The technical scheme of the embodiment of the invention at least has the following beneficial effects:
In the above scheme, a protein-ATP binding site prediction method is provided by combining a deep learning method with fractional order differentiation, and the accuracy is improved. The invention is characterized in that the fractional order gradient defined by Caputo is added to the fully connected layer of the single-start predictor, improving the performance of the predictor while preserving convergence and the chain rule.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings required to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the description below are only some embodiments of the present invention, and it is obvious for those skilled in the art that other drawings can be obtained according to the drawings without creative efforts.
FIG. 1 is a flowchart of a fractional order neural network-based protein-ATP binding site prediction method provided by an embodiment of the present invention;
FIG. 2 is a flow chart of a fractional order neural network-based protein-ATP binding site prediction method provided by an embodiment of the present invention;
FIG. 3 is a schematic diagram of a forward propagation algorithm of a fractional order neural network-based protein-ATP binding site prediction method according to an embodiment of the present invention;
FIG. 4 is a diagram illustrating an updated process of a fractional order neural network-based protein-ATP binding site prediction method according to an embodiment of the present invention;
FIG. 5 is a diagram of the result of a fractional order neural network-based protein-ATP binding site prediction method according to an embodiment of the present invention;
FIG. 6 is a block diagram of a fractional order neural network-based protein-ATP binding site prediction apparatus according to an embodiment of the present invention;
FIG. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed description of the preferred embodiments
To make the technical problems, technical solutions and advantages of the present invention more apparent, the following detailed description is given with reference to the accompanying drawings and specific embodiments.
The embodiment of the invention provides a fractional order neural network-based protein-ATP binding site prediction method, which can be implemented by an electronic device; the electronic device may be a terminal or a server. As shown in FIG. 1, which is the flowchart of the fractional order neural network-based protein-ATP binding site prediction method combining multi-scale convolution and self-attention coding, the processing flow of the method may include the following steps:
S101: constructing an initial prediction model, acquiring a training set based on the PDB protein database, collecting the features of each target residue and its adjacent residues in the training set by a sliding window technique, and integrating the features into a feature matrix;
S102: using the weighted cross entropy as the loss function of the prediction model, and adjusting the prediction iterative algorithm for each amino acid type by assigning different weights on the basis of the loss function, to obtain an adjusted prediction iterative algorithm;
S103: constructing a fractional order derivative based on the Caputo definition, and modifying the adjusted prediction iterative algorithm based on the fractional order derivative;
S104: replacing the parameter update of the back propagation process of the convolutional neural network in the initial prediction model with the modified prediction iterative algorithm to construct a new prediction model; and inputting the feature matrix into the new prediction model and outputting a prediction result, thereby completing the fractional order neural network-based protein-ATP binding site prediction.
Optionally, the training set is the unprocessed raw protein sequence data set ATP-227.
Optionally, in S101, constructing an initial prediction model, acquiring a training set based on the PDB protein database, collecting the features of each target residue and its adjacent residues in the training set by a sliding window technique, and integrating the features into a feature matrix includes:
S111: acquiring a training set based on the PDB protein database, and determining the size of the sliding window; the sliding window contains a target residue together with the adjacent residues on the left and right sides of the target residue;
S112: running PSI-BLAST against the annotated protein sequence database Swiss-Prot by means of BLAST, a search tool based on a local alignment algorithm, with the training set as input, to obtain the PSSM matrix of the training set;
S113: acquiring the protein secondary structure in the training set, and representing it with a 3-state secondary structure representation to obtain a protein secondary structure vector;
S114: performing One-hot encoding on the amino acids in the training set to obtain the One-hot encoding vector of each amino acid, wherein the amino acids are One-hot encoded according to a classification of amino acids by dipole and side-chain volume;
S115: performing feature extraction on the PSSM matrix, the protein secondary structure vector and the One-hot encoding vector of each amino acid through the sliding window to obtain the features of each target residue and its adjacent residues in the training set, and integrating the features into a feature matrix.
Optionally, in S102, using the weighted cross entropy as the loss function of the prediction model, and adjusting the prediction iterative algorithm for each amino acid type by assigning different weights on the basis of the loss function to obtain an adjusted prediction iterative algorithm, includes:
defining the cross entropy of the i-th sample as shown in the following formula (1):
[Figure SMS_18]    (1)
wherein [Figure SMS_19] and [Figure SMS_20]; if the i-th sample belongs to the p-th class, then [Figure SMS_21]; and [Figure SMS_22] represents the predicted probability that the i-th sample belongs to the p-th class;
the weighted cross entropy is defined as shown in the following formula (2):
[Figure SMS_23]    (2)
wherein [Figure SMS_24] is the weight of each class, [Figure SMS_25] is the value after One-hot encoding, and N represents the number of samples; [Figure SMS_26].
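As a minimal sketch of a class-weighted cross entropy of the kind described for S102, the following assumes One-hot labels, predicted class probabilities, one weight per class, and averaging over the N samples. The function name, the clipping constant `eps`, and the example weights are illustrative assumptions, not values from the patent.

```python
import math

def weighted_cross_entropy(y_true, y_prob, class_weights, eps=1e-12):
    """Mean class-weighted cross entropy.

    y_true: list of One-hot label rows; y_prob: list of predicted
    probability rows; class_weights: one weight per class.
    """
    total = 0.0
    for labels, probs in zip(y_true, y_prob):
        for w, y, p in zip(class_weights, labels, probs):
            # Clip the probability to avoid log(0).
            total -= w * y * math.log(max(p, eps))
    return total / len(y_true)

# Two classes: index 0 = non-binding, index 1 = binding (illustrative values).
y_true = [[1, 0], [0, 1]]
y_prob = [[0.9, 0.1], [0.4, 0.6]]
plain = weighted_cross_entropy(y_true, y_prob, [1.0, 1.0])
weighted = weighted_cross_entropy(y_true, y_prob, [1.0, 5.0])
```

With the larger weight on the binding class, the uncertain prediction for the binding residue dominates the loss, which is the intended effect on an unbalanced data set.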
Optionally, in S103, the fractional order derivative defined by Caputo is given by the following formula (3):
$$ {}^{C}_{t_0}D^{\alpha}_{t} f(t) = \frac{1}{\Gamma(m-\alpha)} \int_{t_0}^{t} \frac{f^{(m)}(\tau)}{(t-\tau)^{\alpha-m+1}} \, d\tau \qquad (3) $$
wherein $f(t)$ is the objective function, $\alpha$ is the order with $m-1 < \alpha < m$ for a positive integer $m$ (here $0 < \alpha < 1$), $\Gamma(\cdot)$ is the gamma function, $t_0$ is the initial value, $f^{(m)}$ denotes the $m$-th order derivative of $f$, and $\tau$ is the integration variable.
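To make the Caputo definition concrete, the sketch below evaluates it numerically for 0 < α < 1 (so m = 1) and compares the result with the known closed form for f(t) = t², namely Γ(3)/Γ(3-α)·t^(2-α). The function name, the substitution used to remove the endpoint singularity, and the midpoint rule are choices made for this illustration only.

```python
import math

def caputo_derivative(df, t, alpha, t0=0.0, n=2000):
    """Caputo derivative of order 0 < alpha < 1 at t, given the first derivative df.

    The substitution u = (t - tau)**(1 - alpha) removes the endpoint
    singularity of (t - tau)**(-alpha); the result is integrated with a
    midpoint rule on n sub-intervals.
    """
    beta = 1.0 - alpha
    upper = (t - t0) ** beta
    h = upper / n
    acc = 0.0
    for k in range(n):
        u = (k + 0.5) * h
        acc += df(t - u ** (1.0 / beta))
    integral = acc * h / beta
    return integral / math.gamma(1.0 - alpha)

t, alpha = 1.5, 0.5
numeric = caputo_derivative(lambda tau: 2.0 * tau, t, alpha)  # f(t) = t**2
exact = math.gamma(3.0) / math.gamma(3.0 - alpha) * t ** (2.0 - alpha)
```

The two values agree closely, which is a quick sanity check that the weight (t-τ)^(-α)/Γ(1-α) is applied to the first derivative as formula (3) prescribes for m = 1.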
Optionally, modifying the adjusted prediction iterative algorithm based on the fractional order derivative in step S103 includes:
the fractional order gradient method is shown in the following formula (4):
[Figure SMS_29]    (4)
where μ is the iteration step size or learning rate, and K is the number of iterations;
replacing [Figure SMS_30] in formula (4) with [Figure SMS_31] yields the modified fractional order gradient method shown in formula (5) below:
[Figure SMS_32]    (5)
substituting formula (5) into formula (3) and simplifying yields the modified prediction iterative algorithm shown in formula (6) below:
[Figure SMS_33]    (6)
the prediction iterative algorithm of formula (6) converges, and the point it converges to is the true extreme point x.
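The convergence claim can be illustrated with a small experiment. The sketch below assumes a commonly used form of the fractional order update, in which the first derivative is scaled by 1/Γ(2-α) and by the distance between the last two iterates raised to the power 1-α. The exact formula (6) is not reproduced in this text, so the update rule, the two starting points, the small offset `delta` that prevents the step from vanishing, and the test function are all assumptions.

```python
import math

def fractional_gradient_descent(grad, x0, x1, lr=0.1, alpha=0.9,
                                delta=1e-8, steps=500):
    """Iterate x_{K+1} = x_K - lr * grad(x_K) / Gamma(2 - alpha)
    * (|x_K - x_{K-1}| + delta) ** (1 - alpha)."""
    scale = 1.0 / math.gamma(2.0 - alpha)
    prev, cur = x0, x1
    for _ in range(steps):
        step = lr * scale * grad(cur) * (abs(cur - prev) + delta) ** (1.0 - alpha)
        prev, cur = cur, cur - step
    return cur

# Minimise f(x) = (x - 3)**2, whose true extreme point is x = 3.
x_star = fractional_gradient_descent(lambda x: 2.0 * (x - 3.0), x0=0.0, x1=0.1)
```

On this convex example the iterate ends up at the true minimiser x = 3, because the update can only vanish where the first derivative vanishes.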
Optionally, in step S104, replacing the parameter update of the back propagation process of the convolutional neural network in the initial prediction model with the modified prediction iterative algorithm to construct a new prediction model includes:
replacing the parameter update of the back propagation process of the convolutional neural network in the initial prediction model with the modified prediction iterative algorithm, and constructing the fully connected layer of the convolutional neural network in the new prediction model, wherein the back propagation gradient of the fully connected layer is a mixture of fractional order and integer order; the fully connected layer involves two types of gradients: the transfer gradient, which links the nodes between two layers, and the update gradient, which updates the parameters.
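One way to read the mixed scheme is sketched below for a single bias-free fully connected layer y = W x. It assumes that the transfer gradient passed to the previous layer stays integer order, so the chain rule holds, and that only the update gradient applied to the weights is fractional order. This assignment, the function names, and the offset `delta` are assumptions for illustration; the text does not state which gradient takes which order.

```python
import math

def dense_backward(x, weights, grad_out):
    """Integer order backward pass of y = W x.

    Returns the transfer gradient dL/dx (passed to the previous layer)
    and the first order weight gradient dL/dW.
    """
    n_out, n_in = len(weights), len(x)
    grad_in = [sum(weights[j][i] * grad_out[j] for j in range(n_out))
               for i in range(n_in)]
    grad_w = [[grad_out[j] * x[i] for i in range(n_in)] for j in range(n_out)]
    return grad_in, grad_w

def fractional_update(w, w_prev, grad_w, lr, alpha, delta=1e-8):
    """Fractional order update gradient applied element-wise to the weights."""
    scale = lr / math.gamma(2.0 - alpha)
    return [
        [wij - scale * gij * (abs(wij - pij) + delta) ** (1.0 - alpha)
         for wij, pij, gij in zip(w_row, p_row, g_row)]
        for w_row, p_row, g_row in zip(w, w_prev, grad_w)
    ]

x = [1.0, 2.0]
W_prev = [[0.4, -0.1], [0.2, 0.3], [0.0, 0.5]]   # weights of the previous iteration
W = [[0.5, -0.2], [0.1, 0.3], [0.1, 0.4]]        # current weights
grad_out = [0.1, -0.2, 0.3]                      # dL/dy from the next layer
grad_in, grad_W = dense_backward(x, W, grad_out)
W_next = fractional_update(W, W_prev, grad_W, lr=0.1, alpha=0.9)
```

Setting `alpha=1.0` makes the scale factor and the power term both equal to 1, so the update reduces to ordinary gradient descent, which is a convenient check on the implementation.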
In the embodiment of the invention, a protein-ATP binding site prediction method is provided by combining a deep learning method with fractional order differentiation, and the accuracy is improved. First, the data sets ATP-227 and ATP-14 are selected as the training set and the test set, and the features required by the model are extracted from the digitized information of the protein and integrated into a feature matrix that serves as the input. A convolutional neural network is then selected, and the parameter update in its back propagation process is modified into a fractional order gradient iteration. Test data show that the fractional order convolutional neural network predicts better than existing machine learning models and integer order deep learning models. The invention is characterized in that the fractional order gradient defined by Caputo is added to the fully connected layer of the single-start predictor, improving the performance of the predictor while preserving convergence and the chain rule.
The embodiment of the invention provides a fractional order neural network-based protein-ATP binding site prediction method, which may be implemented by an electronic device; the electronic device may be a terminal or a server. As shown in FIG. 2, which is the flowchart of the fractional order neural network-based protein-ATP binding site prediction method combining multi-scale convolution and self-attention coding, the processing flow of the method may include the following steps:
S201: acquiring a training set based on the PDB protein database, and determining the size of the sliding window; the sliding window contains a target residue together with the adjacent residues on the left and right sides of the target residue;
In one possible embodiment, the training set is the unprocessed raw protein sequence data set ATP-227. The invention uses two classical data sets that are common in protein-ATP binding site prediction, ATP-227 and ATP-14, and takes the unprocessed raw protein sequences. ATP-227 consists of 227 ATP-binding protein chains released in the PDB protein database before March 10, 2010. The 227 chains contain a total of 3393 ATP-binding residues and 80409 non-ATP-binding residues. In addition, 14 protein chains are selected from ATP-17 (for the others, the corresponding FASTA file cannot be found in the PDB database by protein ID) and named ATP-14 as an independent test set; the similarity between any chain in ATP-14 and the chains of ATP-227 is guaranteed to be below 41%. The FASTA sequence files of the data sets are downloaded in batches from the PDB protein database, with ATP-227 as the training set and ATP-14 as the test set.
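The residue counts quoted for ATP-227 show how unbalanced the two classes are. The short calculation below derives the share of binding residues and the ratio of non-binding to binding residues. The inverse-frequency class weights at the end are only one common way to choose weights for a weighted loss and are an assumption, not the values used in the patent.

```python
binding = 3393        # ATP-binding residues in ATP-227
non_binding = 80409   # non-ATP-binding residues in ATP-227
total = binding + non_binding

binding_fraction = binding / total        # share of positive residues
imbalance_ratio = non_binding / binding   # negatives per positive

# Illustrative inverse-frequency class weights (an assumption, not the patent's values).
weights = {
    "non_binding": total / (2 * non_binding),
    "binding": total / (2 * binding),
}
```

Roughly 4% of the residues are binding, about one positive for every 24 negatives, which is why an unweighted loss would be dominated by the non-binding class.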
In one possible embodiment, the features of the target residue and its adjacent residues are collected with a sliding window technique, because each protein sequence contains a large number of amino acids, the ratio of non-binding to binding residues is high, and studies show that the binding properties of a target residue are affected by its adjacent residues. A sliding window of size L contains the target residue and (L-1)/2 adjacent residues on each of its left and right sides. By comparing the performance of different window sizes, L = 15 is finally selected in this embodiment. That is, one sliding window takes the form 000000010000000, where 1 marks the target residue.
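The window layout can be sketched as follows. The helper names are hypothetical, and returning `None` for positions beyond the ends of the chain (so that the caller can pad them) is an assumption, since the text does not say how residues near the chain termini are handled.

```python
def window_positions(center, seq_len, size=15):
    """Return the `size` sequence positions centred on `center`.

    Positions that fall outside the chain are returned as None so that the
    caller can pad them (the padding rule is an assumption).
    """
    if size % 2 == 0:
        raise ValueError("window size must be odd")
    half = (size - 1) // 2
    return [p if 0 <= p < seq_len else None
            for p in range(center - half, center + half + 1)]

def window_mask(size=15):
    """Mask with 1 at the target residue and 0 at its neighbours."""
    half = (size - 1) // 2
    return "".join("1" if k == half else "0" for k in range(size))

mask = window_mask()                               # layout of one window
positions = window_positions(center=2, seq_len=100)  # window near the N-terminus
```

For L = 15 the target residue sits at index 7 of the window, with seven neighbours on each side.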
S202: Run PSI-BLAST against the annotated protein sequence database Swiss-Prot using BLAST, the search tool based on a local alignment algorithm, with the training set as input, to obtain the PSSM matrix of the training set.
In a possible implementation, the PSSM output also contains other information; in this embodiment only the first 20 columns are kept.
S203: Acquire the protein secondary structure in the training set, and represent it with a 3-state secondary structure representation to obtain a protein secondary structure vector.
In one possible embodiment, for the protein secondary structure the invention selects the 3-state representation, coil (C), helix (H) and strand (E), obtained by running PSIPRED 4.02 in the BLAST environment. Solvent accessibility is obtained using ASAquick. The extraction of the above three features is based on the fasta sequence file.
S204: Perform One-hot encoding on the amino acids in the training set to obtain a One-hot encoding vector for each amino acid; the amino acids are one-hot encoded according to a classification based on the dipole and the volume of the side chain.
In one possible embodiment, each amino acid is One-hot encoded as a 1×7 vector; for example, the One-hot code of alanine (Ala) is [0,0,0,0,0,0,1], and the One-hot code of tyrosine (Tyr) is [0,0,0,1,0,0,0].
S205: Extract features from the PSSM matrix, the protein secondary structure vector and the One-hot encoding vector of each amino acid through the sliding window to obtain the features of each target residue in the training set and of its adjacent residues, and integrate the features into a feature matrix.
In one possible embodiment, features are extracted through the sliding window, which in this example yields a 15 × 20 PSSM matrix, a 15 × 3 protein secondary structure matrix, a 15 × 1 solvent accessibility vector and a 15 × 7 One-hot encoding matrix for each target residue. In this embodiment, the data sets ATP-227 and ATP-14 are used as the training set and the test set, respectively; the features required by the model are extracted from the digitized protein information and integrated into a feature matrix that serves as the input of the new prediction model.
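As a rough sketch of how the four per-residue feature blocks could be integrated, the snippet below concatenates them column-wise (20 + 3 + 1 + 7 = 31 columns) and then cuts one 15-row window per residue. Column-wise concatenation, zero padding at the chain ends and the name `build_window_features` are assumptions made for illustration; the random arrays merely stand in for real PSSM, PSIPRED and ASAquick output.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def build_window_features(pssm, ss, asa, onehot, size=15):
    """Stack per-residue features and return one (size x 31) matrix per residue."""
    per_residue = np.concatenate([pssm, ss, asa, onehot], axis=1)  # (n, 31)
    half = (size - 1) // 2
    # Zero rows before and after the chain so every residue can be a window centre.
    padded = np.pad(per_residue, ((half, half), (0, 0)))
    windows = sliding_window_view(padded, size, axis=0)  # (n, 31, size)
    return windows.transpose(0, 2, 1)  # (n, size, 31)


rng = np.random.default_rng(0)
n = 40  # toy chain length
pssm = rng.normal(size=(n, 20))                 # first 20 PSSM columns
ss = np.eye(3)[rng.integers(0, 3, size=n)]      # 3-state secondary structure
asa = rng.random(size=(n, 1))                   # solvent accessibility
onehot = np.eye(7)[rng.integers(0, 7, size=n)]  # 7-class amino acid code
features = build_window_features(pssm, ss, asa, onehot)
print(features.shape)
```

Row 7 of each window holds the features of the target residue itself, and the rows above and below it hold the seven neighbours on each side.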
S206: Use the weighted cross entropy as the loss function of the prediction model and, based on the loss function, adjust the prediction iterative algorithm for each amino acid class by assigning different weights, to obtain an adjusted prediction iterative algorithm.
In one possible embodiment, the invention addresses the data imbalance problem by modifying the loss function, namely the cross entropy. Using the weighted cross entropy as the loss function to adjust the prediction for each class by assigning different weights includes:
defining the cross entropy of the ith sample as shown in the following formula (1):
$$L_i = -\sum_{p} y_{ip}\,\log \hat{y}_{ip} \tag{1}$$

wherein $y_{ip}\in\{0,1\}$; if the i-th sample belongs to the p-th class, then $y_{ip}=1$; $\hat{y}_{ip}$ represents the predicted probability that the i-th sample belongs to the p-th class;
the weighted cross entropy is defined as shown in the following formula (2):

$$L = -\frac{1}{N}\sum_{i=1}^{N}\sum_{p} w_p\, y_{ip}\,\log \hat{y}_{ip} \tag{2}$$

wherein $w_p$ is the weight of each class, $y_{ip}$ is the One-hot encoded value, and N is the number of samples.
In one possible implementation, the invention addresses the imbalanced learning problem by using the weighted cross entropy as the loss function and adjusting the prediction for each class with different weights. The class weights are calculated with Scikit-learn; the balanced class weights are determined by the formula:

$$w = \frac{\text{n\_samples}}{\text{n\_classes} \times \text{bincount}(y)}$$

wherein n_samples indicates the total number of samples, n_classes indicates the number of classes (n_classes = 2 in this document), and bincount(y) is a function of the NumPy library in Python that gives the number of occurrences of each element in y, i.e. the number of samples in each class. The decision threshold is chosen so as to maximize the MCC value.
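The balanced class weights and the weighted cross entropy can be illustrated with a short NumPy sketch. The function names are hypothetical, the loss is averaged over the N samples as an assumption about formula (2), and the label vector is built only from the residue counts quoted for ATP-227 (80409 non-binding, 3393 binding).

```python
import numpy as np


def balanced_class_weights(y):
    """n_samples / (n_classes * bincount(y)), as in the balanced heuristic."""
    counts = np.bincount(y)
    return len(y) / (len(counts) * counts)


def weighted_cross_entropy(y, probs, weights, eps=1e-12):
    """Mean over samples of -w[class] * log(probability of the true class)."""
    picked = probs[np.arange(len(y)), y]
    return float(np.mean(-weights[y] * np.log(picked + eps)))


# Labels with the class counts reported for ATP-227 (0 = non-binding, 1 = binding).
y = np.array([0] * 80409 + [1] * 3393)
w = balanced_class_weights(y)
print(w)
```

With these counts the binding class receives a weight roughly 24 times larger than the non-binding class, so a misclassified binding residue contributes far more to the loss.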
S207: Construct a fractional order derivative based on the Caputo definition, and modify the adjusted prediction iterative algorithm based on the fractional order derivative.
In one possible embodiment, the invention studies the fractional order gradient under the Caputo definition, because the Caputo fractional derivative has a convenient property: the derivative of a constant is equal to 0.
The fractional derivative defined by Caputo is given by formula (3):

$${}^{C}_{t_0}D_t^{\alpha} f(t) = \frac{1}{\Gamma(m-\alpha)} \int_{t_0}^{t} \frac{f^{(m)}(\tau)}{(t-\tau)^{\alpha-m+1}}\, d\tau \tag{3}$$

wherein f(t) is the objective function, α is the order with m-1 < α < m, m is a positive integer (m = 1 when 0 < α < 1), Γ(·) is the gamma function, $t_0$ is the initial value, $f^{(m)}$ denotes the m-th order derivative of f, and τ is the integration variable.
In a possible embodiment, let f(x) be a smooth convex function and let $x^*$ be its unique extreme point. Each iteration step of the conventional integer order gradient method is:

$$x_{K+1} = x_K - \mu f^{(1)}(x_K)$$

where μ is the iteration step size or learning rate, K is the number of iterations, and $x_K$ denotes the value of x at the K-th iteration. The fractional order gradient method can be written as:

$$x_{K+1} = x_K - \mu\, {}^{C}_{x_0}D_{x_K}^{\alpha} f(x) \tag{4}$$

In a possible embodiment, if the fractional derivative is applied directly, the fractional order gradient method cannot converge to the true extreme point $x^*$ of f(x) but only to an extreme point defined by the Caputo fractional derivative, which depends on the initial value $x_0$ and on the order α and is usually not equal to $x^*$.
To ensure that the algorithm converges to the true extreme point, another fractional order gradient method is considered in which $x_0$ is replaced by $x_{K-1}$ in the subsequent iterations: replacing ${}^{C}_{x_0}D_{x_K}^{\alpha}$ in formula (4) by ${}^{C}_{x_{K-1}}D_{x_K}^{\alpha}$ gives the modified fractional order gradient method shown in formula (5):

$$x_{K+1} = x_K - \mu\, {}^{C}_{x_{K-1}}D_{x_K}^{\alpha} f(x) \tag{5}$$

wherein 0 < α < 1.
Combining formula (5) with formula (3) yields:

$$x_{K+1} = x_K - \mu \sum_{i=0}^{\infty} \frac{f^{(i+1)}(x_{K-1})}{\Gamma(i+2-\alpha)} (x_K - x_{K-1})^{i+1-\alpha}$$

When only the first term is retained and the absolute value is introduced, the fractional order gradient method is simplified, for 0 < α < 2, to the modified iterative algorithm of formula (6):

$$x_{K+1} = x_K - \mu \frac{f^{(1)}(x_{K-1})}{\Gamma(2-\alpha)} \left| x_K - x_{K-1} \right|^{1-\alpha} \tag{6}$$

The iterative algorithm of formula (6) converges, and it converges to the true extreme point $x^*$.
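A scalar sketch of the simplified iteration may help make the update concrete: each step uses the first derivative at the previous iterate, divided by Γ(2-α) and scaled by the distance between the last two iterates raised to the power 1-α. The test function f(x) = (x-3)², the values α = 0.9 and μ = 0.1, and the two starting points are illustrative assumptions, not values from the patent. Two distinct starting points are needed because the step is zero when they coincide.

```python
import math


def fractional_gradient_descent(grad, x0, x1, alpha=0.9, mu=0.1, steps=500):
    """Iterate x_{K+1} = x_K - mu * grad(x_{K-1}) / Gamma(2-alpha) * |x_K - x_{K-1}|**(1-alpha)."""
    x_prev, x = x0, x1
    for _ in range(steps):
        step = mu * grad(x_prev) / math.gamma(2.0 - alpha) * abs(x - x_prev) ** (1.0 - alpha)
        x_prev, x = x, x - step
    return x


# f(x) = (x - 3)^2 has its unique minimum at x* = 3; its derivative is 2 * (x - 3).
x_min = fractional_gradient_descent(lambda x: 2.0 * (x - 3.0), x0=0.0, x1=0.5)
print(x_min)
```

For this convex example the iterate approaches 3, the ordinary minimiser of f, rather than a point that depends on the starting value.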
S208: Replace the parameter update process in the back propagation of the convolutional neural network in the initial prediction model with the modified prediction iterative algorithm to construct a new prediction model; input the feature matrix into the new prediction model, output the prediction result, and complete the fractional order neural network-based protein-ATP binding site prediction.
In one possible embodiment, a fully connected layer of the convolutional neural network is constructed, in which the back propagation gradient mixes fractional order and integer order so that the chain rule still holds. Two types of gradients pass through the layer: the transfer gradient, which connects the nodes between two layers, and the update gradient, which is used for the parameters within a layer.
In one possible embodiment, a schematic diagram of the forward propagation algorithm is shown in FIG. 3. $a_j^{[l]}$ denotes the output of the j-th node in the l-th layer:

$$a^{[l]} = f\left(w^{[l]} a^{[l-1]} + b^{[l]}\right)$$

Here $w^{[l]}$ represents the weights of the l-th layer, $b^{[l]}$ denotes the bias, $a^{[l-1]}$ represents the output of the previous layer, and the function $f(\cdot)$ is the activation function.
To ensure that the chain rule holds, the propagation gradient remains an integer order gradient, where L denotes the loss function:

$$\frac{\partial L}{\partial a^{[l-1]}} = \frac{\partial L}{\partial a^{[l]}}\,\frac{\partial a^{[l]}}{\partial a^{[l-1]}}$$

but when updating the parameters, the fractional order update is used:

$$w^{[l]}_{K+1} = w^{[l]}_{K} - \mu\,\frac{1}{\Gamma(2-\alpha)}\,\frac{\partial L}{\partial w^{[l]}_{K-1}}\,\left| w^{[l]}_{K} - w^{[l]}_{K-1} \right|^{1-\alpha}$$

The update process is shown in FIG. 4.
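The split between the two gradient types can be sketched for a single fully connected layer. This is an illustrative reading of the text, not the patented code: `transfer_gradient` passes the ordinary integer order gradient to the previous layer, while `fractional_update` applies the fractional order rule element-wise to the weights. Both names, the array shapes and the hyperparameters are assumptions.

```python
import math
import numpy as np


def transfer_gradient(w, grad_z):
    """Integer order gradient passed to the previous layer (plain chain rule)."""
    return w.T @ grad_z


def fractional_update(w, w_prev, grad_prev, alpha=0.9, mu=0.1):
    """Element-wise fractional order step on the layer parameters."""
    scale = np.abs(w - w_prev) ** (1.0 - alpha) / math.gamma(2.0 - alpha)
    return w - mu * grad_prev * scale


rng = np.random.default_rng(0)
w_prev = rng.normal(size=(2, 3))                # weights at iteration K-1
w = w_prev + 0.01 * rng.normal(size=(2, 3))     # weights at iteration K
grad_prev = rng.normal(size=(2, 3))             # dL/dw evaluated at iteration K-1
grad_z = rng.normal(size=2)                     # dL/dz of this layer

w_next = fractional_update(w, w_prev, grad_prev)
grad_to_prev_layer = transfer_gradient(w, grad_z)
print(w_next.shape, grad_to_prev_layer.shape)
```

Setting alpha to 1 makes the scale factor equal to 1, so the update reduces to the ordinary gradient step, which is a convenient sanity check for the sketch.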
In one possible embodiment, the model is tested with ATP-17 as the test set. The model outputs a one-dimensional matrix of prediction probabilities, one for each site of the protein sequence. Based on the criterion of maximizing MCC, the threshold is set to 0.80: a site whose prediction probability is greater than 0.8 is judged to be a binding site and is represented by "1"; otherwise it is represented by "0". The experiment is repeated 15 times on the test set, with accuracy (Acc), sensitivity (Sen), specificity (Spe) and the Matthews correlation coefficient (MCC) as evaluation indexes, and the result is compared with a conventional convolutional neural network; the average values over the repeated experiments are shown in the following table:
TABLE 1 evaluation index Table
Figure SMS_66
The invention is then compared on ATP-17 with several existing protein-ATP binding site predictors that perform well, namely NsitePred, TargetATPsite, TargetS and ATPseq; the results are shown in the following table:
TABLE 2 comparison of results of the prior predictor and the predictor of the present invention
Figure SMS_67
The results of the prediction of the protein 2YAA sequence are shown in FIG. 5. The invention can predict the binding site more accurately.
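The thresholding step used in the evaluation, choosing the probability cut-off that maximizes MCC, can be sketched as a simple sweep. The labels and probabilities below are toy values for illustration only and the function names are hypothetical; the patent's reported threshold of 0.80 comes from its own data, not from this sketch.

```python
import math
import numpy as np


def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels (0.0 if undefined)."""
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    tn = int(np.sum((y_true == 0) & (y_pred == 0)))
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def best_threshold(y_true, probs, thresholds):
    """Return the threshold with the highest MCC and that MCC value."""
    scores = [mcc(y_true, (probs > t).astype(int)) for t in thresholds]
    best = int(np.argmax(scores))
    return float(thresholds[best]), scores[best]


y_true = np.array([0, 0, 0, 1, 1])             # toy labels
probs = np.array([0.1, 0.2, 0.6, 0.7, 0.9])    # toy predicted probabilities
t_best, mcc_best = best_threshold(y_true, probs, np.linspace(0.05, 0.95, 19))
print(t_best, mcc_best)
```

A site is labelled "1" when its probability exceeds the selected threshold and "0" otherwise, matching the decision rule described for the test set.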
In the embodiment of the invention, a protein-ATP binding site prediction method that combines deep learning with fractional order differentiation is provided, and the prediction accuracy is improved. First, the data sets ATP-227 and ATP-14 are selected as the training set and the test set, the features required by the model are extracted from the digitized protein information, and the features are integrated into a feature matrix that serves as the input. Then a convolutional neural network is selected, and the parameter update process of its back propagation is changed to a fractional order gradient iteration; the test data show that the fractional order convolutional neural network predicts better than the existing machine learning and integer order deep learning models. The invention adds the fractional order gradient defined by Caputo to the fully connected layer of the single-start predictor, and improves the performance of the predictor while preserving convergence and the chain rule.
FIG. 6 is a block diagram of a fractional order neural net-based protein-ATP binding site prediction device, according to an exemplary embodiment. Referring to fig. 6, the apparatus 300 includes:
the feature extraction module 310 is configured to construct an initial prediction model, obtain a training set based on the PDB protein database, collect features of target residues and adjacent residues of the target residues in the training set by using a sliding window technique, and integrate the features into a feature matrix;
a function modifying module 320, configured to adjust the prediction iterative algorithm for each amino acid type by giving different weights based on the loss function by using the weighted cross entropy as a loss function of the prediction model, to obtain an adjusted prediction iterative algorithm;
the algorithm modifying module 330 is configured to construct a fractional derivative defined based on Caputo, and modify the adjusted prediction iteration algorithm based on the fractional derivative;
a result output module 340, configured to replace a parameter update process of a back propagation process of the convolutional neural network in the initial prediction model with a modified prediction iterative algorithm, so as to construct a new prediction model; and inputting the characteristic matrix into a new prediction model, outputting a prediction result, and completing the protein-ATP binding site prediction based on the fractional order neural network.
Optionally, the training set is the unprocessed original protein sequence set ATP-227.
Optionally, the feature extraction module 310 is further configured to obtain a training set based on the PDB protein database, and determine a size of the sliding window; the sliding window comprises a target residue, and adjacent residues of the target residue are respectively arranged at the left side and the right side of the target residue;
running a psi-blast in an annotated protein sequence Swissprot database through a search tool blast based on a local alignment algorithm, inputting a training set, and obtaining a PSSM matrix of the training set;
acquiring a protein secondary structure in a training set, and expressing the protein secondary structure by a 3-state secondary structure expression method to obtain a protein secondary structure vector;
performing One-hot encoding on the amino acids in the training set to obtain a One-hot encoding vector for each amino acid; wherein the amino acids are one-hot encoded according to a classification based on the dipole and the volume of the side chain;
and extracting features from the PSSM matrix, the protein secondary structure vector and the One-hot encoding vector of each amino acid through the sliding window to obtain the features of each target residue in the training set and of its adjacent residues, and integrating the features into a feature matrix.
Optionally, the function modifying module 320 is configured to define the cross entropy of the ith sample as shown in the following formula (1):
$$L_i = -\sum_{p} y_{ip}\,\log \hat{y}_{ip} \tag{1}$$

wherein $y_{ip}\in\{0,1\}$; if the i-th sample belongs to the p-th class, then $y_{ip}=1$; $\hat{y}_{ip}$ represents the predicted probability that the i-th sample belongs to the p-th class;
the weighted cross entropy is defined as shown in the following formula (2):

$$L = -\frac{1}{N}\sum_{i=1}^{N}\sum_{p} w_p\, y_{ip}\,\log \hat{y}_{ip} \tag{2}$$

wherein $w_p$ is the weight of each class, $y_{ip}$ is the One-hot encoded value, and N is the number of samples.
Optionally, the algorithm modifying module 330 is configured to define the fractional derivative according to the following equation (3):
$${}^{C}_{t_0}D_t^{\alpha} f(t) = \frac{1}{\Gamma(m-\alpha)} \int_{t_0}^{t} \frac{f^{(m)}(\tau)}{(t-\tau)^{\alpha-m+1}}\, d\tau \tag{3}$$

wherein f(t) is the objective function, α is the order, m-1 < α < m, m is a positive integer, Γ(·) is the gamma function, and $t_0$ is the initial value.
Optionally, the algorithm modifying module 330 is configured to modify the iterative algorithm to converge the iterative algorithm to a true extreme point, and includes:
the fractional order gradient method is shown in the following formula (4):
$$x_{K+1} = x_K - \mu\, {}^{C}_{x_0}D_{x_K}^{\alpha} f(x) \tag{4}$$

where μ is the iteration step size or learning rate, K is the number of iterations, and $x_K$ denotes the value of x at the K-th iteration;
replacing ${}^{C}_{x_0}D_{x_K}^{\alpha}$ in formula (4) by ${}^{C}_{x_{K-1}}D_{x_K}^{\alpha}$ gives the modified fractional order gradient method shown in the following formula (5):

$$x_{K+1} = x_K - \mu\, {}^{C}_{x_{K-1}}D_{x_K}^{\alpha} f(x) \tag{5}$$

substituting formula (5) into formula (3) and simplifying gives the modified iterative algorithm of the following formula (6):

$$x_{K+1} = x_K - \mu \frac{f^{(1)}(x_{K-1})}{\Gamma(2-\alpha)} \left| x_K - x_{K-1} \right|^{1-\alpha} \tag{6}$$

the iterative algorithm of formula (6) converges, and it converges to the true extreme point $x^*$.
Optionally, the result output module 340 is configured to replace a parameter update process of a back propagation process of the convolutional neural network in the initial prediction model with a modified prediction iterative algorithm to construct a fully connected layer of the convolutional neural network in the new prediction model, where a back propagation gradient of the fully connected layer is a mixture of a fractional order and an integer order; wherein the fully-connected layer comprises two types of gradient-passing layers, the two types of gradient-passing layers comprising: the transfer gradient connecting the nodes between the two layers, and the update gradient.
In the embodiment of the invention, a protein-ATP binding site prediction method that combines deep learning with fractional order differentiation is provided, and the prediction accuracy is improved. First, the data sets ATP-227 and ATP-14 are selected as the training set and the test set, the features required by the model are extracted from the digitized protein information, and the features are integrated into a feature matrix that serves as the input. Then a convolutional neural network is selected, and the parameter update process of its back propagation is changed to a fractional order gradient iteration; the test data show that the fractional order convolutional neural network predicts better than the existing machine learning and integer order deep learning models. The invention adds the fractional order gradient defined by Caputo to the fully connected layer of the single-start predictor, and improves the performance of the predictor while preserving convergence and the chain rule.
Fig. 7 is a schematic structural diagram of an electronic device 400 according to an embodiment of the present invention. The electronic device 400 may vary considerably in configuration or performance, and may include one or more processors (CPUs) 401 and one or more memories 402, where at least one instruction is stored in the memory 402 and is loaded and executed by the processor 401 to implement the following steps of the fractional order neural network-based protein-ATP binding site prediction method:
s1: constructing an initial prediction model, acquiring a training set based on a PDB protein database, collecting the characteristics of target residues and adjacent residues of the target residues in the training set by a sliding window technology, and integrating the characteristics into a characteristic matrix;
s2: using the weighted cross entropy as a loss function of a prediction model, and adjusting a prediction iterative algorithm of each amino acid type by giving different weights on the basis of the loss function to obtain an adjusted prediction iterative algorithm;
s3: constructing a fractional order derivative defined based on Caputo, and modifying the adjusted prediction iteration algorithm based on the fractional order derivative;
s4: replacing the parameter updating process of the back propagation process of the convolutional neural network in the initial prediction model with a modified prediction iterative algorithm to construct a new prediction model; and inputting the characteristic matrix into a new prediction model, outputting a prediction result, and completing the protein-ATP binding site prediction based on the fractional order neural network.
In an exemplary embodiment, a computer-readable storage medium, such as a memory including instructions executable by a processor in a terminal, to perform the fractional order neural network-based protein-ATP binding site prediction method is also provided. For example, the computer readable storage medium may be a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program instructing relevant hardware, where the program may be stored in a computer-readable storage medium, and the above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims (10)

1. A fractional order neural network-based protein-ATP binding site prediction method, the method steps comprising:
s1: constructing an initial prediction model, acquiring a training set based on a PDB protein database, collecting the characteristics of target residues and adjacent residues of the target residues in the training set by a sliding window technology, and integrating the characteristics into a characteristic matrix;
s2: using the weighted cross entropy as a loss function of the initial prediction model, and based on the loss function, adjusting the prediction iterative algorithm of each amino acid type by giving different weights to obtain an adjusted prediction iterative algorithm;
s3: constructing a fractional order derivative defined based on Caputo, and modifying the adjusted prediction iteration algorithm based on the fractional order derivative;
s4: replacing the parameter updating process of the back propagation process of the convolutional neural network in the initial prediction model with a modified prediction iterative algorithm to construct a new prediction model; and inputting the characteristic matrix into the new prediction model, outputting a prediction result, and completing the protein-ATP binding site prediction based on the fractional order neural network.
2. The method of claim 1, wherein in S1, the training set is an unprocessed original protein sequence ATP-227.
3. The method of claim 1, wherein in S1, constructing an initial prediction model, obtaining a training set based on the PDB protein database, collecting features of target residues and adjacent residues of the target residues in the training set by a sliding window technique, and integrating the features into a feature matrix, comprises:
s11: acquiring a training set based on a PDB protein database, and determining the size of a sliding window; the sliding window comprises a target residue, and adjacent residues of the target residue are respectively arranged at the left side and the right side of the target residue;
s12: running psi-blast in an annotated protein sequence Swissprot database through a search tool blast based on a local comparison algorithm, inputting a training set, and obtaining a PSSM matrix of the training set;
s13: acquiring a protein secondary structure in a training set, and representing the protein secondary structure by a 3-state secondary structure representation method to obtain a protein secondary structure vector;
s14: performing One-hot encoding on the amino acids in the training set to obtain a One-hot encoding vector for each amino acid; wherein the amino acids are one-hot encoded according to a classification based on the dipole and the volume of the side chain;
s15: extracting features from the PSSM matrix, the protein secondary structure vector and the One-hot encoding vector of each amino acid through the sliding window to obtain the features of each target residue in the training set and of its adjacent residues, and integrating the features into a feature matrix.
4. The method according to claim 3, wherein in S2, the adjusted prediction iterative algorithm is obtained by using weighted cross entropy as a loss function of the initial prediction model and adjusting the prediction iterative algorithm of each amino acid type by assigning different weights based on the loss function, and the method comprises:
defining the cross entropy of the ith sample as shown in the following formula (1):
$$L_i = -\sum_{p} y_{ip}\,\log \hat{y}_{ip} \tag{1}$$

wherein $y_{ip}\in\{0,1\}$; if the i-th sample belongs to the p-th class, then $y_{ip}=1$; $\hat{y}_{ip}$ represents the predicted probability that the i-th sample belongs to the p-th class;
the weighted cross entropy is defined as shown in the following formula (2):

$$L = -\frac{1}{N}\sum_{i=1}^{N}\sum_{p} w_p\, y_{ip}\,\log \hat{y}_{ip} \tag{2}$$

wherein $w_p$ is the weight of each class, $y_{ip}$ is the value after One-hot encoding; N represents the number of samples;
Figure QLYQS_9
5. The method according to claim 4, wherein in S3, the fractional derivative defined by Caputo is as the following formula (3):
$${}^{C}_{t_0}D_t^{\alpha} f(t) = \frac{1}{\Gamma(m-\alpha)} \int_{t_0}^{t} \frac{f^{(m)}(\tau)}{(t-\tau)^{\alpha-m+1}}\, d\tau \tag{3}$$

wherein f(t) is the objective function, α is the order, 0 < α < 1, m-1 < α < m, m is a positive integer, Γ(·) is the gamma function, $t_0$ is the initial value, $f^{(m)}$ denotes the m-th order derivative of f, and τ is the integration variable.
6. The method of claim 5, wherein modifying the adjusted prediction iteration algorithm based on the fractional order derivative in step S3 comprises:
the fractional order gradient method is shown in the following formula (4):
$$x_{K+1} = x_K - \mu\, {}^{C}_{x_0}D_{x_K}^{\alpha} f(x) \tag{4}$$

wherein μ is the iteration step size, K is the number of iterations, and $x_K$ denotes the value of x at the K-th iteration;
replacing ${}^{C}_{x_0}D_{x_K}^{\alpha}$ in formula (4) by ${}^{C}_{x_{K-1}}D_{x_K}^{\alpha}$ gives the modified fractional order gradient method shown in the following formula (5):

$$x_{K+1} = x_K - \mu\, {}^{C}_{x_{K-1}}D_{x_K}^{\alpha} f(x) \tag{5}$$

substituting formula (5) into formula (3) and simplifying gives the modified prediction iterative algorithm shown in the following formula (6):

$$x_{K+1} = x_K - \mu \frac{f^{(1)}(x_{K-1})}{\Gamma(2-\alpha)} \left| x_K - x_{K-1} \right|^{1-\alpha} \tag{6}$$

the prediction iterative algorithm of formula (6) converges, and it converges to the true extreme point $x^*$.
7. The method according to claim 1, wherein in step S4, the parameter updating process of the back propagation process of the convolutional neural network in the initial prediction model is replaced with a modified prediction iterative algorithm to construct a new prediction model, which includes:
replacing a parameter updating process of a back propagation process of the convolutional neural network in the initial prediction model with a modified prediction iterative algorithm to construct a fully connected layer of the convolutional neural network in the new prediction model, wherein the back propagation gradient of the fully connected layer adopts a mixture of fractional order and integer order; wherein the fully-connected layer comprises two types of gradient-passing layers, the two types of gradient-passing layers comprising: the transfer gradient connecting the nodes between the two layers, and the update gradient.
8. A fractional order neural network-based protein-ATP binding site prediction device, for use in the method of any one of claims 1-7, the device comprising:
the characteristic extraction module is used for constructing an initial prediction model, acquiring a training set based on a PDB protein database, collecting the characteristics of target residues and adjacent residues of the target residues in the training set through a sliding window technology, and integrating the characteristics into a characteristic matrix;
the function modification module is used for utilizing the weighted cross entropy as a loss function of the initial prediction model, and adjusting the prediction iterative algorithm of each amino acid type by giving different weights based on the loss function to obtain an adjusted prediction iterative algorithm;
the algorithm modification module is used for constructing a fractional derivative defined based on Caputo and modifying the adjusted prediction iteration algorithm based on the fractional derivative;
the result output module is used for replacing the parameter updating process of the back propagation process of the convolutional neural network in the initial prediction model with a modified prediction iterative algorithm to construct a new prediction model; inputting the characteristic matrix into the new prediction model, outputting a prediction result, and completing the protein-ATP binding site prediction based on the fractional order neural network.
9. The apparatus of claim 8, wherein the training set is an unprocessed raw protein sequence ATP-227.
10. The apparatus of claim 9, wherein the feature extraction module is further configured to obtain a training set based on a PDB protein database, determine a sliding window size, and include a target residue in the sliding window, wherein the target residue is adjacent to the target residue on each of left and right sides of the target residue;
running a psi-blast in an annotated protein sequence Swissprot database through a search tool blast based on a local alignment algorithm, inputting a training set, and obtaining a PSSM matrix of the training set;
acquiring a protein secondary structure in a training set, and representing the protein secondary structure by a 3-state secondary structure representation method to obtain a protein secondary structure vector;
performing One-hot encoding on the amino acids in the training set to obtain a One-hot encoding vector for each amino acid; wherein the amino acids are one-hot encoded according to a classification based on the dipole and the volume of the side chain;
and extracting features from the PSSM matrix, the protein secondary structure vector and the One-hot encoding vector of each amino acid through the sliding window to obtain the features of each target residue in the training set and of its adjacent residues, and integrating the features into a feature matrix.
CN202310115169.0A 2023-02-15 2023-02-15 protein-ATP binding site prediction method and device based on fractional order neural network Active CN115966249B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310115169.0A CN115966249B (en) 2023-02-15 2023-02-15 protein-ATP binding site prediction method and device based on fractional order neural network

Publications (2)

Publication Number Publication Date
CN115966249A true CN115966249A (en) 2023-04-14
CN115966249B CN115966249B (en) 2023-05-26

Family

ID=85888059

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310115169.0A Active CN115966249B (en) 2023-02-15 2023-02-15 protein-ATP binding site prediction method and device based on fractional order neural network

Country Status (1)

Country Link
CN (1) CN115966249B (en)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020167667A1 (en) * 2019-02-11 2020-08-20 Flagship Pioneering Innovations Vi, Llc Machine learning guided polypeptide analysis
CN110970090A (en) * 2019-11-18 2020-04-07 Huazhong University of Science and Technology Method for judging similarity between polypeptide to be processed and positive data set peptide fragment
US20210174903A1 (en) * 2019-12-10 2021-06-10 Protein Evolution Inc. Enhanced protein structure prediction using protein homolog discovery and constrained distograms
CN112214222A (en) * 2020-10-27 2021-01-12 Huazhong University of Science and Technology Sequential structure for realizing feedforward neural network in COStream and compiling method thereof
CN112767997A (en) * 2021-02-04 2021-05-07 Qilu University of Technology Protein secondary structure prediction method based on multi-scale convolution attention neural network
CN113593631A (en) * 2021-08-09 2021-11-02 Shandong University Method and system for predicting protein-polypeptide binding site
CN114882945A (en) * 2022-07-11 2022-08-09 Ludong University Ensemble learning-based RNA-protein binding site prediction method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
MICHAEL BERNHOFER: "PredictProtein - Predicting Protein Structure and Function for 29 Years", NUCLEIC ACIDS RESEARCH *

Also Published As

Publication number Publication date
CN115966249B (en) 2023-05-26

Similar Documents

Publication Publication Date Title
Dhakal et al. Artificial intelligence in the prediction of protein–ligand interactions: recent advances and future directions
Vanhaelen et al. Design of efficient computational workflows for in silico drug repurposing
Bader et al. Functional genomics and proteomics: charting a multidimensional map of the yeast cell
Raza Application of data mining in bioinformatics
Pua et al. Development of a comprehensive sequencing assay for inherited cardiac condition genes
Yu et al. Predicting protein-protein interactions in unbalanced data using the primary structure of proteins
Wang et al. Protein‐protein interaction networks as miners of biological discovery
US20040249791A1 (en) Method and system for developing and querying a sequence driven contextual knowledge base
Vyas et al. Building and analysis of protein-protein interactions related to diabetes mellitus using support vector machine, biomedical text mining and network analysis
Li et al. A mouse protein interactome through combined literature mining with multiple sources of interaction evidence
Clancy et al. From proteomes to complexomes in the era of systems biology
Sun et al. Computational tools for aptamer identification and optimization
Sriwastava et al. Predicting protein-protein interaction sites with a novel membership based fuzzy SVM classifier
US20020072887A1 (en) Interaction fingerprint annotations from protein structure models
Xu et al. A systematic review of computational methods for predicting long noncoding RNAs
Sealfon et al. Machine learning methods to model multicellular complexity and tissue specificity
Haque et al. A common neighbor based technique to detect protein complexes in PPI networks
Tognon et al. A survey on algorithms to characterize transcription factor binding sites
Niu et al. Deep learning framework for integrating multibatch calibration, classification, and pathway activities
Roche et al. E(3) equivariant graph neural networks for robust and accurate protein-protein interaction site prediction
Xu et al. Comparative analysis of commonly used bioinformatics software based on omics
Jiang et al. Protein-protein interaction sites prediction using batch normalization based CNNs and oversampling method borderline-SMOTE
Chen et al. A deep learning approach to identify association of disease–gene using information of disease symptoms and protein sequences
CN115966249B (en) Protein-ATP binding site prediction method and device based on fractional order neural network
Kumar et al. Bioinformatics in drug design and delivery

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant