CN112447265A - Lysine acetylation site prediction method based on modular dense convolutional network

Lysine acetylation site prediction method based on modular dense convolutional network

Info

Publication number
CN112447265A
CN112447265A (application CN202011344614.3A)
Authority
CN
China
Prior art keywords
lysine acetylation
protein
dense
layer
information
Prior art date
Legal status
Granted
Application number
CN202011344614.3A
Other languages
Chinese (zh)
Other versions
CN112447265B
Inventor
王会青
颜志良
刘丹
赵虹
赵健
赵静
赵森
Current Assignee
Taiyuan University of Technology
Original Assignee
Taiyuan University of Technology
Priority date
Filing date
Publication date
Application filed by Taiyuan University of Technology
Priority to CN202011344614.3A
Publication of CN112447265A
Application granted
Publication of CN112447465B
Legal status: Active

Classifications

    • G16B20/30: ICT specially adapted for functional genomics or proteomics; detection of binding sites or motifs
    • G06F18/2415: Pattern recognition; classification techniques based on parametric or probabilistic models
    • G06F18/253: Pattern recognition; fusion techniques of extracted features
    • G06N3/045: Neural networks; combinations of networks
    • G06N3/08: Neural networks; learning methods
    • G16B15/00: ICT specially adapted for analysing two-dimensional or three-dimensional molecular structures
    • G16B5/00: ICT specially adapted for modelling or simulations in systems biology


Abstract

The invention discloses a lysine acetylation site prediction method based on a modular dense convolutional network. The method introduces protein structural characteristics and combines them with the original protein sequence and the physicochemical properties of amino acids to construct a site feature space; a modular dense convolutional network captures feature information at different levels, reducing information loss and information cross-talk during feature learning; and a squeeze-and-excitation layer is introduced to evaluate the importance of different features, improving the abstraction ability of the network so as to identify potential lysine acetylation sites. The method effectively addresses the problems that existing methods consider only protein sequence-level information and learn features inefficiently; it predicts potential lysine acetylation sites more accurately, reduces the cost of experimentally verifying lysine acetylation sites, and improves the efficiency of research on lysine acetylation modification.

Description

Lysine acetylation site prediction method based on modular dense convolutional network
Technical Field
The invention relates to the field of lysine acetylation site prediction research and analysis, in particular to a lysine acetylation site prediction method based on a modular dense convolutional network.
Background
Lysine acetylation is a conserved post-translational modification of proteins and is closely related to various metabolic diseases; the recognition of lysine acetylation sites is therefore of great significance for research on treating metabolic diseases. The structural characteristics of proteins contain highly useful structural information and provide a powerful basis for identifying protein post-translational modifications. During feature learning, information at different feature levels is complementary, and attending to features at different levels can effectively improve feature quality. Existing deep learning methods take only protein sequence-level information as input and do not consider protein structural characteristics; they consider only high-level features during feature extraction, so information is seriously lost and prediction performance is reduced.
Disclosure of Invention
The invention aims to avoid the defects of the prior art and provides a lysine acetylation site prediction method based on a modular dense convolutional network.
The purpose of the invention is achieved by the following technical measures. A method for predicting lysine acetylation sites based on a modular dense convolutional network is designed, comprising the following steps:
describing lysine acetylation sites from three aspects, namely protein structural characteristics, the original protein sequence, and amino acid physicochemical property information, and constructing the initial site feature space;
using a modular dense convolutional network to extract high-level features of the protein structural characteristics, the original protein sequence, and the amino acid physicochemical properties from the initial site feature space, while attending to both low-level and high-level features through dense skip connections;
introducing a squeeze-and-excitation (SE) layer to evaluate feature importance, weighting each feature map, and realizing adaptive dynamic fusion of the three types of information;
constructing a lysine acetylation site classifier from the fused features and a softmax layer, and predicting potential lysine acetylation sites;
training the lysine acetylation site prediction model based on the modular dense convolutional network;
evaluating the proposed model through four types of experiments: ten-fold cross validation, independent testing, a model generalization test, and a test of the ability to recognize unknown lysine acetylation sites.
The step of describing lysine acetylation sites from the three aspects of protein structural characteristics, original protein sequence, and amino acid physicochemical property information and constructing the initial site feature space comprises:
(1) collecting and preprocessing experimental lysine acetylation site data;
(2) converting the collected protein data into numerical vectors by encoding, constructing the initial site feature space, and using it as the input of the prediction model.
The experimental data collection and preprocessing of lysine acetylation sites comprises the following steps:
6078 human, 3645 mouse, and 1860 E. coli experimentally validated lysine-acetylated protein entries were collected and downloaded from the Protein Lysine Modification Database (PLMD).
Because the SPIDER3 server cannot handle protein sequences containing non-standard amino acids, the invention deletes these protein sequences manually. Taking the human data as an example, sequence redundancy was removed with the CD-HIT tool to avoid the model bias caused by high sequence homology; the threshold was set to 0.4 and 4977 acetylated protein sequences were retained. To facilitate comparison with other lysine acetylation site predictors, the invention randomly selects 10% (498) of the 4977 filtered acetylated protein sequences to construct an independent test data set and uses the remaining acetylated protein sequences as the training data set.
The step of converting the collected protein data into numerical vectors by encoding, constructing the initial site feature space, and using it as the input of the prediction model comprises the following:
(1) the original protein sequence information of a site is encoded with one-of-21 coding; for a motif of length L, an L × 21 vector representation of the original protein sequence information is obtained;
(2) the amino acid physicochemical property information of a site is encoded with Atchley factors, each amino acid residue being represented by 5 Atchley factors; for a motif of length L, an L × 5 vector representation of the amino acid physicochemical property information is obtained;
(3) protein structural property information is obtained from SPIDER3 and comprises 8 indices across 3 attributes: secondary-structure probabilities p(H), p(E), p(C) (helix, strand, coil); local backbone torsion angles φ, ψ, θ, τ; and accessible surface area (ASA). For a motif of length L, an L × 8 vector representation of the protein structural property information is obtained.
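As a concrete illustration of the one-of-21 coding described above, the following minimal numpy sketch encodes a toy motif; the residue ordering of the alphabet and the use of a placeholder column for non-standard residues are assumptions, not taken from the patent (the Atchley-factor and SPIDER3 channels would be built analogously as L × 5 and L × 8 matrices):

```python
import numpy as np

# Residue alphabet assumed here: the 20 standard amino acids plus one
# placeholder column ("X") for non-standard residues or padding.
AA = "ACDEFGHIKLMNPQRSTVWYX"

def one_of_21(motif: str) -> np.ndarray:
    """Encode a length-L motif as an L x 21 one-hot matrix (one-of-21 coding)."""
    index = {aa: i for i, aa in enumerate(AA)}
    out = np.zeros((len(motif), 21))
    for i, aa in enumerate(motif):
        out[i, index.get(aa, 20)] = 1.0  # unknown symbols map to the placeholder column
    return out

x = one_of_21("MKVLAK")  # a hypothetical 6-residue motif
```

Each row of the result is the one-hot code of one residue, so the matrix has exactly one 1 per row.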
The step of using the modular dense convolutional network to extract high-level features of the protein structural characteristics, original protein sequence, and amino acid physicochemical properties from the initial site feature space, while attending to both low-level and high-level features through dense skip connections, comprises:
(1) introducing the design idea of a modular network structure and constructing structure, sequence, and physicochemical information modules;
(2) extracting the high-level features of each module with stacked dense convolution blocks, and realizing information complementarity between features at different levels by attending to low-level and high-level features simultaneously through dense skip connections.
Introducing the design idea of a modular network structure and constructing the structure, sequence, and physicochemical information modules comprises the following:
three feature-extraction submodules, namely a structure module, a sequence module, and a physicochemical module, are constructed from the protein structural characteristics, the original protein sequence, and the amino acid physicochemical properties, respectively. The parameter spaces of the submodules are mutually independent, which effectively avoids cross-talk among the three types of information and improves feature quality.
The step of extracting the high-level features of each module with stacked dense convolution blocks, attending to low-level and high-level features simultaneously through dense skip connections, and realizing information complementarity between features at different levels comprises the following. Since the structure, sequence, and physicochemical modules share the same network architecture, only the sequence module is described here:
(1) First, the sequence module receives the one-of-21 code of a site motif of length L as input and generates a low-level feature map of the original protein sequence information through a one-dimensional convolution layer, as shown in formula (1):

X_0 = σ(I * W + b)   (1)

where I is the one-of-21 code vector and W ∈ R^(S×21×D) is the weight matrix, with S the filter size (S = 3) and D the number of filters (D = 96); b is the bias term and σ is the activation function. X_0, the output of the one-dimensional convolution layer, has size L × D.
(2) Dense convolution blocks are then used to extract a high-level feature representation of the original protein sequence information; the dense convolution process is shown in formula (2):

X_l = σ([X_0; X_1; ...; X_{l-1}] * W′ + b′)   (2)

where X_{l-1} is the feature map generated by the (l-1)-th convolution layer in the dense block and [·;·] denotes concatenation along the feature dimension. W′ ∈ R^(S×D′×D″) is the weight matrix, D′ is the total number of filters in layers 1 to l-1 of the dense block, and D″ is the number of filters in the l-th convolution layer (D″ = 32); b′ is the bias term, σ is the activation function, and X_l is the feature map generated by the l-th convolution layer. The output of the dense block is the concatenation, along the feature dimension, of the low-level feature map X_0 with the feature maps X_1, X_2, ..., X_l generated by each convolution layer, i.e. [X_0; X_1; ...; X_l].
(3) A transition layer applies a convolution and activation operation to the feature map of the original protein sequence information obtained in step (2), as shown in formula (3):

X = σ([X_0; X_1; ...; X_l] * W″ + b″)   (3)

where W″ is the weight matrix with filter size S′ (S′ = 1), b″ is the bias term, σ is the activation function, and X is the output of the transition layer. An average-pooling operation is then applied to the transition-layer output to reduce the dimension of the feature map and lower the risk of model overfitting.
(4) Steps (2) and (3) are repeated to form stacked dense convolution blocks; after the fourth dense block, the transition layer of step (3) is replaced by global average pooling.
Through this process, the sequence module extracts X^(seq), the high-level features of the original protein sequence of a site.
Similarly, the physicochemical and structural modules extract the high-level features of the sites' amino acid physicochemical properties and protein structural characteristics through the same process.
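The dense convolution process of formulas (1) and (2) can be sketched in plain numpy as follows. This is an illustrative toy, not the patent's Keras implementation: filter counts are reduced from the patent's S = 3, D = 96, D″ = 32, the weights are random, and `conv1d_same` is a hypothetical helper:

```python
import numpy as np

def conv1d_same(X, W, b):
    """1-D convolution with 'same' zero padding.
    X: (L, C_in); W: (S, C_in, C_out); b: (C_out,)."""
    L = X.shape[0]
    S, _, C_out = W.shape
    pad = S // 2                       # valid for odd S (the patent uses S = 3)
    Xp = np.pad(X, ((pad, pad), (0, 0)))
    out = np.zeros((L, C_out))
    for i in range(L):
        window = Xp[i:i + S]           # (S, C_in) slice under the filter
        out[i] = np.tensordot(window, W, axes=([0, 1], [0, 1])) + b
    return out

relu = lambda z: np.maximum(z, 0.0)    # activation sigma
rng = np.random.default_rng(0)

L, S, D, D2 = 15, 3, 8, 4              # toy sizes (patent: S = 3, D = 96, D'' = 32)
I = rng.normal(size=(L, 21))           # stand-in for the one-of-21 code of a motif
X0 = relu(conv1d_same(I, rng.normal(size=(S, 21, D)), np.zeros(D)))   # formula (1)

feats = [X0]                           # dense skip connections keep every level
for _ in range(3):                     # three convolution layers in the dense block
    X_in = np.concatenate(feats, axis=1)            # [X_0; ...; X_{l-1}]
    W_l = rng.normal(size=(S, X_in.shape[1], D2))
    feats.append(relu(conv1d_same(X_in, W_l, np.zeros(D2))))          # formula (2)

dense_out = np.concatenate(feats, axis=1)           # [X_0; X_1; X_2; X_3]
```

Because every layer receives the concatenation of all earlier feature maps, low-level and high-level features both reach the block output, which is the information-complementation property the patent attributes to dense skip connections.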
The step of introducing a squeeze-and-excitation (SE) layer to evaluate feature importance, weighting each feature map, and realizing adaptive dynamic fusion of the three types of information comprises:
(1) introducing a squeeze-and-excitation (SE) layer to evaluate feature importance and weight each feature map;
(2) adaptively and dynamically fusing the three types of information: protein structural characteristics, original protein sequence, and amino acid physicochemical properties.
Introducing the squeeze-and-excitation (SE) layer to evaluate feature importance and weight each feature map comprises the following steps:
the sequence module is taken as an example for explanation:
(1) Squeeze: global average pooling compresses the global spatial information of the high-level feature X^(seq) extracted by the sequence module into a channel descriptor; the squeeze process is shown in formula (4):

z_c = F_sq(u_c) = (1 / (W × H)) Σ_{i=1..W} Σ_{j=1..H} u_c(i, j)   (4)

where z_c is the channel statistic of u_c, the c-th feature map of X^(seq); F_sq(·) denotes the squeeze operation; and W and H denote the width and height of the feature map u_c. Computing this statistic for every feature map of X^(seq) yields its channel descriptor z ∈ R^C.
(2) Excitation: two fully connected (FC) layers capture the channel dependencies of X^(seq) and learn a specific weight for each of its feature maps; the excitation process is shown in formula (5):

s = F_ex(z, W) = σ(W_2 * δ(W_1 * z))   (5)

where s denotes the learned specific weights of the feature maps of X^(seq) after the excitation operation and F_ex(·) denotes the excitation operation. δ and σ are the activation functions of the two FC layers: the former is a ReLU following the dimension-reduction layer with parameters W_1 ∈ R^((C/r)×C) and reduction ratio r (r = 16); the latter is a Sigmoid following the dimension-restoration layer with parameters W_2 ∈ R^(C×(C/r)), which ensures that the dimension of s equals the number of channels of X^(seq).
(3) Feature scaling: X^(seq) is rescaled by the activations to obtain the SE-layer output X̃^(seq), whose elements x̃_c are computed as shown in formula (6):

x̃_c = F_scale(u_c, s_c) = s_c · u_c   (6)

where F_scale(u_c, s_c) denotes multiplying each value of the feature map u_c by the weight s_c.
Similarly, the physicochemical and structural modules also get weighted high-level features through the SE layer.
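A minimal numpy sketch of the squeeze, excitation, and scaling steps (formulas (4) to (6)) for the 1-D case; channel count, reduction ratio, and weights are toy values chosen for illustration (the patent uses r = 16):

```python
import numpy as np

def se_layer(X, W1, W2):
    """Squeeze-and-excitation over the C feature maps of a length-L 1-D feature X: (L, C)."""
    z = X.mean(axis=0)                         # squeeze: one statistic per map, formula (4)
    h = np.maximum(W1 @ z, 0.0)                # ReLU after the dimension-reduction FC
    s = 1.0 / (1.0 + np.exp(-(W2 @ h)))        # Sigmoid after the restore FC, formula (5)
    return X * s, s                            # scale each map by its weight, formula (6)

rng = np.random.default_rng(1)
C, r = 16, 4                                   # toy channel count and reduction ratio
X = rng.normal(size=(10, C))                   # stand-in for X^(seq)
W1 = rng.normal(size=(C // r, C))              # dimension-reduction layer
W2 = rng.normal(size=(C, C // r))              # dimension-restoration layer
X_tilde, s = se_layer(X, W1, W2)
```

The sigmoid keeps every weight in (0, 1), so each feature map is attenuated in proportion to its learned importance.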
The adaptive dynamic fusion of the three types of information (protein structural characteristics, original protein sequence, and amino acid physicochemical properties) comprises the following:
the SE layer is implemented with global average pooling and two fully connected (FC) layers, and the structure, sequence, and physicochemical modules share the same network architecture, each obtaining weighted high-level features through its SE layer. The outputs of the submodules are then concatenated to obtain the fused feature used for classification. Because the SE layers assign different weights to different feature maps, the feature-fusion process is adaptively dynamic.
The step of constructing a lysine acetylation site classifier from the fused features and a softmax layer and predicting potential lysine acetylation sites comprises the following:
the softmax layer receives the fused high-level feature as input and, after a weighted sum and an activation operation, outputs the predicted class of the sample; the forward propagation of the softmax layer is shown in formula (7):

P(y = i | x) = softmax(W * x + b)   (7)

where W is the weight matrix and b is the bias term. P(y = i | x) denotes the probability that sample x is predicted as class i ∈ {0, 1}; the class with the highest probability is the prediction of the softmax classifier.
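The softmax forward propagation of formula (7) can be sketched as follows; the fused-feature dimension and the weights are toy values, since the real dimension depends on the three submodules:

```python
import numpy as np

def softmax_predict(x, W, b):
    """Formula (7): class probabilities for a fused feature vector x."""
    logits = W @ x + b
    e = np.exp(logits - logits.max())  # subtract the max for numerical stability
    p = e / e.sum()
    return p, int(np.argmax(p))        # probabilities and the predicted class

rng = np.random.default_rng(2)
x = rng.normal(size=(12,))             # toy fused feature vector
W = rng.normal(size=(2, 12))           # two classes: acetylated / not acetylated
b = np.zeros(2)
p, y_hat = softmax_predict(x, W, b)
```

The two probabilities sum to one, and the predicted class is the index of the larger one, matching the classifier rule stated above.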
The step of training the lysine acetylation site prediction model based on the modular dense convolutional network comprises the following steps:
(1) cross entropy is used as the cost function to minimize the training error:

L_C = -(1/N) Σ_{j=1..N} log P(y = y_j | x_j)   (8)

where N is the total number of training samples and y_j is the true label of the j-th input motif x_j.
(2) L2 regularization is used during training to mitigate overfitting; the final objective function of the model is:

min_W (L_C + λ Σ ||W||_2^2)   (9)

where λ is the regularization coefficient and ||W||_2 is the L2 norm of a weight matrix.
(3) The objective function is optimized with the Adam optimizer, with the learning rate and batch size set to 0.0001 and 1000, respectively. An early-stopping strategy and the dropout technique are used to further prevent model overfitting.
(4) A class re-weighting method is adopted to increase the influence of the positive samples and force the model to learn an abstraction of the minority positive class.
(5) In the invention, the deep learning model is implemented with Keras 2.1.6 and TensorFlow 1.13.1; model training and testing are performed on a workstation running Ubuntu 18.04.1 LTS and equipped with an Nvidia Tesla V100-PCIE-32GB GPU.
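The training objective of formulas (8) and (9), together with the class re-weighting of step (4), can be sketched in numpy as follows. The inverse-frequency weighting scheme shown is one common choice and an assumption on our part, since the patent does not specify the exact re-weighting formula:

```python
import numpy as np

def weighted_objective(P, y, class_weights, lam, weight_mats):
    """Class-reweighted cross entropy (formula (8)) plus the L2 penalty (formula (9)).
    P: (N, 2) predicted probabilities; y: (N,) true labels in {0, 1}."""
    ce = -np.mean(class_weights[y] * np.log(P[np.arange(len(y)), y] + 1e-12))
    l2 = lam * sum(np.linalg.norm(W) ** 2 for W in weight_mats)
    return ce + l2

y = np.array([1, 0, 0, 0])                       # imbalanced toy batch: one positive site
P = np.array([[0.3, 0.7], [0.8, 0.2],
              [0.6, 0.4], [0.9, 0.1]])           # toy predicted probabilities
counts = np.bincount(y, minlength=2)
class_weights = len(y) / (2.0 * counts)          # inverse-frequency weights (assumed scheme)
loss = weighted_objective(P, y, class_weights, lam=1e-4,
                          weight_mats=[np.ones((3, 3))])
```

With inverse-frequency weights the rarer positive class receives the larger weight, which is the stated intent of the re-weighting step.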
The step of evaluating the proposed model through four types of experiments (ten-fold cross validation, independent testing, a model generalization test, and a test of the ability to recognize unknown lysine acetylation sites) comprises:
(1) using ten-fold cross validation to compare the performance of the lysine acetylation site prediction model based on the modular dense convolutional network with that of other prediction methods on the same benchmark training data set;
(2) further comparing the predictive ability of the model with that of other models through independent testing;
(3) verifying, through a generalization experiment, that the model has good generalization ability;
(4) validating the top 20 candidate sites on the independent test set and evaluating the ability of the model to identify unknown lysine acetylation sites.
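A minimal sketch of the ten-fold cross-validation split used in experiment (1), assuming a simple random partition (the patent does not specify whether the split is stratified):

```python
import numpy as np

def ten_fold_indices(n, seed=0):
    """Yield (train_idx, test_idx) pairs for ten-fold cross validation."""
    idx = np.random.default_rng(seed).permutation(n)
    folds = np.array_split(idx, 10)
    for k in range(10):
        test = folds[k]
        train = np.concatenate([folds[j] for j in range(10) if j != k])
        yield train, test

splits = list(ten_fold_indices(100))   # toy sample count; the real data sets are far larger
```

Each sample appears in exactly one test fold, so the ten test folds together cover the whole training set once.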
Different from the prior art, the lysine acetylation site prediction method based on a modular dense convolutional network introduces protein structural characteristics and combines them with the original protein sequence and the amino acid physicochemical properties to construct the site feature space; a modular dense convolutional network captures feature information at different levels, reducing information loss and information cross-talk during feature learning; and a squeeze-and-excitation layer is introduced to evaluate the importance of different features, improving the abstraction ability of the network so as to identify potential lysine acetylation sites. The method effectively addresses the problems that existing methods consider only protein sequence-level information and learn features inefficiently; it predicts potential lysine acetylation sites more accurately, reduces the cost of experimentally verifying lysine acetylation sites, and improves the efficiency of research on lysine acetylation modification.
Drawings
FIG. 1 is a schematic flow chart of a lysine acetylation site prediction method based on a modular dense convolutional network according to the present invention;
FIG. 2 shows the collected human data set information in the lysine acetylation site prediction method based on a modular dense convolutional network, where the CD-HIT threshold is 0.4;
FIG. 3 is a schematic diagram of a dense convolutional network in a lysine acetylation site prediction method based on a modular dense convolutional network according to the present invention;
FIG. 4 is a schematic diagram of the squeeze-and-excitation (SE) module in the lysine acetylation site prediction method based on a modular dense convolutional network according to the present invention;
FIG. 5 shows the ten-fold cross-validation performance of different methods on the human training data set under the 0.4 redundancy-removal threshold; bold indicates the highest value for each index;
FIG. 6 shows the prediction performance of different methods on the human independent test data set under the 0.4 redundancy-removal threshold;
FIG. 7 shows the performance of each model on the E. coli independent test data set under the 0.4 redundancy-removal threshold; bold indicates the highest value for each index;
FIG. 8 shows the prediction results for the top 20 candidate sites of acetylated proteins in the human independent test under the 0.4 redundancy-removal threshold; bold indicates sites where acetylation modification actually occurs.
Detailed Description
The technical solution of the present invention will be further described in more detail with reference to the following embodiments. It is to be understood that the described embodiments are merely a few embodiments of the invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to fig. 1, fig. 1 is a schematic flow chart of a lysine acetylation site prediction method based on a modular dense convolutional network according to the present invention.
The method comprises the following steps:
s110: describing lysine acetylation sites from three aspects of protein structural characteristics, protein original sequences and amino acid physicochemical attribute information, and constructing site initial feature space.
The step S110 includes:
1. collecting and pretreating experimental data of lysine acetylation sites;
2. and converting the collected protein data into a numerical vector by an encoding mode, constructing a site initial feature space, and taking the site initial feature space as the input of a prediction model.
The experimental data collection and preprocessing of lysine acetylation sites comprises the following steps:
6078 human, 3645 mouse, and 1860 E. coli experimentally validated lysine-acetylated protein entries were collected and downloaded from the Protein Lysine Modification Database (PLMD).
Because the SPIDER3 server cannot handle protein sequences containing non-standard amino acids, the invention deletes these protein sequences manually. Taking the human data as an example, sequence redundancy was removed with the CD-HIT tool to avoid the model bias caused by high sequence homology; the threshold was set to 0.4 and 4977 acetylated protein sequences were retained. To facilitate comparison with other lysine acetylation site predictors, the invention randomly selects 10% (498) of the 4977 filtered acetylated protein sequences to construct an independent test data set and uses the remaining acetylated protein sequences as the training data set.
Converting the collected protein data into numerical vectors by encoding, constructing the initial site feature space, and using it as the input of the prediction model comprises the following steps:
(1) the original protein sequence information of a site is encoded with one-of-21 coding; for a motif of length L, an L × 21 vector representation of the original protein sequence information is obtained;
(2) the amino acid physicochemical property information of a site is encoded with Atchley factors, each amino acid residue being represented by 5 Atchley factors; for a motif of length L, an L × 5 vector representation of the amino acid physicochemical property information is obtained;
(3) protein structural property information is obtained from SPIDER3 and comprises 8 indices across 3 attributes: secondary-structure probabilities p(H), p(E), p(C) (helix, strand, coil); local backbone torsion angles φ, ψ, θ, τ; and accessible surface area (ASA). For a motif of length L, an L × 8 vector representation of the protein structural property information is obtained.
S120: and (3) adopting a modularized dense convolution network, respectively extracting high-level characteristics of protein structural characteristics, protein original sequences and amino acid physicochemical properties from the initial characteristic space of the sites, and simultaneously paying attention to low-level characteristics and high-level characteristics through dense jump connection.
The step S120 includes:
1. introducing a design idea of a modular network structure, and constructing a structure, a sequence and a physical and chemical information module;
2. and (3) extracting high-level features of each module by adopting a stacking dense rolling block, and realizing information complementation between different-level features by considering low-level features and high-level features at the same time through dense jump connection.
Introducing the design idea of a modular network structure to construct the structure, sequence, and physicochemical information modules comprises the following steps:
three feature extraction submodules (a structure module, a sequence module, and a physicochemical module) are constructed on the basis of the protein structural characteristics, the original protein sequence, and the amino acid physicochemical properties, respectively. The parameter spaces of the submodules are mutually independent, which effectively avoids crosstalk among the three types of information and improves feature quality.
Extracting the high-level features of each module with stacked dense convolutional blocks, while considering low-level and high-level features simultaneously through dense skip connections to realize information complementation between features at different levels, comprises the following steps:
Since the structure module, the sequence module, and the physicochemical module share the same network structure, only the sequence module is described here:
(1) First, the sequence module receives as input the one-of-21 code of the site motif of length L, and then generates the low-level feature map of the original protein sequence information through a one-dimensional convolution layer, as shown in formula (1).
X_0 = σ(I * W + b)    (1)

where I is the one-of-21 code vector and W ∈ ℝ^(S×21×D) is the weight matrix, with S the filter size (S = 3) and D the number of filters (D = 96). b is the bias term and σ is the activation function. X_0 is the output of the one-dimensional convolution layer, of size L × D.
(2) Dense convolutional blocks are adopted to extract the high-level feature representation of the original protein sequence information; the dense convolution process is shown in formula (2).
X_l = σ([X_0; X_1; ...; X_{l-1}] * W′ + b′)    (2)

where X_{l-1} is the feature map generated by the (l-1)-th convolution layer in the dense convolutional block and [ · ] denotes concatenation along the feature dimension. W′ ∈ ℝ^(S×D′×D″) is the weight matrix, with D′ the total number of filters of convolution layers 1 to l-1 in the dense convolutional block and D″ the number of filters of the l-th convolution layer (D″ = 32). b′ is the bias term, σ is the activation function, and X_l denotes the feature map generated by the l-th convolution layer. The output of the dense convolutional block is the low-level feature map X_0 concatenated along the feature dimension with the feature maps X_1, X_2, ..., X_l generated by each convolution layer, i.e. [X_0; X_1; ...; X_l].
(3) A transition layer applies convolution and activation operations to the feature map of the original protein sequence information obtained in step (2); the transition layer process is shown in formula (3).
X = σ([X_0; X_1; ...; X_l] * W″ + b″)    (3)

where W″ is the weight matrix and S′ is the filter size (S′ = 1). b″ is the bias term, σ is the activation function, and X is the output of the transition layer. An average pooling operation is then applied to the transition layer result to reduce the dimension of the feature map and lower the risk of model overfitting.
(4) Steps (2) and (3) are repeated to form the stacked dense convolutional blocks. After the fourth repetition of step (2), step (3) is not applied; global average pooling is used instead.
Through the above process, the sequence module extracts the high-level feature X^(seq) of the original protein sequence of the site.
Similarly, the physicochemical module and the structure module extract the high-level features of the amino acid physicochemical properties and the protein structural characteristics of the site through the same process.
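The dense convolution of formulas (1) and (2) can be sketched in plain numpy as follows; the filter counts follow the text (D = 96, growth D″ = 32), while the motif length, weight initialization, and number of layers per block are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def conv1d(x, w, b):
    """'Same'-padded 1-D convolution with ReLU: x is (L, C_in), w is (S, C_in, C_out)."""
    S, _, C_out = w.shape
    pad = S // 2
    xp = np.pad(x, ((pad, pad), (0, 0)))
    L = x.shape[0]
    out = np.empty((L, C_out))
    for i in range(L):
        out[i] = np.tensordot(xp[i:i + S], w, axes=([0, 1], [0, 1])) + b
    return np.maximum(out, 0.0)  # sigma = ReLU

def dense_block(x0, n_layers=3, growth=32, S=3):
    """Each layer convolves the concatenation of all previous feature maps
    (formula (2)); the block output is [X0; X1; ...; Xl] along the feature axis."""
    feats = [x0]
    for _ in range(n_layers):
        cat = np.concatenate(feats, axis=1)
        w = rng.normal(0.0, 0.1, size=(S, cat.shape[1], growth))
        feats.append(conv1d(cat, w, np.zeros(growth)))
    return np.concatenate(feats, axis=1)

L, D = 33, 96
x0 = np.maximum(rng.normal(size=(L, D)), 0.0)  # stand-in for the first Conv1D output
out = dense_block(x0)                           # (L, 96 + 3*32) = (L, 192)
```

In the patent's pipeline a 1 × 1 transition convolution and average pooling would follow each block; the global average pooling after the last block reduces the map to a feature vector.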
S130: introducing a squeeze-and-excitation (SE) layer to evaluate feature importance, weighting each feature map, and realizing adaptive dynamic fusion of the three types of information.
The step S130 includes:
1. introducing a squeeze-and-excitation (SE) layer to evaluate feature importance and weighting each feature map;
2. adaptive dynamic fusion of the three types of information: protein structural characteristics, original protein sequence, and amino acid physicochemical properties.
Introducing the squeeze-and-excitation (SE) layer to evaluate feature importance and weight each feature map comprises the following steps, taking the sequence module as an example:
(1) Squeeze: the global spatial information of the high-level feature X^(seq) extracted by the sequence module is compressed into a channel descriptor by global average pooling; the squeeze process is shown in formula (4).

z_c = F_sq(u_c) = (1 / (W × H)) Σ_{i=1}^{W} Σ_{j=1}^{H} u_c(i, j)    (4)

where z_c denotes the channel statistic of the c-th feature map u_c of X^(seq), F_sq(·) denotes the squeeze operation, and W and H denote the width and height of the feature map u_c, respectively. After the statistic of each feature map of X^(seq) is computed, the channel descriptor z ∈ ℝ^C of X^(seq) is obtained.
(2) Excitation: two fully-connected (FC) layers capture the channel dependencies of X^(seq) and learn the specificity weight of each feature map of X^(seq); the excitation process is shown in formula (5).

s = F_ex(z, W) = σ(W_2 * δ(W_1 * z))    (5)

where s ∈ ℝ^C denotes the specificity weights of the feature maps of X^(seq) learned by the excitation operation and F_ex(·) denotes the excitation operation. δ and σ denote the activation functions of the two fully-connected layers: the former is the ReLU function of the dimension-reduction layer with parameter W_1 ∈ ℝ^((C/r)×C) and reduction ratio r (r = 16); the latter is the Sigmoid function of the dimension-restoring layer with parameter W_2 ∈ ℝ^(C×(C/r)), which ensures that the dimension of s equals the number of channels of the feature X^(seq).
(3) Feature scaling (scale): X^(seq) is scaled by the activations to obtain the output of the SE layer, X̃^(seq) = [x̃_1, x̃_2, ..., x̃_C], where x̃_c is any element of X̃^(seq); the calculation of x̃_c is shown in formula (6).

x̃_c = F_scale(u_c, s_c) = s_c · u_c    (6)

where F_scale(u_c, s_c) denotes that each value of the feature map u_c is multiplied by the weight s_c.
Similarly, the physicochemical and structural modules also get weighted high-level features through the SE layer.
The adaptive dynamic fusion of the three types of information (protein structural characteristics, original protein sequence, and amino acid physicochemical properties) comprises the following steps:
the SE layer is realized with global average pooling and two fully-connected (FC) layers; since the structure, sequence, and physicochemical modules share the same network structure, each obtains weighted high-level features through its SE layer. The outputs of the submodules are then concatenated to obtain the fused feature used for classification. Because the SE layer weights different feature maps, the feature fusion process has an adaptive dynamic character.
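The squeeze, excitation, and scaling steps of formulas (4) to (6) can be sketched for a 1-D feature map as follows; the random weights stand in for the learned FC parameters W_1 and W_2, and the feature sizes are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)

def se_layer(x, r=16):
    """Squeeze-excitation over an (L, C) feature map: squeeze by global
    average pooling, excite with two FC layers (ReLU then Sigmoid), then
    rescale each of the C feature maps by its learned weight s_c."""
    L, C = x.shape
    z = x.mean(axis=0)                     # squeeze: channel descriptor, shape (C,)
    w1 = rng.normal(0.0, 0.1, size=(C, C // r))   # dimension-reduction FC (stand-in)
    w2 = rng.normal(0.0, 0.1, size=(C // r, C))   # dimension-restoring FC (stand-in)
    s = 1.0 / (1.0 + np.exp(-(np.maximum(z @ w1, 0.0) @ w2)))  # excitation, (C,)
    return x * s, s                        # scale: x~_c = s_c * u_c

x = np.abs(rng.normal(size=(33, 128)))     # hypothetical high-level feature X(seq)
x_tilde, s = se_layer(x)
```

Concatenating the three modules' SE outputs along the feature axis would then yield the fused feature.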
S140: constructing a lysine acetylation site classifier based on the fused features and a softmax layer, and predicting potential lysine acetylation sites.
The step S140 includes:
The softmax layer receives the fused high-level feature as input, and after weighted summation and activation operations outputs the prediction class of the sample; the forward propagation of the softmax layer is shown in formula (7).

P(y = i | x) = exp(w_i · x + b_i) / Σ_{k∈{0,1}} exp(w_k · x + b_k)    (7)

where W is the weight matrix and b is the bias term. P(y = i | x) denotes the probability that sample x is predicted as class i ∈ {0, 1}; the class with the highest probability is the prediction of the softmax classifier.
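A minimal sketch of the softmax forward pass of formula (7), with a hypothetical fused-feature dimension and random weights standing in for the trained parameters:

```python
import numpy as np

def softmax_predict(x, W, b):
    """Softmax layer forward pass: weighted sum plus bias, exponentiation,
    and normalisation give P(y = i | x); the argmax class is the prediction."""
    logits = x @ W + b
    e = np.exp(logits - logits.max())  # subtract max for numerical stability
    p = e / e.sum()
    return p, int(p.argmax())

rng = np.random.default_rng(2)
x = rng.normal(size=384)                    # hypothetical fused feature vector
W = rng.normal(0.0, 0.05, size=(384, 2))    # stand-in for the trained weight matrix
b = np.zeros(2)
p, cls = softmax_predict(x, W, b)
```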
S150: training the lysine acetylation site prediction model based on the modular dense convolutional network.
The step S150 includes:
1. Cross entropy is used as the cost function to minimize the training error:

L_C = -(1/N) Σ_{j=1}^{N} [y_j log ŷ_j + (1 - y_j) log(1 - ŷ_j)]    (8)

where N is the total number of training samples, y_j is the true label of the j-th input motif, x_j is the j-th input motif, and ŷ_j is the predicted probability that x_j is acetylated.
2. L2 regularization is used during training to mitigate overfitting; the final objective function of the model is:

min_W (L_C + λ Σ ||W||_2^2)    (9)

where λ is the regularization coefficient and ||W||_2 is the L2 norm of the weight matrix.
3. The objective function is optimized with the Adam optimizer; the learning rate and batch size are set to 0.0001 and 1000, respectively. An early stopping strategy and the dropout technique are used to further prevent model overfitting.
4. A class re-weighting method is adopted to increase the influence of the positive samples and force the model to learn an abstract mechanism for the minority positive class.
5. In the invention, the deep learning model is implemented with Keras 2.1.6 and TensorFlow 1.13.1; model training and testing are performed on a workstation running Ubuntu 18.04.1 LTS and equipped with an Nvidia Tesla V100-PCIE-32GB GPU.
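The objective of formulas (8) and (9), with class re-weighting, can be sketched as follows; the positive-class weight of 3.0 and the toy predictions are illustrative assumptions, not values from the invention:

```python
import numpy as np

def objective(p, y, weights, lam=1e-4, class_weight=(1.0, 3.0)):
    """Class-reweighted cross entropy (formula (8)) plus the L2 penalty of
    formula (9). p holds predicted positive-class probabilities, y the true
    labels; class_weight up-weights the minority positive class (the factor
    3.0 is a hypothetical choice)."""
    w = np.where(y == 1, class_weight[1], class_weight[0])
    ce = -np.mean(w * (y * np.log(p) + (1 - y) * np.log(1 - p)))
    l2 = lam * sum(np.sum(W ** 2) for W in weights)  # lambda * sum of squared weights
    return ce + l2

p = np.array([0.9, 0.2, 0.8, 0.1])   # toy predicted probabilities
y = np.array([1, 0, 1, 0])           # toy true labels
loss = objective(p, y, [np.ones((4, 4))])
```

In the Keras setup described in the text, the same effect would come from `class_weight` in `fit`, an L2 kernel regularizer, and the Adam optimizer with learning rate 0.0001.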
S160: evaluating the proposed model through four types of experiments: ten-fold cross-validation, independent testing, model generalization ability testing, and recognition of unknown lysine acetylation sites.
The step S160 includes:
1. comparing, with ten-fold cross-validation, the performance of the lysine acetylation site prediction model based on the modular dense convolutional network against other prediction methods on the same benchmark training data set;
2. further comparing the prediction ability of the lysine acetylation site prediction model based on the modular dense convolutional network with other models through independent testing;
3. further verifying, through generalization experiments, that the lysine acetylation site prediction model based on the modular dense convolutional network has good generalization ability;
4. on the independent test set, validating the top 20 candidate sites and evaluating the ability of the lysine acetylation site prediction model based on the modular dense convolutional network to identify unknown lysine acetylation sites.
Comparing, with ten-fold cross-validation, the performance of the lysine acetylation site prediction model based on the modular dense convolutional network against other prediction methods on the same benchmark training data set comprises the following steps:
(1) the model of the invention is compared through ten-fold cross-validation with other existing lysine acetylation site prediction models: MusiteDeep, CapsNet, DeepAcet, PSKAcePred, EnsemblePail, GPS-PAIL 2.0, and ProAcePred.
(2) The performance of the model is evaluated with six statistical measures, including sensitivity (Sn), specificity (Sp), accuracy (Acc), precision (Pre), Matthews correlation coefficient (MCC), and geometric mean (G-mean), which are defined as follows:
Sn = TP / (TP + FN)
Sp = TN / (TN + FP)
Acc = (TP + TN) / (TP + TN + FP + FN)
Pre = TP / (TP + FP)
MCC = (TP × TN - FP × FN) / √((TP + FP)(TP + FN)(TN + FP)(TN + FN))
G-mean = √(Sn × Sp)
where TP, TN, FP, and FN are the numbers of true positives, true negatives, false positives, and false negatives, respectively. The MCC and G-mean indices reflect model quality well when positive and negative samples are imbalanced. In addition, the area under the receiver operating characteristic (ROC) curve (AUC) and the area under the precision-recall (PR) curve (AUPR) are adopted to measure the overall performance of the model; the higher the AUC and AUPR values, the better the overall performance. The comparison results of the models are shown in the attached drawings of the specification.
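The six measures can be computed directly from the confusion-matrix counts; a small sketch with illustrative counts:

```python
import math

def metrics(tp, tn, fp, fn):
    """The six statistical measures used to evaluate the model."""
    sn = tp / (tp + fn)                    # sensitivity (recall)
    sp = tn / (tn + fp)                    # specificity
    acc = (tp + tn) / (tp + tn + fp + fn)  # accuracy
    pre = tp / (tp + fp)                   # precision
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))  # Matthews corr. coefficient
    gmean = math.sqrt(sn * sp)             # geometric mean
    return sn, sp, acc, pre, mcc, gmean

m = metrics(tp=80, tn=90, fp=10, fn=20)    # illustrative confusion-matrix counts
```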
The prediction ability of the lysine acetylation site prediction model based on the modular dense convolutional network is further compared with other models through independent testing, comprising the following steps:
for models with standalone tools, the models are trained on the training data and then used to predict potential lysine acetylation sites on the independent test data set; for models that provide only Web services, prediction performance is tested on the independent test data set only. The results show that the lysine acetylation site prediction model based on the modular dense convolutional network achieves the highest MCC, G-mean, AUC, and AUPR, is optimal on the independent test data set, and has better lysine acetylation site prediction ability than the other prediction methods. The results of the independent tests are shown in the attached drawings of the specification.
Generalization experiments further verify that the lysine acetylation site prediction model based on the modular dense convolutional network has good generalization ability, comprising the following steps:
lysine acetylation sites are predicted on the human data set at a redundancy-removal threshold of 0.3, and on the mouse and Escherichia coli data sets at thresholds of 0.4 and 0.3. The lysine acetylation site prediction model based on the modular dense convolutional network shows good generalization ability, can be applied to data sets of different species, and provides a useful reference for predicting lysine acetylation modification sites of other species. The results of the generalization ability test are shown in the attached drawings of the specification.
On the independent test set, the top 20 candidate sites are validated and the ability of the lysine acetylation site prediction model based on the modular dense convolutional network to identify unknown lysine acetylation sites is evaluated, comprising the following steps:
the top 20 candidate sites predicted as lysine acetylated by the model of the invention are listed according to the results on the independent test set, and these 20 candidate sites were checked manually in the lysine modification database PLMD and the protein database UniProt (https://www.uniprot.org). Statistical validation found that 13 of the 20 candidate sites are truly acetylated, a rate of 65%. The results of the top 20 candidate sites for the independent test on human acetylated proteins are shown in the attached drawing of the specification.
The above description is only an embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims (10)

1. A lysine acetylation site prediction method based on a modular dense convolutional network is characterized by comprising the following steps:
S1, acquiring and preprocessing lysine acetylation site experimental data, converting the preprocessed data into numerical vectors by encoding, and constructing the site initial feature space;
S2, adopting a modular dense convolutional network to extract high-level features of the protein structural characteristics, the original protein sequence, and the amino acid physicochemical properties from the site initial feature space, and obtaining low-level and high-level features through dense skip connections;
S3, introducing a squeeze-and-excitation (SE) layer to evaluate feature importance, weighting each feature map, and realizing adaptive dynamic fusion of the three types of information, namely protein structural characteristics, original protein sequence, and amino acid physicochemical properties, to obtain fused features;
S4, constructing a lysine acetylation site prediction model based on the fused features and a softmax layer, and predicting potential lysine acetylation sites;
S5, training the lysine acetylation site prediction model based on the modular dense convolutional network;
S6, evaluating the lysine acetylation site prediction model through four types of experiments: ten-fold cross-validation, independent testing, model generalization ability testing, and recognition of unknown lysine acetylation sites.
2. The method of claim 1, wherein the step S1 of acquiring and preprocessing the lysine acetylation site experimental data comprises:
S11, obtaining experimentally verified lysine acetylated protein sequences from a protein lysine modification database;
S12, eliminating sequence redundancy with the CD-HIT tool, with the threshold set to 0.4;
S13, randomly selecting 10% of the filtered lysine acetylated protein sequences to construct an independent test data set, and using the remaining lysine acetylated protein sequences as the training data set.
3. The method of claim 2, wherein in step S1, converting the preprocessed data into numerical vectors by encoding comprises:
1) encoding the original protein sequence information of the site with one-of-21 coding to obtain the vector representation of the original sequence information;
2) encoding the amino acid physicochemical property information of the site with Atchley factors, each amino acid residue being represented by 5 Atchley factors, to obtain the vector representation of the physicochemical property information;
3) obtaining protein structural property information through SPIDER3, covering 8 indices of 3 attributes, namely secondary structure: α-helix P(H), β-strand P(E), coil P(C); local backbone torsion angles: φ, ψ, θ, τ; accessible surface area: ASA; and obtaining the vector representation of the protein structural property information.
4. The method for predicting lysine acetylation sites based on the modular dense convolutional network of claim 1, wherein step S2 specifically comprises:
1) introducing the design idea of a modular network structure, and constructing the structure, sequence, and physicochemical information modules;
2) extracting the high-level features of each module with stacked dense convolutional blocks, while obtaining low-level and high-level features through dense skip connections.
5. The method for predicting lysine acetylation sites based on the modular dense convolutional network as claimed in claim 4, wherein introducing the design idea of a modular network structure to construct the structure, sequence, and physicochemical information modules specifically comprises:
three feature extraction submodules (a structure module, a sequence module, and a physicochemical module) are constructed on the basis of the protein structural characteristics, the original protein sequence, and the amino acid physicochemical properties, respectively, with mutually independent parameter spaces.
6. The method of claim 5, wherein extracting the high-level features of the sequence module with stacked dense convolutional blocks specifically comprises:
1) the sequence module receives as input the one-of-21 code of the site motif of length L, and then generates the low-level feature map of the original protein sequence information through a one-dimensional convolution layer, as shown in formula (1):
X_0 = σ(I * W + b)    (1)
wherein I is the one-of-21 code vector, W ∈ ℝ^(S×21×D) is the weight matrix, S is the filter size (S = 3), D is the number of filters (D = 96), b is the bias term, σ is the activation function, and X_0 is the output of the one-dimensional convolution layer, of size L × D;
2) extracting the high-level feature representation of the original protein sequence information with a dense convolutional block, the dense convolution process being shown in formula (2):
X_l = σ([X_0; X_1; ...; X_{l-1}] * W′ + b′)    (2)
wherein X_{l-1} is the feature map generated by the (l-1)-th convolution layer in the dense convolutional block, [ · ] denotes concatenation along the feature dimension, W′ ∈ ℝ^(S×D′×D″) is the weight matrix, D′ is the total number of filters of convolution layers 1 to l-1 in the dense convolutional block, D″ is the number of filters of the l-th convolution layer (D″ = 32), b′ is the bias term, σ is the activation function, and X_l denotes the feature map generated by the l-th convolution layer; the output of the dense convolutional block is the low-level feature map X_0 concatenated along the feature dimension with the feature maps X_1, X_2, ..., X_l generated by each convolution layer, i.e. [X_0; X_1; ...; X_l];
3) applying a transition layer to the feature map of the original protein sequence information obtained in step 2), performing convolution and activation operations, the transition layer process being shown in formula (3):
X = σ([X_0; X_1; ...; X_l] * W″ + b″)    (3)
wherein W″ is the weight matrix, S′ is the filter size (S′ = 1), b″ is the bias term, σ is the activation function, and X is the output of the transition layer;
then, an average pooling operation is applied to the transition layer result to reduce the dimension of the feature map and lower the risk of model overfitting;
4) repeating steps 2) and 3) to form the stacked dense convolutional blocks, wherein after the fourth repetition of step 2), step 3) is not applied and global average pooling is used instead;
through the above process, the sequence module extracts the high-level feature X^(seq) of the original protein sequence of the site.
7. The method of claim 6, wherein in step S3, the squeeze-and-excitation layer is introduced to evaluate the importance of the high-level features extracted by the sequence module and each feature map is weighted, comprising the following steps:
1) squeeze: the global spatial information of the high-level feature X^(seq) extracted by the sequence module is compressed into a channel descriptor by global average pooling, the squeeze process being shown in formula (4):
z_c = F_sq(u_c) = (1 / (W × H)) Σ_{i=1}^{W} Σ_{j=1}^{H} u_c(i, j)    (4)
wherein z_c denotes the channel statistic of the c-th feature map u_c of X^(seq), F_sq(·) denotes the squeeze operation, and W and H denote the width and height of the feature map u_c, respectively; after the statistic of each feature map of X^(seq) is computed, the channel descriptor z ∈ ℝ^C of X^(seq) is obtained;
2) excitation: two fully-connected (FC) layers capture the channel dependencies of X^(seq) and learn the specificity weight of each feature map of X^(seq), the excitation process being shown in formula (5):
s = F_ex(z, W) = σ(W_2 * δ(W_1 * z))    (5)
wherein s ∈ ℝ^C denotes the specificity weights of the feature maps of X^(seq) learned by the excitation operation, and F_ex(·) denotes the excitation operation; δ and σ denote the activation functions of the two fully-connected layers: the former is the ReLU function of the dimension-reduction layer with parameter W_1 ∈ ℝ^((C/r)×C) and reduction ratio r (r = 16); the latter is the Sigmoid function of the dimension-restoring layer with parameter W_2 ∈ ℝ^(C×(C/r)), which ensures that the dimension of s equals the number of channels of the feature X^(seq);
3) feature scaling (scale): X^(seq) is scaled by the activations to obtain the output of the SE layer X̃^(seq) = [x̃_1, x̃_2, ..., x̃_C], wherein x̃_c is any element of X̃^(seq), the calculation of x̃_c being shown in formula (6):
x̃_c = F_scale(u_c, s_c) = s_c · u_c    (6)
wherein F_scale(u_c, s_c) denotes that each value of the feature map u_c is multiplied by the weight s_c.
8. The method for predicting lysine acetylation sites based on the modular dense convolutional network as claimed in claim 1, wherein the step S4 of predicting potential lysine acetylation sites comprises the following steps:
the softmax layer receives the fused high-level feature as input, and after weighted summation and activation operations outputs the prediction class of the sample, the forward propagation of the softmax layer being shown in formula (7):
P(y = i | x) = exp(w_i · x + b_i) / Σ_{k∈{0,1}} exp(w_k · x + b_k)    (7)
wherein W is the weight matrix and b is the bias term; P(y = i | x) denotes the probability that sample x is predicted as class i ∈ {0, 1}, and the class with the highest probability is the prediction of the softmax classifier.
9. The method for predicting lysine acetylation sites based on the modular dense convolutional network as claimed in claim 1, wherein the step S5 of training the lysine acetylation site prediction model comprises the following steps:
1) cross entropy is used as the cost function:
L_C = -(1/N) Σ_{j=1}^{N} [y_j log ŷ_j + (1 - y_j) log(1 - ŷ_j)]    (8)
wherein N is the total number of training samples, y_j is the true label of the j-th input motif, x_j is the j-th input motif, and ŷ_j is the predicted probability that x_j is acetylated;
2) L2 regularization is used during training, the final objective function of the model being:
min_W (L_C + λ Σ ||W||_2^2)    (9)
wherein λ is the regularization coefficient and ||W||_2 is the L2 norm of the weight matrix;
3) optimizing the objective function with an Adam optimizer, the learning rate and batch size being set to 0.0001 and 1000, respectively; adopting an early stopping strategy and the dropout technique to further prevent the model from overfitting;
4) adopting a class re-weighting method to increase the influence of the positive samples and force the model to learn an abstract mechanism for the minority positive class.
10. The method for predicting lysine acetylation sites based on the modular dense convolutional network of claim 1, wherein step S6 specifically comprises:
(1) comparing, with ten-fold cross-validation, the performance of the lysine acetylation site prediction model based on the modular dense convolutional network against other prediction methods on the same benchmark training data set;
(2) comparing the prediction ability of the lysine acetylation site prediction model based on the modular dense convolutional network with other models through independent testing;
(3) verifying, through generalization experiments, that the lysine acetylation site prediction model based on the modular dense convolutional network has good generalization ability;
(4) on the independent test set, validating the top 20 candidate sites and evaluating the ability of the lysine acetylation site prediction model based on the modular dense convolutional network to identify unknown lysine acetylation sites.
CN202011344614.3A 2020-11-25 2020-11-25 Lysine acetylation site prediction method based on modular dense convolutional network Active CN112447265B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011344614.3A CN112447265B (en) 2020-11-25 2020-11-25 Lysine acetylation site prediction method based on modular dense convolutional network


Publications (2)

Publication Number Publication Date
CN112447265A true CN112447265A (en) 2021-03-05
CN112447265B CN112447265B (en) 2021-08-20

Family

ID=74737660

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011344614.3A Active CN112447265B (en) 2020-11-25 2020-11-25 Lysine acetylation site prediction method based on modular dense convolutional network

Country Status (1)

Country Link
CN (1) CN112447265B (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113298235A (en) * 2021-06-10 2021-08-24 浙江传媒学院 Neural network architecture of multi-branch depth self-attention transformation network and implementation method
CN113593634A (en) * 2021-08-06 2021-11-02 中国海洋大学 Transcription factor binding site prediction method fusing DNA shape characteristics
CN113611354A (en) * 2021-07-05 2021-11-05 河南大学 Protein torsion angle prediction method based on lightweight deep convolutional network
CN114496095A (en) * 2022-01-20 2022-05-13 广东药科大学 Modification site recognition method, system, device and storage medium
CN114927165A (en) * 2022-07-20 2022-08-19 深圳大学 Method, device, system and storage medium for identifying ubiquitination sites

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111063393A (en) * 2019-12-26 2020-04-24 青岛科技大学 Prokaryotic acetylation site prediction method based on information fusion and deep learning


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
XIAOWEI ZHAO等: "General and Species-Specific Lysine Acetylation Site Prediction Using a Bi-Modal Deep Architecture", 《IEEE ACCESS》 *
李佳根: "基于深度迁移学习的赖氨酸乙酰化位点预测问题的研究", 《中国优秀硕士学位论文全文数据库基础科学辑》 *

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113298235A (en) * 2021-06-10 2021-08-24 浙江传媒学院 Neural network architecture of multi-branch depth self-attention transformation network and implementation method
CN113611354A (en) * 2021-07-05 2021-11-05 河南大学 Protein torsion angle prediction method based on lightweight deep convolutional network
CN113611354B (en) * 2021-07-05 2023-06-02 河南大学 Protein torsion angle prediction method based on lightweight deep convolutional network
CN113593634A (en) * 2021-08-06 2021-11-02 中国海洋大学 Transcription factor binding site prediction method fusing DNA shape characteristics
CN113593634B (en) * 2021-08-06 2022-03-11 中国海洋大学 Transcription factor binding site prediction method fusing DNA shape characteristics
CN114496095A (en) * 2022-01-20 2022-05-13 广东药科大学 Modification site recognition method, system, device and storage medium
CN114927165A (en) * 2022-07-20 2022-08-19 深圳大学 Method, device, system and storage medium for identifying ubiquitination sites
CN114927165B (en) * 2022-07-20 2022-12-02 深圳大学 Method, device, system and storage medium for identifying ubiquitination sites
WO2024016389A1 (en) * 2022-07-20 2024-01-25 深圳大学 Ubiquitination site identification method, apparatus and system, and storage medium

Also Published As

Publication number Publication date
CN112447265B (en) 2021-08-20

Similar Documents

Publication Publication Date Title
CN112447265B (en) Lysine acetylation site prediction method based on modular dense convolutional network
Tan et al. MnasNet: Platform-aware neural architecture search for mobile
CN114037844B (en) Global rank perception neural network model compression method based on filter feature map
CN107622182B (en) Method and system for predicting local structural features of protein
CN111210871A (en) Protein-protein interaction prediction method based on deep forest
CN111898689A (en) Image classification method based on neural network architecture search
CN111785329A (en) Single-cell RNA sequencing clustering method based on confrontation automatic encoder
Paupamah et al. Quantisation and pruning for neural network compression and regularisation
CN116580848A (en) Multi-head attention mechanism-based method for analyzing multiple groups of chemical data of cancers
CN115343676B (en) Feature optimization method for positioning technology of redundant substances in sealed electronic equipment
CN114743600B (en) Deep learning prediction method of target-ligand binding affinity based on gated attention mechanism
CN114360644A (en) Method and system for predicting combination of T cell receptor and epitope
CN115049019A (en) Method and device for evaluating arsenic adsorption performance of metal organic framework and related equipment
CN117476106B (en) Multi-class unbalanced protein secondary structure prediction method and system
CN113611354B (en) Protein torsion angle prediction method based on lightweight deep convolutional network
CN118426414A (en) Industrial data enhancement method based on self-attention variation self-encoder
CN117079741A (en) Molecular insulation strength prediction method, device and medium based on neural network
CN115472229B (en) Thermophilic protein prediction method and device
CN116071596A (en) Plankton scene classification method, device, equipment and storage medium
CN116486896A (en) Ligand specific binding residue prediction method based on domain self-adaption and graph network
CN115497564A (en) Antigen identification model establishing method and antigen identification method
CN112861949B (en) Emotion prediction method and system based on face and sound
CN115206422A (en) Mass spectrum spectrogram analyzing method and device and intelligent terminal
CN114724630A (en) Deep learning method for predicting posttranslational modification sites of protein
Termritthikun et al. Neural architecture search and multi-objective evolutionary algorithms for anomaly detection

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20210319

Address after: 030000 Yingze West Street, Taiyuan, Taiyuan, Shanxi

Applicant after: Taiyuan University of Technology

Applicant after: Xueyi Technology (Chengdu) Co.,Ltd.

Address before: 030000 Yingze West Street, Taiyuan, Taiyuan, Shanxi

Applicant before: Taiyuan University of Technology

TA01 Transfer of patent application right

Effective date of registration: 20210730

Address after: 030000 Yingze West Street, Taiyuan, Taiyuan, Shanxi

Applicant after: Taiyuan University of Technology

Address before: 030000 Yingze West Street, Taiyuan, Taiyuan, Shanxi

Applicant before: Taiyuan University of Technology

Applicant before: Xueyi Technology (Chengdu) Co.,Ltd.

GR01 Patent grant