CN114512188B - DNA binding protein recognition method based on improved protein sequence position specificity matrix - Google Patents

DNA binding protein recognition method based on improved protein sequence position specificity matrix

Info

Publication number
CN114512188B
Authority
CN
China
Prior art keywords
position specificity
matrix
protein sequence
score
dna binding
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210274125.8A
Other languages
Chinese (zh)
Other versions
CN114512188A (en)
Inventor
冉坤
彭绍亮
赵雄君
潘亮
王练
刘文娟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hunan University
Original Assignee
Hunan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hunan University
Priority to CN202210274125.8A
Publication of CN114512188A
Application granted
Publication of CN114512188B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B25/00 ICT specially adapted for hybridisation; ICT specially adapted for gene or protein expression
    • G16B25/10 Gene or protein expression profiling; Expression-ratio estimation or normalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/049 Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00 Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10 Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • Genetics & Genomics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biotechnology (AREA)
  • Evolutionary Biology (AREA)
  • Medical Informatics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Investigating Or Analysing Biological Materials (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a DNA binding protein recognition method based on an improved protein sequence position specificity matrix, comprising the following steps: S1, initializing parameters; S2, constructing DNA binding protein sequence information; S3, representing a protein sequence with a position-specific scoring matrix (PSSM); S4, normalizing the PSSM to obtain an improved PSSM; S5, inputting the improved PSSM into a convolutional neural network; S6, inputting the output of the convolutional neural network into a bidirectional long short-term memory (BiLSTM) network; S7, weighting the hidden features generated by the different memory cells with a time-distributed dense layer; S8, inputting the output of the dense layer into a flatten layer; S9, inputting the improved PSSM into a random forest model to obtain a decision result for the protein sequence; S10, inputting the output of step S8 and the decision result of step S9 into a scoring layer, which computes the final prediction score according to the set weights. The invention improves prediction performance and accuracy.

Description

DNA binding protein recognition method based on improved protein sequence position specificity matrix
Technical Field
The invention relates to the interdisciplinary field of bioinformatics and computer science, and in particular to a DNA binding protein recognition method based on an improved protein sequence position specificity matrix.
Background
DNA binding proteins (DBPs) play an important role in a variety of biological processes, such as DNA replication, transcriptional control, chromatin stability and modification, epigenetic regulation, post-transcriptional gene regulation, alternative splicing, and translation. They are also implicated in certain diseases, such as cancer and myeloid leukemia. Because the binding of these proteins to DNA also plays an important role in gene expression, accurate recognition of DNA binding proteins is of great importance.
Experimental techniques such as chromatin immunoprecipitation microarrays, X-ray crystallography, and filter binding assays can accurately identify DNA binding proteins, but these methods are expensive and time consuming. Particularly in the post-genomic era, computational methods are low in cost and are a good complement to experimental techniques. In recent years, computational methods based on machine learning algorithms have received widespread attention for their encouraging performance. Given a protein sequence as input, machine learning-based methods have proven effective in automatically predicting whether the protein binds to DNA.
Improving the accuracy of models for DNA binding protein recognition is therefore significant, and using this knowledge to discover potentially important functions, such as DNA replication and transcription, and their mechanisms of action is of great scientific value.
Disclosure of Invention
The invention aims to provide a DNA binding protein identification method based on an improved protein sequence position specificity matrix, so as to overcome the defects in the prior art.
In order to achieve the above purpose, the technical scheme adopted by the invention is as follows:
A method for DNA binding protein recognition based on an improved protein sequence position specificity matrix, comprising the following steps:
S1, initializing parameters, including setting the network input dimension $dim$ and the network sequence length $l$; setting the number and size of the filters of the first convolutional layer of the convolutional neural network to $n_1$ and $size_1$; setting the number and size of the filters of the second convolutional layer to $n_2$ and $size_2$; setting the pooling kernel size of the max pooling layer to $size_3$; setting the number of neurons of the bidirectional long short-term memory (BiLSTM) network to $n_3$; setting the number of nodes of the fully connected layer to $n_4$; denoting the final prediction score of the DNA binding protein as $score_{DBP}$, the neural network prediction result as $score_1$, and the random forest prediction result as $score_2$; and setting the weight of the neural network prediction result to $w_1$ and the weight of the random forest prediction result to $w_2$;
S2, constructing DNA binding protein sequence information;
S3, for a given protein sequence $S$, using a position-specific scoring matrix (PSSM) to represent the protein sequence, where $S = S_1 S_2 \ldots S_L$, $S_i$ ($1 \le i \le L$) denotes the amino acid at the $i$-th position of $S$, and $L$ is the length of $S$;
S4, normalizing the PSSM, decomposing the normalized matrix into $n$ submatrices, calculating the local PSSM features of all submatrices, and representing the protein sequence as a feature vector of a specific dimension, to obtain the improved PSSM;
S5, inputting the improved PSSM into a convolutional neural network in which two convolutional layers are stacked in sequence, the output of each layer serving as the input of the next layer, and the convolutional layers using ReLU as the activation function;
S6, inputting the output of the convolutional neural network into the BiLSTM network, which uses ReLU as the activation function;
S7, weighting the hidden features generated by the different memory cells with a time-distributed dense layer;
S8, inputting the output of the dense layer into a flatten layer, which converts the result into one-dimensional data, and inputting the one-dimensional data into the fully connected layer to obtain the output, wherein the output node uses sigmoid as the activation function;
S9, inputting the improved PSSM obtained in step S4 into a random forest model, and obtaining a decision result for the protein sequence through the decision trees of the random forest;
S10, inputting the output of step S8 and the decision result of step S9 into a scoring layer, and computing the final prediction score according to the set weights, wherein the prediction score corresponds to a confidence level: the higher the score, the more likely the recognition is correct.
Further, the step S2 specifically includes:
S20, obtaining proteins annotated with the gene classification term DNA-binding from the annotated protein sequence database Swiss-Prot as the positive samples $S^+$;
S21, collecting proteins whose annotations are unrelated to the gene classification term DNA-binding from the annotated protein sequence database Swiss-Prot as the negative samples $S^-$;
S22, removing from the positive samples $S^+$ and the negative samples $S^-$ the proteins whose length is smaller than a set value;
S23, removing from the positive samples $S^+$ and the negative samples $S^-$ the homologous proteins at a cut-off threshold equal to a first set value and a coverage equal to a second set value of the sequence length.
Further, the negative samples $S^-$ in step S21 are selected from proteins of known structure.
Further, step S23 specifically uses CD-HIT and BLASTClust to remove from the positive samples $S^+$ and the negative samples $S^-$ the homologous proteins at a cut-off threshold of 0.35 and a coverage of 90% of the sequence length.
Further, in step S3, the e-value threshold is set to 0.001 and the number of iterations is set to 10, and the corresponding position-specific scoring matrix is generated by PSI-BLAST.
Further, the step S4 specifically includes:
S40, dividing the position-specific scoring matrix into $n$ submatrices, where $n \ge 1$; the first $n-1$ submatrices have $L/n$ rows and 20 columns, the last submatrix has $L-(n-1)L/n$ rows and 20 columns, and each submatrix retains the evolutionary information contained in the position-specific scoring matrix;
S41, calculating the local position-specific scoring matrix features of each submatrix, wherein the first $n-1$ submatrices are used to calculate 20 local features by combining the evolutionary information, and the last submatrix is calculated from the values of the first $n-1$ submatrices.
Further, the step S9 specifically includes:
S90, sampling the improved position-specific scoring matrix obtained in step S4 with replacement to obtain a plurality of sample sets;
S91, randomly extracting $m$ features from the candidate features as the candidate features for the decision at the current node, selecting from them the feature used to split the training samples, and constructing a decision tree with each sample set as its training samples, wherein, once the sample set has been generated and the features determined, each single decision tree is computed with the CART algorithm without pruning;
S92, after the set number of decision trees has been obtained, voting on the outputs of the decision trees by the random forest method, the class with the most votes being taken as the decision of the random forest.
Further, step S90 specifically draws, each time, $n_2$ samples at random with replacement from the original $n_1$ training samples.
Further, the step S10 specifically includes:
S100, obtaining the DNA binding protein prediction score $score_1$ from the output of step S8 and the DNA binding protein prediction score $score_2$ from the decision result of step S9, respectively;
S110, calculating the final prediction score according to the different weights $w_1$ and $w_2$ as follows:
$score_{DBP} = score_1 \cdot w_1 + score_2 \cdot w_2$
Compared with the prior art, the invention has the following advantages. First, the DNA binding protein recognition method based on the improved protein sequence position specificity matrix improves prediction accuracy by constructing positive and negative samples of DNA binding proteins. Second, it learns the spatial and temporal sequence information of DNA binding proteins through a convolutional neural network, a bidirectional long short-term memory network, and a random forest model, and it improves the PSSM, which improves the recognition performance for DNA binding proteins. Finally, by setting different weights, the outputs of the neural network and the random forest model are weighted to obtain the final prediction score, which improves prediction performance and accuracy.
Drawings
In order to illustrate the embodiments of the invention or the technical solutions of the prior art more clearly, the drawings required for describing the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the invention; a person skilled in the art can obtain other drawings from them without inventive effort.
FIG. 1 is a flow chart of the DNA binding protein recognition method of the present invention based on an improved protein sequence position specificity matrix.
FIG. 2 is a structural diagram of the neural network used in the DNA binding protein recognition method of the present invention based on an improved protein sequence position specificity matrix.
Detailed Description
The preferred embodiments of the present invention are described in detail below with reference to the accompanying drawings, so that those skilled in the art can more easily understand the advantages and features of the present invention and so that the scope of protection of the present invention is defined more clearly.
Referring to FIG. 1, the present embodiment discloses a DNA binding protein recognition method based on an improved protein sequence position specificity matrix, comprising the steps of:
Step S1, initializing parameters, including setting the network input dimension $dim$ and the network sequence length $l$; setting the number and size of the filters of the first convolutional layer of the convolutional neural network to $n_1$ and $size_1$; setting the number and size of the filters of the second convolutional layer to $n_2$ and $size_2$; setting the pooling kernel size of the max pooling layer to $size_3$; setting the number of neurons of the bidirectional long short-term memory (BiLSTM) network to $n_3$; setting the number of nodes of the fully connected layer to $n_4$; denoting the final prediction score of the DNA binding protein as $score_{DBP}$, the neural network prediction result as $score_1$, and the random forest prediction result as $score_2$; and setting the weight of the neural network prediction result to $w_1$ and the weight of the random forest prediction result to $w_2$.
Step S2, constructing DNA binding protein sequence information.
Specifically, step S2 includes the following steps:
Step S20, obtaining proteins annotated with the gene classification term DNA-binding from the annotated protein sequence database Swiss-Prot as the positive samples $S^+$.
Step S21, collecting proteins whose annotations are unrelated to the gene classification term DNA-binding from the annotated protein sequence database Swiss-Prot as the negative samples $S^-$. To ensure the quality of the negative samples, $S^-$ is selected from proteins of known structure.
Step S22, removing from the positive samples $S^+$ and the negative samples $S^-$ the proteins whose length is smaller than the set value (40 in this embodiment).
Step S23, removing from the positive samples $S^+$ and the negative samples $S^-$ the homologous proteins at a cut-off threshold equal to a first set value (0.35) and a coverage equal to a second set value (90%) of the sequence length; in this embodiment they are removed with CD-HIT and BLASTClust.
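The length filter of step S22 can be sketched in a few lines. This is only an illustration: the function name `filter_by_length`, the dictionary layout, and the toy identifiers are assumptions and not part of the embodiment; only the threshold of 40 residues comes from the text. The redundancy removal of step S23 is left to the external tools named above and is not reproduced here.

```python
MIN_LENGTH = 40  # set value used in this embodiment (step S22)


def filter_by_length(sequences, min_length=MIN_LENGTH):
    """Keep only sequences with at least `min_length` residues.

    `sequences` maps a (hypothetical) protein identifier to its amino
    acid string; the same filter is applied to S+ and S-.
    """
    return {pid: seq for pid, seq in sequences.items() if len(seq) >= min_length}


# Toy data: the identifiers and sequences are made up for illustration.
positive = {"P_long": "M" * 120, "P_short": "MKV"}
negative = {"N_long": "A" * 40, "N_short": "A" * 39}

positive_kept = filter_by_length(positive)
negative_kept = filter_by_length(negative)
```

A sequence of exactly 40 residues is kept, since the step removes only proteins whose length is smaller than the set value.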
Step S3, for a given protein sequence $S$, using a position-specific scoring matrix (PSSM) to represent the protein sequence, where $S = S_1 S_2 \ldots S_L$, $S_i$ ($1 \le i \le L$) denotes the amino acid at the $i$-th position of $S$, and $L$ is the length of $S$.
In this embodiment, the e-value threshold is set to 0.001 and the number of iterations is set to 10, and the corresponding position-specific scoring matrix is generated by PSI-BLAST.
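As a hedged illustration of these settings, the sketch below assembles a command line for the `psiblast` program of the BLAST+ suite. The file names and the database name are placeholders. The text says only "e-value threshold", so mapping it to `-evalue` rather than to the PSSM inclusion threshold `-inclusion_ethresh` is an assumption of this sketch. The command is built but not executed.

```python
import shlex


def build_psiblast_command(query_fasta, database, pssm_out,
                           evalue=0.001, iterations=10):
    """Return the argument list for one BLAST+ `psiblast` run.

    The file names and database name are placeholders. Mapping the
    "e-value threshold" to `-evalue` is an assumption of this sketch.
    """
    return [
        "psiblast",
        "-query", query_fasta,
        "-db", database,
        "-evalue", str(evalue),
        "-num_iterations", str(iterations),
        "-out_ascii_pssm", pssm_out,
    ]


cmd = build_psiblast_command("protein.fasta", "swissprot", "protein.pssm")
command_line = shlex.join(cmd)  # printable form, e.g. for a log file
```

On a machine where BLAST+ and a formatted database are installed, the list could be passed to `subprocess.run(cmd, check=True)`.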
Step S4, normalizing the position-specific scoring matrix (PSSM), decomposing the normalized matrix into $n$ submatrices, calculating the local PSSM features of all submatrices, and representing the protein sequence as a feature vector of a specific dimension, to obtain the improved position-specific scoring matrix $IMPSSM = \{x \mid x = \mathrm{normalization}(PSSM(i)),\ 0 < i < n+1\}$.
Specifically, step S4 in this embodiment includes:
Step S40, dividing the PSSM into $n$ submatrices, where $n \ge 1$; the first $n-1$ submatrices have $L/n$ rows and 20 columns, the last submatrix has $L-(n-1)L/n$ rows and 20 columns, and each submatrix retains the evolutionary information contained in the PSSM.
Step S41, calculating the local PSSM features of each submatrix, wherein the first $n-1$ submatrices are used to calculate 20 local features by combining the evolutionary information, and the last submatrix is calculated from the values of the first $n-1$ submatrices.
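The row split of step S40 can be sketched as follows. Several choices here are assumptions, because the text does not fix them: the normalization is taken to be a logistic function, $L/n$ is read as integer division, and the local feature of a block is taken to be its 20 column means, computed the same way for every block including the last. All function names are hypothetical.

```python
import math


def normalize(pssm):
    """Map every raw score to (0, 1) with a logistic function (assumed)."""
    return [[1.0 / (1.0 + math.exp(-v)) for v in row] for row in pssm]


def split_rows(matrix, n):
    """Split an L x 20 matrix into n row blocks as in step S40.

    The first n-1 blocks have L // n rows each; the last block takes the
    remaining L - (n-1) * (L // n) rows. Assumes L >= n.
    """
    length = len(matrix)
    size = length // n
    blocks = [matrix[k * size:(k + 1) * size] for k in range(n - 1)]
    blocks.append(matrix[(n - 1) * size:])
    return blocks


def column_means(block):
    """One local feature per amino acid column (assumed feature)."""
    return [sum(col) / len(block) for col in zip(*block)]


def local_features(pssm, n):
    """Concatenate the 20 column means of each block: 20 * n values."""
    features = []
    for block in split_rows(normalize(pssm), n):
        features.extend(column_means(block))
    return features


# Toy PSSM with L = 7 rows and 20 columns; the scores are made up.
toy_pssm = [[(i - 3) * 0.5 + j * 0.1 for j in range(20)] for i in range(7)]
blocks = split_rows(toy_pssm, 3)
features = local_features(toy_pssm, 3)
```

With $L = 7$ and $n = 3$ the blocks have 2, 2, and 3 rows, which matches $L-(n-1)L/n = 7 - 2 \cdot 2 = 3$ for the last block, and the resulting feature vector has $20n = 60$ entries.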
Step S5, inputting the improved position-specific scoring matrix (IMPSSM) into a convolutional neural network in which two convolutional layers are stacked in sequence, the output of each layer serving as the input of the next layer, and the convolutional layers using ReLU as the activation function.
Step S6, inputting the output of the convolutional neural network into the bidirectional long short-term memory (BiLSTM) network, which uses ReLU as the activation function.
Step S7, after the BiLSTM network, weighting the hidden features generated by the different memory cells with a time-distributed dense layer.
Step S8, inputting the output of the dense layer into a flatten layer, which converts the result into one-dimensional data, and inputting the one-dimensional data into the fully connected layer to obtain the output, wherein the output node uses sigmoid as the activation function.
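To make the data flow of steps S5 to S8 easier to follow, the sketch below traces the tensor shape after each layer. It is not a trained model and rests on assumptions the text leaves open: unpadded convolutions with stride 1, the max pooling layer placed after the second convolution, a time-distributed dense layer with a hypothetical width `td_units`, and a single sigmoid node after the fully connected layer. The hyperparameter values are invented for illustration.

```python
def conv1d_len(length, kernel):
    """Output length of an unpadded, stride-1 1-D convolution."""
    return length - kernel + 1


def trace_shapes(l, dim, n1, size1, n2, size2, size3, n3, td_units, n4):
    """Return (layer name, output shape) pairs for the neural branch.

    Assumptions of this sketch: no padding, stride 1, pooling placed
    after the second convolution, and one sigmoid node after the fully
    connected layer. `td_units` is a hypothetical parameter.
    """
    shapes = [("input", (l, dim))]
    steps = conv1d_len(l, size1)
    shapes.append(("conv1 + ReLU", (steps, n1)))
    steps = conv1d_len(steps, size2)
    shapes.append(("conv2 + ReLU", (steps, n2)))
    steps = steps // size3  # non-overlapping pooling windows
    shapes.append(("max pooling", (steps, n2)))
    # A bidirectional layer concatenates forward and backward states.
    shapes.append(("BiLSTM", (steps, 2 * n3)))
    shapes.append(("time-distributed dense", (steps, td_units)))
    shapes.append(("flatten", (steps * td_units,)))
    shapes.append(("fully connected", (n4,)))
    shapes.append(("sigmoid output", (1,)))
    return shapes


# Hypothetical hyperparameter values, chosen only for illustration.
shapes = trace_shapes(l=100, dim=20, n1=64, size1=5, n2=64, size2=5,
                      size3=2, n3=32, td_units=8, n4=16)
for name, shape in shapes:
    print(f"{name:24s} {shape}")
```

The trace shows why the flatten layer is needed: the time-distributed dense layer still outputs one vector per sequence position, and the fully connected layer expects a single one-dimensional vector.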
Step S9, inputting the improved position-specific scoring matrix obtained in step S4 into a random forest model, and obtaining a decision result for the protein sequence through the decision trees of the random forest.
Specifically, step S9 includes:
Step S90, sampling the improved position-specific scoring matrix obtained in step S4 with replacement to obtain a plurality of sample sets; specifically, each time, $n_2$ samples (possibly including duplicates) are drawn at random with replacement from the original $n_1$ training samples.
Step S91, randomly extracting $m$ features from the candidate features as the candidate features for the decision at the current node, selecting from them the feature used to split the training samples, and constructing a decision tree with each sample set as its training samples, wherein, once the sample set has been generated and the features determined, each single decision tree is computed with the CART algorithm without pruning.
Step S92, after the set number of decision trees has been obtained, voting on the outputs of the decision trees by the random forest method, the class with the most votes being taken as the decision of the random forest.
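The three ingredients of step S9 that surround the tree construction (sampling with replacement, random candidate features at a node, and majority voting) can be sketched with the standard library alone. Growing the unpruned CART trees themselves is not shown. The function names, the seed, and the toy values are assumptions made for illustration.

```python
import random
from collections import Counter


def bootstrap_sample(samples, n2, rng):
    """Draw n2 samples with replacement (duplicates are possible)."""
    return rng.choices(samples, k=n2)


def candidate_features(num_features, m, rng):
    """Pick m distinct feature indices to consider at one tree node."""
    return sorted(rng.sample(range(num_features), m))


def majority_vote(tree_outputs):
    """Return the class predicted by the largest number of trees."""
    return Counter(tree_outputs).most_common(1)[0][0]


rng = random.Random(0)            # fixed seed so the sketch is repeatable
training = list(range(10))        # stand-ins for n1 = 10 training samples
sample_set = bootstrap_sample(training, n2=10, rng=rng)
node_features = candidate_features(num_features=60, m=7, rng=rng)
decision = majority_vote([1, 0, 1, 1, 0])  # outputs of five toy trees
```

In the toy vote, three of the five trees output class 1, so the forest decides for class 1.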
Step S10, inputting the output of step S8 and the decision result of step S9 into a scoring layer, and computing the final prediction score according to the set weights.
Specifically, step S10 includes:
Step S100, obtaining the DNA binding protein prediction score $score_1$ from the output of step S8 and the DNA binding protein prediction score $score_2$ from the decision result of step S9, respectively.
Step S110, calculating the final prediction score according to the different weights $w_1$ and $w_2$ as follows:
$score_{DBP} = score_1 \cdot w_1 + score_2 \cdot w_2$
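The scoring layer reduces to a single weighted sum. The sketch below evaluates it for made-up inputs; the weights 0.6 and 0.4 are hypothetical, since the text does not disclose the values of $w_1$ and $w_2$.

```python
def final_score(score_1, score_2, w_1, w_2):
    """Weighted combination used by the scoring layer (step S110)."""
    return score_1 * w_1 + score_2 * w_2


# Hypothetical values: the weights are not disclosed in the text.
score_dbp = final_score(score_1=0.8, score_2=1.0, w_1=0.6, w_2=0.4)
```

If both scores lie in $[0, 1]$ and the weights are non-negative and sum to 1 (an assumption, not a stated requirement), the final score also lies in $[0, 1]$ and can be read as a confidence.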
First, the invention improves prediction accuracy by constructing positive and negative samples of DNA binding proteins. Second, it learns the spatial and temporal sequence information of DNA binding proteins through a convolutional neural network, a bidirectional long short-term memory network, and a random forest model, and it improves the PSSM, which improves the recognition performance for DNA binding proteins. Finally, by setting different weights, the outputs of the neural network and the random forest model are weighted to obtain the final prediction score, which improves prediction performance and accuracy.
Although the embodiments of the present invention have been described with reference to the accompanying drawings, the patentees may make various modifications or alterations within the scope of the appended claims, which are intended to fall within the scope of the invention as described in the claims.

Claims (9)

1. A method for identifying a DNA binding protein based on an improved protein sequence position specificity matrix comprising the steps of:
S1, initializing parameters, including setting the network input dimension $dim$ and the network sequence length $l$; setting the number and size of the filters of the first convolutional layer of the convolutional neural network to $n_1$ and $size_1$; setting the number and size of the filters of the second convolutional layer to $n_2$ and $size_2$; setting the pooling kernel size of the max pooling layer to $size_3$; setting the number of neurons of the bidirectional long short-term memory (BiLSTM) network to $n_3$; setting the number of nodes of the fully connected layer to $n_4$; denoting the final prediction score of the DNA binding protein as $score_{DBP}$, the neural network prediction result as $score_1$, and the random forest prediction result as $score_2$; and setting the weight of the neural network prediction result to $w_1$ and the weight of the random forest prediction result to $w_2$;
S2, constructing DNA binding protein sequence information;
S3, for a given protein sequence $S$, using a position-specific scoring matrix (PSSM) to represent the protein sequence, where $S = S_1 S_2 \ldots S_L$, $S_i$ ($1 \le i \le L$) denotes the amino acid at the $i$-th position of $S$, and $L$ is the length of $S$;
S4, normalizing the PSSM, decomposing the normalized matrix into $n$ submatrices, calculating the local PSSM features of all submatrices, and representing the protein sequence as a feature vector of a specific dimension, to obtain the improved PSSM;
S5, inputting the improved PSSM into a convolutional neural network in which two convolutional layers are stacked in sequence, the output of each layer serving as the input of the next layer, and the convolutional layers using ReLU as the activation function;
S6, inputting the output of the convolutional neural network into the BiLSTM network, which uses ReLU as the activation function;
S7, weighting the hidden features generated by the different memory cells with a time-distributed dense layer;
S8, inputting the output of the dense layer into a flatten layer, which converts the result into one-dimensional data, and inputting the one-dimensional data into the fully connected layer to obtain the output, wherein the output node uses sigmoid as the activation function;
S9, inputting the improved PSSM obtained in step S4 into a random forest model, and obtaining a decision result for the protein sequence through the decision trees of the random forest;
S10, inputting the output of step S8 and the decision result of step S9 into a scoring layer, and computing the final prediction score according to the set weights.
2. The method for identifying a DNA binding protein based on an improved protein sequence position specificity matrix according to claim 1, wherein said step S2 specifically comprises:
S20, obtaining proteins annotated with the gene classification term DNA-binding from the annotated protein sequence database Swiss-Prot as the positive samples $S^+$;
S21, collecting proteins whose annotations are unrelated to the gene classification term DNA-binding from the annotated protein sequence database Swiss-Prot as the negative samples $S^-$;
S22, removing from the positive samples $S^+$ and the negative samples $S^-$ the proteins whose length is smaller than a set value;
S23, removing from the positive samples $S^+$ and the negative samples $S^-$ the homologous proteins at a cut-off threshold equal to a first set value and a coverage equal to a second set value of the sequence length.
3. The method for DNA binding protein recognition based on the improved protein sequence position specificity matrix according to claim 2, wherein the negative samples $S^-$ in step S21 are selected from proteins of known structure.
4. The method for DNA binding protein recognition based on the improved protein sequence position specificity matrix according to claim 2, wherein step S23 specifically uses CD-HIT and BLASTClust to remove from the positive samples $S^+$ and the negative samples $S^-$ the homologous proteins at a cut-off threshold of 0.35 and a coverage of 90% of the sequence length.
5. The method for identifying a DNA binding protein based on the improved protein sequence position specificity matrix according to claim 1, wherein in step S3 the e-value threshold is set to 0.001 and the number of iterations is set to 10, and the corresponding position-specific scoring matrix is generated by PSI-BLAST.
6. The method for identifying a DNA binding protein based on an improved protein sequence position specificity matrix according to claim 1, wherein said step S4 specifically comprises:
S40, dividing the position-specific scoring matrix into $n$ submatrices, where $n \ge 1$; the first $n-1$ submatrices have $L/n$ rows and 20 columns, the last submatrix has $L-(n-1)L/n$ rows and 20 columns, and each submatrix retains the evolutionary information contained in the position-specific scoring matrix;
S41, calculating the local position-specific scoring matrix features of each submatrix, wherein the first $n-1$ submatrices are used to calculate 20 local features by combining the evolutionary information, and the last submatrix is calculated from the values of the first $n-1$ submatrices.
7. The method for identifying a DNA binding protein based on an improved protein sequence position specificity matrix according to claim 1, wherein said step S9 specifically comprises:
S90, sampling the improved position-specific scoring matrix obtained in step S4 with replacement to obtain a plurality of sample sets;
S91, randomly extracting $m$ features from the candidate features as the candidate features for the decision at the current node, selecting from them the feature used to split the training samples, and constructing a decision tree with each sample set as its training samples, wherein, once the sample set has been generated and the features determined, each single decision tree is computed with the CART algorithm without pruning;
S92, after the set number of decision trees has been obtained, voting on the outputs of the decision trees by the random forest method, the class with the most votes being taken as the decision of the random forest.
8. The method for identifying a DNA binding protein based on an improved protein sequence position specificity matrix according to claim 7, wherein step S90 specifically draws, each time, $n_2$ samples at random with replacement from the original $n_1$ training samples.
9. The method for identifying a DNA binding protein based on an improved protein sequence position specificity matrix according to claim 1, wherein said step S10 specifically comprises:
S100, obtaining the DNA binding protein prediction score $score_1$ from the output of step S8 and the DNA binding protein prediction score $score_2$ from the decision result of step S9, respectively;
S110, calculating the final prediction score according to the different weights $w_1$ and $w_2$ as follows:
$score_{DBP} = score_1 \cdot w_1 + score_2 \cdot w_2$
CN202210274125.8A 2022-03-20 2022-03-20 DNA binding protein recognition method based on improved protein sequence position specificity matrix Active CN114512188B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210274125.8A CN114512188B (en) 2022-03-20 2022-03-20 DNA binding protein recognition method based on improved protein sequence position specificity matrix

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210274125.8A CN114512188B (en) 2022-03-20 2022-03-20 DNA binding protein recognition method based on improved protein sequence position specificity matrix

Publications (2)

Publication Number Publication Date
CN114512188A CN114512188A (en) 2022-05-17
CN114512188B true CN114512188B (en) 2024-04-05

Family

ID=81553408

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210274125.8A Active CN114512188B (en) 2022-03-20 2022-03-20 DNA binding protein recognition method based on improved protein sequence position specificity matrix

Country Status (1)

Country Link
CN (1) CN114512188B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105808975A (en) * 2016-03-14 2016-07-27 南京理工大学 Multi-core-learning and Boosting algorithm based protein-DNA binding site prediction method
CN111785321A (en) * 2020-06-12 2020-10-16 浙江工业大学 DNA binding residue prediction method based on deep convolutional neural network
CN112489723A (en) * 2020-12-01 2021-03-12 南京理工大学 DNA binding protein prediction method based on local evolution information

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CA3100607A1 (en) * 2018-05-23 2019-11-28 Envisagenics, Inc. Systems and methods for analysis of alternative splicing

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Prediction of DNA-protein binding sites based on feature fusion; Xue Guangfu; Scientific and Technological Innovation; 2020-06-05 (No. 16); full text *

Also Published As

Publication number Publication date
CN114512188A (en) 2022-05-17

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant