CN112216345A - Protein solvent accessibility prediction method based on iterative search strategy - Google Patents

Protein solvent accessibility prediction method based on iterative search strategy

Publication number
CN112216345A
Authority
CN
China
Prior art keywords
information
sequence
protein
msa
solvent accessibility
Legal status
Granted
Application number
CN202011030157.0A
Other languages
Chinese (zh)
Other versions
CN112216345B (en)
Inventor
Hu Jun (胡俊)
Fan Xueqiang (樊学强)
Dong Shijian (董世建)
Bai Yansong (白岩松)
Zhang Guijun (张贵军)
Current Assignee
Guangzhou Zhaoji Biotechnology Co ltd
Shenzhen Xinrui Gene Technology Co ltd
Original Assignee
Zhejiang University of Technology ZJUT
Priority date
2020-09-27
Filing date
2020-09-27
Publication date
2021-01-12
Application filed by Zhejiang University of Technology ZJUT
Priority to CN202011030157.0A
Publication of CN112216345A
Application granted
Publication of CN112216345B
Legal status: Active (current)
Anticipated expiration

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B30/00 ICT specially adapted for sequence analysis involving nucleotides or amino acids
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B50/00 ICT programming tools or database systems specially adapted for bioinformatics
    • G16B50/30 Data warehousing; Computing architectures

Abstract

First, the method takes as input the protein sequence whose solvent accessibility is to be determined. It generates a multiple sequence alignment (MSA) for this sequence with the HHBlits tool and converts the MSA into a position-specific frequency matrix (PSFM); the same operation is performed on every protein sequence in the PDB database. Second, the similarity between the PSFM of the input sequence and the PSFM of each PDB protein is calculated. Then, the several PDB protein sequences most similar to the input protein are retrieved together with their structures and used as template proteins. Next, the solvent accessibility of each template protein is calculated with the DSSP tool. Finally, the solvent accessibility of the input sequence is predicted from that of the template proteins. The method has low computational cost and high prediction accuracy.

Description

Protein solvent accessibility prediction method based on iterative search strategy
Technical Field
The invention relates to the fields of bioinformatics, pattern recognition and computer applications. In particular, it relates to a protein solvent accessibility prediction method based on an iterative search strategy.
Background
Proteins perform important biological functions in all life activities, and the function of a protein is determined mainly by its structure. Predicting protein solvent accessibility is a key step in protein structure prediction. An accurate method for predicting protein solvent accessibility is therefore valuable in several areas:
- understanding protein function;
- analyzing interactions among biomolecules;
- designing new drugs.
The literature describes many methods for predicting the solvent accessibility of protein residues. Examples include SANN, which uses a k-nearest-neighbor algorithm (Joo, K.; Lee, S.J.; Lee, J. SANN: Solvent accessibility prediction of proteins by nearest neighbor method. Proteins: Structure, Function, and Bioinformatics 2012, 80, 1791), and SPIDER3 (Heffernan, R. et al. Capturing non-local interactions by long short-term memory bidirectional recurrent neural networks for improving prediction of protein secondary structure, backbone angles, contact numbers and solvent accessibility. Bioinformatics 2017, 33(18), 2842-2849). Existing methods can predict protein solvent accessibility, but they generally rely on large training sets and machine learning algorithms, which makes them computationally expensive. They also pay too little attention to noise and data imbalance in their training sets. As a result, their prediction accuracy is not guaranteed to be optimal, and their efficiency needs further improvement.
In summary, existing protein solvent accessibility prediction methods still fall well short of practical requirements in computational cost and prediction accuracy, and improvement is urgently needed.
Disclosure of Invention
Existing protein solvent accessibility prediction methods fall short in computational cost and prediction accuracy. To overcome these shortcomings, the invention provides a protein solvent accessibility prediction method based on an iterative search strategy, with low computational cost and high prediction accuracy.
The technical scheme adopted by the invention for solving the technical problems is as follows:
A method for predicting protein solvent accessibility based on an iterative search strategy, the method comprising the following steps:
1) Input the sequence of the protein whose solvent accessibility is to be predicted. Denote it S, and let L be its number of residues;
2) For the given protein sequence S, generate the corresponding multiple sequence alignment with the HHBlits tool and record it as $MSA = \{msa_1, \ldots, msa_n, \ldots, msa_N\}$. Here $msa_n$ is the n-th aligned sequence in the MSA, and N is the total number of aligned sequences. Each aligned sequence contains L elements. Each element belongs to the element set $R = \{R_1, \ldots, R_r, \ldots, R_{21}\}$, which consists of the twenty common amino acids and one gap element;
3) From the multiple sequence alignment MSA, generate the corresponding position-specific frequency matrix $PSFM = \{f(l, r)\}_{L \times 21}$ with $f(l, r) = \frac{1}{N}\sum_{n=1}^{N} \delta(msa_n(l), R_r)$. Here $msa_n(l)$ is the l-th element of $msa_n$. $\delta(msa_n(l), R_r) = 1$ when $msa_n(l)$ and $R_r$ are the same element type, and $\delta(msa_n(l), R_r) = 0$ otherwise;
4) Take any two protein sequences $S_X$ and $S_Y$, of lengths $L_X$ and $L_Y$, with multiple sequence alignments $MSA_X$ and $MSA_Y$. Compute their similarity $sim(S_X, S_Y)$ and their sequence alignment $ali$ as follows:
4.1) From $MSA_X$ and $MSA_Y$, obtain the position-specific frequency matrices of $S_X$ and $S_Y$ using step 3);
4.2) Construct the similarity matrix $XY$ of size $L_X \times L_Y$. Its entry at $(l_X, l_Y)$ compares the $l_X$-th row of the matrix of $S_X$ with the $l_Y$-th row of the matrix of $S_Y$ (the entry formula is shown only as an image in the original);
4.3) From the similarity matrix $XY$, obtain the alignment $ali$ of $S_X$ and $S_Y$ with the Needleman-Wunsch dynamic programming algorithm. Then compute $sim(S_X, S_Y)$ from this alignment (formula shown only as an image in the original). When $ali(l_X) \neq -1$, $ali(l_X)$ is the index of the residue in $S_Y$ aligned with the $l_X$-th residue of $S_X$. Otherwise, $ali(l_X) = -1$ indicates that the $l_X$-th residue of $S_X$ is aligned with a gap element. The per-residue terms for these two cases are likewise shown only as images;
5) For each protein in the PDB library, generate its multiple sequence alignment using step 2). Together these form a set of I multiple sequence alignments, where I is the total number of protein sequences in the PDB library;
6) Using step 4), compute the similarity between the MSA of the input sequence S and each element of the set generated in step 5). Take the PDB protein sequences and sequence alignments of the M most similar elements, and use them to form a new multiple sequence alignment $MSA_{new}$. Replace the original MSA of S with $MSA_{new}$ and repeat step 6). The iteration terminates when the MSA of S converges;
7) For each PDB protein contained in the MSA obtained in step 6), compute its solvent accessibility with the DSSP tool from its three-dimensional structure. The results form a solvent accessibility set. The m-th element of this set holds the per-residue solvent accessibility of the m-th template protein, the l-th value being that of its l-th residue;
8) From the solvent accessibility set obtained in step 7), predict the solvent accessibility of every residue of the input protein sequence S (formula shown only as an image in the original). For the l-th residue of S and the m-th sequence in the MSA: when $ali_m(l) \neq -1$, $ali_m(l)$ is the index of the residue in that sequence aligned with the l-th residue of S. Otherwise, $ali_m(l) = -1$ indicates that the l-th residue of S is aligned with a gap element. The terms for these two cases are likewise shown only as images.
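Steps 2) and 3) can be illustrated with a short Python sketch. This is not the patented implementation. It assumes HHBlits wrote its alignment in A3M format, in which lowercase letters are insertions relative to the query and '-' marks a gap. Deleting the lowercase letters therefore leaves every row with exactly L columns. The following are all assumptions made for illustration:
- the 21-symbol ordering `ALPHABET`;
- the helper names `read_a3m` and `psfm`;
- the 1/N normalization;
- counting nonstandard residues such as X as gaps.

```python
from typing import Dict, List

# Assumed 21-symbol alphabet: the 20 standard amino acids plus the gap '-'.
ALPHABET = "ACDEFGHIKLMNPQRSTVWY-"
INDEX: Dict[str, int] = {aa: r for r, aa in enumerate(ALPHABET)}


def read_a3m(text: str) -> List[str]:
    """Parse A3M text into MSA rows that all have the query length L.

    Lowercase letters (insertions relative to the query) and '.' are dropped.
    """
    rows: List[str] = []
    current: List[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and optional A3M comment lines
        if line.startswith(">"):
            if current:
                rows.append("".join(current))
            current = []
        else:
            current.append("".join(c for c in line if not c.islower() and c != "."))
    if current:
        rows.append("".join(current))
    return rows


def psfm(msa: List[str]) -> List[List[float]]:
    """f(l, r) = (1/N) * sum_n delta(msa_n(l), R_r), an L x 21 frequency matrix."""
    n_seqs, length = len(msa), len(msa[0])
    f = [[0.0] * len(ALPHABET) for _ in range(length)]
    for row in msa:
        for l, aa in enumerate(row):
            # Nonstandard symbols such as 'X' are counted as gaps (an assumption).
            r = INDEX.get(aa, INDEX["-"])
            f[l][r] += 1.0 / n_seqs
    return f
```

Each aligned sequence adds 1/N to exactly one column at every position, so every row of the matrix sums to 1. For a 78-residue query such as 1ibaA, `psfm(read_a3m(text))` would return a 78 x 21 matrix.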
The technical concept of the invention is as follows. First, the method takes as input the protein sequence whose solvent accessibility is to be determined. It generates a multiple sequence alignment for this sequence with the HHBlits tool and converts the alignment into a position-specific frequency matrix; the same operation is performed on every protein sequence in the PDB database. Second, the similarity between the position-specific frequency matrix of the input sequence and that of each PDB protein is calculated. Then, the several PDB protein sequences most similar to the input protein are retrieved together with their structures and used as template proteins. Next, the solvent accessibility of each template protein is calculated with the DSSP tool. Finally, the solvent accessibility of the input sequence is predicted from that of the template proteins. The invention thus provides a protein solvent accessibility prediction method based on an iterative search strategy, with low computational cost and high prediction accuracy.
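The similarity computation of step 4) can be sketched as follows. The original gives the entry formula of the similarity matrix XY and the definition of sim(S_X, S_Y) only as images, so this sketch substitutes assumptions:
- each entry of XY is the dot product of two position-specific frequency matrix (PSFM) rows;
- gaps cost a hypothetical linear penalty `gap`;
- the similarity is the sum of aligned scores divided by the length of S_X, with gap positions contributing 0.

The Needleman-Wunsch part is the standard global alignment with traceback. `ali[lx] == -1` marks a residue of S_X aligned with a gap, matching the convention in step 4.3).

```python
from typing import List, Tuple

Matrix = List[List[float]]


def similarity_matrix(fx: Matrix, fy: Matrix) -> Matrix:
    """Assumed entry score: dot product of PSFM row lx of S_X and row ly of S_Y."""
    return [[sum(a * b for a, b in zip(row_x, row_y)) for row_y in fy] for row_x in fx]


def needleman_wunsch(xy: Matrix, gap: float = -0.1) -> List[int]:
    """Global alignment over a precomputed score matrix with a linear gap penalty.

    Returns ali, where ali[lx] is the aligned index in S_Y, or -1 for a gap.
    """
    n, m = len(xy), len(xy[0])
    # dp[i][j]: best score for aligning the first i residues of S_X with the first j of S_Y.
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap
    for j in range(1, m + 1):
        dp[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(dp[i - 1][j - 1] + xy[i - 1][j - 1],  # match lx with ly
                           dp[i - 1][j] + gap,                   # residue of S_X vs gap
                           dp[i][j - 1] + gap)                   # residue of S_Y vs gap
    # Trace back from the bottom-right cell, re-deriving which move produced each score.
    ali = [-1] * n
    i, j = n, m
    while i > 0 and j > 0:
        if dp[i][j] == dp[i - 1][j - 1] + xy[i - 1][j - 1]:
            ali[i - 1] = j - 1
            i, j = i - 1, j - 1
        elif dp[i][j] == dp[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    return ali


def sequence_similarity(fx: Matrix, fy: Matrix, gap: float = -0.1) -> Tuple[float, List[int]]:
    """Assumed sim(S_X, S_Y): aligned scores summed and divided by L_X; gaps add 0."""
    xy = similarity_matrix(fx, fy)
    ali = needleman_wunsch(xy, gap)
    score = sum(xy[lx][ly] for lx, ly in enumerate(ali) if ly != -1) / len(fx)
    return score, ali
```

Scoring profile rows rather than single residues lets distantly related sequences align when their families share conservation patterns. The dot product is only one of several common profile-profile scores.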
The beneficial effects of the invention are as follows. On the one hand, a multiple sequence alignment is obtained from the protein sequence and enriched with further useful information through the iterative search strategy, which lays the groundwork for more accurate prediction of protein solvent accessibility. On the other hand, similarity and sequence alignment information are calculated from the multiple sequence alignments of the proteins, which improves the efficiency and accuracy of protein solvent accessibility prediction.
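The iterative search of step 6) might be organized as in the sketch below. Everything here is hypothetical:
- `score_fn` stands in for step 4), for example a function that builds a position-specific frequency matrix from the current MSA and aligns it against a precomputed library profile;
- `library` maps a template identifier to its sequence;
- `m` plays the role of M;
- convergence is read as "the selected template set no longer changes", with `max_iter` as a safety cap.

The rebuilt MSA keeps the query as its first row. It then projects each template through its alignment, writing '-' wherever a query residue is aligned with a gap.

```python
from typing import Callable, Dict, List, Tuple

Alignment = List[int]  # ali[l] = aligned template index, or -1 for a gap
ScoreFn = Callable[[List[str], str], Tuple[float, Alignment]]


def rebuild_msa(query: str, hits: List[Tuple[str, Alignment]]) -> List[str]:
    """Query first, then each template projected onto the query's L columns."""
    rows = [query]
    for template_seq, ali in hits:
        rows.append("".join(template_seq[j] if j != -1 else "-" for j in ali))
    return rows


def iterative_search(query: str, msa: List[str], library: Dict[str, str],
                     score_fn: ScoreFn, m: int = 3, max_iter: int = 10
                     ) -> Tuple[List[str], Dict[str, Alignment], List[str]]:
    """Re-select the top-m templates until the selection stops changing."""
    selected: List[str] = []
    alignments: Dict[str, Alignment] = {}
    for _ in range(max_iter):
        scored = []
        for tid in library:
            score, ali = score_fn(msa, tid)  # stands in for step 4)
            scored.append((score, tid, ali))
        scored.sort(key=lambda item: item[0], reverse=True)  # stable for ties
        top = scored[:m]
        new_selected = sorted(tid for _, tid, _ in top)
        alignments = {tid: ali for _, tid, ali in top}
        # Replace the query's MSA with one built from the selected templates.
        msa = rebuild_msa(query, [(library[tid], alignments[tid]) for tid in new_selected])
        if new_selected == selected:  # assumed convergence criterion
            break
        selected = new_selected
    return selected, alignments, msa
```

Passing `score_fn` in as a parameter keeps the search loop independent of how step 4) is implemented. A toy scorer can therefore exercise the loop without running HHBlits.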
Drawings
FIG. 1 is a schematic diagram of a protein solvent accessibility prediction method based on an iterative search strategy.
FIG. 2 shows the solvent accessibility prediction result file for protein 1ibaA, obtained with the protein solvent accessibility prediction method based on an iterative search strategy.
Detailed Description
The invention is further described below with reference to the accompanying drawings.
Referring to FIG. 1 and FIG. 2, a protein solvent accessibility prediction method based on an iterative search strategy comprises the following steps:
1) Input the sequence of the protein whose solvent accessibility is to be predicted. Denote it S, and let L be its number of residues;
2) For the given protein sequence S, generate the corresponding multiple sequence alignment with the HHBlits tool and record it as $MSA = \{msa_1, \ldots, msa_n, \ldots, msa_N\}$. Here $msa_n$ is the n-th aligned sequence in the MSA, and N is the total number of aligned sequences. Each aligned sequence contains L elements. Each element belongs to the element set $R = \{R_1, \ldots, R_r, \ldots, R_{21}\}$, which consists of the twenty common amino acids and one gap element;
3) From the multiple sequence alignment MSA, generate the corresponding position-specific frequency matrix $PSFM = \{f(l, r)\}_{L \times 21}$ with $f(l, r) = \frac{1}{N}\sum_{n=1}^{N} \delta(msa_n(l), R_r)$. Here $msa_n(l)$ is the l-th element of $msa_n$. $\delta(msa_n(l), R_r) = 1$ when $msa_n(l)$ and $R_r$ are the same element type, and $\delta(msa_n(l), R_r) = 0$ otherwise;
4) Take any two protein sequences $S_X$ and $S_Y$, of lengths $L_X$ and $L_Y$, with multiple sequence alignments $MSA_X$ and $MSA_Y$. Compute their similarity $sim(S_X, S_Y)$ and their sequence alignment $ali$ as follows:
4.1) From $MSA_X$ and $MSA_Y$, obtain the position-specific frequency matrices of $S_X$ and $S_Y$ using step 3);
4.2) Construct the similarity matrix $XY$ of size $L_X \times L_Y$. Its entry at $(l_X, l_Y)$ compares the $l_X$-th row of the matrix of $S_X$ with the $l_Y$-th row of the matrix of $S_Y$ (the entry formula is shown only as an image in the original);
4.3) From the similarity matrix $XY$, obtain the alignment $ali$ of $S_X$ and $S_Y$ with the Needleman-Wunsch dynamic programming algorithm. Then compute $sim(S_X, S_Y)$ from this alignment (formula shown only as an image in the original). When $ali(l_X) \neq -1$, $ali(l_X)$ is the index of the residue in $S_Y$ aligned with the $l_X$-th residue of $S_X$. Otherwise, $ali(l_X) = -1$ indicates that the $l_X$-th residue of $S_X$ is aligned with a gap element. The per-residue terms for these two cases are likewise shown only as images;
5) For each protein in the PDB library, generate its multiple sequence alignment using step 2). Together these form a set of I multiple sequence alignments, where I is the total number of protein sequences in the PDB library;
6) Using step 4), compute the similarity between the MSA of the input sequence S and each element of the set generated in step 5). Take the PDB protein sequences and sequence alignments of the M most similar elements, and use them to form a new multiple sequence alignment $MSA_{new}$. Replace the original MSA of S with $MSA_{new}$ and repeat step 6). The iteration terminates when the MSA of S converges;
7) For each PDB protein contained in the MSA obtained in step 6), compute its solvent accessibility with the DSSP tool from its three-dimensional structure. The results form a solvent accessibility set. The m-th element of this set holds the per-residue solvent accessibility of the m-th template protein, the l-th value being that of its l-th residue;
8) From the solvent accessibility set obtained in step 7), predict the solvent accessibility of every residue of the input protein sequence S (formula shown only as an image in the original). For the l-th residue of S and the m-th sequence in the MSA: when $ali_m(l) \neq -1$, $ali_m(l)$ is the index of the residue in that sequence aligned with the l-th residue of S. Otherwise, $ali_m(l) = -1$ indicates that the l-th residue of S is aligned with a gap element. The terms for these two cases are likewise shown only as images.
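Step 7) relies on DSSP. Below is a minimal sketch of a parser that reads the per-residue accessible surface area (the ACC column, in square angstroms) from a classic-format DSSP file. It assumes the standard fixed-column layout:
- residue records follow the header line that begins with "  #  RESIDUE";
- the one-letter amino acid code is in column 14 and ACC occupies columns 35-38 (1-based);
- '!' in the amino acid column marks a chain break.

The original does not say whether it uses absolute ACC or a relative value normalized by residue size, so this sketch returns the raw ACC. The function name is hypothetical.

```python
from typing import List, Tuple


def parse_dssp_acc(text: str) -> List[Tuple[str, int]]:
    """Return (amino acid, ACC) pairs from classic DSSP output, skipping chain breaks."""
    records: List[Tuple[str, int]] = []
    in_residues = False
    for line in text.splitlines():
        if line.startswith("  #  RESIDUE"):
            in_residues = True  # residue records start on the next line
            continue
        if not in_residues or len(line) < 38:
            continue
        aa = line[13]            # column 14 (1-based): one-letter amino acid code
        if aa == "!":            # chain-break marker, not a real residue
            continue
        records.append((aa, int(line[34:38])))  # columns 35-38: ACC
    return records
```

The ACC values returned here are indexed in the template's own residue order. Step 8) then maps them onto S through the alignment.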
In this embodiment, the solvent accessibility prediction of protein 1ibaA is taken as an example. The protein solvent accessibility prediction method based on an iterative search strategy comprises the following steps:
1) Input the sequence of the protein whose solvent accessibility is to be predicted. Denote it S, and let L be its number of residues (here L = 78);
2) For the given protein sequence S, generate the corresponding multiple sequence alignment with the HHBlits tool and record it as $MSA = \{msa_1, \ldots, msa_n, \ldots, msa_N\}$. Here $msa_n$ is the n-th aligned sequence in the MSA, and N is the total number of aligned sequences. Each aligned sequence contains L elements. Each element belongs to the element set $R = \{R_1, \ldots, R_r, \ldots, R_{21}\}$, which consists of the twenty common amino acids and one gap element;
3) From the multiple sequence alignment MSA, generate the corresponding position-specific frequency matrix $PSFM = \{f(l, r)\}_{L \times 21}$ with $f(l, r) = \frac{1}{N}\sum_{n=1}^{N} \delta(msa_n(l), R_r)$. Here $msa_n(l)$ is the l-th element of $msa_n$. $\delta(msa_n(l), R_r) = 1$ when $msa_n(l)$ and $R_r$ are the same element type, and $\delta(msa_n(l), R_r) = 0$ otherwise;
4) Take any two protein sequences $S_X$ and $S_Y$, of lengths $L_X$ and $L_Y$, with multiple sequence alignments $MSA_X$ and $MSA_Y$. Compute their similarity $sim(S_X, S_Y)$ and their sequence alignment $ali$ as follows:
4.1) From $MSA_X$ and $MSA_Y$, obtain the position-specific frequency matrices of $S_X$ and $S_Y$ using step 3);
4.2) Construct the similarity matrix $XY$ of size $L_X \times L_Y$. Its entry at $(l_X, l_Y)$ compares the $l_X$-th row of the matrix of $S_X$ with the $l_Y$-th row of the matrix of $S_Y$ (the entry formula is shown only as an image in the original);
4.3) From the similarity matrix $XY$, obtain the alignment $ali$ of $S_X$ and $S_Y$ with the Needleman-Wunsch dynamic programming algorithm. Then compute $sim(S_X, S_Y)$ from this alignment (formula shown only as an image in the original). When $ali(l_X) \neq -1$, $ali(l_X)$ is the index of the residue in $S_Y$ aligned with the $l_X$-th residue of $S_X$. Otherwise, $ali(l_X) = -1$ indicates that the $l_X$-th residue of $S_X$ is aligned with a gap element. The per-residue terms for these two cases are likewise shown only as images;
5) For each protein in the PDB library, generate its multiple sequence alignment using step 2). Together these form a set of I multiple sequence alignments, where I is the total number of protein sequences in the PDB library;
6) Using step 4), compute the similarity between the MSA of the input sequence S and each element of the set generated in step 5). Take the PDB protein sequences and sequence alignments of the M most similar elements, and use them to form a new multiple sequence alignment $MSA_{new}$. Replace the original MSA of S with $MSA_{new}$ and repeat step 6). The iteration terminates when the MSA of S converges;
7) For each PDB protein contained in the MSA obtained in step 6), compute its solvent accessibility with the DSSP tool from its three-dimensional structure. The results form a solvent accessibility set. The m-th element of this set holds the per-residue solvent accessibility of the m-th template protein, the l-th value being that of its l-th residue;
8) From the solvent accessibility set obtained in step 7), predict the solvent accessibility of every residue of the input protein sequence S (formula shown only as an image in the original). For the l-th residue of S and the m-th sequence in the MSA: when $ali_m(l) \neq -1$, $ali_m(l)$ is the index of the residue in that sequence aligned with the l-th residue of S. Otherwise, $ali_m(l) = -1$ indicates that the l-th residue of S is aligned with a gap element. The terms for these two cases are likewise shown only as images.
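Step 8) combines the template values residue by residue. The original gives the prediction formula only as an image, so this sketch assumes the simplest reading consistent with the text:
- the prediction for residue l of S is the mean accessibility of the template residues aligned with it;
- templates aligned with a gap at that position are skipped;
- a residue with no aligned template gets `None`.

The argument names and the averaging rule are assumptions, not the patent's definition.

```python
from typing import List, Optional


def predict_accessibility(templates_acc: List[List[float]],
                          alignments: List[List[int]]) -> List[Optional[float]]:
    """Average template accessibility at aligned positions, ignoring gaps.

    templates_acc[m][j]: accessibility of residue j in template m (e.g. from DSSP).
    alignments[m][l]: index in template m aligned with residue l of S, or -1 for a gap.
    """
    length = len(alignments[0])
    prediction: List[Optional[float]] = []
    for l in range(length):
        values = [acc[ali[l]] for acc, ali in zip(templates_acc, alignments) if ali[l] != -1]
        prediction.append(sum(values) / len(values) if values else None)
    return prediction
```

For the 78-residue protein 1ibaA, each row of `alignments` would hold 78 entries, one per residue of S.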
Taking protein 1ibaA as an example, FIG. 2 shows the solvent accessibility result file obtained for this protein by the above method.
The above describes the solvent accessibility prediction for protein 1ibaA according to the invention and does not limit the scope of the invention. Various modifications and improvements may be made without departing from the basic content of the invention, and they are not excluded from its scope.

Claims (1)

1. A method for predicting the solvent accessibility of a protein based on an iterative search strategy, the method comprising the steps of:
1) Input the sequence of the protein whose solvent accessibility is to be predicted. Denote it S, and let L be its number of residues;
2) For the given protein sequence S, generate the corresponding multiple sequence alignment with the HHBlits tool and record it as $MSA = \{msa_1, \ldots, msa_n, \ldots, msa_N\}$. Here $msa_n$ is the n-th aligned sequence in the MSA, and N is the total number of aligned sequences. Each aligned sequence contains L elements. Each element belongs to the element set $R = \{R_1, \ldots, R_r, \ldots, R_{21}\}$, which consists of the twenty common amino acids and one gap element;
3) From the multiple sequence alignment MSA, generate the corresponding position-specific frequency matrix $PSFM = \{f(l, r)\}_{L \times 21}$ with $f(l, r) = \frac{1}{N}\sum_{n=1}^{N} \delta(msa_n(l), R_r)$. Here $msa_n(l)$ is the l-th element of $msa_n$. $\delta(msa_n(l), R_r) = 1$ when $msa_n(l)$ and $R_r$ are the same element type, and $\delta(msa_n(l), R_r) = 0$ otherwise;
4) Take any two protein sequences $S_X$ and $S_Y$, of lengths $L_X$ and $L_Y$, with multiple sequence alignments $MSA_X$ and $MSA_Y$. Compute their similarity $sim(S_X, S_Y)$ and their sequence alignment $ali$ as follows:
4.1) From $MSA_X$ and $MSA_Y$, obtain the position-specific frequency matrices of $S_X$ and $S_Y$ using step 3);
4.2) Construct the similarity matrix $XY$ of size $L_X \times L_Y$. Its entry at $(l_X, l_Y)$ compares the $l_X$-th row of the matrix of $S_X$ with the $l_Y$-th row of the matrix of $S_Y$ (the entry formula is shown only as an image in the original);
4.3) From the similarity matrix $XY$, obtain the alignment $ali$ of $S_X$ and $S_Y$ with the Needleman-Wunsch dynamic programming algorithm. Then compute $sim(S_X, S_Y)$ from this alignment (formula shown only as an image in the original). When $ali(l_X) \neq -1$, $ali(l_X)$ is the index of the residue in $S_Y$ aligned with the $l_X$-th residue of $S_X$. Otherwise, $ali(l_X) = -1$ indicates that the $l_X$-th residue of $S_X$ is aligned with a gap element. The per-residue terms for these two cases are likewise shown only as images;
5) For each protein in the PDB library, generate its multiple sequence alignment using step 2). Together these form a set of I multiple sequence alignments, where I is the total number of protein sequences in the PDB library;
6) Using step 4), compute the similarity between the MSA of the input sequence S and each element of the set generated in step 5). Take the PDB protein sequences and sequence alignments of the M most similar elements, and use them to form a new multiple sequence alignment $MSA_{new}$. Replace the original MSA of S with $MSA_{new}$ and repeat step 6). The iteration terminates when the MSA of S converges;
7) For each PDB protein contained in the MSA obtained in step 6), compute its solvent accessibility with the DSSP tool from its three-dimensional structure. The results form a solvent accessibility set. The m-th element of this set holds the per-residue solvent accessibility of the m-th template protein, the l-th value being that of its l-th residue;
8) From the solvent accessibility set obtained in step 7), predict the solvent accessibility of every residue of the input protein sequence S (formula shown only as an image in the original). For the l-th residue of S and the m-th sequence in the MSA: when $ali_m(l) \neq -1$, $ali_m(l)$ is the index of the residue in that sequence aligned with the l-th residue of S. Otherwise, $ali_m(l) = -1$ indicates that the l-th residue of S is aligned with a gap element. The terms for these two cases are likewise shown only as images.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011030157.0A CN112216345B (en) 2020-09-27 2020-09-27 Protein solvent accessibility prediction method based on iterative search strategy

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011030157.0A CN112216345B (en) 2020-09-27 2020-09-27 Protein solvent accessibility prediction method based on iterative search strategy

Publications (2)

Publication Number Publication Date
CN112216345A CN112216345A (en) 2021-01-12
CN112216345B CN112216345B (en) 2021-12-17

Family

ID=74050806

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011030157.0A Active CN112216345B (en) 2020-09-27 2020-09-27 Protein solvent accessibility prediction method based on iterative search strategy

Country Status (1)

Country Link
CN (1) CN112216345B (en)


Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200286581A1 (en) * 2017-03-23 2020-09-10 Rutgers, The State University Of New Jersey Systems and methods for modeling a protein parameter for understanding protein interactions and generating an energy map
WO2020076772A1 (en) * 2018-10-08 2020-04-16 Freenome Holdings, Inc. Transcription factor profiling
CN110689920A (en) * 2019-09-18 2020-01-14 上海交通大学 Protein-ligand binding site prediction algorithm based on deep learning
CN111554346A (en) * 2020-04-29 2020-08-18 上海交通大学 Protein sequence design implementation method based on multi-objective optimization

Non-Patent Citations (2)

Title
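ZHONG LI: "Protein Contact Map Prediction Based on ResNet and DenseNet", BIOMED RESEARCH INTERNATIONAL *
ZHANG HAICANG (张海仓): "Review of research on prediction algorithms for long-range residue interactions in proteins" (蛋白质中残基远程相互作用预测算法研究综述), JOURNAL OF COMPUTER RESEARCH AND DEVELOPMENT (计算机研究与发展) *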

Cited By (2)

Publication number Priority date Publication date Assignee Title
CN113361752A (en) * 2021-05-21 2021-09-07 浙江工业大学 Protein solvent accessibility prediction method based on multi-view learning
CN113361752B (en) * 2021-05-21 2022-07-26 浙江工业大学 Protein solvent accessibility prediction method based on multi-view learning

Also Published As

Publication number Publication date
CN112216345B (en) 2021-12-17

Similar Documents

Publication Publication Date Title
Zhao et al. HyperAttentionDTI: improving drug–protein interaction prediction by sequence-based deep learning with attention mechanism
Climer et al. Rearrangement Clustering: Pitfalls, Remedies, and Applications.
CN110289050B (en) Drug-target interaction prediction method based on graph convolution sum and word vector
CN112149881B (en) DNA binding residue prediction method based on convolutional neural network
Olson et al. Evolutionary-inspired probabilistic search for enhancing sampling of local minima in the protein energy surface
EP2504776A1 (en) Density based clustering for multidimensional data
CN110176272B (en) Protein disulfide bond prediction method based on multi-sequence association information
CN111667880A (en) Protein residue contact map prediction method based on depth residual error neural network
CN111081312B (en) Ligand binding residue prediction method based on multi-sequence association information
CN112216345B (en) Protein solvent accessibility prediction method based on iterative search strategy
Wang et al. Finding patterns in three-dimensional graphs: Algorithms and applications to scientific data mining
CN110163243B (en) Protein structure domain dividing method based on contact graph and fuzzy C-means clustering
CN109346125B (en) Rapid and accurate protein binding pocket structure alignment method
CN112149885B (en) Ligand binding residue prediction method based on sequence template
CN112365921B (en) Protein secondary structure prediction method based on long-time and short-time memory network
Unsleber Accelerating Reaction Network Explorations with Automated Reaction Template Extraction and Application
CN113361752B (en) Protein solvent accessibility prediction method based on multi-view learning
CN104376120A (en) Information retrieval method and system
CN112837740B (en) DNA binding residue prediction method based on structural characteristics
Olson et al. Enhancing sampling of the conformational space near the protein native state
Geethu et al. Improved 3-D protein structure predictions using deep ResNet model
CN112820355A (en) Molecular virtual screening method based on protein sequence comparison
Wang et al. Explore the hidden treasure in protein–protein interaction networks—An iterative model for predicting protein functions
Leinweber et al. CavSimBase: a database for large scale comparison of protein binding sites
Albrecht et al. Single-cell specific and interpretable machine learning models for sparse scChIP-seq data imputation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20221111

Address after: N2248, Floor 3, Xingguang Yingjing, No. 117, Shuiyin Road, Yuexiu District, Guangzhou, Guangdong 510000

Patentee after: GUANGZHOU ZHAOJI BIOTECHNOLOGY CO.,LTD.

Address before: No. 18, Chaowang Road, Zhaohui Sixth District, Hangzhou City, Zhejiang Province 310014

Patentee before: ZHEJIANG UNIVERSITY OF TECHNOLOGY

Effective date of registration: 20221111

Address after: D1101, Building 4, Software Industry Base, No. 19, 17, 18, Haitian 1st Road, Binhai Community, Yuehai Street, Nanshan District, Shenzhen, Guangdong, 518000

Patentee after: Shenzhen Xinrui Gene Technology Co.,Ltd.

Address before: N2248, Floor 3, Xingguang Yingjing, No. 117, Shuiyin Road, Yuexiu District, Guangzhou, Guangdong 510000

Patentee before: GUANGZHOU ZHAOJI BIOTECHNOLOGY CO.,LTD.

TR01 Transfer of patent right