CN113361752A - Protein solvent accessibility prediction method based on multi-view learning - Google Patents

Protein solvent accessibility prediction method based on multi-view learning

Info

Publication number
CN113361752A
Authority
CN
China
Prior art keywords
protein
information
sequence
pipeline
solvent accessibility
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110558859.4A
Other languages
Chinese (zh)
Other versions
CN113361752B (en)
Inventor
胡俊
樊学强
白岩松
郑琳琳
张贵军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Zhaoji Biotechnology Co ltd
Shenzhen Xinrui Gene Technology Co ltd
Original Assignee
Zhejiang University of Technology ZJUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2021-05-21
Filing date
2021-05-21
Publication date
2021-09-07
Application filed by Zhejiang University of Technology ZJUT
Priority to CN202110558859.4A
Publication of CN113361752A
Application granted
Publication of CN113361752B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B15/00 ICT specially adapted for analysing two-dimensional or three-dimensional molecular structures, e.g. structural or functional relations or structure alignment
    • G16B15/20 Protein or domain folding

Abstract

A protein solvent accessibility prediction method based on multi-view learning. For an input protein sequence whose solvent accessibility is to be predicted, the HHblits tool is used to generate the corresponding multiple sequence alignment (MSA), from which a position-specific frequency matrix (PSFM) is generated; the PSI-BLAST tool is used to generate the corresponding position-specific scoring matrix (PSSM), and the PSIPRED tool is used to generate the corresponding secondary structure information. A multi-view learning neural network framework is built; all proteins with annotated tertiary structure information are collected from the PDB, their feature information is generated and combined with the corresponding labels into a data set, and the framework is used to learn a prediction model on that data set. The feature information of the protein to be predicted is then input into the model to obtain the predicted solvent accessibility. The method has low computational cost and high prediction accuracy.

Description

Protein solvent accessibility prediction method based on multi-view learning
Technical Field
The invention relates to the fields of bioinformatics, pattern recognition and computer application, in particular to a protein solvent accessibility prediction method based on multi-view learning.
Background
Proteins play an important role in every life activity, and the biological function of a protein is mainly determined by its structure. Predicting the solvent accessibility of a protein is a key step in protein structure prediction. Accurate prediction of protein solvent accessibility is therefore of guiding significance for understanding protein function, analyzing the interactions among biomolecules, and designing new drugs.
A survey of the literature shows that many methods for predicting the solvent accessibility of protein residues have been proposed, such as SANN (Joo, K.; Lee, S.J.; Lee, J. Sann: Solvent accessibility prediction of proteins by nearest neighbor method. Proteins: Structure, Function, and Bioinformatics, 2012, 80, 1791) and SPIDER3 (Heffernan, R. et al. (2017) Capturing non-local interactions by long short-term memory bidirectional recurrent neural networks for improving prediction of protein secondary structure, backbone angles, contact numbers and solvent accessibility. Bioinformatics, 33(18), 2842-2849). Although these methods can predict protein solvent accessibility, they generally rely on large training data sets and machine learning algorithms, so their computational cost is high. In addition, they pay too little attention to noise and data imbalance in the training sets, so optimal prediction accuracy cannot be guaranteed and prediction efficiency needs further improvement.
In summary, existing methods for predicting protein solvent accessibility still fall well short of practical requirements in both computational cost and prediction accuracy, and improvement is urgently needed.
Disclosure of Invention
To overcome the high computational cost and low prediction accuracy of existing protein solvent accessibility prediction methods, the invention provides a protein solvent accessibility prediction method based on multi-view learning that has low computational cost and high prediction accuracy.
The technical scheme adopted by the invention for solving the technical problems is as follows:
a multi-view based prediction method of protein solvent accessibility, the method comprising the steps of:
1) input the sequence of a protein with $L$ residues whose solvent accessibility is to be predicted, denoted $S$;
2) denote any given protein sequence with $L_X$ residues as $S_X$;
3) for the protein sequence $S_X$, generate the corresponding multiple sequence alignment (MSA) with the HHblits tool, denoted
Figure BDA0003078157460000021
where
Figure BDA0003078157460000022
is the $n$-th aligned sequence in the MSA, $N$ is the total number of aligned sequences in the MSA, each aligned sequence contains $L_X$ elements, and each element belongs to the set $R = \{R_1, \ldots, R_r, \ldots, R_{21}\}$, which consists of the twenty common amino acids and one gap element;
4) for the given MSA, generate the corresponding position-specific frequency matrix (PSFM), denoted
Figure BDA0003078157460000023
where
Figure BDA0003078157460000024
denotes the $l$-th element of
Figure BDA0003078157460000025
and, when
Figure BDA0003078157460000026
and $R_r$ are the same element type,
Figure BDA0003078157460000027
and otherwise
Figure BDA0003078157460000028
5) for the given protein sequence $S_X$, generate the corresponding position-specific scoring matrix with the PSI-BLAST tool, denoted PSSM;
6) for the given protein sequence $S_X$, generate the corresponding secondary structure information with the PSIPRED tool, denoted PSS;
7) collect all proteins with annotated tertiary structure information from the PDB, then generate the corresponding solvent accessibility labels from the tertiary structure information with the DSSP tool, giving the data set $\mathrm{Dataset} = \{S_i, Y_i\}$, where $S_i$ denotes the $i$-th protein in Dataset, $Y_i$ denotes the label information corresponding to $S_i$, $i = 1, 2, \ldots, N$, and $N$ is the total number of protein sequences in Dataset;
8) build a deep multi-view feature learning neural network framework consisting of 4 pipelines, denoted I, II, III and IV;
9) pipelines I and II each consist of a two-layer bidirectional long short-term memory recurrent neural network (BiLSTM), three linear layers (FC) and a two-layer SENet attention module; they extract the evolutionary information in the position-specific frequency matrix and the position-specific scoring matrix respectively, and their outputs are denoted the first output and the second output;
10) pipeline III consists of a two-layer BiLSTM, three linear layers (FC) and a two-layer SENet attention module; it extracts the secondary structure information, and its output is denoted the third output;
11) pipeline IV consists of three linear layers (FC) and a two-layer SENet attention module, and its output is denoted the fourth output;
12) following steps 3) to 6), generate the feature information of every $S_i$ in Dataset, denoted respectively
Figure BDA0003078157460000031
Figure BDA0003078157460000032
where $i = 1, 2, \ldots, N$ and $N$ is the total number of protein sequences; together with the corresponding labels $Y_i$, these form the sample set
Figure BDA0003078157460000033
13) using the deep multi-view feature learning neural network framework built in steps 8) to 11), learn a prediction model on this sample set, and denote the model DMVFL;
14) during the training of DMVFL, use the mean squared error function to compute the loss between each of the four outputs of steps 9) to 11) and the label, denoted
Figure BDA0003078157460000034
where $T = 4$, $y$ is the label, and $y_t$ is the $t$-th predicted value of solvent accessibility;
15) generate the feature information of the protein $S$ to be predicted through steps 3) to 6), and input it into the trained model DMVFL to obtain the solvent accessibility of protein $S$.
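Step 4) can be illustrated with a short sketch. The formula images are not reproduced in this text, so the following Python is only one plausible reading: it assumes each entry of the position-specific frequency matrix is the fraction of the N aligned sequences that carry element type R_r at position l. The function name, the alphabet ordering, the "-" gap symbol and the toy alignment are hypothetical and are not taken from the patent.

```python
from collections import Counter

# Alphabet R of step 3): twenty common amino acids plus one gap element.
# The ordering and the "-" gap symbol are assumptions made for this sketch.
ALPHABET = "ACDEFGHIKLMNPQRSTVWY-"


def position_specific_frequency_matrix(msa):
    """Return an L_X x 21 matrix of per-column element frequencies."""
    if not msa:
        raise ValueError("the MSA must contain at least one sequence")
    length = len(msa[0])
    if any(len(seq) != length for seq in msa):
        raise ValueError("all aligned sequences must have the same length")

    n_seqs = len(msa)
    psfm = []
    for pos in range(length):
        # Count how often each element type occurs at this position.
        counts = Counter(seq[pos] for seq in msa)
        # Characters outside ALPHABET (for example "X") are ignored here.
        psfm.append([counts.get(r, 0) / n_seqs for r in ALPHABET])
    return psfm


# Toy alignment with N = 4 sequences and L_X = 3 columns (made-up data).
toy_msa = ["ACD", "AC-", "GCD", "ACE"]
toy_psfm = position_specific_frequency_matrix(toy_msa)
print(toy_psfm[0][ALPHABET.index("A")])
```

In the toy alignment the first column holds A in three of the four sequences, so the printed entry is 0.75. A real alignment would be parsed from HHblits output rather than written by hand.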
The technical conception of the invention is as follows. First, for the input sequence of the protein whose solvent accessibility is to be predicted, the HHblits tool is used to generate the corresponding multiple sequence alignment, from which the corresponding position-specific frequency matrix is generated. Second, for the same input sequence, the PSI-BLAST tool is used to generate the corresponding position-specific scoring matrix, and the PSIPRED tool is used to generate the corresponding secondary structure information. Third, a multi-view learning neural network framework is built; all proteins with annotated tertiary structure information are collected from the PDB, the labels of the protein sequences are computed from the tertiary structure information with the DSSP tool, the feature information of the proteins is generated and combined with the corresponding labels into a data set, and the framework is used to learn a prediction model on that data set. Finally, the feature information of the protein to be predicted is input into the model to obtain the predicted solvent accessibility. The invention thus provides a protein solvent accessibility prediction method based on multi-view learning with low computational cost and high prediction accuracy.
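Each pipeline of the framework ends in an SENet attention module. The patent does not spell out how that module is applied to per-residue features, so the sketch below shows only a generic squeeze-and-excitation block over an L x C feature map, under the assumption that the squeeze step averages over sequence positions. The function name, the weight layout and the toy values are hypothetical.

```python
import math


def se_block(features, w1, b1, w2, b2):
    """Apply a squeeze-and-excitation gate to an L x C feature map.

    features: list of L rows, each a list of C channel values.
    w1, b1:   weights (H x C) and biases (H) of the first linear layer.
    w2, b2:   weights (C x H) and biases (C) of the second linear layer.
    """
    n_pos = len(features)
    n_ch = len(features[0])

    # Squeeze: average each channel over all sequence positions.
    squeezed = [sum(row[c] for row in features) / n_pos for c in range(n_ch)]

    # Excitation, layer 1: linear map to H hidden units followed by ReLU.
    hidden = [
        max(0.0, sum(w * z for w, z in zip(w_row, squeezed)) + b)
        for w_row, b in zip(w1, b1)
    ]
    # Excitation, layer 2: linear map back to C channels followed by sigmoid.
    gates = [
        1.0 / (1.0 + math.exp(-(sum(w * h for w, h in zip(w_row, hidden)) + b)))
        for w_row, b in zip(w2, b2)
    ]

    # Scale: reweight every channel by its gate, which lies in (0, 1).
    return [[x * g for x, g in zip(row, gates)] for row in features]


# Toy input: L = 2 positions, C = 2 channels, H = 1 hidden unit (made-up data).
toy_features = [[1.0, 2.0], [3.0, 4.0]]
zero_w1, zero_b1 = [[0.0, 0.0]], [0.0]
zero_w2, zero_b2 = [[0.0], [0.0]], [0.0, 0.0]
toy_scaled = se_block(toy_features, zero_w1, zero_b1, zero_w2, zero_b2)
print(toy_scaled)
```

With all weights and biases set to zero every gate equals sigmoid(0) = 0.5, so the toy output is the input halved. Trained weights would instead let the block emphasise some feature channels over others.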
The beneficial effects of the invention are as follows: on the one hand, multiple sequence alignment information is obtained from the protein sequence and a multi-view learning strategy extracts more useful information from it, laying the groundwork for further improving the prediction accuracy of protein solvent accessibility; on the other hand, more effective information is mined from several kinds of information derived from the protein sequence, improving both the efficiency and the accuracy of protein solvent accessibility prediction.
Drawings
FIG. 1 is a schematic diagram of a protein solvent accessibility prediction method based on multi-view learning.
Detailed Description
The invention is further described below with reference to the accompanying drawings.
Referring to fig. 1, a protein solvent accessibility prediction method based on multi-view learning includes the following steps:
1) input the sequence of a protein with $L$ residues whose solvent accessibility is to be predicted, denoted $S$;
2) denote any given protein sequence with $L_X$ residues as $S_X$;
3) for the protein sequence $S_X$, generate the corresponding multiple sequence alignment (MSA) with the HHblits tool, denoted
Figure BDA0003078157460000041
where
Figure BDA0003078157460000042
is the $n$-th aligned sequence in the MSA, $N$ is the total number of aligned sequences in the MSA, each aligned sequence contains $L_X$ elements, and each element belongs to the set $R = \{R_1, \ldots, R_r, \ldots, R_{21}\}$, which consists of the twenty common amino acids and one gap element;
4) for the given MSA, generate the corresponding position-specific frequency matrix (PSFM), denoted
Figure BDA0003078157460000043
where
Figure BDA0003078157460000044
denotes the $l$-th element of
Figure BDA0003078157460000045
and, when
Figure BDA0003078157460000046
and $R_r$ are the same element type,
Figure BDA0003078157460000047
and otherwise
Figure BDA0003078157460000048
5) for the given protein sequence $S_X$, generate the corresponding position-specific scoring matrix with the PSI-BLAST tool, denoted PSSM;
6) for the given protein sequence $S_X$, generate the corresponding secondary structure information with the PSIPRED tool, denoted PSS;
7) collect all proteins with annotated tertiary structure information from the PDB, then generate the corresponding solvent accessibility labels from the tertiary structure information with the DSSP tool, giving the data set $\mathrm{Dataset} = \{S_i, Y_i\}$, where $S_i$ denotes the $i$-th protein in Dataset, $Y_i$ denotes the label information corresponding to $S_i$, $i = 1, 2, \ldots, N$, and $N$ is the total number of protein sequences in Dataset;
8) build a deep multi-view feature learning neural network framework consisting of 4 pipelines, denoted I, II, III and IV;
9) pipelines I and II each consist of a two-layer bidirectional long short-term memory recurrent neural network (BiLSTM), three linear layers (FC) and a two-layer SENet attention module; they extract the evolutionary information in the position-specific frequency matrix and the position-specific scoring matrix respectively, and their outputs are denoted the first output and the second output;
10) pipeline III consists of a two-layer BiLSTM, three linear layers (FC) and a two-layer SENet attention module; it extracts the secondary structure information, and its output is denoted the third output;
11) pipeline IV consists of three linear layers (FC) and a two-layer SENet attention module, and its output is denoted the fourth output;
12) following steps 3) to 6), generate the feature information of every $S_i$ in Dataset, denoted respectively
Figure BDA0003078157460000051
Figure BDA0003078157460000052
where $i = 1, 2, \ldots, N$ and $N$ is the total number of protein sequences; together with the corresponding labels $Y_i$, these form the sample set
Figure BDA0003078157460000053
13) using the deep multi-view feature learning neural network framework built in steps 8) to 11), learn a prediction model on this sample set, and denote the model DMVFL;
14) during the training of DMVFL, use the mean squared error function to compute the loss between each of the four outputs of steps 9) to 11) and the label, denoted
Figure BDA0003078157460000054
where $T = 4$, $y$ is the label, and $y_t$ is the $t$-th predicted value of solvent accessibility;
15) generate the feature information of the protein $S$ to be predicted through steps 3) to 6), and input it into the trained model DMVFL to obtain the solvent accessibility of protein $S$.
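The training loss of step 14) can be sketched in a few lines of Python. The loss formula itself is an image that is not reproduced here, so the way the T = 4 per-pipeline losses are combined is an assumption: the sketch simply averages them. The function names and the toy numbers are hypothetical.

```python
def mean_squared_error(predicted, label):
    """Mean squared error between one pipeline output and the label."""
    if len(predicted) != len(label):
        raise ValueError("prediction and label must have the same length")
    return sum((p - y) ** 2 for p, y in zip(predicted, label)) / len(label)


def multi_view_loss(outputs, label):
    """Return (combined loss, per-pipeline losses) for T pipeline outputs.

    Averaging over the T losses is an assumption of this sketch; the patent
    text only states that each output is compared with the label by MSE.
    """
    per_view = [mean_squared_error(out, label) for out in outputs]
    return sum(per_view) / len(per_view), per_view


# Toy data: a 2-residue protein and T = 4 pipeline outputs (made-up values).
toy_label = [0.0, 1.0]
toy_outputs = [[0.0, 1.0], [1.0, 1.0], [0.0, 0.0], [1.0, 0.0]]
total_loss, per_view_losses = multi_view_loss(toy_outputs, toy_label)
print(total_loss, per_view_losses)
```

Keeping the per-pipeline losses alongside the combined value makes it easy to see which view (PSFM, PSSM, secondary structure or the fourth pipeline) is contributing most of the error during training.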
In this embodiment, the solvent accessibility prediction of protein 1ibaA is taken as an example, and the protein solvent accessibility prediction method based on multi-view learning comprises the following steps:
1) input the sequence of a protein with 76 residues whose solvent accessibility is to be predicted, denoted $S$;
2) denote any given protein sequence with $L_X$ residues as $S_X$;
3) for the protein sequence $S_X$, generate the corresponding multiple sequence alignment (MSA) with the HHblits tool, denoted
Figure BDA0003078157460000055
where
Figure BDA0003078157460000056
is the $n$-th aligned sequence in the MSA, $N$ is the total number of aligned sequences in the MSA, each aligned sequence contains $L_X$ elements, and each element belongs to the set $R = \{R_1, \ldots, R_r, \ldots, R_{21}\}$, which consists of the twenty common amino acids and one gap element;
4) for the given MSA, generate the corresponding position-specific frequency matrix (PSFM), denoted
Figure BDA0003078157460000057
where
Figure BDA0003078157460000058
denotes the $l$-th element of
Figure BDA0003078157460000059
and, when
Figure BDA0003078157460000061
and $R_r$ are the same element type,
Figure BDA0003078157460000062
and otherwise
Figure BDA0003078157460000067
5) for the given protein sequence $S_X$, generate the corresponding position-specific scoring matrix with the PSI-BLAST tool, denoted PSSM;
6) for the given protein sequence $S_X$, generate the corresponding secondary structure information with the PSIPRED tool, denoted PSS;
7) collect all proteins with annotated tertiary structure information from the PDB, then generate the corresponding solvent accessibility labels from the tertiary structure information with the DSSP tool, giving the data set $\mathrm{Dataset} = \{S_i, Y_i\}$, where $S_i$ denotes the $i$-th protein in Dataset, $Y_i$ denotes the label information corresponding to $S_i$, $i = 1, 2, \ldots, N$, and $N$ is the total number of protein sequences in Dataset;
8) build a deep multi-view feature learning neural network framework consisting of 4 pipelines, denoted I, II, III and IV;
9) pipelines I and II each consist of a two-layer bidirectional long short-term memory recurrent neural network (BiLSTM), three linear layers (FC) and a two-layer SENet attention module; they extract the evolutionary information in the position-specific frequency matrix and the position-specific scoring matrix respectively, and their outputs are denoted the first output and the second output;
10) pipeline III consists of a two-layer BiLSTM, three linear layers (FC) and a two-layer SENet attention module; it extracts the secondary structure information, and its output is denoted the third output;
11) pipeline IV consists of three linear layers (FC) and a two-layer SENet attention module, and its output is denoted the fourth output;
12) following steps 3) to 6), generate the feature information of every $S_i$ in Dataset, denoted respectively
Figure BDA0003078157460000063
Figure BDA0003078157460000064
where $i = 1, 2, \ldots, N$ and $N$ is the total number of protein sequences; together with the corresponding labels $Y_i$, these form the sample set
Figure BDA0003078157460000065
13) using the deep multi-view feature learning neural network framework built in steps 8) to 11), learn a prediction model on this sample set, and denote the model DMVFL;
14) during the training of DMVFL, use the mean squared error function to compute the loss between each of the four outputs of steps 9) to 11) and the label, denoted
Figure BDA0003078157460000066
where $T = 4$, $y$ is the label, and $y_t$ is the $t$-th predicted value of solvent accessibility;
15) generate the feature information of the protein $S$ to be predicted through steps 3) to 6), and input it into the trained model DMVFL to obtain the solvent accessibility of protein $S$.
Using the solvent accessibility prediction for protein 1ibaA as an example, the solvent accessibility for protein 1ibaA was obtained using the above method.
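For a 76-residue protein such as the one in this embodiment, the secondary structure view fed to pipeline III has one row per residue. The patent does not state how the PSIPRED output is encoded, so the sketch below assumes a one-hot encoding of the three-state string (H, E, C). The function name is hypothetical, and the 76-character string is a placeholder, not the real PSIPRED prediction for 1ibaA.

```python
# Three-state secondary structure alphabet: helix, strand, coil.
SS_STATES = "HEC"


def one_hot_secondary_structure(pss):
    """Encode a three-state secondary structure string as an L x 3 matrix."""
    encoded = []
    for state in pss:
        if state not in SS_STATES:
            raise ValueError(f"unexpected secondary structure state: {state!r}")
        encoded.append([1.0 if state == s else 0.0 for s in SS_STATES])
    return encoded


# Placeholder 76-residue prediction; NOT the real PSIPRED output for 1ibaA.
placeholder_pss = "C" * 10 + "H" * 30 + "C" * 6 + "E" * 20 + "C" * 10
pss_view = one_hot_secondary_structure(placeholder_pss)
print(len(pss_view), len(pss_view[0]))
```

PSIPRED also reports per-state probabilities, which could be used as the three columns instead of a hard one-hot vector; which variant the authors used is not stated in this text.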
The above describes the result of predicting the solvent accessibility of protein 1ibaA according to the invention and is not intended to limit the scope of the invention; various modifications and improvements made without departing from the basic content of the invention shall not be excluded from its scope of protection.

Claims (1)

1. A method for predicting the solvent accessibility of a protein based on multi-view learning, the method comprising the steps of:
1) input the sequence of a protein with $L$ residues whose solvent accessibility is to be predicted, denoted $S$;
2) denote any given protein sequence with $L_X$ residues as $S_X$;
3) for the protein sequence $S_X$, generate the corresponding multiple sequence alignment (MSA) with the HHblits tool, denoted
Figure FDA0003078157450000011
where
Figure FDA0003078157450000012
is the $n$-th aligned sequence in the MSA, $N$ is the total number of aligned sequences in the MSA, each aligned sequence contains $L_X$ elements, and each element belongs to the set $R = \{R_1, \ldots, R_r, \ldots, R_{21}\}$, which consists of the twenty common amino acids and one gap element;
4) for the given MSA, generate the corresponding position-specific frequency matrix (PSFM), denoted
Figure FDA0003078157450000013
where
Figure FDA0003078157450000014
Figure FDA0003078157450000015
denotes the $l$-th element of
Figure FDA0003078157450000016
and, when
Figure FDA0003078157450000017
and $R_r$ are the same element type,
Figure FDA0003078157450000018
and otherwise
Figure FDA0003078157450000019
5) for the given protein sequence $S_X$, generate the corresponding position-specific scoring matrix with the PSI-BLAST tool, denoted PSSM;
6) for the given protein sequence $S_X$, generate the corresponding secondary structure information with the PSIPRED tool, denoted PSS;
7) collect all proteins with annotated tertiary structure information from the PDB, then generate the corresponding solvent accessibility labels from the tertiary structure information with the DSSP tool, giving the data set $\mathrm{Dataset} = \{S_i, Y_i\}$, where $S_i$ denotes the $i$-th protein in Dataset, $Y_i$ denotes the label information corresponding to $S_i$, $i = 1, 2, \ldots, N$, and $N$ is the total number of protein sequences in Dataset;
8) build a deep multi-view feature learning neural network framework consisting of 4 pipelines, denoted I, II, III and IV;
9) pipelines I and II each consist of a two-layer bidirectional long short-term memory recurrent neural network (BiLSTM), three linear layers (FC) and a two-layer SENet attention module; they extract the evolutionary information in the position-specific frequency matrix and the position-specific scoring matrix respectively, and their outputs are denoted the first output and the second output;
10) pipeline III consists of a two-layer BiLSTM, three linear layers (FC) and a two-layer SENet attention module; it extracts the secondary structure information, and its output is denoted the third output;
11) pipeline IV consists of three linear layers (FC) and a two-layer SENet attention module, and its output is denoted the fourth output;
12) following steps 3) to 6), generate the feature information of every $S_i$ in Dataset, denoted respectively
Figure FDA0003078157450000021
Figure FDA0003078157450000022
where $i = 1, 2, \ldots, N$ and $N$ is the total number of protein sequences; together with the corresponding labels $Y_i$, these form the sample set
Figure FDA0003078157450000023
13) using the deep multi-view feature learning neural network framework built in steps 8) to 11), learn a prediction model on this sample set, and denote the model DMVFL;
14) during the training of DMVFL, use the mean squared error function to compute the loss between each of the four outputs of steps 9) to 11) and the label, denoted
Figure FDA0003078157450000024
where $T = 4$, $y$ is the label, and $y_t$ is the $t$-th predicted value of solvent accessibility;
15) generate the feature information of the protein $S$ to be predicted through steps 3) to 6), and input it into the trained model DMVFL to obtain the solvent accessibility of protein $S$.
CN202110558859.4A 2021-05-21 2021-05-21 Protein solvent accessibility prediction method based on multi-view learning Active CN113361752B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110558859.4A CN113361752B (en) 2021-05-21 2021-05-21 Protein solvent accessibility prediction method based on multi-view learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110558859.4A CN113361752B (en) 2021-05-21 2021-05-21 Protein solvent accessibility prediction method based on multi-view learning

Publications (2)

Publication Number Publication Date
CN113361752A 2021-09-07
CN113361752B 2022-07-26

Family

ID=77527136

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110558859.4A Active CN113361752B (en) 2021-05-21 2021-05-21 Protein solvent accessibility prediction method based on multi-view learning

Country Status (1)

Country Link
CN (1) CN113361752B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114121149A (en) * 2021-12-01 2022-03-01 天津理工大学 RNA secondary structure prediction algorithm based on bidirectional GRU and attention mechanism

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105808975A (en) * 2016-03-14 2016-07-27 南京理工大学 Multi-core-learning and Boosting algorithm based protein-DNA binding site prediction method
CN106295242A (en) * 2016-08-04 2017-01-04 上海交通大学 Protein domain detection method based on cost-sensitive LSTM network
CN106909807A (en) * 2017-02-14 2017-06-30 同济大学 A kind of Forecasting Methodology that drug targeting interactions between protein is predicted based on multivariate data
CN107273714A (en) * 2017-06-07 2017-10-20 南京理工大学 The ATP binding site estimation methods of conjugated protein sequence and structural information
KR20200066903A (en) * 2018-12-03 2020-06-11 숙명여자대학교산학협력단 Prediction apparatus of protein solvation free energy using deep learning and method thereof
CN112216345A (en) * 2020-09-27 2021-01-12 浙江工业大学 Protein solvent accessibility prediction method based on iterative search strategy

Also Published As

Publication number Publication date
CN113361752B (en) 2022-07-26

Similar Documents

Publication Publication Date Title
Zhao et al. HyperAttentionDTI: improving drug–protein interaction prediction by sequence-based deep learning with attention mechanism
Leiva et al. Enrico: A dataset for topic modeling of mobile UI designs
Caudai et al. AI applications in functional genomics
US20160232224A1 (en) Categorization and filtering of scientific data
Shastry et al. Machine learning for bioinformatics
Climer et al. Rearrangement Clustering: Pitfalls, Remedies, and Applications.
US20140052686A1 (en) Method, system and software arrangement for reconstructing formal descriptive models of processes from functional/modal data using suitable ontology
CA2406877A1 (en) Database storage structure
CN111667880A (en) Protein residue contact map prediction method based on deep residual neural network
Liu et al. Visualizing complex feature interactions and feature sharing in genomic deep neural networks
Saravanan et al. Video image retrieval using data mining techniques
CN113361752B (en) Protein solvent accessibility prediction method based on multi-view learning
CN115472221A (en) Protein fitness prediction method based on deep learning
CN112085245A (en) Protein residue contact prediction method based on deep residual neural network
CN112216345B (en) Protein solvent accessibility prediction method based on iterative search strategy
Samaddar et al. A model for distributed processing and analyses of ngs data under map-reduce paradigm
Krause et al. Understanding the role of (advanced) machine learning in metagenomic workflows
Chavda et al. Role of Data Mining in Bioinformatics
Wang et al. Fusang: a framework for phylogenetic tree inference via deep learning
Nafar et al. Data mining methods for protein-protein interactions
Albrecht et al. Single-cell specific and interpretable machine learning models for sparse scChIP-seq data imputation
Liu et al. CAKE: a flexible self-supervised framework for enhancing cell visualization, clustering and rare cell identification
Majhi et al. Artificial Intelligence in Bioinformatics
Krismer et al. seqgra: principled selection of neural network architectures for genomics prediction tasks
Yi et al. NetTIME: a multitask and base-pair resolution framework for improved transcription factor binding site prediction

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20231227

Address after: 518054, D1101, Building 4, Software Industry Base, No. 19, 17, and 18 Haitian 1st Road, Binhai Community, Yuehai Street, Nanshan District, Shenzhen City, Guangdong Province

Patentee after: Shenzhen Xinrui Gene Technology Co.,Ltd.

Address before: 510075 No. n2248, floor 3, Xingguang Yingjing, No. 117, Shuiyin Road, Yuexiu District, Guangzhou City, Guangdong Province

Patentee before: GUANGZHOU ZHAOJI BIOTECHNOLOGY CO.,LTD.

Effective date of registration: 20231227

Address after: 510075 No. n2248, floor 3, Xingguang Yingjing, No. 117, Shuiyin Road, Yuexiu District, Guangzhou City, Guangdong Province

Patentee after: GUANGZHOU ZHAOJI BIOTECHNOLOGY CO.,LTD.

Address before: 310014 No. 18 Chaowang Road, Zhaohui Sixth District, Hangzhou City, Zhejiang Province

Patentee before: ZHEJIANG UNIVERSITY OF TECHNOLOGY