CN110853703A - Semi-supervised learning prediction method for protein secondary structure - Google Patents


Info

Publication number
CN110853703A
CN110853703A (application CN201910982228.8A)
Authority
CN
China
Prior art keywords
semi
protein
neural network
secondary structure
gan
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910982228.8A
Other languages
Chinese (zh)
Inventor
宫秀军 (Gong Xiujun)
赵兴海 (Zhao Xinghai)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianjin University
Original Assignee
Tianjin University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tianjin University filed Critical Tianjin University
Priority to CN201910982228.8A priority Critical patent/CN110853703A/en
Publication of CN110853703A publication Critical patent/CN110853703A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B 15/00 ICT specially adapted for analysing two-dimensional or three-dimensional molecular structures, e.g. structural or functional relations or structure alignment
    • G16B 15/20 Protein or domain folding
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B 40/00 ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
    • G16B 40/20 Supervised data analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/047 Probabilistic or stochastic networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/048 Activation functions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods

Landscapes

  • Life Sciences & Earth Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biotechnology (AREA)
  • Evolutionary Biology (AREA)
  • Biophysics (AREA)
  • Theoretical Computer Science (AREA)
  • Chemical & Material Sciences (AREA)
  • Crystallography & Structural Chemistry (AREA)
  • Artificial Intelligence (AREA)
  • Bioethics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Databases & Information Systems (AREA)
  • Epidemiology (AREA)
  • Evolutionary Computation (AREA)
  • Public Health (AREA)
  • Software Systems (AREA)
  • Investigating Or Analysing Biological Materials (AREA)

Abstract

The invention discloses a semi-supervised learning method for predicting protein secondary structure, comprising the following steps: (1) acquiring a protein sequence data set; (2) performing data cleaning and feature extraction on the acquired data set; (3) building a Semi-GAN neural network model; (4) training the Semi-GAN neural network model; (5) tuning the parameters of the Semi-GAN neural network model; (6) evaluating the Semi-GAN neural network model. The invention can build a semi-supervised prediction model for protein secondary structure even when a large portion of the data lacks labels, saving substantial manual annotation effort and expense.

Description

Semi-supervised learning prediction method for protein secondary structure
Technical Field
The invention relates to the fields of bioinformatics and deep learning and addresses a key prediction problem in bioinformatics: training a deep learning classification model on a protein data set with missing labels to predict protein secondary structure.
Background
Protein secondary structure prediction is the inference of the secondary structure of a protein fragment based on its amino acid sequence. In bioinformatics and theoretical chemistry, protein secondary structure prediction is very important for medicine and biotechnology, such as drug design and design of novel enzymes. Since secondary structure can be used to find distant relationships of proteins with unaligned primary structure, combining secondary structure information with simple sequence information can improve the accuracy of their alignment. Finally, protein secondary structure prediction also plays an important role in protein tertiary structure prediction. The secondary structure of the protein can determine the structural type of the partial fragment of the protein, and thus the degree of freedom of the partial fragment of the protein in the tertiary structure can be reduced. Therefore, accurate secondary structure prediction is likely to improve the accuracy of protein tertiary structure prediction.
The objective of protein secondary structure prediction is to predict whether the residue at the center of an amino acid sequence fragment is in an α-helix, a β-sheet, or a random coil. Although it is generally thought that sufficient amino acid sequence information is enough to determine the three-dimensional structure of a protein, this is difficult in practice, especially when the protein secondary structure labels are missing.
Disclosure of Invention
The object of the present invention is to overcome the deficiencies of the prior art: although several deep learning methods for secondary structure prediction have been developed, the problem of semi-supervised classification of secondary structures has not been studied before. A semi-supervised learning prediction method for protein secondary structure is therefore provided, in which the discriminator of a generative adversarial network (GAN) is modified into a classifier to perform semi-supervised prediction of protein secondary structure.
The purpose of the invention is realized by the following technical scheme:
a semi-supervised learning prediction method for protein secondary structure comprises the following steps:
(1) acquiring a protein sequence data set;
(2) carrying out data cleaning and feature extraction on the acquired data set;
(3) building a Semi-GAN neural network model; the Semi-GAN neural network model comprises a generator, a discriminator and a loss function, wherein the generator comprises three deconvolution neural networks, each followed by normalization, with a Leaky ReLU activation function adopted to prevent overfitting; the discriminator uses a network structure of convolutional neural networks, normalization, and ReLU activation functions; the loss function divides the discriminator loss into two parts: one, the unsupervised loss, represents the GAN problem; the other, the supervised loss, computes the probability of each real class; for the unsupervised loss, the discriminator must distinguish real training samples from fake samples produced by the generator; in both cases a binary classification problem is being handled; since the probability for a real sample should be close to 1 and for a fake sample close to 0, the sigmoid cross-entropy function is used to compute the loss; samples from the training set are assigned the label 1 to maximize their probability of being real; synthetic samples from the generator are assigned the label 0 to maximize their probability of being fake;
(4) training a Semi-GAN neural network model:
(5) adjusting parameters of the Semi-GAN neural network model;
(6) the Semi-GAN neural network model was evaluated.
Further, the data set used in step (1) is the CullPDB data set consisting of 6133 proteins, each protein having 39900 features; the 6133 proteins × 39900 features can be reshaped into 6133 proteins × 700 amino acids × 57 features.
Compared with the prior art, the technical scheme of the invention has the following beneficial effects: the invention is the first to apply semi-supervised learning to protein secondary structure prediction. A semi-supervised prediction model of protein secondary structure can be built even when the protein data set has a large number of missing labels, avoiding manual annotation of that data; annotating protein sequences is difficult work that requires substantial manpower and financial resources.
Drawings
FIG. 1 is a schematic flow diagram of the present invention.
Fig. 2 is a schematic structural diagram of the Semi-GAN neural network model in this embodiment.
Detailed Description
The invention is described in further detail below with reference to the figures and specific examples. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The invention provides a method for predicting a protein secondary structure by semi-supervised learning, which is shown in figure 1 and specifically comprises the following steps:
first, acquiring a protein data set
First, a data set needs to be acquired. In this example, the data set used is the CullPDB data set, consisting of 6133 proteins with 39900 features each; the 6133 proteins × 39900 features can be reshaped into 6133 proteins × 700 amino acids × 57 features. The following table lists the protein secondary structure classes and the frequency of occurrence of each class:
description of protein secondary structure classes and class frequencies in the data set.
In this example, each amino acid chain is represented by a 700 × 57 matrix to keep the data size consistent: 700 is the peptide chain length and 57 is the number of features per amino acid. When the end of a chain is reached, the remainder of the vector is simply labeled "NoSeq" (unlabeled padding). Of the 57 features, 22 represent the primary structure (20 amino acids, 1 for an unknown amino acid, and 1 for "NoSeq"), 22 are the protein profile (laid out identically to the primary structure), and 9 cover the secondary structure (the 8 possible states plus, likewise, "NoSeq").
CB513 is a common test data set and serves as the independent test set here, while CB6133 is used as the training data set. Because of redundancy between CB513 and CB6133, CB6133 is filtered by removing sequences with more than 25% sequence similarity to sequences in CB513; after filtering, the 5534 proteins remaining in CB6133 are used as training samples. Protein sequence profiles carrying evolutionary information have been a breakthrough in protein secondary structure prediction, so Position-Specific Scoring Matrix (PSSM) features are used here; these widely used features are obtained with the DSSP program (Define Secondary Structure of Proteins, which assigns the secondary structure labels) and the Position-Specific Iterative Basic Local Alignment Search Tool (PSI-BLAST). The data used for training contain features and labels in 56 channels (22 PSSM, 22 amino acid sequence, 2 carbon and nitrogen termini, 2 solvent accessibility labels, 8 secondary structure labels). The training data cover 700 amino acids per chain, which is believed to provide a good balance between efficiency and coverage, since most protein chains are shorter than 700 amino acids. In training and testing, shorter sequences (fewer than 700 amino acids) are padded with 0s.
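The flat-to-tensor reshape described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the patent's code: the protein count is shrunk from 6133 to 64 to keep the sketch light, and the column ranges for the primary- and secondary-structure channels are an assumed layout, not one stated in the text.

```python
import numpy as np

# Illustrative stand-in for the flat CullPDB layout (the real set has
# 6133 proteins; a small count keeps this sketch light on memory).
n_proteins, seq_len, n_features = 64, 700, 57
flat = np.zeros((n_proteins, seq_len * n_features), dtype=np.float32)

# Reshape proteins x 39900 into proteins x residues x per-residue features.
data = flat.reshape(n_proteins, seq_len, n_features)

# Assumed column layout (hypothetical, for illustration only): 22
# primary-structure channels first, then 9 secondary-structure channels
# (8 states + "NoSeq" padding).
primary = data[:, :, 0:22]
secondary = data[:, :, 22:31]
```

The key point is that 700 × 57 = 39900, so the flat feature vector and the per-residue tensor carry exactly the same numbers.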
Secondly, cleaning the data and extracting the characteristics
In this example, only one protein secondary structure label is output at a time, so the amino acid sequence is specially processed. Given a database of 700-amino-acid sequences, a sliding window is set up to return batch matrices to the model for predicting protein secondary structure. The window is a small portion of the complete protein string; this sliding window is essentially similar to a one-dimensional convolution.
The window size should be greater than 11, because the average length of an α-helix is approximately 11 residues and the average length of a β-strand is approximately 6. A range of sizes from 11 to 23 was tested, and 17 produced the best results (performance/training-time trade-off). The window is shifted one unit at a time, predicting the central amino acid.
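The windowing step can be sketched as follows. This is a hedged illustration of the idea, not the patent's implementation; the zero-padding at both ends (so the first and last residues also get a window) is an assumption consistent with the later statement that short sequences are padded with 0s.

```python
import numpy as np

def sliding_windows(features, window=17):
    """Cut a (seq_len, n_features) protein into overlapping windows.

    One window per central residue; the sequence is zero-padded at both
    ends so every residue, including the first and last, gets a window.
    Window size 17 follows the performance/training-time trade-off above.
    """
    half = window // 2
    seq_len, _ = features.shape
    padded = np.pad(features, ((half, half), (0, 0)))
    return np.stack([padded[i:i + window] for i in range(seq_len)])

# Toy protein: 30 residues x 57 features.
protein = np.random.rand(30, 57)
batch = sliding_windows(protein)   # one (17, 57) window per residue
```

Shifting by one unit per window is what makes this "essentially similar to a one-dimensional convolution" with stride 1.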
The cullpdb+profile_6133 data set is processed with this method, and the final result is split 80% as a training set, 10% as a cross-validation set, and 10% as a test set. The same operation is applied to the CB513 data set, which serves as the model's independent test set.
Thirdly, building a Semi-GAN neural network model
When constructing a GAN for generating samples, the generator and the discriminator are trained simultaneously. After training, the generator can be discarded, because it is only used to train the discriminator/classifier. In this embodiment, the generator is used only to help the discriminator during training: it acts as an additional source of information from which the discriminator obtains, in effect, raw unlabeled training data. These unlabeled data are key to improving the performance of the discriminator. Furthermore, in a conventional sample-generating GAN the discriminator has only one role: computing the probability that its input is real.
First, the work to be done: to turn the discriminator into a semi-supervised classifier, the discriminator must learn the probability of each class in the raw data set in addition to solving the GAN problem. In other words, for each input datum, the discriminator must produce its specific class probabilities. An ordinary generative GAN discriminator has a single sigmoid output unit, whose value represents the probability that the input data is real (value close to 1) or fake (value close to 0). That is, from the discriminator's perspective, a value close to 1 means the sample is likely from the training set, and a value close to 0 means the sample is likely from the generator network. Through this probability, the discriminator sends a signal to the generator, which lets the generator adjust its parameters during training and thereby improve its ability to create realistic data.
Second, given the 8-state classification of protein secondary structure, the discriminator (from the ordinary GAN) must be converted into a 9-class classifier. For this purpose, its sigmoid output can be replaced with a softmax over 9 outputs: the first 8 give the class probabilities for the protein secondary structure data set, and the 9th class covers all fake data from the generator. If the 9th-class probability is set to 0, then the sum of the first 8 probabilities represents the same probability that would be computed with the sigmoid function.
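The relationship between the 9-way softmax and the original sigmoid can be verified numerically. A minimal sketch, assuming illustrative logit values: with the fake-class logit fixed at 0, the summed probability of the 8 real classes equals the sigmoid of the log-sum-exp of the real-class logits, recovering the single real/fake probability the text refers to.

```python
import numpy as np

def softmax(logits):
    z = logits - logits.max()          # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

# 9 logits: 8 real secondary-structure classes + 1 class for fake samples.
real_logits = np.array([1.0, 0.5, -0.2, 0.3, 0.0, -1.0, 0.7, 0.2])
logits = np.append(real_logits, 0.0)   # fake-class logit fixed at 0
probs = softmax(logits)

# "Is this sample real?" = sum of the 8 real-class probabilities ...
p_real = probs[:8].sum()

# ... which equals sigmoid(logsumexp(real_logits)).
lse = np.log(np.exp(real_logits).sum())
p_real_sigmoid = 1.0 / (1.0 + np.exp(-lse))
```

This identity is why no extra parameters are needed for the fake class: the 9th output can be held at zero.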
Finally, a penalty needs to be set so that the discriminator can perform both of the following operations:
(i) helping the generator learn to generate realistic samples. To do this, the discriminator must be instructed to distinguish between true and false samples.
(ii) The generator's samples and labeled and unlabeled training data are used to help classify the data set.
In summary, there are three different sources of training data for the discriminator.
Real data with labels: these are data-label pairs, as in any conventional supervised classification problem. Real data without labels: for these, the classifier only learns that the data are real. Data from the generator: for these, the discriminator learns to classify them as fake samples.
The aim of the invention is to predict, with sufficient accuracy, which class a secondary structure belongs to when the protein's secondary structure label is missing. Not only the 3-state prediction but also the 8-state prediction needs attention, since the 8-state prediction conveys more structural information; see Table 1. At present, the evolutionary information in position-specific scoring matrices (PSSM) is recognized as the most suitable informative feature for this research.
Specifically, the overall structure of the Semi-GAN neural network model in this embodiment is shown in fig. 2;
the generator, following a very standard implementation described in the DCGAN paper. The method includes taking a random vector z as input. Reshaping it into a 4D tensor and inputting it into a series of deconvolution neural networks, where three deconvolution neural networks are set, and respectively performing Batch Normalization on them to accelerate the optimization of the gradient, and then using the learky ReLU function as the activation function to prevent overfitting.
The discriminator is modified into a multi-class classifier as required. Here, a DCGAN-like architecture is designed from several blocks of convolutional neural network + BN (normalization) and ReLU activation functions, with strided convolutions used to reduce the dimensionality of the feature vector. Not all convolutions perform this reduction: when the dimensionality of the feature vector is to be kept constant, the convolution uses a stride of 1, otherwise a stride of 2 is used. For stable learning, BN is applied for normalization (except in the first convolutional layer). The 2D convolution window (kernel or filter) is set to 3 × 3 for all convolutions. Any classifier can run into problems if it is not well designed, and one of the most likely drawbacks when training a large classifier on a very limited data set is overfitting. An overfitted classifier typically shows a significant gap between training error (low) and test error (high): the model captures the structure of the training data set well but cannot generalize to unseen examples because it trusts the training data too much. To prevent this, dropout regularization is applied. Finally, instead of a fully connected layer on top of the convolution stack, Global Average Pooling (GAP) is performed: the feature maps are averaged over their spatial dimensions, collapsing each to a single value.
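Global Average Pooling, as used at the top of the discriminator, reduces each feature map to one number. A minimal NumPy sketch (channel-last layout is an assumption for illustration):

```python
import numpy as np

def global_average_pooling(feature_maps):
    """Average each channel over its spatial dimensions.

    feature_maps: (batch, height, width, channels)
    returns:      (batch, channels)
    """
    return feature_maps.mean(axis=(1, 2))

# Toy activations: batch of 2, 4x4 spatial grid, 3 channels.
x = np.arange(2 * 4 * 4 * 3, dtype=float).reshape(2, 4, 4, 3)
pooled = global_average_pooling(x)   # shape (2, 3)
```

Because GAP has no weights, it also acts as a mild regularizer compared with a fully connected layer, which fits the overfitting concern discussed above.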
Loss function: as the core of the invention, the discriminator loss is divided into two parts. One, the unsupervised loss, represents the GAN problem; the other, the supervised loss, computes the individual real-class probabilities. For the unsupervised loss, the discriminator must distinguish real training samples from fake samples produced by the generator. As in a normal GAN, half of the time the discriminator receives unlabeled samples from the training set and the other half it receives synthetic unlabeled samples from the generator; in both cases a binary classification problem is being handled. Since the probability for a real sample should be close to 1 and for a fake sample close to 0, the sigmoid cross-entropy function is used to compute the loss. Samples from the training set are assigned the label 1 to maximize their probability of being real; synthetic samples from the generator are assigned the label 0 to maximize their probability of being fake.
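The unsupervised part of the discriminator loss can be sketched directly from this description. The logit values below are made up for illustration; the structure (label 1 for training samples, label 0 for generator samples, sigmoid cross-entropy on both halves) follows the text.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_cross_entropy(logits, targets):
    """Binary cross-entropy on logits, averaged over the batch."""
    p = sigmoid(logits)
    return -np.mean(targets * np.log(p) + (1 - targets) * np.log(1 - p))

# Hypothetical discriminator logits for a mixed batch: real training
# samples (target 1) and generator samples (target 0).
real_logits = np.array([2.0, 1.5, 3.0])
fake_logits = np.array([-1.0, -2.5, 0.5])

d_loss_real = sigmoid_cross_entropy(real_logits, np.ones(3))
d_loss_fake = sigmoid_cross_entropy(fake_logits, np.zeros(3))
unsupervised_loss = d_loss_real + d_loss_fake
```

The supervised part would simply add a standard softmax cross-entropy over the 8 real classes, computed only on the labeled subset of the batch.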
Fourthly, training and adjusting parameters of the model;
Finally, a grid search is used to select hyper-parameters such as the number of network layers, the learning rate, the dropout coefficient, and the Adam optimizer parameters, and prediction results are obtained for protein data sets with different labeled-data ratios. The results are as follows:
the study was semi-supervised trained using the cutlpdb, experimental tests were performed using the cb513, and to arrive at a semi-supervised learned data set to inject noise into the labels, a parameter was set to specify the label data ratio in the training set. The overall performance of the deep network (semi-GAN) was evaluated by performing several sets of experiments. In a first set of experiments, the cullpdb + profile _6133 dataset was trained and tested. 80%, 60%, 40%, 20% have been trained and all data are referred to tables 1, 2 and 3 respectively.
Table 1: predicted expression of the secondary Structure of the Q8 protein
[Table 1 is reproduced as an image in the original publication.]
Table 2: predicted expression of the secondary Structure of the Q3 protein
[Table 2 is reproduced as an image in the original publication.]
Table 3: predicted global trend table for protein secondary structure
[Table 3 is reproduced as an image in the original publication.]
The final experimental results of this embodiment match what was expected at the outset: although the accuracy of protein secondary structure prediction improves as the proportion of labeled data increases, the difference in accuracy is small. A semi-supervised prediction model can therefore be built for protein secondary structure even when a large amount of label-missing data is present, saving substantial manpower and financial resources.
The present invention is not limited to the above-described embodiments. The foregoing description of the specific embodiments is intended to describe and illustrate the technical solutions of the present invention, and the above specific embodiments are merely illustrative and not restrictive. Those skilled in the art can make many changes and modifications to the invention without departing from the spirit and scope of the invention as defined in the appended claims.

Claims (2)

1. A semi-supervised learning prediction method for protein secondary structure is characterized by comprising the following steps:
(1) acquiring a protein sequence data set;
(2) carrying out data cleaning and feature extraction on the acquired data set;
(3) building a Semi-GAN neural network model; the Semi-GAN neural network model comprises a generator, a discriminator and a loss function, wherein the generator comprises three deconvolution neural networks, each followed by normalization, with a Leaky ReLU activation function adopted to prevent overfitting; the discriminator uses a network structure of convolutional neural networks, normalization, and ReLU activation functions; the loss function divides the discriminator loss into two parts: one, the unsupervised loss, represents the GAN problem; the other, the supervised loss, computes the probability of each real class; for the unsupervised loss, the discriminator must distinguish real training samples from fake samples produced by the generator; in both cases a binary classification problem is being handled; in order to make the probability for a real sample close to 1 and for a fake sample close to 0, the sigmoid cross-entropy function is used to compute the loss; samples from the training set are assigned the label 1 to maximize their probability of being real; synthetic samples from the generator are assigned the label 0 to maximize their probability of being fake;
(4) training a Semi-GAN neural network model:
(5) adjusting parameters of the Semi-GAN neural network model;
(6) the Semi-GAN neural network model was evaluated.
2. The semi-supervised learning prediction method for protein secondary structure of claim 1, wherein the data set used in step (1) is the CullPDB data set consisting of 6133 proteins, each protein having 39900 features; the 6133 proteins × 39900 features can be reshaped into 6133 proteins × 700 amino acids × 57 features.
CN201910982228.8A 2019-10-16 2019-10-16 Semi-supervised learning prediction method for protein secondary structure Pending CN110853703A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910982228.8A CN110853703A (en) 2019-10-16 2019-10-16 Semi-supervised learning prediction method for protein secondary structure

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910982228.8A CN110853703A (en) 2019-10-16 2019-10-16 Semi-supervised learning prediction method for protein secondary structure

Publications (1)

Publication Number Publication Date
CN110853703A true CN110853703A (en) 2020-02-28

Family

ID=69597529

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910982228.8A Pending CN110853703A (en) 2019-10-16 2019-10-16 Semi-supervised learning prediction method for protein secondary structure

Country Status (1)

Country Link
CN (1) CN110853703A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112001329A (en) * 2020-08-26 2020-11-27 东莞太力生物工程有限公司 Method and device for predicting protein expression amount, computer device and storage medium
CN113066528A (en) * 2021-04-12 2021-07-02 山西大学 Protein classification method based on active semi-supervised graph neural network
CN113851192A (en) * 2021-09-15 2021-12-28 安庆师范大学 Amino acid one-dimensional attribute prediction model training method and device and attribute prediction method
WO2022178949A1 (en) * 2021-02-26 2022-09-01 平安科技(深圳)有限公司 Semantic segmentation method and apparatus for electron microtomography data, device, and medium
CN115312119A (en) * 2022-10-09 2022-11-08 之江实验室 Method and system for identifying protein structural domain based on protein three-dimensional structure image

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101847181A (en) * 2010-04-30 2010-09-29 天津大学 Tissue-specific gene and regulatory factor data storage method
CN102184346A (en) * 2011-05-09 2011-09-14 天津大学 Method for constructing and analyzing tissue-specific interaction topology network
WO2018028255A1 (en) * 2016-08-11 2018-02-15 深圳市未来媒体技术研究院 Image saliency detection method based on adversarial network
CN109311937A (en) * 2016-06-15 2019-02-05 新加坡科技研究局 Method of the enhancing for the chromatographic performance of protein purification
US20190122120A1 (en) * 2017-10-20 2019-04-25 Dalei Wu Self-training method and system for semi-supervised learning with generative adversarial networks
CN110097103A (en) * 2019-04-22 2019-08-06 西安电子科技大学 Based on the semi-supervision image classification method for generating confrontation network
CN110110745A (en) * 2019-03-29 2019-08-09 上海海事大学 Based on the semi-supervised x-ray image automatic marking for generating confrontation network

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101847181A (en) * 2010-04-30 2010-09-29 天津大学 Tissue-specific gene and regulatory factor data storage method
CN102184346A (en) * 2011-05-09 2011-09-14 天津大学 Method for constructing and analyzing tissue-specific interaction topology network
CN109311937A (en) * 2016-06-15 2019-02-05 新加坡科技研究局 Method of the enhancing for the chromatographic performance of protein purification
WO2018028255A1 (en) * 2016-08-11 2018-02-15 深圳市未来媒体技术研究院 Image saliency detection method based on adversarial network
US20190122120A1 (en) * 2017-10-20 2019-04-25 Dalei Wu Self-training method and system for semi-supervised learning with generative adversarial networks
CN110110745A (en) * 2019-03-29 2019-08-09 上海海事大学 Based on the semi-supervised x-ray image automatic marking for generating confrontation network
CN110097103A (en) * 2019-04-22 2019-08-06 西安电子科技大学 Based on the semi-supervision image classification method for generating confrontation network

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
李洪顺; 于华; 宫秀军 (Li Hongshun; Yu Hua; Gong Xiujun): "A deep learning model for predicting RNA-binding proteins using only sequence information", Journal of Computer Research and Development (计算机研究与发展) *
赖向阳; 宫秀军; 韩来明 (Lai Xiangyang; Gong Xiujun; Han Laiming): "K-Medoids clustering based on a genetic algorithm under the MapReduce architecture", Computer Science (计算机科学) *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112001329A (en) * 2020-08-26 2020-11-27 东莞太力生物工程有限公司 Method and device for predicting protein expression amount, computer device and storage medium
CN112001329B (en) * 2020-08-26 2021-11-30 深圳太力生物技术有限责任公司 Method and device for predicting protein expression amount, computer device and storage medium
WO2022178949A1 (en) * 2021-02-26 2022-09-01 平安科技(深圳)有限公司 Semantic segmentation method and apparatus for electron microtomography data, device, and medium
CN113066528A (en) * 2021-04-12 2021-07-02 山西大学 Protein classification method based on active semi-supervised graph neural network
CN113066528B (en) * 2021-04-12 2022-07-19 山西大学 Protein classification method based on active semi-supervised graph neural network
CN113851192A (en) * 2021-09-15 2021-12-28 安庆师范大学 Amino acid one-dimensional attribute prediction model training method and device and attribute prediction method
CN115312119A (en) * 2022-10-09 2022-11-08 之江实验室 Method and system for identifying protein structural domain based on protein three-dimensional structure image
US11908140B1 (en) 2022-10-09 2024-02-20 Zhejiang Lab Method and system for identifying protein domain based on protein three-dimensional structure image

Similar Documents

Publication Publication Date Title
CN110853703A (en) Semi-supervised learning prediction method for protein secondary structure
Nguyen et al. Multi-class support vector machines for protein secondary structure prediction
CN109977994B (en) Representative image selection method based on multi-example active learning
Nguyen et al. Learning graph representation via frequent subgraphs
Kang Rotation-invariant wafer map pattern classification with convolutional neural networks
CN111581116B (en) Cross-project software defect prediction method based on hierarchical data screening
CN111325264A (en) Multi-label data classification method based on entropy
CN112116950B (en) Protein folding identification method based on depth measurement learning
Tao et al. RDEC: integrating regularization into deep embedded clustering for imbalanced datasets
CN116013428A (en) Drug target general prediction method, device and medium based on self-supervision learning
CN114897085A (en) Clustering method based on closed subgraph link prediction and computer equipment
CN108549915B (en) Image hash code training model algorithm based on binary weight and classification learning method
KR102272921B1 (en) Hierarchical object detection method for extended categories
Plasencia-Calana et al. Towards scalable prototype selection by genetic algorithms with fast criteria
CN110413792B (en) High-influence defect report identification method
Liu et al. Multi-class classification of support vector machines based on double binary tree
Dong et al. A region selection model to identify unknown unknowns in image datasets
Zhao et al. BatSort: Enhanced Battery Classification with Transfer Learning for Battery Sorting and Recycling
CN110427973A (en) A kind of classification method towards ambiguity tagging sample
Mehta et al. Dynamic classification of defect structures in molecular dynamics simulation data
CN112465884B (en) Multi-element remote sensing image change detection method based on generated characteristic representation network
CN113177604B (en) High-dimensional data feature selection method based on improved L1 regularization and clustering
Tambouratzis Improving the clustering performance of the scanning n-tuple method by using self-supervised algorithms to introduce subclasses
Gustafsson Searching for rare traffic signs
Plasencia-Calana et al. Scalable prototype selection by genetic algorithms and hashing

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
AD01 Patent right deemed abandoned

Effective date of abandoning: 20240112