CN113096732A - Motif mining method based on deep embedded convolutional neural network - Google Patents

Motif mining method based on deep embedded convolutional neural network Download PDF

Info

Publication number
CN113096732A
CN113096732A (application number CN202110509307.4A)
Authority
CN
China
Prior art keywords
model
embedded
neural network
convolutional
edeepcnn
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110509307.4A
Other languages
Chinese (zh)
Inventor
黄德双 (De-Shuang Huang)
张寅东 (Yindong Zhang)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tongji University
Original Assignee
Tongji University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tongji University filed Critical Tongji University
Priority to CN202110509307.4A priority Critical patent/CN113096732A/en
Publication of CN113096732A publication Critical patent/CN113096732A/en
Pending legal-status Critical Current

Links

Images

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B30/00 ICT specially adapted for sequence analysis involving nucleotides or amino acids
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Chemical & Material Sciences (AREA)
  • Analytical Chemistry (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biotechnology (AREA)
  • Evolutionary Biology (AREA)
  • Medical Informatics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention relates to a motif mining method based on a deep embedded convolutional neural network, which comprises the following steps: S1, constructing a deep embedded convolutional neural network (eDeepCNN) model; S2, performing K-mer encoding on the DNA sequence, using embedded vectors as the input representation of K-mers in the model, training the model on this data set, and performing feature extraction and binding prediction; S3, comparing the eDeepCNN model with a shallow network to verify its superiority. In the invention, K-mer encoding explicitly models the dependency between adjacent nucleotides in a DNA sequence and implicitly captures the shape information of the DNA sequence, while the high-dimensional embedded vectors can fully represent the latent information contained in each K-mer.

Description

Motif mining method based on deep embedded convolutional neural network
Technical Field
The invention relates to the technical field of pattern recognition and deep learning, in particular to a motif mining method based on a deep embedded convolutional neural network.
Background
Transcription factors play an important role in biological processes such as gene transcription, repair and regulation. Genetic variation at transcription factor binding sites is closely related to several serious diseases. Therefore, mining transcription factor binding sites, i.e. motif mining, is important for understanding the regulatory mechanisms of transcription factors. Traditionally, transcription factor binding sites are represented by a position weight matrix (PWM), which is calculated by aligning motif sequences and counting the nucleotide distribution at each position. However, the PWM focuses only on the nucleotide distribution of the motif sequences and ignores information from the sequences adjacent to the motif, while case studies show that the context sequence of a motif has a significant influence on binding behavior. Inspired by the position weight matrix, DeepBind constructs a single-layer convolutional neural network model for the motif mining task, and research shows that the nucleotide distribution of the sequences adjacent to a binding site has an important influence on binding behavior. In real biological processes, multiple transcription factors may cooperate with each other to affect the binding process, so motif-motif interactions may exist within a sequence, and a single-layer convolutional network is likewise ineffective in this case.
The PWM assumes that the nucleotides in a DNA sequence are independent of each other and is only a simple approximation of the true physical process. DeepBind performs one-hot encoding of single nucleotides, which is simple and intuitive but cannot fully express the interaction of adjacent nucleotides; a motif mining method based on a deep embedded convolutional neural network is therefore urgently needed.
Disclosure of Invention
The invention aims to capture the interaction between a motif and its adjacent nucleotide sequence for the transcription factor binding prediction task, and to construct a deep convolutional network eDeepCNN model on the basis of the DeepBind model.
In order to achieve the purpose, the invention provides the following scheme:
a die body mining method based on a deep embedded convolutional neural network comprises the following steps:
s1, constructing a deep embedded convolutional neural network eDeepCNN model;
s2, carrying out K-mer coding on the DNA sequence, training a data set of the eDeepCNN model by using an embedded vector as an input representation of a K-mer in the eDeepCNN model, and carrying out feature extraction and binding prediction;
s3, comparing the eDeepCNN model with a shallow network, and verifying the superiority of the eDeepCNN model.
Preferably, the eDeepCNN model in S1 includes three convolutional layers, and a local max pooling layer and a dropout layer are placed after each convolutional layer to help the deep embedded convolutional neural network model resist overfitting during training.
Preferably, the three convolutional layers are respectively a first, a second and a third convolutional layer, wherein the first convolutional layer is responsible for extracting local sequence patterns, and the second and third convolutional layers model the interactions between these local patterns.
Preferably, the first convolutional layer computes a motif score sequence, which is used as the input of the second convolutional layer; the second layer identifies the local distribution pattern of the score sequence, thereby capturing the interaction between the motif and adjacent sequences. The third convolutional layer operates in the same manner as the second.
Preferably, the embedded vector in S2 is a point in a high-dimensional hidden space, and the relative positions of the embedded vectors of different K-mers in this space represent the interaction relationships between them; a one-to-one mapping between K-mer sequence numbers and the corresponding embedded vectors is established, yielding a sequence composed of K-mer sequence numbers.
Preferably, the corresponding embedded vectors are found in a table look-up mode according to the K-mer sequence numbers, the embedded vectors are sequentially formed into a two-dimensional array, and the two-dimensional array is converted into an embedded vector matrix through an embedded vector layer.
Preferably, in S2, before training, the embedded vector matrix is randomly initialized, and the embedded vectors corresponding to the K-mers are adjusted and optimized according to training data.
Preferably, in S3, a five-fold cross-validation strategy is adopted for evaluating the accuracy of the eDeepCNN model.
The invention has the beneficial effects that:
By capturing the interaction of a motif with its adjacent nucleotide sequence in the transcription factor binding prediction task, the invention provides a combination of K-mer encoding with embedded vector representation, and a deep embedded convolutional neural network, eDeepCNN. Compared with a single-layer convolutional network, the multi-layer convolutional network can capture the context information of the motif sequence and the interaction between the motif and adjacent sequences, making full use of the fitting capability of a convolutional neural network. Whereas the PWM model assumes mutual independence between adjacent nucleotides, K-mer encoding explicitly models the dependency between adjacent nucleotides in the DNA sequence and implicitly captures the shape information of the DNA sequence; compared with one-hot encoding, the embedded vector representation has stronger representation capability and more flexibility, and can fully characterize the implicit information contained in each K-mer.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed to be used in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings without inventive exercise.
FIG. 1 is a flow chart of the method of the present invention;
FIG. 2 is a schematic diagram of a deep-embedded convolutional neural network model structure according to the present invention;
FIG. 3 is a diagram illustrating a comparison between the one-hot encoding and the K-mer encoding of the present invention;
FIG. 4 is a diagram comparing the neural network model structure before and after applying the Dropout strategy according to the present invention;
FIG. 5 is a schematic diagram of the model training and evaluation process under the five-fold intersection of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in further detail below.
A motif mining method based on a deep embedded convolutional neural network, whose flow is shown in FIG. 1, comprises:
S1, constructing a deep embedded convolutional neural network eDeepCNN model (shown in the attached figure 2);
deep convolutional network depcnn operated by three layers of convolution with loss and local pooling strategies. The first layer of convolution extracts the local pattern features of the DNA sequence, and calculates scores for all possible local motifs, which is the same as the Deepbind model. Second and third convolutional layers capable of capturing the interaction of motifs and adjacent sequences. The second convolutional layer receives as input the sequence of motif scores calculated by the first convolutional operation and identifies the local distribution pattern of the sequence of scores, and takes into account the interaction between adjacent motifs or the interaction between a motif and an adjacent sequence. According to the same logic, the third convolution layer has a larger receptive field than the second convolution layer, and can capture the interaction between local modes in a larger range in the sequence. Meanwhile, after the interaction of the local modes is preliminarily extracted through the convolution operation of the second layer, the third convolution layer can consider the high-order interaction between the local modes. Finally, the wider receptive field of the multilayer convolutional network can also adapt to the condition that the binding regions of the transcription factors are different in size. The fitting capability of the model is improved after the multilayer convolutional networks are combined, and the candidate sequence can be more comprehensively modeled. A local max pooling layer and a missing layer are laid down after each convolutional layer. The loss strategy plays an important role in the model. Because the number and complexity of model parameters are improved by the plurality of convolutional layers, the loss strategy can help the model to resist the over-fitting phenomenon in the training process so as to improve the model performance. 
After the convolutional network, a global maximum pooling layer is used to capture the global features of the DNA sequence and form a fixed-length feature vector to be sent to the fully-connected network for final prediction.
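The architecture described above can be sketched in PyTorch. This is an illustrative sketch only: the kernel widths, filter counts, embedding dimension and dropout rate are placeholder assumptions, since the actual values are given in Table 1 of the original (reproduced there only as an image).

```python
import torch
import torch.nn as nn

class EDeepCNNSketch(nn.Module):
    """Illustrative sketch of the eDeepCNN structure: embedding layer, three
    convolution blocks (conv -> local max pool -> dropout), global max pooling,
    and a fully connected head. All sizes are placeholder assumptions."""
    def __init__(self, vocab_size=16, embed_dim=8, n_filters=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)  # K-mer number -> embedded vector

        def block(c_in, c_out, k):
            # convolution followed by local max pooling and dropout, as in the text
            return nn.Sequential(
                nn.Conv1d(c_in, c_out, kernel_size=k, padding=k // 2),
                nn.ReLU(),
                nn.MaxPool1d(kernel_size=2),
                nn.Dropout(p=0.25),
            )

        self.conv1 = block(embed_dim, n_filters, 11)  # extracts local motif patterns
        self.conv2 = block(n_filters, n_filters, 7)   # motif / adjacent-sequence interactions
        self.conv3 = block(n_filters, n_filters, 7)   # higher-order interactions, wider receptive field
        self.pool = nn.AdaptiveMaxPool1d(1)           # global max pooling -> fixed-length feature
        self.fc = nn.Sequential(nn.Linear(n_filters, 32), nn.ReLU(), nn.Linear(32, 1))

    def forward(self, idx):                  # idx: (batch, seq_len) of K-mer numbers
        x = self.embed(idx).transpose(1, 2)  # -> (batch, embed_dim, seq_len)
        x = self.conv3(self.conv2(self.conv1(x)))
        return self.fc(self.pool(x).squeeze(-1))

model = EDeepCNNSketch()
scores = model(torch.randint(0, 16, (4, 40)))  # batch of 4 sequences of 40 2-mer numbers
```

A batch of four 40-position 2-mer index sequences yields one binding score per sequence.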
S2, carrying out K-mer coding on the DNA sequence, training a data set of the eDeepCNN model by using an embedded vector as an input representation of a K-mer in the eDeepCNN model, and carrying out feature extraction and binding prediction;
K-mer encoding uses a sliding window of length k and treats the k adjacent nucleotides inside the window as the basic building block of the DNA sequence, which directly and conveniently characterizes the interdependence of adjacent nucleotides. When k = 1, a single nucleotide is the basic unit of the DNA sequence, giving 4 single nucleotides (A, C, G, T) in total, which amounts to assuming at the encoding level that nucleotides are mutually independent. When k = 2, two adjacent nucleotides are taken together as the basic unit, giving a total of 16 (4²) dinucleotides: AA, AC, AG, AT, CA, CC, CG, CT, …, TA, TC, TG, TT; dinucleotide encoding explicitly accounts for the interaction between two neighbouring nucleotides. Similarly, when k = 3, there are 64 (4³) independent trinucleotides, allowing the dependence between three adjacent nucleotides to be modeled directly. Under K-mer encoding, the number of independent K-mers grows exponentially with k, so the total number rises sharply as k increases.
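The exponential growth of the K-mer vocabulary can be checked directly; a minimal sketch in plain Python (the helper name `all_kmers` is our own):

```python
from itertools import product

def all_kmers(k, alphabet="ACGT"):
    """Enumerate all independent K-mers over the nucleotide alphabet,
    in lexicographic order (AA, AC, AG, ... for k = 2)."""
    return ["".join(p) for p in product(alphabet, repeat=k)]

assert len(all_kmers(1)) == 4    # single nucleotides A, C, G, T
assert len(all_kmers(2)) == 16   # dinucleotides AA ... TT
assert len(all_kmers(3)) == 64   # trinucleotides
assert all_kmers(2)[:4] == ["AA", "AC", "AG", "AT"]
```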
The eDeepCNN model outperformed the comparison methods. With 1-mer encoding, the average R² of the eDeepCNN-1mer model on 20 data sets reaches 0.59: compared with a DeepCNN model using a one-hot representation, this is a relative improvement of 4% (an absolute gain of 2.5%), and compared with the single-layer convolutional DeepBind model, a relative improvement of 22% (an absolute gain of 10.9%).
After combining K-mer encoding with the embedded vector representation, the model index improves further: on 10 data sets, the average index of the eDeepCNN-2mer model is 0.596, higher than the 0.573 of the eDeepCNN-1mer model, an improvement of 4%.
In this embodiment, the candidate sequence is traversed using a sliding window of length k, and the K-mer within the window is recorded as its corresponding sequence number, thereby converting the DNA sequence into an array that can be computed on. FIG. 3 compares one-hot encoding with K-mer encoding. For example, when k = 2 there are 16 independent 2-mers: the dinucleotide AA corresponds to the number 0, AC to 1, TG to 14, and TT to 15. For a K-mer s = s_1 s_2 … s_k, the corresponding sequence number d(s) is

$$d(s) = \sum_{i=1}^{k} d(s_i)\cdot 4^{\,k-i}$$

where s denotes the input K-mer, d(s) outputs its sequence number, s_i denotes the nucleotide at position i of the K-mer, and d(s_i) maps each nucleotide to its own number:

$$d(\mathrm{A}) = 0,\quad d(\mathrm{C}) = 1,\quad d(\mathrm{G}) = 2,\quad d(\mathrm{T}) = 3.$$
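The base-4 numbering and sliding-window encoding implied by the examples above (AA → 0, AC → 1, TG → 14, TT → 15) can be sketched in plain Python; the function names are our own:

```python
NUCLEOTIDE_INDEX = {"A": 0, "C": 1, "G": 2, "T": 3}

def kmer_number(s):
    """Map a K-mer s = s_1 ... s_k to its sequence number d(s),
    reading the nucleotides as base-4 digits (A=0, C=1, G=2, T=3)."""
    d = 0
    for nt in s:
        d = d * 4 + NUCLEOTIDE_INDEX[nt]
    return d

def encode_sequence(seq, k):
    """Slide a window of length k over the DNA sequence and record
    the sequence number of each K-mer in the window."""
    return [kmer_number(seq[i:i + k]) for i in range(len(seq) - k + 1)]

# examples from the text: AA -> 0, AC -> 1, TG -> 14, TT -> 15
assert kmer_number("AA") == 0 and kmer_number("AC") == 1
assert kmer_number("TG") == 14 and kmer_number("TT") == 15
```

With k = 2, the sequence "AACG" yields the 2-mers AA, AC, CG and hence the array [0, 1, 6].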
Embedded vector representation is widely used in many fields such as natural language processing, information extraction and recommendation systems. An embedded vector is a point in a high-dimensional hidden space; its position in that space carries a great deal of information, so embedded vectors have stronger representation capability than one-hot encoding, and the relative positions of the embedded vectors of different K-mers in the space can better represent the interaction relationships between the K-mers. On the other hand, under one-hot encoding there are 4^k independent K-mers in total, each forming a 4^k-dimensional one-hot vector. When k = 1 there are 4 different 1-mers; when k = 2, 16 different 2-mers; when k = 5, 1024 independent 5-mers, producing a 1024-dimensional vector after one-hot encoding. The dimension of the one-hot vector rises exponentially with k, so the number of model parameters explodes and training becomes difficult. Embedded vector encoding, in contrast, effectively reduces the dimension of the convolutional network's input vector and avoids the parameter explosion problem. Moreover, the embedded vector dimension is adjustable: an optimal value can be searched for during model training, giving the method strong flexibility.
In this embodiment, a one-to-one mapping relationship is constructed between the K-mer sequence numbers and the corresponding embedded vectors, and a sequence consisting of K-mer sequence numbers is obtained after K-mer encoding is performed on the candidate sequence. And finding out corresponding embedded vectors in a table look-up mode according to the serial numbers of the K-mers, and forming a two-dimensional array by the embedded vectors in sequence. In the neural network, the conversion is realized by using an embedded vector layer, the embedded vector layer maintains an embedded vector matrix, each row in the matrix represents an embedded vector corresponding to a corresponding sequence number, and in actual operation, the embedded vectors of the rows corresponding to the matrix are found according to the sequence numbers in the input sequence. Before training begins, the embedded vector matrix is initialized randomly, and embedded vectors corresponding to the K-mers are adjusted and optimized step by step according to training data.
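The table look-up described above can be illustrated with a small NumPy sketch. The matrix sizes are illustrative assumptions; in the model the look-up is performed by the embedding layer, whose matrix is randomly initialized and then learned from the training data.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, embed_dim = 16, 8  # 2-mer vocabulary; embedding width is a free hyper-parameter

# the embedding matrix: one row per K-mer sequence number, randomly initialized
embedding_matrix = rng.normal(size=(vocab_size, embed_dim))

kmer_numbers = [0, 1, 14, 15]              # AA, AC, TG, TT
embedded = embedding_matrix[kmer_numbers]  # table look-up: one row per K-mer number
# the sequence of K-mer numbers becomes a 2-D array of shape (seq_len, embed_dim)
```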
An optional embedded vector layer is added before the convolutional network, and the corresponding model is called eDeepCNN. The detailed parameter settings of the model, including the width and number of convolution kernels in each convolutional layer, are listed in Table 1 below. Some hyper-parameter settings inherit from DeepBind, the classic model for the motif mining task, as these parameters have proved to be good choices; the remaining parts are determined by hyper-parameter grid search during training.
TABLE 1
[Table 1: width and number of convolution kernels per layer; reproduced only as an image in the original publication]
S3, comparing the deep embedded convolutional neural network eDeepCNN model with a shallow layer network, and verifying the superiority of the deep embedded convolutional neural network eDeepCNN model.
In the machine learning paradigm, the whole data set is divided into a training set and a test set. The model learns the task target by optimizing its parameters on the training set, capturing the regularities in the training data, and the trained model is then applied to the test set to check its actual effectiveness. The key point is that the model should learn the general rules of the task from the training set, so that it also performs well when applied to the test set. In practice, however, a model often overfits the noise in the training data, or learns rules specific to the training set that do not transfer to the test set. In that case the model performs well on the training set but poorly on the test set; this is the so-called overfitting phenomenon.
The Dropout strategy is an effective means of dealing with the overfitting problem in deep neural networks. During training, Dropout randomly masks a subset of neurons at each parameter-optimization step, forcing their output values to zero, which is equivalent to randomly discarding part of the neurons in the network. During that optimization step, the weights of the discarded neurons remain unchanged, as shown in FIG. 4.
In this embodiment, the Dropout strategy multiplies each neuron output by 0 with probability p during training. To compensate for the resulting reduction of the input to the next layer, the unmasked output values are scaled by the coefficient 1/(1 - p). At test time the neuron outputs are left unchanged, with no masking. The Dropout computation is as follows:

$$\tilde{x} = \frac{1}{1-p}\, r \odot x, \qquad r_j \sim \mathrm{Bernoulli}(1-p)$$

$$y = f(w\tilde{x} + b)$$

where each component of the mask r follows a 0-1 (Bernoulli) distribution and equals 0 with probability p, x denotes the neuron input vector, w and b denote the neural network weight and bias parameters, and f denotes the neuron activation function.
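The inverted-Dropout computation described above (zero with probability p during training, rescale survivors by 1/(1-p), leave outputs unchanged at test time) can be sketched in NumPy; the function name is our own:

```python
import numpy as np

def dropout_forward(x, p, rng, training=True):
    """Inverted Dropout: during training, zero each activation with
    probability p and rescale the survivors by 1/(1-p) so the expected
    activation is preserved; at test time, return the input unchanged."""
    if not training or p == 0.0:
        return x
    r = rng.random(x.shape) >= p  # keep mask: each entry ~ Bernoulli(1 - p)
    return x * r / (1.0 - p)

rng = np.random.default_rng(1)
x = np.ones(10000)
y = dropout_forward(x, p=0.5, rng=rng)
# roughly half the units are zeroed, but the mean activation stays near 1
```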
Because the multiple convolutional layers increase the number and complexity of model parameters, the risk of overfitting rises substantially. Therefore, dropout layers are placed in both the convolutional network and the fully connected network, combined with an L2 regularization strategy, to help the model resist overfitting during training and thus improve performance.
The coefficient of determination R² is used to measure the correlation between the predicted output and the measured values. R² has been used in past studies to measure the predictive performance of models on PBM in vitro data sets. R² is calculated as follows:

$$R^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}$$

where y_i denotes the label (measured) value of sample i, ȳ denotes the mean of the label values, and ŷ_i denotes the predicted value of sample i.

The quantity 1 - R² is the ratio of the mean squared error between the predicted and measured values to the inherent variance of the measured data. The closer R² is to 1, the smaller the model's prediction error relative to the intrinsic variance of the data set, and hence the better the model's predictive performance. Because R² is normalized by the intrinsic variance of each data set, the model's performance can be compared across different transcription factor data sets, and the evaluation indices over multiple data sets can be averaged to better gauge performance on the transcription factor binding task.
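A small sketch of the R² computation (our own helper, matching the formula above):

```python
import numpy as np

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 minus the ratio of the model's
    squared error to the inherent variance of the measured data."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)          # model squared error
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # intrinsic variance of the labels
    return 1.0 - ss_res / ss_tot

assert r_squared([1, 2, 3, 4], [1, 2, 3, 4]) == 1.0          # perfect prediction
assert r_squared([1, 2, 3, 4], [2.5, 2.5, 2.5, 2.5]) == 0.0  # predicting the mean
```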
To evaluate the performance of the model accurately, a five-fold cross-validation strategy is used. Five-fold cross-validation repeats the experiment five times in total, each time with a different training/test split. The whole data set is randomly divided into five parts of equal size; in each experiment, four parts serve as the training set and the remaining part as the test set, so that five different test sets are selected in turn across the five experiments. During training, one eighth of the training set is randomly sampled as the validation set. The final performance of the model is taken as the mean of its R² on the test sets across the five cross-validation runs. The five-fold cross-validation process is illustrated in FIG. 5.
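The five-fold partitioning can be sketched in plain Python; the function name is our own, and for simplicity the sketch assumes the sample count is divisible by five:

```python
import random

def five_fold_splits(n_samples, seed=0):
    """Randomly partition sample indices into five equal-sized folds.
    Each fold serves once as the test set; the other four form the
    training set (from which a validation subset can then be sampled)."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    fold_size = n_samples // 5
    folds = [idx[i * fold_size:(i + 1) * fold_size] for i in range(5)]
    splits = []
    for i in range(5):
        test = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        splits.append((train, test))
    return splits

splits = five_fold_splits(100)  # five (train, test) index pairs
```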
By capturing the interaction of a motif with its adjacent nucleotide sequence in the transcription factor binding prediction task, the invention provides a combination of K-mer encoding with embedded vector representation, and a deep embedded convolutional neural network, eDeepCNN. Compared with a single-layer convolutional network, the multi-layer convolutional network can capture the context information of the motif sequence and the interaction between the motif and adjacent sequences, making full use of the fitting capability of a convolutional neural network. Whereas the PWM model assumes mutual independence between adjacent nucleotides, K-mer encoding explicitly models the dependency between adjacent nucleotides in the DNA sequence and implicitly captures the shape information of the DNA sequence; compared with one-hot encoding, the embedded vector representation has stronger representation capability and greater flexibility, and can fully characterize the implicit information contained in each K-mer.
The above-described embodiments are merely illustrative of the preferred embodiments of the present invention, and do not limit the scope of the present invention, and various modifications and improvements of the technical solutions of the present invention can be made by those skilled in the art without departing from the spirit of the present invention, and the technical solutions of the present invention are within the scope of the present invention defined by the claims.

Claims (8)

1. A motif mining method based on a deep embedded convolutional neural network, characterized by comprising the following steps:
s1, constructing a deep embedded convolutional neural network eDeepCNN model;
s2, carrying out K-mer coding on the DNA sequence, training a data set of the eDeepCNN model by using an embedded vector as an input representation of a K-mer in the eDeepCNN model, and carrying out feature extraction and binding prediction;
s3, comparing the eDeepCNN model with a shallow network, and verifying the superiority of the eDeepCNN model.
2. The motif mining method based on the deep embedded convolutional neural network of claim 1, wherein the eDeepCNN model in S1 includes three convolutional layers, and a local max pooling layer and a dropout layer are placed after each convolutional layer to help the deep embedded convolutional neural network model resist overfitting during training.
3. The motif mining method based on the deep embedded convolutional neural network of claim 2, wherein the three convolutional layers are respectively a first, a second and a third convolutional layer, the first convolutional layer being responsible for extracting local sequence patterns, and the second and third convolutional layers modeling the interactions between the local patterns.
4. The motif mining method based on the deep embedded convolutional neural network of claim 3, wherein the first convolutional layer computes a motif score sequence that is used as the input of the second convolutional layer, which identifies the local distribution pattern of the score sequence so as to capture the interaction of the motif with adjacent sequences; the third convolutional layer operates in the same manner as the second.
5. The method of claim 1, wherein in step S2 the embedded vector is a point in a high-dimensional hidden space, the relative positions of the embedded vectors of different K-mers in that space representing the interaction relationships between them; a one-to-one mapping between K-mer sequence numbers and the corresponding embedded vectors is established, yielding a sequence consisting of K-mer sequence numbers.
6. The method of claim 5, wherein the embedded vectors are sequentially grouped into a two-dimensional array according to the K-mer sequence numbers by looking up a table, and the two-dimensional array is transformed into an embedded vector matrix through an embedded vector layer.
7. The method of claim 1, wherein in step S2, before training, the embedded vector matrix is initialized randomly, and the embedded vectors corresponding to K-mers are adjusted and optimized according to training data.
8. The motif mining method based on the deep embedded convolutional neural network of claim 1, wherein in S3 a five-fold cross-validation strategy is adopted to evaluate the accuracy of the eDeepCNN model.
CN202110509307.4A 2021-05-11 2021-05-11 Motif mining method based on deep embedded convolutional neural network Pending CN113096732A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110509307.4A CN113096732A (en) 2021-05-11 2021-05-11 Motif mining method based on deep embedded convolutional neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110509307.4A CN113096732A (en) 2021-05-11 2021-05-11 Motif mining method based on deep embedded convolutional neural network

Publications (1)

Publication Number Publication Date
CN113096732A true CN113096732A (en) 2021-07-09

Family

ID=76664951

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110509307.4A Pending CN113096732A (en) 2021-05-11 2021-05-11 Motif mining method based on deep embedded convolutional neural network

Country Status (1)

Country Link
CN (1) CN113096732A (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102206699A (en) * 2010-07-14 2011-10-05 上海聚类生物科技有限公司 Method for prediction of transcription factor binding site (TFBS)
CN110335639A (en) * 2019-06-13 2019-10-15 哈尔滨工业大学(深圳) Transcription factor binding site prediction algorithm and device across transcription factors
CN111341386A (en) * 2020-02-17 2020-06-26 大连理工大学 Attention-introducing multi-scale CNN-BiLSTM method for predicting non-coding RNA interaction relations
CN111667884A (en) * 2020-06-12 2020-09-15 天津大学 Convolutional neural network model for predicting protein interactions using protein primary sequences based on attention mechanism
CN111696624A (en) * 2020-06-08 2020-09-22 天津大学 DNA binding protein identification and function annotation deep learning method based on self-attention mechanism
CN112270955A (en) * 2020-10-23 2021-01-26 大连民族大学 Method for predicting RBP binding sites of lncRNA (long non-coding RNA) with an attention mechanism

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Yindong Zhang et al.: "Predicting in-Vitro Transcription Factor Binding Sites with Deep Embedding Convolution Network", ICIC 2020: Intelligent Computing Theories and Application *

Similar Documents

Publication Publication Date Title
Zhang et al. An end-to-end deep learning architecture for graph classification
CN106778014B (en) Disease risk prediction modeling method based on recurrent neural network
CN110334843B (en) Time-varying attention improved Bi-LSTM hospitalization and hospitalization behavior prediction method and device
CN109086805B (en) Clustering method based on deep neural network and pairwise constraints
CN110490320B (en) Deep neural network structure optimization method based on fusion of prediction mechanism and genetic algorithm
CN114927162A (en) Multi-set correlation phenotype prediction method based on hypergraph representation and Dirichlet distribution
CN110993113B (en) LncRNA-disease relation prediction method and system based on MF-SDAE
Jiang et al. A hybrid intelligent model for acute hypotensive episode prediction with large-scale data
CN107577924A (en) A kind of long-chain non-coding RNA subcellular location prediction algorithm based on deep learning
CN112599187B (en) Method for predicting drug and target protein binding fraction based on double-flow neural network
Maulik Analysis of gene microarray data in a soft computing framework
CN112215259B (en) Gene selection method and apparatus
Hota Diagnosis of breast cancer using intelligent techniques
CN102073882A (en) Method for matching and classifying spectrums of hyperspectral remote sensing image by DNA computing
CN113257359A (en) CRISPR/Cas9 guide RNA editing efficiency prediction method based on CNN-SVR
Shen et al. Simultaneous genes and training samples selection by modified particle swarm optimization for gene expression data classification
CN112926640A (en) Cancer gene classification method and equipment based on two-stage depth feature selection and storage medium
CN101324926A (en) Method for selecting characteristic facing to complicated mode classification
CN117034767A (en) Ceramic roller kiln temperature prediction method based on KPCA-GWO-GRU
Nagae et al. Automatic layer selection for transfer learning and quantitative evaluation of layer effectiveness
CN113096732A (en) Motif mining method based on deep embedded convolutional neural network
CN116541785A (en) Toxicity prediction method and system based on deep integration machine learning model
CN116504331A (en) Frequency score prediction method for drug side effects based on multiple modes and multiple tasks
Ullah et al. Crow-ENN: An Optimized Elman Neural Network with Crow Search Algorithm for Leukemia DNA Sequence Classification
CN114596913B (en) Protein folding identification method and system based on depth central point model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210709