CN113506595A - Method for identifying DNA promoter element based on information theory - Google Patents

Info

Publication number
CN113506595A
Authority
CN
China
Prior art keywords
information
sequence
promoter
trinucleotide
mer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110907396.8A
Other languages
Chinese (zh)
Inventor
郭菲
吕一诺
何文颖
唐继军
曹晶
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianjin University
Original Assignee
Tianjin University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
2021-08-09
Publication date
2021-10-15
Application filed by Tianjin University
Priority to CN202110907396.8A
Publication of CN113506595A

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B30/00 ICT specially adapted for sequence analysis involving nucleotides or amino acids
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/16 Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines

Abstract

The invention discloses a method for identifying DNA promoter elements based on information theory. The method is built on a two-layer recognition model that distinguishes different types of promoters. The two-layer recognition model performs promoter sequence recognition through the following steps. Step 101: acquire a promoter sequence data set from an Escherichia coli database. Step 102: extract the position-specific frequencies of trinucleotide composition information and dinucleotide composition information from the DNA promoter sequence data with the PSTNP algorithm. Step 103: optimize the position-specific frequency information of the trinucleotide and dinucleotide composition information. The promoter element type identification layer resamples the data sets of the different promoter types with the SMOTE algorithm. The invention addresses the prediction of DNA promoters and their specific types, and applies an information-theoretic method to optimize the features derived from the extracted sequence frequency information, which markedly improves prediction accuracy.

Description

Method for identifying DNA promoter element based on information theory
Technical Field
The invention belongs to the field of functional element prediction algorithms in bioinformatics, and particularly relates to a method for identifying DNA promoter elements based on information theory.
Background
Promoters are DNA regulatory elements located upstream of a gene, near its transcription start site. They control the initiation of gene-specific transcription and determine the timing and level of gene expression. Accurately locating promoters is therefore important for studying gene structure and annotating gene information at the genome level. A promoter is recognized by a sigma factor when that factor binds specifically to RNA polymerase; because sigma factors differ in function and structure, promoters are classified into six types: σ24, σ28, σ32, σ38, σ54, and σ70. At present, researchers still identify these promoters mainly by biological experiments. However, because such experiments consume considerable time and material, computational methods are increasingly favored for this classification task.
Disclosure of Invention
The invention aims to provide a method for accurately and efficiently predicting DNA promoter elements and their types. The PSTNP algorithm used by the invention extracts the position-specific information of nucleotides well, and the invention further improves PSTNP with an information content scoring matrix, which describes the difference between the frequency matrices of the positive and negative samples more clearly. The position-specific feature matrices of the trinucleotides and dinucleotides are combined, and a two-layer prediction model based on a support vector machine is then constructed: the first layer judges whether a sequence is a promoter, and the second layer predicts the type of each identified promoter. The model obtains good prediction performance.
The invention solves the problem of identifying and predicting DNA promoter elements and their types, and comprises the following in sequence:
A method for identifying DNA promoter elements based on information theory, the method being based on a two-layer recognition model for distinguishing different types of promoters, the two-layer recognition model consisting of a promoter element identification layer and a promoter element type identification layer, wherein:
promoter sequence recognition is carried out through the following steps:
Step 101: acquiring a promoter sequence data set from an Escherichia coli database;
Step 102: extracting, with the PSTNP algorithm, the position-specific frequencies of trinucleotide composition information and dinucleotide composition information from the DNA promoter sequence data; the position-specific frequencies comprise the position-specific frequency information F+ and F- of the trinucleotides and dinucleotides on the positive and negative data sets;
Step 103: optimizing the position-specific frequency information of the trinucleotide composition information and the dinucleotide composition information by the following formula:
Figure BDA0003202201460000021
wherein F+ and F- respectively represent the distributions of the frequency matrices obtained from the positive-sample and negative-sample frequency information, and
Figure BDA0003202201460000022
represents the degree of difference between the positive-sample and negative-sample frequencies of the 4^k-th trinucleotide or dinucleotide occurring at position 81-k+1 of the sequence.
The promoter element type identification layer resamples the data sets of the different promoter types using the SMOTE algorithm.
Further, the position-specific nucleotide composition information of the promoter sequence obtained in step 102 is generated by the following steps:
2.1 each 81 bp sequence sample S is written as:
S = N_1 N_2 … N_l … N_81
wherein N_l denotes the nucleotide at position l, which is one of A, C, G, and T;
2.2 position-specific information of the promoter sequence is extracted by the k-mer method, with k = 3 and k = 2 respectively;
2.3 the position-specific frequency information F+ and F- of the trinucleotides and dinucleotides is calculated over the entire positive and negative data sets respectively, expressed as follows:
Figure BDA0003202201460000031
and
Figure BDA0003202201460000032
wherein
Figure BDA0003202201460000033
or
Figure BDA0003202201460000034
represents the frequency of the 4^k-th trinucleotide (3mer_i) or dinucleotide (2mer_i) occurring at position 81-k+1 of the sequence; 3mer_i denotes AAA, AAC, …, TTT, and 2mer_i denotes AA, AC, …, TT.
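The frequency computation of steps 2.1 to 2.3 can be sketched in Python as follows. This is a minimal illustration and not the patented implementation: the function names `kmer_index` and `position_specific_frequencies`, the row ordering of the k-mers, and the choice to skip k-mers containing characters other than A, C, G, and T are assumptions made for the example.

```python
from itertools import product

ALPHABET = "ACGT"


def kmer_index(k):
    """Map each of the 4**k k-mers (AAA, AAC, ..., TTT for k=3) to a row index."""
    return {"".join(p): i for i, p in enumerate(product(ALPHABET, repeat=k))}


def position_specific_frequencies(sequences, k):
    """Return a 4**k x (L-k+1) matrix of per-position k-mer frequencies.

    Entry [i][j] is the fraction of sequences whose k-mer starting at
    0-based position j is the i-th k-mer.
    """
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("all sequences must have the same length")
    index = kmer_index(k)
    n_pos = length - k + 1
    counts = [[0.0] * n_pos for _ in range(4 ** k)]
    for seq in sequences:
        for j in range(n_pos):
            row = index.get(seq[j:j + k])
            if row is not None:  # assumption: ignore k-mers with ambiguous bases
                counts[row][j] += 1.0
    n = len(sequences)
    return [[c / n for c in row] for row in counts]
```

Calling the function once on the positive set and once on the negative set, for k = 3 and k = 2, gives the four matrices that play the role of F+ and F-. For 81 bp sequences the matrices have 79 columns (k = 3) and 80 columns (k = 2).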
Further, the optimization of the position-specific frequency information of the trinucleotide composition information and the dinucleotide composition information in step 103 further comprises:
with the position-specific frequency information of the trinucleotide composition information and the dinucleotide composition information, each sequence sample S can be represented as:
S = [φ_1, φ_2, …, φ_w, …, φ_{81-k+1}]^T
where T is the transpose operator.
For trinucleotides, φ_w is defined as follows:
Figure BDA0003202201460000035
wherein w is the position of the trinucleotide in the sequence, and
Figure BDA0003202201460000036
represents the degree of difference between the positive-sample and negative-sample frequencies of the 4^k-th trinucleotide (3mer_i) occurring at position w of the sequence; 3mer_i denotes AAA, AAC, …, TTT.
For dinucleotides, φ_w is defined as follows:
Figure BDA0003202201460000041
wherein w is the position of the dinucleotide in the sequence, and
Figure BDA0003202201460000042
represents the degree of difference between the positive-sample and negative-sample frequencies of the 4^k-th dinucleotide (2mer_i) occurring at position w of the sequence; 2mer_i denotes AA, AC, …, TT.
Advantageous Effects
The invention uses an information-theoretic method to process sequence frequency information for DNA promoter element identification and type prediction. The PSTNP algorithm extracts the position-specific nucleotide information of the promoter sequence to represent the sequence, and PSTNP is improved with an information content scoring matrix, which enlarges the difference between the discrete distributions of the frequency matrices of the positive and negative samples. The feature matrices of the trinucleotides and dinucleotides are combined to retain as much useful feature information as possible. Finally, the invention constructs a two-layer prediction model based on a support vector machine: the first layer judges whether a sequence is a promoter, and the second layer predicts the specific type of each identified promoter. The model achieves good prediction performance; its prediction accuracy is higher than that of other existing models, and it is of value for research on DNA promoter element recognition and type prediction.
Drawings
FIG. 1 is a flow chart of the computational process of the present invention;
FIG. 2 is a performance comparison of different k-mer selections;
FIG. 3 is a performance comparison of three information theory algorithms in feature optimization;
FIG. 4 is a performance comparison of six feature selection algorithms;
FIG. 5 is a performance comparison of different resampling strategies in the second-layer promoter type prediction;
FIG. 6 is a performance comparison with two existing promoter prediction models.
Detailed Description
Promoters determine the initiation of sequence-specific DNA transcription and are important regulatory elements required for gene expression. Identifying and locating promoters helps to locate genes accurately and provides important guidance for annotating the structural and functional information of a genome. During gene transcription, RNA polymerase requires a specific sigma protein factor to assist recognition when it binds to a specific promoter, so sigma factors are commonly used to label promoter types as σ24, σ28, σ32, σ38, σ54, and σ70. Traditional biological experiments for identifying promoters and their types are time-consuming, labor-intensive, and costly; by comparison, identifying and classifying promoters with bioinformatics algorithms is more economical and convenient.
The basic idea of the invention is as follows: extract the position-specific information of promoter sequences, optimize and improve the features, and construct a two-layer prediction model based on a support vector machine. The first layer judges whether a sequence is a promoter; the second layer predicts the specific type of each identified promoter.
The invention mainly comprises the following steps. First, a DNA promoter sequence data set is constructed. Next, the PSTNP algorithm is used to obtain the position-specific k-mer nucleotide composition information of the promoter sequences, the extracted sequence information is optimized with an information content scoring matrix, and the trinucleotide and dinucleotide feature matrices are merged to obtain more feature information. Finally, a prediction model is constructed with a support vector machine algorithm to identify promoters and their types. The flow chart of the whole calculation process is shown in FIG. 1. With this two-layer prediction model, better prediction results can be obtained than with other existing models.
Step (1): obtaining the Escherichia coli (E. coli K-12) promoter sequence data set from the RegulonDB database (version 9.3) and removing redundant sequences with CD-HIT;
Step (2): obtaining the position-specific trinucleotide composition information and dinucleotide composition information of the DNA promoter sequences with the PSTNP algorithm;
Step (3): calculating an information content scoring matrix and optimizing the extracted sequence information based on information theory;
Step (4): merging the feature matrices of the trinucleotides and the dinucleotides;
Step (5): constructing a prediction model with a support vector machine and identifying DNA promoter sequences;
Step (6): resampling the data sets of the different promoter types with the SMOTE algorithm to address the imbalance between the data sets;
Step (7): constructing a prediction model and identifying the different types of promoter sequences.
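The control flow implied by steps (5) to (7) can be sketched as a simple dispatch: the second layer is consulted only for sequences that the first layer accepts. In this sketch `is_promoter` and `promoter_type` are hypothetical callables standing in for the two trained support vector machine models; the function name and the `"non-promoter"` label are also assumptions.

```python
def predict_two_layer(features, is_promoter, promoter_type):
    """Two-layer prediction for one encoded sequence.

    Layer 1 decides promoter vs. non-promoter; layer 2 assigns a sigma
    type and runs only when layer 1 accepts the sequence.
    """
    if not is_promoter(features):
        return "non-promoter"
    return promoter_type(features)
```

Keeping the two layers as separate callables means each can be trained and evaluated on its own data set, which matches the description of a separately resampled data set for the type-identification layer.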
Further, the position-specific nucleotide composition information of the DNA promoter sequence in step (2) is generated by the following steps:
2.1 each 81 bp sequence sample S is written as:
S = N_1 N_2 … N_l … N_81
wherein N_l denotes the nucleotide at position l, which is one of A, C, G, and T;
2.2 position-specific information of the promoter sequence is extracted by the k-mer method, with k = 3 and k = 2 respectively;
2.3 the position-specific frequency information F+ and F- of the trinucleotides and dinucleotides is calculated over the entire positive and negative data sets respectively, expressed as follows:
Figure BDA0003202201460000061
and
Figure BDA0003202201460000062
wherein
Figure BDA0003202201460000063
or
Figure BDA0003202201460000064
represents the frequency of the 4^k-th trinucleotide (3mer_i) or dinucleotide (2mer_i) occurring at position 81-k+1 of the sequence; 3mer_i denotes AAA, AAC, …, TTT, and 2mer_i denotes AA, AC, …, TT.
Further, the optimization of the PSTNP algorithm with the information content scoring matrix in step (3) is expressed as follows:
3.1 the sequence information is optimized with the information content scoring matrix, using a method based on information theory; the process is expressed as:
Figure BDA0003202201460000071
wherein F+ and F- respectively represent the distributions of the frequency matrices obtained from the positive-sample and negative-sample frequency information, and
Figure BDA0003202201460000072
represents the degree of difference between the positive-sample and negative-sample frequencies of the 4^k-th trinucleotide or dinucleotide occurring at position 81-k+1 of the sequence.
3.2 each sequence sample S can then be represented as:
S = [φ_1, φ_2, …, φ_w, …, φ_{81-k+1}]^T
where T is the transpose operator.
For trinucleotides, φ_w is defined as follows:
Figure BDA0003202201460000073
wherein w is the position of the trinucleotide in the sequence, and
Figure BDA0003202201460000074
represents the degree of difference between the positive-sample and negative-sample frequencies of the 4^k-th trinucleotide (3mer_i) occurring at position w of the sequence; 3mer_i denotes AAA, AAC, …, TTT.
For dinucleotides, φ_w is defined as follows:
Figure BDA0003202201460000081
wherein w is the position of the dinucleotide in the sequence, and
Figure BDA0003202201460000082
represents the degree of difference between the positive-sample and negative-sample frequencies of the 4^k-th dinucleotide (2mer_i) occurring at position w of the sequence; 2mer_i denotes AA, AC, …, TT.
Following this calculation method, all prediction experiments were evaluated with 5-fold cross-validation. First, when obtaining the position-specific k-mer nucleotide composition information, several choices were tried, such as 1-mer, 2-mer, and 3-mer; the performance comparison is shown in FIG. 2. The method that combines the position-specific features of trinucleotides and dinucleotides performs best, with an overall accuracy significantly higher than the others, so this hybrid method was chosen for feature extraction. Next, three information-theoretic methods were tried for improving the PSTNP algorithm, as shown in FIG. 3. The features constructed with the information content scoring matrix give the best predictions (Acc: 90.05%), with a significant advantage over the original PSTNP method and over the features processed with KL divergence or JS divergence. After processing the features with this optimal scheme, different classification algorithms were compared under 5-fold cross-validation, as shown in FIG. 4. The classifier constructed with an SVM gives the best results in terms of Acc (90.05%) and MCC (0.68).
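For reference, the two divergences named as alternatives can be written down from their standard definitions. The text does not say how they were turned into features, so the sketch below only shows the definitions applied to two discrete distributions, for example the k-mer distribution at one position in the positive set versus the negative set. The function names and the small `eps` guard against log(0) are assumptions.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """Kullback-Leibler divergence KL(p || q) in bits.

    eps is an assumed smoothing constant so that k-mers unseen in q do
    not produce a division by zero.
    """
    return sum(pi * math.log2((pi + eps) / (qi + eps))
               for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence: symmetric, close to [0, 1] in bits."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)
```

KL divergence is asymmetric and unbounded, whereas JS divergence is symmetric and bounded, which is one reason the two can behave differently when used to score positive versus negative frequency differences.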
In the second layer of the classifier, the six promoter types are identified and classified. The number of samples is extremely unbalanced across the data sets of the different promoter types. The six subsets were therefore resampled, and three data sets were constructed: the original data set Data I, the data set Data II undersampled with CD-HIT, and the data set Data III oversampled with SMOTE. The classification results on the three data sets are compared in FIG. 5. On the original data set Data I, the classification results differ greatly between promoter types and the performance is unbalanced. Each subset of Data II is small, with only 96 samples, yet this data set yielded the best set of results. The SMOTE-processed data set Data III has a more reliable size, with 500 samples in each subset. Its results, in terms of sensitivity, specificity, Acc, and MCC, are more balanced than those of the unprocessed Data I but slightly worse than those of Data II.
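The idea behind SMOTE oversampling is to create synthetic minority samples by interpolating between a real sample and one of its nearest neighbours from the same class. The following is a simplified, standard-library sketch of that idea and not the configuration used in the experiments; the function name `smote`, its parameters, and the defaults (`k=5`, `seed=0`) are assumptions. In practice a library implementation such as the `SMOTE` class in imbalanced-learn would normally be used.

```python
import random


def smote(samples, n_new, k=5, seed=0):
    """Generate n_new synthetic feature vectors for one minority class.

    Each synthetic point lies on the line segment between a randomly
    chosen sample and one of its k nearest neighbours in the same class.
    """
    if len(samples) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(samples) - 1)

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(samples))
        base = samples[i]
        # indices of the k nearest neighbours of the chosen sample
        neighbours = sorted(
            (j for j in range(len(samples)) if j != i),
            key=lambda j: sq_dist(base, samples[j]),
        )[:k]
        other = samples[rng.choice(neighbours)]
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic
```

To bring a subset up to the 500 samples mentioned above, `n_new` would be 500 minus the current subset size.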
Finally, the performance of different classifiers on the promoter classification problem was compared under 5-fold cross-validation. The invention was compared with two other classification methods on the same data set, as shown in FIG. 6. The results show that the iPro2L-PSTKNC classifier provided by the invention performs best in the first layer, reaching an Acc of 90.05% when distinguishing promoters from non-promoters. In the second layer, the model also has the best performance for each promoter type, with an accuracy above 91%.
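A 5-fold cross-validation split can be sketched as follows: the samples are shuffled once, dealt into five folds, and each fold serves as the test set exactly once while the remaining four are used for training. The function name `k_fold_indices` and the fixed seed are assumptions, and the sketch does not stratify by class, which the actual experiments may or may not have done.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_indices, test_indices) for each of k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # fold sizes differ by at most 1
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test
```

The reported score for a model is then the average of the metric over the five test folds.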
In conclusion, the invention provides an improved feature extraction algorithm based on PSTNP that effectively describes the position-specific nucleotide information of promoter sequences. An SVM algorithm is then applied to build a classification model, whose performance is evaluated by 5-fold cross-validation. Promoters that have been identified are further classified by type, and a resampling algorithm is used to handle the imbalance between the data sets of the different promoter types. Compared with the most advanced existing classifiers, the proposed prediction model shows clear improvements in evaluation metrics such as sensitivity, specificity, accuracy, and the Matthews correlation coefficient (MCC). It provides a more effective method for promoter prediction and identification, and its calculation process is simple, easy to implement, and widely applicable.
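The four evaluation metrics follow from the confusion matrix of a binary classifier by their standard definitions. The sketch below computes them from the counts of true positives, true negatives, false positives, and false negatives; the function name `binary_metrics` and the convention of returning 0.0 when a denominator is zero are assumptions.

```python
import math


def binary_metrics(tp, tn, fp, fn):
    """Sensitivity, specificity, accuracy and MCC from confusion counts."""
    sn = tp / (tp + fn) if tp + fn else 0.0   # sensitivity (recall)
    sp = tn / (tn + fp) if tn + fp else 0.0   # specificity
    acc = (tp + tn) / (tp + tn + fp + fn)     # overall accuracy
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"Sn": sn, "Sp": sp, "Acc": acc, "MCC": mcc}
```

MCC ranges from -1 to 1 and stays informative on unbalanced data, which is why it is commonly reported alongside accuracy for promoter type classification.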

Claims (3)

1. A method for identifying DNA promoter elements based on information theory, wherein the method is based on a two-layer recognition model for distinguishing different types of promoters, the two-layer recognition model consisting of a promoter element identification layer and a promoter element type identification layer, wherein:
promoter sequence recognition is carried out through the following steps:
Step 101: acquiring a promoter sequence data set from an Escherichia coli database;
Step 102: extracting, with the PSTNP algorithm, the position-specific frequencies of trinucleotide composition information and dinucleotide composition information from the DNA promoter sequence data; the position-specific frequencies comprise the position-specific frequency information F+ and F- of the trinucleotides and dinucleotides on the positive and negative data sets;
Step 103: optimizing the position-specific frequency information of the trinucleotide composition information and the dinucleotide composition information by the following formula:
Figure FDA0003202201450000011
wherein F+ and F- respectively represent the distributions of the frequency matrices obtained from the positive-sample and negative-sample frequency information, and
Figure FDA0003202201450000012
represents the degree of difference between the positive-sample and negative-sample frequencies of the 4^k-th trinucleotide or dinucleotide occurring at position 81-k+1 of the sequence.
The promoter element type identification layer resamples the data sets of the different promoter types using the SMOTE algorithm.
2. The method of claim 1, wherein the position-specific nucleotide composition information of the promoter sequence obtained in step 102 is generated by the following steps:
2.1 each 81 bp sequence sample S is written as:
S = N_1 N_2 … N_l … N_81
wherein N_l denotes the nucleotide at position l, which is one of A, C, G, and T;
2.2 position-specific information of the promoter sequence is extracted by the k-mer method, with k = 3 and k = 2 respectively;
2.3 the position-specific frequency information F+ and F- of the trinucleotides and dinucleotides is calculated over the entire positive and negative data sets respectively, expressed as follows:
Figure FDA0003202201450000021
and
Figure FDA0003202201450000022
wherein
Figure FDA0003202201450000023
or
Figure FDA0003202201450000024
represents the frequency of the 4^k-th trinucleotide (3mer_i) or dinucleotide (2mer_i) occurring at position 81-k+1 of the sequence; 3mer_i denotes AAA, AAC, …, TTT, and 2mer_i denotes AA, AC, …, TT.
3. The method of claim 1, wherein the optimization of the position-specific frequency information of the trinucleotide composition information and the dinucleotide composition information in step 103 further comprises:
with the position-specific frequency information of the trinucleotide composition information and the dinucleotide composition information, each sequence sample S can be represented as:
S = [φ_1, φ_2, …, φ_w, …, φ_{81-k+1}]^T
where T is the transpose operator.
For trinucleotides, φ_w is defined as follows:
Figure FDA0003202201450000031
wherein w is the position of the trinucleotide in the sequence, and
Figure FDA0003202201450000033
represents the degree of difference between the positive-sample and negative-sample frequencies of the 4^k-th trinucleotide (3mer_i) occurring at position w of the sequence; 3mer_i denotes AAA, AAC, …, TTT;
for dinucleotides, φ_w is defined as follows:
Figure FDA0003202201450000032
wherein w is the position of the dinucleotide in the sequence, and
Figure FDA0003202201450000034
represents the degree of difference between the positive-sample and negative-sample frequencies of the 4^k-th dinucleotide (2mer_i) occurring at position w of the sequence; 2mer_i denotes AA, AC, …, TT.
CN202110907396.8A 2021-08-09 2021-08-09 Method for identifying DNA promoter element based on information theory Pending CN113506595A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110907396.8A CN113506595A (en) 2021-08-09 2021-08-09 Method for identifying DNA promoter element based on information theory

Publications (1)

Publication Number Publication Date
CN113506595A true CN113506595A (en) 2021-10-15

Family

ID=78015853

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110907396.8A Pending CN113506595A (en) 2021-08-09 2021-08-09 Method for identifying DNA promoter element based on information theory

Country Status (1)

Country Link
CN (1) CN113506595A (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
AU2006249249A1 (en) * 1997-06-03 2007-01-11 Rutgers, The State University Of New Jersey Plastid promoters for transgene expression in the plastids of higher plants
CN111161793A * 2020-01-09 2020-05-15 青岛科技大学 Stacking integration based method for predicting N6-methyladenosine modification sites in RNA

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
WENYING HE: "EnhancerPred2.0: predicting enhancers and their strength based on position-specific trinucleotide propensity and electron–ion interaction potential feature selection", 《MOLECULAR BIOSYSTEMS》 *
YINUO LYU 等: "《iEnhancer-KL: A Novel Two-Layer Predictor》", 《IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS》 *
YINUO LYU 等: "《iPro2L-PSTKNC: A Two-Layer Predictor for Discovering Various Types of Promoters by Position Specific of Nucleotide Composition》", 《IEEE JOURNAL OF BIOMEDICAL AND HEALTH INFORMATICS》 *
刘枫: "霍山石斛 cpDNA 全序列微卫星分布及分子鉴别研究", 《中药材》 *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20211015