CN113257359A - CRISPR/Cas9 guide RNA editing efficiency prediction method based on CNN-SVR - Google Patents

Info

Publication number
CN113257359A
Authority
CN
China
Prior art keywords
cnn
guide rna
svr
model
sequence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110639647.9A
Other languages
Chinese (zh)
Inventor
张桂珊
陈耀文
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shantou University
Original Assignee
Shantou University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shantou University
Priority to CN202110639647.9A
Publication of CN113257359A
Current legal status: Pending

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00 ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
    • G16B40/20 Supervised data analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Medical Informatics (AREA)
  • Mathematical Physics (AREA)
  • Molecular Biology (AREA)
  • Biomedical Technology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Computing Systems (AREA)
  • Bioethics (AREA)
  • Epidemiology (AREA)
  • Databases & Information Systems (AREA)
  • Public Health (AREA)
  • Biotechnology (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The embodiment of the invention discloses a CNN-SVR-based method for predicting CRISPR/Cas9 guide RNA editing efficiency, comprising the following steps: (1) constructing a reference data set and an independent test set, and encoding both; (2) constructing a CNN, pre-training it on the encoded reference data set, and extracting abstract features of the guide RNA sequences; (3) ranking the abstract features extracted by the CNN by importance using the minimum-redundancy maximum-relevance (mRMR) method, and using a sequential forward search algorithm to add features to the feature subset one by one in order of importance; (4) inputting the constructed feature subset into an SVR to predict guide RNA editing efficiency, tuning the model parameters according to the Spearman correlation coefficient and the AUROC value until an optimal solution is obtained, and saving the trained model; (5) applying the trained CNN-SVR model in combination with a transfer learning strategy to improve the prediction accuracy of the CNN-SVR on the independent test set. The method has high prediction accuracy and strong robustness.

Description

CRISPR/Cas9 guide RNA editing efficiency prediction method based on CNN-SVR
Technical Field
The invention relates to a gene editing technology, in particular to a method for predicting CRISPR/Cas9 guide RNA editing efficiency.
Background
The CRISPR/Cas9 system is derived from a bacterial defense mechanism and is currently one of the most widely used gene editing tools. The technology can edit and modify specific positions on the genome, and it has brought revolutionary changes to biology, biotechnology, medicine and related fields. CRISPR/Cas9 consists of Cas9, which has nuclease activity, and a specifically programmed guide RNA. The guide RNA directs the complex to the target genomic region by recognizing the 3' PAM, and recognition and cleavage are completed through complementary base pairing. The efficiency of gene editing depends largely on the activity of the guide RNA, and the activity of different guide RNAs varies widely, which leads to large differences in gene editing efficiency. However, the specific factors that affect guide RNA editing efficiency have not been fully clarified. Cas9 can also bind to unintended genomic sites, resulting in off-target effects. Designing guide RNAs with high editing efficiency and low off-target effects is therefore an important research problem in optimizing this system. Accurate prediction of guide RNA editing efficiency helps to design guide RNAs with greater activity, maximizing the editing efficiency at the target site while minimizing off-target effects. Computer-aided prediction of guide RNA editing efficiency is thus one of the key steps for successful gene editing with the CRISPR/Cas9 system.
Machine learning has gradually been applied to the prediction of CRISPR/Cas9 guide RNA editing efficiency. Such methods assess the cleavage performance of a given guide RNA sequence from the characteristics of its nucleotides. Their predictive performance relies on manual feature engineering based on prior knowledge. In addition, manually extracted features may introduce redundant information, which in turn degrades the prediction. The design rules of current machine learning methods for guide RNA activity prediction are incomplete or even biased, and they still require the integrated analysis of large amounts of data.
Disclosure of Invention
The technical problem to be solved by the embodiment of the invention is to provide a CRISPR/Cas9 guide RNA editing efficiency prediction method based on CNN-SVR. The method automatically extracts abstract features of the input sequence, avoids manual feature selection, and shows better potential for guide RNA activity prediction.
In order to solve the technical problem, the embodiment of the invention provides a CRISPR/Cas9 guide RNA editing efficiency prediction method based on CNN-SVR, which comprises the following steps:
(1) A reference data set comprising guide RNA sequences and their binary labels is constructed. Data sets from different platforms are integrated and subjected to standardized preprocessing to construct a test set, which contains guide RNA sequences and their corresponding editing efficiency values. The guide RNA editing efficiency is normalized as follows: a matrix $Y$ of guide RNAs and editing efficiencies is constructed, whose rows and columns represent the experiments and the guide RNAs, respectively. The entry $y_{ij}$ denotes the guide RNA editing efficiency in the $i$-th experiment and is normalized as defined below:
[normalization formula: equation image not available in the source text]
where $y_{nor}$ denotes the normalized guide RNA editing efficiency, and the remaining terms of the formula are the mean of each row, the mean of each column, and the mean of $Y$.
(2) Encoding the reference data set and the test set: the guide RNA sequences of the reference data set and the independent test set obtained in step (1) are one-hot encoded to obtain binary matrices. One-hot encoding represents an input sequence as a binary matrix of size $4 \times L$, where 4 is the number of nucleotide types (A, C, G and T) and $L$ is the sequence length. Each position of the sequence is represented by a binary vector of length 4, where A, C, G and T are represented by $[1, 0, 0, 0]$, $[0, 1, 0, 0]$, $[0, 0, 1, 0]$ and $[0, 0, 0, 1]$, respectively.
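The encoding in step (2) can be illustrated with a minimal sketch. The function name `one_hot_encode` and the list-of-rows layout are illustrative assumptions; the patent does not name an implementation. Rows follow the A, C, G, T order given in the text.

```python
# Sketch only: the function name and the 4 x L list-of-rows layout are
# illustrative assumptions, not taken from the patent.
NUCLEOTIDES = "ACGT"  # row order follows the A, C, G, T order in the text


def one_hot_encode(seq):
    """Return a 4 x L binary matrix (list of 4 rows) for a DNA sequence."""
    seq = seq.upper()
    unknown = set(seq) - set(NUCLEOTIDES)
    if unknown:
        raise ValueError(f"unsupported characters: {sorted(unknown)}")
    # Row r holds a 1 wherever the sequence carries nucleotide NUCLEOTIDES[r].
    return [[1 if base == nt else 0 for base in seq] for nt in NUCLEOTIDES]


matrix = one_hot_encode("ACGT")
for nt, row in zip(NUCLEOTIDES, matrix):
    print(nt, row)
```

For a 23-nt guide sequence the same function yields 4 rows of 23 entries, with exactly one 1 in every column.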
(3) Constructing a convolutional neural network (CNN) model: a CNN model is constructed, and its base network parameters are trained on the reference data set. The number of convolutional layers, the size and length of the convolution kernels in each convolutional layer, the number of fully connected layers, and the number of neurons in each fully connected layer are determined empirically. The input layer feeds the one-hot encoded guide RNA (a $4 \times 23$ binary matrix) into a one-dimensional convolutional layer (conv_1) with 256 one-dimensional convolution kernels of length 5 and stride 1, which uses the ReLU activation function and is regularized by dropout with a rate of 0.3. The second layer (conv_2) has the same structure as conv_1. The output of conv_2 is flattened and passed through four fully connected layers (FC_1, FC_2, FC_3, FC_4) with 256, 128, 64 and 40 neurons, respectively. Finally, the outputs of the last fully connected layers of the two branches are combined and fed into the output layer, a fully connected layer (FC) containing a single neuron with a linear activation function; the loss function is the mean squared error (MSE). The hyperparameters of the CNN are optimized by grid search, which yields an optimal dropout rate of 0.3, a batch size of 256 and 200 epochs, and thereby determines the optimal CNN model structure.
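To make the layer sizes in step (3) concrete, the following sketch walks a single branch of the network and prints each layer's output shape and parameter count. It assumes convolutions without padding, which the text does not specify, and the helper names and the label `FC_out` for the single-neuron output layer are illustrative.

```python
# Sketch only: "valid" (no) padding and all helper names are assumptions;
# the patent text does not state the padding mode.

def conv1d_out_len(length, kernel, stride=1):
    """Output length of a 1-D convolution without padding."""
    return (length - kernel) // stride + 1


def conv1d_params(in_channels, filters, kernel):
    return in_channels * kernel * filters + filters  # weights + biases


def dense_params(n_in, n_out):
    return n_in * n_out + n_out  # weights + biases


seq_len, channels = 23, 4  # one-hot encoded guide RNA, 4 x 23
layers = []

# conv_1 and conv_2: 256 kernels of length 5, stride 1
for name in ("conv_1", "conv_2"):
    params = conv1d_params(channels, 256, 5)
    seq_len, channels = conv1d_out_len(seq_len, 5), 256
    layers.append((name, (seq_len, channels), params))

width = seq_len * channels  # flatten
for name, units in (("FC_1", 256), ("FC_2", 128), ("FC_3", 64),
                    ("FC_4", 40), ("FC_out", 1)):
    layers.append((name, (units,), dense_params(width, units)))
    width = units

for name, shape, params in layers:
    print(f"{name:7s} output shape {shape}  parameters {params}")
```

Under the no-padding assumption most of the parameters sit in FC_1, directly after the flatten step; with "same" padding the flattened width, and therefore that count, would be larger.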
(4) Extracting important sequence features with a feature selection method: the abstract features of the guide sequences extracted by the CNN are ranked by importance using minimum-redundancy maximum-relevance (mRMR), and the 13 top-ranked features are selected for further training.
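The ranking in step (4) can be sketched as a greedy mRMR loop. This is an illustration under stated assumptions: absolute Pearson correlation stands in for the dependence measure (classical mRMR uses mutual information, and the patent does not say which variant it uses), the score is relevance minus mean redundancy, and the function names and toy data are hypothetical.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length lists (0.0 if one is constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0
    return cov / math.sqrt(va * vb)


def mrmr_rank(columns, target, k):
    """Greedily rank k feature columns by relevance minus mean redundancy."""
    relevance = [abs(pearson(col, target)) for col in columns]
    selected, remaining = [], list(range(len(columns)))
    while remaining and len(selected) < k:
        def score(j):
            if not selected:
                return relevance[j]
            redundancy = sum(abs(pearson(columns[j], columns[s]))
                             for s in selected) / len(selected)
            return relevance[j] - redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data (hypothetical): column 1 is a near-duplicate of column 0.
f_a = [1, -1, 1, -1]
f_dup = [1, -1, 1, -2]
f_b = [1, 1, -1, -1]
target = [3, -1, 1, -3]  # 2 * f_a + f_b
ranking = mrmr_rank([f_a, f_dup, f_b], target, k=3)
print(ranking)
```

In the toy data the two near-duplicate columns are not ranked next to each other: once one of them is selected, the other is penalized for redundancy and falls behind the weaker but independent column. Selecting the 13 top-ranked CNN features corresponds to calling the function with `k=13`.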
(5) Constructing the SVR: the abstract features selected in step (4) are input into an SVR with a Gaussian radial basis function (RBF) kernel for training. A grid search finds the optimal penalty parameter $C$ = 1.7, kernel parameter $\gamma$ = 0.12 and $\epsilon$ parameter = 0.11, giving the optimal SVR model.
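The roles of the kernel parameter and the epsilon parameter in step (5) can be shown with a short sketch. It assumes the common parameterization exp(-gamma * ||x - z||^2) of the Gaussian RBF kernel and the standard epsilon-insensitive loss of support vector regression; the patent does not spell out either formula, and the function names are illustrative.

```python
import math

# Values reported in the text for the optimal SVR.
C, GAMMA, EPSILON = 1.7, 0.12, 0.11


def rbf_kernel(x, z, gamma=GAMMA):
    """Gaussian RBF kernel, assumed form: exp(-gamma * ||x - z||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


def epsilon_insensitive_loss(y_true, y_pred, epsilon=EPSILON):
    """Errors within epsilon cost nothing; larger errors grow linearly."""
    return max(0.0, abs(y_true - y_pred) - epsilon)


print(rbf_kernel([0.0, 0.0], [1.0, 0.0]))
print(epsilon_insensitive_loss(0.50, 0.55))  # inside the epsilon tube
print(epsilon_insensitive_loss(0.50, 0.80))  # outside the epsilon tube
```

The penalty parameter C weights the summed loss against the flatness of the regression function. If scikit-learn were the library in use (an assumption, since the patent names none), the same settings would correspond to `sklearn.svm.SVR(kernel="rbf", C=1.7, gamma=0.12, epsilon=0.11)`.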
(6) Further improving the prediction accuracy of the CNN-SVR model with transfer learning: the CNN-SVR base model is trained from scratch on the reference data set. The parameters of all layers of the resulting base model except the last two fully connected layers are then transferred for prediction on a specific test set, which improves the prediction accuracy of the model.
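The transfer step in (6) amounts to deciding, layer by layer, which parameters are reused unchanged and which are fine-tuned. The sketch below assumes that the "last two fully connected layers" are FC_4 and the single-neuron output layer, here labelled `FC_out`; that reading, the layer list and the function name are illustrative assumptions.

```python
# Sketch only: layer names follow step (3); "FC_out" and the reading that the
# last two fully connected layers are FC_4 and the output layer are assumptions.
LAYERS = ["conv_1", "conv_2", "FC_1", "FC_2", "FC_3", "FC_4", "FC_out"]


def transfer_plan(layers, n_finetune=2):
    """Mark every layer as frozen except the last n_finetune ones."""
    cutoff = len(layers) - n_finetune
    return {name: ("fine-tune" if i >= cutoff else "frozen")
            for i, name in enumerate(layers)}


plan = transfer_plan(LAYERS)
for name, mode in plan.items():
    print(f"{name:7s} {mode}")
```

In a deep learning framework the "frozen" entries would correspond to layers whose weights are copied from the base model and excluded from gradient updates.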
The embodiment of the invention has the following beneficial effects: building on a classical CNN, the invention replaces the final linear regression layer with an SVR, combining the CNN's ability to extract abstract features of guide RNA sequences with the SVR's strength in regression on high-dimensional data, which improves prediction accuracy. The invention also trains a base model on the reference data set and, using transfer learning, fine-tunes only the parameters of the last two fully connected layers when predicting on the independent test set, which improves the robustness of the model and its prediction accuracy on small data sets.
Drawings
FIG. 1 is a flow chart of the CNN-SVR prediction method of the present invention;
FIG. 2 is a diagram of the CNN network architecture in the present invention;
FIG. 3 is a flow chart of the model training method in the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail with reference to the accompanying drawings.
The application discloses a CRISPR/Cas9 guide RNA editing efficiency prediction method based on CNN-SVR, which predicts the editing efficiency of guide RNAs accurately and robustly. The successive layers of the CNN let the model learn abstract features automatically, and the last layer of the network can be regarded as a linear classifier operating on the features extracted by the preceding hidden layers. Because the multilayer perceptron that follows the CNN feature extraction layers contains many trainable parameters, the CNN is not always the best choice for classification. An SVR with a fixed kernel function is good at handling such feature vectors and has a significant advantage in minimizing generalization error. The CNN and the SVR are therefore combined into a CNN-SVR model: the CNN automatically extracts the sequence features, and the SVR operates in the learned high-dimensional feature space, performing regression on the abstract guide RNA sequence features extracted by the CNN and outputting the predicted guide RNA editing efficiency value.
The CNN-SVR must be trained before it is used to predict CRISPR/Cas9 guide RNA editing efficiency. The invention is therefore divided into two parts: the first part trains the model, and the second part predicts the guide RNA editing efficiency of the test set. The main flow is shown in FIG. 1, the CNN network structure in FIG. 2, and the flow of the model training method in FIG. 3.
Because data are obtained on different platforms and under different experimental conditions, the model must be trained to ensure its validity. Training a model from scratch on a given training set comprises the following steps:
First, open-source CRISPR/Cas9 guide RNA editing efficiency data from different experimental platforms are integrated to construct the reference (benchmark) data set.
Second, the reference data set is preprocessed: each guide RNA sequence of 23 bp is first converted by one-hot encoding into a $4 \times 23$ binary matrix, and the guide RNA editing efficiency values are then normalized to obtain the normalized editing efficiency values.
Third, the CNN is trained from scratch on the reference data set, and its hyperparameters are optimized by grid search in the following order: weight initialization method ("zero", "he_uniform", "uniform", "glorot_uniform", "lecun_uniform", "normal", "he_normal"), dropout rate (0.2, 0.3, 0.4, 0.5, 0.6), batch size (64, 128, 256, 512) and number of epochs (50, 100, 200, 300). To avoid overfitting, the model is trained with five-fold cross-validation, and the setting that minimizes the average validation loss is selected for the optimal model.
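One reading of this step is a search that tunes one hyperparameter at a time in the stated order, scoring each candidate by its mean validation loss over five folds. The sketch below follows that reading. The `evaluate` callback, the starting defaults and all function names are assumptions, the initializer names are written in their Keras spelling, and `toy_loss` is a stand-in for actually training the CNN.

```python
# Sketch only: `evaluate` stands in for "mean validation loss over five-fold
# cross-validation"; its signature and the starting defaults are assumptions.
SEARCH_ORDER = [
    ("initializer", ["zero", "he_uniform", "uniform", "glorot_uniform",
                     "lecun_uniform", "normal", "he_normal"]),
    ("dropout", [0.2, 0.3, 0.4, 0.5, 0.6]),
    ("batch_size", [64, 128, 256, 512]),
    ("epochs", [50, 100, 200, 300]),
]


def kfold_indices(n_samples, k=5):
    """Yield (train_idx, val_idx) pairs for k contiguous folds."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, val
        start += size


def sequential_search(evaluate, order=SEARCH_ORDER):
    """Tune one hyperparameter at a time, keeping earlier choices fixed."""
    best = {name: values[0] for name, values in order}  # arbitrary start
    for name, values in order:
        losses = {v: evaluate({**best, name: v}) for v in values}
        best[name] = min(losses, key=losses.get)
    return best


def toy_loss(cfg):
    """Hypothetical loss surface used only to exercise the search loop."""
    return ((0 if cfg["initializer"] == "he_normal" else 1)
            + abs(cfg["dropout"] - 0.3)
            + abs(cfg["batch_size"] - 256) / 1000
            + abs(cfg["epochs"] - 200) / 1000)


best = sequential_search(toy_loss)
print(best)
```

A real `evaluate` would train the CNN on each `train` split from `kfold_indices`, compute the loss on the matching `val` split, and return the average over the five folds.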
Fourth, the features extracted by the CNN are optimized in two steps. First, the features are ranked using minimum-redundancy maximum-relevance (mRMR). Second, the optimal feature set is determined by sequential forward search: the features are added to the SVR training one by one, from the highest to the lowest mRMR importance, and the feature subset that maximizes AUROC is selected as the optimal feature subset.
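The second of the two steps can be sketched as a loop over the ranked features that keeps the best-scoring prefix. The callback `score_subset` is a placeholder for "train an SVR on this subset and return its AUROC"; the callback, the function name and the toy scores are illustrative assumptions.

```python
def sequential_forward_search(ranked_features, score_subset):
    """Add features in ranked order; return the best-scoring prefix and score."""
    best_subset, best_score = [], float("-inf")
    current = []
    for feature in ranked_features:
        current.append(feature)
        score = score_subset(list(current))
        if score > best_score:
            best_subset, best_score = list(current), score
    return best_subset, best_score


# Hypothetical scores indexed by subset size, standing in for measured AUROC.
toy_auroc = {1: 0.70, 2: 0.78, 3: 0.81, 4: 0.80}
subset, score = sequential_forward_search(
    ["f7", "f2", "f9", "f4"], lambda feats: toy_auroc[len(feats)])
print(subset, score)
```

With the toy scores the search stops improving after the third feature, so the three top-ranked features form the selected subset.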
Fifth, the selected optimal feature subset is input into an SVR with a Gaussian radial basis function (RBF) kernel for training and evaluation, and a grid search finds the optimal penalty parameter $C$, kernel parameter $\gamma$ and $\epsilon$ parameter, giving the optimal SVR model. The parameter grid search ranges are as follows:
[grid search ranges for $C$, $\gamma$ and $\epsilon$: equation images not available in the source text]
The second part predicts the guide RNA editing efficiency of the test set. After the CNN-SVR has been trained, it is used to predict the editing efficiency of the guide RNAs to be analyzed, in the following steps:
First, the independent test set to be analyzed is preprocessed: each guide RNA sequence of 23 bp is first converted by one-hot encoding into a $4 \times 23$ binary matrix, and the guide RNA editing efficiency values are then normalized to obtain the normalized editing efficiency values.
Second, the data to be analyzed are predicted using transfer learning: the parameters of the CNN model pre-trained on the reference data set are transferred to the independent test set. Specifically, the parameters of the convolutional layers, the pooling layer and the first three fully connected layers, learned from the reference data set, are all frozen and transferred, and only the weights of the last two fully connected layers are fine-tuned. The model parameters are tuned according to the Spearman correlation coefficient and the AUROC value until an optimal solution is obtained, and the trained model is then saved.
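The two evaluation measures named above can be computed as in the following sketch, which uses the textbook definitions: Spearman as the Pearson correlation of average ranks, and AUROC through the rank-sum (Mann-Whitney U) identity with 0/1 labels. The function names are illustrative, and the patent does not say which implementation it uses.

```python
def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Pearson correlation of the rank-transformed values."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


def auroc(labels, scores):
    """AUROC via the rank-sum identity; labels must be 0 or 1."""
    ranks = average_ranks(scores)
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    rank_sum = sum(r for r, label in zip(ranks, labels) if label == 1)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


print(spearman([0.1, 0.4, 0.35, 0.8], [0.2, 0.5, 0.3, 0.9]))
print(auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

Spearman compares predicted and measured efficiencies as continuous values, while AUROC needs binary labels such as those of the reference data set.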
While the invention has been described in connection with what is presently considered to be the most practical and preferred embodiment, it is to be understood that the invention is not to be limited to the disclosed embodiment, but on the contrary, is intended to cover various modifications and equivalent arrangements included within the spirit and scope of the appended claims.

Claims (4)

1. A CRISPR/Cas9 guide RNA editing efficiency prediction method based on CNN-SVR, characterized by comprising the following steps:
S1: constructing a reference data set containing guide RNA sequences and their binary labels; integrating data sets from different platforms and performing standardized preprocessing to construct a test set, wherein the test set comprises guide RNA sequences and their corresponding editing efficiency values;
S2: one-hot encoding the guide RNA sequences of the reference data set and the test set to obtain binary matrices;
S3: constructing a convolutional neural network (CNN) model and training the base network parameters of the CNN model on the reference data set; optimizing the hyperparameters of the CNN by grid search, the hyperparameters comprising the dropout rate, the batch size and the number of epochs, and determining the optimal CNN model structure;
S4: ranking the abstract features of the guide sequences extracted by the CNN by importance using minimum-redundancy maximum-relevance (mRMR), and selecting the 13 top-ranked features for subsequent training;
S5: inputting the abstract features into an SVR with a Gaussian radial basis function kernel for training, and finding the optimal penalty parameter C, kernel parameter γ and ϵ parameter by grid search to obtain the optimal SVR model;
S6: training a CNN-SVR base model from scratch on the reference data set, and transferring the parameters of all layers of the resulting base model except the last two fully connected layers for prediction on a specific test set.
2. The CNN-SVR-based CRISPR/Cas9 guide RNA editing efficiency prediction method according to claim 1, wherein the standardized preprocessing in step S1 comprises:
constructing a matrix $Y$ of guide RNAs and editing efficiencies, whose rows and columns represent the experiments and the guide RNAs, respectively, wherein the entry $y_{ij}$ denotes the guide RNA editing efficiency in the $i$-th experiment and is normalized as defined below:
[normalization formula: equation image not available in the source text]
where $y_{nor}$ denotes the normalized guide RNA editing efficiency, and the remaining terms of the formula are the mean of each row, the mean of each column, and the mean of $Y$.
3. The CNN-SVR-based CRISPR/Cas9 guide RNA editing efficiency prediction method according to claim 2, wherein in step S2 the one-hot encoding represents the input sequence as a binary matrix of size $4 \times L$, where 4 is the number of nucleotide types, namely adenine A, guanine G, cytosine C and thymine T, and $L$ is the sequence length; each position of the sequence is represented by a binary vector of length 4, where A, C, G and T are represented by $[1, 0, 0, 0]$, $[0, 1, 0, 0]$, $[0, 0, 1, 0]$ and $[0, 0, 0, 1]$, respectively.
4. The CNN-SVR-based CRISPR/Cas9 guide RNA editing efficiency prediction method according to claim 3, wherein the penalty parameter $C$ is 1.7, the kernel parameter $\gamma$ is 0.12 and the $\epsilon$ parameter is 0.11.
CN202110639647.9A 2021-06-08 2021-06-08 CRISPR/Cas9 guide RNA editing efficiency prediction method based on CNN-SVR Pending CN113257359A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110639647.9A CN113257359A (en) 2021-06-08 2021-06-08 CRISPR/Cas9 guide RNA editing efficiency prediction method based on CNN-SVR

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110639647.9A CN113257359A (en) 2021-06-08 2021-06-08 CRISPR/Cas9 guide RNA editing efficiency prediction method based on CNN-SVR

Publications (1)

Publication Number Publication Date
CN113257359A (en) 2021-08-13

Family

ID=77187119

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110639647.9A Pending CN113257359A (en) 2021-06-08 2021-06-08 CRISPR/Cas9 guide RNA editing efficiency prediction method based on CNN-SVR

Country Status (1)

Country Link
CN (1) CN113257359A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115099275A (en) * 2022-06-29 2022-09-23 西南医科大学 Training method of arrhythmia diagnosis model based on artificial neural network
WO2023207686A1 (en) * 2022-04-29 2023-11-02 京东方科技集团股份有限公司 Gene editing result prediction method and apparatus, electronic device, program and medium
WO2024164131A1 (en) * 2023-02-07 2024-08-15 中国科学院脑科学与智能技术卓越创新中心 Method and system for predicting base editing efficiency

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110010194A (en) * 2019-04-10 2019-07-12 浙江科技学院 A kind of prediction technique of RNA secondary structure

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
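GUISHAN ZHANG et al.: "A Novel Hybrid CNN-SVR for CRISPR/Cas9 Guide RNA Activity Prediction", 《FRONTIERS IN GENETICS》 *
张向荣 et al. (eds.): 《Pattern Recognition》 (模式识别), 31 May 2019, Xi'an: Xidian University Press (西安电子科技大学出版社) *
张桂珊 et al.: "Application of machine learning methods in the CRISPR/Cas9 system" (机器学习方法在CRISPR/Cas9 系统中的应用), 《Hereditas (Beijing)》 (遗传) *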

Similar Documents

Publication Publication Date Title
CN111341386B (en) Attention-introducing multi-scale CNN-BilSTM non-coding RNA interaction relation prediction method
CN113257359A (en) CRISPR/Cas9 guide RNA editing efficiency prediction method based on CNN-SVR
US20220301658A1 (en) Machine learning driven gene discovery and gene editing in plants
CN114927162A (en) Multi-set correlation phenotype prediction method based on hypergraph representation and Dirichlet distribution
CN111400494B (en) Emotion analysis method based on GCN-Attention
US11861491B2 (en) Deep learning-based pathogenicity classifier for promoter single nucleotide variants (pSNVs)
CN108427865B (en) Method for predicting correlation between LncRNA and environmental factors
Huang et al. Harnessing deep learning for population genetic inference
CN103020979A (en) Image segmentation method based on sparse genetic clustering
CN115908909A (en) Evolutionary neural architecture searching method and system based on Bayes convolutional neural network
El-Tohamy et al. A deep learning approach for viral DNA sequence classification using genetic algorithm
EP4032093B1 (en) Artificial intelligence-based epigenetics
CN117216656A (en) 4mC site recognition algorithm based on pruning pre-training model and artificial feature code fusion
CN116343908B (en) Method, medium and device for predicting protein coding region by fusing DNA shape characteristics
Sanchez Reconstructing our past: deep learning for population genetics
Lahmer et al. Classification of DNA microarrays using deep learning to identify cell cycle regulated genes
CN115691817A (en) LncRNA-disease association prediction method based on fusion neural network
CN117012280A (en) Method for constructing DNA sequence pre-training language model and application thereof
Kristensen et al. Classification of DNA Sequences by a MLP and SVM Network
Chowdhury et al. Cell type identification from single-cell transcriptomic data via gene embedding
Dong et al. Cell type identification from single-cell transcriptomic data via semi-supervised learning
CN116994645B (en) Prediction method of piRNA and mRNA target pair based on interactive reasoning network
Guo et al. Deep Effective k-mer representation learning for polyadenylation signal prediction via co-occurrence embedding
de Abreu Development of DNA sequence classifiers based on deep learning
CN118629494A (en) Genome prediction method based on Transformer

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (Application publication date: 20210813)