CN112863599B - Automatic analysis method and system for virus sequencing sequence - Google Patents

Automatic analysis method and system for virus sequencing sequence

Publication number
CN112863599B
Authority
CN
China
Prior art keywords
sequence
virus
sequencing
genome
network model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110271331.9A
Other languages
Chinese (zh)
Other versions
CN112863599A (en)
Inventor
刘健
孙嘉良
陈娇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nankai University
Original Assignee
Nankai University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nankai University
Priority to CN202110271331.9A
Publication of CN112863599A
Application granted
Publication of CN112863599B
Active legal status
Anticipated expiration

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B30/00 ICT specially adapted for sequence analysis involving nucleotides or amino acids

Abstract

The invention discloses an automatic analysis method and system for virus sequencing sequences, comprising: performing quality control and sequence assembly on a virus sequencing sequence to obtain a virus genome long sequence; encoding the virus genome long sequence and then identifying its type with a pre-trained deep learning network model; and annotating the virus sequencing sequence based on the sequence alignment of the virus genome long sequence against a reference genome. To address the rapid growth of virus sequencing data and the large amount of hard disk space that alignment databases occupy, the invention introduces deep learning to build an identification model, and provides virus annotation functions in addition to virus type identification.

Description

Automatic analysis method and system for virus sequencing sequence
Technical Field
The invention relates to the technical field of gene sequencing analysis, in particular to an automatic analysis method and system of a virus sequencing sequence.
Background
The statements in this section merely provide background information related to the present disclosure and may not necessarily constitute prior art.
Several new viruses capable of causing large-scale human mortality, such as SARS (severe acute respiratory syndrome), influenza A virus H1N1, MERS (Middle East respiratory syndrome) and Ebola virus, have emerged in the last two decades, yet research on virus identification remains insufficient. Existing virus identification tools usually rely on BLAST alignment against genome or protein databases. As virus data grows by multiples or even exponentially, this approach becomes progressively slower, so existing approaches can no longer meet virus identification requirements in the face of the rapidly growing amount of virus sequencing data. In addition, because of this rapid growth, the databases used by alignment-based methods occupy more and more hard disk space.
Disclosure of Invention
To address the rapid growth of virus sequencing data and the large amount of hard disk space occupied, the invention introduces deep learning to build an identification model, which realizes virus type identification, and additionally provides a virus annotation function.
In order to achieve the purpose, the invention adopts the following technical scheme:
In a first aspect, the present invention provides a method for automated analysis of virus sequencing sequences, comprising:
performing quality control and sequence assembly on the virus sequencing sequence to obtain a virus genome long sequence;
encoding the virus genome long sequence and then performing type identification with a pre-trained deep learning network model;
annotating the virus sequencing sequence based on the sequence alignment of the virus genome long sequence against the reference genome.
In a second aspect, the present invention provides an automated analysis system for virus sequencing sequences, comprising:
a data preprocessing module configured to perform quality control and sequence assembly on the virus sequencing sequence to obtain a virus genome long sequence;
an identification module configured to encode the virus genome long sequence and then perform type identification with a pre-trained deep learning network model;
an annotation module configured to annotate the virus sequencing sequence according to the sequence alignment of the virus genome long sequence against the reference genome.
In a third aspect, the invention provides computer readable instructions which, when executed by a processor, perform the method of the first aspect.
Compared with the prior art, the invention has the following beneficial effects:
For the problem of identifying the species of a sequencing sequence from a single species, the invention provides a deep learning-based multi-class classifier. To cope with the rapidly growing amount of virus sequencing data, a deep learning method is introduced to identify virus types; compared with traditional identification methods, which must align against a large number of virus genomes, the invention can greatly improve identification speed.
The invention uses the identification model trained by the deep learning method in place of large virus databases, so that the hard disk space required is significantly reduced.
The invention not only identifies virus species through deep learning, but also provides virus annotation functions, namely phylogenetic tree analysis, traceability prediction, mutation detection and protein function annotation.
The deep learning identification and classification method introduced by the invention does not slow down noticeably as the data in the underlying database grows, and it abstracts the features of the virus data, thereby solving the problem that the databases of existing methods occupy a large amount of hard disk space and significantly improving the efficiency of virus identification.
Advantages of additional aspects of the invention will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, are included to provide a further understanding of the invention; they illustrate exemplary embodiments of the invention and, together with the description, serve to explain the invention and not to limit it.
FIG. 1 is a flowchart of the method for automated analysis of a virus sequencing sequence provided in Example 1 of the present invention;
FIG. 2 is a structure diagram of the deep learning network model provided in Example 1 of the present invention;
FIG. 3 is a flowchart of the branches of the network model provided in Example 1 of the present invention.
Detailed Description
The invention is further described below with reference to the figures and examples.
It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the invention. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of exemplary embodiments according to the invention. As used herein, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise, and it should be understood that the terms "comprises" and "comprising", and any variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
The embodiments and features of the embodiments of the present invention may be combined with each other without conflict.
Example 1
As shown in FIG. 1, this example provides a method for automated analysis of virus sequencing sequences, comprising:
S1: performing quality control and sequence assembly on the virus sequencing sequence to obtain a virus genome long sequence;
S2: encoding the virus genome long sequence and then performing type identification with a pre-trained deep learning network model;
S3: annotating the virus sequencing sequence based on the sequence alignment of the virus genome long sequence against the reference genome.
Because the efficiency of deep learning and the accuracy of gene sequence similarity calculation both depend heavily on an accurate and complete data set, this example aims to obtain a high-quality data set: 9569 virus genome sequences belonging to 137 families were downloaded from the NCBI (National Center for Biotechnology Information) FTP site.
Step S1 specifically includes:
S1-1: quality control aims to filter out low-quality sequences, i.e., sequences that may contain erroneous bases. The quality of the obtained genome data is therefore evaluated, a quality evaluation report containing indexes such as the proportion of high-quality bases, the average quality and the GC content is generated, and adapter and primer sequences are removed from the sequences;
preferably, external software such as fastp, FastQC, Trimmomatic, Cutadapt, and simple is used for the quality control operations.
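The three report indexes named in S1-1 can be illustrated with a minimal sketch. It is not the output of fastp or FastQC; the function name `fastq_quality_report`, the Phred+33 quality encoding and the Q20 threshold for a "high-quality" base are assumptions made for illustration only.

```python
def fastq_quality_report(reads, min_quality=20, phred_offset=33):
    """Summarise (sequence, quality_string) pairs taken from FASTQ records.

    Assumptions (not stated in the source): Phred+33 encoding and a
    threshold of Q20 for counting a base as high quality.
    """
    total_bases = 0
    high_quality_bases = 0
    quality_sum = 0
    gc_bases = 0
    for sequence, quality in reads:
        # Convert each quality character to its numeric Phred score.
        scores = [ord(ch) - phred_offset for ch in quality]
        total_bases += len(sequence)
        quality_sum += sum(scores)
        high_quality_bases += sum(1 for q in scores if q >= min_quality)
        gc_bases += sum(1 for base in sequence.upper() if base in "GC")
    if total_bases == 0:
        raise ValueError("no bases to evaluate")
    return {
        "high_quality_ratio": high_quality_bases / total_bases,
        "mean_quality": quality_sum / total_bases,
        "gc_content": gc_bases / total_bases,
    }


# Two toy reads: 'I' encodes Q40 and '#' encodes Q2 under Phred+33.
report = fastq_quality_report([("ACGT", "IIII"), ("GGCC", "II##")])
```

In practice these figures would come from the quality control tool's own report; the sketch only shows what each index measures.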
S1-2: sequence assembly assembles the short sequences that passed quality control into long virus genome sequences (contigs);
preferably, external assembly software such as MEGAHIT, Velvet, SPAdes and Canu is used for sequence assembly;
preferably, after the assembled contigs are obtained, this embodiment may further assess the assembly quality with evaluation indexes such as the total contig length, N50 and the average contig length, so as to determine how reliable the contigs are.
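The assembly indexes mentioned above can be computed from the contig lengths alone. The following is a minimal sketch using the common definition of N50 (the length of the contig at which the running total of lengths, taken from longest to shortest, first reaches half of the total assembly length); the function name `assembly_stats` is hypothetical.

```python
def assembly_stats(contig_lengths):
    """Return total length, N50 and mean length for a list of contig lengths."""
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("no contigs")
    total = sum(lengths)
    running = 0
    n50 = 0
    for length in lengths:
        running += length
        # N50 is the first contig length at which half the assembly is covered.
        if running * 2 >= total:
            n50 = length
            break
    return {
        "total_length": total,
        "n50": n50,
        "mean_length": total / len(lengths),
    }


stats = assembly_stats([100, 200, 300, 400, 500])
```

For the toy input, the two longest contigs (500 and 400) already cover more than half of the 1500 assembled bases, so N50 is 400.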
In this embodiment, for the training set of the deep learning network model, one thousand sequences of length 5000 were randomly selected from the 137 families; the base sequences of the viral genomes were one-hot encoded and the encoded sequences were input into the deep learning network model.
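One-hot encoding of a base sequence can be sketched as follows. The source does not specify the channel order, how ambiguous bases such as N are treated, or how sequences shorter than 5000 are handled; the A, C, G, T order, the all-zero row for unknown bases and the zero padding below are assumptions.

```python
BASES = "ACGT"  # assumed channel order


def one_hot_encode(sequence, length=5000):
    """Encode a nucleotide string as a list of `length` rows of 4 values.

    Assumptions: unknown bases (for example N) become all-zero rows,
    longer sequences are truncated and shorter ones are zero-padded.
    """
    index = {base: i for i, base in enumerate(BASES)}
    encoded = []
    for base in sequence.upper()[:length]:
        row = [0, 0, 0, 0]
        if base in index:
            row[index[base]] = 1
        encoded.append(row)
    # Pad with all-zero rows up to the fixed input length.
    encoded.extend([0, 0, 0, 0] for _ in range(length - len(encoded)))
    return encoded


example = one_hot_encode("ACGTN", length=6)
```

Each encoded sequence is then a fixed-size 5000 x 4 matrix, which is the kind of input a convolutional network expects.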
Preferably, this embodiment proposes a new identification method based on deep learning: a multi-class convolutional neural network model containing multiple parallel branch networks is constructed from a multi-class convolutional neural network (CNN) and a residual network. As shown in FIGS. 2 and 3, the whole model is composed of different parallel branches; each branch resembles a small independent network and uses a different architecture, which helps the neural network learn richer features of a genome sequence.
The specific structure of the multi-class convolutional neural network model is as follows:
(1) The main branch is given more layers than the other branches, making it the deepest, so that the training result is more accurate; the activation function in all convolutional layers is set to ReLU;
(2) To alleviate overfitting, this embodiment sets the regularization parameter (regularizer) in the hidden layers to 0.001;
(3) To counteract the vanishing gradient problem caused by excessive depth, this embodiment adds a residual connection on the main branch;
(4) This embodiment selects Nadam as the optimizer for model training; Nadam is Adam (RMSprop with momentum) with Nesterov momentum;
(5) This embodiment selects categorical cross-entropy as the loss function;
(6) On top of all the branches, this embodiment merges the outputs of all the branches with a concatenation layer, passes the result through two fully connected layers (dense layers), and finally outputs 137 scores through a softmax layer, whose softmax activation represents the final classification result.
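The output stage and loss function described in items (5) and (6) can be illustrated without a deep learning framework. This is a minimal sketch of the standard softmax and categorical cross-entropy definitions, not the patent's model code; a three-class toy vector stands in for the 137 family scores.

```python
import math


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def categorical_cross_entropy(probabilities, true_index, epsilon=1e-12):
    """Loss for one sample whose true class is `true_index` (one-hot label)."""
    # Clamp to epsilon so that a zero probability does not produce log(0).
    return -math.log(max(probabilities[true_index], epsilon))


probs = softmax([2.0, 1.0, 0.1])
loss = categorical_cross_entropy(probs, true_index=0)
```

The loss shrinks toward zero as the probability assigned to the true family approaches 1, which is what the optimizer drives during training.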
Preferably, the training of the deep learning network model comprises: performing feature engineering on the reference genomes to construct a training set, and training the network model with the training set.
Preferably, the virus sequencing sequence is identified with the trained deep learning network model, which outputs the probability that the virus sequencing sequence belongs to each family (a level of biological classification); the family with the highest probability is taken as the type of the virus sequencing sequence.
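Selecting the family with the highest probability amounts to an argmax over the model output. A minimal sketch follows; the function name `predict_family` is hypothetical and the three family names are placeholders chosen only to make the example readable.

```python
def predict_family(probabilities, family_names):
    """Return (family, probability) for the most probable family."""
    if len(probabilities) != len(family_names):
        raise ValueError("one probability is required per family")
    # Index of the largest probability (argmax).
    best = max(range(len(probabilities)), key=probabilities.__getitem__)
    return family_names[best], probabilities[best]


# Placeholder labels; the model in the text outputs 137 probabilities.
families = ["Coronaviridae", "Filoviridae", "Orthomyxoviridae"]
prediction = predict_family([0.1, 0.7, 0.2], families)
```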
When the virus sequencing sequence is identified with the multi-class convolutional neural network model, deep learning identification can be integrated with the traditional sequence alignment method: virus identification is also performed with the sequence alignment of the traditional BLAST software, and the two methods complement each other, which improves the final identification accuracy. For the sequence alignment method, this example builds a sequence index of the reference genomes for the alignment function of the BLAST software;
preferably, BLAST alignment against the database yields a result file containing fields such as Query id, Subject id, % identity and alignment length, and the type of the virus sequencing sequence is predicted from the matched reference genome and the class to which that reference genome belongs.
In this embodiment, the similarity (% identity) and alignment length of each reference genome (subject acc.ver) are evaluated with the conventional sequence alignment method, and the product of the alignment length and the similarity is used as the evaluation score, i.e., the sequence similarity between the virus sequencing sequence and the reference genome;
preferably, the evaluation score obtained by multiplying the alignment length by the similarity is:
$$\mathrm{score}(\text{accession.version}) = \text{identity} \times \text{alignment length}$$
where identity is the sequence similarity, alignment length is the length over which the contigs align with the reference genome, and accession.version is the accession number of the reference genome aligned with the input sequence.
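The evaluation score can be sketched for BLAST tabular output. The sketch assumes the default `-outfmt 6` column order (query id, subject id, % identity, alignment length, ...), treats % identity as a percentage, and keeps the best-scoring hit per reference when a reference appears more than once; the source does not state how multiple hits are combined, so that choice and the function names are assumptions.

```python
def score_blast_hits(tabular_lines):
    """Return {subject_id: best evaluation score} from BLAST tabular lines."""
    scores = {}
    for line in tabular_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        subject_id = fields[1]
        identity_percent = float(fields[2])
        alignment_length = int(fields[3])
        # Score = similarity x alignment length (identity as a fraction).
        score = identity_percent * alignment_length / 100.0
        if score > scores.get(subject_id, 0.0):
            scores[subject_id] = score
    return scores


def top_references(scores, n):
    """Return the n (subject_id, score) pairs with the highest scores."""
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:n]


# Hypothetical hits; REF_A.1 and REF_B.1 are made-up accession.version ids.
hits = [
    "contig_1\tREF_A.1\t98.0\t1000\t20\t0\t1\t1000\t1\t1000\t0.0\t1800",
    "contig_1\tREF_B.1\t90.0\t500\t50\t0\t1\t500\t1\t500\t1e-100\t700",
    "contig_1\tREF_A.1\t99.0\t200\t2\t0\t1\t200\t1\t200\t1e-90\t380",
]
hit_scores = score_blast_hits(hits)
ranked = top_references(hit_scores, 2)
```

The ranked list produced this way is what the annotation steps draw on when they select the most similar reference genomes.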
It is understood that the similarity calculation can also be performed with other sequence alignment methods.
In step S3, the annotation of the virus sequencing sequence includes the construction of a phylogenetic tree, specifically: according to the sequence similarity between the virus sequencing sequence and each reference genome sequence, the N reference genome sequences with the highest sequence similarity are selected and a phylogenetic tree is constructed;
preferably, the genetic distances between the reference genome sequences are calculated with the MEGA software, and the phylogenetic tree is drawn from the genetic distances with the ete module of Python.
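To make the notion of genetic distance concrete, the following minimal sketch computes the p-distance (the proportion of differing sites) between aligned sequences. It is a stand-in for illustration only: MEGA offers several distance models and the source does not say which one is used. The function names and the toy alignment are hypothetical.

```python
def p_distance(seq_a, seq_b):
    """Proportion of differing sites between two aligned sequences.

    Sites where either sequence has a gap ('-') are ignored.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


def distance_matrix(aligned):
    """Pairwise p-distances for a {name: aligned_sequence} mapping."""
    names = sorted(aligned)
    return {(x, y): p_distance(aligned[x], aligned[y]) for x in names for y in names}


matrix = distance_matrix({"ref1": "ACGT", "ref2": "ACGA", "ref3": "AC-T"})
```

A matrix of this kind is the input from which a distance-based tree can be built and then drawn.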
In this embodiment, the annotation of the virus sequencing sequence further includes mutation detection, specifically: the reference genome sequence with the highest sequence similarity is obtained, and the contigs assembled from the virus sequencing sequences are aligned against it; the possible genetic variation of the virus sequencing sequence relative to the reference genome is then determined from the positions at which the bases differ in the alignment result.
Preferably, FreeBayes is used to extract the positions of differing bases from the alignment results and generate a VCF file.
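The idea of reading variants off the positions of differing bases can be shown with a minimal sketch. It does not reproduce FreeBayes: it handles only single-base substitutions in a gapped pairwise alignment and writes VCF-style data lines without the full VCF header. The function names and the chromosome label `reference` are hypothetical.

```python
def call_substitutions(aligned_reference, aligned_contig, chrom="reference"):
    """List (chrom, position, ref_base, alt_base) for mismatching sites.

    Positions are 1-based reference coordinates; gap columns are skipped,
    so insertions and deletions are not reported in this sketch.
    """
    records = []
    position = 0
    for ref_base, alt_base in zip(aligned_reference.upper(), aligned_contig.upper()):
        if ref_base == "-":
            continue  # insertion in the contig: no reference coordinate
        position += 1
        if alt_base != "-" and alt_base != ref_base:
            records.append((chrom, position, ref_base, alt_base))
    return records


def to_vcf_lines(records):
    """Format substitution records as VCF-style tab-separated lines."""
    header = "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO"
    body = [f"{c}\t{p}\t.\t{r}\t{a}\t.\t.\t." for c, p, r, a in records]
    return [header] + body


variants = call_substitutions("ACG-TA", "ATGCT-")
vcf_lines = to_vcf_lines(variants)
```

A dedicated variant caller additionally weighs read support and base quality, which this position-by-position comparison ignores.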
In this embodiment, the annotation of the virus sequencing sequence further includes traceability prediction. Because sequences with higher similarity are more likely to share the same host and origin, this example infers the host and origin of the virus sequencing sequence from the host and origin information of the reference genomes obtained from NCBI. Specifically, the N reference genome sequences with the highest sequence similarity are selected, and their host and origin information is looked up in a locally collected traceability data set to obtain the traceability prediction result.
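The lookup described above can be sketched as follows. The source does not say how the host and origin of the top N references are combined into one prediction; the majority vote used here is an assumption, as are the function name, the metadata layout and the made-up accessions and labels.

```python
from collections import Counter


def predict_host_and_origin(ranked_references, metadata, n=5):
    """Majority vote over the host and origin of the top-n references.

    `ranked_references` is a list of (accession, score) pairs sorted by
    decreasing similarity; `metadata` maps accession -> {"host", "origin"}.
    """
    hosts = Counter()
    origins = Counter()
    for accession, _score in ranked_references[:n]:
        info = metadata.get(accession)
        if info is None:
            continue  # reference missing from the local data set
        hosts[info["host"]] += 1
        origins[info["origin"]] += 1
    host = hosts.most_common(1)[0][0] if hosts else None
    origin = origins.most_common(1)[0][0] if origins else None
    return host, origin


# Hypothetical ranked references and local traceability data set.
ranked_refs = [("REF_A.1", 980.0), ("REF_B.1", 900.0), ("REF_C.1", 450.0)]
trace_data = {
    "REF_A.1": {"host": "bat", "origin": "Region X"},
    "REF_B.1": {"host": "bat", "origin": "Region Y"},
    "REF_C.1": {"host": "human", "origin": "Region Y"},
}
trace_result = predict_host_and_origin(ranked_refs, trace_data, n=3)
```

Reporting the full tallies instead of only the winner would be an equally reasonable design when the top references disagree.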
In this embodiment, the annotation of the virus sequencing sequence further includes protein function annotation, specifically: the assembled contigs are combined with information in KEGG and Gene Ontology to generate protein annotation information for the contigs, including annotation information related to the retrieved genome such as the gene name, the best-matching protein and the predicted gene name;
preferably, this embodiment integrates the EggNOG-mapper software as a component of the annotation function.
In this embodiment, virus species are identified through deep learning, and several annotation functions, namely phylogenetic tree analysis, traceability prediction, mutation detection and protein function annotation, are also provided.
Example 2
The present embodiment provides an automated analysis system for virus sequencing sequences, comprising:
the data preprocessing module is configured to perform quality control and sequence assembly on the virus sequencing sequence to obtain a virus genome long sequence;
the identification module is configured to encode the virus genome long sequence and then carry out type identification by adopting a pre-trained deep learning network model;
an annotation module configured to perform annotation of the viral sequencing sequence according to the sequence alignment of the viral genome long sequence and the reference genome.
It should be noted that the above modules correspond to the steps described in Example 1; the examples and application scenarios implemented by the modules are the same as those of the corresponding steps, but are not limited to what is disclosed in Example 1. The modules described above, as part of a system, may be implemented in a computer system, for example as a set of computer-executable instructions.
In further embodiments, there is also provided:
computer readable instructions which, when executed by a processor, perform the method of embodiment 1.
Those of ordinary skill in the art will appreciate that the various illustrative elements, i.e., algorithm steps, described in connection with the embodiments disclosed herein may be implemented as a combination of computer software and electronic hardware. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
The above is only a preferred embodiment of the present invention, and is not intended to limit the present invention, and various modifications and changes will occur to those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.
Although the embodiments of the present invention have been described with reference to the accompanying drawings, they are not intended to limit the scope of the present invention, and those skilled in the art should understand that various modifications and variations can be made without inventive effort on the basis of the technical solution of the present invention.

Claims (11)

1. A method for automated analysis of a viral sequencing sequence, comprising:
obtaining a virus genome long sequence after performing quality control and sequence assembly on a virus sequencing sequence;
after coding the virus genome long sequence, adopting a pre-trained deep learning network model to carry out type identification;
integrating a deep learning network model and a traditional sequence comparison method, identifying the virus type by using the traditional sequence comparison method, and realizing complementation by combining the deep learning network model, thereby improving the final identification accuracy;
in the traditional sequence comparison method, the similarity and the alignment length of a reference genome sequence are evaluated, and the sequence similarity of a virus genome long sequence and the reference genome sequence is obtained according to the product of the alignment length and the similarity; predicting the type of the virus sequencing sequence according to the reference genome sequence and the category to which the reference genome sequence belongs;
annotating the virus sequencing sequence according to the sequence similarity obtained by comparing the sequence of the virus genome long sequence with the sequence of the reference genome sequence;
the deep learning network model is a multi-class convolutional neural network model that comprises a plurality of parallel branch networks and is constructed from a multi-class convolutional neural network and a residual network;
in the multi-class convolutional neural network model:
one of the parallel branch networks is deeper than the other branch networks, and a residual connection is added on this deepest, main branch network;
on top of all the branch networks, the outputs of all the branch networks are merged by a concatenation layer, then passed through two fully connected layers, and the classification result is finally output through a softmax layer.
2. The method of claim 1, wherein the quality control is performed by performing de-adaptor and de-primer operations on the virus sequencing sequence.
3. The method of claim 1, wherein the sequence assembly is performed by assembling short sequences into long sequences to obtain long sequences of the viral genome.
4. The method of claim 1, wherein the base sequence of the long sequence of the genome of the virus is encoded.
5. The method of claim 1, wherein the reference genome sequence is subjected to feature engineering to construct a training set, and the deep learning network model is trained by using the training set.
6. The method of claim 1, wherein the type identification comprises: identifying the virus sequencing sequence with the pre-trained deep learning network model, outputting the probability that the virus sequencing sequence belongs to each family, and taking the family with the highest probability as the type of the virus sequencing sequence.
7. The method of claim 1, wherein the annotation of the viral sequencing sequence comprises obtaining the N reference genome sequences with the highest sequence similarity, and calculating the genetic distances between the reference genome sequences to construct a phylogenetic tree.
8. The method of claim 1, wherein the annotation of the viral sequencing sequence comprises obtaining a reference genomic sequence with the highest sequence similarity, comparing the viral sequencing sequence with the reference genomic sequence, and determining the genetic variation information of the viral sequencing sequence relative to the reference genomic sequence according to the positions of different bases in the comparison result.
9. The method of claim 1, wherein the annotation of the viral sequencing sequence comprises a protein function annotation comprising a retrieved gene name, a best-matching protein, and a predicted gene name.
10. An automated analysis system for viral sequencing sequences, comprising:
the data preprocessing module is configured to perform quality control and sequence assembly on the virus sequencing sequence to obtain a virus genome long sequence;
the identification module is configured to encode the virus genome long sequence and then carry out type identification by adopting a pre-trained deep learning network model;
integrating a deep learning network model and a traditional sequence comparison method, identifying the virus type by using the traditional sequence comparison method, and realizing complementation by combining the deep learning network model, thereby improving the final identification accuracy;
in the traditional sequence comparison method, the similarity and the alignment length of a reference genome sequence are evaluated, and the sequence similarity of a virus genome long sequence and the reference genome sequence is obtained according to the product of the alignment length and the similarity; predicting the type of the virus sequencing sequence according to the reference genome sequence and the category to which the reference genome sequence belongs;
the deep learning network model is a multi-class convolutional neural network model that comprises a plurality of parallel branch networks and is constructed from a multi-class convolutional neural network and a residual network;
in the multi-class convolutional neural network model:
one of the parallel branch networks is deeper than the other branch networks, and a residual connection is added on this deepest, main branch network;
on top of all the branch networks, the outputs of all the branch networks are merged by a concatenation layer, then passed through two fully connected layers, and the classification result is finally output through a softmax layer;
an annotation module configured to perform annotation of the viral sequencing sequence according to sequence similarity obtained from sequence alignment of the viral genome long sequence and the reference genome sequence.
11. Computer readable instructions, wherein said computer readable instructions, when executed by a processor, perform the method of any of claims 1-9.
CN202110271331.9A 2021-03-12 2021-03-12 Automatic analysis method and system for virus sequencing sequence Active CN112863599B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110271331.9A CN112863599B (en) 2021-03-12 2021-03-12 Automatic analysis method and system for virus sequencing sequence

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110271331.9A CN112863599B (en) 2021-03-12 2021-03-12 Automatic analysis method and system for virus sequencing sequence

Publications (2)

Publication Number Publication Date
CN112863599A CN112863599A (en) 2021-05-28
CN112863599B CN112863599B (en) 2022-10-14

Family

ID=75994361

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110271331.9A Active CN112863599B (en) 2021-03-12 2021-03-12 Automatic analysis method and system for virus sequencing sequence

Country Status (1)

Country Link
CN (1) CN112863599B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113299345A (en) * 2021-06-30 2021-08-24 中国人民解放军军事科学院军事医学研究院 Virus gene classification method and device and electronic equipment
US20230108229A1 (en) * 2021-09-27 2023-04-06 International Business Machines Corporation Prediction of interference with host immune response system based on pathogen features
CN114334010B (en) * 2021-12-27 2024-03-22 山东第一医科大学(山东省医学科学院) Automatic identification method and system for classification of bunyas related virus species
CN114512182A (en) * 2022-03-23 2022-05-17 厦门大学 Method and system for virus gene recognition and host prediction

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101894211A (en) * 2010-06-30 2010-11-24 深圳华大基因科技有限公司 Gene annotation method and system
CN106295245A (en) * 2016-07-27 2017-01-04 广州麦仑信息科技有限公司 The method of storehouse noise reduction own coding gene information feature extraction based on Caffe
CN109906274A (en) * 2016-11-08 2019-06-18 赛卢拉研究公司 Method for cell marking classification
CN110349630A (en) * 2019-06-21 2019-10-18 天津华大医学检验所有限公司 Analysis method, device and its application of the macro gene order-checking data of blood
CN110363003A (en) * 2019-07-25 2019-10-22 哈尔滨工业大学 A kind of Android virus static detection method based on deep learning

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107194208B (en) * 2017-04-25 2020-10-02 荣联科技集团股份有限公司 Gene analysis annotation method and device
CN107267613B (en) * 2017-06-28 2020-10-27 安吉康尔(深圳)科技有限公司 Sequencing data processing system and SMN gene detection system
CN109712669B (en) * 2018-12-05 2022-10-21 上海美吉生物医药科技有限公司 Protein function annotation method and system
CN111462821B (en) * 2020-04-10 2022-02-22 广州微远医疗器械有限公司 Pathogenic microorganism analysis and identification system and application
CN111785328B (en) * 2020-06-12 2021-11-23 中国人民解放军军事科学院军事医学研究院 Coronavirus sequence identification method based on gated cyclic unit neural network
CN111933218B (en) * 2020-07-01 2022-03-29 广州基迪奥生物科技有限公司 Optimized metagenome binding method for analyzing microbial community
CN112365929A (en) * 2020-10-19 2021-02-12 北京大学 Method for analyzing microbial population induction effect based on metagenome data

Also Published As

Publication number Publication date
CN112863599A (en) 2021-05-28

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant