CN108595913A - Supervised learning method for identifying mRNA and lncRNA - Google Patents


Info

Publication number
CN108595913A
Authority
CN
China
Prior art keywords
lncrna
mrna
sequence
model
layer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810449074.1A
Other languages
Chinese (zh)
Other versions
CN108595913B (en)
Inventor
文江辉
邓兵
柳叶舒
石雨
肖新平
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan University of Technology WUT
Original Assignee
Wuhan University of Technology WUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan University of Technology WUT
Priority to CN201810449074.1A
Publication of CN108595913A
Application granted
Publication of CN108595913B
Legal status: Active
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00 ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations

Abstract

The invention discloses a supervised learning method for distinguishing mRNA from lncRNA, comprising the following steps: human mRNA and lncRNA data from the GENCODE database are used as the training and test sets, and each transcript sequence is converted into k-mer sequences; the frequency of each k-mer in every sequence is counted and then normalized; the k-mer frequencies are arranged into a matrix that serves as the input to the convolutional layer of a convolutional neural network model; the constructed convolutional neural network is trained and tested on the human mRNA and lncRNA from the GENCODE database to determine the model parameters, and the resulting model is used to predict whether a human transcript is an lncRNA or an mRNA. Based on the differences between the k-mer features of mRNA and lncRNA, the invention proposes a method that distinguishes lncRNA from mRNA by supervised learning of k-mer features with a convolutional neural network. It makes effective use of the strengths of convolutional neural networks and takes the k-mers of a sequence as the model input. After training, the model reaches an identification accuracy of 98% or more, which lays a good foundation for further analysis of the biological functions of lncRNA sequences.

Description

Supervised learning method for identifying mRNA and lncRNA
Technical field
The present invention relates to the field of biotechnology, specifically to techniques for predicting the class of an unknown human transcript sequence, in particular for identifying whether an unknown human sequence is an lncRNA or an mRNA, and more particularly to a supervised learning method for identifying mRNA and lncRNA.
Background art
Long non-coding RNAs (lncRNAs) are a class of non-coding RNAs found in eukaryotes that are longer than 200 nucleotides and show no protein-coding potential. They were initially regarded as by-products of RNA polymerase II transcription with no biological function. As research has progressed, however, it has been found that although lncRNAs do not themselves encode proteins, they play a very important role in the regulation of eukaryotic gene expression. Acting in the form of RNA, they take part in many important regulatory processes, such as genomic imprinting, chromatin modification, transcriptional activation, transcriptional interference, and nuclear transport, and thereby exert their biological functions. lncRNAs are usually between 200 nt and 100,000 nt long and are structurally similar to mRNAs: after splicing, they have poly(A) tails and promoter structures. Because lncRNAs and mRNAs share a certain similarity, and because both have biological functions, identifying and distinguishing lncRNAs and mRNAs is particularly important.
At present, there are many methods for identifying lncRNA. The usual workflow first extracts features of the sequence, such as the open reading frame, protein sequence similarity, and other sequence structure features, and then trains a machine learning method on the extracted features to obtain a model that discriminates lncRNA from mRNA. Well-performing models of this kind include CNCI, CPC, CSF, and PhyloCSF, all of which achieve good results on their respective training and validation data sets.
However, methods that train on extracted sequence features depend too heavily on sequencing quality. Current research suggests that high-throughput sequencing cannot guarantee the accuracy of the sequences obtained: sequencing errors, differing sequencing depths, and sequencing bias all affect sequencing quality. Some scholars have taken these problems into account and proposed recognition algorithms that do not depend on sequence quality, but these methods suffer from cumbersome models, heavy computation, and high time cost. There is therefore a need for an effective model for classifying lncRNA and mRNA that does not depend on sequence quality or on biological structure features of the sequence, and that is simple enough to avoid a large amount of computation.
Summary of the invention
The purpose of the present invention is to overcome the deficiencies of the background art described above by proposing a supervised learning method, based on a convolutional neural network, that identifies mRNA and lncRNA according to the differences in their k-mer features.
To achieve this purpose, the supervised learning method for identifying mRNA and lncRNA designed in the present invention is characterized in that it comprises the following steps:
1) Download human mRNA transcript data and lncRNA transcript data from the GENCODE database, and select mRNA and lncRNA sequences that meet the length requirement as experimental data;
2) Convert each transcript sequence sample in the experimental data into k-mer sequences, where k is a natural number greater than 0;
3) Count the number of occurrences of each type of k-mer fragment in every sequence, then normalize, obtaining the frequency with which each k-mer appears among all k-mers of that sequence, so that the k-mer frequencies of each sequence sum to 1;
4) Arrange the k-mer frequencies into a matrix as the input to the convolutional layer of the convolutional neural network model, then pass it in turn through the convolutional layers, the pooling layer, and the fully connected layer that uses the softmax function as its activation function, thereby building the convolutional neural network model architecture;
5) Divide the experimental data into a model training sample data set and a model test sample data set, and train the convolutional neural network model on the training sample data set to obtain a classification model;
6) Optimize the convolutional neural network model by adjusting its parameters and the value of k, and verify the classification accuracy on the model test sample data set, so as to predict mRNA and lncRNA sequences accurately.
Preferably, in step 1), the mRNA and lncRNA sequences that meet the length requirement are selected as experimental data as follows: 2,000 to 10,000 sequences are randomly drawn from the downloaded human mRNA transcript data and lncRNA transcript data and their lengths are analyzed to determine the length ranges of lncRNA and mRNA; mRNA and lncRNA sequences are then randomly drawn, as experimental data, from those downloaded transcripts whose lengths fall within the respective ranges.
Preferably, in step 4), the first layer of the convolutional neural network model uses 32 convolution kernels of size 3*3 with the ReLU activation function, and zero padding around the border keeps the matrix size unchanged by the convolution; the second layer uses 64 convolution kernels of size 3*3 with the ReLU activation function; the third layer is a max pooling layer with a pooling region of size 2*2, and Dropout with probability 0.25 is applied to the connections between the pooling layer and the fully connected layer; the last layer is a fully connected layer with 128 neurons which, after being fully connected to the pooling layer, has Dropout with probability 0.5 applied to the connections between it and the output-layer neurons, and the classification result is finally obtained with the softmax function as the activation function.
Preferably, in step 5), the model training sample data set contains no fewer than 10,000 sequences, and the model test sample data set contains no fewer than 1,000 sequences.
Preferably, in step 2), k takes the values 1, 2, and 3.
Preferably, the length ranges of lncRNA and mRNA are 250 nt to 3500 nt and 200 nt to 4000 nt, respectively.
Based on the differences between the k-mer features of mRNA and lncRNA, the present invention proposes a method that distinguishes lncRNA from mRNA by supervised learning of k-mer features with a convolutional neural network, laying a good foundation for further analysis of the biological functions of lncRNA sequences. The method makes effective use of the strengths of convolutional neural networks: the constructed network is trained and tested on the human mRNA and lncRNA in the GENCODE database with the k-mers of each sequence as the model input, the model parameters are determined, and the model is then used to predict human lncRNA or mRNA. The trained model reaches an identification accuracy of 98% or more, which lays a foundation for follow-up research.
Brief description of the drawings
Fig. 1 is a flow chart of the supervised learning method for identifying mRNA and lncRNA.
Fig. 2 is a schematic diagram of the structure of the convolutional neural network in an embodiment of the present invention.
Fig. 3 is a flow chart of training and testing on mRNA and lncRNA sequences with the convolutional neural network in an embodiment of the present invention.
Detailed description of the embodiments
To make the technical solution, technical features, and advantages of the present invention clearer, the invention is described in further detail below with reference to an embodiment.
As shown in Fig. 1, the supervised learning method for identifying mRNA and lncRNA proposed by the present invention comprises the following steps:
1) Download human mRNA transcript data and lncRNA transcript data from the GENCODE database, and select mRNA and lncRNA sequences that meet the length requirement as experimental data.
The downloaded data set contains 27,720 lncRNA transcripts and 199,324 mRNA transcripts. To balance the amounts of lncRNA and mRNA data, 5,000 lncRNA sequences and 5,000 mRNA sequences were randomly selected from the 27,720 lncRNA and 199,324 mRNA transcripts, respectively, and their lengths were analyzed statistically. This determined the length ranges of the lncRNA and mRNA samples as 250 nt to 3500 nt and 200 nt to 4000 nt, respectively. Then 20,000 lncRNA sequences and 20,000 mRNA sequences meeting the length requirement were randomly drawn from the data set as experimental data. Of these, 15,000 lncRNA transcripts and 15,000 mRNA transcripts were chosen as the model's training sample data set, and the remaining 5,000 of each were used as the model's test sample data set.
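The selection procedure above can be sketched as follows. This is a minimal illustration under stated assumptions, not the inventors' code: the function names `filter_by_length` and `sample_and_split`, the fixed random seed, and the toy sequences are all hypothetical. Only the length ranges come from the text, and the toy sample sizes stand in for the 20,000 / 15,000 / 5,000 per-class figures.

```python
import random

# Length ranges stated in the text (nucleotides, treated here as inclusive).
LENGTH_RANGE = {"lncRNA": (250, 3500), "mRNA": (200, 4000)}


def filter_by_length(sequences, low, high):
    """Keep only sequences whose length lies in [low, high]."""
    return [s for s in sequences if low <= len(s) <= high]


def sample_and_split(sequences, n_total, n_train, seed=0):
    """Randomly draw n_total sequences, then split them into train and test."""
    if len(sequences) < n_total:
        raise ValueError("not enough sequences to sample from")
    chosen = random.Random(seed).sample(sequences, n_total)
    return chosen[:n_train], chosen[n_train:]


# Toy stand-in for downloaded lncRNA transcripts (hypothetical data):
# one sequence for each length 100, 125, ..., 4975.
toy = ["A" * n for n in range(100, 5000, 25)]
low, high = LENGTH_RANGE["lncRNA"]
eligible = filter_by_length(toy, low, high)
train, test = sample_and_split(eligible, n_total=40, n_train=30)
print(len(eligible), len(train), len(test))
```

The same two calls would be repeated for the mRNA class with its own length range, so that both classes contribute equally many sequences.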
2) Convert each transcript sequence sample in the experimental data into k-mer sequences, where k is a natural number greater than 0.
Taking k=6 as an example, there are 4096 possible 6-mer subsequences. Arranged in the order A, T, C, G, the set of 6-mer subsequences is {AAAAAA, AAAAAT, AAAAAC, AAAAAG, AAAATA, AAAATT, AAAATC, AAAATG, AAAACA, AAAACT, ..., GGGGGG}.
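The ordered set of k-mers described above can be generated with a Cartesian product over the alphabet. The sketch below is illustrative; the helper name `all_kmers` is an assumption, while the A, T, C, G ordering is taken from the text.

```python
from itertools import product

ALPHABET = "ATCG"  # ordering stated in the text


def all_kmers(k):
    """Enumerate all 4**k k-mers; the last position varies fastest."""
    return ["".join(p) for p in product(ALPHABET, repeat=k)]


kmers6 = all_kmers(6)
print(len(kmers6))   # 4096
print(kmers6[:5])    # ['AAAAAA', 'AAAAAT', 'AAAAAC', 'AAAAAG', 'AAAATA']
print(kmers6[-1])    # GGGGGG
```

Fixing this order once matters because every sequence must map its k-mer frequencies to the same positions of the feature vector.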
3) Count the number of occurrences of each type of k-mer fragment in every sequence, then normalize, obtaining the frequency with which each k-mer appears among all k-mers of that sequence, so that the k-mer frequencies of each sequence sum to 1. Taking k=6 as an example, the number of occurrences of each 6-mer in {AAAAAA, AAAAAT, AAAAAC, AAAAAG, AAAATA, AAAATT, AAAATC, AAAATG, AAAACA, AAAACT, ..., GGGGGG} is counted, and the frequency of each 6-mer in the sequence is then computed as its count divided by the total number of 6-mers in that sequence.
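A minimal sketch of the counting and normalization step follows. The text does not say how the k-mers are extracted from a sequence; overlapping windows with a step of 1 are assumed here, and windows containing symbols other than A, T, C, G are ignored. The function name `kmer_frequencies` is hypothetical.

```python
from collections import Counter
from itertools import product


def kmer_frequencies(sequence, k, alphabet="ATCG"):
    """Return k-mer frequencies in A, T, C, G order.

    The values sum to 1 whenever the sequence contains at least one valid k-mer.
    """
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    # ASSUMPTION: slide a window of width k over the sequence with step 1.
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    # Only windows made of alphabet symbols contribute to the total.
    total = sum(counts[m] for m in kmers)
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[m] / total for m in kmers]


# "ATCGATCG" contains the 2-mers AT, TC, CG, GA, AT, TC, CG (7 windows).
freqs = kmer_frequencies("ATCGATCG", k=2)
print(len(freqs), round(sum(freqs), 6))  # 16 1.0
```

Dividing by the per-sequence total makes sequences of different lengths comparable, which is why the frequencies rather than the raw counts are used as features.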
4) Arrange the k-mer frequencies into a matrix as the input to the convolutional layer of the convolutional neural network model. The input then passes in turn through the convolutional layers, the pooling layer, and the fully connected layer that uses the softmax function as its activation function. The convolutional neural network model architecture built in this way consists of two convolutional layers, one pooling layer, and one fully connected layer.
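The text does not specify how the frequencies are laid out as a matrix. One possible construction, shown purely as an assumption, concatenates the frequency vectors for k = 1, 2, 3 (4 + 16 + 64 = 84 values) and reshapes them row by row into a 7 x 12 matrix. The names `kmer_frequencies` and `feature_matrix`, the overlapping-window counting, and the 7 x 12 shape are all hypothetical.

```python
from collections import Counter
from itertools import product


def kmer_frequencies(sequence, k, alphabet="ATCG"):
    """Normalized frequencies of all 4**k k-mers, in A, T, C, G order."""
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    total = sum(counts[m] for m in kmers)
    return [counts[m] / total if total else 0.0 for m in kmers]


def feature_matrix(sequence, ks=(1, 2, 3), n_rows=7, n_cols=12):
    """Concatenate the frequency vectors for each k and reshape row by row.

    ASSUMPTION: the 7 x 12 layout is illustrative; the source does not
    state the matrix shape.
    """
    flat = []
    for k in ks:
        flat.extend(kmer_frequencies(sequence, k))
    if len(flat) != n_rows * n_cols:
        raise ValueError(f"{len(flat)} values do not fill {n_rows} x {n_cols}")
    return [flat[r * n_cols:(r + 1) * n_cols] for r in range(n_rows)]


matrix = feature_matrix("ATCGGCTAAGCTTACG")
print(len(matrix), len(matrix[0]))  # 7 12
```

Any layout works as long as it is applied identically to every sequence; a different shape would change which frequencies end up adjacent under the 3*3 convolution kernels.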
The specific architecture of the convolutional neural network model is shown in Fig. 2, and the corresponding process flow is shown in Fig. 3. The first layer uses 32 convolution kernels of size 3*3 with the ReLU activation function, and zero padding around the border keeps the matrix size unchanged by the convolution. The second layer is also a convolutional layer, with 64 convolution kernels of size 3*3 and again the ReLU activation function. The third layer is a max pooling layer with a pooling region of size 2*2. Dropout with probability 0.25 is applied to the connections between the pooling layer and the fully connected layer to prevent overfitting. The last layer is a fully connected layer with 128 neurons; Dropout with probability 0.5 is applied to the connections between the fully connected layer and the output-layer neurons, and the classification result is finally obtained with the softmax function as the activation function. During training, the loss function is the cross-entropy loss and the optimizer is Adadelta.
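To make the layer stack concrete, the sketch below traces output shapes and trainable parameter counts through the described architecture. Several points are assumptions because the text does not state them: a single-channel 7 x 12 input, stride 1, no padding in the second convolution, floor division in the pooling layer, and two output classes. The function names are hypothetical.

```python
def conv2d_shape(h, w, kernel=3, same_padding=False):
    """Output height and width of a stride-1 convolution."""
    if same_padding:
        return h, w
    return h - kernel + 1, w - kernel + 1


def conv2d_params(in_channels, out_channels, kernel=3):
    """Kernel weights plus one bias per output channel."""
    return kernel * kernel * in_channels * out_channels + out_channels


def summarize(h, w, n_classes=2):
    """Return (name, output shape, trainable parameters) for each layer."""
    layers = []
    # Layer 1: 32 kernels of 3x3; zero padding keeps the matrix size unchanged.
    h, w = conv2d_shape(h, w, same_padding=True)
    layers.append(("conv1 + ReLU", (h, w, 32), conv2d_params(1, 32)))
    # Layer 2: 64 kernels of 3x3; ASSUMPTION: no padding.
    h, w = conv2d_shape(h, w, same_padding=False)
    layers.append(("conv2 + ReLU", (h, w, 64), conv2d_params(32, 64)))
    # Layer 3: 2x2 max pooling, followed by Dropout(0.25); no parameters.
    h, w = h // 2, w // 2
    layers.append(("maxpool 2x2 + dropout 0.25", (h, w, 64), 0))
    # Fully connected layer with 128 neurons, followed by Dropout(0.5).
    flat = h * w * 64
    layers.append(("dense 128 + dropout 0.5", (128,), flat * 128 + 128))
    # Output layer with softmax activation.
    layers.append(("softmax output", (n_classes,), 128 * n_classes + n_classes))
    return layers


layers = summarize(7, 12)
for name, shape, params in layers:
    print(f"{name:28s} {str(shape):14s} {params}")
```

In a deep learning framework these rows would typically correspond to two 2D convolution layers, a max pooling layer, two dropout layers, and two dense layers. The count shows that most parameters sit in the 128-neuron fully connected layer, which is where the 0.5 dropout applies.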
5) Train the convolutional neural network on the training sample data set, consisting of the selected 15,000 lncRNA transcripts and 15,000 mRNA transcripts, to obtain a classification model.
6) Optimize the convolutional neural network model by adjusting its parameters and the value of k, and verify the classification accuracy on the model test sample data set, so as to predict mRNA and lncRNA sequences accurately.
By adjusting the parameters of the convolutional neural network model and the value of k, the optimal combination of k values was finally determined to be k=1, 2, 3. Under 10-fold cross-validation with this setting, the average loss on the training set is 0.0430 with an average classification accuracy of 0.9872, and the average loss on the validation set is 0.0431 with an average classification accuracy of 0.9790.
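The 10-fold cross-validation mentioned above partitions the samples into ten folds and uses each fold once for validation while training on the other nine; the reported figures are averages over the ten runs. The sketch below only builds the index splits. The function name `k_fold_indices`, the shuffling seed, and the toy sample count are assumptions, and the text does not say exactly which data were cross-validated.

```python
import random


def k_fold_indices(n_samples, n_folds=10, seed=0):
    """Yield (train_idx, val_idx) pairs; each sample is validated exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into n_folds nearly equal folds.
    folds = [indices[i::n_folds] for i in range(n_folds)]
    for i in range(n_folds):
        val = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, val


splits = list(k_fold_indices(n_samples=100, n_folds=10))
print(len(splits), len(splits[0][0]), len(splits[0][1]))  # 10 90 10
```

For each pair, a fresh model would be trained on `train` and evaluated on `val`, and the per-fold loss and accuracy would then be averaged.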
Finally, under the same conditions, the data were also trained and validated with four other machine learning methods: random forest (RF), logistic regression (LR), decision tree (DT), and support vector machine (SVM). The classification performance of each model is compared with that of the convolutional neural network (CNN) in Table 1 below:
Table 1. Comparison of the classification performance of each model
Table 1 shows that the model proposed in the present invention improves markedly on the other prior-art methods in accuracy, precision, and recall.
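The source does not define how these metrics were computed. The usual definitions for a binary classifier are sketched below; the function name `classification_metrics`, the choice of label 1 as the positive class, and the toy labels are assumptions for illustration and are not results from the patent.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Return (accuracy, precision, recall) for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall


# Toy labels (hypothetical): 1 = lncRNA, 0 = mRNA.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
print(classification_metrics(y_true, y_pred))  # (0.75, 0.75, 0.75)
```

Applying the same function to the predictions of each compared model on the same test set is what makes the figures comparable across methods.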
Those skilled in the art will understand that the specific embodiments described herein are intended only to explain the present invention and not to limit it. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.

Claims (6)

1. A supervised learning method for identifying mRNA and lncRNA, characterized in that it comprises the following steps:
1) Download human mRNA transcript data and lncRNA transcript data from the GENCODE database, and select mRNA and lncRNA sequences that meet the length requirement as experimental data;
2) Convert each transcript sequence sample in the experimental data into k-mer sequences, where k is a natural number greater than 0;
3) Count the number of occurrences of each type of k-mer fragment in every sequence, then normalize, obtaining the frequency with which each k-mer appears among all k-mers of that sequence, so that the k-mer frequencies of each sequence sum to 1;
4) Arrange the k-mer frequencies into a matrix as the input to the convolutional layer of the convolutional neural network model, then pass it in turn through the convolutional layers, the pooling layer, and the fully connected layer that uses the softmax function as its activation function, thereby building the convolutional neural network model architecture;
5) Divide the experimental data into a model training sample data set and a model test sample data set, and train the convolutional neural network model on the training sample data set to obtain a classification model;
6) Optimize the convolutional neural network model by adjusting its parameters and the value of k, and verify the classification accuracy on the model test sample data set, so as to predict mRNA and lncRNA sequences accurately.
2. The supervised learning method for identifying mRNA and lncRNA according to claim 1, characterized in that in step 1) the mRNA and lncRNA sequences that meet the length requirement are selected as experimental data as follows: 2,000 to 10,000 sequences are randomly drawn from the downloaded human mRNA transcript data and lncRNA transcript data and their lengths are analyzed to determine the length ranges of lncRNA and mRNA; mRNA and lncRNA sequences are then randomly drawn, as experimental data, from those downloaded transcripts whose lengths fall within the respective ranges.
3. The supervised learning method for identifying mRNA and lncRNA according to claim 1, characterized in that in step 4) the first layer of the convolutional neural network model uses 32 convolution kernels of size 3*3 with the ReLU activation function, and zero padding around the border keeps the matrix size unchanged by the convolution; the second layer uses 64 convolution kernels of size 3*3 with the ReLU activation function; the third layer is a max pooling layer with a pooling region of size 2*2, and Dropout with probability 0.25 is applied to the connections between the pooling layer and the fully connected layer; the last layer is a fully connected layer with 128 neurons which, after being fully connected to the pooling layer, has Dropout with probability 0.5 applied to the connections between it and the output-layer neurons, and the classification result is finally obtained with the softmax function as the activation function.
4. The supervised learning method for identifying mRNA and lncRNA according to claim 1, characterized in that in step 5) the model training sample data set contains no fewer than 10,000 sequences and the model test sample data set contains no fewer than 1,000 sequences.
5. The supervised learning method for identifying mRNA and lncRNA according to claim 1, characterized in that in step 2) k takes the values 1, 2, and 3.
6. The supervised learning method for identifying mRNA and lncRNA according to claim 2, characterized in that the length ranges of lncRNA and mRNA are 250 nt to 3500 nt and 200 nt to 4000 nt, respectively.
CN201810449074.1A 2018-05-11 2018-05-11 Supervised learning method for identifying mRNA and lncRNA Active CN108595913B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810449074.1A CN108595913B (en) 2018-05-11 2018-05-11 Supervised learning method for identifying mRNA and lncRNA

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810449074.1A CN108595913B (en) 2018-05-11 2018-05-11 Supervised learning method for identifying mRNA and lncRNA

Publications (2)

Publication Number Publication Date
CN108595913A true CN108595913A (en) 2018-09-28
CN108595913B CN108595913B (en) 2021-07-06

Family

ID=63637233

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810449074.1A Active CN108595913B (en) 2018-05-11 2018-05-11 Supervised learning method for identifying mRNA and lncRNA

Country Status (1)

Country Link
CN (1) CN108595913B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102332064A (en) * 2011-10-07 2012-01-25 吉林大学 Biological species identification method based on genetic barcode
WO2013138727A1 (en) * 2012-03-15 2013-09-19 Sabiosciences Corp. Method, kit and array for biomarker validation and clinical use
CN106778079A (en) * 2016-11-22 2017-05-31 重庆邮电大学 DNA sequence k-mer frequency statistics method based on MapReduce

Also Published As

Publication number Publication date
CN108595913B (en) 2021-07-06

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant