CN107609351A - A method for predicting pseudouridine modification sites based on convolutional neural networks - Google Patents

Publication number
CN107609351A
Authority
CN
China
Prior art keywords
sequence
pseudouridine
convolutional neural
neural networks
nucleotides
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710989588.1A
Other languages
Chinese (zh)
Inventor
樊永显
李永贞
杨辉华
蔡国永
张向文
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guilin University of Electronic Technology
Original Assignee
Guilin University of Electronic Technology
Application filed by Guilin University of Electronic Technology
Priority to CN201710989588.1A
Publication of CN107609351A
Legal status: Pending

Abstract

The invention discloses a method for predicting pseudouridine modification sites based on convolutional neural networks, characterized in that it comprises the following steps: 1) data set preparation and conversion; 2) construction and training of the convolutional neural network model; 3) interception and encoding of the sequence to be predicted; 4) feature extraction and prediction. The method can improve the accuracy of pseudouridine site prediction so that it can be applied more widely.

Description

A method for predicting pseudouridine modification sites based on convolutional neural networks
Technical field
The present invention relates to techniques for predicting pseudouridine modification sites in RNA sequences produced during gene transcription, and specifically to a method for predicting pseudouridine modification sites based on convolutional neural networks.
Background
During gene transcription, many RNAs undergo modification. To date, more than 100 kinds of RNA modification have been found. These covalent RNA modifications have been studied by chemical methods for about 12 years. Some of them occur at many positions in living organisms. They influence the secondary and tertiary structure of RNA and the speed and precision of gene expression, help maintain RNA stability, help RNA be decoded correctly on the ribosome, and help prevent certain diseases, and are therefore important for normal biological function.
Among these hundred-plus modifications, pseudouridine was the first to be discovered and remains the most abundant RNA modification found so far. Pseudouridine modification is widely known in non-coding RNAs such as tRNA, rRNA and sRNA; later, Thomas M. Carlile et al. found by high-throughput sequencing that pseudouridine sites also exist on mRNA in human and yeast cells. Pseudouridine is an isomer of uridine, formed under certain conditions by the transfer of a covalent bond. In eukaryotes, for example, pseudouridylation is mainly catalysed by box H/ACA RNPs. Each hairpin of a box H/ACA RNA has two bulge loops; it recognises a specific RNA sequence, which base-pairs with the structure below the bulge loop. A specific enzyme then acts on the uracil to the right of the unpaired position at the top of the bulge loop: the uracil is rotated 180° about the axis through positions 3 and 6, so that C5 is rotated clockwise to the bottom, and the original C-N bond to the ribose becomes a C-C bond, forming pseudouridine.
Pseudouridine can alter RNA structure, increase base stacking, improve base pairing and rigidify the ribose-phosphate backbone. It is directly or indirectly associated with neurological diseases such as Parkinson's disease and with X-linked dyskeratosis congenita, a bone marrow failure syndrome. Because of its special structure and chemical properties and its biological and medical significance, research on pseudouridine sites is attracting growing attention. To identify pseudouridine sites, a high-throughput sequencing technique called ψ-SEQ was proposed (Carlile, T.M. et al. Pseudouridine profiling reveals regulated mRNA pseudouridylation in yeast and human cells. Nature 515, 143 (2014)), which maps the ψ sites of several species comprehensively and at high resolution. However, this technique sequences the genome, so it is costly and time-consuming, and sequencing becomes increasingly difficult as sequence length grows. There is therefore an urgent need for more convenient computational algorithms that extract information about pseudouridine sites and then predict those sites.
At present, Li et al. (Li, Y.H., Zhang, G. & Cui, Q. PPUS: a web server to predict PUS-specific pseudouridine sites. Bioinformatics 31, 3362 (2015)) and Chen W et al. (Wei, C., Hua, T., Jing, Y., Hao, L. & Chou, K.C. iRNA-PseU: Identifying RNA pseudouridine sites. Molecular Therapy Nucleic Acids 5, e332 (2016)) intercept fragments of the gene sequence and then encode them; Chen W et al. additionally include the physicochemical properties of the nucleotides in the encoding. Both then use the LIBSVM algorithm for feature extraction and classification to identify pseudouridine sites. However, the accuracy of LIBSVM-based feature extraction and classification leaves room for improvement; predicting pseudouridine sites more accurately requires a more effective algorithm for sequence feature extraction.
Summary of the invention
The object of the present invention is to address the shortcomings of the prior art by providing a method for predicting pseudouridine modification sites based on convolutional neural networks. The method can improve the accuracy of pseudouridine site prediction so that it can be applied more widely.
The technical solution that achieves the object of the invention is:
A method for predicting pseudouridine modification sites based on convolutional neural networks, comprising the following steps:
1) Data set preparation and conversion: take the data sets for three species, yeast, human and house mouse, from the paper Wei, C., Hua, T., Jing, Y., Hao, L. & Chou, K.C. iRNA-PseU: Identifying RNA pseudouridine sites. Molecular Therapy Nucleic Acids 5, e332 (2016), each consisting of positive samples that contain a pseudouridine site and negative samples that do not, and encode them. Each sample in the human and house mouse data sets is converted into a 20 × 20 matrix, and each sample in the yeast data set into a 20 × 30 matrix;
2) Construction and training of the convolutional neural network model: build a convolutional neural network (CNN) structure. The positive and negative samples converted into matrices in step 1) serve as the input of the CNN, with the positive and negative samples kept balanced. Adjust the number of CNN layers and the number and size of the convolution kernels, then use the tuned CNN structure to extract features from the data set sequences and train a model containing the feature vectors;
3) Interception and encoding of the sequence to be predicted: arrange the whole sequence to be examined in FASTA format, i.e. the first character of the first line is '>', followed by a description of the sequence, and the next line holds the sequence to be predicted. Intercept the sequence to be predicted with a sliding window of the same length as the data set samples of step 1), so that each intercepted fragment has the same form as a data set sample, and convert each intercepted fragment into the matrix form of step 1);
4) Feature extraction and prediction: take the conversion result of step 3) as the prediction input. After feature extraction by the convolutional neural network, predict the input sequence with the model trained in step 2); then slide the window towards the end of the sequence to be predicted, repeating the interception and conversion of step 3) and the prediction of step 4) until the end of the whole sequence is reached, finally obtaining the predicted pseudouridine sites.
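Step 2) asks for balanced positive and negative samples but does not say how the balance is achieved. The sketch below shows one common option, random undersampling of the larger class; the function name `balance_samples` and the `seed` parameter are illustrative assumptions, not part of the source.

```python
import random


def balance_samples(positives, negatives, seed=0):
    """Return a shuffled list of (sample, label) pairs with equal class sizes.

    Assumption: balance is obtained by randomly undersampling the larger
    class. The source only states that the classes are balanced.
    """
    rng = random.Random(seed)
    n = min(len(positives), len(negatives))
    pos = rng.sample(positives, n)  # label 1: contains a pseudouridine site
    neg = rng.sample(negatives, n)  # label 0: no pseudouridine site
    labelled = [(s, 1) for s in pos] + [(s, 0) for s in neg]
    rng.shuffle(labelled)
    return labelled
```

If the benchmark data sets are already balanced, as the equal positive and negative counts of the test sets suggest, this step leaves the class sizes unchanged.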
The encoding described in step 1) is as follows. An RNA sequence contains four ribonucleotides, A, U, G and C. Taking any two in order as a group gives 16 combinations, which are encoded by 16-dimensional displacement coding, so that each combination is encoded as a 16-dimensional column vector. For a sample sequence, the two leftmost adjacent nucleotides are encoded first; the position then moves one nucleotide to the right and the next two adjacent nucleotides are encoded, and so on until the last nucleotide. Each pair of adjacent nucleotides is thus converted into a 16-dimensional column vector. This encoding alone is not sufficient; to represent the features more accurately, the chemical properties of the nucleotides (see Table 1) are added. The 17th dimension represents the ring structure of the first nucleotide of the pair: purine is '1' and pyrimidine is '0'. The 18th dimension represents the functional group of the first nucleotide of the pair: amino is '1' and keto is '0'. The 19th dimension represents the strength of the hydrogen bonds formed by the first nucleotide of the pair in complementary pairing: strong is '1' and weak is '0'. The 20th dimension is the proportion of nucleotides of the same type as the first nucleotide of the pair in the sample after its last nucleotide has been removed. A sample sequence of L+R+1 nucleotides is therefore converted by the encoding into a matrix of size 20 × (L+R),
Table 1: Chemical properties of the ribonucleotides
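As a quick check of the matrix size, assuming each column of the matrix encodes one pair of adjacent nucleotides, a sample of L+R+1 nucleotides yields L+R columns. With the window sizes used in the embodiment (L = R = 10 for human and house mouse, L = R = 15 for yeast), this reproduces the 20 × 20 and 20 × 30 matrices of step 1):

```latex
\[
\underbrace{L + R + 1}_{\text{nucleotides}}
\;\longrightarrow\;
\underbrace{L + R}_{\text{adjacent pairs}}
\quad\Longrightarrow\quad
M \in \mathbb{R}^{20 \times (L + R)}
\]
\[
L = R = 10:\; 20 \times 20
\qquad\qquad
L = R = 15:\; 20 \times 30
\]
```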
The invention also covers the application of sequence feature extraction by convolutional neural networks to pseudouridine site prediction.
The method uses the convolutional neural network algorithm from deep learning to extract features from sequences and make predictions.
The beneficial effects of the method are as follows: pseudouridine plays an important role in normal biological function, so pseudouridine sites need to be predicted accurately. Convolutional neural networks can automatically and deeply mine the latent features of data; compared with the support vector machine (SVM) algorithm used in the prior art, they extract sequence features better and thereby improve the accuracy of pseudouridine site prediction.
The method can improve the accuracy of pseudouridine site prediction so that it can be applied more widely.
Brief description of the drawings
Fig. 1 is a schematic flow chart of the method of the embodiment;
Fig. 2 is a schematic diagram of the formation of a pseudouridine site in the embodiment;
Fig. 3 is a schematic diagram of the displacement coding of a sequence in the embodiment;
Fig. 4 is a schematic diagram of the CNN structure in the embodiment, taking the human species as an example.
Detailed description
The present invention is further described below with reference to the drawings and the embodiment, which do not limit the invention.
Embodiment:
Pseudouridine is an isomer of uridine. During RNA transcription, through the catalytic action of an enzyme, as shown in Fig. 2, the uracil is rotated 180° about the axis through positions 3 and 6, so that C5 is rotated clockwise to the bottom; the original C-N bond to the ribose thus becomes a C-C bond, forming pseudouridine.
Referring to Fig. 1, a method for predicting pseudouridine modification sites based on convolutional neural networks comprises the following steps:
1) Data set preparation and conversion: take the data sets for three species, yeast, human and house mouse, from the paper Wei, C., Hua, T., Jing, Y., Hao, L. & Chou, K.C. iRNA-PseU: Identifying RNA pseudouridine sites. Molecular Therapy Nucleic Acids 5, e332 (2016), each consisting of positive samples that contain a pseudouridine site and negative samples that do not, and encode them. Each sample in the human and house mouse data sets is converted into a 20 × 20 matrix, and each sample in the yeast data set into a 20 × 30 matrix;
The specific encoding is as follows. Displacement coding is carried out first, as shown in Fig. 3. An RNA sequence contains four ribonucleotides, A, U, G and C. Taking any two in order as a group gives 16 combinations, which are encoded by 16-dimensional displacement coding, so that each combination is encoded as a 16-dimensional column vector. For a sample sequence, the two leftmost adjacent nucleotides are encoded first; the position then moves one nucleotide to the right and the next two adjacent nucleotides are encoded, and so on until the last nucleotide. Each pair of adjacent nucleotides is thus converted into a 16-dimensional column vector. This encoding alone is not sufficient; to represent the features more accurately, the chemical properties of the nucleotides (see Table 1) are added. The 17th dimension represents the ring structure of the first nucleotide of the pair: purine is '1' and pyrimidine is '0'. The 18th dimension represents the functional group of the first nucleotide of the pair: amino is '1' and keto is '0'. The 19th dimension represents the strength of the hydrogen bonds formed by the first nucleotide of the pair in complementary pairing: strong is '1' and weak is '0'. The 20th dimension is the proportion of nucleotides of the same type as the first nucleotide of the pair in the sample after its last nucleotide has been removed. For example, the encoding result R(AGAUCU) of the sequence 'AGAUCU' is shown in formula (1):
Table 1: Chemical properties of the ribonucleotides
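The encoding of step 1) can be sketched as follows, using the example sequence 'AGAUCU'. Several details are assumptions rather than statements of the source: "displacement coding" is read as a one-hot vector over the 16 dinucleotides, the dinucleotide order (AA, AC, AG, AU, CA, ...) is arbitrary, and the chemical property values follow standard nucleotide chemistry, which is assumed to match Table 1.

```python
from itertools import product

BASES = "ACGU"
# Assumed order of the 16 dinucleotides: AA, AC, AG, AU, CA, ..., UU.
DINUCLEOTIDES = ["".join(p) for p in product(BASES, repeat=2)]

# (ring, functional group, hydrogen bond strength):
# purine=1 / pyrimidine=0, amino=1 / keto=0, strong=1 / weak=0.
# Standard nucleotide chemistry, assumed to match Table 1.
CHEMISTRY = {
    "A": (1, 1, 0),
    "C": (0, 1, 1),
    "G": (1, 0, 1),
    "U": (0, 0, 0),
}


def encode(seq):
    """Encode an RNA sequence as a 20 x (len(seq) - 1) matrix (list of rows)."""
    seq = seq.upper()
    body = seq[:-1]  # the sample with its last nucleotide removed
    columns = []
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        first = pair[0]
        # Dimensions 1-16: one-hot code of the adjacent pair.
        one_hot = [1.0 if d == pair else 0.0 for d in DINUCLEOTIDES]
        # Dimensions 17-19: chemical properties of the first nucleotide.
        ring, group, bond = CHEMISTRY[first]
        # Dimension 20: share of that nucleotide type in the truncated sample.
        freq = body.count(first) / len(body)
        columns.append(one_hot + [float(ring), float(group), float(bond), freq])
    # Transpose so that each pair becomes one column of the matrix.
    return [list(row) for row in zip(*columns)]


matrix = encode("AGAUCU")
print(len(matrix), len(matrix[0]))  # 20 rows, 5 columns
```

Under these assumptions the first column of the matrix for 'AGAUCU' encodes the pair AG: a single 1 among the first 16 entries, followed by 1, 1, 0 for adenine (purine, amino, weak) and 0.4 for the share of A in 'AGAUC'.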
2) Construction and training of the convolutional neural network model: build the structure of the convolutional neural network. The positive and negative samples converted into matrices in step 1) serve as the input of the CNN, with the positive and negative samples kept balanced. Adjust the number of CNN layers and the number and size of the convolution kernels; Fig. 4 shows the tuned convolutional neural network structure for the human species. Then use the tuned CNN structure to extract features from the data set sequences and train a model containing the feature vectors;
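The text does not give the layer count or kernel sizes (they appear only in Fig. 4), so the following is a generic sketch of how a convolutional layer turns an encoded matrix into features, not the network of the embodiment. It shows a forward pass only (convolution, ReLU, global max pooling, logistic output); the function names and the single-layer layout are assumptions, and in practice the kernels and weights would be learned by backpropagation in a deep learning framework.

```python
import math


def conv2d_valid(matrix, kernel):
    """Valid (no padding) 2-D cross-correlation of one matrix with one kernel."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(matrix) - kh + 1
    out_w = len(matrix[0]) - kw + 1
    return [
        [
            sum(matrix[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def relu(feature_map):
    """Element-wise rectified linear unit."""
    return [[max(0.0, v) for v in row] for row in feature_map]


def global_max(feature_map):
    """Reduce a feature map to its single largest activation."""
    return max(max(row) for row in feature_map)


def forward(matrix, kernels, weights, bias):
    """Conv -> ReLU -> global max pooling -> logistic output in (0, 1)."""
    features = [global_max(relu(conv2d_valid(matrix, k))) for k in kernels]
    z = sum(w * f for w, f in zip(weights, features)) + bias
    return features, 1.0 / (1.0 + math.exp(-z))
```

Each kernel yields one entry of the feature vector; the output can be read as the estimated probability that the central U of the encoded sample is a pseudouridine site.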
3) Interception and encoding of the sequence to be predicted: the whole sequence to be predicted is intercepted with a sliding window and encoded. Arrange the whole sequence to be examined in FASTA format, i.e. the first character of the first line is '>', followed by a description of the sequence, and the next line holds the sequence to be predicted. Intercept the sequence with a sliding window of the same length as the data set samples of step 1), so that each intercepted fragment has the same form as a data set sample. The interception is therefore as shown in formula (2): taking the site U to be predicted as the reference, take L nucleotides upstream and R nucleotides downstream of it, so that the intercepted sequence is L+R+1 nucleotides long,
$$S(U) = N_{-L}\,N_{-(L-1)}\,N_{-(L-2)} \cdots N_{-2}\,N_{-1}\;U\;N_{+1}\,N_{+2} \cdots N_{+(R-2)}\,N_{+(R-1)}\,N_{+R} \qquad (2)$$
According to the length of the data set samples, if the sequence to be predicted comes from human or house mouse, L = R = 10; if it comes from yeast, L = R = 15. The intercepted sequence is then converted into the matrix form of step 1);
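The interception of formula (2) can be sketched as below. The helper names `read_fasta` and `windows` are illustrative. The source does not say how a U closer than L or R nucleotides to either end of the sequence is handled; skipping such positions is an assumption made here.

```python
def read_fasta(text):
    """Parse single-record FASTA text into (description, sequence)."""
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("the first line of a FASTA record must start with '>'")
    return lines[0][1:].strip(), "".join(lines[1:]).upper()


def windows(seq, left, right):
    """Yield (0-based index of U, fragment of left + right + 1 nucleotides).

    Assumption: a U with fewer than `left` upstream or `right` downstream
    nucleotides is skipped rather than padded.
    """
    for i, base in enumerate(seq):
        if base == "U" and i >= left and i + right < len(seq):
            yield i, seq[i - left:i + right + 1]


header, seq = read_fasta(">demo sequence\nacgu\nuagc\n")
for position, fragment in windows(seq, 2, 2):
    print(position, fragment)
```

For human or house mouse sequences the call would be `windows(seq, 10, 10)`, and for yeast `windows(seq, 15, 15)`, giving fragments of 21 and 31 nucleotides respectively.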
4) Feature extraction and prediction: take the conversion result of step 3) as the prediction input. After feature extraction by the convolutional neural network, predict the input sequence with the model trained in step 2); then slide the window towards the end of the sequence to be predicted, repeating the interception and conversion of step 3) and the prediction of step 4) until the end of the whole sequence is reached, finally obtaining the predicted pseudouridine sites in the whole sequence to be predicted.
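The scanning loop of step 4) can be sketched as follows. The trained CNN is replaced by a placeholder scoring function `toy_score`, which is purely illustrative and has no biological meaning; the 0.5 decision threshold and the 1-based reporting of positions are also assumptions.

```python
def predict_sites(seq, left, right, score, threshold=0.5):
    """Slide towards the end of the sequence and collect predicted sites.

    `score` stands in for the trained model: it maps an intercepted
    fragment to a value in [0, 1]. Returns (1-based position, score) pairs.
    """
    seq = seq.upper()
    hits = []
    for center in range(left, len(seq) - right):
        if seq[center] != "U":
            continue  # only uridines are candidate sites
        fragment = seq[center - left:center + right + 1]
        p = score(fragment)
        if p >= threshold:
            hits.append((center + 1, p))
    return hits


def toy_score(fragment):
    """Placeholder, NOT the trained CNN: fraction of G and C in the fragment."""
    return sum(fragment.count(b) for b in "GC") / len(fragment)


print(predict_sites("GGUCCAAUAA", 2, 2, toy_score))
```

In a real pipeline `score` would encode the fragment as a 20 × (L+R) matrix and pass it through the trained network.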
Fig. 1 shows the steps of pseudouridine site prediction based on convolutional neural networks. First, the data set is prepared and encoded, converting the sequence data set into matrix form. Second, the convolutional neural network model is built and then trained on the matrices derived from the data set. Next, the sequence to be predicted is intercepted with a sliding window and each intercepted fragment is encoded. Finally, after feature extraction by the convolutional neural network, the trained model predicts the input sequence.
Experimental example:
Three species were tested using three independent test sets S(4), S(5) and S(6), which come from human, yeast and house mouse respectively. S(4) and S(5) come from the paper (Wei, C., Hua, T., Jing, Y., Hao, L. & Chou, K.C. iRNA-PseU: Identifying RNA pseudouridine sites. Molecular Therapy Nucleic Acids 5, e332 (2016)), while S(6) was constructed separately according to the method of this embodiment. S(4), S(5) and S(6) each contain 100 positive samples with a pseudouridine site and 100 negative samples without one. The prediction results are shown in Table 2:
Table 2: Comparison of the prediction results of the method of this embodiment with those of the only two existing predictors
As Table 2 shows, the prediction results of the method of this embodiment indicate that the CNN is clearly better than PPUS and iRNA-PseU, currently the only two predictors in the world, both of which are based on the support vector machine (SVM) algorithm.
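The text does not state which measures Table 2 reports. As a sketch, the four measures commonly used to compare such predictors (sensitivity, specificity, accuracy and Matthews correlation coefficient) can be computed from the confusion counts of a test set as below. The counts in the usage line are hypothetical and chosen only to match the size of one test set (100 positive and 100 negative samples); they are not results from Table 2.

```python
import math


def metrics(tp, tn, fp, fn):
    """Sensitivity, specificity, accuracy and MCC from confusion counts."""
    sn = tp / (tp + fn)                    # fraction of true sites recovered
    sp = tn / (tn + fp)                    # fraction of non-sites rejected
    acc = (tp + tn) / (tp + tn + fp + fn)  # overall fraction correct
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"Sn": sn, "Sp": sp, "Acc": acc, "MCC": mcc}


# Hypothetical counts for a 100 + 100 test set, for illustration only.
print(metrics(tp=80, tn=70, fp=30, fn=20))
```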
Sequence listing
<110> Guilin University of Electronic Technology
<120> A method for predicting pseudouridine modification sites based on convolutional neural networks
<141> 2017-10-20
<160> 3
<170> SIPOSequenceListing 1.0
<210> 2
<211> 6200
<212> RNA
<213> Saccharomyces cerevisiae
<400> 2
cuaucaucgc ugaucuccca cucccugauc ugaagagguc aucgguucga uuccgguugc 60
guguaagaug caagaguucg aaucucuuag caagcgaaag auuagaaauc uuuugggcuu 120
ugccgguuaa ggcgaaagau uagaaaucuu uuggguuuag gaccgagcuu uuaguggaug 180
ucaucaggac acuucugaug uuucaaaaga uauuccaggu acuggacgag aaucgcagaa 240
caauuugacg uagauguuug uuguucaccc acaacugaag aguugucgag uuuuuugagg 300
uuaagaauga aaggucgaaa aaguuucagg caguuucuca gcguugggcc cccgguucga 360
uuccgggcuu gcugguaaaa uccaacguug ccaucguugg gccuaagcgc aagugguuua 420
gugguaaaau ccaagguuaa ggcgaaagau uagaaaucuu uuggggcgaa agauuagaaa 480
ucuuuugggc uuugccggcu ucauuaacau guacuucaac uacggaagug gagaucaucg 540
guucaaaucc gauuggaauu ugguuuucaa guguaauagg cuacgugauc agugguucaa 600
gacgucgccu uuacacggcg uagugguuau cacuuucggu uuugauccgg acacuuucgg 660
uuuugauccg gacaaccccg guaauugauc uauguuguag cugcgcuggc ggcaacucca 720
guucuuuauc uucuuucucc gcuggcgucu gacuucuaau cagaagauua uggguucuuc 780
cgugauaguu uaauggucag aaugggcaga augggcgcuu gucgcgugcc agaucgggug 840
ccagaucggg guucaauucc ccgucgcgag aaaaagccaa ugaugagaua caagccauua 900
ucgacauaug cugguuacau ggcaguagaa gaauauacau ucuauuaucg aaccuggcca 960
ugaaacaaga uuucuguagc auacucgcuu cauacuuguu uucuuuuuug ugccuuuguu 1020
acguugcuuu guggaaguuc gaaacuccaa aguaugagug auggaagugu aguuauccgg 1080
agaucagggu caaaucuucg uugaccguca auuacaugca gcacaaauuu guagacaggc 1140
ugguuugagg auuacuugga cauuaacggu ucuccuauuc aagacaaaag uguucuuuca 1200
ucugcagugu uggcguacag auuguaguug uggcugcuac cuuuuuuaau guccguuucu 1260
augauugggc uauuguucga agguaaugcc uugaucagaa gacuguuggu ccuuaguucg 1320
auccugagug cgagcagcag auugcaaauc uguugguccu uaguuuaucc gauauagugu 1380
aacggcuauc acauccgugg agaccggggu ucgacucccc guaucgggua uguuauuuau 1440
guaacgggua ugcgaacauu cuuuuuuuga uguaauagga uaagcuugcu guucuuuuca 1500
guguaacaac ugaaaugacu guaguaucug uucuuuucag uguaacaacu guguaguauc 1560
uguucuuuuc aguguaacaa caaguguagu aucuguucuu uucaguguaa cacaagugua 1620
guaucuguuc uuuucagugu aacaucaagu guaguaucug uucuuuucag uguaucauug 1680
uucuuggauu ucaaugggug cugucuaaau uucgccacug uagaugaaga agacgaaaaa 1740
ugagaagagu guagauguau uauccuucca agauagacua uguaauggua aagaacauau 1800
ggcggcgggu gccuuuggag cagcaaucga uggugugguc acuguaagag auuggcccca 1860
ccauggacga gccuguagua uacaacggua aacaaagguc uuccuaugau uccggcguuc 1920
gucuuucuca uacccuguag accagaccuc ucuagaauac uuugaagguu uaaccgagga 1980
aaugcgugga gaccgggguu cgacuccccg uaucguuauc cgauauagug uaacggcuau 2040
cacaucggac acuucugaug uuucaaaaga uauuccauaa cugugggaau acucagguau 2100
cguaagaugu aagaugcaag aguucgaauc ucuuagcaaa caauuuucac aguuuaaggc 2160
caagaacaag gcccguuuac acauuuugau acaaccguag acgggagguc ccggguucga 2220
gucccggcuc gcgauucucg cuuagggugc gggagguccc ggggcgugcg acuguuaauc 2280
gcaagaucgu gagucgcaag aucgugaguu caacccucac uggggcguug ggcccccggu 2340
ucgauuccgg gcuugcuggu aaaauccaac guugccaucg uugggcccua gcgcaagugg 2400
uuuaguggua aaauccaagg cuguguucuu cuuucuaaau ucccuaucgg gaaaaacccg 2460
uugcuagaag cgcaacuggu gaaaaaaguu cagaauugca gaaaaguggu gagugguuuc 2520
cuaguguauc agccacuauc ggcauaaggu uagggguucg agcccccuac agggcaaucg 2580
guagcgcgua ugacucuuaa ucauaaacaa aagaagcugu uccagagagc ccaagccgga 2640
caaccccggu ucgaauccgg guaggacacu uucgguuuug auccggacaa ccccgguggu 2700
uaucacuuuc gguuuugauc cggacaacua gugguuauca cuuucgguuu ugauccggau 2760
guugccgcua aguguaagga agucgguauc cugguauauu cuauauacuc acuuauuacu 2820
uuucugguau auucuauaua cucacuuauu acaauggcuc uuuuuguuau ucgaaagcuu 2880
acauaaaaag uucggcuauc ucuugggcuc ugccucugcc cgcgcugguu caaauccugc 2940
uggugcaugg augauauuug uaguauggcg gaaaacgugg agaucaucgg uucaaauccg 3000
auuggaaaua cuauucaguu ucucagauau agguugcagc aauuggaaaa aucuauuaac 3060
ccagaugaac cagugcgucu acuauuacuc ggccaaauau ucguaauuug agaucucugc 3120
aaaacaaugc aaaacaaugc accuccuggc aaaaacauca auaaacauca augucaauug 3180
uuugaacguc aaugaacguc aauucuuguu cguuguccgc aagcaauuaa uauggcuugu 3240
aauggaaaca agcaaaaaca agcaagaucu ucccauaccg uuucccaccg uuuccccugc 3300
auguagaaug caacgauauc aauguuuaau cauaacagau caaagagcau caaagagcag 3360
ugguacuaca gaugcgucaa gguacgcaua agcgugaacc ccggucgacg ccggucgacg 3420
auacauacag agcuguuaca auauagcaaa ggacaguaga aaccugagua auccugagua 3480
auggaucuuu gaaugauauu aacugauauu aacgaaaaug aagagcucca aaaugcucca 3540
aaauuuccau agaaaaauca gcgaauuccc caaggaaaaa uagcgaaacc agaaagguua 3600
augcgggaag auuacauugc cuugaaaugc cuugaaacaa ccuccaagcu ugggagauga 3660
ggagaucucg ucguuuaaga accaagucaa accaagucau ucgguaacaa guuccaagac 3720
guuccaagac auuacugucg aaccucaauc ccguagguaa aguguauuua gugagggaac 3780
gccgcgacaa gugaucaucc auuuauugug acauauugug acacuguauc auuccuuuca 3840
aaccaaacau auuacugcau caaucugguc augucugguc augucaugcu uucugacuuu 3900
gauuuaagau acaaaaauuu guucagaugg auucagaugg auucagaacu aauuccuuug 3960
uugguacuug gcuguacucc auuuaaagga gauaauucac gucaaauuuc cacaugauaa 4020
ggaaguuucg ggaaguuucg aagaauugua aagaccugau acuucuucaa aaaaguucag 4080
uggucguucu uaccccccuc uaauaccugc auuaaaugau aacuccuuuu auauugucuu 4140
gcaauaaaca cccgaaacga ugaugaaauu gaugaggcug auccaggcug auccauucca 4200
ugauuuuaau ucuaugcuac ucugaaaauu auaccuacgg aaaaauuauc uuaugauaaa 4260
aauguaaaaa auuauuuaaa acgagaaagu gaaugaaaaa uauaauauca uauaauauca 4320
uuuauugucu gauaaugcug uacguaccau ccgcaucagu ggauauccaa ugauauccaa 4380
ugauaguaau uucgcgaguu uacgcgaguu uauccguugc uguuauauua ucauauauua 4440
ucacuuuuua auauucuuuu caaaggauuc cuuccgcaau ucuucugaaa uacugcucgc 4500
caguuuuuug uucuuccacg uaauccccuu auuaacggag auuugauuuc ucccagcacc 4560
gauucgagug aguacguuuu caaauauguu caaauaugcu uaaucugauc ucuucugcgg 4620
ccgaucugug ccauuauagu aagcagugcc aagcagugcc acuugucuaa uauaagauga 4680
ugauuuuacc guuuucuggg gacaucauga uacaucauga uaucauuugg uacauaauga 4740
acagauaauu ggauuucuug cauuuuuugc gauuuuuugc gauuauggcu uguugaccau 4800
ucacaaccau ucacaaaagu uggucuaaca uaauuuuaag uccuuguaau auucuagcuu 4860
uugagucucu gggaguggua aaucuacuga ccaucuucuu uuauccaauc aucuggcaag 4920
uccuuaauuu ucaucucuaa aauuuagaua uggacguuug uggacguuug agauauuuuc 4980
guauuucugc cuauuucugc caauucuucc uuuaacugug acuaacugug accguacuga 5040
uucgcuuucc cuugcuuucc cuuugaauuu uuauuauacc cucucauuac uugcuuaucu 5100
gaauuuuuuu ccauuuuguu ugccuauccu uccaucugau gacuugugau gacuugaaau 5160
guucugacag guaagauucu caacauucuu aauccaaacg auguccuucu ccugcuugug 5220
uauuaaagga caugaaauau uucgcuacau guaauggaag aucaucugua gauucguauc 5280
uguaguuucu caucagcaag aucuuucaaa aacgcuugau uugcuggcac cucuuaauag 5340
cgcuuguuuc ugcuugcucu acccucuagg ugaacguuua aucugacauc cgggaaguuu 5400
gaugugaagu auucugcuaa ccguucaggu ccuucaccug uuuguccaag ggaguaaggg 5460
gacuuucugg cuuuuuuuuu uacgaaacuc uuccucauca ucuucagccu caacauuuuc 5520
caaccgcaac uucuuguucu ugcuuaugcc cugcuuauug uggguugucc cgccauuauu 5580
cgccauuauu guuaauagau ucaacaaaau auuuaucauu gaaaauucac gugaucgcaa 5640
uagaucgcaa uauuccguca ggagugauaa auaucgucau ugcacaaauu aguuuauuau 5700
ucacuauuau ucacgacucu uaacaacgac aauuuuagac aggucguccg uagauauuua 5760
cauaaauacu acacagacua cuauuagaau uugcgaaaau uugcgaagga uuuaccgaag 5820
aaaagcacag aaaagcacag accuuauuga gcuuuugaau caauaaccag gaguuucaaa 5880
aacaaacagg cacuuuucau ugaucuauuu gauaaaucug ccacuagagu ccaaucuacg 5940
cgacuuauug caacuuauug cauuccuugg aaggugaaag ucuugcacga ugguccaguu 6000
auugaugaau uuuuauuugg ccauucaacu ucauaagugg ucgguaaggu accaggaaag 6060
uuucugaaac caucccaauc uuuacucuac uuauuaucca uugcaucccc cugaauaucu 6120
uauuuuagca uuagucaaca uuagucaaag aaaugaagcg guucguuuug guucguuuua 6180
uugauagaaa acaggacagu 6200
<210> 3
<211> 4200
<212> RNA
<213> Homo sapiens
<400> 3
gcuaaacagg uacugcuggg cuuauugagu gucuacugug uggauaaacu guuacgcaua 60
uauuugucgg uguuaacaaa auggucgggc cuaguucaaa ccuuuuuuuu aaguauacag 120
gggucuggcc ggucuguagc ggaucacuag cuaucgcuuc ucggccuuug aaaguaacuu 180
ugcccgagca cuauucuguu aaaaucagga gcagcugccu uuccaacagc ccaaaaugac 240
uuucguucuu cuuucagaua cuuacauagu uuuccgaauc aacuuugccg uguugacuca 300
aaguuacucu ccuuccuacc caccuuuccc agaaguggac aauauauuaa auggauugag 360
gacaauauau uaaauggagu guaguaucug uucuuaucaa aguguaguau cuguucuuau 420
ucaaguguag uaucuguucu uagaucaagu guaguaucug uucuucucgg ccuuuuggcu 480
aagaguaauc gcuucucggc cuuugaguaa ucgcuucucg gccuugaugu auuguuugca 540
cucuucauga uucuauuaua guauucuugu uuuuguauug uugcuccuuu cuuuuuuuug 600
gccuuucucg cuaaacaggu acugcugggc ccauuaucgc uucucggccu ucauuaucgc 660
uucucggccu uuuguaauau uuuaucccug gacuaguauc uguucuuauc aguuguagua 720
ucuguucuua ucagugugua guaucuguuc uuaucaaagu guaguaucug uucuuauuca 780
aguguaguau cuguucuuag aucaagugua guaucuguug uauugagugu cuacugugug 840
uuuucaucac uauggcuuag cgcaucaaaa cuucacuuuu ugauuggugg uauaguggug 900
agcgauaaaa ggcuaauauc cagagguccc ugguucgauc ccgggagacu gaagaucuaa 960
agguccggga gagcguuaga cugaagaugg gagagcguua gacugaagaa auccuuucua 1020
aauugcaugc auaaaaaguu uuuucuucag agaguaugga uuccgauaug aaagacauga 1080
auaagaacug augacuuuca auuaucugug ugagccuuuu cuuuguuugu aacuagccau 1140
cagguaagcc aagaucuucu cggccuuuug gcuaagaucc aucgcuucuc ggccuuuggg 1200
cccagggugc uguggagaau uguccuccuu cugaagcccc cuccuuuucu gaggaaggug 1260
auuggaacga uacagagaag aagacuauac uuucagggau cagcgcccca auuauuauga 1320
cuguaaguua uuuugcucuc acuggcaauu ugguuccacc acaucacuca auacuuaccu 1380
ggcaggcacu caauacuuac cuggcagcug gcugcuguag gucuuuucau uguugauauu 1440
ugcccagcag ggccucaguu agcucucaag ucccauggug uaaugguuag cguuagcacu 1500
cuggacuuug aaggacuuug aauccagcga uccgcgaucc gaguucaaau cucgcgaucc 1560
gaguucaaau cucggucauu uuauguauau uuaucaccuu uccaguuacu ccuuauauaa 1620
guuauuuugc ucucacuguc aaguguagua ucuguucuug guaggugagu uuaaagucuu 1680
cucuuaccug uuaaaaucag ggcaacagag uucaacuauc uccauuugcu guuacucugg 1740
agaucaagug uaguaucugu ucuuguaaaa ggguuacucu cauacuuuua uuauuuggau 1800
gaauaucuuc ucggccuuuu ggcuaagaac uaucgcuucu cggccuuuaa acuaucgcuu 1860
cucggccuuc ccuggagguu ccaauccugc uucuccauga uucgugcauc ucuaauuaug 1920
cuggacuguu uuauuggaac gauacagaga agaauauuuc ucauuucuuu uaguuauacu 1980
aaaauuggaa cgauaauugg aacgauacag agaagaacac gcaaauucgu gaagcguaag 2040
uguaguaucu guucuuauuc aaguguagua ucuguucuuu caaguguagu aucuguucuu 2100
gugauauaac ucaguggcag aggccuugga uuucaucccc agggagaggg agugggaaca 2160
ggauuugcaa gacuccuagu accuugugua gcaauggugu ccaggaguaa caaguucagg 2220
uucaccgcaa agucacucua uucugauccc aaagguuuac uuaauguuua gguuccuguu 2280
gcuugccauc uaagagguuu guuguccuau uggaagucuu uuccuuuaaa gucucuuagc 2340
aucagacacu uaagagagag aaugagaauc aucguggaau gaauagacuu aacugucagg 2400
aggcugucuu acguacacaa uugcaugugg aagcugcaau aacucauucc uacagcccca 2460
caaacgguuu aagcuugagu cacaauaauc aucauuucau uccuucaaau aaaaaaaaau 2520
cauuucugaa uucagaugua ucuaucauag uuggguuuaa gaaucagaac auuggguaua 2580
uuccaccaug gugucuggga gcacacauua ccccucccuu cccgcaccaa cgaucugcuu 2640
gugaacagag cuuuagucca gagcaagccc ccgccuuuuu uucuguugua aauuuuguua 2700
ugcaauuaau uuagaggaau agggaaagug gacgugucug uuguuucuca aggguccgga 2760
cuguuugaca cugaugaaug cuuucucaaa aguuuaaaca guuucauuug gaaguagggu 2820
cgccuuaagu caacaucaca gaugcuccag caggcaacca uauguuuaga aauaaaacca 2880
gccgcggugc cagcaaagaa cagacacauu acuugaacuu guucugaguu cuacugucuu 2940
acccaaaugc ucggaaacuc ucuuaugacu gugacuucag aaaaagaagg auuccaaaga 3000
caaacucaaa uucuuagaug accaaggcag acaguaggaa gaguaaugga aauccuuuug 3060
uuuuguuguu cuguuguugu caagugcaaa aauauaauuu guugaauaug ugugcuucug 3120
uccuacuaca uuucuuccau uuuuaauuaa aaaguagagc uaggacccac ucuuguuccu 3180
guacucacug uaggacccca ccuaaaagua uaauccugag aguucacgcu gagccuuuuc 3240
ucucucuucc ugaaaacuga aguguuccca aagcuaugug uaaagguuug guucucaucu 3300
cucucucucu cucucucuug uaggugggua guaggugagc agcugggagu uaaauacucu 3360
guggaaccuc ucuaguuaaa aguaaccagu cugugggaag uaaaagcaac auucccugcu 3420
ggaggcucca ggauccuaag ggacgucugu acucuaaggg gacauuuaaa uugcaucucc 3480
cucauuaaau gaugacugau gcuacuaugu uuaaacauug gauuuaacgu uuauuucauu 3540
guuuuuauuu cacugugggu cugggcuuua agacccucau uuuagcugcc uagccuucag 3600
augaaggggg ggucucugcu aauuauacau cuggaguuca gccuucagaa cuugucagcc 3660
acccuacccu acuuggacca ugucuugaaa agacaagugg uugacuuugg guuucuuaug 3720
uguuuguuug uuuguuuguu uguuuugcuc cugacaccac cacccucuuu uaaguagauu 3780
gugaccagaa uaguaacuaa aauguugaau uuauuugcuu aacaaaugug gcucuaaauu 3840
uuaaggauca uuaugaaaga ugaauagcuc cccuuucucu gcuugugaac acguaugcca 3900
auggacucug cucccguguu acagugugac cuaacuuugg auacuuuuuc cucuauaguu 3960
aaccacauua auuucaaaau ugcagagaaa uggaucacuu ugcaucagua gggcugguaa 4020
auugaaauac uggaccauca cauauuuccu ggugcuucuu uguuuauuca uuuggcuauu 4080
ccauuguucc uguaccauca aucuuucuca guuugugaac augagcucuu gagauucauu 4140
caggaggucu cagaacacua aggcuuuauu gucuccuaau cuuaacucuu ggggcuggua 4200
<210> 3
<211> 4200
<212> RNA
<213> Mus musculus
<400> 3
gacucugcug uuccaaagga caacccagaa uuauuauuuc uuauucuugg uuuuuuuuuu 60
cucauguacu uuguaguggu uuaucugccu uuguuugauc ugagcuauuc uuauauuugu 120
uuuuuagcuu cugggguuug ugauucuucu gcacugcugu cccgagaccu cgcugcuuuu 180
cucaagcaaa ugccccaccu cuggacaagu ggcccugcac uaugauauau guucucaggg 240
uuagaucccc auugccagug gcuucauugg uggcuguuca cuguauuggg ggaaaacaaa 300
uccuuauuca gccuccccag gagguuccaa uccugcggga cccgacuuau uccuuagcgg 360
ucagcccucc gugugcuuuu acagacaauu ucaaagucag uuggugguau uaaagaagac 420
guccucacug uacagugcca aaacaaagau guucuuuugu cucauuugga uuugcauucc 480
agcuacuaag acuuguuggu agcccaccuc uuccuuaagc cugcugcaug gaugcuaugc 540
accccagaag uuuuuguaca ggcagauaag aagcaaguau uaggaccacu gguggcagug 600
gaagcaccac cugcuacucu acccaccaaa agguaccgcu uuccucaagu ggucuacaag 660
cuacacgugg uuccuuuuug aauuuguaag gacguaacau cuguauauuu aaucgaaggc 720
acacuuucag ccagcgucuu ugaaauauua guuucaucuu aacagauuag ugccuuugga 780
ucccaaguuc cuggugaacg cugcugcuuu ucauggucca cccagugacu aacaucugcc 840
gcgcugucuu uuccgaucuc guacauggag guuccucugg gggggcugcg gcuacuucug 900
cacaucggcu cuguagacac uuucuugccc agacuauaua auggcuuguu aaugauuuuu 960
uuuuuucuuu uggcucuaug agcucuggac uccaaacuuc aucauggcgc ucagcuacua 1020
caaccagaga guaauggguu agaaaccauc aguaaugggu uagaaaccau cacacucugc 1080
uuacggucag acucuggacu uuuacaucca cgaccucuug ucaucccugg aagcccucuu 1140
gucaucccug gaagcccagg agcccuacac uucuguagac cgggguucaa uuccuagaga 1200
ccgggguuca auuccuagcc ucuuuccgua ccauucuaga cuaacucugu agaacagucu 1260
uuaucuugua uucauuguac cgaugcugag guacugccuc gcuccaccuu uuuaccaaaa 1320
ugucugugau gucuacaaag uucuacucca agaguaugcg gcagcagaau uauauuauau 1380
ggacaccuac uacuacuacu acuaacuuga agcuguuuau aguagacuga uggccgacua 1440
acuaaaaagc cacgaugugu aucgagccac augcucacuu uauuauccau gggacuccuc 1500
uuuccaucgg aaaaguugag cauauguucu cagaguuuga cauuucgaaa ucugugguca 1560
guauuuuaau aauuagcuuu aaaguuauaa aaagcaaaca gaugucuuug uacccagagc 1620
ugcuauuuuu agauacuuag ucaacuuuua aaauaccacc auaggcaguu acauucugca 1680
guucuuucuu ucuuuuuuuu ucggaccagc uuacagaguc agcugcuaca uuuacauaga 1740
gugcaguuuc uuuuuucaga uuuuuuugua ucacuuugua gaccaacuag gaaucuacag 1800
auuaagugaa gcucuuuaua uaguugaguc uguaacauuc cugaugaucu ucauaaugua 1860
ucccuuacag gguccuuccu acaagaaaga acuuuuaaua uuaguagcag aauuuuuacu 1920
aucuauccau uacagccagu ccuguggcuu gcuagccugg aguucuaauc uucagaucuu 1980
gauuuaacag cagaggaaaa aggcauauag aaaauuugug acaguguagc ugugauucag 2040
ggcccgguuc augacccggc agucuucguu ugucagucaa aaagaagccu uuagugugug 2100
ucaacccacc ugcucucugu agacaguuug cuauuggggu gaauuuagau auucaucuag 2160
caaggugggg caguaauauc uuacccauuu uucauagaug ucuuuucaua ggcaauguaa 2220
gcuuuuaccc agcaccugua gaacaggucu guuuugguga gaaucuguuu ggugagaauc 2280
uguuuuggug gaacgggaau acccugcaug ccucuauugc ucauucuggg ccagcucauc 2340
auguagaugu uggaucuucu ccuaaaggcu uugacagacc cacuggucau agucacuccc 2400
uggauuguau ugcucgcaaa aucacuuaug cuucccuaua auuuucuggc ccuucuacac 2460
uguuauguca uuuuguuucu ugacugaguc uaucugugug accauagguu ucauacugug 2520
ucuagugcac uagugacauu ucccaauuca guuugauuuu uuugagguau uauaaugucc 2580
aaugaagcaa ggcuaacaaa gccuagucau caugucaagu ggacauaugg gaaaguaaaa 2640
uucaccaaau ucaccauaau acuagccacc auauguguaa ggaauuauaa cagggaguuu 2700
guaaucgguu uucagagcau ccauugucac ucuacacaag uagcagucau cgccuuagua 2760
auacaugauc uuuagaguau uauacucuac ucauugauuu gacuacauau uucuuaauga 2820
aaacuaugcu caacuuuuug cacaauggga agacuuaacc uguacauagc uguuuaauuu 2880
cucugacuca cgauaucccu guuuccaaug ugaagacagg caguguaaau agagaugaug 2940
gcagaaugcc ugaauuuaug gauugacugg cugggagccu ucgcaaugag uauuaauuaa 3000
uacagagaga gaauaggaua uggaguuaga gacguggaaa caaugccaug gagacuacag 3060
aguugacagc cugaacuugu acagcacauu aaucuacugg aaaguauaac ugggaagauu 3120
uccaggaaca uaaaauguau uugacucuug cuccaaauaa uaaauuaagg gggccuuaca 3180
cucacuggac agucaccccc ucugaugagc uguagaguug gacuauucug gugagcuuag 3240
ucccugggcc augccgcuug guuagugugu uuugugcucu uuaaaguuga gugauauacc 3300
ucauauauac aacacaccga agagacacuc aggguccuau aucuuuugcu guugagggac 3360
caaugcaggu ucaaggugac acacacuagg uuuagaguca uguguucugu gaucagugga 3420
ccaucuguau ggcugagaug aaauuugucc uucaucucac cauuguguag cccgauccuc 3480
uccucuugau gccuacucau uuuucaguuc uuacuucugc cagagucucc ugcuaaguuu 3540
cccauggaac auguacaucu gaaucuuugc aaccaagcag ugacugaccu uuaauuuggc 3600
aucuuugagu uggaaauccc aguacagagu aaccauuagc ugauaaauga gugagacaua 3660
aagcucugag caggcauugc aaugauaaaa ugaauaaaca caggacuaac uuuacauacu 3720
uaauuacuuc auaacugcaa aauaggaauu auagaucucu cauaaugcuu ugcccagugc 3780
uacugggugc uacuguacaa uaagcagccg auauagguag uucccaccca aguaauucau 3840
ucuccaguuu accuugcaaa accagaugca gagauaggcc ccaauaaaga ggaaaugguc 3900
auaauuuuau uuaauauuuc ccauauacac cucauuaucu gcuguaccuc auuauauaug 3960
aucuauuuuu uagucuuccu uagcaggccc aacucucagc uuuaugacuc cuuacgccag 4020
cucucugagg ccgcagauaa uguauuugug uugaauaaau cccucacaca cagauagcag 4080
cacauagcag cucaucgggc uuucauauca caucaccagc cucucauuag auccuugauu 4140
accacugugc cuucuucaca cagauguuug acacucaauu uuuacccccu ucugauuuac 4200

Claims (3)

1. A method for predicting pseudouridine modification sites based on a convolutional neural network, characterized in that it comprises the following steps:
1) Data set preparation and conversion: select the data sets of three species (yeast, human and house mouse) from the paper Wei, C., Hua, T., Jing, Y., Hao, L. & Chou, K.C. iRNA-PseU: Identifying RNA pseudouridine sites. Molecular Therapy Nucleic Acids 5, e332 (2016), each consisting of positive samples that contain a pseudouridine site and negative samples that do not; encode these data sets, converting each sample in the human and house mouse data sets into a matrix of size 20 × 20 and each sample in the yeast data set into a matrix of size 20 × 30;
2) Model construction and training of the convolutional neural network model: build the structure of the convolutional neural network; use the positive and negative samples converted into matrices in step 1) as the input of the convolutional neural network while keeping the positive and negative samples balanced; adjust the number of CNN layers and the number and size of the convolution kernels; then use the adjusted convolutional neural network structure to extract features from the data set sequences and train a model containing the feature vectors;
3) Truncation and encoding of the sequence to be predicted: use a sliding window to truncate and encode the whole sequence to be predicted. The whole sequence to be examined is arranged in FASTA format, i.e. the first character of the first line is '>', followed by a description of the sequence, and the next line is the sequence to be predicted. The sequence to be predicted is truncated with a sliding window whose length equals that of the data set samples of step 1); each truncated fragment has the same form as a data set sample and is converted into the matrix form of step 1);
4) Feature extraction and prediction: input the conversion result of step 3) as the prediction set; after feature extraction by the convolutional neural network, predict the input sequence with the convolutional neural network model trained in step 2); then slide the window toward the end of the sequence to be predicted and repeat the truncation and conversion of step 3) and the prediction of step 4) until the end of the whole sequence is reached, finally obtaining the predicted pseudouridine sites in the whole sequence to be predicted.
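The FASTA handling and sliding-window truncation of steps 3) and 4) can be sketched as follows. This is a minimal illustration, not the applicants' implementation: the function names, the one-nucleotide step, the conversion of T to U, and the filter that keeps only windows with U at the centre are all assumptions. The window length would be 21 for human and house mouse and 31 for yeast if the 20 × 20 and 20 × 30 matrices are read together with the 20 × (L+R) rule of claim 2.

```python
def read_fasta(text):
    """Return (description, sequence) from a single-record FASTA string."""
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("first line of a FASTA record must start with '>'")
    description = lines[0][1:].strip()
    # Assumption: sequence lines are joined and DNA-style T is read as U.
    sequence = "".join(lines[1:]).upper().replace("T", "U")
    return description, sequence


def sliding_windows(sequence, window):
    """Yield (start, fragment) for each full-length window, one nucleotide per step."""
    for start in range(len(sequence) - window + 1):
        yield start, sequence[start:start + window]


def candidate_sites(sequence, window):
    """Return (1-based centre position, fragment) for windows centred on U.

    Assumption: only a central U is a candidate pseudouridine site.
    """
    half = window // 2
    return [(start + half + 1, frag)
            for start, frag in sliding_windows(sequence, window)
            if frag[half] == "U"]
```

Each fragment returned by `candidate_sites` would then be encoded as in step 1) and passed to the trained model; the position is kept so that a positive prediction can be reported against the whole sequence.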
2. The method for predicting pseudouridine modification sites based on a convolutional neural network according to claim 1, characterized in that the encoding in step 1) is as follows: an RNA sequence contains four kinds of ribonucleotides, A, U, G and C; taking any two in order as a group gives 16 combinations in total, which are then given a 16-dimensional shift encoding, so that each pair is encoded as a 16-dimensional column vector. For a sample sequence, two adjacent nucleotides are taken from left to right and encoded; the window is then moved right by one nucleotide and the next two adjacent nucleotides are shift-encoded; this operation is repeated up to the last nucleotide. Under this encoding, every two adjacent nucleotides are converted into one 16-dimensional column vector. The chemical properties of the nucleotides, shown in Table 1, are then appended: the 17th dimension represents the ring structure of the first nucleotide of the adjacent pair, with purine represented by the digit '1' and pyrimidine by the digit '0'; the 18th dimension represents the functional group of the first nucleotide of the pair, with amino represented by '1' and keto by '0'; the 19th dimension represents the strength of the hydrogen bond formed when the first nucleotide of the pair is complementarily paired, with strong represented by '1' and weak by '0'; the 20th dimension represents the proportion, in the sample with its last nucleotide removed, of nucleotides of the same type as the first nucleotide of the pair. A sample sequence consisting of L+R+1 nucleotides is thus converted by the encoding into one matrix of size 20 × (L+R).
Table 1. Chemical properties of the ribonucleotides
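The encoding of claim 2 can be sketched as below. This is a hedged illustration under stated assumptions: the "shift encoding" is read as a one-hot vector over the 16 ordered pairs, the ordering of those pairs is arbitrary, and the `CHEM` values are filled in from standard nucleotide chemistry using the 1/0 convention of the claim, because the body of Table 1 is not reproduced in this text. The 20th value follows the literal wording of the claim.

```python
NUCLEOTIDES = "AUGC"

# Assumed ordering of the 16 ordered dinucleotides (AA, AU, AG, AC, UA, ...).
PAIR_INDEX = {a + b: 4 * i + j
              for i, a in enumerate(NUCLEOTIDES)
              for j, b in enumerate(NUCLEOTIDES)}

# Assumed stand-in for Table 1: (ring: purine=1, group: amino=1, H-bond: strong=1).
CHEM = {
    "A": (1, 1, 0),  # purine, amino, weak
    "G": (1, 0, 1),  # purine, keto, strong
    "C": (0, 1, 1),  # pyrimidine, amino, strong
    "U": (0, 0, 0),  # pyrimidine, keto, weak
}


def encode_sample(seq):
    """Encode L+R+1 nucleotides as L+R columns of 20 values each."""
    prefix = seq[:-1]  # the sample with its last nucleotide removed
    columns = []
    for i in range(len(seq) - 1):
        first, pair = seq[i], seq[i:i + 2]
        col = [0.0] * 16
        col[PAIR_INDEX[pair]] = 1.0                       # dimensions 1-16
        col.extend(float(v) for v in CHEM[first])         # dimensions 17-19
        col.append(prefix.count(first) / len(prefix))     # dimension 20
        columns.append(col)
    return columns
```

The result is stored column by column, so `columns[j][i]` is row `i`, column `j` of the 20 × (L+R) matrix; a 21-nucleotide sample gives 20 columns and a 31-nucleotide sample gives 30.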
3. The method for predicting pseudouridine modification sites based on a convolutional neural network according to claim 1, characterized by the application of a convolutional neural network to extract sequence features in pseudouridine site prediction.
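As a rough illustration of what extracting a sequence feature with a convolution kernel means, the sketch below slides a single kernel along the sequence axis of an encoded sample, applies a ReLU, and keeps the maximum response. It is not the network of the application: the function name, the single kernel, the ReLU and the global max pooling are assumptions, and the number of layers and the number and size of kernels are left open by the claims.

```python
def conv1d_feature(columns, kernel, bias=0.0):
    """Return one pooled feature from a list of column vectors.

    columns: list of per-position vectors (e.g. 20 values each).
    kernel:  list of weight vectors, one per position covered by the kernel.
    """
    width = len(kernel)
    if len(columns) < width:
        raise ValueError("sample is shorter than the kernel width")
    activations = []
    for start in range(len(columns) - width + 1):
        total = bias
        for k in range(width):
            # Dot product of the k-th kernel slice with the k-th covered column.
            total += sum(w * x for w, x in zip(kernel[k], columns[start + k]))
        activations.append(max(0.0, total))  # ReLU
    return max(activations)  # global max pooling over positions
```

A trained network would apply many such kernels with learned weights and feed the pooled features to a classifier that outputs whether the central position is a pseudouridine site.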
CN201710989588.1A 2017-10-23 2017-10-23 A kind of method based on convolutional neural networks prediction pseudouridine decorating site Pending CN107609351A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710989588.1A CN107609351A (en) 2017-10-23 2017-10-23 A kind of method based on convolutional neural networks prediction pseudouridine decorating site

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710989588.1A CN107609351A (en) 2017-10-23 2017-10-23 A kind of method based on convolutional neural networks prediction pseudouridine decorating site

Publications (1)

Publication Number Publication Date
CN107609351A true CN107609351A (en) 2018-01-19

Family

ID=61077911

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710989588.1A Pending CN107609351A (en) 2017-10-23 2017-10-23 A kind of method based on convolutional neural networks prediction pseudouridine decorating site

Country Status (1)

Country Link
CN (1) CN107609351A (en)

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108804867A (en) * 2018-06-15 2018-11-13 中国人民解放军军事科学院军事医学研究院 The model building method of pyrimidine dimer in radiation injury is identified based on Nanopore sequencing technologies
CN109215740A (en) * 2018-11-06 2019-01-15 中山大学 Full-length genome RNA secondary structure prediction method based on Xgboost
CN110070914A (en) * 2019-03-15 2019-07-30 崔大超 A kind of gene order recognition methods, system and computer readable storage medium
CN110892484A (en) * 2018-07-11 2020-03-17 因美纳有限公司 Deep learning-based framework for identifying sequence patterns causing sequence-specific errors (SSEs)
CN111161793A (en) * 2020-01-09 2020-05-15 青岛科技大学 Stacking-integration-based method for predicting N6-methyladenosine modification sites in RNA
CN111899869A (en) * 2020-08-03 2020-11-06 东南大学 Depression patient identification system and identification method thereof
CN112217663A (en) * 2020-09-17 2021-01-12 暨南大学 Lightweight convolutional neural network security prediction method
CN115424663A (en) * 2022-10-14 2022-12-02 徐州工业职业技术学院 RNA modification site prediction method based on attention bidirectional representation model
US11842794B2 (en) 2019-03-19 2023-12-12 The University Of Hong Kong Variant calling in single molecule sequencing using a convolutional neural network

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105893787A (en) * 2016-06-21 2016-08-24 南昌大学 Prediction method for protein post-translational modification methylation loci
CN106250718A (en) * 2016-07-29 2016-12-21 於铉 N based on individually balanced Boosting algorithm1methylate adenosine site estimation method

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105893787A (en) * 2016-06-21 2016-08-24 南昌大学 Prediction method for protein post-translational modification methylation loci
CN106250718A (en) * 2016-07-29 2016-12-21 於铉 N1-methyladenosine site prediction method based on individually balanced Boosting algorithm

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
WEI CHEN et al.: "iRNA-PseU: Identifying RNA pseudouridine sites", Molecular Therapy-Nucleic Acids *
XUAN HE et al.: "Characterizing RNA Pseudouridylation by Convolutional Neural Networks", bioRxiv *

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108804867A (en) * 2018-06-15 2018-11-13 中国人民解放军军事科学院军事医学研究院 The model building method of pyrimidine dimer in radiation injury is identified based on Nanopore sequencing technologies
CN110892484A (en) * 2018-07-11 2020-03-17 因美纳有限公司 Deep learning-based framework for identifying sequence patterns causing sequence-specific errors (SSEs)
CN109215740A (en) * 2018-11-06 2019-01-15 中山大学 Full-length genome RNA secondary structure prediction method based on Xgboost
CN110070914A (en) * 2019-03-15 2019-07-30 崔大超 A kind of gene order recognition methods, system and computer readable storage medium
CN110070914B (en) * 2019-03-15 2020-07-03 崔大超 Gene sequence identification method, system and computer readable storage medium
US11842794B2 (en) 2019-03-19 2023-12-12 The University Of Hong Kong Variant calling in single molecule sequencing using a convolutional neural network
CN111161793B (en) * 2020-01-09 2023-02-03 青岛科技大学 Stacking-integration-based method for predicting N6-methyladenosine modification sites in RNA
CN111161793A (en) * 2020-01-09 2020-05-15 青岛科技大学 Stacking-integration-based method for predicting N6-methyladenosine modification sites in RNA
CN111899869A (en) * 2020-08-03 2020-11-06 东南大学 Depression patient identification system and identification method thereof
CN112217663A (en) * 2020-09-17 2021-01-12 暨南大学 Lightweight convolutional neural network security prediction method
CN112217663B (en) * 2020-09-17 2023-04-07 暨南大学 Lightweight convolutional neural network security prediction method
CN115424663A (en) * 2022-10-14 2022-12-02 徐州工业职业技术学院 RNA modification site prediction method based on attention bidirectional representation model
CN115424663B (en) * 2022-10-14 2024-04-12 徐州工业职业技术学院 RNA modification site prediction method based on attention bidirectional expression model

Similar Documents

Publication Publication Date Title
CN107609351A (en) A kind of method based on convolutional neural networks prediction pseudouridine decorating site
Griffiths-Jones Annotating noncoding RNA genes
Liao et al. Microbial community composition in alpine lake sediments from the Hengduan Mountains
Kang et al. Effect of microbial community structure in inoculum on the stimulation of direct interspecies electron transfer for methanogenesis
Cai et al. Desmonostoc danxiaense sp. nov.(Nostocales, Cyanobacteria) from Danxia mountain in China based on polyphasic approach
CN103617203A (en) Protein-ligand binding site predicting method based on inquiry drive
Do et al. Precursor microRNA identification using deep convolutional neural networks
Guo et al. Diversity and structure of soil bacterial community in intertidal zone of Daliao River estuary, Northeast China
CN103559423B (en) Method and device for predicting methylation
Fang et al. Seasonal changes driving shifts in microbial community assembly and species coexistence in an urban river
CN112348154A (en) DNA sequence design method based on chaos optimization whale algorithm
CN103186715A (en) Novel algorithm for predicting interaction of nucleic acid and protein
CN110534154B (en) Whale DNA sequence optimization method based on harmony search
Sualp et al. Using network context as a filter for miRNA target prediction
CN113658643B (en) Method for predicting lncRNA and mRNA based on attention mechanism
Zhang et al. Evaluating the different combinatorial constraints in DNA computing based on minimum free energy
CN103150491B (en) Based on the frequency spectrum 3-periodically signal to noise ratio (S/N ratio) acquisition methods of nucleotide potential difference
CN113362898A (en) RNA subcellular localization method for identifying by fusing multiple sequence frequency information
CN112786112B (en) Method and system for predicting combination of lncRNA and target DNA
CN101256602A (en) Method for rebuilding individual single somatotype based on optimizing solution aggregate
Nacher et al. On the relation between fluctuation and scaling-law in gene expression time series from yeast to human
Reinharz et al. Using structural and evolutionary information to detect and correct pyrosequencing errors in noncoding RNAs
Jiménez-Sánchez DNA computer code based on expanded genetic alphabet
CN113808671B (en) Method for distinguishing coding ribonucleic acid from non-coding ribonucleic acid based on deep learning
Reinharz et al. A linear inside-outside algorithm for correcting sequencing errors in structured RNAs

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20180119