CN107169312B - A low-complexity method for predicting intrinsically disordered proteins - Google Patents
A low-complexity method for predicting intrinsically disordered proteins
- Publication number
- CN107169312B (application CN201710388664.3A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B20/00—ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B40/00—ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
Abstract
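The invention provides a low-computational-complexity method for predicting intrinsically disordered proteins. For each residue of a protein sequence, the method computes the Shannon entropy, the topological entropy, and the weighted averages of three amino acid propensity scales, and then predicts intrinsically disordered regions by maximizing the Rayleigh quotient. Because the scheme uses only five features and a linear classifier, it is fast and robust. Simulation results show that, at similar prediction accuracy, the proposed scheme needs far fewer features and much less computation than existing predictors of the same type.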
Description
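Technical Field
The invention belongs to the field of bioinformatics and relates to an efficient, low-computational-complexity scheme for predicting intrinsically disordered proteins.
Background
An intrinsically disordered protein is a protein with at least one region that lacks a unique three-dimensional structure and adopts dynamic conformations. Such proteins matter for drug design, protein expression, and functional annotation: studies have found that some intrinsically disordered proteins take part in important regulatory functions in the cell and are implicated in diseases such as Alzheimer's disease, Parkinson's disease, and certain cancers. Because disordered regions are difficult to purify and crystallize, determining them experimentally is both expensive and slow. Computational methods that identify disordered regions from the protein sequence are therefore important.
Over the past decade or so, many disorder prediction schemes have been proposed. They fall roughly into two classes: the first uses the amino acid propensities of disordered sequences, and the second uses machine learning. Methods of the first class are very simple but not very accurate. Methods of the second class, mostly based on artificial neural networks and support vector machines, reach higher accuracy but require a large set of features, so their computational complexity is high.
Summary of the Invention
The purpose of the invention is to overcome the above shortcomings of the prior art by designing a low-complexity method for predicting intrinsically disordered proteins that achieves high prediction accuracy, high speed, and robustness with a small number of features and little computation.
The method provided by the invention comprises the following steps: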
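(1) For the training set, the DIS dataset, let $w$ denote one of its protein sequences. A sliding window of length $N$ extracts fragments of $N$ consecutive residues for computation. For now, assume the length of $w$ is exactly $N$.
(2) First compute the Shannon entropy of $w$, $H_S(w)$ [equation (1) not reproduced in this text],
where $f_k$ is the frequency of the $k$-th amino acid in $w$, $1 \le k \le 20$.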
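The Shannon entropy step can be sketched in Python as follows. This is a minimal sketch rather than the patent's exact formula: it uses the standard definition $H_S(w) = -\sum_k f_k \log f_k$, and the base-2 logarithm and the function name `shannon_entropy` are assumptions, since the original equation is not available here.

```python
import math
from collections import Counter


def shannon_entropy(window: str) -> float:
    """Shannon entropy (in bits) of the amino acid composition of a window.

    Assumption: base-2 logarithm; the patent's equation (1) is not
    available, so the base is a guess at the usual convention.
    """
    n = len(window)
    counts = Counter(window)
    # f_k = count_k / n. Amino acids that do not occur in the window
    # contribute nothing, so only observed residues are summed.
    return -sum((c / n) * math.log2(c / n) for c in counts.values())
```

A window made of a single repeated residue has entropy 0, and a window in which all 20 amino acids are equally frequent reaches the maximum of log2(20), roughly 4.32 bits.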
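(3) Compute the topological entropy. Map the protein sequence $w$, composed of 20 amino acids, to a 0-1 sequence: the hydrophobic amino acids (isoleucine, leucine, and valine) and the aromatic amino acids (phenylalanine, tryptophan, and tyrosine) map to 1, and all other amino acids map to 0. The topological entropy of $w$, $H_{top}(w)$, is then computed from this binary sequence [equation (3) not reproduced in this text].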
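The binary mapping and a topological entropy computation can be sketched as below. The mapping follows the text directly. The entropy formula is an assumption: it adapts Koslicki's finite-sequence definition (given for the 4-letter DNA alphabet) to a 2-letter alphabet, choosing the largest $n$ with $2^n + n - 1 \le N$ and counting distinct length-$n$ subwords in the first $2^n + n - 1$ symbols. The patent's own equation may differ in detail, and the function names are hypothetical.

```python
import math

# I, L, V (hydrophobic) and F, W, Y (aromatic) map to 1; everything else to 0.
ONES = set("ILVFWY")


def to_binary(window: str) -> str:
    """Map an amino acid string to the 0-1 string described in step (3)."""
    return "".join("1" if aa in ONES else "0" for aa in window)


def topological_entropy(window: str) -> float:
    """Topological entropy of the binary image of a window.

    Assumption: Koslicki-style definition adapted to a binary alphabet.
    """
    bits = to_binary(window)
    # Largest n such that 2**n + n - 1 <= len(bits).
    n = 0
    while 2 ** (n + 1) + (n + 1) - 1 <= len(bits):
        n += 1
    if n == 0:
        raise ValueError("window is too short for a topological entropy")
    prefix = bits[: 2 ** n + n - 1]
    # Number of distinct length-n subwords in the prefix (at most 2**n).
    subwords = {prefix[i:i + n] for i in range(len(prefix) - n + 1)}
    return math.log2(len(subwords)) / n
```

Under this assumed definition, a window of length 35 gives $n = 4$, so the entropy is computed from the 16 subwords of length 4 in the first 19 residues and lies between 0 and 1.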
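(4) For the sequence $w$ of length $N$, compute the weighted averages $M_p(w)$, $p = 1, 2, 3$, of three propensity scales: Remark465, Deleage/Roux, and Bfactor(2STD) [equation (4) not reproduced in this text],
where $w_p(l)$, $1 \le l \le N$, is the value of the $l$-th residue of $w$ on the $p$-th propensity scale.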
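A weighted average over a propensity scale can be sketched as follows. The weights used by the patent are not recoverable from this text, so the sketch takes them as a parameter and defaults to uniform weights, which is an assumption. The function name `propensity_average` is hypothetical, and the scale passed in the usage example is a toy placeholder, not real Remark465, Deleage/Roux, or Bfactor(2STD) values.

```python
from typing import Mapping, Optional, Sequence


def propensity_average(window: str,
                       scale: Mapping[str, float],
                       weights: Optional[Sequence[float]] = None) -> float:
    """Weighted average of per-residue propensity values over a window.

    `scale` maps each amino acid letter to its propensity value w_p(l).
    Assumption: uniform weights when none are given.
    """
    if weights is None:
        weights = [1.0] * len(window)
    if len(weights) != len(window):
        raise ValueError("need exactly one weight per residue")
    total = sum(weights)
    return sum(a * scale[aa] for a, aa in zip(weights, window)) / total


# Toy placeholder scale for illustration only (not a published scale).
toy_scale = {"A": 0.0, "L": 1.0}
uniform = propensity_average("AALL", toy_scale)
weighted = propensity_average("AL", toy_scale, weights=[1.0, 3.0])
```

Calling the function once per scale yields the three values $M_1$, $M_2$, and $M_3$ that enter the feature vector.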
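(5) For a sequence $w$ of length $L > N$, the five feature values computed for each sliding window are assigned as a vector to every residue in that window. For each residue, the assigned vectors are summed and divided by the number of windows that contributed, which gives the residue's final feature vector.
Extract the length-$N$ fragments $w_j = w(j) \cdots w(j+N-1)$, $1 \le j \le L-N+1$, and for each compute the five features (Shannon entropy, topological entropy, and the three propensity weighted averages), giving a $5 \times 1$ vector $v_j$:
$v_j = [H_S(w_j)\ \ H_{top}(w_j)\ \ M_1(w_j)\ \ M_2(w_j)\ \ M_3(w_j)]^T$ (5)
The feature matrix of $w$ is then $F = [x_1\ x_2\ \cdots\ x_l\ \cdots\ x_L]$, where $x_l$ is the averaged feature vector of the $l$-th residue [equation (6) not reproduced in this text].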
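The accumulate-and-average procedure of step (5) can be sketched as below. Following the description, residue $l$ receives the mean of $v_j$ over all windows $j$ that cover it, that is, $\max(1, l-N+1) \le j \le \min(l, L-N+1)$ in 1-based indexing. The function name `residue_features` and the callback `feature_fn` are hypothetical. The sketch returns one row per residue, which is the transpose of the matrix $F$ in the text.

```python
from typing import Callable, List, Sequence


def residue_features(seq: str, N: int,
                     feature_fn: Callable[[str], Sequence[float]]
                     ) -> List[List[float]]:
    """Per-residue feature vectors from sliding-window features.

    feature_fn maps a length-N fragment to its feature vector v_j.
    Row l of the result is x_l, the average of v_j over every window
    that contains residue l.
    """
    L = len(seq)
    if L < N:
        raise ValueError("sequence is shorter than the window")
    dim = len(feature_fn(seq[:N]))
    sums = [[0.0] * dim for _ in range(L)]
    counts = [0] * L
    for j in range(L - N + 1):            # window start (0-based)
        v = feature_fn(seq[j:j + N])
        for l in range(j, j + N):         # every residue in this window
            counts[l] += 1
            for d in range(dim):
                sums[l][d] += v[d]
    return [[s / counts[l] for s in sums[l]] for l in range(L)]


# Usage with a one-feature toy function: fraction of leucine in the window.
rows = residue_features("LLAA", 2, lambda w: [w.count("L") / len(w)])
```

Residues near either end of the sequence are covered by fewer than $N$ windows, which is why the divisor is the per-residue count rather than a constant.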
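(6) Train the classifier with 5-fold cross-validation. The feature vectors of the disordered and ordered residues in the training set are fed to the classifier, which learns its parameters: the projection direction $W$ and the classification threshold.
Compute the feature matrix of the training set [equation (7) not reproduced in this text],
where $N_s$ is the number of protein sequences in the training set and $F_i$ is the feature matrix of the $i$-th protein sequence, of length $L_i$, $1 \le i \le N_s$. The optimal projection direction is:
$W = S_W^{-1}(m_{dis} - m_{ord})$ (8)
where $N_{dis}$ and $N_{ord}$ are the total numbers of disordered and ordered residues in the training set, and $X_{dis}$ and $X_{ord}$ are the feature matrices of all disordered and all ordered residues, as defined in equation (10) [equations (9) and (10) not reproduced in this text; the symbols for the $j$-th column vectors of $X_{dis}$ and $X_{ord}$ are also lost]. The projection onto $W$ is $Y = W^T X$. A linear search over $Y$ then yields the classification threshold.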
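The training step can be sketched in plain Python as below. Several points are assumptions, because the defining equations are not available here: $S_W$ is taken to be the within-class scatter matrix (the usual Fisher discriminant choice), $m_{dis}$ and $m_{ord}$ are the class means, and the linear search picks the threshold that maximizes sensitivity plus specificity minus one. All function names are hypothetical.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def class_mean(rows: Sequence[Sequence[float]]) -> List[float]:
    n = len(rows)
    return [sum(r[d] for r in rows) / n for d in range(len(rows[0]))]


def scatter(rows: Sequence[Sequence[float]], m: Sequence[float]) -> Matrix:
    """Sum of outer products (x - m)(x - m)^T over one class."""
    k = len(m)
    S = [[0.0] * k for _ in range(k)]
    for r in rows:
        diff = [r[d] - m[d] for d in range(k)]
        for a in range(k):
            for b in range(k):
                S[a][b] += diff[a] * diff[b]
    return S


def solve(A: Matrix, b: Sequence[float]) -> List[float]:
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(A[i]) + [b[i]] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        if abs(M[p][c]) < 1e-12:
            raise ValueError("S_W is singular")
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][k] * x[k] for k in range(i + 1, n))) / M[i][i]
    return x


def fisher_direction(disordered, ordered) -> List[float]:
    """W = S_W^{-1} (m_dis - m_ord), with S_W the within-class scatter."""
    m_dis, m_ord = class_mean(disordered), class_mean(ordered)
    S_dis, S_ord = scatter(disordered, m_dis), scatter(ordered, m_ord)
    k = len(m_dis)
    S_W = [[S_dis[a][b] + S_ord[a][b] for b in range(k)] for a in range(k)]
    return solve(S_W, [m_dis[d] - m_ord[d] for d in range(k)])


def project(W: Sequence[float], x: Sequence[float]) -> float:
    return sum(w * xi for w, xi in zip(W, x))


def best_threshold(W, disordered, ordered) -> float:
    """Linear search over observed projections (assumed criterion:
    maximize sensitivity + specificity - 1)."""
    y_dis = [project(W, x) for x in disordered]
    y_ord = [project(W, x) for x in ordered]
    best_t, best_score = y_dis[0], float("-inf")
    for t in sorted(set(y_dis + y_ord)):
        sens = sum(y >= t for y in y_dis) / len(y_dis)
        spec = sum(y < t for y in y_ord) / len(y_ord)
        if sens + spec - 1 > best_score:
            best_t, best_score = t, sens + spec - 1
    return best_t
```

With this orientation of $W$, disordered residues project to larger values on average, so a residue is called disordered when its projection is at or above the threshold. In a 5-fold setup, these two functions would be fitted on four folds and evaluated on the fifth.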
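Advantages and beneficial effects of the invention:
1. Using only five features and a linear classifier, the method is fast and robust. 2. Simulation results show that, at similar prediction accuracy, the method needs far fewer features and much less computation than existing methods of the same type.
Brief Description of the Drawings
Figure 1: Flowchart of the method for predicting intrinsically disordered proteins.
Figure 2: Prediction accuracy of the proposed method compared with existing methods of the same type on the PU159 dataset.
Figure 3: Prediction accuracy of the proposed method compared with existing methods of the same type on the R80 dataset.
Detailed Description
Example 1:
For a protein sequence w whose disordered regions have not yet been determined (here, the sequence labeled 1g4m in the R80 dataset), prediction with the proposed scheme proceeds as follows:
Step 1: The sequence has length 393 and is scanned with a sliding window of N = 35. The five feature values are computed for each window.
Sequence w = MGDKGTRVFKKASPNGKLTVYLGKRDFVDHIDLVEPV
For the first window of length N, compute the five feature values of the fragment according to equations (1), (3), and (4), and assign them to every residue in the fragment. Then slide the window: compute the five feature values of the length-N fragment starting at the second residue and add them to the running totals of every residue in that fragment. Repeat until the window covers the last residue. Count how many windows contributed to each residue, and divide each residue's accumulated feature values by that count to obtain its final feature vector.
The resulting feature matrix of sequence w is as follows, where each column is the feature vector of the residue at that position [matrix not reproduced in this text]:
Step 2: Using the projection direction and threshold learned from the training set, project X and classify each residue. Of the 35 disordered residues, 29 are correctly classified as disordered; of the 358 ordered residues, 314 are correctly classified as ordered.
To verify the method, it was used to predict intrinsically disordered proteins on the R80 and PU159 datasets. R80 contains 80 protein sequences, each with at least one disordered region; PU159 contains 79 fully disordered sequences and 80 fully ordered sequences. Table 1 compares the prediction accuracy of the proposed method with that of existing methods of the same type on PU159, and Table 2 does the same for R80. Table 3 defines the accuracy measures, where TP is the number of correctly predicted disordered residues, TN is the number of correctly predicted ordered residues, FN is the number of disordered residues misclassified as ordered, and FP is the number of ordered residues misclassified as disordered.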
Table 1: Prediction accuracy on the PU159 dataset
Method | Sens. | Spec. | Prob.Ex. | MCC |
---|---|---|---|---|
OurMethod | 0.812 | 0.783 | 0.596 | 0.594 |
DisPSSMP | 0.825 | 0.765 | 0.590 | 0.589 |
BVDEA | 0.796 | 0.785 | 0.581 | 0.586 |
RONN | 0.675 | 0.888 | 0.563 | 0.580 |
FoldIndex | 0.722 | 0.815 | 0.536 | 0.540 |
DISOPRED2 | 0.469 | 0.981 | 0.449 | 0.543 |
PONDR | 0.632 | 0.782 | 0.414 | 0.420 |
DISPRO | 0.383 | 0.982 | 0.365 | 0.467 |
PreLink | 0.319 | 0.991 | 0.310 | 0.430 |
Table 2: Prediction accuracy on the R80 dataset
Method | Sens. | Spec. | Prob.Ex. | MCC |
---|---|---|---|---|
OurMethod | 0.727 | 0.897 | 0.624 | 0.515 |
DisPSSMP | 0.767 | 0.848 | 0.615 | 0.463 |
BVDEA | 0.817 | 0.728 | 0.545 | 0.451 |
RONN | 0.603 | 0.878 | 0.481 | 0.395 |
FoldIndex | 0.488 | 0.811 | 0.299 | 0.224 |
DISOPRED2 | 0.405 | 0.972 | 0.377 | 0.470 |
PONDR | 0.557 | 0.816 | 0.373 | 0.278 |
DISPRO | 0.418 | 0.993 | 0.411 | 0.578 |
PreLink | 0.237 | 0.947 | 0.183 | 0.219 |
Table 3: Definitions of the accuracy measures
Measure | Equation |
---|---|
Sens. | TP/(TP+FN) |
Spec. | TN/(TN+FP) |
Prob.Ex. | (TP*TN-FP*FN)/((TP+FN)(TN+FP)) |
MCC | (TP*TN-FP*FN)/sqrt((TP+FP)(TN+FN)(TP+FN)(TN+FP)) |
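The four measures in Table 3 can be computed directly from the confusion counts, as in the sketch below. The function name `accuracy_measures` is hypothetical. The usage example plugs in the counts reported for sequence 1g4m in Example 1 (29 of 35 disordered residues and 314 of 358 ordered residues classified correctly), which gives TP = 29, FN = 6, TN = 314, and FP = 44.

```python
import math
from typing import Dict


def accuracy_measures(tp: int, tn: int, fp: int, fn: int) -> Dict[str, float]:
    """Sensitivity, specificity, Prob.Ex. and MCC as defined in Table 3."""
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    prob_ex = (tp * tn - fp * fn) / ((tp + fn) * (tn + fp))
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tn + fn) * (tp + fn) * (tn + fp))
    return {"sens": sens, "spec": spec, "prob_ex": prob_ex, "mcc": mcc}


# Counts for the 1g4m example: 35 disordered and 358 ordered residues.
example = accuracy_measures(tp=29, tn=314, fp=44, fn=6)
```

Expanding the Prob.Ex. formula shows that it equals Sens. + Spec. - 1, which the rows of Tables 1 and 2 satisfy up to rounding (for example, 0.812 + 0.783 - 1 = 0.595 against a reported 0.596).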
Claims (2)
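1. A low-complexity method for predicting intrinsically disordered proteins, which uses only five features (the Shannon entropy, the topological entropy, and the weighted averages of three propensity scales) together with a linear classifier to predict intrinsically disordered proteins with good accuracy while improving speed and robustness, the method comprising the following steps:
Step 1: For the training set, let w denote one of its protein sequences; a sliding window of length N extracts fragments of N consecutive residues for computation; for now, assume the length of w is N;
Step 2: First compute the Shannon entropy of w by the following formula [formula not reproduced in this text]:
where f_k is the frequency of the k-th amino acid in w, 1≤k≤20;
Step 3: Compute the topological entropy of w:
Map the protein sequence w, composed of 20 amino acids, to a 0-1 sequence, in which the hydrophobic amino acids (isoleucine, leucine, and valine) and the aromatic amino acids (phenylalanine, tryptophan, and tyrosine) are mapped to 1 and the remaining 14 amino acids are mapped to 0; the topological entropy of w is [formula not reproduced in this text]:
Step 4: For the sequence w of length N, compute the weighted averages of its three propensity scales, Remark465, Deleage/Roux, and Bfactor(2STD) [formula not reproduced in this text]:
where w_p(l), 1≤l≤N, is the value of the l-th residue of w on the p-th propensity scale;
Step 5: For a sequence w of length L>N, assign the five feature values computed for each sliding window, as a vector, to every residue in that window; for each residue, sum the assigned vectors and divide by the number of windows that contributed to obtain the final feature vector;
Step 6: Train the classifier with 5-fold cross-validation; feed the feature vectors of the disordered and ordered residues in the training set to the classifier to learn its parameters: the projection direction W and the classification threshold;
Step 7: For a protein sequence to be predicted, compute the feature vector of each residue according to Steps 1 to 5, then classify each residue using the projection direction and classification threshold obtained in Step 6.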
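The final prediction step can be sketched as below: each residue's feature vector is projected onto $W$ and compared with the threshold. The decision rule "disordered when the projection is at or above the threshold" is an assumption that follows from orienting $W$ along $m_{dis} - m_{ord}$. The names `predict_disorder` and `disordered_regions`, and the toy numbers in the usage example, are hypothetical.

```python
from typing import List, Sequence, Tuple


def predict_disorder(features: Sequence[Sequence[float]],
                     W: Sequence[float],
                     threshold: float) -> List[bool]:
    """True for residues whose projection W^T x reaches the threshold."""
    return [sum(w * x for w, x in zip(W, row)) >= threshold
            for row in features]


def disordered_regions(labels: Sequence[bool]) -> List[Tuple[int, int]]:
    """Collapse per-residue labels into 1-based inclusive (start, end) runs."""
    regions, start = [], None
    for i, flag in enumerate(labels, start=1):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            regions.append((start, i - 1))
            start = None
    if start is not None:
        regions.append((start, len(labels)))
    return regions


# Toy usage with two features per residue (illustrative values only).
labels = predict_disorder([[2.0, 0.0], [3.0, 1.0], [0.0, 1.0], [2.5, 0.5]],
                          W=[1.0, 0.0], threshold=2.0)
regions = disordered_regions(labels)
```

Because the classifier is linear, prediction costs one dot product per residue once the five features are available.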
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710388664.3A CN107169312B (zh) | 2017-05-27 | 2017-05-27 | A low-complexity method for predicting intrinsically disordered proteins |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107169312A (zh) | 2017-09-15 |
CN107169312B (zh) | 2020-05-08 |
Family
ID=59821327
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710388664.3A (CN107169312B, Expired - Fee Related) | A low-complexity method for predicting intrinsically disordered proteins | 2017-05-27 | 2017-05-27 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107169312B (zh) |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
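CN102012977A (zh) * | 2010-12-21 | 2011-04-13 | Fujian Normal University | A signal peptide prediction method based on a probabilistic neural network ensemble |
JP2011130677A (ja) * | 2009-12-22 | 2011-07-07 | National Institute Of Advanced Industrial Science & Technology | Expression prediction device and expression prediction method |
CN103955628A (zh) * | 2014-04-22 | 2014-07-30 | Nanjing University of Science and Technology | Protein-vitamin binding site prediction method based on subspace fusion |
CN104636635A (zh) * | 2015-01-29 | 2015-05-20 | Nanjing University of Science and Technology | Protein crystallization prediction method based on a two-layer SVM learning mechanism |
CN105868583A (zh) * | 2016-04-06 | 2016-08-17 | Northeast Normal University | A sequence-based method for predicting epitopes using cost-sensitive ensembles and clustering |
WO2016168090A1 (en) * | 2015-04-14 | 2016-10-20 | Nueon, Inc. | Method and apparatus for determining markers of health by analysis of blood |
CN106295242A (zh) * | 2016-08-04 | 2017-01-04 | Shanghai Jiao Tong University | Protein domain detection method based on a cost-sensitive LSTM network |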
Non-Patent Citations (8)
Title |
---|
Accurate Prediction of Protein Disordered Regions by Mining Protein Structure Data; Jianlin Cheng et al.; Data Mining and Knowledge Discovery; 20051231; 1-10 * |
Fisher Discriminant Analysis with Kernels; Sebastian Mika et al.; The 9th IEEE Signal Processing Society Workshop on Neural Networks for Signal Processing; 19991231; 41-48 * |
GlobPlot: exploring protein sequences for globularity and disorder; Rune Linding et al.; Nucleic Acids Research; 20031231; vol. 31, no. 13; 3701-3708 * |
Optimizing Long Intrinsic Disorder Predictors with Protein Evolutionary Information; Kang Peng et al.; Journal of Bioinformatics and Computational Biology; 20040504; 1-23 * |
Prediction of Disorder with New Computational Tool: BVDEA; Irem Ersoz Kaya et al.; Electrical and Computer Engineering; 20091231; 1-31 * |
RAPID: Fast and accurate sequence-based prediction of intrinsic disorder content on proteomic scale; Jing Yan et al.; Biochimica et Biophysica Acta; 20131231; vol. 1834; 1671-1680 * |
Topological entropy of DNA sequences; David Koslicki; Bioinformatics; 20110210; vol. 27, no. 8; 1061-1067 * |
Amino acid sequence analysis at the junctions between disordered and ordered regions of intrinsically disordered proteins; Cao Zanxia et al.; Acta Biophysica Sinica; 20110930; vol. 27, no. 9; 801-811 * |
Also Published As
Publication number | Publication date |
---|---|
CN107169312A (zh) | 2017-09-15 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20200508 |