CN107169312B - A low-complexity method for predicting intrinsically disordered proteins - Google Patents


Info

Publication number
CN107169312B
Authority
CN
China
Prior art keywords
sequence
protein
residues
length
residue
Prior art date
Legal status
Expired - Fee Related
Application number
CN201710388664.3A
Other languages
English (en)
Other versions
CN107169312A (zh)
Inventor
Zhao Jiaxiang (赵加祥)
He Hao (何昊)
Xu Wei (徐微)
Current Assignee
Nankai University
Original Assignee
Nankai University
Priority date
Filing date
Publication date
Application filed by Nankai University
Priority to CN201710388664.3A
Publication of CN107169312A
Application granted
Publication of CN107169312B
Expired - Fee Related (current legal status)
Anticipated expiration


Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00 ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B40/00 ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding

Landscapes

  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Medical Informatics (AREA)
  • General Health & Medical Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • Biophysics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biotechnology (AREA)
  • Bioethics (AREA)
  • Data Mining & Analysis (AREA)
  • Chemical & Material Sciences (AREA)
  • Molecular Biology (AREA)
  • Genetics & Genomics (AREA)
  • Artificial Intelligence (AREA)
  • Analytical Chemistry (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Databases & Information Systems (AREA)
  • Epidemiology (AREA)
  • Evolutionary Computation (AREA)
  • Public Health (AREA)
  • Software Systems (AREA)
  • Peptides Or Proteins (AREA)

Abstract

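The invention provides a method of low computational complexity for predicting intrinsically disordered proteins. For each residue of a protein sequence, the method computes the Shannon entropy, the topological entropy, and the weighted averages of three amino-acid propensity scales, and predicts intrinsically disordered regions by maximizing the Rayleigh quotient. Because the scheme uses only five features and a linear classifier, it runs fast and is robust. Simulation results show that, at similar prediction accuracy, the proposed scheme needs far fewer features and much less computation than existing predictors of the same type.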

Description

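A low-complexity method for predicting intrinsically disordered proteins
Technical Field
The invention belongs to the field of bioinformatics and relates to an efficient, low-computational-complexity scheme for predicting intrinsically disordered proteins.
Background
An intrinsically disordered protein is a protein with at least one region that lacks a unique three-dimensional structure and adopts dynamic conformations. Such proteins matter in drug design, protein expression, and functional annotation: studies have found that some intrinsically disordered proteins take part in important regulatory functions in the cell and play significant roles in diseases such as Alzheimer's disease, Parkinson's disease, and certain cancers. Because disordered regions are difficult to purify and crystallize, determining them experimentally is both expensive and slow. Computational methods that identify disordered regions from the protein sequence are therefore very important.
Over the past decade or so, many disorder predictors have been proposed. They fall roughly into two classes: the first relies on the amino-acid propensities of disordered sequences, and the second uses machine learning. Methods of the first class are very simple but not very accurate. Methods of the second class, mostly based on artificial neural networks and support vector machines, reach higher accuracy but must compute a large set of features, which makes them computationally expensive.
Summary of the Invention
The object of the invention is to overcome the above shortcomings of the prior art by providing a low-complexity method for predicting intrinsically disordered proteins that uses few features and little computation while achieving high prediction accuracy, fast execution, and robustness.
The low-complexity method provided by the invention comprises the following steps:
(1) For the learning samples in the DIS dataset, let w denote one protein sequence. A sliding window of length N cuts out segments of N consecutive residues, and the computation is carried out on each segment; in steps (2) to (4), w is therefore taken to have length N.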
(2) Compute the Shannon entropy of w:
$$H_S(w) = -\sum_{k=1}^{20} f_k \log_2 f_k \tag{1}$$
where $f_k$ is the frequency of the k-th amino acid in w, $1 \le k \le 20$.
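A minimal Python sketch of formula (1). It assumes a base-2 logarithm (the base is not recoverable from the text) and that a window is passed as a string of one-letter amino-acid codes; the function name `shannon_entropy` is illustrative only.

```python
import math
from collections import Counter


def shannon_entropy(window: str) -> float:
    """H_S(w) = -sum_k f_k * log2(f_k); amino acids absent from the window
    simply do not appear in the counter, so they contribute nothing."""
    n = len(window)
    return -sum((c / n) * math.log2(c / n) for c in Counter(window).values())


# First window (N = 35) of the sequence quoted in Step 1 of Embodiment 1.
print(shannon_entropy("MGDKGTRVFKKASPNGKLTVYLGKRDFVDHIDLVE"))
```

The entropy is 0 for a window made of a single amino acid and reaches log2(20) when all 20 occur equally often, so low-complexity (compositionally biased) windows score low.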
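(3) Compute the topological entropy. Map the protein sequence w, written over the 20-letter amino-acid alphabet, to a 0-1 sequence $\tilde{w}$: the hydrophobic amino acids (isoleucine, leucine, valine) and the aromatic amino acids (phenylalanine, tryptophan, tyrosine) are mapped to 1, and all other amino acids to 0. The topological entropy of w is
$$H_{top}(w) = \frac{1}{N-2^n-n+2}\sum_{l=1}^{N-2^n-n+2}\frac{\log_2 p_{\tilde{w}_l^{\,l+2^n+n-2}}(n)}{n} \tag{2}$$
where $p_u(n)$ denotes the number of distinct subsequences of length n in u, n satisfies
$$2^n + n - 1 \le N < 2^{n+1} + n \tag{3}$$
and $\tilde{w}_l^{\,l+2^n+n-2}$ denotes the $2^n+n-1$ consecutive symbols of $\tilde{w}$ starting at position l.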
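The sketch below follows step (3) under stated assumptions. The binary mapping is the one given in the text. Two readings are assumptions: n is taken to be the largest integer with 2^n + n - 1 <= N, and the entropy is averaged over every block of 2^n + n - 1 symbols that fits in the window, using base-2 logarithms. The names `to_binary`, `choose_n` and `topological_entropy` are hypothetical.

```python
import math

# Step (3) reduced alphabet: hydrophobic I, L, V and aromatic F, W, Y map to 1;
# the remaining 14 amino acids map to 0.
ONE_SYMBOLS = frozenset("ILVFWY")


def to_binary(window: str) -> str:
    """Map a one-letter amino-acid string to the 0-1 sequence of step (3)."""
    return "".join("1" if aa in ONE_SYMBOLS else "0" for aa in window)


def choose_n(length: int) -> int:
    """Largest n >= 1 with 2**n + n - 1 <= length (assumed reading of condition (3))."""
    if length < 2:
        raise ValueError("window too short for topological entropy")
    n = 1
    while 2 ** (n + 1) + (n + 1) - 1 <= length:
        n += 1
    return n


def topological_entropy(window: str) -> float:
    """Average of log2(#distinct n-words) / n over every block of 2**n + n - 1
    consecutive binary symbols (assumed reading of formula (2))."""
    u = to_binary(window)
    n = choose_n(len(u))
    block = 2 ** n + n - 1
    starts = range(len(u) - block + 1)  # N - 2**n - n + 2 blocks in total
    total = 0.0
    for l in starts:
        sub = u[l:l + block]
        distinct = {sub[i:i + n] for i in range(len(sub) - n + 1)}
        total += math.log2(len(distinct)) / n
    return total / len(starts)


# First window (N = 35, hence n = 4) of the sequence quoted in Embodiment 1.
print(topological_entropy("MGDKGTRVFKKASPNGKLTVYLGKRDFVDHIDLVE"))
```

A block of 2^n + n - 1 symbols contains exactly 2^n words of length n, so each term lies in [0, 1]; the value is 1 only when every possible n-word occurs. With N = 35, condition (3) gives n = 4.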
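(4) For the length-N sequence w, compute the weighted averages of three propensity scales, Remark465, Deleage/Roux, and Bfactor(2STD):
$$M_p(w) = \frac{1}{N}\sum_{l=1}^{N} w_p(l), \qquad p = 1, 2, 3 \tag{4}$$
where $w_p(l)$, $1 \le l \le N$, is the value of the p-th propensity scale for the l-th residue of w.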
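Read as in formula (4), the per-window propensity feature is the mean of the per-residue scale values. That mean is also a weighted average of the scale entries, with each amino acid weighted by its frequency in the window, and the sketch checks that equivalence. The real Remark465, Deleage/Roux and Bfactor(2STD) values are not given in this document, so `TOY_SCALE` is a made-up placeholder, not a published propensity scale; the function names are hypothetical.

```python
from collections import Counter
from typing import Mapping

# Hypothetical placeholder scale: these numbers are NOT Remark465, Deleage/Roux
# or Bfactor(2STD); they only make the example runnable.
TOY_SCALE = {aa: i / 19 for i, aa in enumerate("ACDEFGHIKLMNPQRSTVWY")}


def propensity_average(window: str, scale: Mapping[str, float]) -> float:
    """M_p(w) = (1/N) * sum_l w_p(l): mean of the per-residue scale values."""
    return sum(scale[aa] for aa in window) / len(window)


def propensity_frequency_weighted(window: str, scale: Mapping[str, float]) -> float:
    """The same quantity written as a weighted average of scale entries,
    each amino acid weighted by its frequency f_k in the window."""
    n = len(window)
    return sum((count / n) * scale[aa] for aa, count in Counter(window).items())


window = "MGDKGTRVFKKASPNGKLTVYLGKRDFVDHIDLVE"
print(propensity_average(window, TOY_SCALE), propensity_frequency_weighted(window, TOY_SCALE))
```

In the method, this function would be called three times per window, once per scale, giving $M_1$, $M_2$ and $M_3$.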
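(5) For a sequence w of length L > N, the vector of the five feature values computed for each sliding window is assigned to every residue in that window. For each residue, the assigned vectors are summed and divided by the number of windows that contributed, which gives the final feature vector.
Specifically, take the N-residue segments $w_j = w(j)w(j+1)\cdots w(j+N-1)$, $1 \le j \le L-N+1$, and compute for each one the five features (Shannon entropy, topological entropy, and the three propensity averages), giving a 5×1 vector $v_j$:
$$v_j = [H_S(w_j)\ \ H_{top}(w_j)\ \ M_1(w_j)\ \ M_2(w_j)\ \ M_3(w_j)]^T \tag{5}$$
Then compute the feature matrix of w, $F = [x_1\ x_2 \cdots x_l \cdots x_L]$, where
$$x_l = \frac{1}{|J_l|}\sum_{j \in J_l} v_j, \qquad J_l = \{\, j : \max(1,\, l-N+1) \le j \le \min(l,\, L-N+1) \,\} \tag{6}$$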
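Step (5) and formula (6) amount to averaging, for every residue, the window vectors $v_j$ of all windows that cover it. A plain-Python sketch, where `features` is any list of per-window functions (in the method these would be the five features of steps (2) to (4)); `residue_features` is a hypothetical name, and the alanine-fraction feature in the usage line is a toy stand-in. The result has one row per residue, i.e. it is the transpose of F.

```python
from typing import Callable, List, Sequence

WindowFeature = Callable[[str], float]


def residue_features(seq: str, window: int, features: Sequence[WindowFeature]) -> List[List[float]]:
    """Per-residue feature vectors x_l, one row per residue (transpose of F).

    Every window w_j = seq[j:j + window] gives v_j = [f(w_j) for f in features];
    v_j is added to each residue the window covers, and each residue's sum is
    divided by the number of windows that covered it."""
    length = len(seq)
    if length < window:
        raise ValueError("sequence is shorter than the window")
    d = len(features)
    sums = [[0.0] * d for _ in range(length)]
    counts = [0] * length
    for j in range(length - window + 1):
        v = [f(seq[j:j + window]) for f in features]
        for l in range(j, j + window):
            counts[l] += 1
            for k in range(d):
                sums[l][k] += v[k]
    return [[s / counts[l] for s in sums[l]] for l in range(length)]


# Toy usage with one illustrative feature (fraction of alanine per window).
x = residue_features("MGDKGTRVFKKASPNGKLTVYLGKRDFVDHIDLVEPV", 35, [lambda w: w.count("A") / len(w)])
print(len(x), x[0], x[-1])
```

Residues near the ends of a sequence are covered by fewer windows than those in the middle, which is why the division uses a per-residue count rather than a fixed N.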
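(6) Train the classifier with 5-fold cross-validation: the feature vectors of the disordered and ordered residues in the learning samples are fed to the classifier, which learns its parameters, the projection direction W and the classification threshold.
Compute the feature matrix of the training set:
$$X = [F_1\ F_2 \cdots F_{N_s}] \tag{7}$$
where $N_s$ is the number of protein sequences in the training set and $F_i$ is the feature matrix of the i-th protein sequence, of length $L_i$, $1 \le i \le N_s$. The optimal projection direction is
$$W = S_W^{-1}(m_{dis} - m_{ord}) \tag{8}$$
$$m_{dis} = \frac{1}{N_{dis}}\sum_{j=1}^{N_{dis}} x_j^{dis}, \quad m_{ord} = \frac{1}{N_{ord}}\sum_{j=1}^{N_{ord}} x_j^{ord}, \quad S_W = \sum_{j=1}^{N_{dis}}(x_j^{dis}-m_{dis})(x_j^{dis}-m_{dis})^T + \sum_{j=1}^{N_{ord}}(x_j^{ord}-m_{ord})(x_j^{ord}-m_{ord})^T \tag{9}$$
$$X_{dis} = [x_1^{dis}\ x_2^{dis} \cdots x_{N_{dis}}^{dis}], \quad X_{ord} = [x_1^{ord}\ x_2^{ord} \cdots x_{N_{ord}}^{ord}] \tag{10}$$
where $N_{dis}$ and $N_{ord}$ are the total numbers of disordered and ordered residues in the training set, $X_{dis}$ and $X_{ord}$ are the feature matrices of all disordered and all ordered residues as defined in formula (10), and $x_j^{dis}$ and $x_j^{ord}$ are the j-th column vectors of $X_{dis}$ and $X_{ord}$. The projection onto W is $Y = W^T X$, and the classification threshold on Y is found by a line search.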
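Step (6) is Fisher's linear discriminant (the direction that maximizes the Rayleigh quotient) followed by a one-dimensional threshold search. The sketch computes $W = S_W^{-1}(m_{dis} - m_{ord})$ with a small Gauss-Jordan solver so that it needs only the standard library, then scans the midpoints between sorted projections. The text does not say which criterion the line search optimizes; maximizing Sens + Spec - 1 (ProbEx in Table 3) is an assumption here. Because $W^T(m_{dis} - m_{ord}) > 0$ whenever $S_W$ is positive definite, disordered residues project higher on average, so the sketch labels y > t as disordered. The 5-fold cross-validation loop is omitted, and all names are hypothetical.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def _mean(xs: Sequence[Vector]) -> Vector:
    d = len(xs[0])
    return [sum(x[k] for x in xs) / len(xs) for k in range(d)]


def _solve(a: List[Vector], b: Vector) -> Vector:
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-12:
            raise ValueError("within-class scatter matrix S_W is singular")
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def fisher_direction(x_dis: Sequence[Vector], x_ord: Sequence[Vector]) -> Vector:
    """W = S_W^{-1} (m_dis - m_ord), S_W being the pooled within-class scatter."""
    m_dis, m_ord = _mean(x_dis), _mean(x_ord)
    d = len(m_dis)
    s_w = [[0.0] * d for _ in range(d)]
    for xs, m in ((x_dis, m_dis), (x_ord, m_ord)):
        for x in xs:
            diff = [x[k] - m[k] for k in range(d)]
            for i in range(d):
                for j in range(d):
                    s_w[i][j] += diff[i] * diff[j]
    return _solve(s_w, [m_dis[k] - m_ord[k] for k in range(d)])


def project(w: Vector, x: Vector) -> float:
    """y = W^T x for one residue feature vector."""
    return sum(wi * xi for wi, xi in zip(w, x))


def best_threshold(y_dis: Sequence[float], y_ord: Sequence[float]) -> Tuple[float, float]:
    """Line search over midpoints of the sorted projections (y > t means disordered),
    maximizing the assumed criterion Sens + Spec - 1."""
    ys = sorted(set(y_dis) | set(y_ord))
    candidates = [ys[0] - 1.0] + [(a + b) / 2 for a, b in zip(ys, ys[1:])]
    best_t, best_score = candidates[0], float("-inf")
    for t in candidates:
        sens = sum(y > t for y in y_dis) / len(y_dis)
        spec = sum(y <= t for y in y_ord) / len(y_ord)
        if sens + spec - 1.0 > best_score:
            best_t, best_score = t, sens + spec - 1.0
    return best_t, best_score


# Toy 2-D data (made up for illustration): disordered points lie to the right.
x_dis = [[2.0, 0.0], [3.0, 1.0], [2.0, 1.0], [3.0, 0.0]]
x_ord = [[0.0, 0.0], [1.0, 1.0], [0.0, 1.0], [1.0, 0.0]]
w = fisher_direction(x_dis, x_ord)
t, score = best_threshold([project(w, x) for x in x_dis], [project(w, x) for x in x_ord])
print(w, t, score)
```

With five features, $S_W$ is only 5×5, so training cost is dominated by feature extraction rather than by the classifier, in line with the low-complexity claim.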
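Advantages and beneficial effects of the invention:
1. Using only five features and a linear classifier, the method predicts intrinsically disordered proteins quickly and robustly.
2. Simulation results show that, at similar prediction accuracy, the method needs far fewer features and much less computation than existing predictors of the same type.
Description of the Drawings
Figure 1: Flow chart of the method of the invention for predicting intrinsically disordered proteins.
Figure 2: Comparison of prediction accuracy between the method of the invention and existing predictors of the same type on the PU159 dataset.
Figure 3: Comparison of prediction accuracy between the method of the invention and existing predictors of the same type on the R80 dataset.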
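Detailed Description of the Embodiments
Embodiment 1:
Taking as an example a protein sequence w whose disordered regions have not yet been determined (the sequence labelled 1g4m in the R80 dataset), prediction with the method of the invention proceeds as follows:
Step 1: The sequence has length 393. It is scanned with a sliding window of length N = 35, and the five features are computed for each window.
Sequence w = MGDKGTRVFKKASPNGKLTVYLGKRDFVDHIDLVEPV
For the first window of length N, compute the five feature values of the segment it covers using formulas (1) to (4), and assign these values to every residue in the segment. Then slide the window by one residue, compute the five feature values of the length-N segment starting at the second residue, and add them to every residue in that segment. Repeat until the window reaches the last residue. Finally, count how many times each residue has been accumulated and divide each accumulated feature value by that count to obtain the residue's final feature vector.
The resulting feature matrix of sequence w, in which each column is the feature vector of the residue at that position, is:
[Feature matrix of sequence w (5 × 393); numerical values not reproduced]
Step 2: Project X onto the direction learned from the training samples and apply the learned threshold. Of the 35 disordered residues, 29 are correctly predicted as disordered; of the 358 ordered residues, 314 are correctly predicted as ordered.
To verify the effectiveness of the method, it was used to predict intrinsically disordered proteins in the R80 and PU159 datasets. The R80 dataset contains 80 protein sequences, each with at least one disordered region; the PU159 dataset contains 79 fully disordered and 80 fully ordered sequences. Table 1 compares the prediction accuracy of the method of the invention with that of existing predictors of the same type on PU159, and Table 2 gives the same comparison on R80. Table 3 defines the accuracy measures, where TP is the number of correctly predicted disordered residues, TN the number of correctly predicted ordered residues, FN the number of disordered residues wrongly predicted as ordered, and FP the number of ordered residues wrongly predicted as disordered.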
Table 1 (PU159 dataset)
Method      Sens.  Spec.  ProbEx  MCC
OurMethod   0.812  0.783  0.596   0.594
DisPSSMP    0.825  0.765  0.590   0.589
BVDEA       0.796  0.785  0.581   0.586
RONN        0.675  0.888  0.563   0.580
FoldIndex   0.722  0.815  0.536   0.540
DISOPRED2   0.469  0.981  0.449   0.543
PONDR       0.632  0.782  0.414   0.420
DISPRO      0.383  0.982  0.365   0.467
PreLink     0.319  0.991  0.310   0.430
Table 2 (R80 dataset)
Method      Sens.  Spec.  ProbEx  MCC
OurMethod   0.727  0.897  0.624   0.515
DisPSSMP    0.767  0.848  0.615   0.463
BVDEA       0.817  0.728  0.545   0.451
RONN        0.603  0.878  0.481   0.395
FoldIndex   0.488  0.811  0.299   0.224
DISOPRED2   0.405  0.972  0.377   0.470
PONDR       0.557  0.816  0.373   0.278
DISPRO      0.418  0.993  0.411   0.578
PreLink     0.237  0.947  0.183   0.219
Table 3
Measure  Definition
Sens     TP/(TP+FN)
Spec     TN/(TN+FP)
ProbEx   (TP*TN-FP*FN)/((TP+FN)(TN+FP))
MCC      (TP*TN-FP*FN)/sqrt((TP+FP)(TN+FN)(TP+FN)(TN+FP))
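The Table 3 measures follow directly from the four counts. ProbEx simplifies algebraically to Sens + Spec - 1, which is why the ProbEx column of Tables 1 and 2 matches Sens. + Spec. - 1 up to rounding. The sketch applies the definitions to the 1g4m counts reported in Step 2 of Embodiment 1 (TP = 29, FN = 35 - 29 = 6, TN = 314, FP = 358 - 314 = 44); the function name `disorder_metrics` is illustrative.

```python
import math


def disorder_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Sensitivity, specificity, probability excess and Matthews correlation (Table 3)."""
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    prob_ex = (tp * tn - fp * fn) / ((tp + fn) * (tn + fp))  # equals sens + spec - 1
    mcc = (tp * tn - fp * fn) / math.sqrt((tp + fp) * (tn + fn) * (tp + fn) * (tn + fp))
    return {"Sens": sens, "Spec": spec, "ProbEx": prob_ex, "MCC": mcc}


# 1g4m example: 29 of 35 disordered and 314 of 358 ordered residues correct.
m = disorder_metrics(tp=29, tn=314, fp=358 - 314, fn=35 - 29)
print({k: round(v, 3) for k, v in m.items()})
# {'Sens': 0.829, 'Spec': 0.877, 'ProbEx': 0.706, 'MCC': 0.517}
```

Unlike ProbEx, MCC also depends on the class balance through TP + FP and TN + FN, so the two can rank predictors differently, as DISPRO's row in Table 2 shows.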
References
1. Jing Y, Marcin JM, Paul LF, Vladimir NU, Lukasz K, RAPID: Fast and accurate sequence-based prediction of intrinsic disorder content on proteomic scale, Biochimica et Biophysica Acta, 1671-1680, 2013.
2. VN Uversky, The mysterious unfoldome: structureless, underappreciated, yet vital part of any given proteome, J. Biomed. Biotechnol., 2010.
3. Wright P, Dyson H, Intrinsically unstructured proteins: re-assessing the protein structure-function paradigm, J. Mol. Biol., 293:321-331, 1999.
4. Irem EK, Turgay I, Okan KE, Prediction of disorder with new computational tool: BVDEA, Expert Systems with Applications, 38:14451-14459, 2011.
5. Oldfield CJ, Ulrich EL, Cheng Y, Dunker AK, Markley JL, Addressing the intrinsic disorder bottleneck in structural proteomics, Proteins, 59:444-453, 2005.
6. Jaime P, Clifford EF, Tzviya ZBM, Edwin HR, Orna M, Jacques SB, Israel S, JLS, FoldIndex: a simple tool to predict whether a given protein sequence is intrinsically unfolded, BIOINFORMATICS, 21(16):3435-3438, 2005.
7. R Linding, RB Russell, V Neduva, TJ Gibson, GlobPlot: Exploring Protein Sequences for Globularity and Disorder, Nucleic Acids Research, 31(13):3701-3708, 2003.
8. Ferenc O, Judit O, Proteins without 3D structure: definition, detection and beyond, BIOINFORMATICS, 27(11):1449-1454, 2011.
9. K Peng, S Vucetic, P Radivojac, CJ Brown, AK Dunker, Z Obradovic, Optimizing Long Intrinsic Disorder Predictors with Protein Evolutionary Information, Journal of Bioinformatics and Computational Biology, 3(1):35-60, 2005.
10. Yang ZR, Thomson R, McNeil P, Esnouf RM, RONN: the bio-basis function neural network technique applied to the detection of natively disordered regions in proteins, Bioinformatics Advance Access Published 9, 2005.
11. JJ Ward, JS Sodhi, LJ McGuffin, BF Buxton, DT Jones, Prediction and Functional Analysis of Native Disorder in Proteins from the Three Kingdoms of Life, J. Mol. Biol., 337:635-645, 2004.
12. Su C, Chen C, Ou Y, Protein disorder prediction by condensed PSSM considering propensity for order or disorder, BMC Bioinformatics, 307-319, 2006.
13. Ishida T, Kinoshita K, Prediction of disordered regions in proteins based on the meta approach, Bioinformatics, 24:1344-1348, 2008.
14. Schlessinger A, Improved disorder prediction by combination of orthogonal approaches, PLoS One, 4:4433, 2009.
15. Cheng J, Sweredoski MJ, Baldi P, Accurate prediction of protein disordered regions by mining protein structure data, Data Mining and Knowledge Discovery, 11:213-222, 2005.
16. Weathers EA, Paulaitis ME, Woolf TB, Hoh JH, Reduced amino acid alphabet is sufficient to accurately recognize intrinsically disordered protein, FEBS Letters, 576:348-352, 2004.
17. David K, Topological entropy of DNA sequences, Bioinformatics, 27(8):1061-1067, 2011.
18. Mika S, Rätsch G, Weston J, Schölkopf B, Müller KR, Fisher discriminant analysis with kernels, Neural Networks for Signal Processing, 1999.
19. Kohavi R, A study of cross-validation and bootstrap for accuracy estimation and model selection, Proceedings of the Fourteenth International Joint Conference on Artificial Intelligence, San Mateo, CA: Morgan Kaufmann, 2(12):1137-1143, 1995.
20. Uversky VN, Gillespie JR, Fink AL, Why are "natively unfolded" proteins unstructured under physiologic conditions?, Proteins, 41:415-427, 2000.

Claims (2)

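1. A low-complexity method for predicting intrinsically disordered proteins, which uses only five features (the Shannon entropy, the topological entropy, and the weighted averages of three propensity scales) and a linear classifier to predict intrinsically disordered proteins fairly accurately with improved speed and robustness, the method comprising the following steps:
Step 1: for the learning samples, let w denote one protein sequence; a sliding window of length N cuts out segments of N consecutive residues for computation, and w is then taken to have length N;
Step 2: first compute the Shannon entropy of w:
$$H_S(w) = -\sum_{k=1}^{20} f_k \log_2 f_k \tag{1}$$
where $f_k$ is the frequency of the k-th amino acid in w, $1 \le k \le 20$;
Step 3: compute the topological entropy of w:
map the protein sequence w, composed of the 20 amino acids, to a 0-1 sequence $\tilde{w}$, in which the hydrophobic amino acids (isoleucine, leucine, valine) and the aromatic amino acids (phenylalanine, tryptophan, tyrosine) are mapped to 1 and the remaining 14 amino acids are mapped to 0; the topological entropy of w is
$$H_{top}(w) = \frac{1}{N-2^n-n+2}\sum_{l=1}^{N-2^n-n+2}\frac{\log_2 p_{\tilde{w}_l^{\,l+2^n+n-2}}(n)}{n} \tag{2}$$
where $p_u(n)$ denotes the number of distinct subsequences of length n in u, n satisfies
$$2^n + n - 1 \le N < 2^{n+1} + n \tag{3}$$
and $\tilde{w}_l^{\,l+2^n+n-2}$ denotes the segment of $2^n+n-1$ consecutive residues of $\tilde{w}$ starting at l;
Step 4: for the length-N sequence w, compute the weighted averages of three propensity scales, Remark465, Deleage/Roux, and Bfactor(2STD):
$$M_p(w) = \frac{1}{N}\sum_{l=1}^{N} w_p(l), \qquad p = 1, 2, 3 \tag{4}$$
where $w_p(l)$, $1 \le l \le N$, is the value of the p-th propensity scale for the l-th residue of w;
Step 5: for a sequence w of length L > N, the vector of five feature values computed for each sliding window is assigned to every residue in the window; for each residue, the assigned vectors are summed and divided by the number of contributions to give the final feature vector;
Step 6: train the classifier using 5-fold cross-validation; the feature vectors of the disordered and ordered residues in the learning samples are input to the classifier for learning, yielding the classifier parameters: the projection direction W and the classification threshold;
Step 7: for a protein sequence to be predicted, compute the feature vector of each residue according to Steps 1 to 5, and then classify each residue using the projection direction and classification threshold obtained in Step 6.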
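2. The low-complexity method for predicting intrinsically disordered proteins according to claim 1, wherein the projection direction W and the classification threshold are computed as follows.
The optimal projection direction is:
$$W = S_W^{-1}(m_{dis} - m_{ord}) \tag{5}$$
$$m_{dis} = \frac{1}{N_{dis}}\sum_{j=1}^{N_{dis}} x_j^{dis}, \quad m_{ord} = \frac{1}{N_{ord}}\sum_{j=1}^{N_{ord}} x_j^{ord}, \quad S_W = \sum_{j=1}^{N_{dis}}(x_j^{dis}-m_{dis})(x_j^{dis}-m_{dis})^T + \sum_{j=1}^{N_{ord}}(x_j^{ord}-m_{ord})(x_j^{ord}-m_{ord})^T \tag{6}$$
$$X_{dis} = [x_1^{dis}\ x_2^{dis} \cdots x_{N_{dis}}^{dis}], \quad X_{ord} = [x_1^{ord}\ x_2^{ord} \cdots x_{N_{ord}}^{ord}] \tag{7}$$
where $N_{dis}$ and $N_{ord}$ are the total numbers of disordered and ordered residues in the training set, $X_{dis}$ and $X_{ord}$ are the feature matrices of all disordered and all ordered residues as defined in formula (7), and $x_j^{dis}$ and $x_j^{ord}$ are the j-th column vectors of $X_{dis}$ and $X_{ord}$;
$$X = [F_1\ F_2 \cdots F_{N_s}] \tag{8}$$
where $N_s$ is the number of protein sequences in the training set and $F_i$ is the feature matrix of the i-th protein sequence, of length $L_i$, $1 \le i \le N_s$.
The projection onto W is $Y = W^T X$; the classification threshold on Y is obtained by a line search.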
CN201710388664.3A 2017-05-27 2017-05-27 A low-complexity method for predicting intrinsically disordered proteins Expired - Fee Related CN107169312B (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710388664.3A CN107169312B (zh) 2017-05-27 2017-05-27 A low-complexity method for predicting intrinsically disordered proteins

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710388664.3A CN107169312B (zh) 2017-05-27 2017-05-27 A low-complexity method for predicting intrinsically disordered proteins

Publications (2)

Publication Number Publication Date
CN107169312A (zh) 2017-09-15
CN107169312B (zh) 2020-05-08

Family

ID=59821327

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710388664.3A Expired - Fee Related CN107169312B (zh) 2017-05-27 2017-05-27 A low-complexity method for predicting intrinsically disordered proteins

Country Status (1)

Country Link
CN (1) CN107169312B (zh)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
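CN102012977A (zh) * 2010-12-21 2011-04-13 Fujian Normal University Signal peptide prediction method based on a probabilistic neural network ensemble
JP2011130677A (ja) * 2009-12-22 2011-07-07 National Institute Of Advanced Industrial Science & Technology Expression prediction device and expression prediction method
CN103955628A (zh) * 2014-04-22 2014-07-30 Nanjing University of Science and Technology Protein-vitamin binding site prediction method based on subspace fusion
CN104636635A (zh) * 2015-01-29 2015-05-20 Nanjing University of Science and Technology Protein crystallization prediction method based on a two-layer SVM learning mechanism
CN105868583A (zh) * 2016-04-06 2016-08-17 Northeast Normal University Method for predicting epitopes from sequence using cost-sensitive ensembles and clustering
WO2016168090A1 (en) * 2015-04-14 2016-10-20 Nueon, Inc. Method and apparatus for determining markers of health by analysis of blood
CN106295242A (zh) * 2016-08-04 2017-01-04 Shanghai Jiao Tong University Protein domain detection method based on a cost-sensitive LSTM network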

Non-Patent Citations (8)

* Cited by examiner, † Cited by third party
Title
Accurate Prediction of Protein Disordered Regions by Mining Protein Structure Data; JIANLIN CHENG et al.; Data Mining and Knowledge Discovery; 20051231; 1-10 *
FISHER DISCRIMINANT ANALYSIS WITH KERNELS; Sebastian Mika et al.; The 9th IEEE Signal Processing Society Workshop on Neural Networks for Signal Processing; 19991231; 41-48 *
GlobPlot: exploring protein sequences for globularity and disorder; Rune Linding et al.; Nucleic Acids Research; 20031231; Vol. 31 (No. 13); 3701-3708 *
OPTIMIZING LONG INTRINSIC DISORDER PREDICTORS WITH PROTEIN EVOLUTIONARY INFORMATION; KANG PENG et al.; Journal of Bioinformatics and Computational Biology; 20040504; 1-23 *
Prediction of Disorder with New Computational Tool: BVDEA; Irem Ersoz Kaya et al.; Electrical and Computer Engineering; 20091231; 1-31 *
RAPID: Fast and accurate sequence-based prediction of intrinsic disorder content on proteomic scale; Jing Yan et al.; Biochimica et Biophysica Acta; 20131231; Vol. 1834; 1671-1680 *
Topological entropy of DNA sequences; David Koslicki; BIOINFORMATICS; 20110210; Vol. 27 (No. 8); 1061-1067 *
Amino acid sequence analysis at the disorder/order junctions of intrinsically disordered proteins; Cao Zanxia et al.; Acta Biophysica Sinica; 20110930; Vol. 27 (No. 9); 801-811 *

Also Published As

Publication number Publication date
CN107169312A (zh) 2017-09-15

Similar Documents

Publication Publication Date Title
US11620567B2 (en) Method, apparatus, device and storage medium for predicting protein binding site
Valledor et al. Back to the basics: maximizing the information obtained by quantitative two dimensional gel electrophoresis analyses by an appropriate experimental design and statistical analyses
You et al. An improved sequence-based prediction protocol for protein-protein interactions using amino acids substitution matrix and rotation forest ensemble classifiers
Saidi et al. Protein sequences classification by means of feature extraction with substitution matrices
Möller-Levet et al. Clustering of unevenly sampled gene expression time-series data
Zhou et al. EL_LSTM: prediction of DNA-binding residue from protein sequence by combining long short-term memory and ensemble learning
CN107885971B (zh) Method for identifying essential proteins using an improved flower pollination algorithm
Zhang et al. Short-term traffic flow prediction based on LSTM-XGBoost combination model
Moler et al. Integrating naive Bayes models and external knowledge to examine copper and iron homeostasis in S. cerevisiae
Midic et al. Intrinsic disorder in putative protein sequences
CN107169312B (zh) A low-complexity method for predicting intrinsically disordered proteins
Qian et al. Automatic transcription factor classifier based on functional domain composition
Ahmed et al. Prediction of protein acetylation sites using kernel naive Bayes classifier based on protein sequences profiling
Su et al. Prediction of nuclear proteins using nuclear translocation signals proposed by probabilistic latent semantic indexing
Li et al. Multidimensional scaling method for prediction of lysine glycation sites
Fabian et al. Developing a new SVM classifier for the extended ES protein structure prediction
Nandi et al. Comparative genomics using data mining tools
Faridoon et al. Combining SVM and ECOC for identification of protein complexes from protein protein interaction networks by integrating amino acids’ physical properties and complex topology
Kim et al. Binding matrix: a novel approach for binding site recognition
Wang et al. Prediction of Protein‐Protein Interactions from Protein Sequences by Combining MatPCA Feature Extraction Algorithms and Weighted Sparse Representation Models
Minakuchi et al. Prediction of protein-protein interaction sites using support vector machines
Vostrikova et al. Strategy for the study of the proteome in animal muscle tissue
Fu et al. Prediction of anuran antimicrobial peptides using AdaBoost and improved PSSM profiles
Ding et al. Quality assessment of tandem mass spectra by using a weighted k-means
Shen et al. FoldExplorer: Fast and Accurate Protein Structure Search with Sequence-Enhanced Graph Embedding

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20200508