CN106096326B - Differential evolution protein structure prediction method based on a centroid mutation strategy - Google Patents
- Publication number
- CN106096326B (application number CN201610390675.0A)
- Authority
- CN
- China
- Prior art keywords
- conformation
- centroid
- energy
- target
- rand1
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B5/00—ICT specially adapted for modelling or simulations in systems biology, e.g. gene-regulatory networks, protein interaction networks or metabolic networks
Abstract
A differential evolution protein structure prediction method based on a centroid mutation strategy. First, the conformations are sorted in ascending order of energy, and the average energy error between each conformation and the lowest-energy conformation is calculated. Next, a subset of the lower-energy conformations is selected and their centroid conformation is computed. Finally, the average energy error is used to judge the search state the algorithm has reached, and a matching centroid mutation strategy generates the test conformation: if the average energy error is greater than a set threshold, the DE/rand-to-centroid/1 strategy is applied, in which fragments extracted from the centroid conformation replace the corresponding fragments of a randomly selected conformation; otherwise the DE/centroid/2 strategy is applied, in which fragments extracted from randomly selected conformations replace the corresponding fragments of the centroid conformation. This improves the search efficiency and prediction accuracy of the algorithm.
Description
Technical Field
The present invention relates to the fields of bioinformatics, intelligent optimization, and computer applications, and in particular to a differential evolution protein structure prediction method based on a centroid mutation strategy.
Background
In 1953, J. Watson and F. Crick published the double-helix model of DNA in the British journal Nature, marking the true birth of molecular biology. Five years later, F. Crick proposed the "central dogma" of molecular biology, which describes the general rule by which genetic information is transmitted. As a key part of that dogma, the triplet genetic code that maps DNA to protein amino acid sequences (the "first code") had been fully deciphered by 1965. The folding code that maps an amino acid sequence to its spatial structure (the "second code"), however, has not yet been cracked. With the completion of human genome sequencing in 2003, the number of known protein amino acid sequences has surged, and the theoretical study of the protein folding code has become a key problem that the field of protein engineering urgently needs to solve.
Structural genomics determines the three-dimensional structure of proteins by experimental means. X-ray crystallography is so far the most effective method for studying protein structure, and the accuracy it achieves is unmatched by any other method; its main drawbacks are that protein crystals are difficult to grow and that crystal structure determination takes a long time. Multidimensional nuclear magnetic resonance (NMR) can directly determine the conformation of a protein in solution, but because it requires large, highly pure samples, it can currently be applied only to small proteins. Overall, experimental determination of protein structure is extremely time-consuming, costly, and labor-intensive.
Ab initio prediction is regarded as the Holy Grail of protein structure prediction. In view of its biological significance and the complexity of the problem, in 2005 the journal Science listed it among the 100 most challenging problems that the scientific community urgently needs to solve. An ab initio prediction method must consider two factors: (1) the protein structure energy function; (2) the conformational space search method. The first factor is essentially a molecular mechanics problem: its purpose is to compute the energy value of each protein structure. The second is essentially a global optimization problem: a suitable optimization method is chosen to search the conformational space quickly and find the conformation corresponding to the global minimum energy. Protein conformational space optimization is a very hard NP-hard problem. Population-based evolutionary algorithms are an important approach to protein conformation optimization and mainly include differential evolution (DE), genetic algorithms (GA), and particle swarm optimization (PSO). These algorithms are simple in structure, easy to implement, and robust, so they are often used to search for the global minimum energy conformation in ab initio prediction. However, population-based optimization algorithms are stochastic. The existing literature on protein conformation optimization mainly studies how to jump from one local minimum to another, and provides no mechanism for using the information generated during population evolution to guide the search, which makes the algorithms inefficient. In addition, under selection pressure and the genetic drift that arises during random sampling, all individuals in the population inevitably converge to some absorbing state. For optimization problems such as protein conformation, that absorbing state is not necessarily the global optimum, which degrades prediction accuracy.
Therefore, existing population-based protein structure prediction methods are deficient in search efficiency and prediction accuracy and need to be improved.
Summary of the Invention
To overcome the shortcomings of existing protein structure prediction methods in search efficiency and prediction accuracy, the present invention extracts information from low-energy conformations to design centroid mutation strategies and, combined with fragment assembly, proposes a differential evolution protein structure prediction method based on a centroid mutation strategy that offers high search efficiency and high prediction accuracy.
The technical solution adopted by the present invention to solve this technical problem is as follows:
A differential evolution protein structure prediction method based on a centroid mutation strategy, comprising the following steps:
1) Select a protein force field model, i.e. the energy function $E(X)$;
2) Provide the input sequence information;
3) Initialization: set the population size $NP$, the crossover factor $CR$, and the maximum number of iterations; generate the initial conformation population $P=\{C_1,C_2,\dots,C_{NP}\}$, $C_i=(x_{i,1},x_{i,2},\dots,x_{i,N})$, from the input sequence, and initialize the iteration counter $G=0$, where $N$ is the number of dimensions and $x_{i,N}$ is the $N$-th dimension element of the $i$-th conformation $C_i$;
4) Compute the energy function value $E(C_i)$, $i=1,2,\dots,NP$, of every conformation in the current population, and sort the conformations in ascending order of energy;
5) Find the lowest-energy conformation $C_{best}$ in the current population and compute the average energy error $\delta$ between the energies of the other conformations and the energy $E(C_{best})$ of $C_{best}$; if the iteration counter $G=0$, set $\delta_{max}=\delta$;
6) For each conformation individual $C_i$ in the population, $i\in\{1,2,3,\dots,NP\}$, let $C_{target}=C_i$, where $C_{target}$ denotes the target conformation, and use the information in the lower-energy conformations of the current population to generate the mutant conformation $C_{mutant}$ as follows:
6.1) Select the $CT$ top-ranked conformations $C_m=(x_{m,1},x_{m,2},\dots,x_{m,N})$, $m=1,2,\dots,CT$, where $CT=rand(NP/3,NP/2)$, $rand(NP/3,NP/2)$ denotes a random integer between $NP/3$ and $NP/2$, and $x_{m,N}$ is the $N$-th dimension element of the $m$-th selected conformation;
6.2) Compute the centroid conformation $C_{centroid}=(x_{centroid,1},x_{centroid,2},\dots,x_{centroid,N})$ of the $CT$ selected conformations, where the $j$-th dimension element of $C_{centroid}$ is $x_{centroid,j}=\frac{1}{CT}\sum_{m=1}^{CT}x_{m,j}$, $j=1,2,\dots,N$;
6.3) Set the sequence length $L$ and randomly generate four integers randint1, randint2, randint3, and randint4 between 1 and $L$, where randint1 ≠ randint2 and randint3 ≠ randint4; let a = min(randint1, randint2), b = max(randint1, randint2), c = min(randint3, randint4), d = max(randint3, randint4), where min and max return the smaller and the larger of two numbers;
6.4) If $\delta>0.5\,\delta_{max}$, mutate with the DE/rand-to-centroid/1 strategy: randomly select two different conformations $C_{rand1}$ and $C_{rand2}$ from the current population, where $rand1\neq rand2$ and $rand1,rand2\in[1,NP]$; replace the dihedral angles of $C_{rand1}$ at positions a to b with the dihedral angles of the amino acids in the fragment of $C_{centroid}$ at the same positions; likewise replace the dihedral angles of $C_{rand1}$ at positions c to d with the dihedral angles of the amino acids in the fragment of $C_{rand2}$ at the same positions; then apply fragment assembly to the resulting $C_{rand1}$ to obtain the mutant conformation $C_{mutant}$;
6.5) If $\delta\le 0.5\,\delta_{max}$, mutate with the DE/centroid/2 strategy: randomly select two different conformations $C_{rand1}$ and $C_{rand2}$ from the current population, where $rand1\neq rand2$ and $rand1,rand2\in[1,NP]$; replace the dihedral angles of $C_{centroid}$ at positions a to b with the dihedral angles of the amino acids in the fragment of $C_{rand1}$ at the same positions; likewise replace the dihedral angles of $C_{centroid}$ at positions c to d with the dihedral angles of the amino acids in the fragment of $C_{rand2}$ at the same positions; then apply fragment assembly to the resulting $C_{centroid}$ to obtain the mutant conformation $C_{mutant}$;
7) Apply crossover to the mutant conformation $C_{mutant}$ to generate the test conformation $C_{trial}$:
7.1) Randomly generate a decimal rand3 between 0 and 1;
7.2) If rand3 ≤ $CR$, randomly generate an integer rand4 between 1 and $L$ and replace the corresponding fragment of the target conformation $C_{target}$ with fragment rand4 of the mutant conformation $C_{mutant}$ to generate the test conformation $C_{trial}$; if rand3 > $CR$, $C_{trial}$ is set equal to $C_{mutant}$;
8) Compute the energy $E(C_{trial})$ of the test conformation; if $E(C_{trial})-E(C_{target})<0$, the test conformation is better than the target conformation, and $C_{trial}$ replaces $C_{target}$;
9) Check whether the termination condition is met; if so, output the result and exit; otherwise return to step 4).
Further, in step 9), after steps 6) to 8) have been executed for every conformation in the population, the iteration counter is updated as $G=G+1$; the termination condition is that $G$ reaches the maximum number of iterations preset in step 3).
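Steps 4) to 6.2) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patented implementation: a conformation is assumed to be a flat list of $N$ dihedral angles, `energy` is any callable standing in for $E(X)$, the average energy error is assumed to be the mean energy gap between the other conformations and $C_{best}$, and `rand(NP/3, NP/2)` is assumed to use integer division. All function names are hypothetical.

```python
import random

def rank_by_energy(population, energy):
    """Step 4: conformations sorted by ascending energy (lowest first)."""
    return sorted(population, key=energy)

def average_energy_error(ranked, energy):
    """Step 5, assumed form: mean of E(C_i) - E(C_best) over the other conformations."""
    e_best = energy(ranked[0])
    others = ranked[1:]
    return sum(energy(c) - e_best for c in others) / len(others)

def centroid_conformation(ranked, rng):
    """Steps 6.1-6.2: per-dimension mean of the CT top-ranked conformations."""
    np_size = len(ranked)
    # CT = rand(NP/3, NP/2); integer division and the lower bound of 1 are assumptions.
    ct = rng.randint(max(1, np_size // 3), max(1, np_size // 2))
    top = ranked[:ct]
    return [sum(c[j] for c in top) / ct for j in range(len(top[0]))]

def toy_energy(conf):
    """Hypothetical stand-in for a real force field."""
    return sum(x * x for x in conf)

population = [[0.0, 2.0], [0.0, 0.0], [1.0, 0.0]]
ranked = rank_by_energy(population, toy_energy)
delta = average_energy_error(ranked, toy_energy)   # energies 0, 1, 4 give delta == 2.5
centroid = centroid_conformation(ranked, random.Random(0))
```

A plain arithmetic mean ignores the periodicity of angles; whether the original method treats wrap-around specially is not stated in the text, so the sketch keeps the simplest reading.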
The technical concept of the present invention is as follows. First, the conformations are sorted in ascending order of energy, and the average energy error between each conformation and the lowest-energy conformation is calculated. Next, a subset of the lower-energy conformations is selected and their centroid conformation is computed. Finally, the average energy error is used to judge the search state the algorithm has reached, and a matching centroid mutation strategy generates the test conformation: if the average energy error is greater than the set threshold, the DE/rand-to-centroid/1 strategy is applied, in which fragments extracted from the centroid conformation replace the corresponding fragments of a randomly selected conformation; otherwise the DE/centroid/2 strategy is applied, in which fragments extracted from randomly selected conformations replace the corresponding fragments of the centroid conformation. This improves the search efficiency and prediction accuracy of the algorithm.
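The switch between the two mutation strategies can be illustrated with a short Python sketch. It assumes each conformation is a list with one angle entry per residue, uses the threshold $0.5\,\delta_{max}$ from the steps of the method, and leaves out the fragment assembly that the method applies afterwards. The helper names `pick_segment`, `copy_segment`, and `mutate` are hypothetical.

```python
import random

def pick_segment(length, rng):
    """Two distinct positions in 1..L, returned as (low, high)."""
    p, q = rng.sample(range(1, length + 1), 2)
    return min(p, q), max(p, q)

def copy_segment(dst, src, lo, hi):
    """Overwrite positions lo..hi (1-based, inclusive) of dst with those of src."""
    dst[lo - 1:hi] = src[lo - 1:hi]

def mutate(population, centroid, delta, delta_max, rng):
    """Build a mutant with the strategy selected by the average energy error."""
    length = len(centroid)
    a, b = pick_segment(length, rng)
    c, d = pick_segment(length, rng)
    r1, r2 = rng.sample(range(len(population)), 2)   # rand1 != rand2
    if delta > 0.5 * delta_max:
        # DE/rand-to-centroid/1: start from C_rand1, pull in a centroid segment.
        mutant = list(population[r1])
        copy_segment(mutant, centroid, a, b)
    else:
        # DE/centroid/2: start from the centroid, pull in a C_rand1 segment.
        mutant = list(centroid)
        copy_segment(mutant, population[r1], a, b)
    # Both strategies then take positions c..d from C_rand2.
    copy_segment(mutant, population[r2], c, d)
    return mutant

rng = random.Random(0)
population = [[float(i)] * 10 for i in range(1, 5)]
centroid = [0.0] * 10
early = mutate(population, centroid, delta=1.0, delta_max=1.0, rng=rng)
late = mutate(population, centroid, delta=0.1, delta_max=1.0, rng=rng)
```

Working on copies keeps the population and the centroid unchanged, so the same centroid can be reused for other target conformations.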
The beneficial effects of the present invention are as follows. First, the centroid conformation is computed from the lower-energy conformations, and the centroid mutation strategies use the evolutionary information it carries to generate test conformations, which improves prediction accuracy. Second, the average energy error is used to judge the search state the algorithm has reached, so that a centroid mutation strategy suited to that state generates the test conformation, which improves search efficiency.
Brief Description of the Drawings
Fig. 1 is a flowchart of the protein structure prediction method of the present invention.
Fig. 2 is a schematic diagram of the conformation updates when the prediction method is applied to protein 4ICB.
Fig. 3 shows the distribution of conformations obtained when the prediction method is applied to protein 4ICB.
Fig. 4 is the three-dimensional structure of protein 4ICB predicted by the method.
Detailed Description
The present invention is further described below with reference to the accompanying drawings.
Referring to Fig. 1 and Fig. 4, a differential evolution protein structure prediction method based on a centroid mutation strategy comprises the same steps 1) to 9) listed above: select the energy function $E(X)$; provide the input sequence information; initialize the population with $NP$, $CR$, the maximum number of iterations, and $G=0$; sort the conformations in ascending order of energy; compute the average energy error $\delta$ relative to $C_{best}$ and set $\delta_{max}=\delta$ when $G=0$; for each target conformation, generate $C_{mutant}$ with the DE/rand-to-centroid/1 strategy if $\delta>0.5\,\delta_{max}$ and with the DE/centroid/2 strategy otherwise; apply crossover to obtain $C_{trial}$; replace $C_{target}$ with $C_{trial}$ if $E(C_{trial})-E(C_{target})<0$; and return to step 4) until the termination condition is met. The crossover in step 7) is performed to increase the diversity of the population.
In step 9), after steps 6) to 8) have been executed for every conformation in the population, the iteration counter is updated as $G=G+1$; the termination condition is that $G$ reaches the maximum number of iterations preset in step 3).
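The crossover and selection of steps 7) and 8) can be sketched as follows. The text does not say how long "fragment rand4" is, so the sketch assumes a fragment of `frag_len` consecutive positions starting at rand4; that parameter and the function names are hypothetical, and `energy` is any callable standing in for $E(X)$.

```python
import random

def crossover(target, mutant, cr, rng, frag_len=3):
    """Step 7: build the test conformation from the target and the mutant."""
    if rng.random() <= cr:                       # rand3 <= CR
        start = rng.randint(1, len(target))      # rand4, 1-based position
        trial = list(target)
        # Copy one fragment of the mutant into a copy of the target.
        trial[start - 1:start - 1 + frag_len] = mutant[start - 1:start - 1 + frag_len]
        return trial
    return list(mutant)                          # rand3 > CR: trial equals mutant

def select(target, trial, energy):
    """Step 8: the trial replaces the target only if its energy is strictly lower."""
    return trial if energy(trial) - energy(target) < 0 else target

rng = random.Random(1)
target = [0.0] * 8
mutant = [1.0] * 8
trial = crossover(target, mutant, cr=1.0, rng=rng)
survivor = select(target, trial, energy=sum)     # sum is a toy energy here
```

With the toy energy `sum`, the trial contains at least one 1.0 and therefore has higher energy than the all-zero target, so the target survives.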
本实施例序列长度为76的α折叠蛋白质4ICB为实施例,一种基于质心变异策略的差分进化蛋白质结构预测方法,其中包含以下步骤:In this example, the α-fold protein 4ICB with a sequence length of 76 is an example, a differential evolution protein structure prediction method based on the centroid mutation strategy, which includes the following steps:
1)选取Rosetta score3力场模型,即能量函数E(X);1) Select the Rosetta score3 force field model, namely the energy function E(X);
2)输入蛋白质4ICB的序列信息;2) input the sequence information of protein 4ICB;
3)初始化:设置种群大小NP=50,交叉因子CR=0.5,最大迭代次数为10000,由输入序列产生初始构象种群 并初始化迭代次数G=0,其中,N表示维数,表示第i个构象Ci的第N维元素;3) Initialization: set population size NP=50, crossover factor CR=0.5, maximum number of iterations is 10000, generate initial conformation population from input sequence And initialize the number of iterations G=0, where N represents the number of dimensions, represents the N-th dimension element of the i-th conformation Ci ;
4)计算当前种群各构象的能量函数值E(Ci),i=1,2,…,N,并根据当前种群中各构象能量值对各构象进行升序排列;4) Calculate the energy function value E(C i ) of each conformation of the current population, i=1, 2, ..., N, and arrange the conformations in ascending order according to the energy value of each conformation in the current population;
5)记当前种群中能量最低的构象Cbest,并计算其他构象的能量与Cbest的能量E(Cbest)的平均能量误差如果迭代次数G=0,则令δmax=δ;5) Record the conformation C best with the lowest energy in the current population, and calculate the average energy error between the energy of other conformations and the energy E(C best ) of C best If the number of iterations G=0, let δ max =δ;
6)针对种群中的每个构象个体Ci,i∈{1,2,3,…,NP},令Ctarget=Ci,Ctarget表示目标构象个体,提取当前种群中能量较低的构象信息,执行以下操作生成变异构象Cmutant:6) For each conformation individual C i in the population, i∈{1,2,3,…,NP}, let C target =C i , C target represents the target conformation individual, and extract the conformation with lower energy in the current population information, perform the following operations to generate a variant conformation C mutant :
6.1)选取排名前CT个构象其中CT=rand(NP/3,NP/2),rand(NP/3,NP/2)表示NP/3和NP/2之间的随机整数,表示第m个选取构象的第N维元素;6.1) Select the top CT conformations Where CT=rand(NP/3,NP/2), rand(NP/3,NP/2) represents a random integer between NP/3 and NP/2, Represents the Nth dimensional element of the mth selected conformation;
6.2)计算所选取的CT个构象的质心构象Ccentroid=(xcentroid,1,xcentroid,2,…,xcentroid,N),其中,构象Ccentroid的第j维元素j=1,2,…,N;6.2) Calculate the centroid conformation C centroid of the selected CT conformations = (x centroid,1 ,x centroid,2 ,...,x centroid,N ), where the j-th dimension element of the conformation C centroid j=1,2,...,N;
6.3)设置序列长度L=76,在1和L之间随机生成4个整数randint1、randint2、randint3和randint4,其中randint1和randint2,randint3和randint4互不相同,令a=min(randint1,randint2),b=max(randint1,randint2),c=min(randint3,randint4),d=max(randint3,randint4),其中min表示取两个数的最小值,max表示取两个数的最大值;6.3) Set sequence length L=76, randomly generate 4 integers randint1, randint2, randint3 and randint4 between 1 and L, wherein randint1 and randint2, randint3 and randint4 are different from each other, let a=min(randint1, randint2), b=max(randint1, randint2), c=min(randint3, randint4), d=max(randint3, randint4), where min means taking the minimum value of two numbers, and max means taking the maximum value of two numbers;
6.4)如果δ>0.5δmax,则设计DE/rand-to-centroid/策略进行变异:从当前种群中随机选取两个不同的构象Crand1和Crand2,其中rand1≠rand2∈[1,NP],提取质心构象Ccentroid位置a到位置b的片段的氨基酸所对应的二面角替换构象Crand1的相同位置所对应的二面角,同时提取构象Crand2位置c到位置d的片段的氨基酸所对应的二面角替换构象Crand1相同位置所对应的二面角,然后将所得Crand1进行片段组装得到变异构象个体Cmutant;6.4) If δ>0.5δ max , design DE/rand-to-centroid/strategy for mutation: randomly select two different conformations C rand1 and C rand2 from the current population, where rand1≠rand2∈[1,NP] , extract the dihedral angle corresponding to the amino acid of the fragment from position a to position b of centroid conformation C centroid to replace the dihedral angle corresponding to the same position of conformation C rand1 , and extract the amino acid of the fragment from position c to position d of conformation C rand2 The corresponding dihedral angle replaces the dihedral angle corresponding to the same position of the conformation C rand1 , and then the resulting C rand1 is fragment assembled to obtain a variant conformation individual C mutant ;
6.5)如果δ≤0.5δmax,则设计DE/centroid/2策略进行变异:从当前种群中随机选取两个不同的构象Crand1和Crand2,其中rand1≠rand2∈[1,NP],提取构象Crand1位置a到位置b的片段的氨基酸所对应的二面角替换质心构象Ccentroid的相同位置所对应的二面角,同时使用Crand2上位置c到位置d的片段的氨基酸所对应的二面角替换质心构象Ccentroid相同位置所对应的二面角,然后将所得Ccentroid进行片段组装得到变异构象个体Cmutant;;6.5) If δ≤0.5δ max , design a DE/centroid/2 strategy for mutation: randomly select two different conformations C rand1 and C rand2 from the current population, where rand1≠rand2∈[1,NP], extract the conformation The dihedral angle corresponding to the amino acid of the fragment from position a to position b of C rand1 replaces the dihedral angle corresponding to the same position of the centroid conformation C centroid , and uses the dihedral angle corresponding to the amino acid of the fragment from position c to position d on C rand2 The face angle is replaced by the dihedral angle corresponding to the same position of the centroid conformation C centroid , and then the resulting C centroid is fragment assembled to obtain a variant conformation individual C mutant ;
7) To increase population diversity, apply a crossover operation to the mutant conformation C_mutant to generate the trial conformation C_trial:
7.1) Generate a random real number rand3 between 0 and 1;
7.2) If rand3 ≤ CR, generate a random integer rand4 between 1 and L and replace the corresponding fragment of the target conformation C_target with fragment rand4 of the mutant conformation C_mutant, producing the trial conformation C_trial; if rand3 > CR, C_trial is set equal to the mutant conformation C_mutant;
8) Compute the energy E(C_trial) of the trial conformation C_trial. If E(C_trial) − E(C_target) < 0, the trial conformation is better than the target conformation, so C_trial replaces C_target;
9) After steps 6) to 8) have been executed for every conformation in the population, set the iteration count G = G + 1. If G has reached the maximum of 10000 iterations, output the result and stop; otherwise return to step 4).
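Steps 6.4) through 8) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: a conformation is modelled as a flat list of dihedral angles, the fragment-assembly step is omitted, positions are 0-based, the segment bounds (a, b) and (c, d) are drawn at random with a fixed length `seg_len`, rand4 is read as the start of the exchanged fragment, and `toy_energy` is a hypothetical stand-in for the real energy function E. All function and parameter names are illustrative.

```python
import random

def copy_segment(receiver, donor, start, end):
    """Return a copy of receiver whose angles at [start, end] come from donor."""
    out = list(receiver)
    out[start:end + 1] = donor[start:end + 1]
    return out

def random_segment(length, seg_len, rng):
    """Pick a random window of seg_len positions (assumed selection rule)."""
    start = rng.randrange(0, length - seg_len + 1)
    return start, start + seg_len - 1

def mutate(population, centroid, delta, delta_max, seg_len, rng):
    """Steps 6.4/6.5: choose the mutation strategy from delta."""
    i, j = rng.sample(range(len(population)), 2)  # guarantees rand1 != rand2
    c_rand1, c_rand2 = population[i], population[j]
    a, b = random_segment(len(centroid), seg_len, rng)
    c, d = random_segment(len(centroid), seg_len, rng)
    if delta > 0.5 * delta_max:
        # DE/rand-to-centroid: C_rand1 receives segments from the centroid and C_rand2
        mutant = copy_segment(c_rand1, centroid, a, b)
    else:
        # DE/centroid/2: the centroid receives segments from C_rand1 and C_rand2
        mutant = copy_segment(centroid, c_rand1, a, b)
    return copy_segment(mutant, c_rand2, c, d)

def crossover(target, mutant, cr, seg_len, rng):
    """Step 7: with probability CR, move one mutant fragment into the target."""
    if rng.random() <= cr:
        start, end = random_segment(len(target), seg_len, rng)
        return copy_segment(target, mutant, start, end)
    return list(mutant)  # rand3 > CR: the trial equals the mutant

def select(target, trial, energy):
    """Step 8: keep the trial only if it has strictly lower energy."""
    return trial if energy(trial) - energy(target) < 0 else target

def toy_energy(conformation):
    """Hypothetical energy used only to make the sketch runnable."""
    return sum(angle * angle for angle in conformation)
```

Passing a seeded `random.Random` instance as `rng` keeps a run reproducible, which helps when comparing the two mutation strategies on the same population.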
Taking the α-folded protein 4ICB, with a sequence length of 76, as an example, the above method yielded a near-native conformation of the protein. The minimum root mean square deviation is [value missing] and the average root mean square deviation is [value missing]; the predicted three-dimensional structure is shown in Fig. 4.
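For reference, the root mean square deviation between a predicted and a reference structure can be computed as in the sketch below. It assumes the two coordinate lists contain the same atoms in the same order and have already been superimposed; structure evaluations usually perform an optimal superposition first, which is not shown here. The function name `rmsd` and the tuple-based coordinate format are illustrative choices, not taken from the patent.

```python
import math

def rmsd(coords_a, coords_b):
    """RMSD between two equally long lists of (x, y, z) coordinates.

    Assumes the structures are already superimposed; no alignment is done here.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equally long")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))
```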
The above describes the good optimization performance shown by one embodiment of the present invention. The invention is clearly not limited to this embodiment: it can be applied to various fields of practical engineering, and various changes can be made in implementing it without departing from its basic spirit or going beyond its substantive content.
Claims (2)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610390675.0A CN106096326B (en) | 2016-06-02 | 2016-06-02 | A kind of differential evolution Advances in protein structure prediction based on barycenter Mutation Strategy |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106096326A (en) | 2016-11-09 |
CN106096326B (en) | 2018-09-07 |
Family
ID=57447580
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610390675.0A Active CN106096326B (en) | 2016-06-02 | 2016-06-02 | A kind of differential evolution Advances in protein structure prediction based on barycenter Mutation Strategy |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106096326B (en) |
Families Citing this family (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107066834B (en) * | 2017-03-23 | 2019-05-31 | 王晨彤 | A kind of protein structure ab initio prediction method based on particle swarm optimization algorithm |
CN109509510B (en) * | 2018-07-12 | 2021-06-18 | 浙江工业大学 | A protein structure prediction method based on multiple ensemble mutation strategies |
CN109411013B (en) * | 2018-08-29 | 2020-10-30 | 浙江工业大学 | A population protein structure prediction method based on an individual-specific mutation strategy |
CN109360601B (en) * | 2018-08-29 | 2021-05-18 | 浙江工业大学 | A Multimodal Protein Structure Prediction Method Based on Extrusion Strategy |
CN110634531B (en) * | 2019-08-13 | 2021-06-18 | 浙江工业大学 | Protein structure prediction method based on double-layer bias search |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103778182A (en) * | 2013-12-12 | 2014-05-07 | 浙江工业大学 | Method for rapidly judging graph similarity |
CN104200130A (en) * | 2014-07-23 | 2014-12-10 | 浙江工业大学 | Protein structure prediction method based on tree structure replica exchange and fragment assembly |
Non-Patent Citations (2)
Title |
---|
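Two new approach incorporating centroid based mutation operators for Differential Evolution; Musrrat Ali et al.; World Journal of Modelling and Simulation; 2011-12-31; Vol. 7 (No. 1); pp. 16-28 * |
A protein conformational space optimization algorithm based on fragment assembly (一种基于片段组装的蛋白质构象空间优化算法); 郝小虎 (Hao Xiaohu) et al.; 《计算机科学》 (Computer Science); 2015-03-31; Vol. 42 (No. 3); pp. 237-240 * |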
Also Published As
Publication number | Publication date |
---|---|
CN106096326A (en) | 2016-11-09 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN106096326B (en) | A kind of differential evolution Advances in protein structure prediction based on barycenter Mutation Strategy | |
CN106503484B (en) | A kind of multistage differential evolution Advances in protein structure prediction based on abstract convex estimation | |
CN105808973B (en) | One kind is based on interim shifty group's conformational space method of sampling | |
CN103077226B (en) | A kind of multi-modal protein conformation space search method | |
CN107633157A (en) | A kind of protein conformation space optimization method based on distribution estimation and copy exchanging policy | |
Yu et al. | An efficient algorithm for discovering motifs in large DNA data sets | |
CN108846256A (en) | A kind of group's Advances in protein structure prediction based on contact residues information | |
Ng et al. | Acceleration of short read alignment with runtime reconfiguration | |
CN106503486B (en) | A differentially evolved protein structure de novo prediction method based on a multi-stage subgroup co-evolution strategy | |
CN104200131B (en) | A kind of protein conformation space optimization method based on fragment assembling | |
CN109872770B (en) | A Multiple Variation Strategy Protein Structure Prediction Method Combined with Crowding Degree Evaluation | |
Hao et al. | Conformational space sampling method using multi-subpopulation differential evolution for de novo protein structure prediction | |
CN110706739B (en) | A protein conformation space sampling method based on multimodal internal and external crossover | |
CN109390035B (en) | Protein conformation space optimization method based on local structure comparison | |
CN109243526B (en) | A protein structure prediction method based on the intersection of specific fragments | |
CN109584954B (en) | Protein conformation space optimization method based on multi-population joint search | |
CN107145764B (en) | A kind of protein conformation space search method of dual distribution estimation guidance | |
CN109411013B (en) | A population protein structure prediction method based on an individual-specific mutation strategy | |
CN109300505B (en) | A protein structure prediction method based on biased sampling | |
CN109243524B (en) | A multi-level individual screening evolutionary protein structure prediction method | |
Haritha et al. | A Comprehensive Review on Protein Sequence Analysis Techniques | |
Yang et al. | A comprehensive review of predicting method of RNA tertiary structure | |
CN109448786A (en) | A kind of Lower Bound Estimation dynamic strategy Advances in protein structure prediction | |
Khaladkar et al. | Detecting conserved secondary structures in RNA molecules using constrained structural alignment | |
Tremblay-Savard et al. | Reconstruction of ancestral RNA sequences under multiple structural constraints |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |