CN107609342A - A protein conformation search method based on secondary-structure spatial distance constraints


Publication number
CN107609342A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201710683896.1A
Other languages
Chinese (zh)
Other versions
CN107609342B (en)
Inventor
张贵军
王小奇
马来发
周晓根
谢腾宇
王柳静
孙科
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University of Technology ZJUT
Original Assignee
Zhejiang University of Technology ZJUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University of Technology ZJUT
Priority to CN201710683896.1A
Publication of CN107609342A
Application granted
Publication of CN107609342B
Legal status: Active


Abstract

A protein conformation search method based on secondary-structure spatial distance constraints. Within the basic framework of a genetic algorithm, the spatial length of each secondary structure in the target protein and the spatial distance between the central residues of adjacent secondary structures are combined into a feature vector that serves as a spatial constraint. For a given energy function, the search is thereby confined to a smaller region of conformation space. Spatial distance information is also added to the selection operator to compensate for the inaccuracy of the energy function, which improves the accuracy of structural modeling. The invention provides a protein conformation search method based on secondary-structure spatial distance constraints with high sampling efficiency, high prediction accuracy, and low computational cost.

Description

A Protein Conformation Search Method Based on Secondary-Structure Spatial Distance Constraints

Technical Field

The invention relates to the fields of bioinformatics, artificial-intelligence optimization, and computer applications, and in particular to a protein conformation search method based on secondary-structure spatial distance constraints.

Background

Proteins are biological macromolecules formed by the dehydration condensation of amino acids and play a decisive role in human health. An accurate understanding of protein structure and function is of great significance for disease research, biopharmaceuticals, and related fields. There are currently two main approaches to obtaining protein structures: experimental determination and theoretical prediction. Experimental methods include X-ray crystallography, nuclear magnetic resonance spectroscopy, and electron microscopy. Although these methods can accurately determine the three-dimensional structure of some proteins, they are time-consuming and expensive, and the structures of some proteins cannot be obtained experimentally at all. Computational prediction of protein structure has therefore become a research focus in bioinformatics. Theoretical prediction methods mainly use computer technology and intelligent optimization algorithms to predict the three-dimensional structure of a protein from its primary amino acid sequence, which reduces the cost and time of prediction; such methods are therefore more widely applicable than experimental methods. However, because of the complexity of protein structure itself, predicting the three-dimensional structure of proteins remains an unsolved problem.

Among ab initio protein structure prediction methods, evolutionary algorithms such as genetic algorithms and differential evolution are important tools for optimizing protein conformations. These algorithms offer fast convergence, a simple structure, and strong robustness. However, when the protein sequence is long, the conformation space is very large. If the search is guided only by a particular energy function, the inaccuracy of that function means the lowest-energy conformation found is not guaranteed to be the one closest to the native structure, so the correct fold often cannot be formed.

Existing conformation space search methods are therefore deficient in prediction accuracy and sampling efficiency and need improvement.

Summary of the Invention

To overcome the low sampling efficiency and low prediction accuracy of existing conformation space search methods for protein structure prediction, the present invention proposes a protein conformation search method based on secondary-structure spatial distance constraints that has higher sampling efficiency and higher prediction accuracy.

The technical solution adopted by the present invention to solve this technical problem is as follows:

A protein conformation search method based on secondary-structure spatial distance constraints, the method comprising the following steps:

1) Provide the input sequence information;

2) Parameter initialization: set the population size NP, the maximum number of generations G_max, the crossover probability P_c, the number of initial-population iterations iteration, the crossover fragment length frag_length, the assembly counter reject_number, the maximum number of assembly attempts reject_max, the feature vector D = {d_1, …, d_m, d_{1,2}, …, d_{k,k+1}}, the maximum distance constraint range δ, and the selection probability P_s; D is built from prior knowledge of the spatial length of each secondary structure and the spatial distance between the central residues of adjacent secondary structures, where d_m is the length of the m-th secondary structure block of the target protein and d_{k,k+1} is the spatial distance between the central residues of the k-th and (k+1)-th secondary structure blocks;

3) Population initialization: start NP Monte Carlo trajectories and run each for iteration search steps, which yields NP initial individuals;

4) For each target individual x_i and a randomly selected individual x_j, with i, j ∈ {1, …, NP} and j ≠ i, perform the following operations:

4.1) With probability P_c, apply crossover to x_i and x_j as follows:

4.1.1) Randomly select the crossover start point begin_position within the allowed range [1, total_residue − frag_length] and compute the crossover end point end_position = begin_position + frag_length, where total_residue is the total number of residues;

4.1.2) Exchange the torsion angles at every crossover position ∈ [begin_position, end_position] to generate the new individuals x′_i and x′_j, referred to as the crossover individuals;

4.2) Apply mutation to the crossover individuals x′_i and x′_j as follows:

4.2.1) Use fragment assembly to search the conformation space starting from the crossover individual x′_i; after fragment assembly, compute the length of each secondary structure and the spatial distance between the central residues of adjacent secondary structures, and form the distance vector $D^{x'_i} = \{d_1^{x'_i}, \dots, d_m^{x'_i}, d_{1,2}^{x'_i}, \dots, d_{k,k+1}^{x'_i}\}$, where $d_m^{x'_i}$ is the length of the m-th secondary structure block in x′_i and $d_{k,k+1}^{x'_i}$ is the spatial distance between the central residues of the k-th and (k+1)-th secondary structure blocks;

4.2.2) Compute the Manhattan distance between the distance vector $D^{x'_i}$ of x′_i and the prior-knowledge feature vector D = {d_1, …, d_m, d_{1,2}, …, d_{k,k+1}} according to $\text{similarity\_mutation\_1} = \sum_{n=1}^{m} |d_n^{x'_i} - d_n| + \sum_{n=1}^{k} |d_{n,n+1}^{x'_i} - d_{n,n+1}|$; if similarity_mutation_1 ≤ δ, the individual x″_i generated by mutation satisfies the secondary-structure spatial distance constraint and the method proceeds to step 4.2.4); otherwise it proceeds to step 4.2.3);

4.2.3) Increment the counter reject_number; if reject_number ≤ reject_max, repeat steps 4.2.1) and 4.2.2) to generate a new individual x″_i until similarity_mutation_1 ≤ δ is satisfied; otherwise, execute step 4.2.1) to generate the new individual x″_i;

4.2.4) In the same way as steps 4.2.1) and 4.2.2), perform fragment assembly on the individual x′_j and compute the corresponding Manhattan distance similarity_mutation_2, finally obtaining the new individual x″_j;

4.2.5) Compute the Manhattan distance $\sum_{n=1}^{m} |d_n^{x_i} - d_n| + \sum_{n=1}^{k} |d_{n,n+1}^{x_i} - d_{n,n+1}|$ between the distance vector $D^{x_i}$ of the target individual x_i and the prior-knowledge feature vector D = {d_1, …, d_m, d_{1,2}, …, d_{k,k+1}};

5) Select among the target individual x_i and the mutated individuals x″_i and x″_j according to their energy and distance similarity, choose the dominant individual, and update the population as follows:

5.1) Use the Rosetta Score3 function E to compute the energies E(x_i), E(x″_i), and E(x″_j) of the target individual x_i and the mutated individuals x″_i and x″_j;

5.2) Among the target individual x_i and the mutated individuals x″_i and x″_j: if some individual X ∈ {x_i, x″_i, x″_j} has both a lower energy and a smaller Manhattan distance than the other two individuals, X is the dominant individual; if some individual X′ ∈ {x_i, x″_i, x″_j} has only a lower energy than the other two, X′ is set as the dominant individual with selection probability P_s; likewise, if some individual X″ ∈ {x_i, x″_i, x″_j} has only a smaller Manhattan distance than the other two, X″ is set as the dominant individual with selection probability P_s; finally, the dominant individual replaces the target individual and the population is updated;

6) Check whether the maximum number of generations G_max has been reached; if the termination condition is met, output the result; otherwise return to step 4).
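The crossover operator of step 4.1) can be illustrated with a minimal Python sketch. It assumes each individual is a list with one torsion-angle tuple per residue (for example (phi, psi, omega)); that representation, the function name `crossover`, and the `rng` argument are illustrative choices and are not specified in the patent text.

```python
import random


def crossover(angles_i, angles_j, frag_length, p_c, rng=random):
    """Sketch of step 4.1): exchange torsion angles in a residue window.

    angles_i, angles_j: lists with one torsion-angle tuple per residue
    (an assumed representation). Returns the two children and the
    1-based window bounds, or (copies, None, None) when crossover is skipped.
    """
    child_i, child_j = list(angles_i), list(angles_j)
    if rng.random() >= p_c:  # crossover happens only with probability p_c
        return child_i, child_j, None, None

    total_residue = len(angles_i)
    # Step 4.1.1): start point drawn from [1, total_residue - frag_length]
    begin_position = rng.randint(1, total_residue - frag_length)
    end_position = begin_position + frag_length

    # Step 4.1.2): swap angles at every position in [begin, end], inclusive,
    # so frag_length + 1 residues are exchanged under this reading.
    for position in range(begin_position, end_position + 1):
        k = position - 1  # convert 1-based residue number to list index
        child_i[k], child_j[k] = child_j[k], child_i[k]
    return child_i, child_j, begin_position, end_position
```

Because the interval is read as closed on both ends, the window covers frag_length + 1 residues; a half-open reading would exchange exactly frag_length residues instead.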

The technical concept of the present invention is as follows: within the basic framework of a genetic algorithm, the spatial length of each secondary structure in the target protein and the spatial distance between the central residues of adjacent secondary structures are combined into a feature vector that serves as a spatial constraint. For a given energy function, the search is thereby confined to a smaller region of conformation space. Spatial distance information is also added to the selection operator to compensate for the inaccuracy of the energy function, which improves the accuracy of structural modeling.
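As a rough illustration of the feature vector and its comparison against prior knowledge, the following Python sketch builds the vector from C-alpha coordinates and computes a Manhattan distance. Two assumptions are made here that the text does not state: the "spatial length" of a block is taken as the distance between its first and last C-alpha atoms, and the "central residue" is the middle index of the block. The names `feature_vector` and `manhattan` are hypothetical.

```python
import math


def feature_vector(ca_coords, blocks):
    """Build [d_1..d_m, d_{1,2}..d_{m-1,m}] from C-alpha coordinates.

    ca_coords: list of (x, y, z) tuples, one per residue (0-based index).
    blocks: list of (start, end) inclusive residue indices, one per
    secondary-structure block, in sequence order.
    """
    # Assumed definition: block length = end-to-end C-alpha distance.
    lengths = [math.dist(ca_coords[s], ca_coords[e]) for s, e in blocks]
    # Assumed definition: central residue = middle index of the block.
    centers = [ca_coords[(s + e) // 2] for s, e in blocks]
    gaps = [math.dist(centers[k], centers[k + 1])
            for k in range(len(centers) - 1)]
    return lengths + gaps


def manhattan(u, v):
    """Manhattan (L1) distance between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return sum(abs(a - b) for a, b in zip(u, v))
```

A conformation would then satisfy the spatial constraint when `manhattan(feature_vector(coords, blocks), D) <= delta`, with D and δ supplied as in step 2).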

The beneficial effects of the present invention are as follows. On the one hand, using the spatial length of each secondary structure and the spatial distance between the central residues of adjacent secondary structures as a feature vector that constrains the search reduces the conformation search space and reduces the error caused by the inaccurate energy function, which greatly improves prediction accuracy. On the other hand, within the genetic algorithm framework, information exchange between individuals and the mutation and selection operations applied to parent individuals speed up convergence and increase population diversity.

Description of the Drawings

Figure 1 is the basic flowchart of the protein conformation search method based on secondary-structure spatial distance constraints.

Figure 2 is a schematic diagram of the conformation updates when the method is used to predict the structure of protein 1AIL.

Figure 3 is the three-dimensional structure of protein 1AIL predicted by the method.

Detailed Description

The present invention is further described below with reference to the accompanying drawings.

Referring to Figures 1 to 3, a protein conformation search method based on secondary-structure spatial distance constraints comprises the following steps:

1) Provide the input sequence information;

2) Parameter initialization: set the population size NP, the maximum number of generations G_max, the crossover probability P_c, the number of initial-population iterations iteration, the crossover fragment length frag_length, the assembly counter reject_number, the maximum number of assembly attempts reject_max, the feature vector D = {d_1, …, d_m, d_{1,2}, …, d_{k,k+1}}, the maximum distance constraint range δ, and the selection probability P_s; D is built from prior knowledge of the spatial length of each secondary structure and the spatial distance between the central residues of adjacent secondary structures, where d_m is the length of the m-th secondary structure block of the target protein and d_{k,k+1} is the spatial distance between the central residues of the k-th and (k+1)-th secondary structure blocks;

3) Population initialization: start NP Monte Carlo trajectories and run each for iteration search steps, which yields NP initial individuals;

4) For each target individual x_i and a randomly selected individual x_j, with i, j ∈ {1, …, NP} and j ≠ i, perform the following operations:

4.1) With probability P_c, apply crossover to x_i and x_j as follows:

4.1.1) Randomly select the crossover start point begin_position within the allowed range [1, total_residue − frag_length] and compute the crossover end point end_position = begin_position + frag_length, where total_residue is the total number of residues;

4.1.2) Exchange the torsion angles at every crossover position ∈ [begin_position, end_position] to generate the new individuals x′_i and x′_j, referred to as the crossover individuals;

4.2) Apply mutation to the crossover individuals x′_i and x′_j as follows:

4.2.1) Use fragment assembly to search the conformation space starting from the crossover individual x′_i; after fragment assembly, compute the length of each secondary structure and the spatial distance between the central residues of adjacent secondary structures, and form the distance vector $D^{x'_i} = \{d_1^{x'_i}, \dots, d_m^{x'_i}, d_{1,2}^{x'_i}, \dots, d_{k,k+1}^{x'_i}\}$, where $d_m^{x'_i}$ is the length of the m-th secondary structure block in x′_i and $d_{k,k+1}^{x'_i}$ is the spatial distance between the central residues of the k-th and (k+1)-th secondary structure blocks;

4.2.2) Compute the Manhattan distance between the distance vector $D^{x'_i}$ of x′_i and the prior-knowledge feature vector D = {d_1, …, d_m, d_{1,2}, …, d_{k,k+1}} according to $\text{similarity\_mutation\_1} = \sum_{n=1}^{m} |d_n^{x'_i} - d_n| + \sum_{n=1}^{k} |d_{n,n+1}^{x'_i} - d_{n,n+1}|$; if similarity_mutation_1 ≤ δ, the individual x″_i generated by mutation satisfies the secondary-structure spatial distance constraint and the method proceeds to step 4.2.4); otherwise it proceeds to step 4.2.3);

4.2.3) Increment the counter reject_number; if reject_number ≤ reject_max, repeat steps 4.2.1) and 4.2.2) to generate a new individual x″_i until similarity_mutation_1 ≤ δ is satisfied; otherwise, execute step 4.2.1) to generate the new individual x″_i;

4.2.4) In the same way as steps 4.2.1) and 4.2.2), perform fragment assembly on the individual x′_j and compute the corresponding Manhattan distance similarity_mutation_2, finally obtaining the new individual x″_j;

4.2.5) Compute the Manhattan distance $\sum_{n=1}^{m} |d_n^{x_i} - d_n| + \sum_{n=1}^{k} |d_{n,n+1}^{x_i} - d_{n,n+1}|$ between the distance vector $D^{x_i}$ of the target individual x_i and the prior-knowledge feature vector D = {d_1, …, d_m, d_{1,2}, …, d_{k,k+1}};

5) Select among the target individual x_i and the mutated individuals x″_i and x″_j according to their energy and distance similarity, choose the dominant individual, and update the population as follows:

5.1) Use the Rosetta Score3 function E to compute the energies E(x_i), E(x″_i), and E(x″_j) of the target individual x_i and the mutated individuals x″_i and x″_j;

5.2) Among the target individual x_i and the mutated individuals x″_i and x″_j: if some individual X ∈ {x_i, x″_i, x″_j} has both a lower energy and a smaller Manhattan distance than the other two individuals, X is the dominant individual; if some individual X′ ∈ {x_i, x″_i, x″_j} has only a lower energy than the other two, X′ is set as the dominant individual with selection probability P_s; likewise, if some individual X″ ∈ {x_i, x″_i, x″_j} has only a smaller Manhattan distance than the other two, X″ is set as the dominant individual with selection probability P_s; finally, the dominant individual replaces the target individual and the population is updated;

6) Check whether the maximum number of generations G_max has been reached; if the termination condition is met, output the result; otherwise return to step 4).
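The constrained mutation of steps 4.2.1) to 4.2.3) can be sketched as a rejection loop. This is one possible reading, not a definitive implementation: fragment assembly is always restarted from the crossover individual, and once reject_number exceeds reject_max one further assembly result is accepted without checking the constraint. The callables `assemble` and `features` are hypothetical stand-ins for fragment assembly and for the distance-vector computation.

```python
def manhattan(u, v):
    """Manhattan (L1) distance between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return sum(abs(a - b) for a, b in zip(u, v))


def constrained_mutation(crossed, assemble, features, target, delta, reject_max):
    """Sketch of steps 4.2.1)-4.2.3).

    crossed: the crossover individual x'_i.
    assemble: callable returning a new conformation from `crossed`
              (stand-in for fragment assembly; an assumption).
    features: callable returning the distance vector of a conformation.
    target: prior-knowledge feature vector D.
    Returns (mutated individual, its Manhattan distance to D, reject count).
    """
    reject_number = 0
    while True:
        candidate = assemble(crossed)                         # step 4.2.1)
        similarity = manhattan(features(candidate), target)   # step 4.2.2)
        if similarity <= delta:
            return candidate, similarity, reject_number
        reject_number += 1                                     # step 4.2.3)
        if reject_number > reject_max:
            # Budget exhausted: accept one more assembly unconditionally.
            candidate = assemble(crossed)
            similarity = manhattan(features(candidate), target)
            return candidate, similarity, reject_number
```

Returning the Manhattan distance alongside the individual avoids recomputing it in the selection step, which compares distances as well as energies.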

This embodiment takes the α-fold protein 1AIL, which has a sequence length of 73, as an example. The protein conformation search method based on secondary-structure spatial distance constraints comprises the following steps:

1) Provide the input sequence information;

2) Parameter initialization: set the population size NP = 200, the maximum number of generations G_max = 2000, the crossover probability P_c = 0.1, the number of initial-population iterations iteration = 2000, the crossover fragment length frag_length = 9, the assembly counter reject_number = 0, the maximum number of assembly attempts reject_max = 100, the maximum distance constraint range δ = 15, and the selection probability P_s = 0.3; the feature vector built from prior knowledge of the spatial length of each secondary structure and the spatial distance between the central residues of adjacent secondary structures is D = {3.81085, 33.8066, 8.38603, 30.3193, 6.69076, 22.1852, 19.6409, 17.2739, 15.4455, 14.6372, 15.5907, 12.43};

3) Population initialization: start NP Monte Carlo trajectories and run each for iteration search steps, which yields NP initial individuals;

4) For each target individual x_i and a randomly selected individual x_j, with i, j ∈ {1, …, NP} and j ≠ i, perform the following operations:

4.1) With probability P_c, apply crossover to x_i and x_j as follows:

4.1.1) Randomly select the crossover start point begin_position within the allowed range [1, total_residue − frag_length] and compute the crossover end point end_position = begin_position + frag_length, where total_residue is the total number of residues;

4.1.2) Exchange the torsion angles at every crossover position ∈ [begin_position, end_position] to generate the new individuals x′_i and x′_j, referred to as the crossover individuals;

4.2) Apply mutation to the crossover individuals x′_i and x′_j as follows:

4.2.1) Use fragment assembly to search the conformation space starting from the crossover individual x′_i; after fragment assembly, compute the length of each secondary structure and the spatial distance between the central residues of adjacent secondary structures, and form the distance vector $D^{x'_i} = \{d_1^{x'_i}, \dots, d_m^{x'_i}, d_{1,2}^{x'_i}, \dots, d_{k,k+1}^{x'_i}\}$, where $d_m^{x'_i}$ is the length of the m-th secondary structure block in x′_i and $d_{k,k+1}^{x'_i}$ is the spatial distance between the central residues of the k-th and (k+1)-th secondary structure blocks;

4.2.2) Compute the Manhattan distance between the distance vector $D^{x'_i}$ of x′_i and the prior-knowledge feature vector D = {d_1, …, d_m, d_{1,2}, …, d_{k,k+1}} according to $\text{similarity\_mutation\_1} = \sum_{n=1}^{m} |d_n^{x'_i} - d_n| + \sum_{n=1}^{k} |d_{n,n+1}^{x'_i} - d_{n,n+1}|$; if similarity_mutation_1 ≤ δ, the individual x″_i generated by mutation satisfies the secondary-structure spatial distance constraint and the method proceeds to step 4.2.4); otherwise it proceeds to step 4.2.3);

4.2.3) Increment the counter reject_number; if reject_number ≤ reject_max, repeat steps 4.2.1) and 4.2.2) to generate a new individual x″_i until similarity_mutation_1 ≤ δ is satisfied; otherwise, execute step 4.2.1) to generate the new individual x″_i;

4.2.4) In the same way as steps 4.2.1) and 4.2.2), perform fragment assembly on the individual x′_j and compute the corresponding Manhattan distance similarity_mutation_2, finally obtaining the new individual x″_j;

4.2.5) Compute the Manhattan distance $\sum_{n=1}^{m} |d_n^{x_i} - d_n| + \sum_{n=1}^{k} |d_{n,n+1}^{x_i} - d_{n,n+1}|$ between the distance vector $D^{x_i}$ of the target individual x_i and the prior-knowledge feature vector D = {d_1, …, d_m, d_{1,2}, …, d_{k,k+1}};

5) Select among the target individual x_i and the mutated individuals x″_i and x″_j according to their energy and distance similarity, choose the dominant individual, and update the population as follows:

5.1) Use the Rosetta Score3 function E to compute the energies E(x_i), E(x″_i), and E(x″_j) of the target individual x_i and the mutated individuals x″_i and x″_j;

5.2) Among the target individual x_i and the mutated individuals x″_i and x″_j: if some individual X ∈ {x_i, x″_i, x″_j} has both a lower energy and a smaller Manhattan distance than the other two individuals, X is the dominant individual; if some individual X′ ∈ {x_i, x″_i, x″_j} has only a lower energy than the other two, X′ is set as the dominant individual with selection probability P_s; likewise, if some individual X″ ∈ {x_i, x″_i, x″_j} has only a smaller Manhattan distance than the other two, X″ is set as the dominant individual with selection probability P_s; finally, the dominant individual replaces the target individual and the population is updated;

6) Check whether the maximum number of generations G_max has been reached; if the termination condition is met, output the result; otherwise return to step 4).
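The selection rule of step 5.2) can be sketched in Python as follows. The text does not say in which order the energy-best and distance-best individuals are tried when they differ, nor what happens when neither is chosen; this sketch assumes the energy-best individual is tried first, then the distance-best, and that the target individual is kept otherwise. Ties are resolved by list order. The tuple layout and the name `select_dominant` are illustrative.

```python
import random


def select_dominant(candidates, p_s, rng=random):
    """Sketch of step 5.2).

    candidates: list of (label, energy, manhattan_distance) tuples for
    x_i, x''_i and x''_j; the first entry must be the target x_i.
    p_s: selection probability P_s.
    Returns the tuple that occupies the target's slot in the next generation.
    """
    best_energy = min(candidates, key=lambda c: c[1])
    best_distance = min(candidates, key=lambda c: c[2])

    # Lowest energy and smallest distance coincide: clear winner.
    if best_energy is best_distance:
        return best_energy

    # Otherwise each partial winner is accepted with probability P_s
    # (the order energy-then-distance is an assumption).
    if rng.random() < p_s:
        return best_energy
    if rng.random() < p_s:
        return best_distance
    return candidates[0]  # assumed fallback: keep the target individual
```

With the embodiment's value P_s = 0.3, a candidate that wins on only one criterion replaces the target in a minority of cases, so the energy function alone cannot drive the population.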

Taking the α-fold protein 1AIL with a sequence length of 73 as an example, the above method yields a near-native conformation of the protein. The minimum root-mean-square deviation is [value not reproduced] and the average root-mean-square deviation is [value not reproduced]; the predicted structure is shown in Figure 3.

The above describes the optimization effect obtained by the present invention using the protein 1AIL as an example and does not limit the scope of implementation of the invention. Various modifications and improvements made without departing from the scope of the basic content of the invention shall not be excluded from its scope of protection.

Claims (1)

  1. A kind of 1. protein conformation searching method based on the constraint of secondary structure space length, it is characterised in that:The conformation is empty Between searching method comprise the following steps:
    1) list entries information is given;
    2) parameter initialization:Population scale NP, maximum genetic algebra G are setmax, determine crossover probability Pc, initial population iteration time Number iteration, intersects fragment length frag_length, assembles counter reject_number, maximum assembling number Reject_max, space in priori between the space length of secondary structure and two neighboring secondary structure center residue away from From the characteristic vector D={ d of composition1,…,dm,d1,2,…,dk,k+1, wherein dmIt is m-th of secondary structure block of target protein Length, dk,k+1It is the space length of+1 secondary structure center residue of k-th of secondary structure block and kth, ultimate range constrains model Enclose δ, select probability Ps
    3) population is initialized:Start NP bar Monte Carlo tracks, every track search iteration time, that is, generate NP it is individual at the beginning of Begin individual;
    4) for each target individual $x_i$ and a randomly selected individual $x_j$, where $i, j \in \{1,\dots,NP\}$ and $j \neq i$, perform the following operations:
    4.1) with probability $P_c$, perform a crossover operation on individuals $x_i$ and $x_j$ as follows:
    4.1.1) randomly select the crossover start point begin_position within the allowed range [1, total_residue − frag_length], and compute the crossover end point end_position = begin_position + frag_length, where total_residue is the total number of residues;
    4.1.2) exchange the torsion angles at every crossover site position ∈ [begin_position, end_position], generating the new individuals $x'_i$ and $x'_j$, i.e. the crossover individuals;
    4.2) perform the following mutation operation on the crossover individuals $x'_i$ and $x'_j$:
    4.2.1) apply the fragment assembly technique to crossover individual $x'_i$ to search the conformational space; after fragment assembly, compute the lengths of the secondary structures of $x'_i$ and the spatial distances between the center residues of adjacent secondary structures, and form the distance vector $D^{x'_i}=\{d_1^{x'_i},\dots,d_m^{x'_i},d_{1,2}^{x'_i},\dots,d_{k,k+1}^{x'_i}\}$, where $d_m^{x'_i}$ is the length of the m-th secondary structure block of crossover individual $x'_i$ and $d_{k,k+1}^{x'_i}$ is the spatial distance between the center residues of the k-th and (k+1)-th secondary structure blocks;
    4.2.2) according to the formula $similarity\_mutation\_1=\sum_{l=1}^{m}\left|d_l^{x'_i}-d_l\right|+\sum_{l=1}^{k}\left|d_{l,l+1}^{x'_i}-d_{l,l+1}\right|$, compute the Manhattan distance between the feature vector $D^{x'_i}$ of individual $x'_i$ and the prior-knowledge feature vector $D=\{d_1,\dots,d_m,d_{1,2},\dots,d_{k,k+1}\}$; if similarity_mutation_1 ≤ δ, the individual $x''_i$ generated by mutation satisfies the secondary-structure spatial distance constraint, and the method goes to step 4.2.4); otherwise it goes to step 4.2.3);
    4.2.3) the counter reject_number starts counting; if reject_number ≤ reject_max, steps 4.2.1) and 4.2.2) are performed in turn to generate a new individual $x''_i$ until similarity_mutation_1 ≤ δ is satisfied; otherwise step 4.2.1) is performed to generate the new individual $x''_i$;
    4.2.4) in the same way as steps 4.2.1) and 4.2.2), perform fragment assembly on individual $x'_j$ and compute the corresponding Manhattan distance value similarity_mutation_2, finally obtaining the new individual $x''_j$;
    4.2.5) according to the corresponding Manhattan distance formula, compute the Manhattan distance between the distance vector of target individual $x_i$ and the prior-knowledge feature vector $D=\{d_1,\dots,d_m,d_{1,2},\dots,d_{k,k+1}\}$;
    5) select among the target individual $x_i$ and the mutant individuals $x''_i$ and $x''_j$ according to energy and distance similarity, choose the dominant individual and update the population, as follows:
    5.1) using the Rosetta Score3 function, compute the energies $E(x_i)$, $E(x''_i)$ and $E(x''_j)$ of the target individual $x_i$ and the mutant individuals $x''_i$ and $x''_j$;
    5.2) among the target individual $x_i$ and the mutant individuals $x''_i$ and $x''_j$, if some individual $X \in \{x_i,x''_i,x''_j\}$ has an energy value lower than those of the other two individuals and its Manhattan distance value is also smaller than those of the other two individuals, that individual is the dominant individual; if some individual $X' \in \{x_i,x''_i,x''_j\}$ only has an energy value lower than those of the other two individuals, it is set as the dominant individual with selection probability $P_s$; similarly, if some individual $X'' \in \{x_i,x''_i,x''_j\}$ only has a Manhattan distance value smaller than those of the other two individuals, it is set as the dominant individual with selection probability $P_s$; finally, the dominant individual replaces the target individual and the population is updated;
    6) determine whether the maximum number of generations $G_{max}$ has been reached; if the termination condition is met, output the result; otherwise go to step 4).
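
The crossover, distance-constrained mutation and selection steps of the claim can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function names (`manhattan`, `crossover`, `constrained_mutation`, `select`), the representation of a conformation as a list of torsion-angle triples, and the `mutate` and `features` callables (stand-ins for fragment assembly and for extracting secondary-structure lengths and center-residue distances) are all hypothetical. The behaviour after reject_max is exceeded, the treatment of ties, and keeping the target individual when no candidate is promoted are one possible reading of steps 4.2.3) and 5.2).

```python
import random
from typing import Callable, List, Optional, Sequence, Tuple

Torsions = Tuple[float, float, float]  # assumed (phi, psi, omega) of one residue


def manhattan(features: Sequence[float], prior: Sequence[float]) -> float:
    """Manhattan (L1) distance between a feature vector and the prior vector D."""
    if len(features) != len(prior):
        raise ValueError("feature vectors must have equal length")
    return sum(abs(a - b) for a, b in zip(features, prior))


def crossover(xi: List[Torsions], xj: List[Torsions], frag_length: int,
              rng: random.Random) -> Tuple[List[Torsions], List[Torsions]]:
    """Steps 4.1.1-4.1.2: swap torsion angles over [begin, end] (1-based, inclusive)."""
    total_residue = len(xi)
    begin = rng.randint(1, total_residue - frag_length)
    end = begin + frag_length
    new_i, new_j = list(xi), list(xj)
    for pos in range(begin, end + 1):
        new_i[pos - 1], new_j[pos - 1] = xj[pos - 1], xi[pos - 1]
    return new_i, new_j


def constrained_mutation(x, mutate: Callable, features: Callable,
                         prior: Sequence[float], delta: float, reject_max: int):
    """Steps 4.2.1-4.2.3: retry the mutation until the distance is within delta.

    Assumption: once reject_number exceeds reject_max, the latest mutant is
    accepted even though it violates the constraint.
    """
    candidate = mutate(x)
    dist = manhattan(features(candidate), prior)
    reject_number = 0
    while dist > delta:
        reject_number += 1
        candidate = mutate(x)
        dist = manhattan(features(candidate), prior)
        if reject_number > reject_max:
            break
    return candidate, dist


def strictly_best(values: Sequence[float]) -> Optional[int]:
    """Index of the unique minimum, or None when the minimum is tied."""
    values = list(values)
    lowest = min(values)
    return values.index(lowest) if values.count(lowest) == 1 else None


def select(energies: Sequence[float], distances: Sequence[float],
           p_s: float, rng: random.Random) -> int:
    """Step 5.2: index of the dominant individual; index 0 is the target x_i."""
    e_best = strictly_best(energies)
    d_best = strictly_best(distances)
    if e_best is not None and e_best == d_best:
        return e_best  # lowest energy and lowest distance: always dominant
    if e_best is not None and rng.random() < p_s:
        return e_best  # lowest energy only: dominant with probability P_s
    if d_best is not None and rng.random() < p_s:
        return d_best  # lowest distance only: dominant with probability P_s
    return 0  # assumption: otherwise the target individual is kept
```

In a full generation loop, `select` would be called with the energies and Manhattan distances of $x_i$, $x''_i$ and $x''_j$ in that order, and the returned index would decide which of the three replaces the target individual in the population.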
CN201710683896.1A 2017-08-11 2017-08-11 Protein conformation search method based on secondary structure space distance constraint Active CN107609342B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710683896.1A CN107609342B (en) 2017-08-11 2017-08-11 Protein conformation search method based on secondary structure space distance constraint

Publications (2)

Publication Number Publication Date
CN107609342A true CN107609342A (en) 2018-01-19
CN107609342B CN107609342B (en) 2020-08-18

Family

ID=61065372

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710683896.1A Active CN107609342B (en) 2017-08-11 2017-08-11 Protein conformation search method based on secondary structure space distance constraint

Country Status (1)

Country Link
CN (1) CN107609342B (en)

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104063632A (en) * 2014-06-27 2014-09-24 南京理工大学 Prediction method for protein sequence disulfide bond connection mode based on forest regression model
CN105205348A (en) * 2015-09-22 2015-12-30 浙江工业大学 Method for colony conformation space optimization based on distance constraint selection strategy

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
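FANCHI MENG et al.: "Computational Prediction of Protein Secondary Structure from Sequence", Current Protocols in Protein Science *
王彩霞 (Wang Caixia): "Protein spatial structure prediction based on distance constraints", China Master's Theses Full-text Database, Basic Sciences *
田远 (Tian Yuan) et al.: "Protein secondary structure prediction based on a quantum multi-population genetic algorithm", Chinese Agricultural Science Bulletin *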

Cited By (25)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108647486A (en) * 2018-03-22 2018-10-12 浙江工业大学 A protein three-dimensional structure prediction method based on conformational diversity strategy
CN108647486B (en) * 2018-03-22 2021-06-18 浙江工业大学 A protein three-dimensional structure prediction method based on conformational diversity strategy
CN109033753A (en) * 2018-06-07 2018-12-18 浙江工业大学 A population protein structure prediction method based on the assembly of secondary structure fragments
CN109033753B (en) * 2018-06-07 2021-06-18 浙江工业大学 A population protein structure prediction method based on the assembly of secondary structure fragments
CN109243526B (en) * 2018-07-12 2021-08-03 浙江工业大学 A protein structure prediction method based on the intersection of specific fragments
CN109101785A (en) * 2018-07-12 2018-12-28 浙江工业大学 A protein structure prediction method based on secondary structure similarity selection strategy
CN109243526A (en) * 2018-07-12 2019-01-18 浙江工业大学 A protein structure prediction method based on the intersection of specific fragments
CN109002691A (en) * 2018-07-12 2018-12-14 浙江工业大学 A protein structure prediction method based on Boltzmann update strategy
CN109101785B (en) * 2018-07-12 2021-06-18 浙江工业大学 A Protein Structure Prediction Method Based on Secondary Structure Similarity Selection Strategy
CN109002691B (en) * 2018-07-12 2021-11-23 浙江工业大学 Protein structure prediction method based on Boltzmann update strategy
CN109360600A (en) * 2018-08-28 2019-02-19 浙江工业大学 A Protein Structure Prediction Method Based on Residue Feature Distances
CN109378034A (en) * 2018-08-28 2019-02-22 浙江工业大学 A protein prediction method based on distance distribution estimation
CN109378034B (en) * 2018-08-28 2021-06-18 浙江工业大学 A protein prediction method based on distance distribution estimation
CN109360598B (en) * 2018-08-28 2021-06-18 浙江工业大学 A protein structure prediction method based on two-stage sampling
CN109360598A (en) * 2018-08-28 2019-02-19 浙江工业大学 A protein structure prediction method based on two-stage sampling
CN109360600B (en) * 2018-08-28 2021-05-18 浙江工业大学 Protein structure prediction method based on residue characteristic distance
CN109326319B (en) * 2018-08-28 2021-05-18 浙江工业大学 A protein conformational space optimization method based on secondary structure knowledge
CN109326319A (en) * 2018-08-28 2019-02-12 浙江工业大学 A protein conformational space optimization method based on secondary structure knowledge
CN109300505A (en) * 2018-08-29 2019-02-01 浙江工业大学 A protein structure prediction method based on biased sampling
CN109300506B (en) * 2018-08-29 2021-05-18 浙江工业大学 A protein structure prediction method based on specific distance constraints
CN109300505B (en) * 2018-08-29 2021-05-18 浙江工业大学 A protein structure prediction method based on biased sampling
CN109390035B (en) * 2018-08-29 2021-04-06 浙江工业大学 Protein conformation space optimization method based on local structure comparison
CN109390035A (en) * 2018-08-29 2019-02-26 浙江工业大学 A protein conformation space optimization method based on local structure comparison
CN109326320A (en) * 2018-08-29 2019-02-12 浙江工业大学 An ensemble conformation selection strategy adaptive protein structure prediction method
CN109300506A (en) * 2018-08-29 2019-02-01 浙江工业大学 A protein structure prediction method based on specific distance constraints

Also Published As

Publication number Publication date
CN107609342B (en) 2020-08-18

Similar Documents

Publication Publication Date Title
CN107609342A (en) Protein conformation search method based on secondary structure space distance constraint
CN106778059B (en) A population protein structure prediction method based on Rosetta local enhancement
CN112233723B (en) Protein structure prediction method and system based on deep learning
CN107633157B (en) A protein conformational space optimization method based on distribution estimation and copy swapping strategy
CN104200130B (en) A protein structure prediction method based on tree-structured replica exchange and fragment assembly
CN107633159B (en) A search method for protein conformation space based on distance similarity
Zhang et al. Enhancing protein conformational space sampling using distance profile-guided differential evolution
CN108334746A (en) A protein structure prediction method based on secondary structure similarity
CN108846256A (en) A population protein structure prediction method based on residue contact information
CN105808973B (en) A population conformational space sampling method based on a staged multi-strategy
CN103473482A (en) Protein three-dimensional structure prediction method based on differential evolution and conformation space annealing
CN108062457A (en) A protein structure prediction method with structural feature vector assisted selection
CN109086566B (en) A Fragment Resampling-Based Population Protein Structure Prediction Method
CN115527605A (en) Antibody structure prediction method based on depth map model
Hao et al. Conformational space sampling method using multi-subpopulation differential evolution for de novo protein structure prediction
CN113539364A (en) A deep neural network framework for predicting protein phosphorylation
Liu et al. Drug-target Interaction Prediction By Combining Transformer and Graph Neural Networks
CN108595910A (en) A population protein conformation space optimization method based on diversity index
CN109390035B (en) Protein conformation space optimization method based on local structure comparison
CN111951885B (en) Protein structure prediction method based on local bias
CN109300505B (en) A protein structure prediction method based on biased sampling
CN110610742B (en) Functional module detection method based on protein interaction network
CN109360600B (en) Protein structure prediction method based on residue characteristic distance
CN109326319B (en) A protein conformational space optimization method based on secondary structure knowledge
CN108763870B (en) A construction method of multi-domain protein Linker

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant