CN109033753B - Group protein structure prediction method based on secondary structure fragment assembly - Google Patents

Group protein structure prediction method based on secondary structure fragment assembly

Info

Publication number
CN109033753B
CN109033753B CN201810579668.4A CN201810579668A CN109033753B CN 109033753 B CN109033753 B CN 109033753B CN 201810579668 A CN201810579668 A CN 201810579668A CN 109033753 B CN109033753 B CN 109033753B
Authority
CN
China
Prior art keywords
population
individuals
information
individual
conformation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810579668.4A
Other languages
Chinese (zh)
Other versions
CN109033753A (en)
Inventor
李章维
孙科
郝小虎
周晓根
张贵军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Zhaoji Biotechnology Co ltd
Shenzhen Xinrui Gene Technology Co ltd
Original Assignee
Zhejiang University of Technology ZJUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University of Technology ZJUT
Priority to CN201810579668.4A
Publication of CN109033753A
Application granted
Publication of CN109033753B
Legal status: Active (current)
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/004 Artificial life, i.e. computing arrangements simulating life
    • G06N3/006 Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Investigating Or Analysing Biological Materials (AREA)

Abstract

A group protein structure prediction method based on secondary structure fragment assembly. First, the information interaction probability of the population algorithm controls the convergence speed of the population; then, secondary-structure-based fragment assembly increases conformational diversity and yields conformations closer to the native state; finally, small perturbations are applied to the Loop regions, and during selection the population is optimized by energy: individuals with higher energy are eliminated and the better individuals are kept for the next iteration. The invention has better sampling capability and higher prediction accuracy.

Description

Group protein structure prediction method based on secondary structure fragment assembly
Technical Field
The invention relates to the fields of bioinformatics, molecular dynamics simulation, statistical learning, combinatorial optimization, and computer applications, and in particular to a group protein structure prediction method based on secondary structure fragment assembly.
Background
Nucleic acids are the blueprint of life, and proteins are the machinery of life. Nucleic acid sequences contain vital information, while proteins perform many important tasks in the human body, such as catalyzing biochemical reactions, transporting nutrients, controlling growth and differentiation, and recognizing and transmitting biological signals. Proteins differ in length, amino acid arrangement, and spatial structure, and experimental analysis shows that proteins fold into specific structures.
Research on protein structure is of great significance, and the analysis of protein structure, function, and the relationship between them is an important part of proteome research. Studying protein structure helps to understand the role of a protein, how it performs its biological function, and how it interacts with other proteins (or other molecules), which is very important for biology as well as for medicine and pharmacy. But what determines the spatial structure of a protein? When the spatial structure of a protein is disrupted, that is, when the protein unfolds, it can recover its native folded structure. A large number of experimental results show that the structure of a protein is determined by its sequence. Although the solution environment of the protein molecule also affects its spatial structure, the information that determines the structure is encoded in the amino acid sequence. Can this code be deciphered? In other words, can the spatial structure of a protein be predicted directly from its amino acid sequence? Although the structure of a protein is generally thought to be determined by its amino acid sequence, current knowledge is not sufficient to accurately predict the tertiary structure of a protein.
Existing experimental methods for determining protein structure mainly include X-ray crystallography, nuclear magnetic resonance, and cryo-electron microscopy, but these methods are expensive and time-consuming. Driven by both theoretical needs and practical applications, methods that predict protein structure from the amino acid sequence by computational optimization have been adopted; among them, de novo prediction methods suffer from poor prediction accuracy and insufficient sampling capability.
Therefore, the invention provides a group protein structure prediction method based on secondary structure fragment assembly, so that the problems of insufficient prediction precision and sampling capability in the existing protein structure prediction method are solved.
Disclosure of Invention
To overcome the insufficient sampling capability and prediction accuracy of existing protein structure prediction methods, the invention provides a group protein structure prediction method based on secondary structure fragment assembly. The method designs a fragment assembly operation based on secondary structure and, through Loop-based information interaction together with this fragment assembly, samples protein conformations whose energy and structure are closer to the native state, which improves the sampling capability. It also changes the structure of the Loop regions by applying small perturbations to the Loop regions of each conformation, which effectively alleviates the low prediction accuracy caused by inaccurate energy functions.
The technical scheme adopted by the invention for solving the technical problems is as follows:
a method for population protein structure prediction based on secondary structure fragment assembly, the method comprising the steps of:
1) setting parameters, the process being as follows:
reading the sequence information of the target protein and the fragment library information, and setting the population of protein conformations P = {p_1, p_2, ..., p_i, ..., p_n}, where n is the population size and p_i represents the i-th individual of the population, the iteration counter G, the maximum number of iterations G_max, the information interaction probability R, and the sequence length L;
2) population initialization, the process being as follows:
starting from the initial linear chain of the protein conformation, replicating the linear chain to obtain n initial population individuals, and performing fragment assembly on each individual p_i of the population with fragments of length 9 until the residues at all positions have been replaced at least once;
3) population information interaction, the process being as follows:
for each individual p_i in the population, deciding according to the information interaction probability R whether the individual takes part in population interaction; if it does, randomly selecting another individual p_j from the population, where i ≠ j, randomly selecting one Loop region of the conformation p_i, and exchanging its dihedral angle information with that of the corresponding region of the conformation p_j, so that two new individuals p_i' and p_j' are obtained after the information interaction; if it does not, applying step 3) to the next individual in the population, until the information interaction has been completed for all individuals in the population;
4) secondary-structure-based population fragment assembly, the process being as follows:
for each individual p_i', i ∈ [1, n], performing fragment assembly with fragments of length 9 and checking after each fragment assembly whether the assembled region contains residues of a Loop region; if it does, replacing the information of those Loop region residues with the Loop region residue information of the conformation before the fragment assembly, to obtain the individual p_i''; proceeding in this way until secondary-structure-based fragment assembly has been completed for all individuals in the population;
5) Loop region perturbation, the process being as follows:
for each individual p_i'', i ∈ [1, n], perturbing the Loop regions by finely adjusting the dihedral angles of each residue in the Loop regions of the conformation within a range of ±2 degrees, to obtain the individual p_i'''; after the perturbation, evaluating the individuals before and after the perturbation with the energy function to obtain E_i and E_i', respectively; if E_i < E_i', jumping back to step 4) to perform fragment assembly again; if E_i > E_i', ending the variation operation and obtaining a new individual;
6) population selection using the energy function, the process being as follows:
first combining the initial population and the perturbed population into a new population of size 2 × n, then calculating the energy of each individual of the new population with the energy function, sorting the combined population by energy, selecting the n individuals with the lowest energy as the selected population, and finally setting G = G + 1;
7) judging whether the maximum number of iterations G_max has been reached; if so, stopping the iteration and outputting the information of the individuals of the last generation; otherwise, returning to step 3).
The technical conception of the invention is as follows: the invention provides a group protein structure prediction method based on secondary structure fragment assembly within the framework of a population algorithm. First, the information interaction probability of the population algorithm controls the convergence speed of the population; then, secondary-structure-based fragment assembly increases conformational diversity and yields conformations closer to the native state; finally, small perturbations are applied to the Loop regions, and during selection the population is optimized by energy: individuals with higher energy are eliminated and the better individuals are kept for the next iteration.
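The iteration described above can be sketched as a small Python skeleton. This is only an illustrative sketch under stated assumptions, not the patented implementation: the names `evolve`, `select_lowest_energy`, `toy_energy`, and `toy_vary` are hypothetical, a conformation is assumed to be a flat list of dihedral angles, and the toy energy and variation functions merely stand in for a real energy function and for steps 3) to 5).

```python
import random
from typing import Callable, List, Sequence

Conformation = List[float]  # assumption: one flat list of dihedral angles per individual


def select_lowest_energy(
    parents: Sequence[Conformation],
    offspring: Sequence[Conformation],
    energy: Callable[[Conformation], float],
) -> List[Conformation]:
    """Step 6) sketch: merge both populations (size 2n), keep the n lowest-energy individuals."""
    merged = list(parents) + list(offspring)
    merged.sort(key=energy)
    return merged[: len(parents)]


def evolve(
    population: List[Conformation],
    energy: Callable[[Conformation], float],
    vary: Callable[[List[Conformation], random.Random], List[Conformation]],
    g_max: int,
    rng: random.Random,
) -> List[Conformation]:
    """Repeat variation and energy-based selection for g_max generations (steps 3) to 7))."""
    for _generation in range(g_max):
        offspring = vary(population, rng)  # placeholder for steps 3) to 5)
        population = select_lowest_energy(population, offspring, energy)
    return population


def toy_energy(conf: Conformation) -> float:
    """Hypothetical stand-in for a real energy function."""
    return sum(angle * angle for angle in conf)


def toy_vary(pop: List[Conformation], rng: random.Random) -> List[Conformation]:
    """Hypothetical stand-in for interaction, fragment assembly and Loop perturbation."""
    return [[angle + rng.uniform(-5.0, 5.0) for angle in conf] for conf in pop]


rng = random.Random(0)
start = [[rng.uniform(-180.0, 180.0) for _ in range(6)] for _ in range(10)]
final = evolve(start, toy_energy, toy_vary, g_max=20, rng=rng)
```

Because the parents always take part in the selection, the lowest energy in the population cannot increase from one generation to the next in this sketch.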
The beneficial effects of the invention are as follows: on the one hand, a population algorithm is used, and information interaction among the individuals of the population broadens the search of the conformational space; on the other hand, secondary-structure-based fragment assembly and small perturbations of the residues in the Loop regions increase conformational diversity and improve the prediction accuracy.
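As an illustration of the information interaction among individuals, the following Python sketch swaps the dihedral angles of one randomly chosen Loop region between two individuals. It rests on assumptions that are not part of the source: a conformation is a list of (phi, psi) pairs, the secondary structure is a string in which the hypothetical code "L" marks Loop residues, and the swaps are applied one after another to a copy of the population. The function names `loop_regions` and `interact` are likewise hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Angles = List[Tuple[float, float]]  # assumption: one (phi, psi) pair per residue


def loop_regions(secondary_structure: str, loop_code: str = "L") -> List[Tuple[int, int]]:
    """Return half-open (start, end) index ranges of consecutive Loop residues."""
    regions: List[Tuple[int, int]] = []
    start = None
    for i, code in enumerate(secondary_structure):
        if code == loop_code and start is None:
            start = i
        elif code != loop_code and start is not None:
            regions.append((start, i))
            start = None
    if start is not None:
        regions.append((start, len(secondary_structure)))
    return regions


def interact(
    population: Sequence[Angles],
    secondary_structure: str,
    r: float,
    rng: random.Random,
) -> List[Angles]:
    """With probability r, swap one random Loop region between p_i and a random p_j (i != j)."""
    regions = loop_regions(secondary_structure)
    new_pop = [list(conf) for conf in population]  # work on copies of the individuals
    if not regions or len(new_pop) < 2:
        return new_pop
    for i in range(len(new_pop)):
        if rng.random() >= r:
            continue  # this individual does not take part in the interaction
        j = rng.choice([k for k in range(len(new_pop)) if k != i])
        start, end = rng.choice(regions)
        new_pop[i][start:end], new_pop[j][start:end] = (
            new_pop[j][start:end],
            new_pop[i][start:end],
        )
    return new_pop
```

Only residues inside the selected Loop region change hands, so helix and strand residues of both individuals stay as they were.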
Drawings
FIG. 1 is the conformation distribution obtained when the structure of protein 1GYZ is predicted by the population-based protein structure prediction method with Loop-region-restricted fragment assembly.
FIG. 2 is the three-dimensional structure obtained when the structure of protein 1GYZ is predicted by the population-based protein structure prediction method with Loop-region-restricted fragment assembly.
Detailed Description
The invention is further described below with reference to the accompanying drawings.
Referring to FIG. 1 and FIG. 2, a group protein structure prediction method based on secondary structure fragment assembly comprises the following steps:
1) setting parameters, the process being as follows:
reading the sequence information of the target protein and the fragment library information, and setting the population of protein conformations P = {p_1, p_2, ..., p_i, ..., p_n}, where n is the population size and p_i represents the i-th individual of the population, the iteration counter G, the maximum number of iterations G_max, the information interaction probability R, and the sequence length L;
2) population initialization, the process being as follows:
starting from the initial linear chain of the protein conformation, replicating the linear chain to obtain n initial population individuals, and performing fragment assembly on each individual p_i of the population with fragments of length 9; once the residues at all positions of the conformation have been replaced at least once, the initialization operation is finished, and this is repeated until all individuals in the population have been initialized;
3) population information interaction, the process being as follows:
for each individual p_i in the population, deciding according to the given information interaction probability R whether the individual takes part in population interaction; if it does, randomly selecting another individual p_j from the population, where i ≠ j, randomly selecting one Loop region of the conformation p_i, and exchanging its dihedral angle information with that of the corresponding region of the conformation p_j, so that two new individuals p_i' and p_j' are obtained after the information interaction; if it does not, applying step 3) to the next individual in the population, until the information interaction has been completed for all individuals in the population;
4) secondary-structure-based population fragment assembly, the process being as follows:
for each individual p_i', i ∈ [1, n], performing fragment assembly with fragments of length 9 and checking after each fragment assembly whether the assembled region contains residues of a Loop region; if it does, replacing the information of those Loop region residues with the Loop region residue information of the conformation before the fragment assembly, that is, retaining the structure information of the Loop region, to obtain the individual p_i''; proceeding in this way until secondary-structure-based fragment assembly has been completed for all individuals in the population;
5) Loop region perturbation, the process being as follows:
for each individual p_i'', i ∈ [1, n], perturbing the Loop regions by finely adjusting the dihedral angles of each residue in the Loop regions of the conformation within a range of ±2 degrees, to obtain the individual p_i'''; after the perturbation, evaluating the individuals before and after the perturbation with the energy function to obtain E_i and E_i', respectively; if E_i < E_i', jumping back to step 4) to perform fragment assembly again; if E_i > E_i', ending the variation operation and obtaining a new individual;
6) population selection using the energy function, the process being as follows:
first combining the initial population and the perturbed population into a new population of size 2 × n, then calculating the energy of each individual of the new population with the energy function, sorting the combined population by energy, selecting the n individuals with the lowest energy as the selected population, and finally setting G = G + 1;
7) judging whether the maximum number of iterations G_max has been reached; if so, stopping the iteration and outputting the information of the individuals of the last generation; otherwise, returning to step 3).
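Step 4) above, in which a fragment is assembled and any overwritten Loop residues are then restored, can be illustrated with the following Python sketch. It is a minimal sketch under assumptions that do not come from the source: a conformation is a list of (phi, psi) pairs, the secondary structure is a string in which the hypothetical code "L" marks Loop residues, and the fragment is passed in directly instead of being drawn from a fragment library. The function name `assemble_fragment_keep_loops` is hypothetical.

```python
from typing import List, Sequence, Tuple

Angles = List[Tuple[float, float]]  # assumption: one (phi, psi) pair per residue
FRAGMENT_LENGTH = 9  # fragment length used in the method


def assemble_fragment_keep_loops(
    conformation: Sequence[Tuple[float, float]],
    fragment: Sequence[Tuple[float, float]],
    position: int,
    secondary_structure: str,
    loop_code: str = "L",
) -> Angles:
    """Insert a 9-residue fragment at `position`, then restore the Loop residues it overwrote."""
    if len(fragment) != FRAGMENT_LENGTH:
        raise ValueError("fragment must contain 9 residues")
    if position < 0 or position + FRAGMENT_LENGTH > len(conformation):
        raise ValueError("fragment does not fit at this position")
    before = list(conformation)  # conformation before the fragment assembly
    after = list(conformation)
    after[position:position + FRAGMENT_LENGTH] = fragment
    for k in range(position, position + FRAGMENT_LENGTH):
        if secondary_structure[k] == loop_code:
            after[k] = before[k]  # keep the Loop region structure information
    return after


# Usage: residues 3 to 5 form a Loop and keep their original angles.
ss = "HHHLLLHHHHHH"
conf = [(0.0, 0.0)] * len(ss)
frag = [(1.0, 1.0)] * FRAGMENT_LENGTH
result = assemble_fragment_keep_loops(conf, frag, 0, ss)
```

In this sketch the fragment only changes helix and strand residues, which matches the stated intent of retaining the structure information of the Loop region during assembly.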
Taking the α-folded protein 1GYZ with a sequence length of 60 as an example, a population-based protein structure prediction method with Loop-region-restricted fragment assembly comprises the following steps:
1) setting parameters, the process being as follows:
reading the sequence information of the target protein and the fragment library information, and setting the population of protein conformations P = {p_1, p_2, ..., p_i, ..., p_n}, where n = 100 is the population size and p_i represents the i-th individual of the population, the iteration counter G, the maximum number of iterations G_max, the information interaction probability R = 0.1, and the sequence length L = 60;
2) population initialization, the process being as follows:
starting from the initial linear chain of the protein conformation, replicating the linear chain to obtain 100 initial population individuals, and performing fragment assembly on each individual p_i of the population with fragments of length 9; once the residues at all positions of the conformation have been replaced at least once, the initialization operation is finished, and this is repeated until all individuals in the population have been initialized;
3) population information interaction, the process being as follows:
for each individual p_i in the population, deciding according to the given information interaction probability R whether the individual takes part in population interaction; if it does, randomly selecting another individual p_j from the population, where i ≠ j, randomly selecting one Loop region of the conformation p_i, and exchanging its dihedral angle information with that of the corresponding region of the conformation p_j, so that two new individuals p_i' and p_j' are obtained after the information interaction; if it does not, applying step 3) to the next individual in the population, until the information interaction has been completed for all individuals in the population;
4) secondary-structure-based population fragment assembly, the process being as follows:
for each individual p_i', i ∈ [1, n], performing fragment assembly with fragments of length 9 and checking after each fragment assembly whether the assembled region contains residues of a Loop region; if it does, replacing the information of those Loop region residues with the Loop region residue information of the conformation before the fragment assembly, that is, retaining the structure information of the Loop region, to obtain the individual p_i''; proceeding in this way until secondary-structure-based fragment assembly has been completed for all individuals in the population;
5) Loop region perturbation, the process being as follows:
for each individual p_i'', i ∈ [1, n], perturbing the Loop regions by finely adjusting the dihedral angles of each residue in the Loop regions of the conformation within a range of ±2 degrees, to obtain the individual p_i'''; after the perturbation, evaluating the individuals before and after the perturbation with the energy function "score3" to obtain E_i and E_i', respectively; if E_i < E_i', jumping back to step 4) to perform fragment assembly again; if E_i > E_i', ending the variation operation and obtaining a new individual;
6) population selection using the energy function, the process being as follows:
first combining the initial population and the perturbed population into a new population of size 2 × n, then calculating the energy of each individual of the new population with the energy function "score3", sorting the combined population by energy, selecting the n individuals with the lowest energy as the selected population, and finally setting G = G + 1;
7) judging whether the maximum number of iterations G_max has been reached; if so, stopping the iteration and outputting the information of the individuals of the last generation; otherwise, returning to step 3).
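The Loop region perturbation of step 5) and its energy comparison can be illustrated with the following Python sketch. The assumptions here are not taken from the source: a conformation is a list of (phi, psi) pairs in degrees, the secondary structure is a string in which the hypothetical code "L" marks Loop residues, each angle is shifted by a uniformly distributed offset, and the energy function is passed in as a callable instead of calling "score3". The function names `wrap_angle`, `perturb_loops`, and `perturbation_accepted` are hypothetical.

```python
import random
from typing import Callable, List, Sequence, Tuple

Angles = List[Tuple[float, float]]  # assumption: one (phi, psi) pair in degrees per residue


def wrap_angle(angle: float) -> float:
    """Map an angle in degrees to the interval [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0


def perturb_loops(
    conformation: Sequence[Tuple[float, float]],
    secondary_structure: str,
    rng: random.Random,
    max_delta: float = 2.0,
    loop_code: str = "L",
) -> Angles:
    """Shift phi and psi of every Loop residue by a random offset within +/- max_delta degrees."""
    perturbed: Angles = []
    for (phi, psi), code in zip(conformation, secondary_structure):
        if code == loop_code:
            phi = wrap_angle(phi + rng.uniform(-max_delta, max_delta))
            psi = wrap_angle(psi + rng.uniform(-max_delta, max_delta))
        perturbed.append((phi, psi))
    return perturbed


def perturbation_accepted(
    before: Sequence[Tuple[float, float]],
    after: Sequence[Tuple[float, float]],
    energy: Callable[[Sequence[Tuple[float, float]]], float],
) -> bool:
    """Return True when E_i > E_i', that is, when the perturbed conformation has lower energy."""
    return energy(before) > energy(after)
```

When `perturbation_accepted` returns False, the method as described returns to step 4) for another round of fragment assembly; how often this may repeat is not stated in the text, so any bound on the number of retries would be an additional assumption.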
Taking the α-folded protein 1GYZ with a sequence length of 60 as an example, the above method yields a near-native conformation of the protein, with a minimum root-mean-square deviation of [Figure BDA0001688001210000061] and a mean root-mean-square deviation of [Figure BDA0001688001210000062]. The predicted structure is shown in FIG. 2.
The above describes the optimization results obtained by the invention with the protein 1GYZ as an example and is not intended to limit the scope of the invention; various modifications and improvements can be made without departing from the scope of the invention.

Claims (1)

1. A method for predicting a population protein structure based on secondary structure fragment assembly, the method comprising the steps of:
1) setting parameters, the process being as follows:
reading the sequence information and the fragment library information of a target protein, and setting the population of protein conformations P = {p_1, p_2, ..., p_i, ..., p_n}, where n is the population size and p_i represents the i-th individual of the population, the iteration counter G, the maximum number of iterations G_max, the information interaction probability R, and the sequence length L;
2) population initialization, the process being as follows:
starting from the initial linear chain of the protein conformation, replicating the linear chain to obtain n initial population individuals, and performing fragment assembly on each individual p_i of the population with fragments of length 9; once the residues at all positions of the conformation have been replaced at least once, the initialization operation is finished, and this is repeated until all individuals in the population have been initialized;
3) population information interaction, the process being as follows:
for each individual p_i in the population, deciding according to the given information interaction probability R whether the individual takes part in population interaction; if it does, randomly selecting another individual p_j from the population, where i ≠ j, randomly selecting one Loop region of the conformation p_i, and exchanging its dihedral angle information with that of the corresponding region of the conformation p_j, so that two new individuals p_i' and p_j' are obtained after the information interaction; if it does not, applying step 3) to the next individual in the population, until the information interaction has been completed for all individuals in the population;
4) secondary-structure-based population fragment assembly, the process being as follows:
for each individual p_i', i ∈ [1, n], performing fragment assembly with fragments of length 9 and checking after each fragment assembly whether the assembled region contains residues of a Loop region; if it does, replacing the information of those Loop region residues with the Loop region residue information of the conformation before the fragment assembly, that is, retaining the structure information of the Loop region, to obtain the individual p_i''; proceeding in this way until secondary-structure-based fragment assembly has been completed for all individuals in the population;
5) Loop region perturbation, the process being as follows:
for each individual p_i'', i ∈ [1, n], perturbing the Loop regions by finely adjusting the dihedral angles of each residue in the Loop regions of the conformation within a range of ±2 degrees, to obtain the individual p_i'''; after the perturbation, evaluating the individuals before and after the perturbation with the energy function to obtain E_i and E_i', respectively; if E_i < E_i', jumping back to step 4) to perform fragment assembly again; if E_i > E_i', ending the variation operation and obtaining a new individual;
6) population selection using the energy function, the process being as follows:
first combining the initial population and the perturbed population into a new population of size 2 × n, then calculating the energy of each individual of the new population with the energy function, sorting the combined population by energy, selecting the n individuals with the lowest energy as the selected population, and finally setting G = G + 1;
7) judging whether the maximum number of iterations G_max has been reached; if so, stopping the iteration and outputting the information of the individuals of the last generation; otherwise, returning to step 3).

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810579668.4A CN109033753B (en) 2018-06-07 2018-06-07 Group protein structure prediction method based on secondary structure fragment assembly

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810579668.4A CN109033753B (en) 2018-06-07 2018-06-07 Group protein structure prediction method based on secondary structure fragment assembly

Publications (2)

Publication Number Publication Date
CN109033753A (en) 2018-12-18
CN109033753B (en) 2021-06-18

Family

ID=64612076

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810579668.4A Active CN109033753B (en) 2018-06-07 2018-06-07 Group protein structure prediction method based on secondary structure fragment assembly

Country Status (1)

Country Link
CN (1) CN109033753B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110729023B (en) * 2019-08-29 2021-04-06 浙江工业大学 Protein structure prediction method based on contact assistance of secondary structure elements
CN111951885B (en) * 2020-08-11 2022-05-03 湖南大学 Protein structure prediction method based on local bias

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106605228A (en) * 2014-07-07 2017-04-26 耶达研究及发展有限公司 Method of computational protein design
JP2017174057A (en) * 2016-03-23 2017-09-28 国立大学法人大阪大学 Method for predicting partial cubic structure of protein
CN107609342A (en) * 2017-08-11 2018-01-19 浙江工业大学 A kind of protein conformation searching method based on the constraint of secondary structure space length
CN108052795A (en) * 2017-11-28 2018-05-18 华东师范大学 A kind of method of the G-protein coupling specificities prediction of feature based optimization

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
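Title
"Critical Features of Fragment Libraries for Protein Structure Prediction"; Raphael Trevizani; PLoS ONE; 2017-12-31; pp. 1-22 *
"A protein conformational space optimization algorithm based on fragment assembly" (一种基于片段组装的蛋白质构象空间优化算法); 郝小虎; Computer Science (计算机科学); 2015-03-31; pp. 237-240 *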

Also Published As

Publication number Publication date
CN109033753A (en) 2018-12-18

Similar Documents

Publication Publication Date Title
Zhou et al. SPEM: improving multiple sequence alignment with sequence profiles and predicted secondary structures
CN108334746B (en) Protein structure prediction method based on secondary structure similarity
CN109360599B (en) Protein structure prediction method based on residue contact information cross strategy
CN109033753B (en) Group protein structure prediction method based on secondary structure fragment assembly
CN107491664B (en) Protein structure de novo prediction method based on information entropy
CN109524058B (en) Protein dimer structure prediction method based on differential evolution
CN109086566B (en) Group protein structure prediction method based on fragment resampling
CN109033744A (en) A kind of Advances in protein structure prediction based on residue distance and contact information
CN109215732B (en) Protein structure prediction method based on residue contact information self-learning
Gao et al. High-performance deep learning toolbox for genome-scale prediction of protein structure and function
CN108763860B (en) Loop information sampling-based group protein conformation space optimization method
CN109378034B (en) Protein prediction method based on distance distribution estimation
CN108920894B (en) Protein conformation space optimization method based on brief abstract convex estimation
CN109360597B (en) Group protein structure prediction method based on global and local strategy cooperation
CN109346128B (en) Protein structure prediction method based on residue information dynamic selection strategy
CN111951885B (en) Protein structure prediction method based on local bias
CN109243526B (en) Protein structure prediction method based on specific fragment crossing
CN110189794B (en) Residue contact guided loop perturbation population protein structure prediction method
CN109300505B (en) Protein structure prediction method based on biased sampling
CN107609345B (en) Multi-domain protein structure assembly method based on template self-adaptive selection
CN109360600B (en) Protein structure prediction method based on residue characteristic distance
CN109147867B (en) Group protein structure prediction method based on dynamic segment length
CN112967751A (en) Protein conformation space optimization method based on evolution search
CN109390035B (en) Protein conformation space optimization method based on local structure comparison
CN110706739B (en) Protein conformation space sampling method based on multi-mode internal and external intersection

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20221227

Address after: D1101, Building 4, Software Industry Base, No. 19, 17, 18, Haitian 1st Road, Binhai Community, Yuehai Street, Nanshan District, Shenzhen, Guangdong, 518000

Patentee after: Shenzhen Xinrui Gene Technology Co.,Ltd.

Address before: N2248, Floor 3, Xingguang Yingjing, No. 117, Shuiyin Road, Yuexiu District, Guangzhou, Guangdong 510000

Patentee before: GUANGZHOU ZHAOJI BIOTECHNOLOGY CO.,LTD.

Effective date of registration: 20221227

Address after: N2248, Floor 3, Xingguang Yingjing, No. 117, Shuiyin Road, Yuexiu District, Guangzhou, Guangdong 510000

Patentee after: GUANGZHOU ZHAOJI BIOTECHNOLOGY CO.,LTD.

Address before: No. 18 Chaowang Road, Zhaohui Sixth District, Hangzhou, Zhejiang 310014

Patentee before: ZHEJIANG UNIVERSITY OF TECHNOLOGY