A protein structure prediction method based on dihedral-angle-information-assisted energy function selection
Technical field
The present invention relates to the fields of bioinformatics, molecular dynamics simulation, statistical learning, combinatorial optimization and computer applications, and in particular to a protein structure prediction method based on dihedral-angle-information-assisted energy function selection.
Background art
Proteins are among the most widely distributed and most complex molecules in living organisms, and they play vital roles in many life processes, such as transport, regulation and defense.
The structure of a protein can be divided into four levels:
1) The primary structure of a protein is the amino acid sequence of its polypeptide chain.
2) The secondary structure is the highly regular local structure along the polypeptide backbone. There are two main types of secondary structure: the α-helix and the β-strand.
3) The tertiary structure is the three-dimensional structure of a monomeric or multimeric protein molecule, in which α-helices and β-sheets fold into a compact globular structure.
4) The quaternary structure is the three-dimensional structure formed when two or more individual polypeptide chains (subunits) assemble and function as a single functional unit.
A protein can perform its specific biological functions only after it has folded into a specific structure, so understanding protein structure is extremely important. For example, certain diseases of the central nervous system are caused by an infectious agent called a prion, a specific type of misfolded protein. Under normal circumstances the prion protein has an α-helical structure, but under certain conditions it can refold into a β-strand structure, and this form is pathogenic. Experimental methods for determining the three-dimensional structure of proteins include X-ray crystallography, nuclear magnetic resonance (NMR) spectroscopy and cryo-electron microscopy. Over the past few decades, the data in the protein sequence database (UniProt) and the protein structure database (PDB) have grown exponentially. However, obtaining protein sequences is much easier than obtaining protein structures.
Moreover, experimental methods are time-consuming and expensive. As of February 2018, fewer than 0.127% of protein sequences had an experimentally determined three-dimensional structure. Computational methods for predicting protein structure are therefore very important. In addition, Anfinsen's experiments showed that the native structure of a protein is determined solely by its amino acid sequence. In other words, the structural information of a protein is contained in its sequence, which means that computational methods can be used to predict structure from sequence. Because similar protein sequences usually have similar three-dimensional structures, homology modeling methods use known structures in the PDB as templates; to date, these are the most accurate methods for protein structure prediction. As the databases grow, accurate structures can be obtained for more and more proteins from homologous templates. Homology modeling can predict protein structure effectively, but its accuracy depends on the sequence identity between the target protein and the template. When the sequence identity is relatively high (greater than 30%), homology modeling can generally predict the tertiary structure with high accuracy; when the sequence identity is low, it fails. Unlike template-based methods such as homology modeling, ab initio prediction methods do not depend on any known structure; instead, they search conformational space for the native structure of the target protein. Among them, fragment assembly is widely used: it assembles fragments taken from many known protein structures into the structure of the target protein. Ab initio prediction currently faces two main bottlenecks. The first is the deceptiveness of the energy landscape: the conformations with the lowest energy are not necessarily native, which reflects the inaccuracy of the energy function and its inability to pick out good conformations. The second is the limited ability of existing techniques to sample conformational space, which shows up as a lack of conformational diversity.
Current protein structure prediction methods therefore have shortcomings in both prediction accuracy and sampling ability, and need to be improved.
Summary of the invention
To overcome the insufficient sampling ability and prediction accuracy of existing protein structure prediction methods, the present invention provides a protein structure prediction method based on dihedral-angle-information-assisted energy function selection, which has better sampling ability and higher prediction accuracy. Building on the Rosetta algorithm, the method introduces staged fragment assembly. While conformations are evaluated with the energy function, a new index is proposed based on the dihedral-angle preferences given by the Ramachandran plot. This index and the energy function are combined with two separate weights into a new scoring function, which scores the conformation obtained after each fragment assembly; whether the conformation is accepted is decided by the Metropolis criterion. The method can effectively narrow the conformational sampling space and alleviate the low prediction accuracy caused by the inaccuracy of the energy function.
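The scoring-and-selection idea described above can be illustrated with a short Python sketch. It is not the authors' implementation: the function names are hypothetical, both score terms are assumed to follow the "lower is better" convention of an energy, and the default weights are the 0.5/0.5 values given later in the 1AIL embodiment. The acceptance test is the standard Metropolis criterion named in the text.

```python
import math
import random

def combined_score(energy_score, rama_score, w_e=0.5, w_r=0.5):
    """E(C) = w_e * Energy_score + w_r * Rama_score.
    Assumes both terms are 'lower is better'; 0.5/0.5 are the embodiment weights."""
    return w_e * energy_score + w_r * rama_score

def metropolis_accept(e_old, e_new, kT, rng=random):
    """Metropolis criterion: always keep a candidate that does not raise the score,
    otherwise keep it with Boltzmann probability exp(-(e_new - e_old) / kT)."""
    delta = e_new - e_old
    if delta <= 0:
        return True
    return rng.random() < math.exp(-delta / kT)

# Hypothetical scores for a conformation C and a fragment-assembled candidate C'.
e_c = combined_score(energy_score=-40.0, rama_score=12.0)      # -14.0
e_c_new = combined_score(energy_score=-42.0, rama_score=11.0)  # -15.5
print(metropolis_accept(e_c, e_c_new, kT=2.0))                 # True: the score went down
```

Because the two terms are simply weighted and summed, a candidate with a slightly worse Rosetta energy can still be preferred if its backbone dihedrals sit in more favorable Ramachandran regions, which is the effect the method relies on.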
The technical solution adopted by the present invention to solve the technical problems is:
A protein structure prediction method based on dihedral-angle-information-assisted energy function selection comprises the following steps:
1) Parameter setting: set the protein sequence length L, the number of initialization iterations $I_i$, the number of global search iterations $I_g$ and the number of local search iterations $I_l$;
2) Information preprocessing: given the input protein sequence, build from it the fully extended chain, which has the highest free energy, with the dihedral angles φ, ψ and ω set to -150°, -150° and 180° respectively; then obtain the Ramachandran plots for each residue type in each secondary structure type of the protein;
3) Conformation initialization: initialize the starting conformation with stage 1 of the Rosetta ab initio protocol. Initialization is considered successful once the residue at every position of the starting conformation has been replaced at least once, or once the maximum number of initialization iterations $I_i$ is reached;
4) Score the conformation with the energy function of the Rosetta algorithm; the resulting energy score of the conformation is $\mathrm{Energy}_{score}$;
5) Rama score calculation: the dihedral angles at each residue position of the conformation are assessed against the Ramachandran plot. Residue a is assessed by the Ramachandran-plot term $\mathrm{Rama}(\phi_a, \psi_a \mid res(a), ss(a))$, where φ_a and ψ_a are the two backbone dihedral angles of residue a, res(a) is its residue type, and ss(a) is its secondary structure type as assigned by the DSSP algorithm. Summing the assessment results over all residue positions gives the Rama score of the conformation:
$$\mathrm{Rama}_{score} = \sum_{a=1}^{L} \mathrm{Rama}(\phi_a, \psi_a \mid res(a), ss(a))$$
6) Scoring function design: using the energy score $\mathrm{Energy}_{score}$ from step 4) and the Rama score $\mathrm{Rama}_{score}$ from step 5), define the scoring function
$$E(C) = w_e\,\mathrm{Energy}_{score} + w_r\,\mathrm{Rama}_{score}$$
where $w_e$ and $w_r$ are the weights of the energy score and the Rama score respectively, and C is the conformation being scored. Conformations are scored with this function;
7) Conformation global search: perform a 9-residue fragment insertion on conformation C to obtain conformation C', then score the conformations before and after fragment insertion with the scoring function of step 6) to obtain E(C) and E(C'). If E(C') < E(C), accept C'; if E(C') > E(C), accept C' with the Boltzmann probability $P = \exp(\Delta E / kT)$, where $\Delta E = E(C) - E(C')$ is the score difference between the conformations before and after fragment insertion and kT is the temperature coefficient. This search is repeated on the currently accepted conformation for $I_g$ iterations, after which the method proceeds to conformation local search;
8) Conformation local search: perform a 3-residue fragment insertion on conformation C to obtain conformation C', then score the conformations before and after fragment insertion with the scoring function of step 6) to obtain E(C) and E(C'). If E(C') < E(C), accept C'; if E(C') > E(C), accept C' with the Boltzmann probability $P = \exp(\Delta E / kT)$, where $\Delta E = E(C) - E(C')$ is the score difference between the conformations before and after fragment insertion and kT is the temperature coefficient. This search is repeated on the currently accepted conformation for $I_l$ iterations, after which the whole conformational search is complete;
9) Save the final conformation and output the conformation information.
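The exact per-residue formula of step 5) is not reproduced in this text, so the following Python sketch shows only one common way of turning Ramachandran plots into an additive score, under stated assumptions: each plot is a hypothetical histogram of (φ, ψ) probabilities in 10° bins keyed by (residue type, secondary structure), secondary structure uses reduced DSSP labels such as H/E/L, and the per-residue term is taken as -ln P so that lower is better, matching the energy score it is added to.

```python
import math

def angle_bin(angle, bin_width=10.0):
    """Map an angle in degrees to a bin index in [0, 360 / bin_width), wrapping at +/-180."""
    return int(((angle + 180.0) % 360.0) // bin_width)

def rama_score(residues, rama_tables, bin_width=10.0, floor=1e-6):
    """Sum an assumed per-residue term -ln P(phi, psi | res(a), ss(a)) over all residues.

    residues:    iterable of (phi, psi, res_type, ss) tuples, angles in degrees.
    rama_tables: hypothetical dict {(res_type, ss): {(phi_bin, psi_bin): probability}}.
    """
    total = 0.0
    for phi, psi, res_type, ss in residues:
        table = rama_tables.get((res_type, ss), {})
        p = table.get((angle_bin(phi, bin_width), angle_bin(psi, bin_width)), 0.0)
        total += -math.log(max(p, floor))  # floor avoids log(0) for empty bins
    return total

# Toy table: helical alanines fall in the (11, 14) bin with probability 0.5.
tables = {("ALA", "H"): {(angle_bin(-65.0), angle_bin(-40.0)): 0.5}}
helix = [(-65.0, -40.0, "ALA", "H"), (-62.0, -35.0, "ALA", "H")]
print(rama_score(helix, tables))  # 2 * -ln(0.5), about 1.386
```

Residues whose (φ, ψ) fall in rarely populated regions for their type and secondary structure receive a large penalty, which is how dihedral-angle information can push the search away from conformations the energy function alone might accept.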
Technical concept of the invention: under the framework of a population-based algorithm, the present invention proposes a protein structure prediction method based on dihedral-angle-information-assisted energy function selection. First, dihedral-angle information is extracted from the Ramachandran plots corresponding to the residues of the protein. Second, conformations are evaluated with both the Rosetta energy function and the dihedral-angle information. Then the two scores are given different weights to form a new scoring function, which is used to select conformations after fragment assembly, reducing the impact of the energy function's inaccuracy on the generated three-dimensional structure. Finally, global search and local search are performed on the conformation: local structure is refined while the global topology of the conformation is preserved, yielding structures closer to the native state.
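The two-stage search described in this paragraph (global search with 9-residue fragments, then local search with 3-residue fragments, each move filtered by the Metropolis criterion) can be sketched as follows. The fragment-insertion and scoring callbacks are hypothetical placeholders; the toy versions in the demo stand in for a real fragment library and for the combined score E(C), and make no claim about protein behavior.

```python
import math
import random

def search_phase(conf, score, insert, frag_len, n_iters, kT, rng):
    """Run n_iters fragment insertions of length frag_len with Metropolis acceptance."""
    e_cur = score(conf)
    for _ in range(n_iters):
        cand = insert(conf, frag_len, rng)  # hypothetical fragment move
        e_cand = score(cand)
        delta = e_cand - e_cur
        if delta <= 0 or rng.random() < math.exp(-delta / kT):
            conf, e_cur = cand, e_cand      # accept; otherwise keep the current conformation
    return conf, e_cur

def two_stage_search(conf, score, insert, n_global, n_local, kT, seed=0):
    """Global search with 9-residue fragments, then local search with 3-residue fragments."""
    rng = random.Random(seed)
    conf, _ = search_phase(conf, score, insert, 9, n_global, kT, rng)
    return search_phase(conf, score, insert, 3, n_local, kT, rng)

# Toy stand-ins (hypothetical): a "conformation" is a list of numbers, the
# "fragment library" is {0.0, 1.0}, and the score counts positions that are not 1.0.
def toy_insert(conf, frag_len, rng):
    start = rng.randrange(len(conf) - frag_len + 1)
    new = list(conf)
    new[start:start + frag_len] = [rng.choice([0.0, 1.0]) for _ in range(frag_len)]
    return new

def toy_score(conf):
    return sum((x - 1.0) ** 2 for x in conf)

final_conf, final_score = two_stage_search([0.0] * 20, toy_score, toy_insert,
                                           n_global=200, n_local=200, kT=0.1)
print(final_score)
```

Long fragments change many residues per move, which suits establishing the overall topology; short fragments make smaller perturbations, which suits refining local structure without destroying that topology.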
The beneficial effects of the present invention are as follows. On the one hand, conformations are scored using both the Rosetta energy function and the dihedral-angle information from the Ramachandran plots of the protein's residues, and a new scoring function built on these two indices is used to score and select conformations after fragment assembly, which improves prediction accuracy and reduces the influence of the energy function's inaccuracy. On the other hand, during the search of conformational space, global search first establishes the overall topology of the conformation, and a local search stage is then added to refine local structure, which increases structural diversity and thus samples conformations closer to the native state.
Description of the drawings
Fig. 1 shows the conformation distribution obtained when the protein structure prediction method based on dihedral-angle-information-assisted energy function selection is applied to protein 1AIL.
Fig. 2 shows the three-dimensional structure of protein 1AIL predicted by the protein structure prediction method based on dihedral-angle-information-assisted energy function selection.
Specific embodiment
The present invention is further described below with reference to the accompanying drawings.
Referring to Fig. 1 and Fig. 2, a protein structure prediction method based on dihedral-angle-information-assisted energy function selection comprises the following steps:
1) Parameter setting: set the protein sequence length L, the number of initialization iterations $I_i$, the number of global search iterations $I_g$ and the number of local search iterations $I_l$;
2) Information preprocessing: given the input protein sequence, build from it the fully extended chain, which has the highest free energy, with the dihedral angles φ, ψ and ω set to -150°, -150° and 180° respectively; then obtain the Ramachandran plots for each residue type in each secondary structure type of the protein;
3) Conformation initialization: initialize the starting conformation with stage 1 of the Rosetta ab initio protocol. Initialization is considered successful once the residue at every position of the starting conformation has been replaced at least once, or once the maximum number of initialization iterations $I_i$ is reached;
4) Score the conformation with the energy function of the Rosetta algorithm; the resulting energy score of the conformation is $\mathrm{Energy}_{score}$;
5) Rama score calculation: the dihedral angles at each residue position of the conformation are assessed against the Ramachandran plot. Residue a is assessed by the Ramachandran-plot term $\mathrm{Rama}(\phi_a, \psi_a \mid res(a), ss(a))$, where φ_a and ψ_a are the two backbone dihedral angles of residue a, res(a) is its residue type, and ss(a) is its secondary structure type as assigned by the DSSP algorithm. Summing the assessment results over all residue positions gives the Rama score of the conformation:
$$\mathrm{Rama}_{score} = \sum_{a=1}^{L} \mathrm{Rama}(\phi_a, \psi_a \mid res(a), ss(a))$$
6) Scoring function design: using the energy score $\mathrm{Energy}_{score}$ from step 4) and the Rama score $\mathrm{Rama}_{score}$ from step 5), define the scoring function
$$E(C) = w_e\,\mathrm{Energy}_{score} + w_r\,\mathrm{Rama}_{score}$$
where $w_e$ and $w_r$ are the weights of the energy score and the Rama score respectively, and C is the conformation being scored. Conformations are scored with this function;
7) Conformation global search: perform a 9-residue fragment insertion on conformation C to obtain conformation C', then score the conformations before and after fragment insertion with the scoring function of step 6) to obtain E(C) and E(C'). If E(C') < E(C), accept C'; if E(C') > E(C), accept C' with the Boltzmann probability $P = \exp(\Delta E / kT)$, where $\Delta E = E(C) - E(C')$ is the score difference between the conformations before and after fragment insertion and kT is the temperature coefficient. This search is repeated on the currently accepted conformation for $I_g$ iterations, after which the method proceeds to conformation local search;
8) Conformation local search: perform a 3-residue fragment insertion on conformation C to obtain conformation C', then score the conformations before and after fragment insertion with the scoring function of step 6) to obtain E(C) and E(C'). If E(C') < E(C), accept C'; if E(C') > E(C), accept C' with the Boltzmann probability $P = \exp(\Delta E / kT)$, where $\Delta E = E(C) - E(C')$ is the score difference between the conformations before and after fragment insertion and kT is the temperature coefficient. This search is repeated on the currently accepted conformation for $I_l$ iterations, after which the whole conformational search is complete;
9) Save the final conformation and output the conformation information.
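The stopping rule of step 3) can be sketched as below. This assumes that stage 1 inserts 9-residue fragments at random windows and that every insertion counts as replacing the residues it covers; `insert_at` is a hypothetical hook standing in for the real Rosetta move, and the returned flag only reports whether full coverage was reached before the iteration cap (the text treats reaching the cap as the end of initialization as well).

```python
import random

def stage1_initialize(length, max_iters, insert_at, frag_len=9, seed=0):
    """Insert fragments at random windows until every residue position has been
    replaced at least once, or until max_iters insertions have been made.
    Returns (iterations used, whether every position was covered)."""
    rng = random.Random(seed)
    replaced = [False] * length
    iters = 0
    while not all(replaced) and iters < max_iters:
        start = rng.randrange(length - frag_len + 1)  # random fragment window
        insert_at(start, frag_len)                    # hypothetical move hook
        for pos in range(start, start + frag_len):
            replaced[pos] = True
        iters += 1
    return iters, all(replaced)

moves = []
iters, covered = stage1_initialize(73, 1000, lambda s, n: moves.append((s, n)))
print(iters, covered)
```

Terminal residues can only be covered by the first or last window, so they usually determine how many insertions full coverage takes; the iteration cap guarantees the loop ends even when coverage is slow.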
This embodiment uses the α-helical protein 1AIL, whose sequence length is 73, as an example. The protein structure prediction method based on dihedral-angle-information-assisted energy function selection comprises the following steps:
1) Parameter setting: protein sequence length L = 73, number of initialization iterations $I_i$ = 1000, number of global search iterations $I_g$ = 12000, and number of local search iterations $I_l$ = 20000;
2) Information preprocessing: given the input protein sequence, build from it the fully extended chain, which has the highest free energy, with the dihedral angles φ, ψ and ω set to -150°, -150° and 180° respectively; then obtain the Ramachandran plots for each residue type in each secondary structure type of the protein;
3) Conformation initialization: initialize the starting conformation with stage 1 of the Rosetta ab initio protocol. Initialization is considered successful once the residue at every position of the starting conformation has been replaced at least once, or once the maximum number of initialization iterations, $I_i$ = 1000, is reached;
4) Score the conformation with the energy function of the Rosetta algorithm; the resulting energy score of the conformation is $\mathrm{Energy}_{score}$;
5) Rama score calculation: the dihedral angles at each residue position of the conformation are assessed against the Ramachandran plot. Residue a is assessed by the Ramachandran-plot term $\mathrm{Rama}(\phi_a, \psi_a \mid res(a), ss(a))$, where φ_a and ψ_a are the two backbone dihedral angles of residue a, res(a) is its residue type, and ss(a) is its secondary structure type as assigned by the DSSP algorithm. Summing the assessment results over all residue positions gives the Rama score of the conformation:
$$\mathrm{Rama}_{score} = \sum_{a=1}^{L} \mathrm{Rama}(\phi_a, \psi_a \mid res(a), ss(a))$$
6) Scoring function design: using the energy score $\mathrm{Energy}_{score}$ from step 4) and the Rama score $\mathrm{Rama}_{score}$ from step 5), define the scoring function
$$E(C) = w_e\,\mathrm{Energy}_{score} + w_r\,\mathrm{Rama}_{score}$$
where $w_e = 0.5$ and $w_r = 0.5$ are the weights of the energy score and the Rama score respectively, and C is the conformation being scored. Conformations are scored with this function;
7) Conformation global search: perform a 9-residue fragment insertion on conformation C to obtain conformation C', then score the conformations before and after fragment insertion with the scoring function of step 6) to obtain E(C) and E(C'). If E(C') < E(C), accept C'; if E(C') > E(C), accept C' with the Boltzmann probability $P = \exp(\Delta E / kT)$, where $\Delta E = E(C) - E(C')$ is the score difference between the conformations before and after fragment insertion and kT = 2 is the temperature coefficient. This search is repeated on the currently accepted conformation 12000 times, after which the method proceeds to conformation local search;
8) Conformation local search: perform a 3-residue fragment insertion on conformation C to obtain conformation C', then score the conformations before and after fragment insertion with the scoring function of step 6) to obtain E(C) and E(C'). If E(C') < E(C), accept C'; if E(C') > E(C), accept C' with the Boltzmann probability $P = \exp(\Delta E / kT)$, where $\Delta E = E(C) - E(C')$ is the score difference between the conformations before and after fragment insertion and kT = 2 is the temperature coefficient. This search is repeated on the currently accepted conformation 20000 times, after which the whole conformational search is complete;
9) Save the final conformation and output the conformation information.
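For reference, the embodiment's parameters can be collected in one place, and the effect of kT = 2 on the Boltzmann acceptance probability of steps 7) and 8) can be computed directly. The class and field names below are hypothetical; the values are those listed in this embodiment.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class EmbodimentParams:
    length: int = 73           # L, sequence length of 1AIL
    init_iters: int = 1000     # I_i, step 3
    global_iters: int = 12000  # I_g, 9-residue fragment search, step 7
    local_iters: int = 20000   # I_l, 3-residue fragment search, step 8
    w_e: float = 0.5           # weight of Energy_score, step 6
    w_r: float = 0.5           # weight of Rama_score, step 6
    kT: float = 2.0            # temperature coefficient, steps 7 and 8

def acceptance_probability(delta, kT):
    """Metropolis acceptance probability for a score change delta = E(C') - E(C)."""
    return 1.0 if delta <= 0 else math.exp(-delta / kT)

params = EmbodimentParams()
for delta in (0.5, 1.0, 2.0, 4.0):
    print(f"score increase {delta}: P(accept) = {acceptance_probability(delta, params.kT):.3f}")
```

With kT = 2, a candidate whose combined score is worse by 2 units is still accepted with probability e^(-1), about 0.368. Because E(C) mixes two differently scaled terms, what counts as a large increase depends on the relative scales of the energy score and the Rama score.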
Using the above method with the α-helical protein 1AIL (sequence length 73) as the example, a near-native conformation of the protein was obtained, with a minimum root-mean-square deviation (RMSD) of … and an average RMSD of …; the predicted structure is shown in Fig. 2.
The above describes the results obtained by the present invention using protein 1AIL as an example; it does not limit the scope of implementation of the invention. Various changes and improvements made without departing from the scope of the basic content of the present invention shall not be excluded from the protection scope of the invention.