CN109147867A - A population-based protein structure prediction method based on dynamic fragment length - Google Patents
A population-based protein structure prediction method based on dynamic fragment length
- Publication number
- CN109147867A (application number CN201810986058.6A)
- Authority
- CN
- China
- Prior art keywords
- conformation
- population
- fragment length
- trial
- fragment
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Landscapes
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
A population-based protein structure prediction method based on dynamic fragment length. For each target conformation, half of the conformation individuals are randomly selected from the current population to form a sub-population, the sub-population is sorted by energy, and one conformation is randomly chosen from the top-ranked conformations to guide the mutation. In the mutation process, several fragment lengths are provided and the selection probability of each is computed from its earlier success rate; one fragment length is then chosen by roulette-wheel selection on these probabilities and used for the fragment exchange that realizes the mutation. The invention provides a population-based protein structure prediction method based on dynamic fragment length with higher prediction accuracy and search efficiency.
Description
Technical field
The present invention relates to the fields of bioinformatics, intelligent optimization and computer applications, and in particular to a population-based protein structure prediction method based on dynamic fragment length.
Background art
In 1965, Nirenberg, Khorana and others deciphered the triplet genetic code (the first genetic code): DNA is read in codons of three nucleotides and translated into the amino acid sequence of a protein (its primary structure). A protein can perform its specific biological function only after it folds into a specific three-dimensional structure (its tertiary structure). Unlike the first genetic code, the correspondence between a protein's primary structure and its tertiary structure (the second genetic code, or folding code) remains an unsolved puzzle. To solve this "problem of the century", protein folding, more and more researchers from different disciplines have joined the effort. In particular, protein structure prediction, which targets the end point of the folding process, has received wide attention and study in the scientific community. Compared with the study of protein folding itself, protein structure prediction is of greater practical use: only once the three-dimensional structure of a protein is known can gene diagnosis truly be realized and the goal of gene therapy ultimately be reached.
At present, the experimental methods for determining the three-dimensional structure of a protein include X-ray crystallography, multidimensional nuclear magnetic resonance (NMR) and cryo-electron microscopy. X-ray crystallography is currently the most effective method for determining protein structure and reaches a precision that other methods cannot match; its main drawbacks are that protein crystals are difficult to grow and that crystal structure determination takes a long time. NMR can measure the conformation of a protein directly in solution, but it requires a large amount of high-purity sample and can at present be applied only to small proteins. In addition, these experimental methods are expensive: determining the three-dimensional structure of one protein costs several hundred thousand US dollars, whereas determining its amino acid sequence costs only about 1,000 US dollars, so the gap between the number of known protein sequences and the number of determined three-dimensional structures keeps widening. How to use the computer as a tool and, with suitable algorithms, predict the three-dimensional structure of a protein directly from its amino acid sequence has therefore become an important research topic in bioinformatics.
Ab initio prediction is currently the most effective protein structure prediction method. According to the thermodynamic hypothesis, the conformation with the lowest energy is considered to be the one closest to the native state. In ab initio prediction, therefore, an energy function is used to evaluate the quality of each conformation, and an optimization method is used to search for the lowest-energy conformation. To reduce the complexity of the optimization algorithm and to improve prediction accuracy, ab initio methods select proteins similar to the query sequence from an existing protein library and build a fragment library for each residue position. During the search, new conformations are generated by assembling fragments selected from the corresponding fragment library, and mutation is carried out by exchanging fragments between different conformations. In the mutation process, however, determining the length of the exchanged fragment is a challenge: a fragment that is too long destroys good conformations, while a fragment that is too short slows the search, so that both search efficiency and prediction accuracy suffer.
Existing protein structure prediction methods are therefore deficient in prediction accuracy and search efficiency and need to be improved.
Summary of the invention
To overcome the low prediction accuracy and search efficiency of existing protein structure prediction methods, the present invention proposes a population-based protein structure prediction method based on dynamic fragment length with higher prediction accuracy and search efficiency.
The technical solution adopted by the present invention to solve this technical problem is as follows:
A population-based protein structure prediction method based on dynamic fragment length, the method comprising the following steps:
1) Input the sequence information of the protein to be predicted, and obtain the fragment library from the ROBETTA server (http://www.robetta.org/);
2) Parameter setting: set the population size NP, the crossover probability CR, the temperature factor KT, the maximum number of iterations G_max, the fragment length set l = {l_1, l_2, ..., l_M} and the selection probability p_m, m = 1, 2, ..., M, of each fragment length, and initialize the iteration counter g = 0, where M is the size of the fragment length set;
3) Generate the initial conformation population P_initial = {C_1, C_2, ..., C_NP} by randomly selecting fragments from the fragment library of each residue position and assembling them, where C_i, i ∈ {1, 2, ..., NP}, is the i-th conformation individual in the population;
4) Compute the energy of each conformation individual in the current population with Rosetta score3;
5) For each conformation C_i, i ∈ {1, 2, ..., NP}, in the population, perform the following operations:
5.1) Take C_i as the target conformation, randomly select NP/2 conformations from the current population to form a sub-population, and sort the sub-population by energy from low to high;
5.2) From the first NP/5 conformations of the sorted sub-population, randomly select one conformation different from C_i, denoted C_lbest;
5.3) Randomly select from the current population three mutually different conformations C_a, C_b and C_c that also differ from C_i and C_lbest;
5.4) Using the roulette-wheel algorithm, choose one fragment length l_m from the fragment length set according to the selection probability of each fragment length;
5.5) From each of C_a, C_b and C_c, randomly select a fragment of length l_m, each at a different residue position, and use these fragments to replace the fragments at the corresponding positions in C_lbest, generating the mutant conformation C_mutant;
5.6) Generate a random number R between 0 and 1; if R < CR, randomly select a fragment of length l_m from C_i to replace the fragment at the corresponding position in C_mutant and perform one random fragment assembly, generating the trial conformation C_trial; otherwise perform one random fragment assembly directly on the mutant conformation to generate the trial conformation C_trial;
5.7) Compute the energy of the trial conformation C_trial with the Rosetta score3 energy function;
5.8) If the energy of C_trial is lower than the energy of C_i, C_trial replaces C_i; otherwise C_trial is accepted with the Boltzmann probability exp(-ΔE/KT), where ΔE is the absolute value of the difference between the energy of C_trial and the energy of C_i;
5.9) If C_trial is accepted in step 5.8), the success count of the m-th fragment length in generation g is incremented by 1;
6) Set g = g + 1; if g > 20, update the selection probability p_m, m = 1, 2, ..., M, of each fragment length according to formula (1);
7) If g > G_max, output the conformation with the lowest energy as the final predicted structure; otherwise return to step 5).
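As an illustration of step 5.4), roulette-wheel selection of a fragment length can be sketched in Python as follows. This is a minimal sketch and not part of the patent text: the function name `roulette_select` is hypothetical, and dividing by the sum of the weights is an assumption that allows selection probabilities that do not sum to 1 to be used directly.

```python
import random

def roulette_select(lengths, probs, rng=random):
    """Pick one fragment length with probability proportional to its weight."""
    total = sum(probs)
    if total <= 0:
        raise ValueError("at least one selection probability must be positive")
    r = rng.random() * total          # spin: a point on the wheel in [0, total)
    cumulative = 0.0
    for length, p in zip(lengths, probs):
        cumulative += p               # right edge of this length's slice
        if r < cumulative:
            return length
    return lengths[-1]                # guard against floating-point round-off

fragment_lengths = [3, 6, 9, 12]
select_probs = [0.5, 0.5, 0.5, 0.5]   # equal weights; the wheel normalizes them
chosen = roulette_select(fragment_lengths, select_probs, random.Random(0))
```

A fragment length whose weight is zero is never returned, and a length with a larger weight occupies a proportionally larger slice of the wheel.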
The technical concept of the invention is as follows: for each target conformation, half of the conformation individuals are randomly selected from the current population to form a sub-population, the sub-population is sorted by energy, and one of the top-ranked conformations is randomly chosen to guide the mutation. In the mutation process, several fragment lengths are provided, the selection probability of each is computed from its earlier success rate, and one fragment length is then chosen by roulette-wheel selection on these probabilities for the fragment exchange that realizes the mutation. The invention provides a population-based protein structure prediction method with higher prediction accuracy and search efficiency.
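The choice of the guiding conformation (steps 5.1) and 5.2)) can be sketched as follows. This is a minimal sketch under stated assumptions: conformations are represented only by their index and energy, the function name `pick_guide` is hypothetical, integer division is assumed for NP/2 and NP/5, and the fallback for very small populations is an addition that the text does not describe.

```python
import random

def pick_guide(energies, i, rng=random):
    """Return the index of a guide conformation (C_lbest) for target index i.

    Assumes a population of at least 4 individuals.
    """
    np_size = len(energies)
    # Step 5.1: random sub-population of NP/2 individuals, lowest energy first.
    sub = rng.sample(range(np_size), np_size // 2)
    sub.sort(key=lambda j: energies[j])
    # Step 5.2: candidates are the first NP/5 of the sorted sub-population,
    # excluding the target conformation itself.
    top = [j for j in sub[: max(1, np_size // 5)] if j != i]
    if not top:                       # tiny populations only (assumed fallback)
        top = [j for j in sub if j != i]
    return rng.choice(top)

energies = [float(j) for j in range(50)]   # toy energies: index 0 is the best
guide = pick_guide(energies, i=7, rng=random.Random(42))
```

Because the guide is drawn from a random half of the population rather than from the whole population, different targets tend to be guided by different good conformations, which is how the method aims to keep diversity while still biasing the search toward low energy.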
The beneficial effects of the invention are as follows: on the one hand, guiding the mutation with a locally best conformation both preserves the diversity of the conformations and speeds up the search; on the other hand, exchanging fragments according to a dynamic fragment length speeds up the exchange of information between different protein conformations and improves search efficiency.
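The fragment exchange of steps 5.5) and 5.6) can be sketched as follows. This is a minimal sketch, not the patented implementation: a conformation is modeled as a plain list with one entry per residue (for example a tuple of backbone torsion angles), the function name `exchange_fragments` is hypothetical, "different residue positions" is read as distinct start positions, and the random fragment assembly from the fragment library that closes step 5.6) is omitted.

```python
import random

def exchange_fragments(c_lbest, donors, c_target, frag_len, cr, rng=random):
    """Build a mutant from C_lbest (step 5.5), then optionally cross it with C_i (step 5.6)."""
    n = len(c_lbest)
    starts = range(n - frag_len + 1)          # valid fragment start positions
    trial = list(c_lbest)                     # copy; the guide itself is untouched
    # One fragment per donor (C_a, C_b, C_c), each at a different start position.
    for donor, s in zip(donors, rng.sample(starts, len(donors))):
        trial[s:s + frag_len] = donor[s:s + frag_len]
    if rng.random() < cr:                     # crossover with the target conformation
        s = rng.choice(starts)
        trial[s:s + frag_len] = c_target[s:s + frag_len]
    return trial

# Toy conformations of 60 residues, labeled by origin so the exchange is visible.
c_lbest = ["L"] * 60
donors = [["A"] * 60, ["B"] * 60, ["C"] * 60]
c_target = ["T"] * 60
trial = exchange_fragments(c_lbest, donors, c_target, frag_len=9, cr=0.5,
                           rng=random.Random(3))
```

Copying each fragment to the same residue positions it occupied in the donor keeps the sequence alignment intact, so the result is always a full-length conformation of the same protein.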
Brief description of the drawings
Fig. 1 is a schematic diagram of the conformation updates when the population-based protein structure prediction method based on dynamic fragment length is used to predict the structure of protein 1GYZ.
Fig. 2 is the distribution of the conformations obtained when the population-based protein structure prediction method based on dynamic fragment length is used to predict the structure of protein 1GYZ.
Fig. 3 is the three-dimensional structure obtained when the population-based protein structure prediction method based on dynamic fragment length is used to predict the structure of protein 1GYZ.
Specific embodiment
The invention is further described below with reference to the drawings.
Referring to Figs. 1 to 3, a population-based protein structure prediction method based on dynamic fragment length comprises the following steps:
1) Input the sequence information of the protein to be predicted, and obtain the fragment library from the ROBETTA server (http://www.robetta.org/);
2) Parameter setting: set the population size NP, the crossover probability CR, the temperature factor KT, the maximum number of iterations G_max, the fragment length set l = {l_1, l_2, ..., l_M} and the selection probability p_m, m = 1, 2, ..., M, of each fragment length, and initialize the iteration counter g = 0, where M is the size of the fragment length set;
3) Generate the initial conformation population P_initial = {C_1, C_2, ..., C_NP} by randomly selecting fragments from the fragment library of each residue position and assembling them, where C_i, i ∈ {1, 2, ..., NP}, is the i-th conformation individual in the population;
4) Compute the energy of each conformation individual in the current population with Rosetta score3;
5) For each conformation C_i, i ∈ {1, 2, ..., NP}, in the population, perform the following operations:
5.1) Take C_i as the target conformation, randomly select NP/2 conformations from the current population to form a sub-population, and sort the sub-population by energy from low to high;
5.2) From the first NP/5 conformations of the sorted sub-population, randomly select one conformation different from C_i, denoted C_lbest;
5.3) Randomly select from the current population three mutually different conformations C_a, C_b and C_c that also differ from C_i and C_lbest;
5.4) Using the roulette-wheel algorithm, choose one fragment length l_m from the fragment length set according to the selection probability of each fragment length;
5.5) From each of C_a, C_b and C_c, randomly select a fragment of length l_m, each at a different residue position, and use these fragments to replace the fragments at the corresponding positions in C_lbest, generating the mutant conformation C_mutant;
5.6) Generate a random number R between 0 and 1; if R < CR, randomly select a fragment of length l_m from C_i to replace the fragment at the corresponding position in C_mutant and perform one random fragment assembly, generating the trial conformation C_trial; otherwise perform one random fragment assembly directly on the mutant conformation to generate the trial conformation C_trial;
5.7) Compute the energy of the trial conformation C_trial with the Rosetta score3 energy function;
5.8) If the energy of C_trial is lower than the energy of C_i, C_trial replaces C_i; otherwise C_trial is accepted with the Boltzmann probability exp(-ΔE/KT), where ΔE is the absolute value of the difference between the energy of C_trial and the energy of C_i;
5.9) If C_trial is accepted in step 5.8), the success count of the m-th fragment length in generation g is incremented by 1;
6) Set g = g + 1; if g > 20, update the selection probability p_m, m = 1, 2, ..., M, of each fragment length according to formula (1);
7) If g > G_max, output the conformation with the lowest energy as the final predicted structure; otherwise return to step 5).
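The acceptance rule of step 5.8) and the bookkeeping of step 5.9) can be sketched as follows. This is a minimal sketch under stated assumptions: the Boltzmann probability is taken to be the usual Metropolis form exp(-ΔE/KT), the success count is assumed to grow by one per accepted trial, and the names `accept_trial` and `record_success` and the nested-dictionary layout are hypothetical.

```python
import math
import random

def accept_trial(e_trial, e_target, kt, rng=random):
    """Step 5.8: accept a lower-energy trial outright, otherwise with exp(-dE/KT)."""
    if e_trial < e_target:
        return True
    delta_e = abs(e_trial - e_target)         # energy difference, never negative
    return rng.random() < math.exp(-delta_e / kt)

def record_success(success, g, m):
    """Step 5.9: add one success for fragment-length index m in generation g."""
    per_generation = success.setdefault(g, {})
    per_generation[m] = per_generation.get(m, 0) + 1

success = {}                                  # {generation: {length index: count}}
if accept_trial(e_trial=-42.0, e_target=-40.0, kt=2.0):
    record_success(success, g=1, m=2)
```

A larger temperature factor KT makes the acceptance of worse trials more likely, which trades convergence speed for a broader exploration of the energy landscape.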
This embodiment takes the α-fold protein 1GYZ, with a sequence length of 60, as an example. A population-based protein structure prediction method based on dynamic fragment length comprises the following steps:
1) Input the sequence information of the protein to be predicted, and obtain the fragment library from the ROBETTA server (http://www.robetta.org/);
2) Parameter setting: set the population size NP = 50, the crossover probability CR = 0.5, the temperature factor KT = 2, the maximum number of iterations G_max = 1000, the fragment length set l = {3, 6, 9, 12}, i.e. l_1 = 3, l_2 = 6, l_3 = 9, l_4 = 12, and the selection probability of each fragment length p_m = 0.5, m = 1, 2, ..., M, and initialize the iteration counter g = 0, where M = 4;
3) Generate the initial conformation population P_initial = {C_1, C_2, ..., C_NP} by randomly selecting fragments from the fragment library of each residue position and assembling them, where C_i, i ∈ {1, 2, ..., NP}, is the i-th conformation individual in the population;
4) Compute the energy of each conformation individual in the current population with Rosetta score3;
5) For each conformation C_i, i ∈ {1, 2, ..., NP}, in the population, perform the following operations:
5.1) Take C_i as the target conformation, randomly select NP/2 conformations from the current population to form a sub-population, and sort the sub-population by energy from low to high;
5.2) From the first NP/5 conformations of the sorted sub-population, randomly select one conformation different from C_i, denoted C_lbest;
5.3) Randomly select from the current population three mutually different conformations C_a, C_b and C_c that also differ from C_i and C_lbest;
5.4) Using the roulette-wheel algorithm, choose one fragment length l_m from the fragment length set according to the selection probability of each fragment length;
5.5) From each of C_a, C_b and C_c, randomly select a fragment of length l_m, each at a different residue position, and use these fragments to replace the fragments at the corresponding positions in C_lbest, generating the mutant conformation C_mutant;
5.6) Generate a random number R between 0 and 1; if R < CR, randomly select a fragment of length l_m from C_i to replace the fragment at the corresponding position in C_mutant and perform one random fragment assembly, generating the trial conformation C_trial; otherwise perform one random fragment assembly directly on the mutant conformation to generate the trial conformation C_trial;
5.7) Compute the energy of the trial conformation C_trial with the Rosetta score3 energy function;
5.8) If the energy of C_trial is lower than the energy of C_i, C_trial replaces C_i; otherwise C_trial is accepted with the Boltzmann probability exp(-ΔE/KT), where ΔE is the absolute value of the difference between the energy of C_trial and the energy of C_i;
5.9) If C_trial is accepted in step 5.8), the success count of the m-th fragment length in generation g is incremented by 1;
6) Set g = g + 1; if g > 20, update the selection probability p_m, m = 1, 2, ..., M, of each fragment length according to formula (1);
7) If g > G_max, output the conformation with the lowest energy as the final predicted structure; otherwise return to step 5).
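The probability update of step 6) can be illustrated with the parameter values of this embodiment. Formula (1) itself is not reproduced in this text, so the rule below is only an assumed stand-in: each fragment length receives its share of the successes recorded over the previous 20 generations, with a small floor so that no length is excluded for good. The function name `update_probabilities`, the `window` and `floor` parameters and the layout of the `success` dictionary are all hypothetical.

```python
# Parameter values taken from step 2) of the embodiment.
params = {"NP": 50, "CR": 0.5, "KT": 2.0, "G_max": 1000, "lengths": [3, 6, 9, 12]}

def update_probabilities(success, g, num_lengths, window=20, floor=0.05):
    """Assumed stand-in for formula (1): success share over the last `window` generations.

    `success` maps generation -> {fragment-length index: success count}.
    """
    counts = [0] * num_lengths
    for gen in range(max(0, g - window), g):      # generations g-window .. g-1
        for m, c in success.get(gen, {}).items():
            counts[m] += c
    total = sum(counts)
    if total == 0:                                # no successes yet: stay uniform
        return [1.0 / num_lengths] * num_lengths
    raw = [max(c / total, floor) for c in counts] # floor keeps every length alive
    norm = sum(raw)
    return [r / norm for r in raw]                # probabilities sum to 1

# Toy history: in each of generations 1..20, length index 0 succeeded once
# and length index 2 succeeded three times.
history = {gen: {0: 1, 2: 3} for gen in range(1, 21)}
probs = update_probabilities(history, g=21, num_lengths=len(params["lengths"]))
```

Waiting until g > 20 before the first update, as step 6) prescribes, gives every fragment length a period of use under its initial probability before its success record starts to steer the selection.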
Taking the α-fold protein 1GYZ with a sequence length of 60 as the embodiment, the above method yields near-native conformations of the protein. The minimum root-mean-square deviation is [value not preserved] and the average root-mean-square deviation is [value not preserved]. The predicted structure is shown in Fig. 3.
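For reference, the root-mean-square deviation used to compare a predicted conformation with the native structure can be computed as sketched below. This is a minimal sketch under stated assumptions: both structures are given as equally long lists of (x, y, z) coordinates, for example of the Cα atoms, and they are assumed to be superimposed already; no optimal superposition is performed here, and the function name `rmsd` is hypothetical.

```python
import math

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally long lists of (x, y, z) points."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))     # same unit as the coordinates

native = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0)]
model = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0)]
deviation = rmsd(native, model)                   # identical toy structures
```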
The above describes the results obtained by the invention with protein 1GYZ as an example and does not limit the scope of implementation of the invention. Various modifications and improvements made without departing from the scope of the basic content of the invention shall not be excluded from the scope of protection of the invention.
Claims (1)
1. A population-based protein structure prediction method based on dynamic fragment length, characterized in that the method comprises the following steps:
1) Input the sequence information of the protein to be predicted, and obtain the fragment library from the ROBETTA server;
2) Parameter setting: set the population size NP, the crossover probability CR, the temperature factor KT, the maximum number of iterations G_max, the fragment length set l = {l_1, l_2, ..., l_M} and the selection probability p_m, m = 1, 2, ..., M, of each fragment length, and initialize the iteration counter g = 0, where M is the size of the fragment length set;
3) Generate the initial conformation population P_initial = {C_1, C_2, ..., C_NP} by randomly selecting fragments from the fragment library of each residue position and assembling them, where C_i, i ∈ {1, 2, ..., NP}, is the i-th conformation individual in the population;
4) Compute the energy of each conformation individual in the current population with Rosetta score3;
5) For each conformation C_i, i ∈ {1, 2, ..., NP}, in the population, perform the following operations:
5.1) Take C_i as the target conformation, randomly select NP/2 conformations from the current population to form a sub-population, and sort the sub-population by energy from low to high;
5.2) From the first NP/5 conformations of the sorted sub-population, randomly select one conformation different from C_i, denoted C_lbest;
5.3) Randomly select from the current population three mutually different conformations C_a, C_b and C_c that also differ from C_i and C_lbest;
5.4) Using the roulette-wheel algorithm, choose one fragment length l_m from the fragment length set according to the selection probability of each fragment length;
5.5) From each of C_a, C_b and C_c, randomly select a fragment of length l_m, each at a different residue position, and use these fragments to replace the fragments at the corresponding positions in C_lbest, generating the mutant conformation C_mutant;
5.6) Generate a random number R between 0 and 1; if R < CR, randomly select a fragment of length l_m from C_i to replace the fragment at the corresponding position in C_mutant and perform one random fragment assembly, generating the trial conformation C_trial; otherwise perform one random fragment assembly directly on the mutant conformation to generate the trial conformation C_trial;
5.7) Compute the energy of the trial conformation C_trial with the Rosetta score3 energy function;
5.8) If the energy of C_trial is lower than the energy of C_i, C_trial replaces C_i; otherwise C_trial is accepted with the Boltzmann probability exp(-ΔE/KT), where ΔE is the absolute value of the difference between the energy of C_trial and the energy of C_i;
5.9) If C_trial is accepted in step 5.8), the success count of the m-th fragment length in generation g is incremented by 1;
6) Set g = g + 1; if g > 20, update the selection probability p_m, m = 1, 2, ..., M, of each fragment length according to formula (1);
7) If g > G_max, output the conformation with the lowest energy as the final predicted structure; otherwise return to step 5).
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810986058.6A CN109147867B (en) | 2018-08-28 | 2018-08-28 | Group protein structure prediction method based on dynamic segment length |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810986058.6A CN109147867B (en) | 2018-08-28 | 2018-08-28 | Group protein structure prediction method based on dynamic segment length |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109147867A (en) | 2019-01-04 |
CN109147867B CN109147867B (en) | 2021-06-18 |
Family
ID=64828442
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810986058.6A Active CN109147867B (en) | 2018-08-28 | 2018-08-28 | Group protein structure prediction method based on dynamic segment length |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109147867B (en) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106778059A (en) * | 2016-12-19 | 2017-05-31 | 浙江工业大学 | A population-based protein structure prediction method based on Rosetta local enhancement |
CN108334746A (en) * | 2018-01-15 | 2018-07-27 | 浙江工业大学 | A protein structure prediction method based on secondary structure similarity |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109872770A (en) * | 2019-01-09 | 2019-06-11 | 浙江工业大学 | A variable-strategy protein structure prediction method combined with displacement degree evaluation |
CN109872770B (en) * | 2019-01-09 | 2020-10-30 | 浙江工业大学 | Variable strategy protein structure prediction method combined with displacement degree evaluation |
Also Published As
Publication number | Publication date |
---|---|
CN109147867B (en) | 2021-06-18 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106778059B (en) | A population-based protein structure prediction method based on Rosetta local enhancement | |
CN108334746B (en) | Protein structure prediction method based on secondary structure similarity | |
CN108632764B (en) | Multi-sensor selective measurement data fusion estimation method | |
Onischenko et al. | Maturation kinetics of a multiprotein complex revealed by metabolic labeling | |
CN111709318B (en) | High-resolution remote sensing image classification method based on generation countermeasure network | |
CN109767437A (en) | Thermal-induced imagery defect characteristic extracting method based on k mean value dynamic multi-objective | |
CN107092812B (en) | Method for identifying key protein based on genetic algorithm in PPI network | |
CN111210869A (en) | Protein cryoelectron microscope structure analysis model training method and analysis method | |
Samee et al. | Evaluating thermodynamic models of enhancer activity on cellular resolution gene expression data | |
Chaudhari et al. | DeepRMethylSite: a deep learning based approach for prediction of arginine methylation sites in proteins | |
CN109448784A (en) | A protein structure prediction method based on energy function selection assisted by dihedral angle information | |
CN109524058A (en) | A protein dimer structure prediction method based on differential evolution | |
CN110491443B (en) | lncRNA protein correlation prediction method based on projection neighborhood non-negative matrix decomposition | |
CN109147867A (en) | A population-based protein structure prediction method based on dynamic fragment length | |
CN109360597A (en) | A population-based protein structure prediction method based on cooperation of global and local strategies | |
CN109360601B (en) | Multi-modal protein structure prediction method based on displacement strategy | |
CN106471509A (en) | It is derived from method, equipment and the computer program of the chromosome of one or more organisms for assembling | |
Zheng et al. | Enhancing diversity for NSGA-II in evolutionary multi-objective optimization | |
CN109411013A (en) | A population-based protein structure prediction method based on individual-specific mutation strategies | |
CN109326318B (en) | Group protein structure prediction method based on Loop region Gaussian disturbance | |
CN113139334A (en) | Simulation optimization method based on bee colony | |
CN109300503A (en) | A population-based protein structure prediction method with cooperative global and local lower-bound estimation | |
CN109448785A (en) | A protein structure prediction method using the Ramachandran plot to enhance Loop region structure | |
CN109326321A (en) | A k-nearest-neighbor protein structure prediction method based on abstract convex estimation | |
Dong et al. | Hadronic Top Quark Polarimetry with ParticleNet |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||