CN109147867A - Population-based protein structure prediction method based on dynamic fragment length - Google Patents


Info

Publication number
CN109147867A
Authority
CN
China
Prior art keywords
conformation
population
fragment length
trial
fragment
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810986058.6A
Other languages
Chinese (zh)
Other versions
CN109147867B (en)
Inventor
周晓根
张贵军
彭春祥
胡俊
刘俊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University of Technology ZJUT
Original Assignee
Zhejiang University of Technology ZJUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University of Technology ZJUT
Priority to CN201810986058.6A
Publication of CN109147867A
Application granted
Publication of CN109147867B
Legal status: Active
Anticipated expiration

Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A population-based protein structure prediction method based on dynamic fragment length. For each target conformation, half of the conformations in the current population are randomly selected to form a sub-population, the sub-population is sorted by energy, and one of the top-ranked conformations is randomly chosen to guide mutation. During mutation, several fragment lengths are defined, the selection probability of each is computed from its earlier success rate, and one fragment length is then chosen by roulette-wheel selection to perform the fragment exchange that realizes the mutation. The invention provides a population-based protein structure prediction method based on dynamic fragment length with higher prediction accuracy and search efficiency.

Description

Population-based protein structure prediction method based on dynamic fragment length
Technical field
The present invention relates to the fields of bioinformatics, intelligent optimization, and computer applications, and in particular to a population-based protein structure prediction method based on dynamic fragment length.
Background
In 1965, Nirenberg, Khorana, and others discovered the triplet genetic code (the first genetic code): DNA is translated, with three nucleotides forming one codon, into the amino acid sequence of a protein (the protein's primary structure). A protein can perform its specific biological function only after folding into a specific three-dimensional structure (its tertiary structure). In contrast to the first genetic code, the correspondence between a protein's primary sequence and its tertiary structure (the second genetic code, or folding code) remains an unsolved mystery. To solve protein folding, this "question of the century", more and more researchers from different disciplines have joined the effort. In particular, protein structure prediction, which targets the end point of the folding process, has received wide attention and study. Compared with protein folding, protein structure prediction is more practical: only by obtaining the three-dimensional structure of a protein can gene diagnosis truly be realized and the goal of gene therapy finally be reached.
At present, experimental methods for determining the three-dimensional structure of proteins include X-ray crystallography, multidimensional nuclear magnetic resonance (NMR), and cryo-electron microscopy. X-ray crystallography is currently the most effective method for determining protein structure, and its precision is unmatched by other methods; its main drawbacks are that protein crystals are difficult to grow and that crystal structure determination takes a long time. NMR can directly measure the conformation of a protein in solution, but it requires a large amount of highly pure sample and can currently measure only small proteins. Moreover, these experimental methods are expensive: determining the three-dimensional structure of one protein costs hundreds of thousands of dollars, whereas determining its amino acid sequence costs only about 1,000 dollars, so the gap between the number of determined sequences and the number of determined structures keeps widening. Therefore, how to use computers and appropriate algorithms to predict the three-dimensional structure of a protein directly from its amino acid sequence has become an important research topic in bioinformatics.
Ab initio prediction is currently the most effective protein structure prediction approach. According to the thermodynamic hypothesis, the lowest-energy conformation is considered to be the one closest to the native state. Ab initio methods therefore use an energy function to evaluate the quality of each conformation and an optimization method to search for the lowest-energy conformation. To reduce the complexity of the optimization and improve prediction accuracy, ab initio methods select proteins similar to the query sequence from an existing protein database and build a fragment library for each residue position. During the search, new conformations are generated by assembling fragments selected from the corresponding fragment library, and mutation is performed by exchanging fragments between different conformations. However, determining the length of the exchanged fragment during mutation is challenging: a fragment that is too long destroys good conformations, whereas a fragment that is too short slows the search, which in turn affects search efficiency and prediction accuracy.
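The fragment assembly move described above can be illustrated with a minimal Python sketch. It assumes a conformation is stored as a list of per-residue backbone torsion tuples and that the fragment library maps each start position to a list of candidate fragments. The names `insert_fragment` and `random_fragment_assembly`, and this data layout, are illustrative assumptions and are not taken from the patent or from Rosetta.

```python
import random


def insert_fragment(conformation, fragment, start):
    """Return a copy of `conformation` with `fragment` written at `start`.

    Both arguments are lists of per-residue torsion tuples (hypothetical layout).
    """
    if start < 0 or start + len(fragment) > len(conformation):
        raise ValueError("fragment does not fit at this position")
    new_conf = list(conformation)  # leave the input untouched
    new_conf[start:start + len(fragment)] = fragment
    return new_conf


def random_fragment_assembly(conformation, fragment_library, rng=random):
    """Apply one random fragment insertion.

    `fragment_library` maps a start position to a list of candidate fragments.
    """
    start = rng.choice(sorted(fragment_library))
    fragment = rng.choice(fragment_library[start])
    return insert_fragment(conformation, fragment, start)
```

Because the move copies the conformation before editing it, a rejected trial can simply be discarded without restoring any state.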
Therefore, existing protein structure prediction methods are deficient in prediction accuracy and search efficiency and need to be improved.
Summary of the invention
To overcome the low prediction accuracy and search efficiency of existing protein structure prediction methods, the present invention proposes a population-based protein structure prediction method based on dynamic fragment length with higher prediction accuracy and search efficiency.
The technical solution adopted by the present invention to solve the technical problem is:
A population-based protein structure prediction method based on dynamic fragment length, the method comprising the following steps:
1) Input the sequence information of the protein to be predicted, and obtain the fragment library from the ROBETTA server (http://www.robetta.org/);
2) Parameter setting: set the population size NP, the crossover probability CR, the temperature factor KT, the maximum number of iterations G_max, the fragment length set l = {l_1, l_2, ..., l_M}, and the selection probability p_m, m = 1, 2, ..., M, of each fragment length, and initialize the iteration counter g = 0, where M is the size of the fragment length set;
3) Generate the initial conformation population P_initial = {C_1, C_2, ..., C_NP} by assembling fragments randomly selected from the fragment library corresponding to each residue position, where C_i, i ∈ {1, 2, ..., NP}, is the i-th conformation in the population P;
4) Calculate the energy of each conformation in the current population using Rosetta score3;
5) For each conformation C_i, i ∈ {1, 2, ..., NP}, in the population, perform the following operations:
5.1) Take C_i as the target conformation, randomly select NP/2 conformations from the current population to form a sub-population, and sort the sub-population by energy from low to high;
5.2) From the first NP/5 conformations of the sorted sub-population, randomly select one conformation different from C_i, denoted C_lbest;
5.3) Randomly select from the current population three mutually different conformations C_a, C_b, and C_c that also differ from C_i and C_lbest;
5.4) Using roulette-wheel selection with the selection probabilities of the fragment lengths, choose one fragment length l_m from the fragment length set;
5.5) From each of C_a, C_b, and C_c, randomly select a fragment of length l_m at a different residue position and use it to replace the fragment at the corresponding position in C_lbest, generating the mutant conformation C_mutant;
5.6) Generate a random number R between 0 and 1. If R < CR, randomly select a fragment of length l_m from C_i to replace the fragment at the corresponding position in C_mutant, then perform one random fragment assembly to generate the trial conformation C_trial; otherwise, perform one random fragment assembly directly on the mutant conformation to generate C_trial;
5.7) Calculate the energy of the trial conformation C_trial using the Rosetta score3 energy function;
5.8) If the energy of C_trial is lower than that of C_i, C_trial replaces C_i; otherwise, accept C_trial according to the Boltzmann probability exp(-ΔE/KT), where ΔE is the absolute value of the difference between the energies of C_trial and C_i;
5.9) If C_trial was accepted in step 5.8), update the number of successes of the m-th fragment length in generation g;
6) Set g = g + 1; if g > 20, update the selection probability p_m, m = 1, 2, ..., M, of each fragment length according to formula (1);
7) If g > G_max, output the lowest-energy conformation as the final predicted structure; otherwise return to step 5).
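The roulette-wheel choice of a fragment length in step 5.4) can be sketched in Python as follows. This is a minimal illustration, not the authors' code: the function name `roulette_select` is hypothetical, and the sketch normalizes by the sum of the probabilities so that it also works when the values do not add up to exactly 1.

```python
import random


def roulette_select(lengths, probabilities, rng=random):
    """Pick one fragment length with chance proportional to its probability."""
    total = sum(probabilities)
    if total <= 0:
        raise ValueError("at least one probability must be positive")
    threshold = rng.random() * total  # a point on the wheel, in [0, total)
    cumulative = 0.0
    for length, p in zip(lengths, probabilities):
        cumulative += p
        if threshold < cumulative:
            return length
    return lengths[-1]  # guard against floating-point round-off
```

For example, `roulette_select([3, 6, 9, 12], [0.1, 0.2, 0.3, 0.4])` returns 12 about four times as often as 3.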
The technical concept of the invention is as follows: for each target conformation, half of the conformations in the current population are randomly selected to form a sub-population, the sub-population is sorted by energy, and one of the top-ranked conformations is randomly chosen to guide mutation. During mutation, several fragment lengths are defined, the selection probability of each is computed from its earlier success rate, and one fragment length is then chosen by roulette-wheel selection to perform the fragment exchange that realizes the mutation. The invention provides a population-based protein structure prediction method with higher prediction accuracy and search efficiency.
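The text refers to formula (1) for updating the selection probabilities but does not reproduce it here. The Python sketch below is therefore only one plausible reading, under two loudly stated assumptions: the probability of each fragment length is taken proportional to its number of successes over the most recent 20 generations, and a small floor (an addition of this sketch) keeps every length selectable. The function name `update_probabilities` and its parameters are hypothetical.

```python
def update_probabilities(success_history, window=20, floor=1e-3):
    """Derive selection probabilities from recent per-length success counts.

    `success_history` is a list with one entry per generation; each entry is
    a list of success counts, one per fragment length.
    ASSUMPTION: proportional-to-success rule over a sliding window; the
    patent's formula (1) may differ.
    """
    if not success_history:
        raise ValueError("no generations recorded yet")
    recent = success_history[-window:]
    n_lengths = len(recent[0])
    totals = [sum(gen[k] for gen in recent) for k in range(n_lengths)]
    weights = [t + floor for t in totals]  # floor avoids zero probability
    weight_sum = sum(weights)
    return [w / weight_sum for w in weights]
```

With this rule, a fragment length that has produced more accepted trial conformations recently is chosen more often in later generations.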
The beneficial effects of the invention are as follows: on the one hand, guiding mutation with a locally best conformation both preserves the diversity of conformations and speeds up the search; on the other hand, exchanging fragments with a dynamically chosen fragment length accelerates the interaction between different protein conformations and improves search efficiency.
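The choice of the guiding conformation C_lbest (steps 5.1 and 5.2) can be sketched in Python as follows. The sketch works on a list of energies and returns an index; the function name `pick_guide_index` and the requirement of at least 10 conformations are assumptions made here so that the top fifth of the sub-population always contains a candidate other than the target.

```python
import random


def pick_guide_index(energies, target, rng=random):
    """Return the index of a guiding conformation for the given target index."""
    np_size = len(energies)
    if np_size < 10:
        raise ValueError("this sketch assumes a population of at least 10")
    # Step 5.1: random sub-population of NP/2 members, sorted by energy (low first).
    sub_population = rng.sample(range(np_size), np_size // 2)
    sub_population.sort(key=lambda idx: energies[idx])
    # Step 5.2: random pick among the first NP/5 members, excluding the target.
    top = sub_population[:np_size // 5]
    candidates = [idx for idx in top if idx != target]
    return rng.choice(candidates)
```

Because the sub-population is resampled for every target, different targets are usually guided by different low-energy conformations rather than by a single global best.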
Brief description of the drawings
Fig. 1 is a schematic diagram of the conformation updates when the population-based protein structure prediction method based on dynamic fragment length is used to predict the structure of protein 1GYZ.
Fig. 2 is the distribution of the conformations obtained when the population-based protein structure prediction method based on dynamic fragment length is used to predict the structure of protein 1GYZ.
Fig. 3 is the three-dimensional structure obtained when the population-based protein structure prediction method based on dynamic fragment length is used to predict the structure of protein 1GYZ.
Detailed description of the embodiments
The present invention is further described below with reference to the accompanying drawings.
Referring to Figs. 1 to 3, a population-based protein structure prediction method based on dynamic fragment length comprises the following steps:
1) Input the sequence information of the protein to be predicted, and obtain the fragment library from the ROBETTA server (http://www.robetta.org/);
2) Parameter setting: set the population size NP, the crossover probability CR, the temperature factor KT, the maximum number of iterations G_max, the fragment length set l = {l_1, l_2, ..., l_M}, and the selection probability p_m, m = 1, 2, ..., M, of each fragment length, and initialize the iteration counter g = 0, where M is the size of the fragment length set;
3) Generate the initial conformation population P_initial = {C_1, C_2, ..., C_NP} by assembling fragments randomly selected from the fragment library corresponding to each residue position, where C_i, i ∈ {1, 2, ..., NP}, is the i-th conformation in the population P;
4) Calculate the energy of each conformation in the current population using Rosetta score3;
5) For each conformation C_i, i ∈ {1, 2, ..., NP}, in the population, perform the following operations:
5.1) Take C_i as the target conformation, randomly select NP/2 conformations from the current population to form a sub-population, and sort the sub-population by energy from low to high;
5.2) From the first NP/5 conformations of the sorted sub-population, randomly select one conformation different from C_i, denoted C_lbest;
5.3) Randomly select from the current population three mutually different conformations C_a, C_b, and C_c that also differ from C_i and C_lbest;
5.4) Using roulette-wheel selection with the selection probabilities of the fragment lengths, choose one fragment length l_m from the fragment length set;
5.5) From each of C_a, C_b, and C_c, randomly select a fragment of length l_m at a different residue position and use it to replace the fragment at the corresponding position in C_lbest, generating the mutant conformation C_mutant;
5.6) Generate a random number R between 0 and 1. If R < CR, randomly select a fragment of length l_m from C_i to replace the fragment at the corresponding position in C_mutant, then perform one random fragment assembly to generate the trial conformation C_trial; otherwise, perform one random fragment assembly directly on the mutant conformation to generate C_trial;
5.7) Calculate the energy of the trial conformation C_trial using the Rosetta score3 energy function;
5.8) If the energy of C_trial is lower than that of C_i, C_trial replaces C_i; otherwise, accept C_trial according to the Boltzmann probability exp(-ΔE/KT), where ΔE is the absolute value of the difference between the energies of C_trial and C_i;
5.9) If C_trial was accepted in step 5.8), update the number of successes of the m-th fragment length in generation g;
6) Set g = g + 1; if g > 20, update the selection probability p_m, m = 1, 2, ..., M, of each fragment length according to formula (1);
7) If g > G_max, output the lowest-energy conformation as the final predicted structure; otherwise return to step 5).
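The acceptance rule of step 5.8) can be sketched in Python as follows, assuming the Boltzmann probability takes the usual Metropolis form exp(-ΔE/KT). The function name `accept_trial` is hypothetical, and the energies are plain numbers standing in for Rosetta score3 values.

```python
import math
import random


def accept_trial(e_trial, e_target, kt, rng=random):
    """Decide whether a trial conformation replaces the target conformation."""
    if e_trial < e_target:
        return True  # a lower-energy trial is always accepted
    delta_e = abs(e_trial - e_target)
    # A higher-energy trial is accepted with probability exp(-delta_e / kt),
    # so larger energy increases are accepted less often.
    return rng.random() < math.exp(-delta_e / kt)
```

With KT = 2, a trial that is 1 energy unit worse than the target is accepted with probability of about 0.61, and one that is 10 units worse with probability of about 0.007.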
This embodiment takes the α-folded protein 1GYZ, with a sequence length of 60, as an example. A population-based protein structure prediction method based on dynamic fragment length comprises the following steps:
1) Input the sequence information of the protein to be predicted, and obtain the fragment library from the ROBETTA server (http://www.robetta.org/);
2) Parameter setting: set the population size NP = 50, the crossover probability CR = 0.5, the temperature factor KT = 2, the maximum number of iterations G_max = 1000, the fragment length set l = {3, 6, 9, 12}, i.e., l_1 = 3, l_2 = 6, l_3 = 9, l_4 = 12, and the selection probability of each fragment length p_m = 0.5, m = 1, 2, ..., M, and initialize the iteration counter g = 0, where M = 4;
3) Generate the initial conformation population P_initial = {C_1, C_2, ..., C_NP} by assembling fragments randomly selected from the fragment library corresponding to each residue position, where C_i, i ∈ {1, 2, ..., NP}, is the i-th conformation in the population P;
4) Calculate the energy of each conformation in the current population using Rosetta score3;
5) For each conformation C_i, i ∈ {1, 2, ..., NP}, in the population, perform the following operations:
5.1) Take C_i as the target conformation, randomly select NP/2 conformations from the current population to form a sub-population, and sort the sub-population by energy from low to high;
5.2) From the first NP/5 conformations of the sorted sub-population, randomly select one conformation different from C_i, denoted C_lbest;
5.3) Randomly select from the current population three mutually different conformations C_a, C_b, and C_c that also differ from C_i and C_lbest;
5.4) Using roulette-wheel selection with the selection probabilities of the fragment lengths, choose one fragment length l_m from the fragment length set;
5.5) From each of C_a, C_b, and C_c, randomly select a fragment of length l_m at a different residue position and use it to replace the fragment at the corresponding position in C_lbest, generating the mutant conformation C_mutant;
5.6) Generate a random number R between 0 and 1. If R < CR, randomly select a fragment of length l_m from C_i to replace the fragment at the corresponding position in C_mutant, then perform one random fragment assembly to generate the trial conformation C_trial; otherwise, perform one random fragment assembly directly on the mutant conformation to generate C_trial;
5.7) Calculate the energy of the trial conformation C_trial using the Rosetta score3 energy function;
5.8) If the energy of C_trial is lower than that of C_i, C_trial replaces C_i; otherwise, accept C_trial according to the Boltzmann probability exp(-ΔE/KT), where ΔE is the absolute value of the difference between the energies of C_trial and C_i;
5.9) If C_trial was accepted in step 5.8), update the number of successes of the m-th fragment length in generation g;
6) Set g = g + 1; if g > 20, update the selection probability p_m, m = 1, 2, ..., M, of each fragment length according to formula (1);
7) If g > G_max, output the lowest-energy conformation as the final predicted structure; otherwise return to step 5).
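The fragment exchange of steps 5.5) and 5.6) can be sketched in Python for a 60-residue chain such as the one in this example. Conformations are modelled as lists with one entry per residue, and the three donor windows are placed at distinct start positions, which is this sketch's reading of "a different residue position"; windows may still overlap, in which case the later donor overwrites the earlier one. The function names are hypothetical, and the random fragment assembly that follows the exchange in step 5.6) is omitted.

```python
import random


def copy_window(source, dest, start, length):
    """Return a copy of `dest` whose window [start, start+length) comes from `source`."""
    new_conf = list(dest)
    new_conf[start:start + length] = source[start:start + length]
    return new_conf


def mutate(c_lbest, donors, frag_len, rng=random):
    """Step 5.5: write one window from each donor into a copy of C_lbest."""
    n_positions = len(c_lbest) - frag_len + 1
    starts = rng.sample(range(n_positions), len(donors))  # distinct start positions
    mutant = list(c_lbest)
    for donor, start in zip(donors, starts):
        mutant = copy_window(donor, mutant, start, frag_len)
    return mutant


def crossover(c_target, c_mutant, frag_len, cr, rng=random):
    """Step 5.6 (without the final assembly): with probability CR, copy one
    window from the target conformation into the mutant."""
    if rng.random() < cr:
        start = rng.randrange(len(c_target) - frag_len + 1)
        return copy_window(c_target, c_mutant, start, frag_len)
    return list(c_mutant)
```

Copying the same residue window from the donor into the recipient keeps the chain length fixed, so every conformation in the population remains comparable position by position.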
Taking the α-folded protein 1GYZ with a sequence length of 60 as an example, the above method was used to obtain a near-native conformation of the protein; the lowest root-mean-square deviation is … and the average root-mean-square deviation is …, and the predicted structure is shown in Fig. 3.
The above describes the results obtained by the present invention using protein 1GYZ as an example and does not limit the scope of implementation of the invention. Various modifications and improvements made without departing from the scope of the basic content of the invention shall not be excluded from the scope of protection of the invention.

Claims (1)

1. A population-based protein structure prediction method based on dynamic fragment length, characterized in that the method comprises the following steps:
1) Input the sequence information of the protein to be predicted, and obtain the fragment library from the ROBETTA server;
2) Parameter setting: set the population size NP, the crossover probability CR, the temperature factor KT, the maximum number of iterations G_max, the fragment length set l = {l_1, l_2, ..., l_M}, and the selection probability p_m, m = 1, 2, ..., M, of each fragment length, and initialize the iteration counter g = 0, where M is the size of the fragment length set;
3) Generate the initial conformation population P_initial = {C_1, C_2, ..., C_NP} by assembling fragments randomly selected from the fragment library corresponding to each residue position, where C_i, i ∈ {1, 2, ..., NP}, is the i-th conformation in the population P;
4) Calculate the energy of each conformation in the current population using Rosetta score3;
5) For each conformation C_i, i ∈ {1, 2, ..., NP}, in the population, perform the following operations:
5.1) Take C_i as the target conformation, randomly select NP/2 conformations from the current population to form a sub-population, and sort the sub-population by energy from low to high;
5.2) From the first NP/5 conformations of the sorted sub-population, randomly select one conformation different from C_i, denoted C_lbest;
5.3) Randomly select from the current population three mutually different conformations C_a, C_b, and C_c that also differ from C_i and C_lbest;
5.4) Using roulette-wheel selection with the selection probabilities of the fragment lengths, choose one fragment length l_m from the fragment length set;
5.5) From each of C_a, C_b, and C_c, randomly select a fragment of length l_m at a different residue position and use it to replace the fragment at the corresponding position in C_lbest, generating the mutant conformation C_mutant;
5.6) Generate a random number R between 0 and 1. If R < CR, randomly select a fragment of length l_m from C_i to replace the fragment at the corresponding position in C_mutant, then perform one random fragment assembly to generate the trial conformation C_trial; otherwise, perform one random fragment assembly directly on the mutant conformation to generate C_trial;
5.7) Calculate the energy of the trial conformation C_trial using the Rosetta score3 energy function;
5.8) If the energy of C_trial is lower than that of C_i, C_trial replaces C_i; otherwise, accept C_trial according to the Boltzmann probability exp(-ΔE/KT), where ΔE is the absolute value of the difference between the energies of C_trial and C_i;
5.9) If C_trial was accepted in step 5.8), update the number of successes of the m-th fragment length in generation g;
6) Set g = g + 1; if g > 20, update the selection probability p_m, m = 1, 2, ..., M, of each fragment length according to formula (1);
7) If g > G_max, output the lowest-energy conformation as the final predicted structure; otherwise return to step 5).
CN201810986058.6A 2018-08-28 2018-08-28 Group protein structure prediction method based on dynamic segment length Active CN109147867B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810986058.6A CN109147867B (en) 2018-08-28 2018-08-28 Group protein structure prediction method based on dynamic segment length

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810986058.6A CN109147867B (en) 2018-08-28 2018-08-28 Group protein structure prediction method based on dynamic segment length

Publications (2)

Publication Number Publication Date
CN109147867A (en) 2019-01-04
CN109147867B (en) 2021-06-18

Family

ID=64828442

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810986058.6A Active CN109147867B (en) 2018-08-28 2018-08-28 Group protein structure prediction method based on dynamic segment length

Country Status (1)

Country Link
CN (1) CN109147867B (en)

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106778059A (en) * 2016-12-19 2017-05-31 Zhejiang University of Technology Population-based protein structure prediction method based on Rosetta local enhancement
CN108334746A (en) * 2018-01-15 2018-07-27 Zhejiang University of Technology Protein structure prediction method based on secondary structure similarity

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109872770A (en) * 2019-01-09 2019-06-11 Zhejiang University of Technology Multi-mutation-strategy protein structure prediction method combining exclusion degree evaluation
CN109872770B (en) * 2019-01-09 2020-10-30 Zhejiang University of Technology Variable strategy protein structure prediction method combined with displacement degree evaluation

Also Published As

Publication number Publication date
CN109147867B (en) 2021-06-18

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant