CN111180004A - Multi-contact information sub-population strategy protein structure prediction method

Publication number: CN111180004A
Authority: CN (China)
Legal status: Granted
Application number: CN201911197621.2A
Other languages: Chinese (zh)
Other versions: CN111180004B (en)
Inventors: 张贵军 (Zhang Guijun), 彭春祥 (Peng Chunxiang), 刘俊 (Liu Jun), 周晓根 (Zhou Xiaogen), 李亭 (Li Ting)
Current assignee: Zhejiang University of Technology (ZJUT)
Original assignee: Zhejiang University of Technology (ZJUT)
Priority date: 2019-11-29
Filing date: 2019-11-29
Publication date: 2020-05-19

2019-11-29: Application CN201911197621.2A filed by Zhejiang University of Technology (ZJUT)
2020-05-19: Publication of CN111180004A
Application granted
2021-08-03: Publication of CN111180004B
Status: Active

Classifications

    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B: BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B15/00: ICT specially adapted for analysing two-dimensional or three-dimensional molecular structures, e.g. structural or functional relations or structure alignment

Abstract

A protein structure prediction method based on a multi-contact information sub-population strategy. First, a population is initialized by fragment assembly within an evolutionary algorithm framework. The population is then divided into several sub-populations, and each individual in a sub-population undergoes mutation and crossover to generate a new conformation. In the selection step, new structures are first screened with the Rosetta energy function score3. The resulting low-energy conformations are then further screened with the spatial constraint score S_con(C), while a Monte Carlo probabilistic acceptance criterion preserves conformational diversity during selection. By combining the sub-population concept with contact information predicted by multiple contact prediction servers to guide structure prediction, the method mitigates the inaccuracy of the energy function and improves population diversity. The invention thus provides a protein structure prediction method with good diversity and high prediction accuracy.

Description

Multi-contact information sub-population strategy protein structure prediction method
Technical Field
The invention relates to the fields of bioinformatics and computer applications, and in particular to a protein structure prediction method based on a multi-contact information sub-population strategy.
Background
Protein structure prediction is a central research topic in structural bioinformatics. In the CASP13 protein structure prediction competition, held in Cancún, Mexico, in December 2018, AlphaFold, developed by Google's DeepMind team, ranked first overall. AlphaFold's key innovation was to use machine learning to predict the spatial distances between residues and then use these distance constraints as an energy function to guide protein folding, which greatly improved prediction accuracy. This work also shows that deep integration of computer science, information technology and the life sciences can effectively drive and accelerate scientific discovery. Nevertheless, de novo prediction methods still face several difficulties and challenges.
First, because energy models are inaccurate, the accuracy of inter-residue contact information has become one of the key factors limiting the accuracy of de novo protein structure prediction. Inter-residue contact prediction has reached unprecedented accuracy, but the predicted contacts are still far from fully reliable, and their quality varies from one contact prediction server to another. As a result, higher contact prediction accuracy does not translate consistently into higher structure prediction accuracy.
Second, the inherent complexity of optimizing protein conformational space makes it a very challenging problem in de novo structure prediction. Finding the unique native structure in a huge sampling space by computer requires an efficient conformational space optimization algorithm that turns the search into a tractable computational problem. Differential evolution (DE) is simple in structure, easy to implement, robust and fast to converge, and it has been widely applied to protein conformational space optimization. However, as the amino acid sequence grows longer, the number of degrees of freedom of the protein system increases. Finding the global optimum of a large conformational space with a traditional population-based algorithm while maintaining population diversity therefore becomes difficult.
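As background for the DE discussion above, here is a minimal sketch of the textbook DE/rand/1/bin scheme on real-valued vectors. It shows the generic algorithm only, not the fragment-based variant used in this invention, where steps 7.1) to 7.3) replace vector arithmetic with fragment swaps. The function name, the sphere test function and all parameter values are illustrative assumptions.

```python
import random
from typing import Callable, List, Tuple

Vector = List[float]


def de_rand_1_bin(
    f: Callable[[Vector], float],
    dim: int,
    pop_size: int = 20,
    scale: float = 0.5,          # differential weight F
    cr: float = 0.9,             # crossover rate CR
    generations: int = 200,
    bounds: Tuple[float, float] = (-5.0, 5.0),
    seed: int = 0,
) -> Vector:
    """Minimize f with textbook DE/rand/1/bin on continuous vectors."""
    rng = random.Random(seed)
    lo, hi = bounds
    pop = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(pop_size)]
    fit = [f(x) for x in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Mutation: three distinct donors, all different from target i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            mutant = [pop[a][k] + scale * (pop[b][k] - pop[c][k]) for k in range(dim)]
            # Binomial crossover; index j_rand always comes from the mutant.
            j_rand = rng.randrange(dim)
            trial = [
                mutant[k] if (rng.random() < cr or k == j_rand) else pop[i][k]
                for k in range(dim)
            ]
            # Greedy one-to-one selection against the target.
            f_trial = f(trial)
            if f_trial <= fit[i]:
                pop[i], fit[i] = trial, f_trial
    return pop[min(range(pop_size), key=fit.__getitem__)]


def sphere(x: Vector) -> float:
    """Toy objective with its global minimum 0 at the origin."""
    return sum(v * v for v in x)


best = de_rand_1_bin(sphere, dim=3)
print(best, sphere(best))
```

The greedy one-to-one replacement shown here is the baseline that the invention refines with its two-stage score3 / S_con selection and Monte Carlo acceptance.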
Therefore, existing protein structure prediction methods fall short in both diversity and prediction accuracy and need to be improved.
Disclosure of Invention
To address the poor sampling diversity and low prediction accuracy of existing protein structure prediction methods, the invention first collects the contact information predicted by several contact prediction servers and builds high-confidence contact sets from it. Using the sub-population concept, it then applies a different spatial constraint model to each sub-population to assist the Rosetta score3 energy function in guiding conformation selection. The invention thus provides a protein structure prediction method based on a multi-contact information sub-population strategy with good diversity and high prediction accuracy.
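The high-confidence contact sets mentioned here are built in steps 3) and 4) below. The following is a minimal Python sketch of that selection and fusion. It assumes each server's contact map has already been parsed into a dictionary that maps residue pairs to confidences; parsing the RaptorX, ResTriplet and DNCON2 output formats is not shown. The helper names, the rounding of L/5 down to an integer and the toy maps are all hypothetical. Averaging only over the sets in which a pair actually appears is one reading of step 4.2).

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Pair = Tuple[int, int]                  # residue indices (m, n)
ContactSet = List[Tuple[Pair, float]]   # [((m, n), confidence), ...]


def top_l_over_5(contact_map: Dict[Pair, float], seq_len: int) -> ContactSet:
    """Step 3: keep the L/5 most confident contacts of one predicted map."""
    k = seq_len // 5  # assumption: L/5 rounded down
    ranked = sorted(contact_map.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:k]


def fuse_contact_sets(*sets: ContactSet) -> ContactSet:
    """Step 4: merge contcf1..contcf3 into contcf4."""
    pooled: Dict[Pair, List[float]] = defaultdict(list)
    for contact_set in sets:
        for (m, n), conf in contact_set:
            pooled[(min(m, n), max(m, n))].append(conf)  # treat (m, n) and (n, m) alike
    # 4.1) a pair seen once keeps its confidence; 4.2) a repeated pair gets the mean.
    merged = [(pair, sum(c) / len(c)) for pair, c in pooled.items()]
    # 4.3) sort by confidence, highest first; Num is len(merged).
    merged.sort(key=lambda kv: kv[1], reverse=True)
    return merged


L = 10  # toy sequence length, so each server contributes L // 5 = 2 contacts
contcf1 = top_l_over_5({(1, 5): 0.9, (2, 8): 0.7, (3, 9): 0.4}, L)
contcf2 = top_l_over_5({(1, 5): 0.8, (4, 9): 0.6}, L)
contcf3 = top_l_over_5({(2, 8): 0.3, (1, 6): 0.95}, L)
contcf4 = fuse_contact_sets(contcf1, contcf2, contcf3)
num = len(contcf4)
print(num, contcf4)
```

In this toy run, the pair (1, 5) is reported by two servers and receives the mean confidence of about 0.85. Agreement between servers is thus preserved without letting any single server dominate.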
The technical scheme adopted by the invention for solving the technical problems is as follows:
A protein structure prediction method based on a multi-contact information sub-population strategy comprises the following steps:
1) input the sequence of the target protein;
2) obtain fragment library files for the target protein sequence from the ROBETTA server (http://www.robetta.org/), comprising a 3-residue fragment library file and a 9-residue fragment library file;
3) predict three contact maps for the target protein sequence with the RaptorX server (raptorx.uchicago.edu/ContactMap), the ResTriplet server (zhanglab.ccmb.med.umich.edu/ResTriplet) and the DNCON2 server (sysbio.rnet.missouri.edu/DNCON2), respectively; from each contact map, select the L/5 contacts with the highest confidence to form the high-confidence contact sets contcf1, contcf2 and contcf3, respectively, where L is the length of the target protein sequence;
4) construct the high-confidence contact set contcf4 from contcf1, contcf2 and contcf3 according to the following rules:
4.1) if the residue pair of a contact in contcf1, contcf2 or contcf3 does not occur in any of the other sets, add that contact to contcf4 unchanged;
4.2) if a residue pair occurs in more than one of contcf1, contcf2 and contcf3, average the confidences of the repeated contacts and add the pair to contcf4 once, with the averaged confidence;
4.3) sort the contacts in contcf4 by confidence in descending order and count the number of contacts Num in contcf4;
5) set the parameters: population size NP, maximum number of generations G, crossover rate CR and temperature factor β, and set the generation counter g = 0;
6) population initialization: generate NP initial conformations C_i, i = 1, 2, …, NP, by random fragment assembly, and divide them equally into four sub-populations (sub-populations 1 to 4) of NP/4 conformations each;
7) perform the following operations for each individual C_i in the population:
7.1) set C_i as the target individual C_target and randomly select two further individuals C_a and C_b from the population, such that C_target, C_a and C_b are mutually distinct; randomly select one 9-residue fragment from C_a and one from C_b, at different positions, and use them to replace the fragments at the corresponding positions of C_target, generating the mutant conformation C_mutant;
7.2) apply one round of fragment assembly to C_mutant to generate a new conformation C_mutant′;
7.3) generate a random number pCR ∈ (0, 1); if pCR < CR, randomly select a 3-residue fragment from C_target and use it to replace the fragment at the corresponding position of C_mutant′, generating the trial conformation C_trial; otherwise, take C_mutant′ directly as C_trial;
7.4) compute the Rosetta score3 energies score3(C_trial) and score3(C_target);
7.5) if score3(C_trial) > score3(C_target), retain C_target;
7.6) if score3(C_trial) < score3(C_target), compute the spatial constraint scores S_con(C_trial) and S_con(C_target) according to equation (1), which defines S_con(C):
S_con(C) = [formula image in the original; not recoverable from this text]   (1)
where m and n are the two residues of the k-th contact in the high-confidence contact set, K is the number of contacts in the high-confidence contact set, d_{m,n} is the Euclidean distance between residue m and residue n in conformation C, and U_{m,n} is the confidence of the contact formed by the residue pair (m, n) in the high-confidence contact set; the high-confidence contact set is contcf1 for individuals of sub-population 1, contcf2 for sub-population 2, contcf3 for sub-population 3, and contcf4 for sub-population 4;
7.7) if S_con(C_trial) < S_con(C_target), C_trial replaces C_target in the population;
7.8) if S_con(C_trial) > S_con(C_target), C_trial replaces C_target in the population with probability P_accept; if the replacement is not made, C_target is retained; P_accept is defined as follows:
P_accept = [formula image in the original; not recoverable from this text]
8) set g = g + 1 and repeat steps 7) and 8) until g > G;
9) output the conformation with the lowest Rosetta score3 energy as the final result.
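Steps 7.1) to 7.3) can be illustrated with a toy sketch in which a conformation is a list of per-residue torsion-angle triples and a "fragment" is a window of consecutive residues. The representation and the helper names are assumptions made for illustration. The `no_op` callable is a hypothetical stand-in for one Rosetta fragment insertion. None of this is the authors' implementation.

```python
import random
from typing import Callable, List, Sequence, Tuple

Torsions = Tuple[float, float, float]      # (phi, psi, omega) of one residue (assumed encoding)
Conformation = List[Torsions]
Assembler = Callable[[Conformation, random.Random], Conformation]


def splice(dst: Conformation, src: Sequence[Torsions], start: int, length: int) -> None:
    """Overwrite residues start..start+length-1 of dst with the same residues of src."""
    dst[start:start + length] = src[start:start + length]


def make_trial(target: Conformation, c_a: Conformation, c_b: Conformation, cr: float,
               assemble_once: Assembler, rng: random.Random) -> Conformation:
    """Steps 7.1)-7.3): build C_trial from C_target and two donors C_a, C_b."""
    n = len(target)
    if n < 10:
        raise ValueError("need at least 10 residues for two 9-residue windows at different starts")
    # 7.1) copy one 9-residue window from C_a and one from C_b (different start
    # positions) into a copy of C_target, giving C_mutant.
    mutant = list(target)
    start_a, start_b = rng.sample(range(n - 9 + 1), 2)
    splice(mutant, c_a, start_a, 9)
    splice(mutant, c_b, start_b, 9)
    # 7.2) one round of fragment assembly on C_mutant gives C_mutant'.
    mutant_prime = assemble_once(mutant, rng)
    # 7.3) with probability CR, copy a random 3-residue window of C_target back in.
    if rng.random() < cr:
        trial = list(mutant_prime)
        splice(trial, target, rng.randrange(n - 3 + 1), 3)
        return trial
    return mutant_prime


def no_op(conf: Conformation, r: random.Random) -> Conformation:
    """Hypothetical stand-in for one Rosetta fragment insertion."""
    return list(conf)


rng = random.Random(1)
target = [(0.0, 0.0, 0.0)] * 20
donor_a = [(1.0, 1.0, 1.0)] * 20
donor_b = [(2.0, 2.0, 2.0)] * 20
trial = make_trial(target, donor_a, donor_b, cr=0.0, assemble_once=no_op, rng=rng)
print(trial.count((2.0, 2.0, 2.0)))  # 9: the C_b window is written last
```

Because the two donor windows may overlap, the C_b window overwrites part of the C_a window. The crossover step then always reintroduces some residues from C_target.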
The technical concept of the invention is as follows. Within an evolutionary algorithm framework, the population is first initialized by fragment assembly. The population is then divided into several sub-populations, and each individual undergoes mutation and crossover to generate a new conformation. In the selection step, new structures are first screened with the Rosetta energy function score3, and the resulting low-energy conformations are further screened with S_con(C), while a Monte Carlo probabilistic acceptance criterion preserves conformational diversity during selection. By combining the sub-population concept with contact information predicted by multiple contact prediction servers, the method mitigates the inaccuracy of the energy function and improves population diversity, yielding a protein structure prediction method with good diversity and high prediction accuracy.
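The two-stage selection described here (score3 first, then S_con with Monte Carlo acceptance, as in steps 7.4) to 7.8)) can be sketched as a small function. Energies and constraint scores are passed in as callables, so no Rosetta API is assumed. The source gives P_accept only as an image, so the acceptance formula exp(-ΔS/β) is a common Metropolis form chosen for illustration. The tie handling is likewise an assumption.

```python
import math
import random
from typing import Callable, TypeVar

C = TypeVar("C")


def select(target: C, trial: C, score3: Callable[[C], float],
           s_con: Callable[[C], float], beta: float, rng: random.Random) -> C:
    """Steps 7.4)-7.8): return the individual that occupies C_target's slot."""
    e_trial, e_target = score3(trial), score3(target)            # 7.4)
    if e_trial >= e_target:
        # 7.5) covers '>'; a tie is not specified in the source, and this
        # sketch keeps C_target in that case too.
        return target
    s_trial, s_target = s_con(trial), s_con(target)              # 7.6)
    if s_trial < s_target:                                       # 7.7)
        return trial
    # 7.8) Metropolis-style acceptance; exp(-dS / beta) is an ASSUMED form,
    # since the source's P_accept formula is not reproduced.
    p_accept = math.exp(-(s_trial - s_target) / beta)
    return trial if rng.random() < p_accept else target


def energy(c: dict) -> float:
    return c["score3"]


def constraint(c: dict) -> float:
    return c["s_con"]


rng = random.Random(0)
tgt = {"score3": -10.0, "s_con": 5.0}
better = {"score3": -12.0, "s_con": 3.0}
print(select(tgt, better, energy, constraint, beta=4.0, rng=rng) is better)  # True
```

Only conformations that already lower score3 reach the S_con stage. The contact information therefore refines the energy function's choices rather than overriding them.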
Beneficial effects of the invention: a different spatial constraint score is constructed for each sub-population to assist the Rosetta energy function score3 in selecting conformations. This mitigates prediction errors caused by the inaccuracy of the energy function and improves prediction accuracy.
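The mapping from sub-populations to different spatial constraint scores can be sketched as follows. Equation (1) is not reproduced in this text, so `s_con` below is a hypothetical stand-in: a confidence-weighted penalty on contacts whose residues lie farther apart than an assumed 8 Å cutoff. Lower scores are better, matching steps 7.7) and 7.8). The contiguous split of the population into four equal blocks, the 0-based residue indexing and all names are also assumptions. The sketch requires Python 3.8+ for `math.dist`.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]
Pair = Tuple[int, int]
ContactSet = List[Tuple[Pair, float]]   # [((m, n), U_mn), ...]

CONTACT_CUTOFF = 8.0  # hypothetical distance threshold in angstroms, not from the source


def subpopulation_of(i: int, pop_size: int) -> int:
    """0-based sub-population index of individual i (contiguous equal blocks assumed)."""
    if pop_size < 4:
        raise ValueError("need at least 4 individuals for 4 sub-populations")
    return min(i // (pop_size // 4), 3)


def contact_set_for(i: int, pop_size: int, contact_sets: Sequence[ContactSet]) -> ContactSet:
    """Sub-population 1 -> contcf1, ..., sub-population 4 -> contcf4."""
    return contact_sets[subpopulation_of(i, pop_size)]


def s_con(coords: Sequence[Coord], contacts: ContactSet,
          cutoff: float = CONTACT_CUTOFF) -> float:
    """Hypothetical stand-in for equation (1): confidence-weighted distance violations."""
    total = 0.0
    for (m, n), u in contacts:
        d = math.dist(coords[m], coords[n])   # Euclidean distance d_{m,n}
        total += u * max(0.0, d - cutoff)     # penalize only pairs farther apart than cutoff
    return total


coords = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0), (5.0, 0.0, 0.0)]
contacts = [((0, 1), 0.5), ((0, 2), 1.0)]
print(s_con(coords, contacts))  # 1.0, i.e. 0.5 * (10 - 8) + 1.0 * 0
```

Because each sub-population scores conformations against a different contact set, the sub-populations are pulled toward different regions of conformational space. This is the diversity mechanism the beneficial-effects paragraph describes.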
Drawings
FIG. 1 shows the conformational distribution obtained when sampling protein 4UEX with the multi-contact information sub-population strategy protein structure prediction method.
FIG. 2 is a schematic diagram of the conformation updates of protein 4UEX during the multi-contact information sub-population strategy protein structure prediction method.
FIG. 3 shows the three-dimensional structure of protein 4UEX predicted by the multi-contact information sub-population strategy protein structure prediction method.
Detailed Description
The invention is further described below with reference to the accompanying drawings.
Referring to FIGS. 1 to 3, the protein structure prediction method based on a multi-contact information sub-population strategy comprises steps 1) to 9) as set out above under Disclosure of Invention.
Taking the α protein 4UEX, with a sequence length of 82, as an example, the method is carried out as in steps 1) to 9) above, with the parameters of step 5) set as follows: population size NP = 200, maximum number of generations G = 6000, crossover rate CR = 0.5, temperature factor β = 4, and generation counter g = 0.
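The embodiment's parameters and the generation loop of steps 7) to 9) can be captured in a short skeleton. The dataclass defaults mirror the values above (NP = 200, G = 6000, CR = 0.5, β = 4). The `update` argument is a hypothetical callable that performs step 7 for one individual. The loop runs exactly G generations, which is one reading of the termination condition "until g > G".

```python
import random
from dataclasses import dataclass
from typing import Callable, List, TypeVar

C = TypeVar("C")


@dataclass(frozen=True)
class Params:
    pop_size: int = 200   # NP
    max_gen: int = 6000   # G
    cr: float = 0.5       # crossover rate CR
    beta: float = 4.0     # temperature factor beta


def evolve(pop: List[C], params: Params,
           update: Callable[[int, List[C], random.Random], C],
           energy: Callable[[C], float], seed: int = 0) -> C:
    """Generation loop of steps 7)-9); update(i, pop, rng) performs step 7 for C_i."""
    if len(pop) != params.pop_size:
        raise ValueError("population size must equal NP")
    rng = random.Random(seed)
    for _g in range(params.max_gen):      # step 8: g = g + 1 after each sweep
        for i in range(len(pop)):         # step 7: one sweep over all individuals
            pop[i] = update(i, pop, rng)
    return min(pop, key=energy)           # step 9: lowest-energy conformation


toy = Params(pop_size=4, max_gen=3)
best = evolve([5, 7, 9, 3], toy, lambda i, pop, rng: max(pop[i] - 1, 0), energy=float)
print(best)  # 0
```

Replacing individuals in place means that later individuals in the same sweep already see the updated earlier ones. This matches the per-individual replacement wording of steps 7.7) and 7.8).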
Taking the α protein 4UEX (sequence length 82) as an example, the method yields near-native conformations of the protein. After 6000 generations, the average root-mean-square deviation (RMSD) between the obtained structures and the native structure is [value given as an image in the original], and the minimum RMSD is [value given as an image in the original].
The predicted three-dimensional structure is shown in FIG. 3.
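RMSD values such as those reported above are conventionally computed after each model is optimally superimposed onto the native structure. The following is a minimal NumPy sketch of the standard Kabsch superposition, together with the average and minimum RMSD over an ensemble. The source does not say which atoms were compared (Cα atoms are a common choice), and the function names are illustrative.

```python
import numpy as np


def kabsch_rmsd(model: np.ndarray, native: np.ndarray) -> float:
    """RMSD between two (N, 3) coordinate sets after optimal rigid superposition."""
    p = np.asarray(model, dtype=float)
    q = np.asarray(native, dtype=float)
    if p.shape != q.shape or p.ndim != 2 or p.shape[1] != 3:
        raise ValueError("expected two (N, 3) arrays of equal shape")
    p = p - p.mean(axis=0)                # remove translation
    q = q - q.mean(axis=0)
    u, _, vt = np.linalg.svd(p.T @ q)     # SVD of the 3x3 covariance matrix
    d = -1.0 if np.linalg.det(vt.T @ u.T) < 0 else 1.0   # avoid an improper rotation
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    diff = p @ rot.T - q
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))


def summarize(models, native):
    """Average and minimum RMSD of an ensemble of models against the native structure."""
    values = [kabsch_rmsd(m, native) for m in models]
    return sum(values) / len(values), min(values)


rng = np.random.default_rng(0)
native = rng.normal(size=(82, 3))
theta = np.pi / 6
rz = np.array([[np.cos(theta), -np.sin(theta), 0.0],
               [np.sin(theta),  np.cos(theta), 0.0],
               [0.0, 0.0, 1.0]])
moved = native @ rz.T + np.array([1.0, -2.0, 3.0])  # rotated and shifted copy
print(kabsch_rmsd(moved, native))  # ~0 up to floating-point error
```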
The foregoing describes one embodiment of the invention. The invention is not limited to this embodiment and may be practiced with various modifications without departing from its essential spirit.

Claims (1)

1. A protein structure prediction method based on a multi-contact information sub-population strategy, characterized in that the method comprises the following steps:
1) input the sequence of the target protein;
2) obtain fragment library files for the target protein sequence from the ROBETTA server, comprising a 3-residue fragment library file and a 9-residue fragment library file;
3) predict three contact maps for the target protein sequence with the RaptorX, ResTriplet and DNCON2 servers, respectively; from each contact map, select the L/5 contacts with the highest confidence to form the high-confidence contact sets contcf1, contcf2 and contcf3, respectively, where L is the length of the target protein sequence;
4) construct the high-confidence contact set contcf4 from contcf1, contcf2 and contcf3 according to the following rules:
4.1) if the residue pair of a contact in contcf1, contcf2 or contcf3 does not occur in any of the other sets, add that contact to contcf4 unchanged;
4.2) if a residue pair occurs in more than one of contcf1, contcf2 and contcf3, average the confidences of the repeated contacts and add the pair to contcf4 once, with the averaged confidence;
4.3) sort the contacts in contcf4 by confidence in descending order and count the number of contacts Num in contcf4;
5) set the parameters: population size NP, maximum number of generations G, crossover rate CR and temperature factor β, and set the generation counter g = 0;
6) population initialization: generate NP initial conformations C_i, i = 1, 2, …, NP, by random fragment assembly, and divide them equally into four sub-populations (sub-populations 1 to 4) of NP/4 conformations each;
7) perform the following operations for each individual C_i in the population:
7.1) set C_i as the target individual C_target and randomly select two further individuals C_a and C_b from the population, such that C_target, C_a and C_b are mutually distinct; randomly select one 9-residue fragment from C_a and one from C_b, at different positions, and use them to replace the fragments at the corresponding positions of C_target, generating the mutant conformation C_mutant;
7.2) apply one round of fragment assembly to C_mutant to generate a new conformation C_mutant′;
7.3) generate a random number pCR ∈ (0, 1); if pCR < CR, randomly select a 3-residue fragment from C_target and use it to replace the fragment at the corresponding position of C_mutant′, generating the trial conformation C_trial; otherwise, take C_mutant′ directly as C_trial;
7.4) compute the Rosetta score3 energies score3(C_trial) and score3(C_target);
7.5) if score3(C_trial) > score3(C_target), retain C_target;
7.6) if score3(C_trial) < score3(C_target), compute the spatial constraint scores S_con(C_trial) and S_con(C_target) according to equation (1), which defines S_con(C):
S_con(C) = [formula image in the original; not recoverable from this text]   (1)
where m and n are the two residues of the k-th contact in the high-confidence contact set, K is the number of contacts in the high-confidence contact set, d_{m,n} is the Euclidean distance between residue m and residue n in conformation C, and U_{m,n} is the confidence of the contact formed by the residue pair (m, n) in the high-confidence contact set; the high-confidence contact set is contcf1 for individuals of sub-population 1, contcf2 for sub-population 2, contcf3 for sub-population 3, and contcf4 for sub-population 4;
7.7) if S_con(C_trial) < S_con(C_target), C_trial replaces C_target in the population;
7.8) if S_con(C_trial) > S_con(C_target), C_trial replaces C_target in the population with probability P_accept; if the replacement is not made, C_target is retained; P_accept is defined as follows:
P_accept = [formula image in the original; not recoverable from this text]
8) set g = g + 1 and repeat steps 7) and 8) until g > G;
9) output the conformation with the lowest Rosetta score3 energy as the final result.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911197621.2A CN111180004B (en) 2019-11-29 2019-11-29 Multi-contact information sub-population strategy protein structure prediction method

Publications (2)

Publication Number Publication Date
CN111180004A true CN111180004A (en) 2020-05-19
CN111180004B CN111180004B (en) 2021-08-03

Family

ID=70656268

Country Status (1)

Country Link
CN (1) CN111180004B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108846256A (en) * 2018-06-07 2018-11-20 浙江工业大学 A kind of group's Advances in protein structure prediction based on contact residues information
CN109215733A (en) * 2018-08-30 2019-01-15 浙江工业大学 A kind of Advances in protein structure prediction based on contact residues information auxiliary evaluation
CN109360599A (en) * 2018-08-28 2019-02-19 浙江工业大学 A kind of Advances in protein structure prediction based on contact residues information Crossover Strategy
CN110148437A (en) * 2019-04-16 2019-08-20 浙江工业大学 A kind of Advances in protein structure prediction that contact residues auxiliary strategy is adaptive

Non-Patent Citations (2)

Title
GUI-JUN ZHANG et al.: "Secondary Structure and Contact Guided Differential Evolution for Protein Structure Prediction", IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS *
MU GAO et al.: "DESTINI: A deep-learning approach to contact-driven protein structure prediction", SCIENTIFIC REPORTS *

Families Citing this family (3)

Publication number Priority date Publication date Assignee Title
CN112085244A (en) * 2020-07-21 2020-12-15 Zhejiang University of Technology Residue contact map-based multi-objective optimization protein structure prediction method
CN112908408A (en) * 2021-03-03 2021-06-04 Jiangsu Ocean University Protein structure prediction method based on evolutionary algorithm and archive updating
CN112908408B (en) * 2021-03-03 2023-09-22 Jiangsu Ocean University Protein structure prediction method based on evolutionary algorithm and archive updating

Also Published As

Publication number Publication date
CN111180004B (en) 2021-08-03

Similar Documents

Publication Publication Date Title
Su et al. Improved protein structure prediction using a new multi‐scale network and homologous templates
Nguyen et al. Ultra-large alignments using phylogeny-aware profiles
CN108846256B (en) Group protein structure prediction method based on residue contact information
CN107633157B (en) Protein conformation space optimization method based on distribution estimation and copy exchange strategy
CN111180004B (en) Multi-contact information sub-population strategy protein structure prediction method
CN110148437B (en) Residue contact auxiliary strategy self-adaptive protein structure prediction method
Mao et al. AmoebaContact and GDFold as a pipeline for rapid de novo protein structure prediction
CN105760710A (en) Method for predicting protein structure on basis of two-stage differential evolution algorithm
Browning et al. Fast, accurate local ancestry inference with FLARE
Simoncini et al. Efficient sampling in fragment-based protein structure prediction using an estimation of distribution algorithm
CN109872770B (en) Variable strategy protein structure prediction method combined with displacement degree evaluation
Gao et al. High-performance deep learning toolbox for genome-scale prediction of protein structure and function
Feng et al. Artificial intelligence in bioinformatics: Automated methodology development for protein residue contact map prediction
Baldi et al. A machine learning strategy for protein analysis
CN109509510B (en) Protein structure prediction method based on multi-population ensemble variation strategy
CN109346126B (en) Adaptive protein structure prediction method of lower bound estimation strategy
Hong et al. fastmsa: Accelerating multiple sequence alignment with dense retrieval on protein language
Mourad et al. Designing pooling systems for noisy high-throughput protein-protein interaction experiments using Boolean compressed sensing
CN109448786B (en) Method for predicting protein structure by lower bound estimation dynamic strategy
CN109461471B (en) Adaptive protein structure prediction method based on championship mechanism
CN109300505B (en) Protein structure prediction method based on biased sampling
CN110600076B (en) Protein ATP docking method based on distance and angle information
CN116092576A (en) Protein structure optimization method and device
CN110197700B (en) Protein ATP docking method based on differential evolution
CN111161791B (en) Experimental data-assisted adaptive strategy protein structure prediction method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
EE01 Entry into force of recordation of patent licensing contract

Application publication date: 20200519

Assignee: ZHEJIANG ORIENT GENE BIOTECH CO.,LTD.

Assignor: ZHEJIANG UNIVERSITY OF TECHNOLOGY

Contract record no.: X2023980053610

Denomination of invention: A Subpopulation Strategy Protein Structure Prediction Method Based on Multivariate Contact Information

Granted publication date: 20210803

License type: Common License

Record date: 20231222