CN112085244A

CN112085244A - Residue contact map-based multi-objective optimization protein structure prediction method

Info

Publication number: CN112085244A
Application number: CN202010704125.8A
Authority: CN
Inventors: 张贵军; 陈芳; 彭春祥; 李亭; 刘俊; 周晓根
Original assignee: Zhejiang University of Technology ZJUT
Current assignee: Zhejiang University of Technology ZJUT
Priority date: 2020-07-21
Filing date: 2020-07-21
Publication date: 2020-12-15

Abstract

A multi-objective optimization protein structure prediction method based on residue contact maps comprises the following steps. First, residue contact maps of the target protein sequence are predicted with TripletRes, MetaPSICOV, RaptorX and STOP-Contact. Second, a scoring function is designed to initialize the population. Then, the population is sampled by fragment recombination and assembly. Finally, four contact energies, one for each prediction server, are calculated with the designed contact energy function; all conformations in the population are ranked with a weighted scoring function E_total(C_n) based on a harmonic mean and a standard deviation, and the first conformation of the last generation is selected as the prediction result. The invention alleviates protein misfolding caused by an inaccurate single residue contact map prediction, increases diversity, and improves overall prediction accuracy.

Description

Residue contact map-based multi-objective optimization protein structure prediction method

Technical Field

The invention relates to the fields of bioinformatics and computer application, in particular to a residue contact map-based multi-objective optimization protein structure prediction method.

Background

Proteins are the cornerstone of life: almost all cellular activities involve proteins, and the three-dimensional structure of a protein determines its specific biological function. Structural information is therefore crucial in protein research. For example, the catalytic function of an enzyme is performed by a portion of the protein chain, namely the active sites exposed on the protein surface. Protein-protein interactions, and the interactions of proteins with nucleic acids, inhibitors and activators, are likewise confined to specific regions of the protein surface. Consequently, targeted drugs that interact with the protein surface can be designed only if the protein structure is known. At present, protein structures are mainly determined by biological wet-lab experiments, which are complicated to perform, time-consuming and costly. Therefore, predicting the tertiary structure of a protein directly from its primary or secondary structure by computational means is a major problem in structural bioinformatics research.

Protein structure prediction faces two main problems: (1) the difficulty of sampling under a high-dimensional, complex energy function; and (2) the inaccuracy of the energy function model. At present, we are still far from constructing a force field accurate enough to guide the folding of the target sequence in the correct direction, so the mathematically optimal solution does not necessarily correspond to the native structure of the target protein. To address the inaccuracy of the energy function, residue contact information has been used to assist the energy function and thereby improve prediction accuracy. However, currently predicted residue contact information is itself inaccurate, which can easily degrade the overall prediction accuracy.

Therefore, an improvement is needed that addresses the insufficient accuracy of protein structure prediction caused by inaccurate residue contact information.

Disclosure of Invention

In order to solve the problem that the prediction accuracy of a protein structure is insufficient due to inaccurate contact information of a single residue, the invention provides a multi-objective optimization protein structure prediction method based on a residue contact map.

The technical scheme adopted by the invention for solving the technical problems is as follows:

A residue contact map-based multi-objective optimization protein structure prediction method comprises the following steps:

1) inputting the sequence of the target protein, and predicting residue contact maps of the target sequence with the TripletRes server (zhanglab.ccmb.med.umich.edu/TripletRes), the MetaPSICOV server (bioinf.cs.ucl.ac.uk/psipred), the RaptorX server (raptorx.uchicago.edu/ContactMap) and the STOP-Contact server (sparks-lab.org/server/spot-contact);

2) acquiring the 3-residue and 9-residue fragment library files from the ROBETTA server (http://www.robetta.org/) according to the target protein sequence;

3) for each residue contact map, sorting the residue pairs by contact confidence from high to low and selecting the top 2L contacts, where L is the sequence length of the target protein;

4) setting parameters: population size NP, maximum number of iterations G, temperature factor β, and weights ω_T, ω_P, ω_R and ω_S, which correspond to the contact energies calculated from the residue contact maps predicted by the TripletRes, MetaPSICOV, RaptorX and STOP-Contact servers, respectively;

5) setting g = 1, where g ∈ {1, 2, ..., G} is the current generation;

6) population initialization, the process is as follows:

6.1) generating 2NP initial conformations C_n, n ∈ {1, 2, ..., 2NP}, with the first and second stages of the Rosetta protocol;

6.2) calculating the energy of each individual in the population with the Rosetta score3 energy function, the energy of individual C_n being score3(C_n), and calculating four contact energies E_C(C_n) from the residue contact maps predicted by the four contact prediction servers, denoted E_C1(C_n), E_C2(C_n), E_C3(C_n) and E_C4(C_n), where E_C(C_n) is calculated as follows:

where N is the total number of residue pairs,

is the contact confidence of the k-th residue pair (i, j) in the residue contact map,

is the distance between the residues of the k-th pair (i, j) in conformation C_n, and d_con is the distance threshold, set to 8 Å; E_C1(C_n), E_C2(C_n), E_C3(C_n) and E_C4(C_n) are the contact energies calculated from the residue contact maps predicted by the TripletRes, MetaPSICOV, RaptorX and STOP-Contact servers, respectively;

6.3) sorting the individuals in the population by E_total(C_n) from low to high and taking the first NP individuals as the initial population, where E_total(C_n) is calculated as follows:

7) conformation recombination and assembly, as follows:

7.1) for the n-th individual C_n in the population, selecting two conformations C_n1 and C_n2 from the population that differ from the target conformation C_n, randomly selecting three different fragments from these two conformations, and using them to replace the fragments at the corresponding positions of C_n, which generates a new recombined conformation C'_n;

7.2) performing fragment assembly on each recombined conformation C'_n with the 3-residue and 9-residue fragment libraries to obtain a new conformation C''_n;

7.3) if score3(C'_n) > score3(C''_n), replacing C'_n with C''_n; otherwise, accepting the new conformation with the Boltzmann probability. This generates the NP individuals of the child population, which, together with the NP individuals of the parent population, form a new population of 2NP individuals;

8) updating the population: repeating steps 6.2) and 6.3) on the 2NP individuals, and selecting the first NP individuals as the new population;

9) setting g = g + 1; if g > G, executing step 10); otherwise, returning to step 7);

10) outputting the first conformation of the G-th generation as the final prediction result.
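Step 2) relies on 3-residue and 9-residue fragment libraries. The sketch below reads such a file into a dictionary keyed by start position, assuming the classic Rosetta text layout in which ROBETTA fragment files are commonly distributed: a `position: i neighbors: k` header, blank-line-separated fragments, and one residue per line with phi, psi and omega in the sixth to eighth columns. The function name `parse_fragment_file` is hypothetical, and real files may carry extra columns or use a different layout.

```python
from typing import Dict, List, Optional, Tuple

Torsion = Tuple[float, float, float]  # (phi, psi, omega) in degrees
Fragment = List[Torsion]


def parse_fragment_file(text: str) -> Dict[int, List[Fragment]]:
    """Parse fragment-file text into {start_position: [fragment, ...]}.

    Assumed layout: a 'position: <i> neighbors: <k>' header per start position,
    fragments separated by blank lines, and one residue per line with phi, psi
    and omega in the 6th to 8th whitespace-separated columns.
    """
    library: Dict[int, List[Fragment]] = {}
    position: Optional[int] = None
    current: Fragment = []

    def flush() -> None:
        # Store the fragment collected so far (if any) under the current position.
        nonlocal current
        if position is not None and current:
            library.setdefault(position, []).append(current)
        current = []

    for line in text.splitlines():
        tokens = line.split()
        if not tokens:
            flush()  # a blank line closes the current fragment
        elif tokens[0] == "position:":
            flush()
            position = int(tokens[1])
        elif position is not None and len(tokens) >= 8:
            phi, psi, omega = (float(t) for t in tokens[5:8])
            current.append((phi, psi, omega))
    flush()
    return library
```

Parsing the 3-residue and 9-residue files separately gives the two libraries used during fragment assembly.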
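Step 3) keeps, for each predicted contact map, the 2L residue pairs with the highest confidence. The following is a minimal sketch, assuming each server's prediction is available as text in a CASP RR-like layout whose contact lines read `i j d_min d_max probability`; the servers' actual download formats may differ, and `parse_rr` and `select_top_contacts` are hypothetical helper names.

```python
from typing import List, Tuple

Contact = Tuple[int, int, float]  # (i, j, confidence), 1-based residue indices


def parse_rr(text: str) -> List[Contact]:
    """Read contact lines 'i j d_min d_max probability' (assumed RR-like layout)."""
    contacts: List[Contact] = []
    for line in text.splitlines():
        tokens = line.split()
        # Header and sequence lines do not have five fields starting with two integers.
        if len(tokens) == 5 and tokens[0].isdigit() and tokens[1].isdigit():
            contacts.append((int(tokens[0]), int(tokens[1]), float(tokens[4])))
    return contacts


def select_top_contacts(contacts: List[Contact], seq_len: int) -> List[Contact]:
    """Step 3): sort by confidence, high to low, and keep the first 2L pairs."""
    ranked = sorted(contacts, key=lambda c: c[2], reverse=True)
    return ranked[:2 * seq_len]
```

For the 76-residue example protein, this would keep at most 152 pairs per server.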
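Step 6.2) scores each conformation against every server's contact map, but the E_C(C_n) formula itself is not reproduced in this text. The sketch below therefore uses an assumed stand-in, not the patent's formula: the negative, confidence-weighted fraction of the N selected contacts whose inter-residue distance d_k is within d_con = 8 Å. It also assumes one representative coordinate per residue (for example Cβ), which the text does not specify; `contact_energy` is a hypothetical name.

```python
import math
from typing import Sequence, Tuple

Contact = Tuple[int, int, float]  # (i, j, confidence), 1-based residue indices
Coord = Tuple[float, float, float]  # one representative atom per residue (assumed)

D_CON = 8.0  # contact distance threshold in angstroms, as in step 6.2)


def contact_energy(contacts: Sequence[Contact], coords: Sequence[Coord],
                   d_con: float = D_CON) -> float:
    """Hypothetical stand-in for E_C(C_n), NOT the patent's formula.

    Returns minus the confidence-weighted fraction of predicted contacts that
    are satisfied (d_k <= d_con) in the conformation; lower is better.
    """
    if not contacts:
        return 0.0
    satisfied = 0.0
    for i, j, confidence in contacts:
        d_k = math.dist(coords[i - 1], coords[j - 1])
        if d_k <= d_con:
            satisfied += confidence
    return -satisfied / len(contacts)
```

Calling it once with each server's top-2L list would yield four values playing the roles of E_C1(C_n) to E_C4(C_n).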
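Step 7.1) recombines the target conformation with two other population members. The sketch represents a conformation as a list of per-residue (phi, psi, omega) torsion triples. This representation, the window length `frag_len = 9`, and the per-window random choice of donor are all assumptions; the text only says that three different fragments are taken from C_n1 and C_n2 and copied to the corresponding positions of C_n. `recombine` is a hypothetical name.

```python
import random
from typing import List, Sequence, Tuple

Torsion = Tuple[float, float, float]  # (phi, psi, omega) of one residue
Conformation = List[Torsion]


def recombine(population: Sequence[Conformation], n: int, rng: random.Random,
              n_fragments: int = 3, frag_len: int = 9) -> Conformation:
    """Build C'_n by copying n_fragments torsion windows from C_n1/C_n2 into C_n.

    Assumptions: windows have length frag_len, start at distinct random
    positions, and each window is taken from a randomly chosen donor.
    """
    others = [k for k in range(len(population)) if k != n]
    donors = rng.sample(others, 2)  # indices of C_n1 and C_n2, both != n
    child = list(population[n])  # copy, so the parent C_n is left unchanged
    starts = rng.sample(range(len(child) - frag_len + 1), n_fragments)
    for s in starts:
        donor = population[rng.choice(donors)]
        child[s:s + frag_len] = donor[s:s + frag_len]  # same positions in the donor
    return child
```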
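Step 7.2) then applies fragment assembly to C'_n. This minimal sketch again assumes the torsion-list representation and a library in the form `{start_position: [fragment, ...]}` with 1-based start positions. It performs one 9-residue insertion followed by one 3-residue insertion; the text does not state how many insertions the method performs per generation, and `insert_fragment` and `assemble` are hypothetical names.

```python
import random
from typing import Dict, List, Tuple

Torsion = Tuple[float, float, float]
Conformation = List[Torsion]
Library = Dict[int, List[List[Torsion]]]  # 1-based start position -> fragments


def insert_fragment(conf: Conformation, library: Library,
                    rng: random.Random) -> Conformation:
    """Overwrite a random window of conf with a random fragment from the library."""
    # Only consider start positions whose fragments fit inside the chain
    # (fragments at one position are assumed to share the same length).
    starts = [p for p, frags in library.items()
              if frags and p - 1 + len(frags[0]) <= len(conf)]
    start = rng.choice(sorted(starts))
    frag = rng.choice(library[start])
    child = list(conf)
    child[start - 1:start - 1 + len(frag)] = frag
    return child


def assemble(conf: Conformation, lib9: Library, lib3: Library,
             rng: random.Random) -> Conformation:
    """C''_n from C'_n: one 9-residue then one 3-residue insertion (assumed schedule)."""
    return insert_fragment(insert_fragment(conf, lib9, rng), lib3, rng)
```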
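Step 7.3) keeps C''_n when it lowers the score3 energy and otherwise accepts it with a Boltzmann probability. The sketch assumes the Metropolis form exp(-ΔE/β), treating the temperature factor β from step 4) as the temperature. The exact expression is not given in the text, and `accept_move` is a hypothetical name.

```python
import math
import random


def accept_move(e_old: float, e_new: float, beta: float,
                rng: random.Random) -> bool:
    """Return True if the new conformation C''_n should replace C'_n.

    e_old = score3(C'_n), e_new = score3(C''_n), beta = temperature factor.
    Assumed Metropolis rule: always accept a decrease, otherwise accept with
    probability exp(-(e_new - e_old) / beta).
    """
    if e_old > e_new:
        return True
    return rng.random() < math.exp(-(e_new - e_old) / beta)
```

A caller would then keep `c_new if accept_move(score_old, score_new, beta, rng) else c_old` as the child individual.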
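Steps 5) to 10) form a generational loop. The skeleton below shows only that control flow. `init`, `make_child` and `e_total` are hypothetical callbacks standing in for step 6.1), for steps 7.1) to 7.3), and for the E_total(C_n) score, whose formula is not reproduced in this text.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")  # a conformation, in whatever representation the caller uses


def evolve(init: Callable[[int], List[T]],
           make_child: Callable[[List[T], int], T],
           e_total: Callable[[T], float],
           np_size: int, max_gen: int) -> T:
    """Control flow of steps 5) to 10); the three callbacks are hypothetical."""
    # Step 6): create 2NP conformations, rank by E_total (low to high), keep NP.
    population = sorted(init(2 * np_size), key=e_total)[:np_size]
    for _ in range(max_gen):  # steps 5) and 9): generations g = 1, ..., G
        # Step 7): one child per parent (recombination, assembly, acceptance).
        children = [make_child(population, n) for n in range(np_size)]
        # Step 8): rank the 2NP parents and children, keep the first NP.
        population = sorted(population + children, key=e_total)[:np_size]
    # Step 10): the first conformation of generation G is the prediction.
    return population[0]
```

Because parents and children compete in one ranking, a good parent survives unless NP children outrank it, which keeps the best E_total in the population from getting worse across generations.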

The technical concept of the invention is as follows. A multi-objective optimization protein structure prediction method based on residue contact maps comprises the following steps. First, residue contact maps of the target protein sequence are predicted with TripletRes, MetaPSICOV, RaptorX and STOP-Contact. Second, a scoring function is designed to initialize the population. Then, the population is sampled by fragment recombination and assembly. Finally, four contact energies, one for each prediction server, are calculated with the designed contact energy function; all conformations in the population are ranked with a weighted scoring function E_total(C_n) based on a harmonic mean and a standard deviation, and the first conformation of the last generation is selected as the prediction result. The method alleviates protein misfolding caused by an inaccurate single residue contact map prediction, thereby increasing diversity and improving overall prediction accuracy.

The beneficial effects of the invention are as follows: following a multi-objective optimization strategy, the information from four residue contact maps is used to overcome the limitation of evaluating conformations with only a single energy function or a single residue contact map. This increases population diversity and improves overall prediction accuracy.

Drawings

FIG. 1 is a schematic diagram of the conformation updates during structure prediction of protein 1BG8 with the residue contact map-based multi-objective optimization protein structure prediction method.

FIG. 2 is a three-dimensional structure diagram of protein 1BG8 predicted by a residue contact map-based multi-objective optimization protein structure prediction method.

Detailed Description

The invention is further described below with reference to the accompanying drawings.

Referring to FIG. 1 and FIG. 2, the residue contact map-based multi-objective optimization protein structure prediction method comprises steps 1) to 10) described above.

This example uses protein 1BG8, with a sequence length of 76, to illustrate the residue contact map-based multi-objective optimization protein structure prediction method, which comprises the following steps:

4) setting parameters: population size NP = 200, maximum number of iterations G = 500, temperature factor β = 2, and ω_T = 0.213, ω_P = 0.189, ω_R = 0.233 and ω_S = 0.175, the weights of the contact energies calculated from the residue contact maps predicted by the TripletRes, MetaPSICOV, RaptorX and STOP-Contact servers, respectively;


Using protein 1BG8, with a sequence length of 76, as an example, the above method obtains a near-native conformation of the protein. The conformation update process is shown in FIG. 1, and the root-mean-square deviation (RMSD) between the structure obtained after 500 generations and the native structure is

The predicted three-dimensional structure is shown in fig. 2.
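For reference, the example settings of step 4) can be collected in one configuration object. This is a minimal sketch; the class name `PredictionParams` and its field names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass(frozen=True)
class PredictionParams:
    """Example settings of step 4) for protein 1BG8 (field names are hypothetical)."""
    np_size: int = 200    # population size NP
    max_gen: int = 500    # maximum number of iterations G
    beta: float = 2.0     # temperature factor β
    # Weights ω_T, ω_P, ω_R, ω_S of the four servers' contact energies.
    weights: Dict[str, float] = field(default_factory=lambda: {
        "TripletRes": 0.213,
        "MetaPSICOV": 0.189,
        "RaptorX": 0.233,
        "STOP-Contact": 0.175,
    })
```

The four example weights sum to 0.81 rather than 1, so any code that assumes normalized weights would need to account for that.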

While the foregoing illustrates one embodiment of the invention showing advantageous results, it will be apparent that the invention is not limited to the above-described embodiment, but is capable of numerous modifications without departing from the basic inventive concepts and without exceeding the scope of the inventive concepts.

Claims

1. A protein structure prediction method based on multi-objective optimization, characterized by comprising the following steps:

1) inputting the sequence of a target protein, and predicting residue contact maps of the target sequence with the TripletRes server, the MetaPSICOV server, the RaptorX server and the STOP-Contact server;

2) acquiring the 3-residue and 9-residue fragment library files from the ROBETTA server according to the target protein sequence;
