CN110729023B

CN110729023B - Protein structure prediction method based on contact assistance of secondary structure elements

Info

Publication number: CN110729023B
Application number: CN201910805005.4A
Authority: CN
Inventors: 张贵军; 刘俊; 彭春祥; 饶亮; 周晓根; 胡俊
Original assignee: Zhejiang University of Technology ZJUT
Current assignee: Zhejiang University of Technology ZJUT
Priority date: 2019-08-29
Filing date: 2019-08-29
Publication date: 2021-04-06
Anticipated expiration: 2039-08-29
Also published as: CN110729023A

Abstract

A protein structure prediction method assisted by contacts between secondary structure elements comprises the following steps. First, contact information between secondary structure elements is extracted from the predicted secondary structure and the residue contact map. Then, a sequential sampling strategy samples the secondary structure elements and the loop regions connecting them in turn. This quickly yields a conformation with a more accurate secondary structure, essentially correct spatial relationships between secondary structure elements, and higher topological accuracy. Finally, the structure is further optimized with an energy function, which improves the overall prediction accuracy. The invention thus provides a high-accuracy protein structure prediction method assisted by secondary structure element contacts.

Description

Protein structure prediction method based on contact assistance of secondary structure elements

Technical Field

The invention relates to the fields of bioinformatics and computer applications, and in particular to a protein structure prediction method assisted by contacts between secondary structure elements.

Background

With the completion of the Human Genome Project, humans have deciphered the process by which DNA is transcribed and translated into amino acid sequences (the "first genetic code"). However, how a protein folds from its amino acid sequence into a specific three-dimensional structure (the "second genetic code") remains an unsolved puzzle. A protein's structure determines its specific biological function. Obtaining protein structures efficiently is therefore essential for understanding biological function, designing drugs, and treating disease.

At present, protein three-dimensional structures are mainly obtained by experimental determination, chiefly X-ray crystallography, nuclear magnetic resonance (NMR) spectroscopy and cryo-electron microscopy. These methods are complex and demand a great deal of time and money. For most drug target proteins, the three-dimensional structure is difficult to determine experimentally.

The amino acid sequence of a protein contains the information that determines its three-dimensional structure. Artificial intelligence is developing rapidly, and predicting structure by computer has become a clear trend. Such methods mine prior knowledge from known protein databases on the basis of sequence information. They then simulate the folding process to predict a protein's three-dimensional structure directly from its amino acid sequence. Many research institutes worldwide are dedicated to this problem, using biological data, artificial intelligence and systems optimization techniques, and the work is gradually being applied to disease diagnosis and drug design. Representative research teams include David Baker's laboratory at the University of Washington and the Zhang laboratory at the University of Michigan. A growing number of universities and research institutions in China have also joined research on protein structure prediction.

Inter-residue contact prediction has progressed rapidly, and most protein structure prediction methods now use inter-residue contact information to improve prediction accuracy. Secondary structure elements have pronounced local characteristics. The positional relationships among them directly determine the accuracy of the protein's topology. However, current methods consider only contacts between individual residues and ignore spatial constraints between secondary structure elements.

Current protein structure prediction methods therefore need to be improved to account for spatial constraints between secondary structure elements.

Disclosure of Invention

Conventional protein structure prediction methods yield protein topologies of low accuracy. To overcome this defect, the invention provides a protein structure prediction method assisted by contacts between secondary structure elements. First, contact information between secondary structure elements is extracted from the predicted secondary structure and the residue contact map. Next, a sequential sampling strategy rapidly samples the protein conformational space to generate a conformation with higher topological accuracy. Finally, the structure is further optimized with an energy function to improve the overall prediction accuracy.

The technical scheme adopted by the invention for solving the technical problems is as follows:

A protein structure prediction method assisted by contacts between secondary structure elements comprises the following steps:

1) inputting the target sequence, the fragment library, the predicted secondary structure and the predicted residue contact map of the protein to be predicted;

2) extracting contact information between secondary structure elements, as follows:

2.1) based on the predicted secondary structure, labeling each secondary structure element as SS_i, where i ∈ {1, 2, …, m} and m is the number of secondary structure elements. The loop region between two adjacent secondary structure elements SS_k and SS_{k+1} is denoted L_k, where k ∈ {1, 2, …, m−1};

2.2) for each residue pair in the residue contact map, if the two residues belong to two different secondary structure elements SS_i and SS_j, assigning the pair to the contact set CSS_{i,j} of SS_i and SS_j, where i, j ∈ {1, 2, …, m} and i < j;
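Step 2 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patent's implementation. The predicted secondary structure is assumed to be a three-state string (H = helix, E = strand, C = coil), and every maximal run of H or E is treated as one element. The residue contact map is assumed to be already thresholded into a list of 0-based residue index pairs. The function names `label_sses` and `contact_sets` are hypothetical.

```python
from collections import defaultdict
from itertools import groupby
from typing import Dict, List, Tuple

Segment = Tuple[int, int, str]  # (start, end, state), 0-based, inclusive


def label_sses(ss: str) -> Tuple[List[Segment], List[Tuple[int, int]]]:
    """Step 2.1: split a 3-state string into elements SS_1..SS_m and loops L_1..L_{m-1}."""
    sses: List[Segment] = []
    pos = 0
    for state, run in groupby(ss):
        n = len(list(run))
        if state in ("H", "E"):  # assumption: each maximal H or E run is one element
            sses.append((pos, pos + n - 1, state))
        pos += n
    # L_k lies between SS_k and SS_{k+1}; it is empty (start > end) if the elements touch
    loops = [(sses[k][1] + 1, sses[k + 1][0] - 1) for k in range(len(sses) - 1)]
    return sses, loops


def contact_sets(sses: List[Segment],
                 contacts: List[Tuple[int, int]]) -> Dict[Tuple[int, int], List[Tuple[int, int]]]:
    """Step 2.2: group contacts joining two different elements into CSS_{i,j} with i < j."""
    owner = {}  # residue index -> 1-based element number
    for idx, (start, end, _) in enumerate(sses, start=1):
        for r in range(start, end + 1):
            owner[r] = idx
    css = defaultdict(list)
    for a, b in contacts:
        i, j = owner.get(a), owner.get(b)
        if i is None or j is None or i == j:
            continue  # contact touches a loop or stays inside one element
        if i > j:
            i, j, a, b = j, i, b, a  # store every pair in (SS_i, SS_j) order
        css[(i, j)].append((a, b))
    return dict(css)
```

For the string `"CCHHHHCCEEEECCHHHHC"`, this yields three elements covering residues 2 to 5, 8 to 11 and 14 to 17, and two loops, (6, 7) and (12, 13). Given the contacts `[(2, 8), (3, 9), (15, 10), (4, 5)]`, it builds CSS_{1,2} = [(2, 8), (3, 9)] and CSS_{2,3} = [(10, 15)]. The intra-element pair (4, 5) is discarded.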

3) setting parameters: the maximum number of fragment assemblies for secondary structure sampling, G_S; the maximum number of fragment assemblies for loop region sampling, G_L; and the maximum number of iterations of energy-guided fragment assembly, G;
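The parameters of step 3 can be gathered into one configuration object, together with the fragment lengths and sampling probabilities used in later steps. The following is a minimal Python sketch. The class and field names are hypothetical. The defaults are the values given for the 2BL7 embodiment (G_S = 500, G_L = 500, G = 2000) and in step 6.1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SamplingParams:
    """Hypothetical container for the step 3 and step 6.1 parameters."""
    g_s: int = 500             # G_S: max assemblies per element in step 4
    g_l: int = 500             # G_L: max assemblies per loop in step 5
    g: int = 2000              # G: max energy-guided assemblies in step 6
    coarse_frag_len: int = 9   # fragment length used in steps 4 and 5
    fine_frag_len: int = 3     # fragment length used in step 6
    ss_prob: float = 0.3       # step 6.1 sampling probability for element residues
    loop_prob: float = 1.0     # step 6.1 sampling probability for loop residues

    def __post_init__(self) -> None:
        # Reject settings that would make the sampling loops meaningless.
        if min(self.g_s, self.g_l, self.g) <= 0:
            raise ValueError("iteration limits must be positive")
        if not (0.0 < self.ss_prob <= 1.0 and 0.0 < self.loop_prob <= 1.0):
            raise ValueError("sampling probabilities must lie in (0, 1]")
```

Freezing the dataclass means a single parameter set cannot be changed partway through a run.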

4) sequential sampling based on agreement with the predicted secondary structure, as follows:

4.1) sampling secondary structure element SS₁ by 9-residue fragment assembly so that SS₁ satisfies the predicted secondary structure constraint as fully as possible. For example, if SS₁ is predicted to be an α-helix, sampling continues until every residue in SS₁ is helical or the maximum number of assemblies G_S is reached;

4.2) sampling SS₂ through SS_m in turn in the same way as step 4.1), which yields a conformation with a more accurate secondary structure;
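Step 4 amounts to a per-element loop of fragment insertions, sketched here in Python. The sketch rests on several assumptions:

- The conformation type, the fragment insertion and the secondary structure assignment (for example, a DSSP-style assignment) are left as hypothetical callables.
- An insertion window is taken to be any 9-residue window that overlaps the current element.
- A trial is kept only when it does not reduce agreement with the prediction. This acceptance rule is an assumption; the patent states only the stopping condition.

```python
import random
from typing import Callable, List, Optional, Sequence, Tuple

Segment = Tuple[int, int, str]  # (start, end, predicted state), 0-based, inclusive


def match_fraction(ss: Sequence[str], seg: Segment) -> float:
    """Fraction of the element's residues whose current state equals the predicted one."""
    start, end, target = seg
    hits = sum(1 for r in range(start, end + 1) if ss[r] == target)
    return hits / (end - start + 1)


def sample_sses(conf, sses: List[Segment],
                insert_fragment: Callable,  # hypothetical: (conf, window_start, rng) -> new conf
                assign_ss: Callable,        # hypothetical: conf -> per-residue state string
                n_res: int, g_s: int = 500, frag_len: int = 9,
                rng: Optional[random.Random] = None):
    """Step 4: sample SS_1, ..., SS_m in order until each matches or G_S tries are used."""
    rng = rng or random.Random()
    for seg in sses:
        start, end, _ = seg
        lo = max(0, start - frag_len + 1)  # first window start overlapping the element
        hi = min(end, n_res - frag_len)    # last window start overlapping the element
        best = match_fraction(assign_ss(conf), seg)
        for _ in range(g_s):
            if best == 1.0:  # every residue already has the predicted state
                break
            trial = insert_fragment(conf, rng.randint(lo, hi), rng)
            score = match_fraction(assign_ss(trial), seg)
            if score >= best:  # assumed greedy rule: never lose agreement
                conf, best = trial, score
    return conf
```

Consider a toy setup in which the conformation is represented directly by its state string and the library holds only a helical 9-mer. Here the loop is expected to turn residues 2 to 8 into `H` well within G_S attempts, because every insertion can only add helical residues.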

5) sequential sampling based on contact constraints between secondary structure elements, as follows:

5.1) sampling the loop region L₁ between SS₁ and SS₂ by 9-residue fragment assembly until the spatial relationship between SS₁ and SS₂ satisfies the contact set CSS_{1,2}, or until the maximum number of assemblies G_L is reached;

5.2) sampling L₂ until the spatial relationships of SS₃ with SS₁ and SS₂ satisfy the contact sets CSS_{1,3} and CSS_{2,3}, or until the maximum number of assemblies G_L is reached;

5.3) continuing in the same way, sampling L_k until the spatial relationships of SS_{k+1} with SS₁ through SS_k satisfy the contact sets CSS_{1,k+1} through CSS_{k,k+1}, or until the maximum number of assemblies G_L is reached;

5.4) once all loop regions L_k, k ∈ {1, 2, …, m−1}, have been sampled, the result is a conformation with high topological accuracy: its secondary structure is accurate and the spatial relationships between secondary structure elements are essentially correct;
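The contact check and loop sampling of step 5 can be sketched in Python. The sketch depends on assumptions that the patent does not state:

- A residue pair counts as in contact when the distance between its representative atoms (for example, Cβ) is below 8 Å.
- A contact set counts as satisfied when at least a fraction `min_frac` of its pairs are in contact.
- Insertion windows are 9-residue windows that overlap the loop.
- A trial is kept only if it does not lower the fraction of satisfied pairs.

`insert_fragment` and `get_coords` are hypothetical callables.

```python
import math
import random
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Coord = Tuple[float, float, float]
Pairs = List[Tuple[int, int]]


def satisfied_fraction(coords: Sequence[Coord], pairs: Pairs, cutoff: float = 8.0) -> float:
    """Fraction of residue pairs closer than `cutoff` (1.0 for an empty set)."""
    if not pairs:
        return 1.0
    ok = sum(1 for a, b in pairs if math.dist(coords[a], coords[b]) < cutoff)
    return ok / len(pairs)


def sample_loops(conf, loops: List[Tuple[int, int]], css: Dict[Tuple[int, int], Pairs],
                 insert_fragment: Callable,  # hypothetical: (conf, window_start, rng) -> new conf
                 get_coords: Callable,       # hypothetical: conf -> per-residue coordinates
                 n_res: int, g_l: int = 500, frag_len: int = 9, cutoff: float = 8.0,
                 min_frac: float = 1.0, rng: Optional[random.Random] = None):
    """Step 5: sample L_1, ..., L_{m-1} so that SS_{k+1} agrees with SS_1 ... SS_k."""
    rng = rng or random.Random()
    for k, (ls, le) in enumerate(loops, start=1):
        sets = [css.get((i, k + 1), []) for i in range(1, k + 1)]  # CSS_{1,k+1} ... CSS_{k,k+1}
        pooled = [p for s in sets for p in s]
        lo = max(0, ls - frag_len + 1)           # windows overlapping L_k
        hi = min(max(ls, le), n_res - frag_len)
        best = satisfied_fraction(get_coords(conf), pooled, cutoff)
        for _ in range(g_l):
            coords = get_coords(conf)
            if all(satisfied_fraction(coords, s, cutoff) >= min_frac for s in sets):
                break  # every required contact set is satisfied
            trial = insert_fragment(conf, rng.randint(lo, hi), rng)
            score = satisfied_fraction(get_coords(trial), pooled, cutoff)
            if score >= best:  # assumed greedy acceptance
                conf, best = trial, score
    return conf
```

With `min_frac = 1.0`, the stopping test requires every pair in CSS_{1,k+1} through CSS_{k,k+1} to be in contact, which is the strictest reading of step 5.3).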

6) further optimizing the structure with the Rosetta score3 energy function, as follows:

6.1) setting sampling probabilities: every residue in a secondary structure element SS_i is given a sampling probability of 0.3, and every residue in a loop region is given a sampling probability of 1;

6.2) using 3-residue fragment assembly, randomly sampling positions along the whole protein sequence according to these sampling probabilities. For each fragment insertion, the energy of the conformation before and after the insertion is computed with the Rosetta score3 energy function, and the Boltzmann criterion decides whether the insertion is accepted;

6.3) repeating step 6.2) until the maximum number of iterations G is reached;
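Step 6 is a Monte Carlo fragment-assembly loop. The Python sketch below is illustrative only, and its assumptions are as follows:

- `energy` stands in for a Rosetta score3 evaluation; no Rosetta API is shown.
- `insert_fragment` is a hypothetical 3-residue insertion.
- The step 6.1 probabilities are interpreted as relative weights for choosing where to insert.
- Residues outside any element are treated as loop residues.
- The temperature `kt` is a hypothetical parameter, because the patent does not give one.

```python
import math
import random
from typing import Callable, List, Optional, Sequence, Tuple

Segment = Tuple[int, int, str]  # (start, end, state), 0-based, inclusive


def residue_weights(n_res: int, sses: Sequence[Segment],
                    ss_prob: float = 0.3, loop_prob: float = 1.0) -> List[float]:
    """Step 6.1: per-residue sampling weights; residues outside elements count as loop."""
    weights = [loop_prob] * n_res
    for start, end, _ in sses:
        for r in range(start, end + 1):
            weights[r] = ss_prob
    return weights


def metropolis_accept(e_old: float, e_new: float, kt: float, rng: random.Random) -> bool:
    """Boltzmann criterion: accept downhill moves, uphill ones with probability exp(-dE/kT)."""
    delta = e_new - e_old
    return delta <= 0 or rng.random() < math.exp(-delta / kt)


def refine(conf, weights: Sequence[float],
           insert_fragment: Callable,  # hypothetical: (conf, window_start, rng) -> new conf
           energy: Callable,           # stand-in for a Rosetta score3 evaluation
           g: int = 2000, frag_len: int = 3, kt: float = 1.0,
           rng: Optional[random.Random] = None):
    """Steps 6.2 and 6.3: G weighted insertions, each accepted by the Boltzmann criterion."""
    rng = rng or random.Random()
    starts = list(range(len(weights) - frag_len + 1))
    # assumption: a window's weight is the mean weight of the residues it covers
    start_w = [sum(weights[s:s + frag_len]) / frag_len for s in starts]
    e = energy(conf)
    for _ in range(g):
        s = rng.choices(starts, weights=start_w, k=1)[0]
        trial = insert_fragment(conf, s, rng)
        e_trial = energy(trial)
        if metropolis_accept(e, e_trial, kt, rng):
            conf, e = trial, e_trial
    return conf, e
```

Because loop residues carry more than three times the weight of element residues, most insertions land in loops. The secondary structure settled in step 4 is therefore disturbed less often.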

7) outputting the resulting conformation as the prediction result.

The beneficial effects of the invention are as follows. First, contact information between secondary structure elements is extracted from the predicted secondary structure and the residue contact map. Then, a sequential sampling strategy samples the secondary structure elements and the loop regions connecting them in turn. This quickly yields a conformation with a more accurate secondary structure, essentially correct spatial relationships between secondary structure elements, and higher topological accuracy. Finally, the structure is further optimized with an energy function, which improves the overall prediction accuracy.

Drawings

FIG. 1 is a schematic diagram of how contacts between secondary structure elements are extracted in the protein structure prediction method assisted by secondary structure element contacts. A curve connecting two secondary structure elements indicates that residues in those two elements are in contact.

FIG. 2 shows the RMSD distribution of the conformations sampled when protein 2BL7 is predicted with the protein structure prediction method assisted by secondary structure element contacts.

FIG. 3 shows the three-dimensional structure of protein 2BL7 predicted with the protein structure prediction method assisted by secondary structure element contacts.

Detailed Description

The invention is further described below with reference to the accompanying drawings.

Referring to FIGS. 1, 2 and 3, a protein structure prediction method assisted by contacts between secondary structure elements comprises the following steps:

4.1) sampling secondary structure element SS₁ by 9-residue fragment assembly so that SS₁ satisfies the predicted secondary structure constraint as fully as possible. For example, if SS₁ is predicted to be an α-helix, sampling continues until every residue in SS₁ is helical or the maximum number of assemblies G_S is reached;

5.2) sampling L₂ until the spatial relationships of SS₃ with SS₁ and SS₂ satisfy the contact sets CSS_{1,3} and CSS_{2,3}, or until the maximum number of assemblies G_L is reached;

6.3) repeating step 6.2) until the maximum number of iterations G is reached;

7) outputting the resulting conformation as the prediction result.

In this embodiment, protein 2BL7, with a sequence length of 79, is taken as an example. The protein structure prediction method assisted by contacts between secondary structure elements comprises the following steps:

1) inputting the target sequence, the fragment library, the predicted secondary structure and the predicted residue contact map of protein 2BL7;

3) setting parameters: the maximum number of assemblies for secondary structure sampling, G_S = 500; the maximum number of assemblies for loop region sampling, G_L = 500; and the maximum number of iterations of energy-guided fragment assembly, G = 2000;

4.2) sampling SS₂ through SS_m in turn in the same way as step 4.1), which yields a conformation with a more accurate secondary structure;

6.3) repeating step 6.2) until the maximum number of iterations G is reached;

7) outputting the resulting conformation as the prediction result.

Protein 2BL7, with an amino acid sequence length of 79, is again taken as an example, and a near-native conformation is predicted by the above method. The conformation updates during sampling are shown in FIG. 2. The root mean square deviation (RMSD) of the predicted structure is …, and the predicted structure is shown in FIG. 3.
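The RMSD reported here is conventionally computed after optimal rigid-body superposition of the predicted structure onto the native one, as written below. The text does not say which atoms are compared. The sketch assumes N corresponding Cα atoms, with predicted positions x_i and native positions y_i, minimized over a rotation R and a translation t.

```latex
\mathrm{RMSD} = \min_{R,\,\mathbf{t}} \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left\lVert R\,\mathbf{x}_i + \mathbf{t} - \mathbf{y}_i \right\rVert^{2}}
```

A lower RMSD means the prediction is closer to the experimentally determined structure.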

The foregoing describes the prediction results of one embodiment of the invention. The invention is not limited to this embodiment and may be modified in various ways without departing from its basic idea or exceeding its scope.

Claims

1. A protein structure prediction method assisted by contacts between secondary structure elements, characterized by comprising the following steps:

4.1) sampling secondary structure element SS₁ by 9-residue fragment assembly so that SS₁ satisfies the predicted secondary structure constraint as fully as possible. If SS₁ is predicted to be an α-helix, sampling continues until every residue in SS₁ is helical or the maximum number of assemblies G_S is reached;

6.3) repeating step 6.2) until the maximum number of iterations G is reached;

7) outputting the resulting conformation as the prediction result.