CN110197700B

CN110197700B - A Differential Evolution-Based Protein ATP Docking Method

Info

Publication number: CN110197700B
Application number: CN201910302641.5A
Authority: CN
Inventors: 饶亮; 张贵军; 刘俊; 彭春祥; 胡俊; 周晓根
Original assignee: Zhejiang University of Technology ZJUT
Current assignee: Zhejiang University of Technology ZJUT
Priority date: 2019-04-16
Filing date: 2019-04-16
Publication date: 2021-04-06
Anticipated expiration: 2039-04-16
Also published as: CN110197700A

Abstract

A protein-ATP docking method based on differential evolution. First, the ATPbind server is used to predict the protein-ATP binding residues, which improves the prediction accuracy of the spatial structure of the complex. Next, through the design of the population individuals, the original protein-ATP structure prediction problem is converted into an optimization problem of searching for the best individual, which reduces the computational cost. Finally, a differential evolution algorithm searches for the best individual, which improves the prediction accuracy of the protein-ATP complex structure. The invention thus provides a differential evolution-based protein-ATP docking method with low computational cost and high search efficiency.

Description

Protein ATP docking method based on differential evolution

Technical Field

The invention relates to the fields of bioinformatics, intelligent optimization and computer application, in particular to a protein ATP docking method based on differential evolution.

Background

As research on proteins has deepened, it has become clear that the binding of proteins to small molecules or ligands is ubiquitous. In particular, the binding of proteins to energy-carrying molecules occurs throughout living systems, so it is necessary to study the characteristics and rules of protein-ligand binding. ATP (adenosine triphosphate) is an unstable, high-energy compound. Its hydrolysis releases a large amount of energy, making it the most direct energy source in organisms. In the cell, ATP and ADP are interconverted to store and release energy, which guarantees the energy supply for the cell's vital activities. Many important physiological processes, such as cell cycle regulation, anabolism, signal transduction, and the transmission of genetic information, depend on interaction and recognition between proteins and ligand molecules. Molecular docking methods are therefore important for studying the molecular mechanisms of life activities, predicting the structures of biomolecular complexes, screening targeted drugs, and so on.

Classical thermodynamics holds that the complex formed by the interaction of a protein and a ligand molecule should adopt the conformation with the lowest binding free energy. Searching for that lowest-energy conformation quickly and accurately is therefore critical for protein-ligand docking.

Molecular docking calculations therefore require a mathematical model or function that computes the binding free energy as accurately as possible, together with an efficient search algorithm that quickly finds conformations of very low free energy. Conformation search in molecular docking is an extremely complex problem: a low-energy conformation must be found, and the many possible poses must be explored in a short time. Fast and effective search algorithms are thus an important research area in molecular docking. Conformation search methods for protein-ligand docking fall into two main categories: fast exhaustive search and heuristic search. The region where the ligand interacts may lie anywhere on the surface of the molecule, so a global search is often required, either by traversing the possible positions with a fast exhaustive search or by performing an approximate global search with a heuristic algorithm.

Although a fast exhaustive algorithm can search the whole conformational space quickly, it also introduces many incorrect conformations, which makes it harder to identify the correct ones. A heuristic search algorithm applies random translations and rotations to the ligand molecule in the docking system, optimizes the resulting ligand conformations and accepts or rejects them according to the energy score, and finally finds the ligand conformation with the lowest energy. The heuristic Monte Carlo algorithm is a general search method that samples the ligand conformational space randomly and is not affected by the structure or distribution of that space. However, it may need a long computation time to produce a good solution. The RosettaDock program (Wang C, Schueler-Furman O, Baker D. Improved side-chain modeling for protein-protein docking [J]. Protein Science, 2005, 14(5): 1328-.

Therefore, the existing protein ATP molecular docking method has defects in calculation cost and search efficiency, and needs to be improved.

Disclosure of Invention

In order to overcome the defects of the existing protein and ATP docking method in the aspects of calculation cost and prediction accuracy, the invention provides a protein ATP docking method based on a differential evolution algorithm, which is low in calculation cost and high in prediction accuracy.

The technical scheme adopted by the invention for solving the technical problems is as follows:

A differential evolution-based protein-ATP docking method, comprising the following steps:

1) inputting structural information of protein and ATP, and respectively marking as R and A;

2) for the input structural information R, using the ATPbind server (https://zhanglab.ccmb.med.umich.edu/ATPbind/) to predict the protein-ATP binding residue sites, and obtaining the n residues of the protein that bind ATP, denoted r₁, r₂, ..., r_n;

3) from the coordinates of the central carbon atoms C_α of r₁, r₂, ..., r_n, clustering to obtain a center point C_R; from the coordinates of all atoms in A, clustering to obtain a center point C_A; and moving ATP so that the coordinates of C_A and C_R coincide;

4) clustering the atoms of A, according to their coordinates, into three center points, called pseudo-atoms [the three symbols denoting them are not reproduced in the source];

5) for each ATP molecule A^(j), j = 1, 2, ..., N, in the PDB database, clustering all of its atoms, according to their coordinates, into three center points in one-to-one correspondence with the three pseudo-atoms of step 4) [symbols not reproduced in the source], wherein N is the number of ATP molecules in the PDB database;

6) for each center point of each ATP molecule in the PDB database, calculating the distance between that center point and the C_α atom of each residue of type T to which it binds [symbols not reproduced in the source], wherein T is one of the amino acid residue types occurring in the PDB;

7) for any residue type T, calculating the average interaction distance between T and the k-th center point C_k, k = 1, 2, 3, over all ATP molecules in the PDB database, denoted D(C_k, T) [the defining equation and its "wherein" clause are not reproduced in the source];

8) according to step 7), calculating the average interaction distance D(C_k, T) between every residue type T in the PDB database and the ATP center points C_k to which it binds;

9) parameter setting: setting the population size NP, the scaling factor F, the crossover probability CR and the maximum number of iterations G_max, and initializing the iteration counter G = 0;

10) population initialization: randomly generating an initial population P = {S₁, S₂, ..., S_i, ..., S_NP}, where S_i = (s_i,1, s_i,2, s_i,3, s_i,4, s_i,5, s_i,6) is the i-th individual of P and s_i,1, s_i,2, s_i,3, s_i,4, s_i,5 and s_i,6 are its six elements; the value range of s_i,1, s_i,2 and s_i,3 is [range not reproduced in the source], and the value range of s_i,4, s_i,5 and s_i,6 is 0 to 2π;

11) for each individual S_i in the population, docking the protein with ATP as follows and calculating the individual's score, score(S_i):

11.1) calculating a three-dimensional rotation matrix R from the last three elements s_i,4, s_i,5 and s_i,6 of S_i [matrix not reproduced in the source];

11.2) rotating the coordinates of the three pseudo-atoms with the rotation matrix R to obtain three rotated three-dimensional coordinates [symbols not reproduced in the source];

11.3) translating the rotated coordinates according to the first three elements s_i,1, s_i,2 and s_i,3 of S_i to calculate the new three-dimensional coordinates C'₁, C'₂, C'₃ [translation equation not reproduced in the source], wherein C'_k is the three-dimensional coordinate obtained after translation, k = 1, 2, 3;

11.4) according to step 8), calculating score(S_i):

score(S_i) = ∑|D_kT - D(C_k, T)|

wherein D_kT is the distance between C'_k and the C_α atom of a residue of type T, k = 1, 2, 3;

12) according to the differential evolution algorithm, processing each individual S_i, i ∈ {1, 2, ..., NP}, in the population P as follows:

12.1) randomly selecting three different individuals S_a, S_b and S_c from the current population P, where a ≠ b ≠ c ≠ i, and generating a mutant individual S_mutant according to the following equation:

S_mutant = S_a + F·(S_b - S_c)

12.2) copying the elements of S_i into a crossover individual S_cross; then randomly selecting one element s_cross,j from the six elements of S_cross and replacing it with the corresponding element s_mutant,j of S_mutant; finally, for each element of S_cross, using a random number R generated between 0 and 1 to decide whether to replace it with the corresponding element of S_mutant: if R < CR, the element is replaced, otherwise it is not;

12.3) according to step 11), calculating the scores score(S_cross) and score(S_i) of S_cross and S_i respectively;

12.4) if score(S_cross) < score(S_i), replacing S_i in the population P with S_cross; otherwise S_i remains in the population P;

13) setting G = G + 1; if G > G_max, taking the individual S_low with the lowest score in the current population P, rotating and translating the coordinates of all atoms in A according to the elements of S_low, and outputting the resulting coordinates as the final ligand position information; otherwise, returning to step 12).
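The search loop of steps 9) to 13) can be sketched in Python as follows. This is an illustrative sketch, not the patented implementation: the function name `differential_evolution`, the box bounds `lower` and `upper`, the default parameter values and the `seed` argument are all assumptions, and the score is passed in as an arbitrary callable. The text does not say how elements that leave their value range are handled, so the sketch leaves them unclipped.

```python
import numpy as np

def differential_evolution(score, lower, upper, NP=30, F=0.5, CR=0.5,
                           G_max=100, seed=0):
    """Minimise `score` following steps 9) to 13); needs NP >= 4."""
    rng = np.random.default_rng(seed)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    dim = lower.size
    # Step 10: random initial population inside the value ranges.
    P = lower + rng.random((NP, dim)) * (upper - lower)
    scores = np.array([score(S) for S in P])
    # Step 13 stops once G exceeds G_max, so step 12 runs G_max + 1 times.
    for G in range(G_max + 1):
        for i in range(NP):
            # Step 12.1: three distinct individuals, all different from i.
            candidates = [idx for idx in range(NP) if idx != i]
            a, b, c = rng.choice(candidates, size=3, replace=False)
            S_mutant = P[a] + F * (P[b] - P[c])
            # Step 12.2: start from S_i, force one mutant element, then
            # take each element from the mutant with probability CR.
            S_cross = P[i].copy()
            j = rng.integers(dim)
            mask = rng.random(dim) < CR
            mask[j] = True
            S_cross[mask] = S_mutant[mask]
            # Steps 12.3 and 12.4: keep the crossover individual only
            # if its score is strictly lower.
            s_cross = score(S_cross)
            if s_cross < scores[i]:
                P[i], scores[i] = S_cross, s_cross
    best = int(np.argmin(scores))
    return P[best], scores[best]
```

For docking, `score` would be the function of step 11) and the search space would have six dimensions, with the last three bounded by 0 and 2π.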

The technical conception of the invention is as follows. First, the ATPbind server is used to predict the protein-ATP binding residues, which improves the prediction accuracy of the spatial structure of the complex. Next, through the design of the population individuals, the original protein-ATP structure prediction problem is converted into an optimization problem of searching for the best individual, which reduces the computational cost. Finally, a differential evolution algorithm searches for the best individual, which improves the prediction accuracy of the protein-ATP complex structure. The invention thus provides a differential evolution-based protein-ATP docking method with low computational cost and high search efficiency.
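One way to read the six-element individual is as a rigid-body pose: three translation components and three rotation angles. The Python sketch below decodes such an individual and evaluates the score of step 11.4). Because the rotation matrix and translation equation are not reproduced in this text, the rotation convention (about the x, y and z axes in turn), the rotation about a supplied `center`, the zero-based pseudo-atom index, the dictionary `D_ref` and all function names are assumptions made for illustration.

```python
import numpy as np

def rotation_matrix(ax, ay, az):
    """Rotation about x, then y, then z (an assumed convention)."""
    cx, sx = np.cos(ax), np.sin(ax)
    cy, sy = np.cos(ay), np.sin(ay)
    cz, sz = np.cos(az), np.sin(az)
    Rx = np.array([[1, 0, 0], [0, cx, -sx], [0, sx, cx]])
    Ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    Rz = np.array([[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]])
    return Rz @ Ry @ Rx

def apply_individual(S, coords, center):
    """Rotate `coords` about `center` by S[3:6], then translate by S[0:3]."""
    S = np.asarray(S, dtype=float)
    R = rotation_matrix(*S[3:6])
    return (coords - center) @ R.T + center + S[:3]

def pose_score(S, pseudo_atoms, center, ca_coords, res_types, D_ref):
    """Sum of |D_kT - D(C_k, T)| over pseudo-atoms k and binding residues.

    D_ref maps (k, residue type) to the average distance D(C_k, T),
    with k counted from 0 (hypothetical data layout).
    """
    moved = apply_individual(S, pseudo_atoms, center)
    total = 0.0
    for k, c_k in enumerate(moved):
        for ca, T in zip(ca_coords, res_types):
            D_kT = np.linalg.norm(c_k - ca)
            total += abs(D_kT - D_ref[(k, T)])
    return total
```

The same `apply_individual` call, applied to all atom coordinates of A with the lowest-scoring individual, would give the output of step 13).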

The beneficial effects of the invention are as follows. On the one hand, the ATPbind server is used to predict the protein-ATP binding residues, which improves the prediction accuracy of the spatial structure of the protein-ATP complex. On the other hand, the protein-ATP docking prediction problem is converted into an optimization problem of selecting the best individual, and the best individual is searched for with a differential evolution algorithm, which improves the efficiency and accuracy of protein-ATP docking prediction.
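The predicted binding residues enter the method through the preparation in steps 3) to 8). A minimal Python sketch of that preparation is given below, under stated assumptions: each single "clustered" center point is taken to be the arithmetic mean, the three pseudo-atoms come from a plain k-means with farthest-point initialization (the text names no clustering algorithm), and the function names and the `(k, residue type, distance)` observation tuples are hypothetical.

```python
import numpy as np

def align_centroids(atp_coords, binding_ca_coords):
    """Step 3: translate ATP so its center C_A coincides with C_R."""
    C_R = binding_ca_coords.mean(axis=0)
    C_A = atp_coords.mean(axis=0)
    return atp_coords + (C_R - C_A)

def three_pseudo_atoms(coords, n_iter=50):
    """Step 4: reduce the ATP atoms to three cluster centers (k-means)."""
    # Farthest-point initialization keeps the result deterministic.
    centers = [coords[0]]
    for _ in range(2):
        d = np.min([np.linalg.norm(coords - c, axis=1) for c in centers],
                   axis=0)
        centers.append(coords[np.argmax(d)])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        dist = np.linalg.norm(coords[:, None, :] - centers[None, :, :],
                              axis=2)
        labels = dist.argmin(axis=1)
        # An empty cluster keeps its previous center.
        new = np.array([coords[labels == k].mean(axis=0)
                        if np.any(labels == k) else centers[k]
                        for k in range(3)])
        if np.allclose(new, centers):
            break
        centers = new
    return centers

def mean_distance_table(observations):
    """Steps 7) and 8): average distance for each (k, residue type)."""
    sums, counts = {}, {}
    for k, T, d in observations:
        sums[(k, T)] = sums.get((k, T), 0.0) + d
        counts[(k, T)] = counts.get((k, T), 0) + 1
    return {key: sums[key] / counts[key] for key in sums}
```

In this reading, `observations` would be collected once from the protein-ATP complexes in the PDB, and the resulting table reused for every docking run.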

Drawings

FIG. 1 is a schematic diagram of a protein ATP docking method based on differential evolution.

FIG. 2 shows the three-dimensional structure of the protein 1a0i-ATP complex predicted by the differential evolution-based protein-ATP docking method.

Detailed Description

The invention is further described below with reference to the accompanying drawings.

Referring to FIG. 1 and FIG. 2, the differential evolution-based protein-ATP docking method comprises steps 1) to 13) as set out above in the Disclosure of Invention.

In this embodiment, the three-dimensional structure of the complex formed by docking protein 1a0i with ATP is predicted as an example, following steps 1) to 13) above.

The three-dimensional structure of the protein 1a0i-ATP complex obtained by the above method is shown in FIG. 2.

The above describes the prediction result for protein 1a0i and ATP as an example of the present invention. It is not intended to limit the scope of the invention, and various modifications and improvements can be made without departing from that scope.

Claims

1. A differential evolution-based protein-ATP docking method, characterized in that the docking method comprises the following steps:

1) Input the structural information of the protein and of ATP, denoted R and A respectively;

2) For the input structural information R, use the ATPbind server to predict the protein-ATP binding residue sites, and obtain the n residues of the protein that bind ATP, denoted r₁, r₂, ..., r_n;

3) From the coordinates of the central carbon atoms C_α of r₁, r₂, ..., r_n, cluster to obtain a center point C_R; from the coordinates of all atoms in A, cluster to obtain a center point C_A; and move ATP so that the coordinates of C_A and C_R coincide;

4) Cluster the atoms of A, according to their coordinates, into three center points, called pseudo-atoms [the three symbols denoting them are not reproduced in the source];

5) For each ATP molecule A^(j), j = 1, 2, ..., N, in the PDB database, cluster all of its atoms, according to their coordinates, into three center points in one-to-one correspondence with the three pseudo-atoms of step 4) [symbols not reproduced in the source], wherein N is the number of ATP molecules in the PDB database;

6) For each center point of each ATP molecule in the PDB database, calculate the distance between that center point and the C_α atom of each residue of type T to which it binds [symbols not reproduced in the source], wherein T is one of the amino acid residue types occurring in the PDB;

7) For any residue type T, calculate the average interaction distance between T and the k-th center point C_k, k = 1, 2, 3, over all ATP molecules in the PDB database, denoted D(C_k, T) [the defining equation and its "wherein" clause are not reproduced in the source];

8) According to step 7), calculate the average interaction distance D(C_k, T) between every residue type T in the PDB database and the ATP center points C_k to which it binds;

9) Parameter setting: set the population size NP, the scaling factor F, the crossover probability CR and the maximum number of iterations G_max, and initialize the iteration counter G = 0;

10) Population initialization: randomly generate an initial population P = {S₁, S₂, ..., S_i, ..., S_NP}, where S_i = (s_i,1, s_i,2, s_i,3, s_i,4, s_i,5, s_i,6) is the i-th individual of P and s_i,1, s_i,2, s_i,3, s_i,4, s_i,5 and s_i,6 are its six elements; the value range of s_i,1, s_i,2 and s_i,3 is [range not reproduced in the source], and the value range of s_i,4, s_i,5 and s_i,6 is 0 to 2π;

11) For each individual S_i in the population, dock the protein with ATP as follows and calculate the individual's score, score(S_i):

11.1) Calculate a three-dimensional rotation matrix R from the last three elements s_i,4, s_i,5 and s_i,6 of S_i [matrix not reproduced in the source];

11.2) Rotate the coordinates of the three pseudo-atoms with the rotation matrix R to obtain three rotated three-dimensional coordinates [symbols not reproduced in the source];

11.3) Translate the rotated coordinates according to the first three elements s_i,1, s_i,2 and s_i,3 of S_i to calculate the new three-dimensional coordinates C'₁, C'₂, C'₃ [translation equation not reproduced in the source], where C'_k is the three-dimensional coordinate obtained after translation, k = 1, 2, 3;

11.4) According to step 8), calculate score(S_i):

score(S_i) = ∑|D_kT - D(C_k, T)|

where D_kT is the distance between C'_k and the C_α atom of a residue of type T, k = 1, 2, 3;

12) According to the differential evolution algorithm, process each individual S_i, i ∈ {1, 2, ..., NP}, in the population P as follows:

12.1) Randomly select three different individuals S_a, S_b and S_c from the current population P, where a ≠ b ≠ c ≠ i, and generate a mutant individual S_mutant according to the following equation:

S_mutant = S_a + F·(S_b - S_c)

12.2) Copy the elements of S_i into a crossover individual S_cross; then randomly select one element s_cross,j from the six elements of S_cross and replace it with the corresponding element s_mutant,j of S_mutant; finally, for each element of S_cross, use a random number R generated between 0 and 1 to decide whether to replace it with the corresponding element of S_mutant: if R < CR, replace it, otherwise do not;

12.3) According to step 11), calculate the scores score(S_cross) and score(S_i) of S_cross and S_i respectively;

12.4) If score(S_cross) < score(S_i), replace S_i in the population P with S_cross; otherwise S_i remains in the population P;

13) Set G = G + 1; if G > G_max, take the individual S_low with the lowest score in the current population P, rotate and translate the coordinates of all atoms in A according to the elements of S_low, and output the resulting coordinates as the final ligand position information; otherwise, return to step 12).