WO2023123396A1

WO2023123396A1 - Enhanced sampling method, and method for calculating binding free energy of complex

Info

Publication number: WO2023123396A1
Application number: PCT/CN2021/143802
Authority: WO
Inventors: 李治鹏; 杨明俊; 邹俊杰; 林泓叡; 彭春望; 方栋; 林志雄; 万晓
Original assignee: 深圳晶泰科技有限公司
Priority date: 2021-12-31
Filing date: 2021-12-31
Publication date: 2023-07-06

Abstract

The present invention relates to the technical field of research and development of drugs, and in particular to an enhanced sampling method, and a method for calculating binding free energy of a complex. The enhanced sampling method comprises: determining a hot region of a target molecule-protein complex; modifying a force field parameter of a target molecule-protein complex system in an initial input file on the basis of the hot region to obtain a force field parameter of a modified target molecule-protein complex system, then further generating a parameter file required for FEP/REST2 simulation; and performing FEP/REST2 simulation by using the parameter file as an input of a potential function to generate a trajectory file. According to the present application, the force field parameter of the target molecule-protein complex system in the initial input file is modified on the basis of the determined hot region, the process of an REST2 method is quickly and conveniently implemented with as few code modifications as possible, and the operation is simple, such that the calculation efficiency and prediction precision of subsequent FEP are improved.

Description

Enhanced sampling method and method for calculating binding free energy of complexes

【Technical field】

This application relates to the technical field of drug research and development, and in particular to an enhanced sampling method, a method for calculating the binding free energy of a target molecule-protein complex, an enhanced sampling device, a device for calculating the binding free energy of a target molecule-protein complex, an electronic device, and a computer-readable storage medium.

【Background art】

As an index for evaluating the activity of drug molecules, the binding free energy between drug molecules and proteins can be predicted by various computational methods. Among them, free energy perturbation (FEP) is a high-precision computational chemistry prediction method that has been widely used in drug design. However, as a method based on molecular dynamics simulation, FEP often suffers from insufficient sampling during the simulation. For insufficiently sampled systems, the calculated binding free energy does not converge well, which greatly reduces the accuracy of the prediction.

Various enhanced sampling methods exist in the prior art to address this problem. Among them, replica exchange with solute tempering (REST2) is an enhanced sampling method in computational chemistry that was originally designed for simulating protein folding. REST2 heats only a local (hot) region, which reduces the number of replicas required while maintaining the exchange rate, and thereby achieves efficient and reliable enhanced sampling. Many studies have shown that using REST2 to enhance sampling for FEP (hereinafter FEP/REST2) can significantly improve the accuracy of FEP calculations. However, through long-term research, the inventors of the present application have found two problems with FEP/REST2: (1) using REST2 in FEP requires modifying the underlying code of the molecular dynamics software, which is difficult to implement; and (2) the overall FEP/REST2 workflow is complex and difficult to use.

【Summary of the invention】

The main purpose of the present application is to provide an enhanced sampling method, a method for calculating the binding free energy of a target molecule-protein complex, an enhanced sampling device, a device for calculating the binding free energy of a target molecule-protein complex, an electronic device, and a computer-readable storage medium, so as to solve the prior-art problem that the REST2 method is difficult to implement and use in FEP.

An enhanced sampling method provided in an embodiment of the present application includes:

Determining the heating region of the target molecule-protein complex;

Modifying the force field parameters of the target molecule-protein complex system in the initial input file corresponding to the heating region to obtain the modified force field parameters of the target molecule-protein complex system, and then generating the parameter files required for FEP/REST2 simulation;

FEP/REST2 simulations are performed using the parameter file as the input of the potential function to generate a trajectory file, which is used to calculate the binding free energy of the target molecule-protein complex.

A method for calculating the binding free energy of the target molecule-protein complex provided in the embodiment of the present application includes:

Constructing an initial input file for the target molecule-protein complex system;

Inputting the initial input file into the enhanced sampling method described in the above embodiment to obtain a trajectory file;

Obtaining the binding free energy of the target molecule-protein complex based on the trajectory file.

An enhanced sampling device provided in an embodiment of the present application includes:

The heating region determination module is used to determine the heating region of the target molecule-protein complex;

The force field parameter modification module is used to modify the force field parameters of the target molecule-protein complex system in the initial input file corresponding to the heating region, obtain the modified force field parameters of the target molecule-protein complex system, and then generate the parameter files required for FEP/REST2 simulation; and

The trajectory file generation module is used to perform FEP/REST2 simulation on the parameter file as an input of the potential function to generate a trajectory file, and the trajectory file is used to calculate the binding free energy of the target molecule-protein complex.

A device for calculating the binding free energy of the target molecule-protein complex provided in the embodiment of the present application includes:

A construction module, used to construct an initial input file for the target molecule-protein complex system;

The enhanced sampling device as described in the above-mentioned embodiment is used to obtain a trajectory file based on the inputted initial input file; and

A calculation module is used to obtain the binding free energy of the target molecule-protein complex based on the trajectory file.

An electronic device provided in an embodiment of the present application includes:

one or more processors;

a memory, coupled to the processor, for storing one or more programs;

When the one or more programs are executed by the one or more processors, the one or more processors implement the enhanced sampling method described in the above embodiment or the method for calculating the binding free energy of the target molecule-protein complex described in the above embodiment.

A computer-readable storage medium provided in an embodiment of the present application stores a computer program. When the computer program is executed by a processor, the enhanced sampling method described in the above embodiment or the method for calculating the binding free energy of the target molecule-protein complex described in the above embodiment is implemented.

Compared with the prior art, the enhanced sampling method, the method for calculating the binding free energy of the target molecule-protein complex, the electronic device, and the computer-readable storage medium of the present application have the following beneficial effects:

In molecular dynamics simulation, the energy of the target molecule-protein complex system is calculated by the potential function. This application therefore determines the heating region of the target molecule-protein complex and then modifies the force field parameters of the target molecule-protein complex system in the initial input file corresponding to the heating region, that is, it modifies the parameters of the potential function, to generate the parameter files required for FEP/REST2 simulation. The REST2 workflow is thus implemented quickly and conveniently with as little code modification as possible and without extensive changes to the underlying code of the molecular dynamics simulation software. The operation is simple, and the calculation efficiency and prediction accuracy of the subsequent FEP are improved.

【Description of drawings】

The present application will describe the implementation manners with reference to the accompanying drawings. The drawings of the present application are only used to describe the embodiments for the purpose of illustration. Without departing from the principles of the present application, those skilled in the art can easily make other embodiments according to the steps described below.

FIG. 1 is a schematic flowchart of an enhanced sampling method in an embodiment of the present application.

FIG. 2 is a schematic flowchart of determining the heating region in step S10 in an embodiment of the present application.

FIG. 3 is a schematic flowchart of modifying the force field parameters of the target molecule-protein complex system corresponding to the heating region in step S20 in an embodiment of the present application.

FIG. 4 is a schematic flowchart of a method for calculating the binding free energy of a target molecule-protein complex in an embodiment of the present application.

FIG. 5 is a schematic diagram of the basic principle of using relative free energy to calculate the difference in free energy between different molecules in an embodiment of the present application.

FIG. 6(a) is a schematic diagram of the FEP calculation results when REST2 enhanced sampling is not used.

FIG. 6(b) is a schematic diagram of the FEP calculation results when REST2 enhanced sampling is used in this application.

FIG. 7 is a schematic structural diagram of an enhanced sampling device in an embodiment of the present application.

FIG. 8 is a schematic structural diagram of a device for calculating the binding free energy of a target molecule-protein complex in an embodiment of the present application.

FIG. 9 is a schematic structural diagram of an electronic device in an embodiment of the present application.

【Detailed description】

The technical solutions in the embodiments of the present application will be clearly and completely described below in conjunction with the drawings in the embodiments of the present application. It should be understood that the specific embodiments described here are only used to explain the present application, but not to limit the present application. In addition, it should be noted that, for the convenience of description, only some structures related to the present application are shown in the drawings but not all structures. Based on the embodiments in this application, all other embodiments obtained by persons of ordinary skill in the art without making creative efforts belong to the scope of protection of this application.

The terms "first", "second", etc. in this application are used to distinguish different objects, not to describe a specific order. Furthermore, the terms "include" and "have", as well as any variations thereof, are intended to cover a non-exclusive inclusion. For example, a process, method, system, product, or device comprising a series of steps or units is not limited to the listed steps or units, but optionally also includes unlisted steps or units, or optionally further includes other steps or units inherent in such processes, methods, products, or devices.

Reference herein to an "embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the present application. The occurrences of this phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is understood explicitly and implicitly by those skilled in the art that the embodiments described herein can be combined with other embodiments.

Referring to FIG. 1, an embodiment of the present invention provides an enhanced sampling method that is applied in commonly used molecular simulation software and implements the REST2 workflow quickly and conveniently with as little code modification as possible. Commonly used molecular simulation software includes, but is not limited to, Amber, GROMACS, and NAMD.

Specifically, the enhanced sampling method of the present application includes the following steps:

S10. Determine the heating region of the target molecule-protein complex.

In this embodiment, the target molecule may be a drug candidate compound.

Free energy perturbation (FEP) is a method for assessing the binding strength between a small-molecule drug and its target. FEP improves the reliability of sampling by constructing a series of non-physical intermediate states between the bound and unbound states. In an FEP calculation, the perturbation is applied by modifying lambda in the input file. In this embodiment, the heating region of the target molecule-protein complex is determined from the perturbed region of the ordinary FEP calculation, so that the complex can be heated locally. This reduces the number of replicas required while maintaining the exchange rate, enabling efficient and reliable enhanced sampling.

In a specific embodiment, as shown in FIG. 2, the above step S10 includes the following steps:

Identifying the perturbed region of the FEP calculation as the initial heating region of the target molecule-protein complex.

In this embodiment, the perturbation region in the general FEP calculation process is taken as the initial heating region.

After the initial heating region is determined, the structure of the target molecule is analyzed to determine whether the atoms directly connected to the perturbed region lie on a ring of the target molecule. If they do, the perturbed region is considered directly connected to the ring, and the ring is added to the heating region. If they do not, the perturbed region is considered not connected to the ring, and no further processing is performed. Adding the extra ring to the heating region can help some systems converge faster.

Specifically, the steps of analyzing the structure of the target molecule to adjust the heating area are as follows:

judging whether there is a ring in the target molecule;

If there is a ring in the target molecule, further judging whether at least some of the atoms in the initial heating region are located on the ring, or whether at least some of the atoms in the initial heating region are directly or indirectly connected to atoms on the ring;

If at least some of the atoms in the initial heating region are located on the ring or are directly connected to atoms on the ring, adding the region where the ring is located to the initial heating region to obtain a first updated heating region;

If at least some of the atoms in the initial heating region are indirectly connected to atoms on the ring, adding the ring and the region where the part indirectly connected to the ring is located to the initial heating region to obtain the first updated heating region;

If all atoms in the initial heating region are neither located on the ring nor directly or indirectly connected to atoms on the ring, then the initial heating region is defined as the first updated heating region;

If there is no ring in the target molecule, taking the initial heating region as the first updated heating region;

Determining whether there is an additionally designated region that needs to be heated;

If so, adding the additionally designated region to the first updated heating region to obtain the heating region;

If not, taking the first updated heating region as the heating region.

In this way, the initial heating region is updated according to the structure of the target molecule to obtain the first updated heating region. It is then judged whether an additionally designated region needs to be heated: if so, that region is added to the first updated heating region to obtain the heating region; if not, the first updated heating region itself is taken as the heating region. The heating region is thereby selected automatically.
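The selection logic above can be sketched in Python. This is only an illustrative sketch: the function name `select_heating_region`, its arguments, and the toy molecule are hypothetical, rings are assumed to be supplied by an external ring-perception step, and only the "on the ring" and "directly connected" cases are covered, because the text does not define a criterion for "indirectly connected".

```python
def select_heating_region(perturbed, bonds, rings, extra=()):
    """Return the set of atom indices to heat (hypothetical helper).

    perturbed: atom indices perturbed in the FEP calculation
    bonds:     iterable of (atom, atom) index pairs
    rings:     iterable of rings, each an iterable of atom indices
    extra:     optional additionally designated atoms to heat
    """
    # Build an adjacency map from the bond list.
    neighbors = {}
    for a, b in bonds:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    # The FEP perturbed region is the initial heating region.
    region = set(perturbed)

    # Atoms of the initial region plus every atom directly bonded to it.
    touched = set(region)
    for atom in list(region):
        touched |= neighbors.get(atom, set())

    # Add each ring that the initial region lies on or is directly bonded to.
    for ring in rings:
        ring_atoms = set(ring)
        if touched & ring_atoms:
            region |= ring_atoms

    # Finally add any additionally designated atoms.
    region |= set(extra)
    return region


# Toy molecule: ring atoms 0-5, atom 6 bonded to ring atom 0, atom 7 bonded to atom 6.
ring = [0, 1, 2, 3, 4, 5]
bonds = [(i, (i + 1) % 6) for i in range(6)] + [(0, 6), (6, 7)]
print(sorted(select_heating_region({6}, bonds, [ring])))  # -> [0, 1, 2, 3, 4, 5, 6]
```

In this toy case, perturbing atom 6 pulls the whole ring into the heating region, whereas perturbing only atom 7 (two bonds away from the ring) leaves the ring out.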

S20. Modify the force field parameters of the target molecule-protein complex system in the initial input file corresponding to the heating region to obtain the modified force field parameters of the target molecule-protein complex system, and then generate the parameter file required for FEP/REST2 simulation.

FEP/REST2 refers to using the REST2 method to enhance sampling for FEP. In molecular dynamics simulation, the energy of the target molecule-protein complex system is calculated by the potential function. The REST2 calculation for the heating region can therefore be realized by modifying the parameters of the potential function, without modifying the underlying code of the molecular dynamics simulation software.

After determining the heating area, further determine the force field parameters that need to be modified in the target molecule-protein complex system, so as to modify accordingly. Specifically, the above step S20 includes the following steps:

Dividing the force field parameter item of the target molecule-protein complex system into parameters inside the heating region, parameters between the heating region and the environment region, and parameters inside the environment region;

At least partially modifying the parameters inside the heating region, the parameters between the heating region and the environment region, and the parameters inside the environment region to obtain the modified force field parameters of the target molecule-protein complex system, and then generating the parameter files required for the FEP/REST2 simulation.

On the basis of the heating region, this application divides the target molecule-protein complex system into two parts: the heating region and the environment region outside it. The total energy of the system under REST2 can accordingly be divided into three parts (Formula 5): the energy U_c,c inside the heating region, the energy U_c,e between the heating region and the environment region, and the energy U_e,e inside the environment region. The basic formula of REST2 is:

$$U^{\mathrm{REST2}} = \frac{\beta_m}{\beta_0}\,U_{c,c} + \sqrt{\frac{\beta_m}{\beta_0}}\,U_{c,e} + U_{e,e} \qquad \text{(Formula 5)}$$

where β_m = 1/(k_B T_m) and β_0 = 1/(k_B T_0); U^REST2 is the total energy of the target molecule-protein complex system when using REST2; U_c,c is the energy inside the heating region; U_c,e is the energy between the heating region and the environment region; U_e,e is the energy inside the environment region; k_B is the Boltzmann constant; T_0 is the ambient temperature, generally 298 K; and T_m is the temperature of the heating region, which can be chosen by the user.
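As a small numerical illustration of the REST2 energy partition, the sketch below combines the three energy components using the commonly cited REST2 scaling factors T_0/T_m and sqrt(T_0/T_m). The function name `rest2_energy` and the example energies are hypothetical and serve only to show the arithmetic.

```python
import math


def rest2_energy(u_cc, u_ce, u_ee, t0=298.0, tm=298.0):
    """Combine the three energy components into the REST2 total energy.

    u_cc: energy inside the heating region
    u_ce: energy between the heating region and the environment region
    u_ee: energy inside the environment region
    t0:   ambient temperature (K); tm: heating-region temperature (K)
    """
    scale = t0 / tm  # equals beta_m / beta_0, since beta = 1 / (k_B * T)
    return scale * u_cc + math.sqrt(scale) * u_ce + u_ee


# Hypothetical energies (arbitrary units), heating region at 4 * 298 K.
print(rest2_energy(-10.0, -20.0, -1000.0, t0=298.0, tm=1192.0))  # -> -1012.5
```

When T_m equals T_0 the scaling factors are 1 and the ordinary total energy is recovered, which corresponds to the unheated replica.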

In molecular dynamics simulation, the energy U above is calculated by the potential function (Formula 7), whose basic form is given below. All parameters and variables used in the calculation are provided by the molecular dynamics simulation software. In this way, the energy U_c,c inside the heating region, the energy U_c,e between the heating region and the environment region, and the energy U_e,e inside the environment region can be calculated, and from them the total energy of the target molecule-protein complex system.

$$U = \sum_i k_i^{r}\,(r_i - r_0)^2 + \sum_i k_i^{\theta}\,(\theta_i - \theta_0)^2 + \sum_i k_i^{\varphi}\,\bigl[1 + \cos(n_i\varphi_i - \delta_i)\bigr] + \sum_{i<j}\left\{4\sqrt{\epsilon_i\epsilon_j}\left[\left(\frac{\sigma_{ij}}{r_{ij}}\right)^{12} - \left(\frac{\sigma_{ij}}{r_{ij}}\right)^{6}\right] + \frac{q_i q_j}{r_{ij}}\right\} \qquad \text{(Formula 7)}$$

where k denotes the bonding parameters, comprising k_i^r, k_i^θ, and k_i^φ, which are the bond length coefficient, angle coefficient, and dihedral angle coefficient, respectively; ε_i and ε_j are the van der Waals potential well depth parameters of the i-th and j-th atoms, and σ_ij is the van der Waals distance parameter; q_i and q_j are the electrostatic charge parameters of the i-th and j-th atoms; r_i, θ_i, and φ_i are the bond length, angle, and dihedral angle of the i-th bonded term, respectively; r_ij is the distance between the i-th and j-th atoms; and r_0, θ_0, δ_i, and n_i are the equilibrium bond length, the equilibrium angle, the dihedral phase, and the dihedral multiplicity, respectively.

As shown in Figure 3, according to the determined temperature rise region, the parameter k is divided into different types, including parameters inside the temperature rise region, parameters between the temperature rise region and the environment region, and parameters inside the environment region. At the same time, determine the force field parameters that need to be modified, including bond length, bond angle, dihedral angle, van der Waals and electrostatic parameters, etc.

According to the calculation formula of the above potential function, it can be obtained that when REST2 calculation is performed on the heating area, this process can be realized by modifying various parameters in the potential function without modifying the underlying code of the simulation software.

Specifically, in the above step S22, at least partially modifying the parameters inside the temperature rise region, the parameters between the temperature rise region and the environment region, and the parameters inside the environment region includes the following steps:

multiplying the parameters inside the heating region by a first type of parameter;

multiplying the parameters between the heating region and the environment region by a second type of parameter;

making no modification to the parameters inside the environment region.

Optionally, the first type of parameter includes bonding coefficients, van der Waals coefficients, and electrostatic coefficients, and the second type of parameter includes bonding coefficients.

In this embodiment, the parameters inside the heating region (bonding parameters, van der Waals parameters, and electrostatic parameters) are each multiplied by the corresponding first-type parameter, namely the bonding coefficient, van der Waals coefficient, or electrostatic coefficient.

Specifically, as shown in Figure 3,

The parameters between the heating region and the environment region (bonding parameters) are modified by multiplying them by the corresponding second-type parameter, namely the bonding coefficient between the heating region and the environment region.

It should be noted that the bonding parameters in this embodiment are parameters related to chemical bonds used to connect atoms, such as bond length coefficients, bond angle coefficients, dihedral angle coefficients, and the like.

Specifically, as shown in Figure 3,

The parameters inside the environment area are not modified.
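The parameter modification can be illustrated with a short Python sketch. The specific scaling factors used here are an assumption taken from one commonly used REST2 convention and are not stated in the text above: charges of heated atoms are scaled by sqrt(T_0/T_m), van der Waals well depths of heated atoms by T_0/T_m, and dihedral force constants by T_0/T_m when all atoms are heated or sqrt(T_0/T_m) when only some are. The function name and data layout are hypothetical.

```python
import math


def scale_rest2_parameters(charges, epsilons, dihedrals, hot_atoms, t0, tm):
    """Return scaled copies of force field parameters (hypothetical helper).

    charges, epsilons: dicts mapping atom index -> parameter value
    dihedrals:         list of (atom_index_tuple, force_constant)
    hot_atoms:         atom indices belonging to the heating region
    """
    s = t0 / tm              # beta_m / beta_0
    root_s = math.sqrt(s)
    hot = set(hot_atoms)

    # Assumed convention: charges scale by sqrt(s), so q_i * q_j scales by s
    # for heated-heated pairs and by sqrt(s) for heated-environment pairs.
    new_charges = {i: (q * root_s if i in hot else q) for i, q in charges.items()}

    # Assumed convention: well depths scale by s, so the geometric-mean pair
    # well depth sqrt(eps_i * eps_j) scales by s, sqrt(s), or 1 accordingly.
    new_eps = {i: (e * s if i in hot else e) for i, e in epsilons.items()}

    new_dihedrals = []
    for atoms, k in dihedrals:
        n_hot = sum(1 for a in atoms if a in hot)
        if n_hot == len(atoms):
            factor = s        # term entirely inside the heating region
        elif n_hot > 0:
            factor = root_s   # term spanning heating and environment regions
        else:
            factor = 1.0      # environment term: unchanged
        new_dihedrals.append((atoms, k * factor))
    return new_charges, new_eps, new_dihedrals


# Hypothetical toy system: atoms 0-3 heated, atoms 4-7 in the environment.
q, eps, dih = scale_rest2_parameters(
    charges={0: 0.4, 4: -0.4},
    epsilons={0: 0.1, 4: 0.2},
    dihedrals=[((0, 1, 2, 3), 2.0), ((2, 3, 4, 5), 2.0), ((4, 5, 6, 7), 2.0)],
    hot_atoms={0, 1, 2, 3},
    t0=298.0,
    tm=1192.0,
)
print(q, eps, dih)
```

Because pairwise electrostatic and van der Waals energies are products of per-atom parameters, scaling the per-atom values in the input file is enough to scale the region-internal and cross-region energies differently, which is why no change to the simulation engine is needed under this convention.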

S30. Perform FEP/REST2 simulation using the parameter file as an input of a potential function to generate a trajectory file, and the trajectory file is used to calculate the binding free energy of the target molecule-protein complex.

Finally, the modified parameter file is used as the input of the potential function for the FEP/REST2 simulation. The trajectory files generated by the simulation can then be used to calculate the binding free energy of the target molecule-protein complex.

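In summary, the enhanced sampling method of this application implements the REST2 workflow by modifying force field parameters, without modifying the underlying code of the molecular dynamics simulation software.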

Please refer to FIG. 4 , the embodiment of the present invention also provides a complete calculation method for binding free energy. Specifically, a method for calculating the binding free energy of the target molecule-protein complex of the present application includes the following steps:

S101. Construct the initial input file of the target molecule-protein complex system.

Wherein, the initial input file includes a file of the three-dimensional conformation of the target molecule-protein complex, which may specifically be a file in pdb format.

S102. Input the initial input file into the enhanced sampling method described in any one of the above embodiments to obtain a trajectory file.

S103. Obtain the binding free energy of the target molecule-protein complex based on the trajectory file.

In this embodiment, after the trajectory file is obtained, the molecular dynamics trajectories of the target molecule-protein complex system in the different states are analyzed.

In one embodiment, before the step S101 of constructing the initial input file of the target molecule-protein complex system, the following steps are further included:

Select small molecules with known conformations of small molecule-protein complexes as reference compounds;

Determining the warming region of the reference compound-protein complex;

Correspondingly, step S20 of modifying the force field parameters of the target molecule-protein complex system in the initial input file corresponding to the heating region, obtaining the modified force field parameters of the target molecule-protein complex system, and then generating the parameter file required for FEP/REST2 simulation includes the following steps:

Modifying the force field parameters of the target molecule-protein complex system in the initial input file corresponding to the heating region to obtain the modified force field parameters of the target molecule-protein complex system;

Modifying the force field parameters of the reference compound-protein complex system in the initial input file corresponding to the heating region, to obtain the modified force field parameters of the reference compound-protein complex system;

Based on the modified force field parameters of the reference compound-protein complex system and the modified force field parameters of the target molecule-protein complex system, the parameter files required for FEP/REST2 simulation are generated.

Further, step S103 of obtaining the binding free energy of the target molecule-protein complex based on the trajectory file is specifically: calculating the binding free energy of the target molecule-protein complex from the trajectory file using the Bennett acceptance ratio method.

Because the parameters of each alchemical state in the FEP calculation are modified during the FEP/REST2 calculation, the ordinary integration method cannot be used for analysis. This embodiment therefore uses the Bennett acceptance ratio (BAR) method to calculate the binding free energy of the target molecule-protein complex, as shown in the following formulas. Reweighting is used to evaluate the energy difference ΔU of a computed trajectory under the parameter file of another state, and the free energy difference ΔG between the two states is then obtained from the formulas.

Specifically, the above step S103 further includes the following steps:

For two adjacent intermediate states i and j in the trajectory file, calculating the first energy ΔU_ij of the trajectory of intermediate state i under the parameters of intermediate state j, and the second energy ΔU_ji of the trajectory of intermediate state j under the parameters of intermediate state i;

Substituting the first energy ΔU_ij and the second energy ΔU_ji into Formula 1 and Formula 2 to calculate the solvent free energy difference ΔG_A of converting the reference compound into the target molecule and the binding free energy difference ΔG_B of converting the reference compound into the target molecule;

Based on the solvent free energy difference ΔG_A and the binding free energy difference ΔG_B, calculating the relative binding free energy ΔΔG_binding of converting the reference compound into the target molecule according to Formula 3:

ΔΔG_binding = ΔG_B − ΔG_A (Formula 3)

Based on the relative binding free energy ΔΔG_binding and the known binding free energy ΔG₁ of the reference compound-protein complex, calculating the binding free energy ΔG₂ of the target molecule-protein complex according to Formula 4:

ΔG₂ = ΔΔG_binding + ΔG₁ (Formula 4)

where ΔG is the free energy difference; ΔU_ij is the first energy; ΔU_ji is the second energy; <*>_i is the ensemble average under intermediate state i; <*>_j is the ensemble average under intermediate state j; N_i and N_j are the numbers of frames in the simulation trajectories of intermediate states i and j; k_B is the Boltzmann constant; T is the simulation temperature, generally 298 K; ΔΔG_binding is the relative binding free energy; ΔG_A is the solvent free energy difference of converting the reference compound into the target molecule; ΔG_B is the binding free energy difference of converting the reference compound into the target molecule; ΔG₁ is the known binding free energy of the reference compound-protein complex; and ΔG₂ is the binding free energy of the target molecule-protein complex.
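For orientation, a BAR estimate between two adjacent states can be sketched in Python as follows. This sketch uses the standard self-consistent form of the Bennett acceptance ratio equation solved by bisection; it is not necessarily the exact form of Formula 1 and Formula 2 of this application. The function names are hypothetical, and the inputs are assumed to be per-frame energy differences: `du_ij` is the state-j energy minus the state-i energy evaluated on frames sampled in state i, and `du_ji` is the state-i energy minus the state-j energy evaluated on frames sampled in state j.

```python
import math


def fermi(x):
    """Numerically stable 1 / (1 + exp(x))."""
    if x > 0:
        e = math.exp(-x)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(x))


def bar_free_energy(du_ij, du_ji, kt=1.0, iterations=200):
    """Estimate the free energy difference from state i to state j with BAR.

    du_ij: energy differences U_j - U_i on frames sampled in state i
    du_ji: energy differences U_i - U_j on frames sampled in state j
    kt:    k_B * T in the same energy unit as the inputs
    """
    w_f = [u / kt for u in du_ij]          # reduced forward differences
    w_r = [u / kt for u in du_ji]          # reduced reverse differences
    m = math.log(len(w_f) / len(w_r))      # ln(N_i / N_j)

    def g(df):
        # Self-consistency condition; g increases monotonically with df.
        fwd = sum(fermi(m + w - df) for w in w_f)
        rev = sum(fermi(-m + w + df) for w in w_r)
        return fwd - rev

    # Expand a bracket around the root, then bisect.
    lo, hi = -1.0, 1.0
    while g(lo) > 0:
        lo *= 2.0
    while g(hi) < 0:
        hi *= 2.0
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if g(mid) < 0:
            lo = mid
        else:
            hi = mid
    return kt * 0.5 * (lo + hi)


# Hypothetical toy input: constant energy differences of +3.0 and -3.0.
print(round(bar_free_energy([3.0] * 5, [-3.0] * 5, kt=2.0), 6))  # -> 3.0
```

In an FEP/REST2 workflow the per-window estimates between adjacent intermediate states would be summed along the lambda path to give ΔG_A or ΔG_B.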

In this embodiment, the relative binding free energy (RBFE) approach is used to calculate the free energy difference ΔΔG_binding between different molecules, and thereby the binding free energy of the target molecule-protein complex.

FIG. 5 is a schematic diagram of the basic principle of calculating the free energy difference ΔΔG between different molecules using relative binding free energy (RBFE). The upper left panel shows reference compound A separated from the protein receptor, and the lower left panel shows complex structure A formed by reference compound A and the protein receptor; ΔG₁ is the binding free energy of reference compound A and the protein receptor forming complex structure A. The upper right panel shows target molecule B separated from the protein receptor, and the lower right panel shows complex structure B formed by target molecule B and the protein receptor; ΔG₂ is the binding free energy of target molecule B and the protein receptor forming complex structure B. ΔG_A is the solvent free energy difference of converting reference compound A into target molecule B, and ΔG_B is the binding free energy difference of converting reference compound A into target molecule B,

where ΔΔG_binding = ΔG₂ − ΔG₁ = ΔG_B − ΔG_A.

Calculating ΔΔG_binding directly as ΔG₂ − ΔG₁ is relatively difficult. By contrast, the difference between two horizontally adjacent systems in FIG. 5 (reference compound versus target molecule in the same environment) is relatively small, so those transformations reach equilibrium more easily, and in practice ΔΔG_binding is calculated as the simpler ΔG_B − ΔG_A. Specifically, the binding free energy difference ΔG_B and the solvent free energy difference ΔG_A of converting the reference compound into the target molecule are calculated by the method described above. Because a small molecule whose small molecule-protein complex conformation is known is selected as the reference compound, the binding free energy ΔG₁ of the reference compound-protein complex is known. Therefore, after ΔΔG_binding is obtained from ΔG_B − ΔG_A, the binding free energy ΔG₂ of the target molecule-protein complex is calculated as ΔG₂ = ΔΔG_binding + ΔG₁ (Formula 4).
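The arithmetic of the thermodynamic cycle (Formula 3 and Formula 4) can be written out as a short Python sketch. The function name and the numerical values are hypothetical and chosen only to show the bookkeeping; they are not results from this application.

```python
def target_binding_free_energy(dg_a, dg_b, dg_ref):
    """Apply Formula 3 and Formula 4 of the thermodynamic cycle.

    dg_a:   solvent free energy difference, reference -> target (ΔG_A)
    dg_b:   binding free energy difference, reference -> target (ΔG_B)
    dg_ref: known binding free energy of the reference complex (ΔG_1)
    """
    ddg_binding = dg_b - dg_a          # Formula 3
    dg_target = ddg_binding + dg_ref   # Formula 4
    return ddg_binding, dg_target


# Hypothetical values in kcal/mol.
ddg, dg2 = target_binding_free_energy(dg_a=12.0, dg_b=10.5, dg_ref=-9.0)
print(ddg, dg2)  # -> -1.5 -10.5
```

A negative ΔΔG_binding in this toy example means the target molecule is predicted to bind more strongly than the reference compound.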

For example, taking the Amber program as a representative of commonly used molecular simulation software, a protein-small molecule complex is selected as the test system, the dihedral angle, van der Waals and electrostatic parameters are selected for modification, and the binding free energy of the protein-small molecule complex is calculated as follows.

1. Select a small molecule with a known small molecule-protein complex conformation as the reference compound, and construct a molecule pair for the relative free energy calculation.

2. Construct the initial input file of the complex system suitable for RBFE calculation according to the small molecule pair, specifically:

a. Prepare the structure of the complex between the reference compound and the protein, and use molecular docking to obtain the structure of the above protein-small molecule complex.

b. Use the Antechamber tool in Amber18 to assign GAFF2 force field parameters to each small molecule, and use the AM1-BCC method to calculate the partial charges of each small molecule.

c. On this basis, use the tleap tool in Amber18 to construct the initial input files of protein and small molecule systems.

Similarly, construct the initial input file for the target molecule-protein complex.

3. For the initial input files of the target molecule-protein complex and the reference compound-protein complex, modify the dihedral angle, van der Waals and electrostatic parameters according to the method of FIG. 1 to FIG. 3 above to obtain the force field input files required for the calculation of each intermediate state, specifically:

a. Determine the heating region for the target molecule-protein complex and for the reference compound-protein complex respectively.

b. Modify the force field parameters of the target molecule-protein complex system in the initial input file corresponding to the heating region, to obtain the modified force field parameters of the target molecule-protein complex system.

Similarly, modify the force field parameters of the reference compound-protein complex system in the initial input file corresponding to the heating region, to obtain the modified force field parameters of the reference compound-protein complex system.

c. Generate parameter files required for FEP/REST2 simulation based on the modified force field parameters of the reference compound-protein complex system and the modified force field parameters of the target molecule-protein complex system.

4. Carry out molecular dynamics simulation for the input of each intermediate state, and save the trajectory file during the simulation process.

5. Calculate the relative free energy from the obtained trajectory files using the aforementioned Bennett Acceptance Ratio (BAR) method. The specific process is as follows:

a. First, for two adjacent intermediate states i and j in the trajectory file, calculate the first energy ΔU_ij of the trajectory of intermediate state i under the parameters corresponding to intermediate state j, and calculate the second energy ΔU_ji of the trajectory of intermediate state j under the parameters corresponding to intermediate state i in the same way;

b. Substitute the first energy ΔU_ij and the second energy ΔU_ji into the BAR method (Formulas 1 and 2 below) to calculate the solvent free energy difference ΔG_A and the binding free energy difference ΔG_B of converting the reference compound into the target molecule;

c. Based on the solvent free energy difference ΔG_A and the binding free energy difference ΔG_B of converting the reference compound into the target molecule, calculate the relative binding free energy ΔΔG_binding of converting the reference compound into the target molecule according to Formula 3 below;

ΔΔG_binding = ΔG_B − ΔG_A (Formula 3);

d. Based on the relative binding free energy ΔΔG_binding of converting the reference compound into the target molecule, and the known binding free energy ΔG₁ of the reference compound-protein complex, calculate the binding free energy ΔG₂ of the target molecule-protein complex according to Formula 4 below;

ΔG₂ = ΔΔG_binding + ΔG₁ (Formula 4);

Here, ΔG is the free energy difference, ΔU_ij is the first energy, ΔU_ji is the second energy, <*>_i is the ensemble average under intermediate state i, <*>_j is the ensemble average under intermediate state j, N_i and N_j are the numbers of frames of the simulation trajectories under intermediate states i and j respectively, k_B is the Boltzmann constant, T is the simulation temperature, ΔΔG_binding is the relative binding free energy, ΔG_A is the solvent free energy difference of converting the reference compound into the target molecule, ΔG_B is the binding free energy difference of converting the reference compound into the target molecule, ΔG₁ is the known binding free energy of the reference compound-protein complex, and ΔG₂ is the binding free energy of the target molecule-protein complex.
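As a minimal sketch of steps 5(a) and 5(b), the following Python code solves the BAR self-consistency equation in its commonly published form for one pair of adjacent intermediate states, which may differ in notation from Formulas 1 and 2. The sign convention (ΔU_ij = U_j − U_i evaluated on frames of state i, ΔU_ji = U_i − U_j evaluated on frames of state j), the function names and the bisection solver are assumptions of this example and are not taken from the application.

```python
import math


def fermi(x):
    """Numerically stable evaluation of 1 / (1 + exp(x))."""
    if x > 0:
        e = math.exp(-x)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(x))


def bar_free_energy(du_ij, du_ji, kT, iterations=200):
    """Estimate the free energy difference from state i to state j.

    du_ij: U_j - U_i for each frame sampled in state i (assumed convention)
    du_ji: U_i - U_j for each frame sampled in state j (assumed convention)
    kT:    k_B * T in the same energy units as the inputs

    Solves sum_i f(M + (du_ij - dG)/kT) = sum_j f(-M + (du_ji + dG)/kT)
    with f(x) = 1/(1 + exp(x)) and M = ln(N_i / N_j).
    """
    n_i, n_j = len(du_ij), len(du_ji)
    m = math.log(n_i / n_j)

    def imbalance(dg):
        forward = sum(fermi(m + (du - dg) / kT) for du in du_ij)
        reverse = sum(fermi(-m + (du + dg) / kT) for du in du_ji)
        return forward - reverse  # increases monotonically with dg

    # Expand the bracket until it contains the root.
    lo, hi = -1.0, 1.0
    while imbalance(lo) > 0:
        lo *= 2.0
    while imbalance(hi) < 0:
        hi *= 2.0

    # Plain bisection on the monotonic imbalance function.
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if imbalance(mid) > 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)
```

Under this convention, applying `bar_free_energy` to every pair of adjacent intermediate states and summing the results along the solvent leg and along the complex leg would give ΔG_A and ΔG_B respectively.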

In this way, after ΔΔG_binding is calculated as ΔG_B − ΔG_A, the binding free energy ΔG₂ of the target molecule-protein complex is finally calculated as ΔG₂ = ΔΔG_binding + ΔG₁ (Formula 4).
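The final arithmetic can be sketched in a few lines of Python. It is assumed here that each leg of the transformation is obtained by summing the free energy differences of its adjacent intermediate-state pairs; the function names are illustrative only.

```python
def leg_free_energy(window_dgs):
    """Sum the free energy differences of adjacent intermediate-state pairs
    along one leg (solvent or complex). Summation over windows is an
    assumption of this sketch."""
    return sum(window_dgs)


def relative_binding_free_energy(dg_b, dg_a):
    """Formula 3: relative binding free energy from the two legs."""
    return dg_b - dg_a


def target_binding_free_energy(ddg_binding, dg_1):
    """Formula 4: binding free energy of the target molecule-protein complex,
    given the known value dg_1 for the reference compound."""
    return ddg_binding + dg_1
```

With ΔG_A and ΔG_B obtained from `leg_free_energy`, the result of `target_binding_free_energy` inherits the units of the inputs, so the reference value ΔG₁ must be supplied in the same units as the simulated legs.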

The results calculated by the above method are compared with experimental values in FIG. 6, where FIG. 6(a) shows the FEP calculation results without REST2 enhanced sampling and FIG. 6(b) shows the calculation results of the present application with REST2 enhanced sampling.

Comparing the FEP/REST2 results in FIG. 6(b) with the results without REST2 in FIG. 6(a), the mean unsigned error (MUE) is reduced and the correlation coefficient R² is improved after using FEP/REST2. The free energies calculated by the present application are thus closer to the experimental results; therefore, the enhanced sampling method and the method for calculating the binding free energy of the present application improve the accuracy of FEP prediction.
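The two metrics quoted for FIG. 6 can be computed as in the following sketch. R² is assumed here to be the squared Pearson correlation between predicted and experimental values, which the text does not state explicitly; the function names are illustrative.

```python
import math


def mean_unsigned_error(predicted, experimental):
    """Average absolute deviation between predicted and experimental values."""
    return sum(abs(p - e) for p, e in zip(predicted, experimental)) / len(predicted)


def r_squared(predicted, experimental):
    """Squared Pearson correlation coefficient (assumed definition of R^2)."""
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_e = sum(experimental) / n
    cov = sum((p - mean_p) * (e - mean_e) for p, e in zip(predicted, experimental))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_e = sum((e - mean_e) ** 2 for e in experimental)
    return cov * cov / (var_p * var_e)
```

A lower MUE and a higher R² for the same set of compounds indicate predictions closer to experiment.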

It should be noted that, in addition to the Bennett Acceptance Ratio (BAR) method, other methods can also be used to calculate the binding free energy of the target molecule-protein complex. This is not specifically limited.

In summary, the method for calculating the binding free energy of the target molecule-protein complex of the present application has the following beneficial effects:

By implementing the enhanced sampling method of the embodiments of the present application, the workflow of the REST2 method can be realized quickly and conveniently with as little code modification as possible and without extensively modifying the underlying code of the molecular dynamics simulation software. The overall workflow of the FEP/REST2 method is automated, which improves the calculation efficiency and prediction accuracy of FEP and makes it easier and faster to calculate the binding free energy between target molecules and proteins.

Referring to FIG. 7 , the embodiment of the present application also provides an enhanced sampling device 100 . The enhanced sampling device 100 includes:

The heating region determination module 110 is used to determine the heating region of the target molecule-protein complex;

The force field parameter modification module 120 is used to modify the force field parameters of the target molecule-protein complex system in the initial input file corresponding to the heating region, obtain the modified force field parameters of the target molecule-protein complex system, and then generate the parameter files required for FEP/REST2 simulation; and

The trajectory file generating module 130 is configured to perform FEP/REST2 simulation using the parameter file as the input of the potential function to generate a trajectory file, and the trajectory file is used to calculate the binding free energy of the target molecule-protein complex.

In a certain embodiment, the force field parameter modification module 120 includes:

A parameter division module, configured to divide the force field parameter items of the target molecule-protein complex system into parameters inside the heating region, parameters between the heating region and the environment region, and parameters inside the environment region;

The first modification module is used to at least partially modify the parameters inside the heating region, the parameters between the heating region and the environment region, and the parameters inside the environment region to obtain the modified force field parameters of the target molecule-protein complex system, and then generate the parameter files required for the FEP/REST2 simulation.

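In a certain embodiment, the first modification module is specifically used for:

multiplying the parameters inside the heating region by a first type of parameter;

multiplying the parameters between the heating region and the environment region by a second type of parameter;

making no modification to the parameters inside the environment region.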

In a certain embodiment, the first type of parameters include bonding coefficients, van der Waals coefficients and electrostatic coefficients, and the second type of parameters include bonding coefficients.
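The region-dependent scaling can be illustrated with a small Python sketch. The specific coefficients of the application are defined with reference to FIG. 1 to FIG. 3; the values used below follow a commonly used REST2 convention and are an assumption of this example: with λ = T₀/T_m, charges of heating-region atoms are scaled by √λ, their Lennard-Jones well depths by λ, dihedral force constants by λ when all four atoms are in the heating region and by √λ when only some are. The `Atom` class and all function names are hypothetical.

```python
import math
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Atom:
    charge: float   # partial charge
    epsilon: float  # Lennard-Jones well depth
    hot: bool       # True if the atom belongs to the heating region


def scale_atom(atom, lam):
    """Scale per-atom nonbonded parameters; environment atoms are unchanged."""
    if not atom.hot:
        return atom
    return replace(atom,
                   charge=atom.charge * math.sqrt(lam),
                   epsilon=atom.epsilon * lam)


def dihedral_factor(hot_flags, lam):
    """Multiplier for a dihedral force constant from the region of its atoms."""
    n_hot = sum(hot_flags)
    if n_hot == len(hot_flags):
        return lam             # entirely inside the heating region
    if n_hot > 0:
        return math.sqrt(lam)  # spans heating region and environment region
    return 1.0                 # entirely inside the environment region


def coulomb_prefactor(a, b):
    """Charge product entering the electrostatic pair energy."""
    return a.charge * b.charge


def lj_epsilon(a, b):
    """Pair well depth under the geometric-mean combining rule."""
    return math.sqrt(a.epsilon * b.epsilon)
```

Because the scaling is applied per atom, pairs inside the heating region end up scaled by λ, pairs between the heating region and the environment region by √λ, and pairs inside the environment region are left untouched, without editing the pair-interaction code of the simulation engine.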

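In a certain embodiment, the heating region determination module 110 is specifically used for:

identifying the perturbation region in the FEP calculation as the initial heating region of the target molecule-protein complex;

judging whether there is a ring in the target molecule;

if there is a ring in the target molecule, further judging whether at least some of the atoms in the initial heating region are located on the ring, or whether at least some of the atoms in the initial heating region are directly or indirectly connected to atoms on the ring;

if at least some of the atoms in the initial heating region are located on the ring or are directly connected to atoms on the ring, adding the region where the ring is located to the initial heating region to obtain a first updated heating region;

if at least some of the atoms in the initial heating region are indirectly connected to atoms on the ring, adding the ring and the region where the part indirectly connected to the ring is located to the initial heating region to obtain the first updated heating region;

if all atoms in the initial heating region are neither located on the ring nor directly or indirectly connected to atoms on the ring, taking the initial heating region as the first updated heating region;

if there is no ring in the target molecule, taking the initial heating region as the first updated heating region;

judging whether there is an additionally designated region that needs to be heated;

if so, adding the additionally designated region to the first updated heating region to obtain the heating region;

if not, taking the first updated heating region as the heating region.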
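The region-growing rule of the heating region determination module (perturbed atoms first, then rings on or connected to them, then any additionally designated atoms) can be sketched in Python as follows. The application does not quantify what counts as "indirectly connected"; here it is assumed to mean reachable within `max_hops` bonds, a hypothetical parameter, with the linker atoms on the shortest path added together with the ring. Ring membership is taken as an input, for example from a cheminformatics toolkit, and all names are illustrative.

```python
from collections import deque


def build_adjacency(bonds):
    """Turn a list of (atom_a, atom_b) bonds into an adjacency mapping."""
    adjacency = {}
    for a, b in bonds:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


def path_to_ring(region, ring, adjacency, max_hops):
    """Breadth-first search from the region towards the ring.

    Returns (hops, linker_atoms) for the shortest bond path, or None if the
    ring is not reached within max_hops bonds.
    """
    queue = deque((atom, 0) for atom in sorted(region))
    parent = {atom: None for atom in region}
    while queue:
        atom, hops = queue.popleft()
        if atom in ring:
            linker = set()
            step = parent[atom]
            while step is not None and step not in region:
                linker.add(step)
                step = parent[step]
            return hops, linker
        if hops == max_hops:
            continue
        for neighbour in adjacency.get(atom, ()):
            if neighbour not in parent:
                parent[neighbour] = atom
                queue.append((neighbour, hops + 1))
    return None


def determine_heating_region(perturbed, rings, adjacency, extra=(), max_hops=3):
    """Grow the initial (perturbed) region by rings, linkers and extra atoms."""
    initial = set(perturbed)
    updated = set(initial)
    for ring in rings:
        found = path_to_ring(initial, set(ring), adjacency, max_hops)
        if found is None:
            continue                 # ring neither touched nor close enough
        hops, linker = found
        updated |= set(ring)         # on the ring or directly bonded to it
        if hops > 1:
            updated |= linker        # indirectly connected: add the linker too
    return updated | set(extra)      # additionally designated atoms, if any
```

For a six-membered ring (atoms 0 to 5) carrying a chain 6-7-8 attached at atom 0, perturbing atom 6 adds the ring, while perturbing atom 8 adds the ring plus the linker atoms 6 and 7 under the default `max_hops`.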

For the specific definition of the enhanced sampling device 100, refer to the above definition of the enhanced sampling method, which will not be repeated here. Each module in the above-mentioned enhanced sampling device 100 may be fully or partially implemented by software, hardware and a combination thereof. The above-mentioned modules can be embedded in or independent of the processor in the computer device in the form of hardware, and can also be stored in the memory of the computer device in the form of software, so that the processor can invoke and execute the corresponding operations of the above-mentioned modules.

Please refer to Fig. 8, the embodiment of the present application also provides a device 200 for calculating the binding free energy of the target molecule-protein complex, the device 200 includes:

Construction module 210, used to construct the initial input file of the target molecule-protein complex system;

The enhanced sampling device 100 as described in any of the above-mentioned embodiments is configured to obtain a trajectory file based on the input initial input file; and

A calculation module 220, configured to obtain the binding free energy of the target molecule-protein complex based on the trajectory file.
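The way the three parts of device 200 hand data to one another can be sketched as a small Python composition. The callables below merely stand in for construction module 210, enhanced sampling device 100 and calculation module 220; their signatures and the class name are assumptions of this example rather than an interface defined by the application.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class BindingFreeEnergyDevice:
    # Stand-in for construction module 210: complex name -> initial input file
    construct_input: Callable[[str], str]
    # Stand-in for enhanced sampling device 100: input file -> trajectory files
    enhanced_sampling: Callable[[str], Sequence[str]]
    # Stand-in for calculation module 220: trajectory files -> free energy
    calculate: Callable[[Sequence[str]], float]

    def run(self, complex_name: str) -> float:
        """Chain the three stages in the order described for device 200."""
        input_file = self.construct_input(complex_name)
        trajectories = self.enhanced_sampling(input_file)
        return self.calculate(trajectories)
```

Keeping each stage behind a plain callable means that any one of them, for example the free energy estimator, can be swapped without touching the other two.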

In a certain embodiment, a device 200 for calculating the binding free energy of the target molecule-protein complex further includes:

Reference compound selection module, used to select small molecules with known small molecule-protein complex conformations as reference compounds;

The heating region determination module 110 is also used to determine the heating region of the reference compound-protein complex;

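The force field parameter modification module 120 is specifically used for:

modifying the force field parameters of the target molecule-protein complex system in the initial input file corresponding to the heating region to obtain the modified force field parameters of the target molecule-protein complex system;

modifying the force field parameters of the reference compound-protein complex system in the initial input file corresponding to the heating region to obtain the modified force field parameters of the reference compound-protein complex system;

generating the parameter files required for FEP/REST2 simulation based on the modified force field parameters of the reference compound-protein complex system and the modified force field parameters of the target molecule-protein complex system.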

In a certain embodiment, the calculation module 220 includes a first sub-calculation module, and the first sub-calculation module is used for:

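calculating, based on the trajectory file, the binding free energy of the target molecule-protein complex using the Bennett acceptance ratio method;

The first sub-calculation module is specifically used for:

for two adjacent intermediate states i and j in the trajectory file, respectively calculating the first energy ΔU_ij of the trajectory of intermediate state i under the parameters corresponding to intermediate state j, and the second energy ΔU_ji of the trajectory of intermediate state j under the parameters corresponding to intermediate state i;

substituting the first energy ΔU_ij and the second energy ΔU_ji into Formula 1 and Formula 2 to respectively calculate the solvent free energy difference ΔG_A and the binding free energy difference ΔG_B of converting the reference compound into the target molecule;

calculating the relative binding free energy ΔΔG_binding of converting the reference compound into the target molecule according to Formula 3;

ΔΔG_binding = ΔG_B − ΔG_A (Formula 3);

calculating the binding free energy ΔG₂ of the target molecule-protein complex according to Formula 4, based on ΔΔG_binding and the known binding free energy ΔG₁ of the reference compound-protein complex;

ΔG₂ = ΔΔG_binding + ΔG₁ (Formula 4);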

For the specific limitations of the device 200 for calculating the binding free energy of the target molecule-protein complex, please refer to the above-mentioned limitations on the method for calculating the binding free energy of the target molecule-protein complex, which will not be repeated here. Each module in the above-mentioned apparatus 200 for calculating the binding free energy of the target molecule-protein complex can be fully or partially realized by software, hardware and a combination thereof. The above-mentioned modules can be embedded in or independent of the processor in the computer device in the form of hardware, and can also be stored in the memory of the computer device in the form of software, so that the processor can invoke and execute the corresponding operations of the above-mentioned modules.

Please refer to FIG. 9, the embodiment of the present application also provides an electronic device, including:

one or more processors;

a memory, coupled to the processor, for storing one or more programs;

When the one or more programs are executed by the one or more processors, the one or more processors implement the enhanced sampling method described in any of the above embodiments or the method for calculating the binding free energy of the target molecule-protein complex described in any of the above embodiments.

The processor is used to control the overall operation of the terminal device to complete all or part of the steps of the enhanced sampling method or of the method for calculating the binding free energy of the target molecule-protein complex described in any of the above embodiments. The memory is used to store various types of data to support the operation of the terminal device; such data may include instructions for any application program or method operated on the terminal device, as well as application-related data. The memory can be implemented by any type of volatile or non-volatile memory device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, a magnetic disk or an optical disk.

In an exemplary embodiment, the terminal device may be implemented by one or more application-specific integrated circuits (ASIC), digital signal processors (DSP), digital signal processing devices (DSPD), programmable logic devices (PLD), field programmable gate arrays (FPGA), controllers, microcontrollers, microprocessors or other electronic components, for performing the enhanced sampling method described in any of the above embodiments or the method for calculating the binding free energy of the target molecule-protein complex described in any of the above embodiments, and achieves the same technical effects as the above methods.

In another exemplary embodiment, a computer-readable storage medium including a computer program is also provided. When the computer program is executed by a processor, the steps of the enhanced sampling method described in any of the above embodiments or of the method for calculating the binding free energy of the target molecule-protein complex described in any of the above embodiments are implemented. For example, the computer-readable storage medium may be the above-mentioned memory including a computer program, and the computer program can be executed by the processor of the terminal device to complete the enhanced sampling method described in any of the above embodiments or the method for calculating the binding free energy of the target molecule-protein complex described in any of the above embodiments, and achieves the same technical effects as the above methods.

The above is only a preferred implementation of the present application and does not limit the patent scope of the present application. Any equivalent structure or equivalent process transformation made using the contents of the specification and drawings of the present application, or any direct or indirect use thereof in other related technical fields, is likewise included in the scope of patent protection of the present application.

Claims

An enhanced sampling method, characterized in that it comprises:

determining the heating region of the target molecule-protein complex;

modifying the force field parameters of the target molecule-protein complex system in the initial input file corresponding to the heating region to obtain the modified force field parameters of the target molecule-protein complex system, and then generating the parameter files required for FEP/REST2 simulation;

performing FEP/REST2 simulation using the parameter file as the input of the potential function to generate a trajectory file, the trajectory file being used to calculate the binding free energy of the target molecule-protein complex.
The method according to claim 1, characterized in that modifying the force field parameters of the target molecule-protein complex system in the initial input file corresponding to the heating region to obtain the modified force field parameters of the target molecule-protein complex system, and then generating the parameter files required for FEP/REST2 simulation, comprises:

dividing the force field parameter items of the target molecule-protein complex system into parameters inside the heating region, parameters between the heating region and the environment region, and parameters inside the environment region;

at least partially modifying the parameters inside the heating region, the parameters between the heating region and the environment region, and the parameters inside the environment region to obtain the modified force field parameters of the target molecule-protein complex system, and then generating the parameter files required for the FEP/REST2 simulation.
The method according to claim 2, wherein at least partially modifying the parameters inside the heating region, the parameters between the heating region and the environment region, and the parameters inside the environment region comprises:

multiplying the parameters inside the heating region by a first type of parameter;

multiplying the parameters between the heating region and the environment region by a second type of parameter;

making no modification to the parameters inside the environment region.
The method according to claim 3, wherein the first type of parameters include bonding coefficient, van der Waals coefficient and electrostatic coefficient, and the second type of parameters include bonding coefficient.
The method according to claim 1, wherein the determination of the heating region of the target molecule-protein complex comprises:

Identify the perturbation region during the FEP calculation as the initial heating region of the target molecule-protein complex;

judging whether there is a ring in the target molecule;

if there is a ring in the target molecule, further judging whether at least some of the atoms in the initial heating region are located on the ring, or whether at least some of the atoms in the initial heating region are directly or indirectly connected to atoms on the ring;

if at least some of the atoms in the initial heating region are located on the ring or are directly connected to atoms on the ring, adding the region where the ring is located to the initial heating region to obtain a first updated heating region;

if at least some of the atoms in the initial heating region are indirectly connected to atoms on the ring, adding the ring and the region where the part indirectly connected to the ring is located to the initial heating region to obtain the first updated heating region;

if all atoms in the initial heating region are neither located on the ring nor directly or indirectly connected to atoms on the ring, taking the initial heating region as the first updated heating region;

if there is no ring in the target molecule, taking the initial heating region as the first updated heating region;

judging whether there is an additionally designated region that needs to be heated;

if so, adding the additionally designated region to the first updated heating region to obtain the heating region;

if not, taking the first updated heating region as the heating region.
A method for calculating the binding free energy of a target molecule-protein complex, characterized in that it comprises:

constructing an initial input file of the target molecule-protein complex system;

applying the enhanced sampling method according to any one of claims 1-5 to the initial input file to obtain a trajectory file;

obtaining the binding free energy of the target molecule-protein complex based on the trajectory file.
The method according to claim 6, wherein, before constructing the initial input file of the target molecule-protein complex system, the method further comprises:

selecting a small molecule with a known small molecule-protein complex conformation as a reference compound;

determining the heating region of the reference compound-protein complex;

and wherein modifying the force field parameters of the target molecule-protein complex system in the initial input file corresponding to the heating region to obtain the modified force field parameters of the target molecule-protein complex system, and then generating the parameter files required for FEP/REST2 simulation, comprises:

Modifying the force field parameters of the target molecule-protein complex system in the initial input file corresponding to the heating region to obtain the modified force field parameters of the target molecule-protein complex system;

Modifying the force field parameters of the reference compound-protein complex system in the initial input file corresponding to the heating region, to obtain the modified force field parameters of the reference compound-protein complex system;

Based on the modified force field parameters of the reference compound-protein complex system and the modified force field parameters of the target molecule-protein complex system, the parameter files required for FEP/REST2 simulation are generated.
The method according to claim 7, wherein obtaining the binding free energy of the target molecule-protein complex based on the trajectory file comprises calculating, based on the trajectory file, the binding free energy of the target molecule-protein complex using the Bennett acceptance ratio method, which comprises:

for two adjacent intermediate states i and j in the trajectory file, respectively calculating the first energy ΔU_ij of the trajectory of intermediate state i under the parameters corresponding to intermediate state j, and the second energy ΔU_ji of the trajectory of intermediate state j under the parameters corresponding to intermediate state i;

substituting the first energy ΔU_ij and the second energy ΔU_ji into Formula 1 and Formula 2 to respectively calculate the solvent free energy difference ΔG_A of converting the reference compound into the target molecule and the binding free energy difference ΔG_B of converting the reference compound into the target molecule;

based on the solvent free energy difference ΔG_A and the binding free energy difference ΔG_B of converting the reference compound into the target molecule, calculating the relative binding free energy ΔΔG_binding of converting the reference compound into the target molecule according to Formula 3;

ΔΔG_binding = ΔG_B − ΔG_A (Formula 3);

based on the relative binding free energy ΔΔG_binding of converting the reference compound into the target molecule, and the known binding free energy ΔG₁ of the reference compound-protein complex, calculating the binding free energy ΔG₂ of the target molecule-protein complex according to Formula 4;

ΔG₂ = ΔΔG_binding + ΔG₁ (Formula 4);

where ΔG is the free energy difference, ΔU_ij is the first energy, ΔU_ji is the second energy, <*>_i is the ensemble average under intermediate state i, <*>_j is the ensemble average under intermediate state j, N_i and N_j are the numbers of frames of the simulation trajectories under intermediate states i and j respectively, k_B is the Boltzmann constant, T is the simulation temperature, ΔΔG_binding is the relative binding free energy, ΔG_A is the solvent free energy difference of converting the reference compound into the target molecule, ΔG_B is the binding free energy difference of converting the reference compound into the target molecule, ΔG₁ is the known binding free energy of the reference compound-protein complex, and ΔG₂ is the binding free energy of the target molecule-protein complex.
An enhanced sampling device, characterized in that it comprises:

a heating region determination module, used to determine the heating region of the target molecule-protein complex;

a force field parameter modification module, used to modify the force field parameters of the target molecule-protein complex system in the initial input file corresponding to the heating region, obtain the modified force field parameters of the target molecule-protein complex system, and then generate the parameter files required for FEP/REST2 simulation; and

a trajectory file generation module, used to perform FEP/REST2 simulation using the parameter file as the input of the potential function to generate a trajectory file, the trajectory file being used to calculate the binding free energy of the target molecule-protein complex.
The device according to claim 9, wherein the force field parameter modification module includes:

A parameter division module, configured to divide the force field parameter items of the target molecule-protein complex system into parameters inside the heating region, parameters between the heating region and the environment region, and parameters inside the environment region;

The first modification module is used to at least partially modify the parameters inside the heating region, the parameters between the heating region and the environment region, and the parameters inside the environment region to obtain the modified force field parameters of the target molecule-protein complex system, and then generate the parameter files required for the FEP/REST2 simulation.
The device according to claim 10, wherein the first modification module is specifically used for:

multiplying the parameters inside the heating region by a first type of parameter;

multiplying the parameters between the heating region and the environment region by a second type of parameter;

making no modification to the parameters inside the environment region.
The device according to claim 11, wherein the first type of parameters include bonding coefficient, van der Waals coefficient and electrostatic coefficient, and the second type of parameters include bonding coefficient.
The device according to claim 9, wherein the heating region determination module is specifically used for:

Identify the perturbation region during the FEP calculation as the initial heating region of the target molecule-protein complex;

judging whether there is a ring in the target molecule;

if there is a ring in the target molecule, further judging whether at least some of the atoms in the initial heating region are located on the ring, or whether at least some of the atoms in the initial heating region are directly or indirectly connected to atoms on the ring;

if at least some of the atoms in the initial heating region are located on the ring or are directly connected to atoms on the ring, adding the region where the ring is located to the initial heating region to obtain a first updated heating region;

if at least some of the atoms in the initial heating region are indirectly connected to atoms on the ring, adding the ring and the region where the part indirectly connected to the ring is located to the initial heating region to obtain the first updated heating region;

if all atoms in the initial heating region are neither located on the ring nor directly or indirectly connected to atoms on the ring, taking the initial heating region as the first updated heating region;

if there is no ring in the target molecule, taking the initial heating region as the first updated heating region;

judging whether there is an additionally designated region that needs to be heated;

if so, adding the additionally designated region to the first updated heating region to obtain the heating region;

if not, taking the first updated heating region as the heating region.
A device for calculating the binding free energy of a target molecule-protein complex, characterized in that it includes:

A construction module, used to construct an initial input file of the target molecule-protein complex system;

The enhanced sampling device according to any one of claims 9-13, configured to obtain a trajectory file based on the inputted initial input file; and

A calculation module is used to obtain the binding free energy of the target molecule-protein complex based on the trajectory file.
The device according to claim 14, further comprising:

Reference compound selection module, used to select small molecules with known small molecule-protein complex conformations as reference compounds;

The heating region determination module is also used to determine the heating region of the reference compound-protein complex;

The force field parameter modification module is specifically used for:

Modifying the force field parameters of the target molecule-protein complex system in the initial input file corresponding to the heating region to obtain the modified force field parameters of the target molecule-protein complex system;

Modifying the force field parameters of the reference compound-protein complex system in the initial input file corresponding to the heating region, to obtain the modified force field parameters of the reference compound-protein complex system;

Based on the modified force field parameters of the reference compound-protein complex system and the modified force field parameters of the target molecule-protein complex system, the parameter files required for FEP/REST2 simulation are generated.
The device according to claim 15, wherein the calculation module comprises a first sub-calculation module, and the first sub-calculation module is used for:

Based on the trajectory file, the Bennett acceptance rate method is used for calculation to obtain the binding free energy of the target molecule-protein complex;

The first sub-computing module is specifically used for:

According to the two adjacent intermediate state i and intermediate state j in the trajectory file, respectively calculate the first energy ΔU ij of the intermediate state i trajectory under the corresponding parameters of intermediate state j, and the intermediate state j trajectory under the corresponding parameters of intermediate state i The second energy ΔU ji of ;

Put the first energy ΔU ij and the second energy ΔU ji into formula 1 and formula 2, respectively calculate the solvent free energy difference ΔGA for converting the reference compound into the target molecule, and the binding of the reference compound into the target molecule Free energy difference ΔG B ;

Based on the solvent free energy difference ΔG_A and the binding free energy difference ΔG_B for converting the reference compound into the target molecule, calculating the relative binding free energy ΔΔG_binding for converting the reference compound into the target molecule according to formula 3;

ΔΔG_binding = ΔG_B - ΔG_A    (Formula 3);

Based on the relative binding free energy ΔΔG_binding for converting the reference compound into the target molecule and the known binding free energy ΔG_1 of the reference compound-protein complex, calculating the binding free energy ΔG_2 of the target molecule-protein complex according to formula 4;

ΔG_2 = ΔΔG_binding + ΔG_1    (Formula 4);

where ΔG is the free energy difference, ΔU_ij is the first energy, ΔU_ji is the second energy, ⟨·⟩_i is the ensemble average under intermediate state i, ⟨·⟩_j is the ensemble average under intermediate state j, N_i and N_j are the numbers of frames of the simulation trajectories under intermediate states i and j respectively, k_B is the Boltzmann constant, T is the simulation temperature, ΔΔG_binding is the relative binding free energy, ΔG_A is the solvent free energy difference for converting the reference compound into the target molecule, ΔG_B is the binding free energy difference for converting the reference compound into the target molecule, ΔG_1 is the known binding free energy of the reference compound-protein complex, and ΔG_2 is the binding free energy of the target molecule-protein complex.
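The calculation chain in this claim can be sketched as follows. Formula 1 and formula 2 are not reproduced in this passage, so the sketch assumes they correspond to the standard self-consistent Bennett acceptance ratio equation, solved here by bisection. It further assumes that ΔU_ij = U_j(x) - U_i(x) for frames x sampled in state i, that ΔU_ji = U_i(x) - U_j(x) for frames sampled in state j, and that energies are in kcal/mol. The function names are hypothetical.

```python
import math
from typing import Sequence, Tuple

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol*K)


def fermi(x: float) -> float:
    """Numerically stable 1 / (1 + exp(x))."""
    if x > 0:
        e = math.exp(-x)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(x))


def bar_delta_g(du_ij: Sequence[float], du_ji: Sequence[float],
                temperature: float) -> float:
    """Free energy difference G_j - G_i from the standard BAR equation."""
    kt = K_B * temperature
    n_i, n_j = len(du_ij), len(du_ji)
    m = kt * math.log(n_i / n_j)  # accounts for unequal frame counts

    def residual(dg: float) -> float:
        # Increases monotonically with dg; its root is the BAR estimate.
        fwd = sum(fermi((m + w - dg) / kt) for w in du_ij)
        rev = sum(fermi((w + dg - m) / kt) for w in du_ji)
        return fwd - rev

    # Bracket the root generously, then bisect a fixed number of times.
    candidates = list(du_ij) + [-w for w in du_ji]
    pad = abs(m) + 50.0 * kt
    lo, hi = min(candidates) - pad, max(candidates) + pad
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if residual(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def leg_free_energy(pairs: Sequence[Tuple[Sequence[float], Sequence[float]]],
                    temperature: float) -> float:
    """Sum BAR estimates over all adjacent intermediate-state pairs of a leg."""
    return sum(bar_delta_g(f, r, temperature) for f, r in pairs)


def target_binding_free_energy(dg_a: float, dg_b: float,
                               dg_1: float) -> Tuple[float, float]:
    """Apply formula 3 and formula 4; returns (ddG_binding, dG_2)."""
    ddg_binding = dg_b - dg_a   # formula 3
    dg_2 = ddg_binding + dg_1   # formula 4
    return ddg_binding, dg_2
```

Under these assumptions, `leg_free_energy` would be called once with the solvent-leg energies to obtain ΔG_A and once with the complex-leg energies to obtain ΔG_B, and `target_binding_free_energy` would then combine them with the known ΔG_1 of the reference compound.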
An electronic device, characterized in that it comprises:

one or more processors;

a memory, coupled to the processor, for storing one or more programs;

When the one or more programs are executed by the one or more processors, the one or more processors implement the enhanced sampling method according to any one of claims 1-5, or the method for calculating the binding free energy of the target molecule-protein complex according to any one of claims 6-8.
A computer-readable storage medium on which a computer program is stored, wherein, when the computer program is executed by a processor, the enhanced sampling method according to any one of claims 1-5, or the method for calculating the binding free energy of the target molecule-protein complex according to any one of claims 6-8, is implemented.