CN117577219A

CN117577219A - Protein-drug binding free energy prediction method and prediction system based on MM/PB (GB) SA

Info

Publication number: CN117577219A
Application number: CN202210935272.5A
Authority: CN
Inventors: 袁曙光; 王世玉; 孙晓琳
Original assignee: Shenzhen Alpha Molecular Technology Co ltd
Current assignee: Shenzhen Alpha Molecular Technology Co ltd
Priority date: 2022-08-03
Filing date: 2022-08-03
Publication date: 2024-02-20

Abstract

The invention discloses a protein-drug binding free energy prediction method and prediction system based on MM/PB (GB) SA. The prediction method comprises the following steps: predicting the binding free energy of a protein and its ligand using parameters comprising at least one of the force field of the protein, the force field of the ligand and the charge number of the ligand; fitting each of the predicted groups of binding free energies against the experimentally measured binding free energy; and analyzing the binding free energy of the protein and the drug with the best-fitting set of parameters and method; wherein the method for predicting the binding free energy is MM/PBSA and/or MM/GBSA. The method fully considers the various situations that free energy prediction may encounter, has extremely high robustness and accuracy, and is applicable to both water-soluble proteins and membrane proteins. Compared with the traditional free energy perturbation technique and thermodynamic integration, the precision is improved and the speed is 40-50 times faster.

Description

Protein-drug binding free energy prediction method and prediction system based on MM/PB (GB) SA

Technical Field

The invention relates to the technical field of drug screening, and in particular to a protein-drug binding free energy prediction method and prediction system based on MM/PB (GB) SA.

Background

Predicting drug activity is a very challenging topic in the development of new drugs today. In biological systems, the binding free energy determines the direction of many biological processes, such as protein folding, enzyme catalysis and drug-target binding. Binding free energy prediction therefore plays an indispensable role in the relevant fields. In drug discovery, the binding of a drug to its biological target is the basis of drug efficacy: the affinity of the drug directly determines its biological activity, and the binding free energy is a quantitative index of the affinity between the drug and the target. Therefore, in the lead optimization stage, using a computer to predict the affinity between a candidate drug and its biological target can provide theoretical guidance for optimizing the lead compound and accelerate drug discovery.

Currently, three methods are commonly used in the drug discovery field to predict receptor-ligand interactions: Free Energy Perturbation (FEP), Thermodynamic Integration (TI), and Molecular Mechanics Poisson-Boltzmann (Generalized Born) Surface Area (MM/PB (GB) SA). MM/PB (GB) SA is widely used for ligand-receptor free energy prediction because, compared with the other two methods, it is fast, robust and consumes fewer computational resources.

Disclosure of Invention

In view of the background art, the problem solved by the invention is: using free energy prediction technology to improve the accuracy of binding free energy prediction between a protein drug target and a drug ligand while keeping the consumption of computing resources low, thereby saving the time and economic cost of drug research and development. Based on the MM/PB (GB) SA theory, the accuracy of the binding free energy calculated by MM/PB (GB) SA is further improved by searching for suitable calculation parameters in advance. In the test phase, the method consumed fewer computing resources and achieved accuracy comparable to that of FEP.

In order to achieve the above purpose, the technical scheme adopted by the invention is as follows:

In one aspect, the invention provides a method for predicting protein-drug binding free energy based on MM/PB (GB) SA, comprising the steps of:

(1) Predicting the free energy of binding of the protein and its ligand using a parameter comprising at least one of the force field of the protein, the force field of the ligand and the charge number of the ligand;

(2) Respectively carrying out data fitting on the plurality of groups of binding free energy predicted in the step (1) and the binding free energy measured by the experiment;

(3) Analyzing the free energy of binding of said protein to the drug with the best fit set of parameters and methods of step (2);

wherein in the step (1), the method for predicting the binding free energy is MM/PBSA and/or MM/GBSA.
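Steps (1) to (3) can be sketched as a small selection loop. This is a minimal illustration under stated assumptions, not the patented implementation: `predict` is a hypothetical callable standing in for an MM/PBSA or MM/GBSA run, the parameter-set tuples are illustrative, and the sum of squared deviations is an assumed stand-in for the fit criterion, which the text does not specify at this point.

```python
from typing import Callable, Dict, Sequence, Tuple

# A parameter set: (protein force field, ligand charge method, solvation method).
ParamSet = Tuple[str, str, str]
# Hypothetical predictor: (parameter set, molecule name) -> binding free energy.
Predictor = Callable[[ParamSet, str], float]


def squared_error(predicted: Sequence[float], measured: Sequence[float]) -> float:
    """Sum of squared deviations; an assumed stand-in for the fit criterion."""
    return sum((p - m) ** 2 for p, m in zip(predicted, measured))


def best_parameter_set(
    param_sets: Sequence[ParamSet],
    predict: Predictor,
    measured: Dict[str, float],
) -> ParamSet:
    """Steps (1)-(2): predict every known ligand with every set, keep the best fit."""
    ligands = list(measured)
    targets = [measured[name] for name in ligands]

    def error(params: ParamSet) -> float:
        return squared_error([predict(params, name) for name in ligands], targets)

    return min(param_sets, key=error)


def predict_drug(
    param_sets: Sequence[ParamSet],
    predict: Predictor,
    measured: Dict[str, float],
    drug: str,
) -> float:
    """Step (3): apply the best-fitting parameter set to the drug of interest."""
    return predict(best_parameter_set(param_sets, predict, measured), drug)
```

Keeping the predictor as a parameter means the same loop works whether the underlying calculation is MM/PBSA, MM/GBSA, or both.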

In the technical scheme of the invention, the process of calculating the binding free energy of the MM/PB (GB) SA is as follows:

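$$
\begin{aligned}
\Delta G_{\mathrm{bind,solve}} &= \Delta G_{\mathrm{bind,vacuum}} + \Delta G_{\mathrm{solve}} \\
&= \Delta E_{\mathrm{MM}} + \Delta G_{\mathrm{solve}} - T\Delta S \\
&= \Delta E_{\mathrm{MM}} + \Delta G_{\mathrm{PB(GB)}} + \Delta G_{\mathrm{SA}} - T\Delta S
\end{aligned}
$$

wherein $\Delta G_{\mathrm{bind,solve}}$ is the binding free energy in the solution environment, $\Delta G_{\mathrm{bind,vacuum}}$ is the binding free energy under vacuum conditions, and $\Delta G_{\mathrm{solve}}$ is the solvation energy; $\Delta E_{\mathrm{MM}}$ is the molecular mechanics interaction energy, $T$ is the temperature, and $\Delta S$ is the entropy change of the system; $\Delta G_{\mathrm{PB(GB)}}$ and $\Delta G_{\mathrm{SA}}$ are, respectively, the polar and nonpolar contribution terms of the solvation energy $\Delta G_{\mathrm{solve}}$. In the technical scheme of the invention, $\Delta G_{\mathrm{SA}}$ is proportional to the solvent-accessible surface area.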
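The decomposition above can be written out as a short sketch. The class and field names are hypothetical, and the numbers in the usage note are made-up illustrative values, not results from the invention.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EnergyTerms:
    """Terms of the binding free energy equation, all in the same energy unit."""

    delta_e_mm: float     # ΔE_MM: molecular mechanics interaction energy
    delta_g_polar: float  # ΔG_PB(GB): polar solvation contribution
    delta_g_sa: float     # ΔG_SA: nonpolar (surface area) solvation contribution
    t_delta_s: float      # TΔS: temperature times the entropy change

    @property
    def delta_g_solve(self) -> float:
        """Solvation energy: polar plus nonpolar contributions."""
        return self.delta_g_polar + self.delta_g_sa

    @property
    def delta_g_vacuum(self) -> float:
        """Binding free energy under vacuum conditions: ΔE_MM - TΔS."""
        return self.delta_e_mm - self.t_delta_s

    @property
    def delta_g_bind(self) -> float:
        """Binding free energy in solution: vacuum term plus solvation energy."""
        return self.delta_g_vacuum + self.delta_g_solve
```

For example, `EnergyTerms(-50.0, 30.0, -5.0, -15.0).delta_g_bind` combines a vacuum term of -35.0 with a solvation energy of 25.0.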

In a preferred embodiment, in step (1), the ligand and the drug have the same compound skeleton and similar crystal structure, i.e. the ligand and the drug are optimized from the same lead compound, or the drug is optimized from the ligand, or the ligand is optimized from the drug; the lead compound is a compound molecule which needs to be improved and optimized, and consists of a compound skeleton and at least one functional group, wherein the compound skeleton is a core part of the lead compound molecule, the functional group is an atom or an atomic group which determines the chemical property of the lead compound molecule, and common functional groups comprise hydroxyl, carboxyl, ether bond, aldehyde group, carbonyl and the like.

In certain embodiments, the GB models commonly used in the MM/GBSA method include GBHCT, GBOBC, GBOBC2, GBneck and GBneck2.

Preferably, the method for calculating the charge number of the ligand is selected from any one of an empirical method, a semi-empirical method and a quantum chemical calculation method. Examples are: an empirical ligand charge calculation method, such as the charge calculation based on the CHARMM General Force Field; a semi-empirical ligand charge calculation method combining empirical and quantum-mechanical elements, such as the AM1-BCC atomic charge calculation method; and an atomic charge calculation method based on quantum chemistry, such as an electrostatic potential calculation based on density functional theory (DFT) combined with RESP (Restrained Electrostatic Potential) fitting of the atomic charges.

In the technical scheme of the invention, the force field of the ligand is determined by the method used to calculate the ligand charges; that is, once the charge calculation method is fixed, the ligand force field is fixed. In the present invention, the optional ligand force fields include the General AMBER Force Field 2 (GAFF2) and the CHARMM General Force Field (CGenFF).

In certain specific embodiments, the protein is a membrane protein, and the parameter further comprises a dielectric constant (membrane dielectric constant) of the membrane protein.

In a preferred embodiment, in step (2), the experiment is a wet experiment.

In the technical scheme of the invention, the data predicted in step (1) are fitted against binding free energies from wet experiments, i.e. binding free energies measured in a laboratory by molecular, cellular, physiological or similar test methods, rather than obtained by computer simulation or bioinformatics methods.

In yet another aspect, the present invention provides a protein-drug binding free energy prediction system based on MM/PB (GB) SA, comprising:

an exploration data module: for collecting and storing structural information data of the protein and its ligand, structural information data of the drug, and experimentally measured binding free energy data of the protein-ligand;

a training data module: for predicting the binding free energy of the protein and its ligand, and fitting the predicted binding free energy against the experimental results to screen out the best-fitting set of parameters;

a prediction data module: for analyzing the binding free energy of the protein and the drug with the screened best-fitting set of parameters;

wherein, the method for predicting the binding free energy is MM/PBSA and/or MM/GBSA.

Preferably, the training data module comprises a preprocessing unit for preprocessing the protein crystal structure.

Preferably, the preprocessing includes adding hydrogen atoms, assigning protonation states and energy minimization.
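One way to picture how the three modules and the preprocessing unit fit together is the sketch below. It is an assumption-laden illustration: the class names, the `Structure` placeholder (here just a string handle such as a file path), the `predict` wrapper and the use of mean absolute error as the fit criterion are all hypothetical and not taken from the source.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Hashable, Optional, Sequence

Structure = str  # placeholder handle for a prepared structure, e.g. a file path
Step = Callable[[Structure], Structure]


@dataclass
class ExplorationDataModule:
    """Collects structures and experimentally measured binding free energies."""

    structures: Dict[str, Structure] = field(default_factory=dict)
    measured: Dict[str, float] = field(default_factory=dict)


@dataclass
class TrainingDataModule:
    """Predicts known ligands and screens out the best-fitting parameter set."""

    predict: Callable[[Hashable, Structure], float]  # hypothetical MM/PB(GB)SA wrapper
    preprocessing: Sequence[Step] = ()               # e.g. add hydrogens, protonate, minimize
    best: Optional[Hashable] = None

    def preprocess(self, structure: Structure) -> Structure:
        for step in self.preprocessing:
            structure = step(structure)
        return structure

    def fit(self, data: ExplorationDataModule, param_sets: Sequence[Hashable]) -> Hashable:
        def mean_abs_error(params: Hashable) -> float:
            errors = [
                abs(self.predict(params, self.preprocess(data.structures[name])) - dg)
                for name, dg in data.measured.items()
            ]
            return sum(errors) / len(errors)

        self.best = min(param_sets, key=mean_abs_error)
        return self.best


@dataclass
class PredictionDataModule:
    """Applies the screened parameter set to the drug of interest."""

    training: TrainingDataModule

    def predict(self, drug_structure: Structure) -> float:
        if self.training.best is None:
            raise RuntimeError("fit the training module before predicting")
        prepared = self.training.preprocess(drug_structure)
        return self.training.predict(self.training.best, prepared)
```

The preprocessing steps are passed in as ordered callables so that the unit can be swapped without touching the fitting logic.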

In yet another aspect, the present invention provides a computer-readable storage medium storing a computer program which, when executed by a processor, performs the above-described MM/PB (GB) SA-based protein-drug binding free energy prediction method.

The technical scheme has the following advantages or beneficial effects:

The invention discloses a protein-ligand binding free energy calculation and prediction method based on MM/PB (GB) SA, which has the following advantages over FEP and TI: (1) the prediction method provided by the invention consumes fewer computing resources and saves computing time; (2) the prediction method provided by the invention is highly robust; (3) the accuracy of the prediction method provided by the invention is comparable to that of free energy perturbation (FEP) and thermodynamic integration (TI), while the speed is 40-50 times faster; (4) the FEP/TI methods are mainly used for water-soluble proteins and their effect on membrane proteins is unknown, whereas the method provided by the invention can be used to predict the binding free energy of drug molecules in membrane protein systems, with prediction results of a certain accuracy.

Drawings

FIG. 1 is a flow chart of the MM/PB (GB) SA-based protein-ligand binding free energy prediction method of the present invention.

FIG. 2 is a graph showing the correlation between the binding free energy of water-soluble proteins and their ligands predicted by FEP and the experimentally measured binding free energy.

FIG. 3 is a graph showing the correlation between the binding free energy of water-soluble proteins and their ligands predicted by MM/PB (GB) SA and the experimentally measured binding free energy.

FIG. 4 is a graph showing the correlation between the binding free energy of membrane proteins and their ligands predicted by MM/PB (GB) SA and the experimentally measured binding free energy.

FIG. 5 is a diagram of the ligand structure of the CDK2 protein test system of the invention.

FIGS. 6-1 and 6-2 are ligand structure diagrams of the P38 protein test system of the present invention.

FIG. 7 is a diagram showing the structure of the ligand of the Thrombin protein test system of the present invention.

FIG. 8 is a ligand structure diagram of the Tyk2 protein test system of the invention.

FIG. 9 is a ligand structure diagram of the mPGES protein test system of the invention.

FIG. 10 is a ligand structure diagram of the GPBAR protein test system of the present invention.

FIG. 11 is a ligand structure diagram of the OX1 protein assay system of the invention.

Detailed Description

The following examples are only some, but not all, of the examples of the invention. Accordingly, the detailed description of the embodiments of the invention provided below is not intended to limit the scope of the invention as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be made by a person skilled in the art without making any inventive effort, are intended to fall within the scope of the present invention.

In the present invention, all the equipment, raw materials and the like are commercially available or commonly used in the industry unless otherwise specified. The methods in the following examples are conventional in the art unless otherwise specified.

In one embodiment, a flow chart of a method of predicting free energy of protein-drug binding based on MM/PB (GB) SA is shown in FIG. 1.

Hereinafter, it is described in connection with specific operations.

Test methods and test objects in the following implementations: the water-soluble systems comprise the four proteins that were tested with the free energy perturbation technique (FEP), namely CDK2, P38, Thrombin and Tyk2; the membrane protein systems comprise three proteins, namely mPGES, GPBAR and OX1.

Construction of a test system: in an aqueous solution system, the three-dimensional structure of the known protein crystal is first processed by adding hydrogen atoms, predicting the protonation states of the amino acids, energy minimization and the like; the binding mode of the drug molecule and the protein is then obtained by restrained molecular docking of the ligand molecules and the drug molecules to be predicted; a 12 × 12 Å³ simulation box is then built with the molecular dynamics tool Gromacs, and sodium chloride solution at a concentration of 0.15 M is added to neutralize the excess charge, so that the whole simulated system is electrically neutral and mimics the physiological state in a cellular environment. The ligand structure diagrams of the four water-soluble protein test systems are shown in FIG. 5, FIGS. 6-1 and 6-2, FIG. 7 and FIG. 8, respectively; the ligand structure diagrams of the three membrane protein test systems are shown in FIGS. 9, 10 and 11, respectively.

Predicting the binding free energy of a known protein drug target and its ligand: after the test system is built, molecular dynamics (MD) simulations are used to generate multiple sets of dynamic information for the relevant protein drug targets and ligand molecules on a 5 ns time scale; the binding free energy of the protein drug target and the ligand molecule is then predicted with the thermodynamic cycle equation of MM/PB (GB) SA as follows:

$$
\begin{aligned}
\Delta G_{\mathrm{bind,solve}} &= \Delta G_{\mathrm{bind,vacuum}} + \Delta G_{\mathrm{solve}} \\
&= \Delta E_{\mathrm{MM}} + \Delta G_{\mathrm{solve}} - T\Delta S \\
&= \Delta E_{\mathrm{MM}} + \Delta G_{\mathrm{PB(GB)}} + \Delta G_{\mathrm{SA}} - T\Delta S
\end{aligned}
$$

In the above, $\Delta G_{\mathrm{bind,solve}}$ is the binding free energy in the solution environment, $\Delta G_{\mathrm{bind,vacuum}}$ is the binding free energy under vacuum conditions, and $\Delta G_{\mathrm{solve}}$ is the solvation energy; $\Delta E_{\mathrm{MM}}$ is the molecular mechanics interaction energy, $T$ is the simulated temperature, and $\Delta S$ is the entropy change of the system; $\Delta G_{\mathrm{PB(GB)}}$ and $\Delta G_{\mathrm{SA}}$ are, respectively, the polar and nonpolar contribution terms of the solvation energy $\Delta G_{\mathrm{solve}}$. In the technical scheme of the invention, $\Delta G_{\mathrm{SA}}$ is proportional to the solvent-accessible surface area.
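The text does not state how the terms of the equation are aggregated over the simulation. A common practice in MM/PB (GB) SA workflows is to evaluate the binding free energy on snapshots taken from the MD trajectory and report the average. The sketch below assumes that practice and assumes the per-frame values are already available as a list; the function name is hypothetical.

```python
from math import sqrt
from statistics import mean, stdev
from typing import Sequence, Tuple


def average_over_frames(per_frame_dg: Sequence[float]) -> Tuple[float, float]:
    """Return the mean per-frame binding free energy and its standard error.

    per_frame_dg holds one binding free energy value per trajectory snapshot
    (assumed input; how snapshots are selected is not specified in the text).
    """
    if not per_frame_dg:
        raise ValueError("at least one frame is required")
    average = mean(per_frame_dg)
    if len(per_frame_dg) == 1:
        return average, 0.0
    # Standard error of the mean, treating frames as independent samples.
    return average, stdev(per_frame_dg) / sqrt(len(per_frame_dg))
```

Treating frames as independent is a simplification; correlated snapshots would make the reported standard error too small.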

In predicting the binding free energy as described above, the different parameters of the calculation and the methods for obtaining them are tested systematically, including: the ligand charges calculated by different methods, the molecular force field of the protein drug target, the molecular force field of the drug ligand, whether an additional single-point halogen model is added to treat halogens, and the different implicit GB models adopted for membrane protein systems.

In one embodiment, the influence of the ligand charges obtained by different charge calculation methods, and of different protein molecular force fields, on the binding free energy of the water-soluble proteins and membrane proteins with their ligand molecules is tested. The ligand charge calculation methods comprise RESP-DFT, RESP-HF, AM1-BCC and CGenFF; the molecular force fields comprise CHARMM36m (abbreviated as CHARMM) and Amber FF99SB (abbreviated as FF99SB). The correlation coefficient (obs. R) and the mean absolute error (MAE) obtained by fitting, one by one, the binding free energy of the protein and its ligand predicted with the above parameters and calculation methods against the experimentally measured binding free energy are shown in Table 1 (in the table, a higher obs. R is better and a lower MAE is better; the same applies to the tables below).
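The two reported indices can be computed as in the following sketch. It assumes that obs. R denotes the Pearson correlation coefficient, which the text does not state explicitly; the function names are hypothetical.

```python
from math import sqrt
from typing import Sequence


def pearson_r(predicted: Sequence[float], measured: Sequence[float]) -> float:
    """Pearson correlation between predicted and measured binding free energies."""
    if len(predicted) != len(measured) or len(predicted) < 2:
        raise ValueError("need two equally long series with at least two points")
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_m = sum(measured) / n
    covariance = sum((p - mean_p) * (m - mean_m) for p, m in zip(predicted, measured))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_m = sum((m - mean_m) ** 2 for m in measured)
    return covariance / sqrt(var_p * var_m)


def mean_absolute_error(predicted: Sequence[float], measured: Sequence[float]) -> float:
    """Average absolute deviation between predicted and measured values."""
    if len(predicted) != len(measured) or not predicted:
        raise ValueError("need two equally long, non-empty series")
    return sum(abs(p - m) for p, m in zip(predicted, measured)) / len(predicted)
```

The two indices answer different questions: the correlation measures how well the ranking of ligands is reproduced, while the MAE measures how close the absolute values are, so a parameter set can score well on one and poorly on the other.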

In one embodiment, it is tested how adding an additional single-point halogen model to treat halogens when calculating the ligand charges, together with different protein molecular force fields, affects the binding free energy of the water-soluble proteins and membrane proteins with their ligand molecules. The ligand charge calculation methods that include the single-point halogen model are RESP_DFT_EP and RESP_HF_EP, whereas RESP_DFT and RESP_HF do not include it (EP denotes the additional single point); the molecular force fields comprise CHARMM36m (abbreviated as CHARMM) and Amber FF99SB (abbreviated as FF99SB). The correlation coefficient (obs. R) and the mean absolute error (MAE) obtained by fitting, one by one, the binding free energy of the protein and its ligand predicted with the above parameters and calculation methods against the experimentally measured binding free energy are shown in Table 2.

In one embodiment, the influence of different implicit GB models and different protein molecular force fields on the binding free energy of the membrane proteins and their ligand molecules is tested. The implicit GB models comprise GBHCT (igb=1), GBOBC (igb=2), GBOBC2 (igb=5), GBneck (igb=7) and GBneck2 (igb=8); the molecular force fields comprise CHARMM36m (abbreviated as CHARMM) and Amber FF99SB (abbreviated as FF99SB); the ligand charge calculation methods comprise RESP_DFT, RESP_HF, AM1_BCC and CGenFF. The correlation coefficient (obs. R) and the mean absolute error (MAE) obtained by fitting, one by one, the predicted binding free energy of the protein and its ligand against the experimentally measured binding free energy are shown in Table 3.
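The options named in this embodiment can be enumerated as a parameter grid. The sketch below assumes that every combination is evaluated, which the text does not state; the option values are taken from the paragraph above, while the function and key names are hypothetical.

```python
from itertools import product
from typing import Dict, Iterator, Union

PROTEIN_FORCE_FIELDS = ("CHARMM36m", "FF99SB")
CHARGE_METHODS = ("RESP_DFT", "RESP_HF", "AM1_BCC", "CGenFF")
# Implicit GB models with the igb values quoted in the text.
GB_MODELS = {"GBHCT": 1, "GBOBC": 2, "GBOBC2": 5, "GBneck": 7, "GBneck2": 8}


def membrane_gb_grid() -> Iterator[Dict[str, Union[str, int]]]:
    """Yield one dictionary per (force field, charge method, GB model) combination."""
    for force_field, charges, (gb_name, igb) in product(
        PROTEIN_FORCE_FIELDS, CHARGE_METHODS, GB_MODELS.items()
    ):
        yield {
            "protein_ff": force_field,
            "charge_method": charges,
            "gb_model": gb_name,
            "igb": igb,
        }
```

With two force fields, four charge methods and five GB models, a full scan under this assumption covers 40 combinations per protein system.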

In one embodiment, the influence of different protein molecular force fields on the binding free energy of the membrane proteins and their ligand molecules is tested while taking the dielectric constant of the membrane protein (membrane dielectric constant) into account, with the membrane dielectric constant set to emem=1 to 9 in turn; the molecular force fields comprise CHARMM36m (abbreviated as CHARMM) and Amber FF99SB (abbreviated as FF99SB); the ligand charge calculation methods comprise RESP_DFT, RESP_HF, AM1_BCC and CGenFF. The correlation coefficient (obs. R) and the mean absolute error (MAE) obtained by fitting, one by one, the binding free energy of the protein and its ligand predicted with the above parameters and calculation methods against the experimentally measured binding free energy are shown in Table 4 (wherein inp is the option selecting the calculation method for the nonpolar solvation free energy).
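The names igb, emem and inp match input variables of the MMPBSA.py tool in AmberTools, although the text does not name the software used for the MM/PB (GB) SA step. Assuming that tool, a scan point could be written to a namelist-style input file as sketched below. The helper name is hypothetical, and the `memopt=1` switch (which turns on the implicit membrane in that tool) is an assumption added for the sketch, not something stated in the source.

```python
def mmpbsa_input(igb: int, emem: float, inp: int) -> str:
    """Build an MMPBSA.py-style input for one point of the parameter scan.

    igb, emem and inp are the variables named in the text; everything else
    (the title line, the &general block, memopt=1) is an illustrative assumption.
    """
    lines = [
        "Illustrative MM/PB(GB)SA scan input",
        "&general",
        "/",
        "&gb",
        f"  igb={igb},",
        "/",
        "&pb",
        f"  memopt=1, emem={emem}, inp={inp},",
        "/",
    ]
    return "\n".join(lines) + "\n"
```

Calling `mmpbsa_input(igb=5, emem=4.0, inp=2)` would describe a GBOBC2 run alongside a PB run with a membrane dielectric constant of 4; the values are examples only, not the optimized parameters of the invention.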

In one example, the correlation between the binding free energy of the protein and its ligand predicted by the free energy perturbation technique (FEP) and the binding free energy measured by wet experiments is shown in FIG. 2.

In one example, for the four water-soluble protein systems, the correlation between the binding free energy of the protein and its ligand predicted by MM/PB (GB) SA and the experimentally measured binding free energy is shown in FIG. 3.

In one example, for the three membrane protein systems, the correlation between the binding free energy of the protein and its ligand predicted by MM/PB (GB) SA and the experimentally measured binding free energy is shown in FIG. 4.

In one example, the correlation coefficients between the binding free energy of the protein and its ligand predicted by molecular docking (docking), by the optimized MM/PB (GB) SA methods of Tables 1-4, and by the free energy perturbation technique (FEP), and the binding free energy measured by wet experiments are shown in Table 5 (the preferred data for MM/GBSA and MM/PBSA are shown in bold). In Table 5, the larger the absolute value of the number, the higher the accuracy of the prediction.

TABLE 5

Therefore, with the prediction method provided by the invention, the MM/PB (GB) SA results obtained after systematic parameter optimization are generally superior to those of the free energy perturbation technique FEP, the existing gold standard.

In addition, current FEP/TI technology mostly adopts the OPLS molecular force field, which cannot reproduce the physical properties of a real cell membrane well; the cell membrane may even rupture during long-time-scale molecular dynamics simulation. FEP/TI based on this force field is therefore not suitable for predicting the binding free energy of drug molecules in membrane protein systems. The CHARMM36m force field and the AMBER99SB force field adopted in the invention are both very reliable membrane protein force fields, so the method is well suited to predicting the binding free energy of drug molecules in membrane protein systems. Judging from the results, the predictions for the three different membrane protein test systems are all highly consistent with the experimental values.

In terms of speed, according to published reports, FEP/TI can predict only 0.4-0.5 molecules per day on a GTX2080TI graphics card (GPU), whereas the optimized MM/PB (GB) SA can predict 20 molecules per day, which is 40-50 times faster than the FEP/TI method.
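The quoted speed ratio follows directly from the two throughput figures, as the short check below shows. The function name is hypothetical; the numbers are those given in the paragraph above.

```python
def speedup(optimized_per_day: float, baseline_per_day: float) -> float:
    """Ratio of molecules predicted per day by two methods."""
    if baseline_per_day <= 0:
        raise ValueError("baseline throughput must be positive")
    return optimized_per_day / baseline_per_day


# MM/PB(GB)SA at 20 molecules per day against FEP/TI at 0.4-0.5 molecules per day.
lower_bound = speedup(20, 0.5)
upper_bound = speedup(20, 0.4)
```

The faster FEP/TI figure (0.5 molecules per day) gives the lower bound of the ratio and the slower figure (0.4 molecules per day) gives the upper bound.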

In one embodiment, the invention provides a MM/PB (GB) SA-based protein-drug binding free energy prediction system comprising:

an exploration data module: for collecting and storing structural information data of the protein and its ligand, structural information data of the drug, and experimentally measured binding free energy data of the protein-ligand;

a training data module: for predicting the binding free energy of the protein and its ligand, and fitting the predicted binding free energy against the experimental results to screen out the best-fitting set of parameters;

a prediction data module: for analyzing the binding free energy of the protein and the drug with the screened best-fitting set of parameters.

The foregoing is only a preferred embodiment of the invention, it being noted that: it will be apparent to those skilled in the art that various modifications and adaptations can be made without departing from the principles of the present invention, and such modifications and adaptations are intended to be comprehended within the scope of the invention.

Claims

1. A method for predicting free energy of protein-drug binding based on MM/PB (GB) SA, comprising the steps of:

2. The method of claim 1, wherein the MM/PB (GB) SA computes the binding free energy as follows:

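$$
\begin{aligned}
\Delta G_{\mathrm{bind,solve}} &= \Delta G_{\mathrm{bind,vacuum}} + \Delta G_{\mathrm{solve}} \\
&= \Delta E_{\mathrm{MM}} + \Delta G_{\mathrm{solve}} - T\Delta S \\
&= \Delta E_{\mathrm{MM}} + \Delta G_{\mathrm{PB(GB)}} + \Delta G_{\mathrm{SA}} - T\Delta S
\end{aligned}
$$

wherein $\Delta G_{\mathrm{bind,solve}}$ is the binding free energy in the solution environment, $\Delta G_{\mathrm{bind,vacuum}}$ is the binding free energy under vacuum conditions, and $\Delta G_{\mathrm{solve}}$ is the solvation energy; $\Delta E_{\mathrm{MM}}$ is the molecular mechanics interaction energy, $T$ is the temperature, and $\Delta S$ is the entropy change of the system; $\Delta G_{\mathrm{PB(GB)}}$ and $\Delta G_{\mathrm{SA}}$ are, respectively, the polar and nonpolar contribution terms of the solvation energy $\Delta G_{\mathrm{solve}}$.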

3. The method according to claim 1, wherein in step (1), the ligand and the drug are optimized from the same lead compound, or the drug is optimized from the ligand, or the ligand is optimized from the drug.

4. The method according to claim 1, wherein the method for calculating the charge number of the ligand is selected from any one of an empirical method, a semi-empirical method and a quantum chemical calculation method.

5. The method of claim 1, wherein the protein is a membrane protein and the parameter further comprises a dielectric constant of the membrane protein.

6. The method of claim 1, wherein in step (2), the experiment is a wet experiment.

7. A MM/PB (GB) SA-based protein-drug binding free energy prediction system, comprising:

8. The protein-drug binding free energy prediction system of claim 7, wherein the training data module comprises a preprocessing unit for preprocessing the protein crystal structure.

9. The protein-drug binding free energy prediction system of claim 8, wherein the preprocessing comprises adding hydrogen atoms, assigning protonation states, and energy minimization.

10. A computer-readable storage medium, wherein a computer program is stored on the computer-readable storage medium, the program being executed by a processor to perform the MM/PB (GB) SA-based protein-drug binding free energy prediction method of any of claims 1 to 6.