CN110047559B - Method, system, apparatus and medium for calculating protein and drug binding free energy - Google Patents

Publication number
CN110047559B
Authority
CN
China
Prior art keywords
protein
energy
drug
free energy
function
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910167720.XA
Other languages
Chinese (zh)
Other versions
CN110047559A (en)
Inventor
段莉莉
黄开放
从亚龙
李皓
钟素素
董淑衡
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Xinrui Gene Technology Co ltd
Original Assignee
Shandong Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shandong Normal University
Priority to CN201910167720.XA
Publication of CN110047559A
Application granted
Publication of CN110047559B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00 ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations

Landscapes

  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Genetics & Genomics (AREA)
  • Biotechnology (AREA)
  • Biophysics (AREA)
  • Chemical & Material Sciences (AREA)
  • Molecular Biology (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Analytical Chemistry (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Theoretical Computer Science (AREA)
  • Investigating Or Analysing Biological Materials (AREA)

Abstract

The present disclosure provides a method, a system, a device and a medium for calculating the binding free energy of a protein and a drug. A function of the protein-drug binding free energy is constructed, which expresses the binding free energy of a protein-drug complex in terms of an electrostatic term, a polar solvation energy term, a van der Waals term, a nonpolar solvation energy term and the interaction entropy. The energy terms of each protein-drug complex in a training set are acquired, and a multivariate linear fit of the function is performed using these energy terms and the experimental binding free energy of each complex, which yields the fitted function. The energy terms of a protein-drug complex to be tested are then input into the fitted function, which outputs the binding free energy of the protein and the drug.

Description

Method, system, apparatus and medium for calculating protein and drug binding free energy
Technical Field
The present disclosure relates to a method, system, device and medium for calculating protein and drug binding free energy.
Background
The statements in this section merely provide background information related to the present disclosure and may not constitute prior art.
The interactions between biological molecules, such as receptor-ligand, antigen-antibody, DNA-protein, sugar-lectin and RNA-ribosome interactions, determine to a large extent the physiological functions of an organism, such as self-replication, metabolism and information processing. The study of the recognition principles between biomolecules therefore plays a crucial role in biology. The binding of proteins and drugs is at the core of drug design and drug interaction, and has direct significance and application value for both experimental and computational biomedical research. In virtual screening in particular, the interaction mode between the target protein and the drug is studied, the binding capacity of each drug is evaluated with existing methods, and the drugs with the strongest binding capacity are finally selected.
To the inventors' knowledge, the binding affinity between two molecules is generally described by the binding free energy. At present, the methods capable of calculating the binding free energy accurately are Free Energy Perturbation (FEP) and Thermodynamic Integration (TI). However, the computational cost of these two methods is often prohibitive, and the results are difficult to converge because the methods need to simulate many non-physical intermediate states of the complex. The Linear Interaction Energy (LIE) method is another classical method, which estimates the binding free energy from interaction energies and adjustable parameters. It is, however, only suitable for complexes with similar interaction mechanisms, so its generality is poor, and its computational cost is also very high.
A more commonly used method is molecular mechanics/Poisson-Boltzmann surface area (MM/PBSA), which calculates the free energy by combining molecular mechanics with a continuum solvent model. Compared with FEP and TI, this method only requires sampling of the phase space of two states, the bound and the unbound state of target and drug, without considering intermediate states, so the amount of calculation is greatly reduced and the method is more widely used. In MM/PBSA, the entropy change is typically approximated by the normal mode (Nmode) method. This method uses a harmonic oscillator approximation that treats translation, rotation and vibration as uncoupled and calculates their contributions separately. Because of the large computational load, the anharmonic contributions tend to be neglected.
Furthermore, it has been found that, within the same trajectory, the fluctuation of the vibrational entropy can reach as much as 5 kcal/mol [Kuhn, B.; Kollman, P. A. Binding of a Diverse Set of Ligands to Avidin and Streptavidin: An Accurate Quantitative Prediction of Their Relative Affinities by a Combination of Molecular Mechanics and Continuum Solvent Models. J. Med. Chem. 2000, 43, 3786-3791]. Therefore, many researchers choose to ignore the contribution of entropy to the binding free energy, which often makes the final result less convincing.
Disclosure of Invention
To address the deficiencies of the prior art, the present disclosure provides a method, a system, a device and a medium for calculating protein and drug binding free energy.
In a first aspect, the present disclosure provides a method for calculating the binding free energy of a protein and a drug.
A method for calculating the binding free energy of a protein and a drug, comprising:
constructing a function of the protein-drug binding free energy, the function expressing the binding free energy of a protein-drug complex in terms of an electrostatic term, a polar solvation energy term, a van der Waals term, a nonpolar solvation energy term and the interaction entropy;
acquiring the energy terms of each protein-drug complex in a training set, the energy terms comprising the electrostatic term, the polar solvation energy term, the van der Waals term, the nonpolar solvation energy term and the interaction entropy; and performing a multivariate linear fit of the function using the energy terms of each protein-drug complex in the training set and the experimental value of each complex, to obtain the fitted function of the protein-drug binding free energy;
inputting the electrostatic term, the polar solvation energy term, the van der Waals term, the nonpolar solvation energy term and the interaction entropy of the protein-drug complex to be calculated into the fitted function, and outputting the binding free energy of the protein and the drug.
In a second aspect, the present disclosure also provides a system for calculating the binding free energy of a protein and a drug.
A system for calculating the binding free energy of a protein and a drug, comprising:
a function construction module configured to construct a function of the protein-drug binding free energy, the function expressing the binding free energy of a protein-drug complex in terms of an electrostatic term, a polar solvation energy term, a van der Waals term, a nonpolar solvation energy term and the interaction entropy;
a multivariate linear fitting module configured to acquire the energy terms of each protein-drug complex in a training set, the energy terms comprising the electrostatic term, the polar solvation energy term, the van der Waals term, the nonpolar solvation energy term and the interaction entropy, and to perform a multivariate linear fit of the function using the energy terms of each protein-drug complex in the training set and the experimental value of each complex, to obtain the fitted function of the protein-drug binding free energy;
and a binding free energy calculation module configured to input the electrostatic term, the polar solvation energy term, the van der Waals term, the nonpolar solvation energy term and the interaction entropy of the protein-drug complex to be calculated into the fitted function, and to output the binding free energy of the protein and the drug.
In a third aspect, the present disclosure also provides an electronic device comprising a memory and a processor, and computer instructions stored on the memory and executed on the processor, wherein the computer instructions, when executed by the processor, perform the steps of the method of the first aspect.
In a fourth aspect, the present disclosure also provides a computer-readable storage medium for storing computer instructions which, when executed by a processor, perform the steps of the method of the first aspect.
Compared with the prior art, the beneficial effects of the present disclosure are as follows:
The method combines PBSA and IE, taking into account the strengths and weaknesses of each. The energy terms of a large number of protein-drug complexes with known experimental values are calculated by PBSA and IE, and a multivariate linear fit of the calculated energy terms to the experimental values yields a set of fitted coefficients. The binding free energy of a protein-drug complex can then be calculated simply and accurately from the fitted function.
Because each energy term of the fitting function is calculated by a mainstream method and the function is fitted on a large number of complexes, the calculated binding free energy is numerically close to the experimental value, and the overall correlation is better than that of the conventional methods.
Moreover, the trend obtained on the test set is consistent with that obtained on the training set, which indicates that the method is accurate and general. Since the main advantage of the IE method (fast computation) is fully exploited, the method also has a low computational cost.
Based on the above advantages, the present disclosure is well suited to describing the affinity of protein-drug systems. In particular, for the virtual screening of drugs, it provides a new approach that is reliable, widely applicable and inexpensive.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this application, illustrate embodiments of the application and, together with the description, serve to explain the application and are not intended to limit the application.
FIG. 1 is a flow diagram of a method of one or more embodiments.
Detailed Description
It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the disclosure. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to limit the example embodiments of the present application. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should further be understood that the terms "comprises" and/or "comprising", when used in this specification, specify the presence of the stated features, steps, operations, devices, components, and/or combinations thereof.
As shown in FIG. 1, a first embodiment of the present disclosure provides a method for calculating the binding free energy of a protein and a drug.
A method for calculating the binding free energy of a protein and a drug, comprising:
constructing a function of the protein-drug binding free energy, the function expressing the binding free energy of a protein-drug complex in terms of an electrostatic term, a polar solvation energy term, a van der Waals term, a nonpolar solvation energy term and the interaction entropy;
acquiring the energy terms of each protein-drug complex in a training set, the energy terms comprising the electrostatic term, the polar solvation energy term, the van der Waals term, the nonpolar solvation energy term and the interaction entropy; and performing a multivariate linear fit of the function using the energy terms of each protein-drug complex in the training set and the experimental value of each complex, to obtain the fitted function of the protein-drug binding free energy;
inputting the electrostatic term, the polar solvation energy term, the van der Waals term, the nonpolar solvation energy term and the interaction entropy of the protein-drug complex to be calculated into the fitted function, and outputting the binding free energy of the protein and the drug.
Further, the method also comprises:
inputting the electrostatic term, the polar solvation energy term, the van der Waals term, the nonpolar solvation energy term and the interaction entropy of each protein-drug complex in a test set into the fitted function, and outputting the estimated binding free energy of the protein and the drug; and evaluating the soundness, generality and stability of the fitted function by comparing the calculated estimates with the experimental values.
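The comparison between estimated and experimental values described above can be sketched as follows. The text does not name specific statistics, so the Pearson correlation coefficient and the mean absolute error used here are assumptions chosen for illustration, and the numbers are placeholders rather than data from this disclosure.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mean_absolute_error(predicted, experimental):
    """Average absolute deviation between predictions and experiment."""
    return sum(abs(p - e) for p, e in zip(predicted, experimental)) / len(predicted)


# Placeholder values in kcal/mol, for illustration only (not from the patent).
predicted = [-8.1, -6.4, -9.7, -5.2]
experimental = [-7.9, -6.9, -9.1, -5.6]

print("Pearson r:", pearson_r(predicted, experimental))
print("MAE (kcal/mol):", mean_absolute_error(predicted, experimental))
```

Computing the same statistics on the training set and on the test set gives a simple check of whether the trend found during fitting carries over to unseen complexes.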
Further, the function of the protein-drug binding free energy is constructed as follows:
$$PBSA\_IE = a(\Delta E_{ele} + \Delta G_{pb}) + b\,\Delta E_{vdw} + c\,\Delta G_{np} + d\,IE + e$$
where PBSA_IE represents the binding free energy of the protein and the drug; $\Delta E_{ele}$ represents the electrostatic energy term; $\Delta G_{pb}$ represents the polar solvation energy term; $\Delta E_{vdw}$ represents the van der Waals energy term of the protein and the drug; $\Delta G_{np}$ represents the nonpolar solvation energy term; IE represents the interaction entropy of the protein-drug complex; and a, b, c, d and e are the parameters to be fitted.
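A minimal sketch of how the function above might be evaluated is given below. The class and function names are hypothetical, and the energy values are placeholders. The fitted values of a to e are not stated in this passage, so the example uses a = b = c = d = 1 and e = 0 only to show that the function then reduces to the plain sum of the five terms.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EnergyTerms:
    """Energy terms of one protein-drug complex, in kcal/mol."""
    e_ele: float  # electrostatic term
    g_pb: float   # polar solvation energy term
    e_vdw: float  # van der Waals term
    g_np: float   # nonpolar solvation energy term
    ie: float     # interaction entropy


@dataclass(frozen=True)
class FitCoefficients:
    """Parameters a, b, c, d, e of the scoring function."""
    a: float
    b: float
    c: float
    d: float
    e: float


def pbsa_ie(terms: EnergyTerms, k: FitCoefficients) -> float:
    """Evaluate a*(dE_ele + dG_pb) + b*dE_vdw + c*dG_np + d*IE + e."""
    return (k.a * (terms.e_ele + terms.g_pb)
            + k.b * terms.e_vdw
            + k.c * terms.g_np
            + k.d * terms.ie
            + k.e)


# Placeholder energy terms, for illustration only.
terms = EnergyTerms(e_ele=-50.0, g_pb=45.0, e_vdw=-40.0, g_np=-5.0, ie=12.0)

# Unit weights and zero intercept give the unweighted sum of the terms.
unweighted = FitCoefficients(a=1.0, b=1.0, c=1.0, d=1.0, e=0.0)
print(pbsa_ie(terms, unweighted))
```

Note that the electrostatic and polar solvation terms share the single coefficient a, so the function has five fitted parameters rather than six.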
Further, the training set and the test set are preprocessed before the multivariate linear fit, as follows:
The same pretreatment procedure is used for both the training set and the test set. The Protein Data Bank identifiers (PDB IDs) of the proteins and drugs in the training and test sets are taken from the "Protein-ligand complexes: The general set minus refined set" data set in the PDBbind database (website: http://www.pdbbind-cn.org). We randomly selected 84 complexes from the 1047 complexes in the range 1a to 1u as the training set, and 44 complexes from the 860 complexes in the range 1v to 2h as the test set. The training set and the test set do not contain complexes with charge numbers above 2 or complexes containing metal ions. The crystal structures of the complexes are taken from the PDB database (website: https://www.rcsb.org/), and the experimental values are queried from PDBbind.
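The random selection described above could be reproduced along the following lines. This is only a sketch: it assumes that "1a to 1u" and "1v to 2h" refer to the first two characters of the PDB ID, the function name and the fixed seed are hypothetical, the IDs in the example are made-up strings, and the filtering by charge number and metal ions is not shown.

```python
import random


def select_complexes(pdb_ids, first_prefix, last_prefix, n, seed=0):
    """Randomly pick n IDs whose two-character prefix lies in the given range."""
    pool = sorted({pid.lower() for pid in pdb_ids
                   if first_prefix <= pid[:2].lower() <= last_prefix})
    if n > len(pool):
        raise ValueError("not enough complexes in the requested range")
    # A seeded generator makes the random selection reproducible.
    return random.Random(seed).sample(pool, n)


# Made-up identifiers, for illustration only.
candidates = ["1a4k", "1b5h", "1u2y", "1v0k", "2a3b", "2h9z", "2j1c", "10gs"]

training_ids = select_complexes(candidates, "1a", "1u", n=2)
test_ids = select_complexes(candidates, "1v", "2h", n=2)
print(training_ids, test_ids)
```

Because the two prefix ranges do not overlap, the training and test selections cannot share a complex.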
(110) Pretreatment of the drug:
The drug is subjected to a two-step quantum-chemical treatment using the Gaussian 03 software:
Step 1: the drug is geometry-optimized at the HF/6-31G level in Gaussian 03 to find the optimal structure, that is, the structure with the lowest energy;
Step 2: a single-point energy calculation is performed at the B3LYP/cc-pVTZ level in Gaussian to obtain the electrostatic potential of the drug, and the charge of each atom is fitted by the restrained electrostatic potential (RESP) method.
(120) Pretreatment of the protein:
Because the protein crystal structure generally does not contain hydrogen atoms, the missing hydrogen atoms of the protein are added with the LEaP module in AMBER 12.
(130) Pretreatment of the protein-drug complex:
The protein-drug complex is described with the AMBER ff12SB force field and placed in a periodic TIP3P water box, with the minimum distance between the solute and the edges of the box set to
Figure GDA0002981866390000051
Meanwhile, counterions are added according to the net charge of each system to neutralize it.
Further, each system in the training set and the test set is subjected to MD simulation with the AMBER package, as follows:
(140) Energy minimization of the protein-drug complex system:
The energy of the protein-drug complex is minimized first by the steepest descent method and then by the conjugate gradient method, until the energy of the system converges.
(150) Heating the complex system to room temperature:
A restrained MD simulation of 300 ps is performed, with a restraint constant of
Figure GDA0002981866390000052
The temperature of each system is gradually raised from 0 K to 300 K. The temperature is regulated by Langevin dynamics with a collision frequency of 1.0 ps^-1, all bonds involving hydrogen atoms are constrained by the SHAKE algorithm, and the simulation time step is 2 fs.
(160) Unrestrained MD simulation of the complex system:
An unrestrained MD simulation of 2 ns with a time step of 2 fs is performed on each protein-drug system:
During the first 1 ns, one frame is recorded every 2000 steps; each frame contains the positions of all atoms of the complex, the counterions and the water molecules in the box. The purpose of this stage is to bring the complex to a relatively equilibrated conformation.
During the last 1 ns, one frame is recorded every 5 steps; each frame contains only the positions of the atoms of the complex, and the energy terms of the binding free energy are obtained by sampling this trajectory.
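The step and frame counts implied by this protocol can be checked with a few lines of arithmetic. The sketch below uses only the figures stated in the description (2 fs time step, 300 ps heating, 2 ns production, one frame every 2000 or 5 steps, and the later statement that one frame in every 1000 is used for PBSA); the variable names are illustrative.

```python
FS_PER_PS = 1_000       # femtoseconds per picosecond
FS_PER_NS = 1_000_000   # femtoseconds per nanosecond
TIMESTEP_FS = 2         # simulation time step stated in the text


def steps_in(duration_fs, timestep_fs=TIMESTEP_FS):
    """Number of MD steps needed to cover a duration given in femtoseconds."""
    return duration_fs // timestep_fs


heating_steps = steps_in(300 * FS_PER_PS)            # 300 ps heating stage
total_steps = steps_in(2 * FS_PER_NS)                # 2 ns unrestrained MD
first_ns_frames = steps_in(1 * FS_PER_NS) // 2000    # one frame per 2000 steps
last_ns_frames = steps_in(1 * FS_PER_NS) // 5        # one frame per 5 steps
pbsa_frames = last_ns_frames // 1000                 # every 1000th frame

print(heating_steps, total_steps, first_ns_frames, last_ns_frames, pbsa_frames)
```

The last 1 ns therefore yields 100,000 frames, and taking one frame in every 1000 gives the 100 conformations used for the PBSA calculation.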
Further, the energy terms of each protein-drug system in the training set and the test set are collected as follows:
(170) Electrostatic energy term:
$$\Delta E_{ele} = E_{ele}^{complex} - E_{ele}^{protein} - E_{ele}^{drug}$$
where $E_{ele}^{complex}$ is the electrostatic energy of the complex, $E_{ele}^{protein}$ is the electrostatic energy of the protein, and $E_{ele}^{drug}$ is the electrostatic energy of the drug; each is given by the following formula:
$$E_{ele} = \sum_{i<j}^{N} \frac{q_i q_j}{R_{ij}}$$
where N is the total number of atoms, $q_i$ and $q_j$ are the charges of the i-th and j-th atoms, and $R_{ij}$ is the distance between the i-th atom and the j-th atom.
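As a minimal sketch of the electrostatic term above, the following code evaluates the pairwise Coulomb sum and the difference between the complex, the protein and the drug for a single conformation. The function names, the toy charges and coordinates, and the conversion factor 332.0522 kcal·Å/(mol·e²) are assumptions made for illustration. Bonded-pair exclusions and 1-4 scaling, which real force fields apply, are ignored here.

```python
import numpy as np

# Assumed conversion factor so that charges in e and distances in angstrom
# give energies in kcal/mol.
COULOMB_KCAL = 332.0522


def coulomb_energy(charges, coords):
    """Sum of q_i * q_j / R_ij over all atom pairs with i < j."""
    q = np.asarray(charges, dtype=float)
    x = np.asarray(coords, dtype=float)
    i, j = np.triu_indices(len(q), k=1)           # all index pairs with i < j
    r = np.linalg.norm(x[i] - x[j], axis=1)       # pair distances
    return COULOMB_KCAL * float(np.sum(q[i] * q[j] / r))


def delta_e_ele(q_protein, x_protein, q_drug, x_drug):
    """E_ele(complex) - E_ele(protein) - E_ele(drug) for one conformation."""
    q_complex = np.concatenate([q_protein, q_drug])
    x_complex = np.vstack([x_protein, x_drug])
    return (coulomb_energy(q_complex, x_complex)
            - coulomb_energy(q_protein, x_protein)
            - coulomb_energy(q_drug, x_drug))


# Toy system: a two-atom "protein" and a one-atom "drug" (illustrative only).
q_p, x_p = [0.5, -0.5], [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
q_l, x_l = [1.0], [[0.0, 0.0, 2.0]]
print(delta_e_ele(q_p, x_p, q_l, x_l))
```

When the protein and drug coordinates are taken from the same complex frame, the intramolecular pairs cancel in the difference, so only protein-drug pairs contribute to the result.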
(180) The van der Waals energy term of the protein and the drug is:
$$\Delta E_{vdw} = E_{vdw}^{complex} - E_{vdw}^{protein} - E_{vdw}^{drug}$$
where $E_{vdw}^{complex}$, $E_{vdw}^{protein}$ and $E_{vdw}^{drug}$ are the van der Waals potentials of the complex, the protein and the drug, respectively, each given by the following formula:
$$E_{vdw} = \sum_{i<j}^{N} \left( \frac{A_{ij}}{R_{ij}^{12}} - \frac{B_{ij}}{R_{ij}^{6}} \right)$$
where $A_{ij}$ and $B_{ij}$ are the Lennard-Jones parameters for the pair formed by the i-th atom and the j-th atom, taken from the AMBER ff12SB force field.
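The van der Waals term can be sketched in the same spirit. The code below assumes the usual AMBER-style convention in which the pair parameters are built from per-atom well depths and half-radii (A = eps·Rmin^12, B = 2·eps·Rmin^6, with Lorentz-Berthelot combination), and it sums only protein-drug pairs, which is what remains of the difference when all three energies are evaluated on the same complex frame. The function names and the numerical parameters are illustrative, not values from ff12SB.

```python
import numpy as np


def lj_coefficients(eps_i, rmin_half_i, eps_j, rmin_half_j):
    """Pair coefficients A_ij and B_ij from per-atom well depth and half-radius."""
    eps_ij = np.sqrt(eps_i * eps_j)        # geometric mean of well depths
    rmin_ij = rmin_half_i + rmin_half_j    # sum of half-radii
    return eps_ij * rmin_ij ** 12, 2.0 * eps_ij * rmin_ij ** 6


def delta_e_vdw(a_pairs, b_pairs, x_protein, x_drug):
    """Sum of A_ij / R^12 - B_ij / R^6 over all protein-drug atom pairs.

    a_pairs and b_pairs have shape (n_protein_atoms, n_drug_atoms).
    """
    xp = np.asarray(x_protein, dtype=float)
    xl = np.asarray(x_drug, dtype=float)
    r = np.linalg.norm(xp[:, None, :] - xl[None, :, :], axis=2)
    return float(np.sum(np.asarray(a_pairs) / r ** 12 - np.asarray(b_pairs) / r ** 6))


# One protein atom and one drug atom placed at the minimum-energy distance;
# eps = 0.1 kcal/mol and half-radius 1.9 angstrom are illustrative values.
a, b = lj_coefficients(0.1, 1.9, 0.1, 1.9)
energy = delta_e_vdw([[a]], [[b]], [[0.0, 0.0, 0.0]], [[3.8, 0.0, 0.0]])
print(energy)  # close to -0.1, the negative of the well depth
```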
(190) Polar solvation energy term:
$\Delta G_{pb}$ is the polar solvation energy term, obtained by solving the Poisson-Boltzmann (PB) equation with an implicit solvent model:
Figure GDA0002981866390000066
where ε(r) is the dielectric constant at position r, which takes the solute value inside the molecular surface and the solvent value outside it; κ(r) is the Debye-Hückel constant in the solvent region; ρ(r) is the charge distribution function of the solute molecule; and Φ(r) is the electrostatic potential at r. In this method, the PB equation is solved with the PBSA program in the AMBER software package, with the interior and exterior dielectric constants set to 1 and 80, respectively.
(200) Non-polar solvation energy term:
It is obtained from the solvent-accessible surface area (SASA):
$$\Delta G_{np} = \gamma \cdot SASA + \beta$$
where the empirical constants γ and β are set to
Figure GDA0002981866390000067
and 0.92 kcal/mol, respectively.
(210) Interaction entropy of the protein-drug complex:
$$IE = -T\Delta S = KT \ln \left\langle e^{\beta \Delta E_{pl}^{int}} \right\rangle$$
where K is the Boltzmann constant, T is the temperature, and $\langle E_{pl}^{int} \rangle$ is the average interaction energy of the protein with the drug, which can be evaluated by averaging over the MD simulation:
$$\langle E_{pl}^{int} \rangle = \frac{1}{N} \sum_{i=1}^{N} E_{pl}^{int}(t_i)$$
then
$$\left\langle e^{\beta \Delta E_{pl}^{int}} \right\rangle = \frac{1}{N} \sum_{i=1}^{N} e^{\beta \Delta E_{pl}^{int}(t_i)}$$
where N is the total number of sampled conformations, i indexes the i-th frame, β is the constant 1/(KT) (not to be confused with the empirical constant β of the nonpolar term), $t_i$ is the time of the i-th frame, and ⟨·⟩ denotes an average;
$$\Delta E_{pl}^{int} = E_{pl}^{int} - \langle E_{pl}^{int} \rangle$$
is the fluctuation of the protein-drug interaction energy around its mean value.
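A minimal sketch of the interaction entropy calculation from a series of per-frame interaction energies is shown below. The function name and the example energies are hypothetical. The temperature of 300 K matches the simulation temperature in the description, and the Boltzmann constant is expressed in kcal/(mol·K). The exponential average is evaluated in a shifted form to avoid overflow for large fluctuations.

```python
import numpy as np

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol*K)


def interaction_entropy(e_int, temperature=300.0):
    """Return IE = KT * ln< exp(beta * dE) >, with dE = E - <E>, in kcal/mol."""
    e = np.asarray(e_int, dtype=float)
    kt = K_B * temperature
    x = (e - e.mean()) / kt              # beta * fluctuation for each frame
    shift = x.max()                      # subtract the maximum for stability
    log_mean_exp = shift + np.log(np.mean(np.exp(x - shift)))
    return kt * log_mean_exp


# Hypothetical protein-drug interaction energies for five frames (kcal/mol).
frames = [-41.2, -39.5, -40.3, -42.0, -38.9]
print(interaction_entropy(frames))
```

Since the fluctuations average to zero, the exponential average is never below 1, so IE computed this way is non-negative, and it is exactly zero when the interaction energy does not fluctuate.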
Based on the concept of interaction entropy (IE), the present disclosure uses IE to calculate the entropy change of the complex in place of the conventional Nmode method. IE is derived from a rigorous theoretical formula and therefore has a sounder theoretical basis than the normal mode method, which relies on approximations.
Secondly, the interaction entropy is calculated directly from the conformations of the MD trajectory of the protein-drug complex: the gas-phase interaction energy of the complex in each simulated frame is used to compute IE, so the calculated entropy is more stable than that obtained with current entropy calculation methods.
Furthermore, the normal mode method takes the difference between the absolute entropies of the complex, the protein and the drug, which often introduces numerical errors because the absolute values are large. The IE method avoids this drawback by calculating the entropy change of the system directly, and it is also much faster. The method is therefore reliable and computationally inexpensive.
Owing to the inherent limitations of PBSA, its results are often inaccurate and unreliable. Although the theoretical values calculated by combining PBSA with the IE method (the PBSA+IE method) correlate well with the experimental values, they carry a large absolute error, and the resulting binding free energy is often overestimated.
The present method is a scoring function based on PBSA and the interaction entropy IE. A multivariate linear fit on a training set of a large number of randomly selected protein-drug complexes assigns a different weight to each energy term, which compensates for the inherent limitations of PBSA and allows the binding free energy of other protein-drug complexes to be predicted accurately. To verify the generality of the method, a number of protein-drug complexes that do not overlap with the training set are randomly selected as a test set for evaluating the scoring function.
Calculation of the binding free energy terms
The binding free energy is expressed as the sum of an enthalpy term and an entropy term:
$$\Delta G = \Delta H - T\Delta S \quad (1)$$
Since the present disclosure uses the IE method to calculate the entropy term, the above equation can be written as:
$$\Delta G = \Delta H + IE \quad (2)$$
Enthalpy term calculated by PBSA
The enthalpy term of protein-drug binding can be regarded as the sum of two parts, the gas-phase energy and the solvation energy:
$$\Delta H = \Delta E_{MM} + \Delta G_{sol} \quad (3)$$
where $\Delta E_{MM}$ is the gas-phase energy, of the form:
$$\Delta E_{MM} = \Delta E_{ele} + \Delta E_{vdw} \quad (4)$$
where $\Delta E_{ele}$ is expressed as follows:
$$\Delta E_{ele} = E_{ele}^{complex} - E_{ele}^{protein} - E_{ele}^{drug} \quad (5)$$
where $E_{ele}^{complex}$, $E_{ele}^{protein}$ and $E_{ele}^{drug}$ are the electrostatic energies of the complex, the protein and the drug, respectively, each obtained from the following formula:
$$E_{ele} = \sum_{i<j}^{N} \frac{q_i q_j}{R_{ij}} \quad (6)$$
where N is the total number of atoms, $q_i$ and $q_j$ are the charges of the i-th and j-th atoms, and $R_{ij}$ is the distance between the i-th and j-th atoms (the same applies below). $\Delta E_{vdw}$ in equation (4) is the van der Waals energy term of the protein and the drug:
$$\Delta E_{vdw} = E_{vdw}^{complex} - E_{vdw}^{protein} - E_{vdw}^{drug} \quad (7)$$
where $E_{vdw}^{complex}$, $E_{vdw}^{protein}$ and $E_{vdw}^{drug}$ are the van der Waals potentials of the complex, the protein and the drug, respectively, each given by the following formula:
$$E_{vdw} = \sum_{i<j}^{N} \left( \frac{A_{ij}}{R_{ij}^{12}} - \frac{B_{ij}}{R_{ij}^{6}} \right) \quad (8)$$
where $A_{ij}$ and $B_{ij}$ are the Lennard-Jones parameters for the pair formed by the i-th atom and the j-th atom, both taken from the AMBER ff12SB force field.
In equation (3), $\Delta G_{sol}$ is the solvation energy of the complex, which can be divided into two parts:
$$\Delta G_{sol} = \Delta G_{pb} + \Delta G_{np} \quad (9)$$
where $\Delta G_{pb}$ is the polar solvation energy term, obtained by solving the Poisson-Boltzmann (PB) equation with an implicit solvent model:
Figure GDA0002981866390000089
where ε(r) is the dielectric constant at position r, which takes the solute value inside the molecular surface and the solvent value outside it; κ(r) is the Debye-Hückel constant in the solvent region; ρ(r) is the charge distribution of the solute molecule; and Φ(r) is the electrostatic potential at r. In this method, the PB equation is solved with the PBSA program in the AMBER software package, with the interior and exterior dielectric constants set to 1 and 80, respectively.
In equation (9), $\Delta G_{np}$ is the nonpolar solvation energy term, obtained empirically from the solvent-accessible surface area (SASA):
$$\Delta G_{np} = \gamma \cdot SASA + \beta \quad (11)$$
In the embodiment of the present application, the empirical constants γ and β are set to
Figure GDA0002981866390000091
and 0.92 kcal/mol, respectively. For the PBSA calculation, one conformation is taken every 1000 frames from the trajectory of the last 1 ns, giving 100 conformations in total.
Entropy term calculated by IE
In the interaction entropy method used in the embodiment of the present application, the gas-phase part of the binding free energy is derived from the following equation:
$$\Delta G_{gas} = -KT \ln \frac{\int dq_w\, dq_p\, dq_l\; e^{-\beta \left( E_p + E_l + E_{pl}^{int} + E_w + E_{pw}^{int} + E_{lw}^{int} \right)}}{\int dq_w\, dq_p\, dq_l\; e^{-\beta \left( E_p + E_l + E_w + E_{pw}^{int} + E_{lw}^{int} \right)}} = \langle E_{pl}^{int} \rangle + KT \ln \left\langle e^{\beta \Delta E_{pl}^{int}} \right\rangle \quad (12)$$
where $E_p$, $E_l$ and $E_w$ are the internal energies of the protein, the drug and the water, respectively, and $E_{pl}^{int}$, $E_{pw}^{int}$ and $E_{lw}^{int}$ are the protein-drug, protein-water and drug-water interaction energies, respectively. $\langle E_{pl}^{int} \rangle$ is the average interaction energy of the protein with the drug, and
$$\Delta E_{pl}^{int} = E_{pl}^{int} - \langle E_{pl}^{int} \rangle$$
is the fluctuation of the protein-drug interaction energy around its mean value. Therefore, we define IE as:
$$IE = -T\Delta S = KT \ln \left\langle e^{\beta \Delta E_{pl}^{int}} \right\rangle \quad (13)$$
where K is the Boltzmann constant, T is the temperature, and $\langle E_{pl}^{int} \rangle$, the average interaction energy of the protein with the drug, can be evaluated by averaging over the MD simulation:
$$\langle E_{pl}^{int} \rangle = \frac{1}{N} \sum_{i=1}^{N} E_{pl}^{int}(t_i) \quad (14)$$
then
$$\left\langle e^{\beta \Delta E_{pl}^{int}} \right\rangle = \frac{1}{N} \sum_{i=1}^{N} e^{\beta \Delta E_{pl}^{int}(t_i)} \quad (15)$$
where N is the total number of sampled conformations, i indexes the i-th frame, β is the constant 1/(KT), and $t_i$ is the time of the i-th frame.
Multivariate linear fitting: PBSA_IE
According to the relationship among the energy terms of the binding free energy, the present disclosure performs the linear fit in the following form:
$$PBSA\_IE = a(\Delta E_{ele} + \Delta G_{pb}) + b\,\Delta E_{vdw} + c\,\Delta G_{np} + d\,IE + e \quad (16)$$
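The multivariate linear fit of equation (16) can be sketched with an ordinary least-squares solve. The patent does not say which fitting routine was used, so `numpy.linalg.lstsq` is an assumption, as are the function names. The data below are synthetic, generated from arbitrary coefficients purely to show that the fit recovers them; they are not results from this disclosure.

```python
import numpy as np


def fit_pbsa_ie(e_ele, g_pb, e_vdw, g_np, ie, dg_exp):
    """Least-squares estimate of (a, b, c, d, e) in equation (16)."""
    design = np.column_stack([
        np.asarray(e_ele, dtype=float) + np.asarray(g_pb, dtype=float),  # a
        np.asarray(e_vdw, dtype=float),                                  # b
        np.asarray(g_np, dtype=float),                                   # c
        np.asarray(ie, dtype=float),                                     # d
        np.ones(len(dg_exp)),                                            # e
    ])
    coeffs, *_ = np.linalg.lstsq(design, np.asarray(dg_exp, dtype=float), rcond=None)
    return coeffs


def predict(coeffs, e_ele, g_pb, e_vdw, g_np, ie):
    """Evaluate equation (16) with the given coefficients."""
    a, b, c, d, e = coeffs
    return (a * (np.asarray(e_ele) + np.asarray(g_pb))
            + b * np.asarray(e_vdw) + c * np.asarray(g_np)
            + d * np.asarray(ie) + e)


# Synthetic training set of 84 "complexes" (the size used in the description).
rng = np.random.default_rng(0)
n = 84
e_ele = rng.normal(-60.0, 20.0, n)
g_pb = rng.normal(55.0, 15.0, n)
e_vdw = rng.normal(-40.0, 8.0, n)
g_np = rng.normal(-5.0, 1.0, n)
ie = rng.uniform(2.0, 20.0, n)

true_coeffs = np.array([0.2, 0.1, 0.5, 0.3, -2.0])  # arbitrary, for the demo
dg_exp = predict(true_coeffs, e_ele, g_pb, e_vdw, g_np, ie)

fitted = fit_pbsa_ie(e_ele, g_pb, e_vdw, g_np, ie, dg_exp)
print(fitted)
```

With real data the experimental values contain noise, so the fit would be judged by its residuals and by its performance on the separate test set rather than by exact recovery of coefficients.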
In a second embodiment, the present disclosure also provides a system for calculating the binding free energy of a protein and a drug.
A system for calculating the binding free energy of a protein and a drug, comprising:
a function construction module configured to construct a function of the protein-drug binding free energy, the function expressing the binding free energy of a protein-drug complex in terms of an electrostatic term, a polar solvation energy term, a van der Waals term, a nonpolar solvation energy term and the interaction entropy;
a multivariate linear fitting module configured to acquire the energy terms of each protein-drug complex in a training set, the energy terms comprising the electrostatic term, the polar solvation energy term, the van der Waals term, the nonpolar solvation energy term and the interaction entropy, and to perform a multivariate linear fit of the function using the energy terms of each protein-drug complex in the training set and the experimental value of each complex, to obtain the fitted function of the protein-drug binding free energy;
and a binding free energy calculation module configured to input the electrostatic term, the polar solvation energy term, the van der Waals term, the nonpolar solvation energy term and the interaction entropy of the protein-drug complex to be calculated into the fitted function, and to output the binding free energy of the protein and the drug.
In a third embodiment, the present disclosure further provides an electronic device, which includes a memory, a processor, and computer instructions stored in the memory and executed on the processor. When the computer instructions are executed by the processor, the steps of the method of the first embodiment are carried out; for brevity, the details are not repeated here.
It should be understood that in the present disclosure, the processor may be a central processing unit (CPU), but may also be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, and so on. A general-purpose processor may be a microprocessor or any conventional processor.
The memory may include both read-only memory and random access memory, and may provide instructions and data to the processor, and a portion of the memory may also include non-volatile random access memory. For example, the memory may also store device type information.
In a fourth embodiment, the present disclosure further provides a computer-readable storage medium for storing computer instructions, and the computer instructions, when executed by a processor, perform the steps of the method according to the first embodiment.
In implementation, the steps of the above method may be completed by integrated logic circuits of hardware in a processor or by instructions in the form of software. The steps of the method disclosed in connection with the present disclosure may be performed directly by a hardware processor, or by a combination of hardware and software modules in the processor. The software modules may be located in storage media well known in the art, such as random access memory (RAM), flash memory, read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), or registers. The storage medium is located in the memory; the processor reads the information in the memory and completes the steps of the above method in combination with its hardware. To avoid repetition, this is not described in detail here. Those of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented as electronic hardware or as a combination of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and the design constraints of the technical solution. Skilled artisans may implement the described functionality in different ways for each particular application, but such implementations should not be considered beyond the scope of the present application.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus, and method may be implemented in other ways. The apparatus embodiments described above are merely illustrative. For example, the division into units is only a division by logical function, and other divisions are possible in an actual implementation: a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection of devices or units through some interfaces, and may be electrical, mechanical, or of another form.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application, in essence or in the part that contributes over the prior art, may be embodied in the form of a software product that is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present application. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, and other media capable of storing program code.
In conclusion, the PBSA_IE method can calculate the binding free energy of protein-drug complexes broadly, stably, and accurately. It also balances the advantages and disadvantages of PBSA and of the present method, and it compares favorably in computational cost.
The above description is only a preferred embodiment of the present application and is not intended to limit the present application, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims (6)

1. A method for calculating the binding free energy of a protein and a drug, characterized by comprising the following steps:
constructing a function of the protein-drug binding free energy, the function relating the binding free energy of the protein-drug complex to an electrostatic term, a polar solvation energy term, a van der Waals term, a nonpolar solvation energy term, and an interaction entropy;
acquiring the energy terms of each protein-drug complex in the training set, the energy terms comprising: an electrostatic term, a polar solvation energy term, a van der Waals term, a nonpolar solvation energy term, and an interaction entropy; performing multivariate linear fitting of the function of the protein-drug binding free energy, using the energy terms of each protein-drug complex in the training set and the experimental value for each protein-drug complex, to obtain the fitted function of the protein-drug binding free energy;
inputting the electrostatic term, the polar solvation energy term, the van der Waals term, the nonpolar solvation energy term, and the interaction entropy of the protein-drug complex to be calculated into the fitted function of the protein-drug binding free energy, and outputting the binding free energy of the protein and the drug;
the function of the protein-drug binding free energy is constructed as follows:
$$\mathrm{PBSA\_IE} = a(\Delta E_{ele} + \Delta G_{pb}) + b\,\Delta E_{vdw} + c\,\Delta G_{np} + d\,\mathrm{IE} + e$$
wherein PBSA_IE represents the binding free energy of the protein and the drug; $\Delta G_{pb}$ represents the polar solvation energy term; $\Delta E_{vdw}$ represents the van der Waals energy term of the protein and the drug; $\Delta G_{np}$ represents the nonpolar solvation energy term; IE represents the interaction entropy of the protein-drug complex; and a, b, c, d, and e are the parameters to be fitted;
each system in the training set is subjected to MD simulation using the AMBER program package, the specific simulation procedure being as follows:
(140) energy minimization of the protein-drug complex system: the energy of the protein-drug complex is minimized by the steepest descent method and then by the conjugate gradient method until the energy of the system converges;
(150) heating the complex system to room temperature: a restrained MD simulation of 300 ps is performed with a restraint constant of
Figure FDA0002981866380000012
Figure FDA0002981866380000013
the temperature of each system is gradually raised from 0 K to 300 K; the temperature is regulated using Langevin dynamics with a collision frequency of 1.0 ps⁻¹; all bonds involving hydrogen atoms are constrained by the SHAKE algorithm; and the simulation time step is 2 fs;
(160) unrestrained MD simulation of the complex system: an unrestrained MD simulation of 2 ns with a time step of 2 fs is performed on the protein-drug system: during the first 1 ns, a trajectory frame is recorded every 2000 steps, containing the position information of all atoms of the complex, the counterions, and the water molecules in the water box, the purpose of this stage being to bring the complex to a relatively equilibrated conformation; during the last 1 ns, a trajectory frame is recorded every 5 steps, containing only the position information of the complex, and each energy term of the binding free energy is obtained by sampling this trajectory;
the polar solvation energy term is obtained by solving the Poisson-Boltzmann (PB) equation with an implicit solvent model:
Figure FDA0002981866380000011
wherein ε(r) is the dielectric constant, whose value depends on whether r falls inside or outside the molecular surface; κ(r) represents the Debye-Hückel constant in the solvent region; ρ(r) represents the charge distribution function of the solute molecules; and Φ(r) is the electrostatic potential at r;
the nonpolar solvation energy term is given by the solvent accessible surface area (SASA):
$$\Delta G_{np} = \gamma\,\mathrm{SASA} + \beta$$
wherein the empirical constants γ and β are set to
Figure FDA0002981866380000021
and 0.92 kcal/mol, respectively;
the interaction entropy of the protein-drug complex:
Figure FDA0002981866380000022
wherein K is the Boltzmann constant, T is the temperature, and
Figure FDA0002981866380000023
is the interaction energy of the protein with the drug, which can be evaluated by averaging over the MD simulation:
Figure FDA0002981866380000024
then
Figure FDA0002981866380000025
where N is the total number of sampled conformations, i denotes the conformation of the i-th frame, β is a constant, $t_i$ is the time of the i-th frame, and ⟨·⟩ denotes averaging;
Figure FDA0002981866380000026
is the fluctuation of the protein-drug interaction energy around its mean value.
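The equation images of claim 1 are not reproduced in this text, so the following is only a sketch of how the interaction entropy might be evaluated from a trajectory. It assumes the commonly published form of the interaction entropy, -TΔS = KT ln⟨exp(βΔE)⟩, with β = 1/(KT) and ΔE the deviation of each frame's interaction energy from the trajectory mean; the patent text itself says only that β is a constant. The function name `interaction_entropy` and the unit choice (kcal/mol) are assumptions.

```python
import numpy as np

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol*K)


def interaction_entropy(e_int, temperature=300.0):
    """Return the entropic term -T*dS (kcal/mol) from per-frame
    protein-drug interaction energies e_int (kcal/mol).

    Assumed form: -T*dS = K*T * ln< exp(beta * dE) >, beta = 1/(K*T),
    where dE is the fluctuation of e_int around its trajectory mean.
    """
    e_int = np.asarray(e_int, dtype=float)
    kt = K_B * temperature
    beta = 1.0 / kt
    # Fluctuation of the interaction energy around the mean energy.
    delta = e_int - e_int.mean()
    # Numerically stable log of the frame average of exp(beta * delta).
    x = beta * delta
    shift = x.max()
    log_mean_exp = shift + np.log(np.mean(np.exp(x - shift)))
    return kt * log_mean_exp
```

Because the exponential average is dominated by the largest fluctuations, the estimate is sensitive to outlier frames, which is one reason a dense sampling interval is useful.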
2. The method of claim 1, further comprising a step of preprocessing the training set before performing the multivariate linear fitting, the step comprising:
the same preprocessing steps are applied to the training set and the test set, and the PDB IDs of the proteins and drugs in the training set and the test set are taken from the PDBbind database;
(110) drug preprocessing:
the drug is subjected to a two-step quantum chemical treatment using the Gaussian 03 software:
first step: the drug is optimized at the HF/6-31G level using Gaussian 03 to find the optimal structure, that is, the structure with the lowest energy;
second step: a single-point energy calculation is performed at the B3LYP/cc-pVTZ level to obtain the electrostatic potential of the drug, and the charge of each atom is fitted by the restrained electrostatic potential (RESP) method;
(120) protein preprocessing:
the missing hydrogen atoms of the protein are added using the LEaP module of AMBER 12;
(130) preprocessing of the complex system:
the protein-drug complex is described with the AMBER ff12SB force field and placed in a periodic TIP3P water box, the minimum distance between the solute and the edge of the box being
Figure FDA00029818663800000310
meanwhile, according to the net charge of each system, counterions are added to neutralize the system.
3. The method of claim 1, wherein acquiring the energy terms of each protein-drug system in the training set and the test set comprises:
(170)
Figure FDA0002981866380000031
wherein
Figure FDA0002981866380000032
is the electrostatic energy of the complex,
Figure FDA0002981866380000033
is the electrostatic energy of the protein, and
Figure FDA0002981866380000034
is the electrostatic energy of the drug; each is given by the following formula:
Figure FDA0002981866380000035
wherein N is the total number of atoms, $q_i$ and $q_j$ are the charges of the i-th and j-th atoms, and $R_{ij}$ is the distance between the i-th atom and the j-th atom;
(180) the van der Waals energy term of the protein and the drug is expressed as:
Figure FDA0002981866380000036
wherein
Figure FDA0002981866380000037
and
Figure FDA0002981866380000038
are the van der Waals potentials of the complex, the protein, and the drug, respectively, each given by the following formula:
Figure FDA0002981866380000039
wherein $A_{ij}$ and $B_{ij}$ are the Lennard-Jones parameters between the i-th atom and the j-th atom, both derived from the AMBER ff12SB force field.
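Since the formulas of claim 3 appear only as images, the sketch below uses standard textbook forms as an assumption: a pairwise Coulomb sum over $q_i q_j / R_{ij}$ and a 12-6 Lennard-Jones sum over $A_{ij}/R_{ij}^{12} - B_{ij}/R_{ij}^{6}$, with each binding term taken as the complex value minus the protein value minus the drug value. The conversion factor `COULOMB`, the helper names, and the omission of bonded-pair exclusions and 1-4 scaling are simplifications introduced here, not details from the patent.

```python
import numpy as np

# Assumed conversion factor from e^2/Angstrom to kcal/mol.
COULOMB = 332.0637


def pair_distances(xyz):
    """Index arrays (i, j) with i < j and the corresponding distances."""
    xyz = np.asarray(xyz, dtype=float)
    i, j = np.triu_indices(len(xyz), k=1)
    return i, j, np.linalg.norm(xyz[i] - xyz[j], axis=1)


def electrostatic_energy(xyz, q):
    """Coulomb sum over all atom pairs; q holds partial charges."""
    i, j, r = pair_distances(xyz)
    q = np.asarray(q, dtype=float)
    return COULOMB * np.sum(q[i] * q[j] / r)


def vdw_energy(xyz, a, b):
    """12-6 Lennard-Jones sum; a and b are symmetric (N, N) arrays
    holding the pair parameters A_ij and B_ij."""
    i, j, r = pair_distances(xyz)
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return np.sum(a[i, j] / r**12 - b[i, j] / r**6)


def binding_term(energy_fn, complex_args, protein_args, drug_args):
    """Energy of the complex minus the energies of protein and drug."""
    return (
        energy_fn(*complex_args)
        - energy_fn(*protein_args)
        - energy_fn(*drug_args)
    )
```

When the protein and drug coordinates are taken from the same complex frame, the intramolecular pairs cancel in `binding_term`, so only protein-drug pairs contribute to the result.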
4. A protein and drug binding free energy calculation system, comprising:
a function construction module configured to construct a function of the protein-drug binding free energy, the function relating the binding free energy of the protein and the drug to an electrostatic term, a polar solvation energy term, a van der Waals term, a nonpolar solvation energy term, and an interaction entropy;
a multivariate linear fitting module configured to acquire the energy terms of each protein-drug complex in the training set, the energy terms including: an electrostatic term, a polar solvation energy term, a van der Waals term, a nonpolar solvation energy term, and an interaction entropy; and to perform multivariate linear fitting of the function of the protein-drug binding free energy, using the energy terms of each protein-drug complex in the training set and the experimental value for each protein-drug complex, to obtain the fitted function of the protein-drug binding free energy;
a binding free energy calculation module configured to input the electrostatic term, the polar solvation energy term, the van der Waals term, the nonpolar solvation energy term, and the interaction entropy of the protein-drug complex to be tested into the fitted function of the protein-drug binding free energy, and to output the binding free energy of the protein and the drug;
the function of the protein-drug binding free energy is constructed as follows:
$$\mathrm{PBSA\_IE} = a(\Delta E_{ele} + \Delta G_{pb}) + b\,\Delta E_{vdw} + c\,\Delta G_{np} + d\,\mathrm{IE} + e$$
wherein PBSA_IE represents the binding free energy of the protein and the drug; $\Delta G_{pb}$ represents the polar solvation energy term; $\Delta E_{vdw}$ represents the van der Waals energy term of the protein and the drug; $\Delta G_{np}$ represents the nonpolar solvation energy term; IE represents the interaction entropy of the protein-drug complex; and a, b, c, d, and e are the parameters to be fitted;
each system in the training set is subjected to MD simulation using the AMBER program package, the specific simulation procedure being as follows:
(140) energy minimization of the protein-drug complex system: the energy of the protein-drug complex is minimized by the steepest descent method and then by the conjugate gradient method until the energy of the system converges;
(150) heating the complex system to room temperature: a restrained MD simulation of 300 ps is performed with a restraint constant of
Figure FDA0002981866380000041
Figure FDA0002981866380000042
the temperature of each system is gradually raised from 0 K to 300 K; the temperature is regulated using Langevin dynamics with a collision frequency of 1.0 ps⁻¹; all bonds involving hydrogen atoms are constrained by the SHAKE algorithm; and the simulation time step is 2 fs;
(160) unrestrained MD simulation of the complex system: an unrestrained MD simulation of 2 ns with a time step of 2 fs is performed on the protein-drug system: during the first 1 ns, a trajectory frame is recorded every 2000 steps, containing the position information of all atoms of the complex, the counterions, and the water molecules in the water box, the purpose of this stage being to bring the complex to a relatively equilibrated conformation; during the last 1 ns, a trajectory frame is recorded every 5 steps, containing only the position information of the complex, and each energy term of the binding free energy is obtained by sampling this trajectory;
the polar solvation energy term is obtained by solving the Poisson-Boltzmann (PB) equation with an implicit solvent model:
Figure FDA0002981866380000043
wherein ε(r) is the dielectric constant, whose value depends on whether r falls inside or outside the molecular surface; κ(r) represents the Debye-Hückel constant in the solvent region; ρ(r) represents the charge distribution function of the solute molecules; and Φ(r) is the electrostatic potential at r;
the nonpolar solvation energy term is given by the solvent accessible surface area (SASA):
$$\Delta G_{np} = \gamma\,\mathrm{SASA} + \beta$$
wherein the empirical constants γ and β are set to
Figure FDA0002981866380000044
and 0.92 kcal/mol, respectively;
the interaction entropy of the protein-drug complex:
Figure FDA0002981866380000051
wherein K is the Boltzmann constant, T is the temperature, and
Figure FDA0002981866380000052
is the interaction energy of the protein with the drug, which can be evaluated by averaging over the MD simulation:
Figure FDA0002981866380000053
then
Figure FDA0002981866380000054
where N is the total number of sampled conformations, i denotes the conformation of the i-th frame, β is a constant, $t_i$ is the time of the i-th frame, and ⟨·⟩ denotes averaging;
Figure FDA0002981866380000055
is the fluctuation of the protein-drug interaction energy around its mean value.
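The sampling schedule of step (160) can be checked with a short calculation. The sketch assumes that recording "every n steps" means one saved frame per n integration steps; the function name `frames_recorded` is hypothetical.

```python
def frames_recorded(duration_ps, timestep_fs, stride_steps):
    """Number of trajectory frames saved during one simulation stage."""
    # 1 ps = 1000 fs, so this is the number of integration steps.
    steps = round(duration_ps * 1000 / timestep_fs)
    return steps // stride_steps


# First 1 ns: one frame every 2000 steps of 2 fs.
equilibration_frames = frames_recorded(1000, 2, 2000)
# Last 1 ns: one frame every 5 steps of 2 fs.
production_frames = frames_recorded(1000, 2, 5)
# Time between consecutive frames in the last 1 ns, in fs.
frame_spacing_fs = 5 * 2

print(equilibration_frames, production_frames, frame_spacing_fs)
```

Under this reading, the first 1 ns yields 250 frames, while the last 1 ns yields 100,000 frames spaced 10 fs apart, which are the frames from which the energy terms are sampled.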
5. An electronic device comprising a memory and a processor and computer instructions stored on the memory and executable on the processor, the computer instructions when executed by the processor performing the steps of the method of any of claims 1 to 3.
6. A computer-readable storage medium storing computer instructions which, when executed by a processor, perform the steps of the method of any one of claims 1 to 3.
CN201910167720.XA 2019-03-06 2019-03-06 Method, system, apparatus and medium for calculating protein and drug binding free energy Active CN110047559B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910167720.XA CN110047559B (en) 2019-03-06 2019-03-06 Method, system, apparatus and medium for calculating protein and drug binding free energy

Publications (2)

Publication Number Publication Date
CN110047559A (en) 2019-07-23
CN110047559B (en) 2021-06-25

Family

ID=67274637

Country Status (1)

Country Link
CN (1) CN110047559B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111161810B (en) * 2019-12-31 2022-03-22 中山大学 Free energy perturbation method based on constraint probability distribution function optimization
CN111341391B (en) * 2020-02-25 2023-12-01 深圳晶泰科技有限公司 Free energy perturbation calculation scheduling method for heterogeneous cluster environment
CN112216350B (en) * 2020-11-05 2022-09-13 深圳晶泰科技有限公司 Physical strict relative free energy calculation method with phase space overlapping maximization
WO2022094870A1 (en) * 2020-11-05 2022-05-12 深圳晶泰科技有限公司 Relative free energy calculation method which is physically rigorous and which maximizes phase space overlap
WO2023123288A1 (en) * 2021-12-30 2023-07-06 深圳晶泰科技有限公司 Method and apparatus for determining contribution to relative binding free energy, and storage medium
CN114360663B (en) * 2021-12-30 2024-07-02 深圳晶泰科技有限公司 Method, device and storage medium for determining relative binding free energy contribution
WO2023123396A1 (en) * 2021-12-31 2023-07-06 深圳晶泰科技有限公司 Enhanced sampling method, and method for calculating binding free energy of complex

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CA2865627A1 (en) * 2012-03-16 2013-09-19 Robert J. Woods Glycomimetics to inhibit pathogen-host interactions
CN103333227A (en) * 2013-06-07 2013-10-02 东南大学 Metastatic tumor deletion protein small-molecule cyclopeptide inhibitor as well as preparation method and application thereof
CN106220707A (en) * 2016-08-05 2016-12-14 孙非 A kind of method for designing of antibody affinity ligand

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CA2526648A1 (en) * 2003-05-22 2004-12-29 Richard H. Lathrop Method for producing a synthetic gene or other dna sequence
CN107862173B (en) * 2017-11-15 2021-04-27 南京邮电大学 Virtual screening method and device for lead compound

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
JAK2蛋白热点氨基酸与配体相互作用理论计算预测及新方法的应用;周一凡;《中国优秀硕士学位论文全文数据库 工程科技I辑》;20190115;第14页 *
蛋白配体结合自由能精确理论计算新方法研究;刘笑;《中国博士学位论文全文数据库 工程科技I辑》;20190115;第20-61页 *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CB03 Change of inventor or designer information
Inventor after: Duan Lili; Huang Kaifang; Cong Yalong; Li Hao; Zhong Susu; Dong Shuheng
Inventor before: Duan Lili; Huang Kaifang; Cong Yalong; Li Hao; Zhong Susu; Dong Shuheng

TR01 Transfer of patent right
Effective date of registration: 20220907
Address after: Room 2006, Building 1, Lushang Phoenix Plaza, Tangye Street, Jinan Area, China (Shandong) Free Trade Pilot Zone, Jinan City, Shandong Province, 250101
Patentee after: Shandong Zhibo Linghang Technology Innovation Co.,Ltd.
Address before: No.1 Daxue Road, University Science Park, Changqing District, Jinan City, Shandong Province
Patentee before: SHANDONG NORMAL University

TR01 Transfer of patent right
Effective date of registration: 20220916
Address after: A2-304, Dawangshan Civil Affairs Apartment, Nanshan District, Shenzhen, Guangdong 518000
Patentee after: Yi Jihui
Address before: Room 2006, Building 1, Lushang Phoenix Plaza, Tangye Street, Jinan Area, China (Shandong) Free Trade Pilot Zone, Jinan City, Shandong Province, 250101
Patentee before: Shandong Zhibo Linghang Technology Innovation Co.,Ltd.

TR01 Transfer of patent right
Effective date of registration: 20230728
Address after: D1101, Building 4, Software Industry Base, No. 19, 17, 18, Haitian 1st Road, Binhai Community, Yuehai Street, Nanshan District, Shenzhen, Guangdong, 518000
Patentee after: Shenzhen Xinrui Gene Technology Co.,Ltd.
Address before: A2-304, Dawangshan Civil Affairs Apartment, Nanshan District, Shenzhen, Guangdong 518000
Patentee before: Yi Jihui