CN114121148B

CN114121148B - A method for calculating protein-ligand binding free energy based on cluster model

Info

Publication number: CN114121148B
Application number: CN202111333136.0A
Authority: CN
Inventors: 丁泓铭; 陈远强
Original assignee: Suzhou University
Current assignee: Suzhou University
Priority date: 2021-11-11
Filing date: 2021-11-11
Publication date: 2023-01-06
Anticipated expiration: 2041-11-11
Also published as: CN114121148A

Abstract

Compared with other methods for calculating the protein-ligand binding free energy, the GFN2-xTB method is easier to use, only initial coordinates and element composition are needed to form an input structure, and the combined application of the GFN2-xTB method and the cluster model can effectively reduce the calculation time cost on the premise of ensuring relatively good accuracy. This new method based on cluster models and GFN2-xTB should have great potential in future biomacromolecule-related binding free energy calculations.

Description

A method for calculating protein-ligand binding free energy based on cluster model

技术领域technical field

本发明涉及一种基于簇模型计算蛋白-配体结合自由能的方法，属于计算生物学技术领域。The invention relates to a method for calculating protein-ligand binding free energy based on a cluster model, which belongs to the technical field of computational biology.

背景技术Background technique

蛋白质-配体相互作用在许多生化过程和生物医学应用中发挥着重要作用(例如，免疫反应，信号转导，药物设计)。研究蛋白质和配体相互作用的亲和力(即结合自由能),不仅有助于了解蛋白质的生物学功能，而且对药物研发以及药物作用机制的研究具有十分重要的意义。例如，当前严重急性呼吸系统综合症冠状病毒2(SARS-CoV-2)病毒引起COVID-19大流行，其主要蛋白酶和刺突蛋白都可以作为一个很好的药物靶点。准确计算SARS-CoV-2蛋白酶或刺突蛋白与相应配体的结合自由能，有望筛选出高亲和力的配体以及抑制剂，从而对其进行有效的治疗。但蛋白-配体结合是一个复杂的过程，包含了上千乃至上万个原子，且诸多因素都会影响结合自由能的计算结果。因而如何精确、高效地计算蛋白-配体结合自由能是目前计算生物学领域极为重要的课题。Protein-ligand interactions play an important role in many biochemical processes and biomedical applications (eg, immune response, signal transduction, drug design). The study of the affinity (binding free energy) between proteins and ligands not only helps to understand the biological functions of proteins, but also has very important significance for drug development and drug mechanism research. For example, the major protease and spike protein of the current severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) virus causing the COVID-19 pandemic could serve as a good drug target. Accurate calculation of the binding free energy between SARS-CoV-2 protease or spike protein and the corresponding ligand is expected to screen out high-affinity ligands and inhibitors for effective treatment. However, protein-ligand binding is a complex process, involving thousands or even tens of thousands of atoms, and many factors will affect the calculation results of binding free energy. Therefore, how to accurately and efficiently calculate protein-ligand binding free energy is an extremely important topic in the field of computational biology.

在过去的几十年中，为了精确计算蛋白质-配体的结合自由能，已经提出和开发了多种方法，主要是基于分子力学的方法，如自由能微扰(FEP)方法以及分子力学/泊松玻尔兹曼表面积(MM/PBSA)和分子力学/广义波尔兹曼表面积(MM/GBSA)方法。自由能微扰(FEP)是一种应用广泛的炼金术自由能方法。在过去几年中，由于在计算能力、经典力场精度、采样方法和模拟设置方面的多项改进，使FEP可以准确可靠的应用于蛋白质-配体结合自由能的计算。详细地说，FEP方法依赖于对两种相似化合物之间能量差异的原子分析，并通过炼金术途径在化合物之间进行转换。这是一套严格的统计力学方法，需要通过缓慢改变势能来计算炼金术过程中的自由能变化。MM/PBSA和MM/GBSA是一种通过结合分子力学计算和连续溶剂化模型来计算大分子结合自由能的方法，这是一种被广泛用于评估对接位点、确定结构稳定性、预测结合亲和力的终点自由能方法。该方法首先在显示溶剂模型下进行蛋白质-配体复合物的MD模拟；之后再从每个MD快照中删除所有溶剂分子和带电离子，并使用隐式PBSA或GBSA溶剂模型来评估溶剂化能；最后从一组选定的快照中计算出溶质构象熵变化，并将这些单独的能量分量相加从而获得最终的结合自由能。In the past few decades, in order to accurately calculate the free energy of protein-ligand binding, various methods have been proposed and developed, mainly based on molecular mechanics, such as free energy perturbation (FEP) method and molecular mechanics/ Poisson Boltzmann Surface Area (MM/PBSA) and Molecular Mechanics/Generalized Boltzmann Surface Area (MM/GBSA) methods. Free energy perturbation (FEP) is a widely used alchemical free energy method. Over the past few years, thanks to several improvements in computing power, classical force field accuracy, sampling methods, and simulation settings, FEP can be applied accurately and reliably to the calculation of protein-ligand binding free energies. In detail, the FEP method relies on the atomic analysis of the energy difference between two similar compounds and transforms between compounds through alchemical pathways. This is a rigorous set of statistical mechanics methods that require the calculation of free energy changes in the alchemical process by slowly changing the potential energy. MM/PBSA and MM/GBSA are methods for calculating the free energy of macromolecular binding by combining molecular mechanics calculations and continuum solvation models, which are widely used to evaluate docking sites, determine structural stability, predict binding End-point free energy method for affinity. The method first performs MD simulations of protein-ligand complexes under the explicit solvent model; then removes all solvent molecules and charged ions from each MD snapshot, and uses implicit PBSA or GBSA solvent models to estimate the solvation energy; Finally, the solute conformational entropy change is calculated from a selected set of snapshots, and these individual energy components are summed to obtain the final binding free energy.

近期，除了分子力学的方法，“几何、频率、非共价、扩展紧密结合”的半经验量子力学方法GFN-xTB也得到广泛的关注和应用。该方法是由Grimme等人在Kohn–Sham密度泛函理论(DFT)的基础上提出的一种扩展的半经验紧束缚模型。相较于其它传统的半经验量子力学方法，GFN2-XTB方法计算效率高且精确，而且其参数化元素涵盖了元素周期表的大部分直至Z＝86。GFN2-XTB方法的提出使得可以对一千多个原子的大分子体系进行结构优化、振动频率和非共价相互作用的计算研究。目前，GFN2-xTB方法已经被应用于一些化学大分子的结构优化筛选与相互作用能的计算，并且都表现出了良好的性能，因而有望推广应用到蛋白质-配体体系的结合自由能计算中。Recently, in addition to the method of molecular mechanics, the semi-empirical quantum mechanical method GFN-xTB of "geometry, frequency, non-covalent, extended tight combination" has also received extensive attention and application. This method is an extended semi-empirical tight-binding model proposed by Grimme et al. on the basis of Kohn–Sham density functional theory (DFT). Compared with other traditional semi-empirical quantum mechanics methods, the GFN2-XTB method has high computational efficiency and accuracy, and its parameterized elements cover most of the periodic table up to Z=86. The introduction of the GFN2-XTB method enables computational studies of structure optimization, vibrational frequency and non-covalent interactions for macromolecular systems with more than one thousand atoms. At present, the GFN2-xTB method has been applied to the structure optimization screening and interaction energy calculation of some chemical macromolecules, and has shown good performance, so it is expected to be extended and applied to the binding free energy calculation of protein-ligand systems .

目前虽然有着不同的方法对蛋白质-配体结合自由能进行研究预测，但这些方法都存在着一些相应的不足与缺陷。首先利用FEP和MM/PBSA(MM/GBSA)等方法计算蛋白质-配体结合自由能通常比较繁琐，需要找到相应的力场参数，模拟设置与数据分析也较为复杂；其次FEP方法估算蛋白质-配体结合能的准确性很高，误差大约在2-3kcal/mol，但是其极为耗时并且收敛困难。相比较而言，MM/PBSA(MM/GBSA)方法所需的计算成本要低得多，但其准确性一般，误差一般在10kcal/mol左右，在高电荷系统中，误差甚至大于20kcal/mol。Although there are different methods for researching and predicting the free energy of protein-ligand binding, these methods all have some corresponding deficiencies and defects. Firstly, the calculation of protein-ligand binding free energy using FEP and MM/PBSA (MM/GBSA) methods is usually cumbersome, and the corresponding force field parameters need to be found, and the simulation settings and data analysis are also relatively complicated; secondly, the FEP method to estimate the protein-ligand The accuracy of bulk binding energy is very high, the error is about 2-3kcal/mol, but it is extremely time-consuming and difficult to converge. In comparison, the calculation cost required by the MM/PBSA (MM/GBSA) method is much lower, but its accuracy is average, and the error is generally around 10kcal/mol. In a high-charge system, the error is even greater than 20kcal/mol .

另外，半经验紧束缚GFN2-xTB方法在计算大分子相互作用方面展示了一定的潜力，但是对于大的蛋白质-配体体系由于其计算量太大，GFN2-xTB方法计算频率极为耗时，目前还无法直接使用。In addition, the semi-empirical tight-binding GFN2-xTB method has shown certain potential in calculating macromolecular interactions, but for large protein-ligand systems, the calculation frequency of the GFN2-xTB method is extremely time-consuming due to the large amount of calculation. Not yet available directly.

发明内容Contents of the invention

为解决上述技术问题，本发明提出簇模型结合GFN2-xTB方法计算蛋白质-配体的结合自由能，相对其它计算蛋白质-配体结合自由能的方法，GFN2-xTB方法更易于使用，只需要初始坐标和元素组成即可构成输入结构，而与簇模型的联合应用，可以在保持良好的计算精度的前提下，有效降低计算时所消耗的时长。In order to solve the above-mentioned technical problems, the present invention proposes a cluster model combined with GFN2-xTB method to calculate the binding free energy of protein-ligand. Compared with other methods for calculating protein-ligand binding free energy, GFN2-xTB method is easier to use and only needs initial The input structure can be formed by the composition of coordinates and elements, and the joint application with the cluster model can effectively reduce the time consumed for calculation while maintaining good calculation accuracy.

本发明的第一个目的是提供一种基于簇模型计算蛋白-配体结合自由能的方法，包括如下步骤：The first object of the present invention is to provide a method for calculating protein-ligand binding free energy based on the cluster model, comprising the following steps:

S1、分别获取蛋白质和配体结构的重原子信息；S1. Obtain the heavy atom information of the protein and ligand structures respectively;

S2、将获取的蛋白质和配体结构的重原子信息在三维分子模型软件中打开，并截取截断距离在预设范围内的蛋白质残基，并保存截取好的蛋白质-配体结构；S2. Open the heavy atom information of the obtained protein and ligand structure in the three-dimensional molecular model software, and intercept the protein residues whose cut-off distance is within the preset range, and save the intercepted protein-ligand structure;

S3、对截取的蛋白质-配体结构进行加氢饱和，得到加氢饱和后的簇模型；S3. Perform hydrogenation saturation on the intercepted protein-ligand structure to obtain a cluster model after hydrogenation saturation;

S4、将加氢饱和后的簇模型输入xtb程序中，对加氢饱和后的簇模型中截断蛋白质的重原子设置一个强约束，并设置蛋白质-配体相互作用的环境模型；S4. Input the cluster model after hydrogenation saturation into the xtb program, set a strong constraint on the heavy atoms of the truncated protein in the cluster model after hydrogenation saturation, and set the environmental model of protein-ligand interaction;

S5、将带有强约束的簇模型分为簇模型整体、截断的蛋白质残基和配体三种结构，并使用xtb程序分别对三种结构进行Hessian计算，从输出文件中读取簇模型整体的自由能G_com、截断的蛋白质残基的自由能G_pro和配体的自由能G_lig，由上述三种自由能计算得到蛋白-配体的结合自由能ΔG。S5. Divide the cluster model with strong constraints into three structures: the overall cluster model, truncated protein residues, and ligands, and use the xtb program to perform Hessian calculations on the three structures, and read the overall cluster model from the output file The free energy of G _com , the free energy of truncated protein residues G _pro and the free energy of ligand G _lig , and the protein-ligand binding free energy ΔG was calculated from the above three free energies.

进一步地，蛋白-配体的结合自由能ΔG按照蛋白质和配体结合自由能公式：ΔG＝G_com-G_pro-G_lig进行计算；Further, the protein-ligand binding free energy ΔG is calculated according to the protein and ligand binding free energy formula: ΔG=G _com -G _pro -G _lig ;

其中，ΔG为蛋白-配体的结合自由能，Among them, ΔG is the binding free energy of protein-ligand,

G_com为簇模型整体的自由能，G _com is the free energy of the cluster model as a whole,

G_pro为截断的蛋白质残基的自由能，G _pro is the free energy of the truncated protein residue,

G_lig为配体的自由能。G _lig is the free energy of the ligand.

进一步地，所述的截断距离的预设范围为

范围。Further, the preset range of the cut-off distance is

scope.

进一步地，在本发明中，使用xtb程序进行Hessian计算时，采用的是GFN2-xTB方法，具体计算过程参见C.Bannwarth.；S.Ehlert,et al.J.Chem.Theory Comput.2019,15,1652-1671.和C.Bannwarth.；E.Caldeweyher,et al.Wiley Interdiscip.Rev.:Comput.Mol.Sci.2021,11,e1493.Further, in the present invention, when using the xtb program to perform Hessian calculations, the GFN2-xTB method is used. For the specific calculation process, see C.Bannwarth.; S.Ehlert, et al.J.Chem.Theory Comput.2019, 15 , 1652-1671. and C. Bannwarth.; E. Caldeweyher, et al. Wiley Interdiscip. Rev.: Comput. Mol. Sci. 2021, 11, e1493.

进一步地，所述的三维分子模型软件为PyMOL、VMD或GaussView中的一种或两种。Further, the three-dimensional molecular modeling software is one or both of PyMOL, VMD or GaussView.

其中，VMD和PyMOL可以用来截取蛋白质，GaussView和PyMOL可以对结构进行加氢，因为PyMOL可以单独完成这两个步骤，并且生成的PDB结构包括蛋白质残基的电荷信息，优选使用PyMOL软件处理。Among them, VMD and PyMOL can be used to intercept proteins, and GaussView and PyMOL can hydrogenate the structure, because PyMOL can complete these two steps alone, and the generated PDB structure includes the charge information of protein residues, preferably using PyMOL software for processing.

进一步地，所述的强约束的力常数选择为0.5～1.0Hartree/Bohr²。Further, the force constant of the strong constraint is chosen to be 0.5-1.0 Hartree/Bohr ² .

进一步地，所述的环境模型为隐式水溶剂模型、真空模型、甲醇溶剂模型、甲苯溶剂模型或乙烷模型。Further, the environment model is an implicit water solvent model, a vacuum model, a methanol solvent model, a toluene solvent model or an ethane model.

进一步地，S1步骤中，蛋白质和配体结构的重原子信息从PDB-BIND网站(http://www.pdbbind-cn.org/)以及蛋白质数据库网站(http://www.rcsb.org/pdb/)得到。对于未知的蛋白质-配体，需要通过分子对接方法先得出可能的结合位点，再按照此方法进行计算，得到结合自由能。Further, in step S1, the heavy atom information of the protein and ligand structure is obtained from the PDB-BIND website (http://www.pdbbind-cn.org/) and the protein database website (http://www.rcsb.org/ pdb/) get. For unknown protein-ligands, it is necessary to obtain possible binding sites by molecular docking method, and then calculate according to this method to obtain binding free energy.

进一步地，S3步骤中，加氢饱和采用三维分子模型软件进行。Further, in step S3, the hydrogenation saturation is performed using three-dimensional molecular modeling software.

本发明的有益效果是：The beneficial effects of the present invention are:

本发明相对其它计算蛋白质-配体结合自由能的方法，GFN2-xTB方法更易于使用，只需要初始坐标和元素组成即可构成输入结构，而与簇模型的联合应用，在保证了相对较好的准确性前提下，更有效的降低了计算的时间成本。这种基于簇模型和GFN2-xTB的新方法应该在未来的生物大分子相关的结合自由能计算上有很大的潜力。Compared with other methods for calculating protein-ligand binding free energy, the GFN2-xTB method of the present invention is easier to use, only needs initial coordinates and element composition to form the input structure, and the joint application with the cluster model ensures a relatively good Under the premise of high accuracy, the calculation time cost is more effectively reduced. This new approach based on the cluster model and GFN2-xTB should have great potential for future calculations of binding free energies related to biomacromolecules.

附图说明：Description of drawings:

图1为计算流程图；Figure 1 is a calculation flow chart;

图2为不同截断距离下的簇模型(PDB:1A28)；Figure 2 is the cluster model (PDB: 1A28) under different cut-off distances;

图3为GFN2-xTB方法计算出的结合自由能与截断距离的依赖关系；Figure 3 is the dependence of binding free energy and cut-off distance calculated by GFN2-xTB method;

图4为使用GFN2-xTB方法结合簇模型计算的蛋白质配体结合自由能的MAE值以及计算时长；Figure 4 is the MAE value and calculation time of the protein ligand binding free energy calculated using the GFN2-xTB method binding cluster model;

图5为输出文件相应能量项。Figure 5 shows the corresponding energy items in the output file.

具体实施方式detailed description

下面结合具体实施例对本发明作进一步说明，以使本领域的技术人员可以更好地理解本发明并能予以实施，但所举实施例不作为对本发明的限定。The present invention will be further described below in conjunction with specific examples, so that those skilled in the art can better understand the present invention and implement it, but the given examples are not intended to limit the present invention.

为了准确有效的运用GFN2-xTB计算蛋白质-配体的结合自由能，我们采用了簇模型。图1展示了具体的计算流程，详细地说，首先从PDB-BIND网站(http://www.pdbbind-cn.org/)选定需要计算的蛋白质-配体复合物，再从蛋白质数据库网站(http://www.rcsb.org/pdb/)下载相应的PDB结构，其次，我们从PDB文件中提取蛋白质和配体结构的重原子，并在PyMOL程序包中打开从而截取簇模型，即蛋白质在与配体不同的截断距离上被截断；同时需要在PyMOL软件中加入氢原子使所选残基饱和。值得注意的是，在这里我们把残基作为一个截断的对象，即所选残基中的所有原子都被保留了下来，这有利于确定被截断的蛋白质的总电荷。这将生成xtb(版本6.3.3)程序所需的输入结构，xtb程序包可以在https://github.com/grimme-lab/xtb网站免费获取。In order to accurately and effectively use GFN2-xTB to calculate the binding free energy of protein-ligand, we adopted the cluster model. Figure 1 shows the specific calculation process. In detail, first select the protein-ligand complex to be calculated from the PDB-BIND website (http://www.pdbbind-cn.org/), and then select the protein-ligand complex from the protein database website (http://www.rcsb.org/pdb/) to download the corresponding PDB structure. Secondly, we extract the heavy atoms of the protein and ligand structure from the PDB file, and open it in the PyMOL package to intercept the cluster model, namely The protein is truncated at a different cutoff distance than the ligand; meanwhile, addition of hydrogen atoms in the PyMOL software is required to saturate the selected residues. It is worth noting that here we treat the residue as a truncated object, i.e. all atoms in the selected residue are preserved, which is beneficial to determine the total charge of the truncated protein. This will generate the input structure required by the xtb (version 6.3.3) program, the xtb package is freely available at https://github.com/grimme-lab/xtb.

在获得所需的输入结构后，我们使用xtb程序的GFN2-xTB方法对初始结构进行初步优化，为了计算蛋白质-配体相互作用在水相中的结合自由能，我们进一步将优化后的结构分为一个复合物和两个单体，并分别进行了GFN2-xTB的Hessian计算。在以上两步中，我们采用隐式水溶剂模型计算，即采用具有表面积(SA)贡献的广义Born(GB)模型(GBSA(H₂O))。此外，由于我们使用的是簇模型，因此我们对截断蛋白质的重原子设置了一个强约束(力常数选择为1.0Hartree/Bohr²)。After obtaining the desired input structure, we used the GFN2-xTB method of the xtb program to perform a preliminary optimization of the initial structure. In order to calculate the binding free energy of the protein-ligand interaction in the aqueous phase, we further divided the optimized structure into For a complex and two monomers, the Hessian calculations of GFN2-xTB were performed respectively. In the above two steps, we calculated using an implicit water-solvent model, that is, a generalized Born(GB) model (GBSA( _H2O )) with a surface area (SA) contribution. Furthermore, since we are using a cluster model, we placed a strong constraint on the heavy atoms of the truncated protein (the force constant was chosen to be 1.0 Hartree/Bohr ² ).

本发明实施例中是基于PDB中已有的坐标确定的蛋白质和配体的结合自由能计算方法，是为了验证方法的精确性。对于未知的蛋白质-配体，需要通过分子对接方法先得出可能的结合位点，再按照此方法进行计算，得到结合自由能。In the embodiment of the present invention, the method for calculating the binding free energy of proteins and ligands determined based on the existing coordinates in the PDB is to verify the accuracy of the method. For unknown protein-ligands, it is necessary to obtain possible binding sites by molecular docking method, and then calculate according to this method to obtain binding free energy.

实施例1：最优簇模型方案Example 1: Optimal Cluster Model Scheme

本实施例以1A28蛋白为例，展示了其在不同截断距离下的簇模型，如图2所示。当截断值为

时，蛋白中只有一个残基，这明显不够。随着截距的增大，残基/原子的数量越来越多，配体可以被残基包围。截断为

时，原子总数接近1500个，接近GFN2-xTB方法的上限。为确定截断值的准确可靠，我们选定了八个蛋白配体体系并使用GFN2-xTB方法计算了不同截断距离下的结合自由能，并评估计算预测的结合自由能与截断值的依赖关系。如图3所示，发现计算得到的结合自由能随着截断距离的增加而趋于平稳。当截止距离大于

时，计算出的结合自由能变化很小。在这可以判断截断距离的预设范围为

In this example, the 1A28 protein is taken as an example to show its cluster models at different cutoff distances, as shown in FIG. 2 . When the cutoff value is

, there is only one residue in the protein, which is obviously not enough. As the intercept increases, the number of residues/atoms increases and the ligand can be surrounded by residues. truncated to

, the total number of atoms is close to 1500, which is close to the upper limit of the GFN2-xTB method. In order to determine the accuracy and reliability of the cutoff value, we selected eight protein ligand systems and used the GFN2-xTB method to calculate the binding free energy at different cutoff distances, and evaluated the dependence of the predicted binding free energy on the cutoff value. As shown in Fig. 3, it was found that the calculated binding free energies leveled off as the cut-off distance increased. When the cut-off distance is greater than

, the calculated binding free energy changes very little. Here you can judge the preset range of the cut-off distance as

实施例2：Example 2:

本实施例使用不同截断值的簇模型结合GFN2-xTB方法按上述流程计算了25组蛋白质-配体复合物的结合自由能，进一步确定簇模型的截断范围。In this example, cluster models with different cutoff values combined with the GFN2-xTB method were used to calculate the binding free energies of 25 groups of protein-ligand complexes according to the above process, and further determine the cutoff range of the cluster model.

如图4a所示，我们发现当截断距离为

或

时，使用GFN2-xTB方法计算蛋白质-配体结合自由能的MAE为10.4kcal/mol和6.1kcal/mol，这两种方法的误差与前面所提及的MM/PBSA(MM/GBSA)误差相当，而且其还存在未将与配体相互作用的残基截取的问题，因此该截断值不适用于簇模型。而当截断值为

时，使用GFN2-xTB方法计算蛋白质-配体结合自由能的MAE值大多数都小于10.0kcal/mol，平均MAE值约为5.0kcal/mol，接近于FEP方法，具有相对较好的准确性。同时在图4b中给出了计算蛋白质-配体簇模型的所耗时长(消耗时长包括初始结构优化以及单体和复合物的频率计算，在Intel XeonPlatinum9242@2.3GHz集群上使用了12核心)，我们发现当截断值为

时，平均计算时间约为26h，时间成本较高，而截断值为

时，计算成本适中。因此，在簇模型结合GFN2-xTB方法计算蛋白质-配体结合自由能中，截断值为

都是较好的选择。此外考虑到使用截断值为

的簇模型结合GFN2-xTB方法计算时，大部分蛋白质-配体的簇模型的计算时间低于两小时，平均计算时间为4574s，可有效的降低了计算的时间成本，同时在个别情况下，能够更完全可靠的将与配体相互作用的蛋白质残基截取。因而我们认为截断值为

是一个更安全有效的选择。As shown in Figure 4a, we find that when the cutoff distance is

or

When using the GFN2-xTB method to calculate the protein-ligand binding free energy MAE is 10.4kcal/mol and 6.1kcal/mol, the errors of these two methods are comparable to the MM/PBSA (MM/GBSA) error mentioned above , and it also has the problem of not truncating residues that interact with the ligand, so this cutoff value is not suitable for cluster models. And when the cutoff value is

When using GFN2-xTB method to calculate the MAE value of protein-ligand binding free energy, most of them are less than 10.0kcal/mol, and the average MAE value is about 5.0kcal/mol, which is close to the FEP method and has relatively good accuracy. At the same time, the time taken to calculate the protein-ligand cluster model is given in Figure 4b (the time spent includes initial structure optimization and frequency calculation of monomers and complexes, using 12 cores on the Intel Xeon Platinum9242@2.3GHz cluster), We found that when the cut-off value

When , the average calculation time is about 26h, the time cost is high, and the cut-off value is

, the computational cost is moderate. Therefore, in the cluster model combined with GFN2-xTB method to calculate the protein-ligand binding free energy, the cutoff value is

are better choices. Also consider using a cutoff value of

When the cluster model combined with the GFN2-xTB method is calculated, the calculation time of most protein-ligand cluster models is less than two hours, and the average calculation time is 4574s, which can effectively reduce the calculation time cost. At the same time, in some cases, It can more completely and reliably intercept the protein residues that interact with the ligand. Therefore, we consider the cut-off value to be

is a safer and more effective option.

实施例3：Example 3:

以蛋白-配体复合物1AU0为例，按照本发明方法计算结合自由能，并与网站上公布的信息计算得到的实验结合自由能值进行比较。Taking the protein-ligand complex 1AU0 as an example, the binding free energy is calculated according to the method of the present invention, and compared with the experimental binding free energy calculated from the information published on the website.

(1)首先从PDB-BIND网站上选择蛋白-配体复合物1AU0，页面内该复合物的基本信息都已经给出，包括蛋白质名字、相应的配体(SDK)以及pK_d值(7.66)等。我们可以通过pK_d＝-logK_d和ΔG＝RTlnK_d求出其相应的实验结合自由能值。(1) First select the protein-ligand complex 1AU0 from the PDB-BIND website. The basic information of the complex has been given on the page, including the protein name, corresponding ligand (SDK) and pK _d value (7.66) Wait. We can obtain the corresponding experimental binding free energy value through pK _d =-logK _d and ΔG = RTlnK _d .

(2)得到蛋白1AU0相关信息后，我们再从蛋白质数据库网站搜索下载相应的PDB结构，并且需要从PDB文件中提取蛋白质和配体结构的重原子。如果不做处理，直接用PyMOL加氢饱和，将会有3329个原子，使用GFN2-xTB直接进行优化和Hessian计算，时间成本太高。因而我们使用簇模型结合GFN2-xTB的方法计算蛋白质和配体的结合自由能。(2) After obtaining the relevant information of protein 1AU0, we search and download the corresponding PDB structure from the protein database website, and need to extract the heavy atoms of the protein and ligand structure from the PDB file. If you don’t do any processing and directly use PyMOL to hydrogenate and saturate, there will be 3329 atoms. Using GFN2-xTB to directly optimize and calculate Hessian, the time cost is too high. Therefore, we used the cluster model combined with GFN2-xTB to calculate the binding free energy of protein and ligand.

(3)在PyMOL中打开仅含有蛋白质和配体重原子的PDB结构，以1AU0的配体(SDK)为截断点，截取其

范围内的蛋白质残基并保存截取好的蛋白质-配体结构。具体操作：在命令行PyMOL＞中输入“select ligand,resn SDK”和“select 6A,byres ligand expand6”。然后再保存处理好的结构：file＞Export Molecule＞selection＞6A＞PDB Options＞Save。(3) Open the PDB structure containing only protein and ligand heavy atoms in PyMOL, take the 1AU0 ligand (SDK) as the cut-off point, and intercept its

range of protein residues and preserve the truncated protein-ligand structure. Specific operation: Enter "select ligand, resn SDK" and "select 6A, byres ligand expand6" in the command line PyMOL>. Then save the processed structure: file>Export Molecule>selection>6A>PDB Options>Save.

(4)同时需要在PyMOL中加氢饱和并保存，此时蛋白-配体复合物簇模型只有489个原子。这也将生成xtb程序所需的输入结构，即加氢饱和后的簇模型。(4) At the same time, it needs to be saturated with hydrogen and stored in PyMOL. At this time, the protein-ligand complex cluster model has only 489 atoms. This will also generate the required input structure for the xtb program, the cluster model after hydrogenation saturation.

(5)获得所需的输入结构之后，我们首先使用xtb程序的GFN2-xTB方法对其进行限制性优化。(5) After obtaining the desired input structure, we first constrainedly optimize it using the GFN2-xTB method of the xtb program.

(6)为了计算蛋白质-配体相互作用在水相中的结合自由能，我们进一步将优化后的结构分为一个复合物(簇模型整体)和两个单体(截断的蛋白质残基和配体)，并分别对其进行了GFN2-xTB的Hessian计算。(6) In order to calculate the binding free energy of protein-ligand interaction in aqueous phase, we further divided the optimized structure into a complex (cluster model overall) and two monomers (truncated protein residues and ligand body), and the Hessian calculation of GFN2-xTB was performed on them respectively.

在第五和第六步中，我们采用隐式水溶剂模型计算，即GBSA(H₂O)。由于我们使用的是簇模型，因此我们额外设置了一个输入文件md.inp，其中可以对截断蛋白质的重原子设置一个强约束(力常数选择为1.0Hartree/Bohr²)。需要注意的是，对配体的Hessian计算不需要读取此输入文件。计算完成后，最后将输出文件(如图5所示)中相应的能量分量按照蛋白质和配体结合自由能公式相加减，即读取计算复合体的相应能量记为G_com(图5a)，两单体(蛋白残基和配体)相应能量记为G_pro(图5b)和G_lig(图5c)，计算相应的ΔG为G_com-G_pro-G_lig。从而获得最终的结合自由能。In the fifth and sixth steps, we calculate using the implicit water solvent model, namely GBSA(H ₂ O). Since we are using a cluster model, we additionally set an input file md.inp in which a strong constraint can be placed on the heavy atoms of the truncated protein (the force constant is chosen to be 1.0 Hartree/Bohr ² ). It should be noted that the Hessian calculation for the ligand does not need to read this input file. After the calculation is completed, add and subtract the corresponding energy components in the output file (as shown in Figure 5) according to the protein and ligand binding free energy formula, that is, read and calculate the corresponding energy of the complex as G _com (Figure 5a) , the corresponding energies of the two monomers (protein residue and ligand) are recorded as G _pro (Fig. 5b) and G _lig (Fig. 5c), and the corresponding ΔG is calculated as G _com -G _pro -G _lig . Thus, the final binding free energy is obtained.

蛋白质和配体结合自由能公式为：The free energy formula for protein and ligand binding is:

ΔG＝G_com-G_pro-G_lig (1)ΔG＝G _com -G _pro -G _lig (1)

其中，G_com为蛋白-配体复合物的自由能，G_pro为蛋白的自由能，G_lig为配体的自由能。Among them, G _com is the free energy of the protein-ligand complex, G _pro is the free energy of the protein, and G _lig is the free energy of the ligand.

蛋白1AU0相应的计算结果在表一中给出。表中不仅给出了我们使用GFN2-xTB方法计算蛋白质-配体簇模型的结合自由能以及所耗时长，而且给出了自由能的其它能量贡献项。可以看出，结合簇模型使用GFN2-xTB方法计算蛋白质-配体结合自由能可以有效的降低时间成本，而且还保持着相对较好的准确性。The corresponding calculation results of protein 1AU0 are given in Table 1. The table not only gives the binding free energy and time-consuming time of the protein-ligand cluster model calculated by the GFN2-xTB method, but also gives other energy contribution items of the free energy. It can be seen that the binding cluster model using the GFN2-xTB method to calculate the protein-ligand binding free energy can effectively reduce the time cost and maintain relatively good accuracy.

表一使用GFN2-XTB方法计算的蛋白1AU0结合自由能结果Table 1 Binding free energy results of protein 1AU0 calculated by GFN2-XTB method

PDBPDB Gcom/EhGcom/Eh Gpro/EhGpro/Eh Glig/EhGlig/Eh ΔG(kcal/mol)ΔG (kcal/mol) 实验ΔG(kcal/mol)Experiment ΔG(kcal/mol) WALL Time/sWALL Time/s 1AU01AU0 -776.618-776.618 -649.127-649.127 -127.480-127.480 -6.90-6.90 -10.45-10.45 49554955

注：1Eh＝627.51kcal/molNote: 1Eh=627.51kcal/mol

以上所述实施例仅是为充分说明本发明而所举的较佳的实施例，本发明的保护范围不限于此。本技术领域的技术人员在本发明基础上所作的等同替代或变换，均在本发明的保护范围之内。本发明的保护范围以权利要求书为准。The above-mentioned embodiments are only preferred embodiments for fully illustrating the present invention, and the protection scope of the present invention is not limited thereto. Equivalent substitutions or transformations made by those skilled in the art on the basis of the present invention are all within the protection scope of the present invention. The protection scope of the present invention shall be determined by the claims.

Claims

1. A method for calculating protein-ligand binding free energy based on a cluster model, comprising the steps of:

s1, respectively obtaining heavy atom information of protein and ligand structures;

s2, opening the obtained heavy atom information of the protein and ligand structures in three-dimensional molecular model software, intercepting protein residues with the intercepting distance within a preset range, and storing the intercepted protein and the ligand together into a protein-ligand structure;

s3, carrying out hydrogenation saturation on the intercepted protein-ligand structure to obtain a cluster model after hydrogenation saturation;

s4, inputting the cluster model after hydrogenation saturation into a GFN2-xTB program, setting a strong constraint on heavy atoms of truncated proteins in the cluster model after hydrogenation saturation, and setting an environment model of protein-ligand interaction; wherein the strongly constrained force constant is selected to be 0.5-1.0 Hartree/Bohr ² (ii) a The environment model is an implicit hydrosolvent model, a vacuum model, a methanol solvent model, a toluene solvent model or an ethane model;

s5, dividing the cluster model with strong constraint into three structures of a cluster model whole body, a truncated protein residue and a ligand, respectively performing Hessian calculation on the three structures by using a GFN2-xTB method, and reading the free energy G of the cluster model whole body from an output file _com Free energy of truncated protein residue G _pro And free energy G of the ligand _lig And calculating the binding free energy delta G of the protein-ligand according to the three free energies, wherein the cluster model is a protein-ligand structure which is subjected to hydrogenation saturation and has strong constraint as a whole.

2. The method of claim 1, wherein the free energy of protein-ligand bindingΔ G according to the protein and ligand binding free energy formula: Δ G = G _com -G _pro -G _lig Calculating;

wherein AG is the binding free energy of protein-ligand,

G _com is the free energy of the cluster model as a whole,

G _pro in order to truncate the free energy of the protein residue,

G _lig is the free energy of the ligand.

3. The method of claim 1, wherein the predetermined range of the truncation distance is

A range.

4. The method according to claim 1, wherein the three-dimensional molecular modeling software is one or two of PyMOL, VMD, or GaussView.

5. The method of claim 1, wherein in step S1, the heavy atom information of the protein and ligand structures is obtained from PDB-BIND website and protein database website.

6. The method of claim 1, wherein in the step S3, the hydrogenation saturation is performed by using three-dimensional molecular modeling software.