WO2022094873A1 - Molecular force field quality control system and control method therefor - Google Patents

Molecular force field quality control system and control method therefor

Info

Publication number
WO2022094873A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
energy
molecular
force field
index
Prior art date
Application number
PCT/CN2020/126782
Other languages
French (fr)
Chinese (zh)
Inventor
王果
马健
张佩宇
方栋
温书豪
赖力鹏
Original Assignee
深圳晶泰科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳晶泰科技有限公司
Priority to PCT/CN2020/126782
Publication of WO2022094873A1

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16C COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
    • G16C10/00 Computational theoretical chemistry, i.e. ICT specially adapted for theoretical aspects of quantum chemistry, molecular mechanics, molecular dynamics or the like

Definitions

  • The invention belongs to the field of molecular mechanics, and in particular relates to a molecular force field quality control system and a control method thereof.
  • The invention is suitable for evaluating the parameters in a molecular force field and for refitting parameters that perform poorly.
  • Molecular mechanics is widely applied in fields such as drug design and materials science because of its relatively low computational cost and relatively high accuracy. Molecular mechanics uses specific functions to calculate molecular properties (such as structure and energy); the functional form and the parameters in the functions are together called a molecular force field.
  • The energy of a commonly used force field can be decomposed into a bonded (intramolecular) part and a non-bonded (intermolecular) part. The bonded part describes the vibration and rotation of chemical bonds within the molecule and includes bond length, bond angle, dihedral angle and out-of-plane bending parameters. The non-bonded part describes the attraction or repulsion, caused by electrostatic and van der Waals forces, between atoms in different molecules or between distant atoms within one molecule, and includes the charge model and van der Waals parameters.
  • The coverage of the training set affects the range of applicability of the parameters, and the similarity within the training set affects their accuracy, so the generality and accuracy of a force field must be balanced; as a result, the accuracy of most force fields varies from one target molecule to another.
  • In force field development and application, it is necessary to evaluate and validate the force field on specific test sets to obtain quality control data.
  • The present invention provides a molecular force field quality control system and a control method thereof.
  • The molecular conformational energies and structures optimized with a high-precision quantum chemical method serve as the standard, and the energy and structural deviations between this standard and the molecular mechanics results of different force fields are tested and evaluated.
  • The molecular force field quality control system includes four modules: a data acquisition module, an energy analysis module, a structural analysis module and a data statistics module.
  • The data acquisition module selects suitable computational simulation tools to generate the quantum mechanical optimization data used as the benchmark and the molecular mechanics optimization data of the force field under test, and organizes them into a data table with the specified columns and molecular coordinate files, which serve as the input data of the energy analysis module.
  • The energy analysis module calculates the evaluation indexes in the index set that relate to molecular conformational energy and the potential energy surface. The program reads the user-supplied molecule name, the atom indices of the dihedral angle constrained during optimization, the angle and the conformational energy, calculates the energy analysis indexes according to the parameters of the two test methods, and generates a raw data table.
  • The structural analysis module calculates the evaluation indexes in the index set that relate to molecular structure. The program reads the user-supplied molecule name, the atom indices of the constrained dihedral angle and the 3D coordinates, calculates the structural analysis indexes, and generates a raw data table.
  • The data statistics module includes a series of general statistical and plotting programs shared by the energy analysis and the structural analysis. It reads the raw data tables of the energy and structural analysis results, calls the corresponding programs, calculates statistical indexes, prints statistical data, and draws statistical charts.
  • The data acquisition module selects its tools case by case; the other modules are implemented in Python using the numpy, scipy, pandas, matplotlib and rdkit libraries, and the index calculation, data statistics and plotting workflows are integrated into a Jupyter notebook.
  • The energy analysis module calculates the single-conformation energy indexes and then the potential energy surface indexes, and can extract a specific potential energy surface to plot and inspect.
  • The structural analysis module calculates, in turn, the RMSD and the indexes related to bond lengths, bond angles and dihedral angles.
  • Force field developers can additionally supply a parameter-item data table to calculate the indexes related to parameter items.
  • The data statistics module reads the calculation results of steps (4) and (5), defines the colors and labels of figures and tables, and outputs statistical data and reports.
  • Indicator calculation and data analysis tools are decoupled from commercial software, which can process and analyze calculation data from a wide range of sources.
  • Fig. 1 shows the molecular force field quality control workflow of the present invention.
  • Fig. 2 shows a potential energy surface of the embodiment.
  • Fig. 3 shows a molecular structure of the embodiment.
  • Fig. 4a shows the QM conformational energy correlation of the embodiment.
  • Fig. 4b shows the MM conformational energy correlation of the embodiment.
  • Fig. 5a shows the QM energy correlation of the potential energy surface minima of the embodiment.
  • Fig. 5b shows the MM energy correlation of the potential energy surface minima of the embodiment.
  • Fig. 6 shows the distribution of potential energy surface RMSE for the embodiment.
  • Fig. 7 shows the distribution of the correlation coefficient R of the potential energy surface minima for the embodiment.
  • Fig. 8 shows the distribution of the potential energy surface correlation coefficient R for the embodiment.
  • Fig. 9 shows the position (angle) deviation of the potential energy surface minima for the embodiment.
  • Fig. 10a shows the two-dimensional distribution of energy error and position error of the potential energy surface minima for the GAFF2 force field in the embodiment.
  • Fig. 10b shows the two-dimensional distribution of energy error and position error of the potential energy surface minima for the OpenFF force field in the embodiment.
  • Fig. 11 shows the root-mean-square displacement (RMSD) distribution for the embodiment.
  • Fig. 12 shows the bond length deviation distribution for the embodiment.
  • Fig. 13 shows the bond angle deviation distribution for the embodiment.
  • Fig. 14 shows the deviation distribution of the non-fixed dihedral angles for the embodiment.
  • The quality control index set for molecular force fields is shown in Table 1.
  • The index set defines a series of indexes that evaluate how the force field parameters perform on the test set.
  • The molecular force field quality control system includes four modules: a data acquisition module, an energy analysis module, a structural analysis module and a data statistics module.
  • The data acquisition module can select its tools case by case; the remaining modules are implemented in Python using the numpy, scipy, pandas, matplotlib and rdkit libraries, and the index calculation, data statistics and plotting workflows are integrated into a Jupyter notebook. This architecture removes the dependence on commercial software, is convenient for users and developers with a scientific computing background, and provides visualization.
  • Depending on the source of the test-set molecules, the source of the force field under test and user preference, the data acquisition module selects suitable computational simulation tools to generate the benchmark quantum mechanical optimization data and the molecular mechanics optimization data of the force field under test, and organizes them into a data table with the specified columns and molecular coordinate files, which serve as the input data of the analysis modules.
  • The energy analysis module calculates the evaluation indexes in the index set that relate to molecular conformational energy and the potential energy surface.
  • The program reads the user-supplied molecule name, the atom indices of the dihedral angle constrained during optimization, the angle and the conformational energy, calculates the energy analysis indexes according to the parameters of the two test methods, and generates a raw data table.
  • The structural analysis module calculates the evaluation indexes in the index set that relate to molecular structure.
  • The program reads the user-supplied molecule name, the atom indices of the constrained dihedral angle and the 3D coordinates, calculates the structural analysis indexes, and generates a raw data table.
  • The data statistics module includes a series of general statistical and plotting programs shared by the energy analysis and the structural analysis. It reads the raw data tables of the energy and structural analysis results, calls the corresponding programs, calculates statistical indexes, prints statistical data, and draws statistical charts.
  • The control method of the molecular force field quality control system includes the following steps:
  • Preparation of high-precision quantum chemical data (QM data): select the target test set and use quantum chemistry software (such as PSI4) with a high-precision quantum chemical method (such as B3LYP/6-31G(d)) to scan the potential energy surface of the rotatable dihedral angles of the molecules.
  • Preparation of molecular mechanics data (MM data) for the force field under test: using the conformations from step (1) as input, perform a molecular mechanics optimization of 1000 or more steps with each of one or more force fields under test. The resulting energy data E_MM1, E_MM2, ... are saved in the format shown in Table 2 with energies in kcal/mol, and the structure data are saved in .mol or .sdf format.
  • The energy analysis module calculates the single-conformation energy indexes and then the potential energy surface indexes, and can extract a specific potential energy surface to plot and inspect.
  • The structural analysis module calculates, in turn, the RMSD and the indexes related to bond lengths, bond angles and dihedral angles.
  • Force field developers can additionally supply a parameter-item data table to calculate the indexes related to parameter items.
  • The data statistics module reads the calculation results of steps (4) and (5), defines the colors and labels of figures and tables, and outputs statistical data and reports, as shown in Tables 3 to 5 and Figs. 4a to 14.
  • This embodiment uses the Roche test set as an example.
  • The test set contains 459 molecules with rotatable dihedral angles and is used to evaluate the performance of two open-source force fields, GAFF2 and OpenFF.
  • The molecular coordinates were analyzed with the structural analysis module to obtain data tables of RMSD, bond lengths, bond angles and dihedral angles.
  • For the molecule 006-C12H12N2, the correlation coefficient R between the GAFF2 potential energy surface and the QM result is less than 0, i.e. the two are negatively correlated, so the GAFF2 force field performs poorly here.
  • The energy barrier of this potential energy surface is less than 5 kcal/mol, so the barrier to dihedral rotation is low, but the GAFF2 force field places the extrema at the wrong positions. Users who need to compute secondary aromatic amines can therefore choose another force field, and force field developers can consider optimizing the parameters for such molecules.
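
The kind of diagnosis described in the last two items (a negative correlation coefficient R and misplaced extrema) can be illustrated with a short NumPy sketch. It is not the patented implementation: the function names are hypothetical, and the two energy profiles are synthetic cosine curves chosen so that the minima are shifted by 90 degrees; they are not the data of molecule 006-C12H12N2.

```python
import numpy as np

def pes_rmse(e_mm, e_qm):
    """Root-mean-square error between an MM and a QM potential energy scan."""
    return float(np.sqrt(np.mean((np.asarray(e_mm) - np.asarray(e_qm)) ** 2)))

def pes_correlation(e_mm, e_qm):
    """Pearson correlation coefficient R between the two scans."""
    return float(np.corrcoef(e_mm, e_qm)[0, 1])

def periodic_minima(angles, energies):
    """Scan angles whose energy is lower than both periodic neighbours."""
    e = np.asarray(energies, dtype=float)
    is_min = (e < np.roll(e, 1)) & (e < np.roll(e, -1))
    return np.asarray(angles)[is_min]

def nearest_angle_deviation(ref_angles, test_angles):
    """For each reference minimum, the periodic distance (degrees) to the nearest test minimum."""
    ref = np.asarray(ref_angles, dtype=float)[:, None]
    test = np.asarray(test_angles, dtype=float)[None, :]
    diff = (ref - test + 180.0) % 360.0 - 180.0
    return np.abs(diff).min(axis=1)

# Synthetic scan at a 15-degree interval (illustrative values only).
angles = np.arange(0, 360, 15)
theta = np.deg2rad(angles)
e_qm = 1.0 + np.cos(2 * theta)   # reference: minima at 90 and 270 degrees
e_mm = 1.0 - np.cos(2 * theta)   # force field: minima at 0 and 180 degrees

r_value = pes_correlation(e_mm, e_qm)
rmse_value = pes_rmse(e_mm, e_qm)
qm_minima = periodic_minima(angles, e_qm)
mm_minima = periodic_minima(angles, e_mm)
position_error = nearest_angle_deviation(qm_minima, mm_minima)
print(r_value, rmse_value, qm_minima, mm_minima, position_error)
```

A strongly negative R together with a large position error flags a surface whose shape is wrong even when the barrier height itself is small.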

Landscapes

  • Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

A molecular force field quality control system and a control method therefor. The molecular force field quality control system comprises: a data acquisition module for reading the quantum mechanics optimization data used as the benchmark and the molecular mechanics optimization data of a force field to be tested, and consolidating said data into a data table and a molecular coordinate file that comprise specified items; an energy analysis module for calculating the evaluation indexes in an index set that are related to molecular conformation energy and the potential energy surface, calculating the energy analysis indexes according to the parameters of two test methods, and generating a raw data table; a structural analysis module for calculating the evaluation indexes in the index set that are related to molecular structure, calculating the structural analysis indexes, and generating a raw data table; and a data statistics module for calculating statistical indexes, printing statistical data, and drawing statistical charts. By incorporating force field parameter items into the quality control process, a force field developer can accurately identify the parameter items that underperform on a test set and optimize those parameters in a targeted manner.

Description

Molecular force field quality control system and its control method
Technical field
The invention belongs to the field of molecular mechanics and relates in particular to a molecular force field quality control system and its control method. As a method for testing how the intramolecular parameters of a force field perform, it is suitable for evaluating the parameters of a molecular force field and for refitting parameters that perform poorly.
Background
Molecular mechanics is widely applied in fields such as drug design and materials science because of its relatively low computational cost and relatively high accuracy. Molecular mechanics uses specific functions to calculate molecular properties (such as structure and energy); the functional form and the parameters in the functions are together called a molecular force field. The energy of a commonly used force field can be decomposed into a bonded (intramolecular) part and a non-bonded (intermolecular) part. The bonded part describes the vibration and rotation of chemical bonds within the molecule and includes bond length, bond angle, dihedral angle and out-of-plane bending parameters. The non-bonded part describes the attraction or repulsion, caused by electrostatic and van der Waals forces, between atoms in different molecules or between distant atoms within one molecule, and includes the charge model and van der Waals parameters.
The molecular force field model itself contains approximations, and its parameters are obtained by fitting to quantum chemical results for a large number of molecules or from empirical rules. The coverage of the training set therefore affects the range of applicability of the parameters, and the similarity within the training set affects their accuracy, so the generality and accuracy of a force field must be balanced; as a result, the accuracy of most force fields varies from one target molecule to another. In force field development and application, it is necessary to evaluate and validate the force field on specific test sets to obtain quality control data.
In current literature reports on open-source or commercial force fields, the evaluation indexes are not unified and are poorly defined, the test sets differ widely, the computation workflow is usually coupled to the data analysis workflow, and much of the data is not public, so a complete evaluation method and index system is lacking. In their specialized usage scenarios, researchers and commercial users are generally not in a position to validate the chosen force field quantitatively and obtain supporting data. A complete and independent quality control workflow helps in selecting and optimizing force field parameters for different scenarios and in obtaining more accurate molecular mechanics results.
Summary of the invention
In view of the above technical problems, the present invention provides a molecular force field quality control system and its control method. The molecular conformational energies and structures optimized with a high-precision quantum chemical method serve as the standard, and the energy and structural deviations between this standard and the molecular mechanics results of different force fields are tested and evaluated.
The specific technical solution is as follows:
The molecular force field quality control system includes four modules: a data acquisition module, an energy analysis module, a structural analysis module and a data statistics module.
The data acquisition module, depending on the source of the test-set molecules, the source of the force field under test and user preference, selects suitable computational simulation tools to generate the quantum mechanical optimization data used as the benchmark and the molecular mechanics optimization data of the force field under test, and organizes them into a data table with the specified columns and molecular coordinate files, which serve as the input data of the energy analysis module.
The energy analysis module calculates the evaluation indexes in the index set that relate to molecular conformational energy and the potential energy surface. The program reads the user-supplied molecule name, the atom indices of the dihedral angle constrained during optimization, the angle and the conformational energy, calculates the energy analysis indexes according to the parameters of the two test methods, and generates a raw data table.
The structural analysis module calculates the evaluation indexes in the index set that relate to molecular structure. The program reads the user-supplied molecule name, the atom indices of the constrained dihedral angle and the 3D coordinates, calculates the structural analysis indexes, and generates a raw data table.
The data statistics module includes a series of general statistical and plotting programs shared by the energy analysis and the structural analysis. It reads the raw data tables of the energy and structural analysis results, calls the corresponding programs, calculates statistical indexes, prints statistical data, and draws statistical charts.
The data acquisition module selects its tools case by case; the other modules are implemented in Python using the numpy, scipy, pandas, matplotlib and rdkit libraries, and the index calculation, data statistics and plotting workflows are integrated into a Jupyter notebook.
The control method of the molecular force field quality control system includes the following steps:
(1) Preparation of high-precision quantum chemical data (QM data): select the target test set and use quantum chemistry software with a high-precision quantum chemical method to scan the potential energy surface of the rotatable dihedral angles of the molecules. The scan interval is generally 15 or 30 degrees. The resulting energies E_QM and coordinates r_QM are used as the benchmark.
(2) Preparation of molecular mechanics data (MM data) for the force field under test: using the conformations from step (1) as input, perform a molecular mechanics optimization of 1000 or more steps with each of one or more force fields under test. The resulting energy data E_MM1, E_MM2, ... are saved as a table with energies in kcal/mol, and the structure data are saved in .mol or .sdf format.
(3) Energy translation: read the data table and translate E_QM and E_MMi (i = 1, 2, ...) for each rotatable dihedral angle so that the lowest-energy point of E has the value 0. Two translation strategies are available for E_MMi: setting its lowest point to 0, or minimizing the variance with respect to E_QM.
(4) The energy analysis module calculates the single-conformation energy indexes and then the potential energy surface indexes, and can extract a specific potential energy surface to plot and inspect.
(5) The structural analysis module calculates, in turn, the RMSD and the indexes related to bond lengths, bond angles and dihedral angles. Force field developers can additionally supply a parameter-item data table to calculate the indexes related to parameter items.
(6) The data statistics module reads the calculation results of steps (4) and (5), defines the colors and labels of figures and tables, and outputs statistical data and reports.
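
The energy translation of step (3) can be sketched with NumPy as follows. This is a minimal illustration under stated assumptions, not the patented code: the function names and the sample energies are hypothetical, and the second strategy is read here as choosing the constant shift that minimizes the mean squared difference from E_QM, which is one possible interpretation of "minimizing the variance with respect to E_QM".

```python
import numpy as np

def shift_to_zero_minimum(energies):
    """Strategy 1: translate a scan so that its lowest point is 0."""
    e = np.asarray(energies, dtype=float)
    return e - e.min()

def shift_to_match_reference(e_mm, e_qm):
    """Strategy 2 (assumed reading): add the constant c that minimizes
    sum((E_MM + c - E_QM)**2); the optimum is c = mean(E_QM - E_MM)."""
    e_mm = np.asarray(e_mm, dtype=float)
    e_qm = np.asarray(e_qm, dtype=float)
    return e_mm + np.mean(e_qm - e_mm)

# Hypothetical raw energies (kcal/mol) for one rotatable dihedral scan.
e_qm_raw = np.array([-10.0, -8.5, -9.0, -7.0])
e_mm_raw = np.array([3.0, 5.0, 4.0, 7.0])

e_qm = shift_to_zero_minimum(e_qm_raw)            # QM reference, lowest point at 0
e_mm_zero = shift_to_zero_minimum(e_mm_raw)       # MM, lowest point at 0
e_mm_fit = shift_to_match_reference(e_mm_raw, e_qm)  # MM, least-squares shift
print(e_qm, e_mm_zero, e_mm_fit)
```

The first strategy pins both curves at their own minima, which emphasizes errors in barrier heights; the second spreads the offset over the whole scan, so a single badly described minimum does not dominate the comparison.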
The molecular force field quality control system and control method provided by the invention have the following technical advantages:
(1) They standardize the index set used for molecular force field quality control.
(2) They propose a series of indexes, not previously reported in the literature, for evaluating how closely the shape of a dihedral potential energy surface resembles the QM potential energy surface.
(3) By incorporating force field parameter items into the quality control workflow, force field developers can accurately identify the parameter items that perform poorly on the test set and optimize those parameters in a targeted manner.
(4) The index calculation and data analysis tools are decoupled from commercial software and can process and analyze computed data from a wide range of sources.
(5) The index calculation, data statistics and plotting workflows are integrated into a Jupyter notebook, which provides visualization.
Description of drawings
Fig. 1 shows the molecular force field quality control workflow of the present invention;
Fig. 2 shows a potential energy surface of the embodiment;
Fig. 3 shows a molecular structure of the embodiment;
Fig. 4a shows the QM conformational energy correlation of the embodiment;
Fig. 4b shows the MM conformational energy correlation of the embodiment;
Fig. 5a shows the QM energy correlation of the potential energy surface minima of the embodiment;
Fig. 5b shows the MM energy correlation of the potential energy surface minima of the embodiment;
Fig. 6 shows the distribution of potential energy surface RMSE for the embodiment;
Fig. 7 shows the distribution of the correlation coefficient R of the potential energy surface minima for the embodiment;
Fig. 8 shows the distribution of the potential energy surface correlation coefficient R for the embodiment;
Fig. 9 shows the position (angle) deviation of the potential energy surface minima for the embodiment;
Fig. 10a shows the two-dimensional distribution of energy error and position error of the potential energy surface minima for the GAFF2 force field in the embodiment;
Fig. 10b shows the two-dimensional distribution of energy error and position error of the potential energy surface minima for the OpenFF force field in the embodiment;
Fig. 11 shows the root-mean-square displacement (RMSD) distribution for the embodiment;
Fig. 12 shows the bond length deviation distribution for the embodiment;
Fig. 13 shows the bond angle deviation distribution for the embodiment;
Fig. 14 shows the deviation distribution of the non-fixed dihedral angles for the embodiment.
Detailed description
The specific technical solution of the invention is described below with reference to the embodiment.
The specific technical solution is as follows:
The quality control index set for molecular force fields is shown in Table 1. The index set defines a series of indexes that evaluate how the force field parameters perform on the test set.
Table 1. Molecular force field quality control index set
Figure PCTCN2020126782-appb-000001
Figure PCTCN2020126782-appb-000002
The molecular force field quality control system includes four modules: a data acquisition module, an energy analysis module, a structural analysis module and a data statistics module.
The data acquisition module can select its tools case by case; the remaining modules are implemented in Python using the numpy, scipy, pandas, matplotlib and rdkit libraries, and the index calculation, data statistics and plotting workflows are integrated into a Jupyter notebook. This architecture removes the dependence on commercial software, is convenient for users and developers with a scientific computing background, and provides visualization.
Depending on the source of the test-set molecules, the source of the force field under test and user preference, the data acquisition module selects suitable computational simulation tools to generate the benchmark quantum mechanical optimization data and the molecular mechanics optimization data of the force field under test, and organizes them into a data table with the specified columns and molecular coordinate files, which serve as the input data of the analysis modules.
The energy analysis module calculates the evaluation indexes in the index set that relate to molecular conformational energy and the potential energy surface. The program reads the user-supplied molecule name, the atom indices of the dihedral angle constrained during optimization, the angle and the conformational energy, calculates the energy analysis indexes according to the parameters of the two test methods, and generates a raw data table.
The structural analysis module calculates the evaluation indexes in the index set that relate to molecular structure. The program reads the user-supplied molecule name, the atom indices of the constrained dihedral angle and the 3D coordinates, calculates the structural analysis indexes, and generates a raw data table.
The data statistics module includes a series of general statistical and plotting programs shared by the energy analysis and the structural analysis. It reads the raw data tables of the energy and structural analysis results, calls the corresponding programs, calculates statistical indexes, prints statistical data, and draws statistical charts.
As shown in Fig. 1, the control method of the molecular force field quality control system includes the following steps:
(1) Preparation of high-precision quantum chemical data (QM data): select the target test set and use quantum chemistry software (such as PSI4) with a high-precision quantum chemical method (such as B3LYP/6-31G(d)) to scan the potential energy surface of the rotatable dihedral angles of the molecules. The scan interval is generally 15 or 30 degrees. The resulting energies E_QM and coordinates r_QM are used as the benchmark.
(2) Preparation of molecular mechanics data (MM data) for the force field under test: using the conformations from step (1) as input, perform a molecular mechanics optimization of 1000 or more steps with each of one or more force fields under test. The resulting energy data E_MM1, E_MM2, ... are saved in the format shown in Table 2 with energies in kcal/mol, and the structure data are saved in .mol or .sdf format.
Table 2. Input data table
Figure PCTCN2020126782-appb-000003
(3) Energy shifting: take the table shown in Table 2 as input and, for each rotatable dihedral angle, shift E_QM and E_MMi (i = 1, 2, …) so that the lowest point of E is 0. Two shifting strategies are available for E_MMi: setting its lowest point to 0, or minimizing its variance with respect to E_QM.
(4) The energy analysis module computes, in turn, the single-conformation energy metrics and the potential energy surface metrics; individual potential energy surfaces can be extracted and plotted for inspection.
(5) The structure analysis module computes, in turn, the RMSD and the bond length, bond angle, and dihedral angle metrics. Force field developers can additionally supply a parameter-term data table to compute metrics per parameter term.
(6) The data statistics module reads the results of steps (4) and (5), sets the colors and labels of the figures and tables, and outputs summary statistics and reports, as shown in Tables 3-5 and Figures 4a-14.
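The energy shifting in step (3) can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names and the sample energies are hypothetical, and the second strategy is read here as adding the constant offset that minimizes the squared deviation from the shifted E_QM, which is one plausible interpretation of "minimizing the variance with respect to E_QM".

```python
import numpy as np

def shift_to_zero(energies):
    """Shift a scan so that its lowest energy is exactly 0."""
    e = np.asarray(energies, dtype=float)
    return e - e.min()

def shift_least_squares(e_mm, e_qm_shifted):
    """Shift E_MM by the constant c that minimizes sum((E_MM + c - E_QM)**2).

    Setting the derivative with respect to c to zero gives
    c = mean(E_QM - E_MM).
    """
    e_mm = np.asarray(e_mm, dtype=float)
    e_qm = np.asarray(e_qm_shifted, dtype=float)
    return e_mm + np.mean(e_qm - e_mm)

# Made-up energies (kcal/mol) for one dihedral scan, for illustration only
e_qm_raw = np.array([-310.2, -309.0, -307.5, -308.8, -310.0, -308.1])
e_mm_raw = np.array([12.4, 13.9, 15.6, 13.5, 12.9, 14.8])

e_qm = shift_to_zero(e_qm_raw)                  # reference, lowest point at 0
e_mm_min0 = shift_to_zero(e_mm_raw)             # strategy 1: lowest point at 0
e_mm_lsq = shift_least_squares(e_mm_raw, e_qm)  # strategy 2: best constant offset
```

The first strategy keeps both curves anchored at their own minima, whereas the second removes any constant offset between the curves, so that remaining differences reflect the shape of the surface rather than the position of a single point.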
This embodiment uses the Roche test set as an example. The set contains 459 molecules with rotatable dihedral angles and is used to evaluate two open-source force fields, GAFF2 and OpenFF. The energies and structures from the QM potential energy surface scans are first downloaded from the publicly released web page. The MM data for the GAFF2 force field are then obtained with Amber, and the MM data for the OpenFF force field are computed with OpenMM. The energies are compiled into a table as shown in Table 2, and the molecular coordinates are stored in .mol format.
The energy analysis module is applied to the energy table to produce a QC energy analysis data table containing the QC metric values for each individual conformation.
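As a rough sketch of what a per-conformation energy table might look like, the following pandas snippet computes a signed and an absolute energy error for each conformation and an RMSE for each scanned surface. The column names (`molecule`, `angle`, `E_QM`, `E_MM1`) and all numbers are assumptions made for illustration; the actual layout of Table 2 is only available as an image in the source.

```python
import numpy as np
import pandas as pd

# Hypothetical layout: one row per conformation, energies already shifted (kcal/mol)
table = pd.DataFrame({
    "molecule": ["mol_A"] * 4 + ["mol_B"] * 4,
    "angle":    [0, 90, 180, 270] * 2,
    "E_QM":     [0.0, 2.0, 0.5, 2.2, 0.0, 3.0, 1.0, 3.1],
    "E_MM1":    [0.0, 1.5, 0.9, 2.0, 0.3, 2.0, 0.0, 2.5],
})

# Per-conformation metrics: signed and absolute deviation from the QM reference
table["dE_MM1"] = table["E_MM1"] - table["E_QM"]
table["abs_dE_MM1"] = table["dE_MM1"].abs()

# Per-surface metric: root-mean-square error over all scan points of a molecule
rmse = table.groupby("molecule")["dE_MM1"].apply(
    lambda d: float(np.sqrt((d ** 2).mean()))
)
```

Keeping the per-conformation rows alongside the per-surface aggregates lets the same table serve both the single-conformation metrics and the potential energy surface metrics.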
The structure analysis module is applied to the molecular coordinates to produce data tables of RMSD, bond lengths, bond angles, and dihedral angles.
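Two of these structure metrics can be illustrated with plain numpy. The sketch below assumes that the QM and MM structures list the same atoms in the same order and that atom indices are 0-based. The superposition uses the standard Kabsch algorithm, which the source does not name, and all function names are hypothetical.

```python
import numpy as np

def kabsch_rmsd(p, q):
    """RMSD between two (N, 3) coordinate arrays after optimal superposition."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p = p - p.mean(axis=0)            # remove translation
    q = q - q.mean(axis=0)
    u, _, vt = np.linalg.svd(p.T @ q)
    # Guard against an improper rotation (reflection)
    d = 1.0 if np.linalg.det(vt.T @ u.T) >= 0 else -1.0
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    diff = p @ rot.T - q
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))

def dihedral_deg(xyz, idx):
    """Dihedral angle in degrees defined by four atom indices."""
    p0, p1, p2, p3 = np.asarray(xyz, dtype=float)[list(idx)]
    b0 = p0 - p1
    b1 = p2 - p1
    b2 = p3 - p2
    b1 = b1 / np.linalg.norm(b1)
    v = b0 - np.dot(b0, b1) * b1      # component of b0 perpendicular to the axis
    w = b2 - np.dot(b2, b1) * b1      # component of b2 perpendicular to the axis
    return float(np.degrees(np.arctan2(np.dot(np.cross(b1, v), w), np.dot(v, w))))

def dihedral_deviation(a_deg, b_deg):
    """Difference a - b wrapped into [-180, 180) so that 170 and -170 differ by 20."""
    return (a_deg - b_deg + 180.0) % 360.0 - 180.0
```

Wrapping the dihedral difference matters because angles near ±180 degrees would otherwise produce spurious deviations close to 360 degrees.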
The data statistics module is then used to generate the summary statistics and reports shown in Tables 3-5 and Figures 4a-14.
Table 3. Statistics of the tested parameters, conformational energy deviation, dihedral angle RMSE, and correlation coefficient R
Figure PCTCN2020126782-appb-000004
Table 4. Matching rate of potential energy surface extrema
Figure PCTCN2020126782-appb-000005
Table 5. Example of printed structure metrics
Figure PCTCN2020126782-appb-000006
The statistical reports show that, on most of the energy metrics in Table 3 and Figures 4a-8, OpenFF generally has smaller deviations than GAFF2 and performs better. In describing the shape of the potential energy surface, however, the two force fields are comparable in the potential energy surface correlation coefficient R in Table 3, and GAFF2 is even slightly better than OpenFF at matching the surface shape exactly, as shown by the matching rates of local minima and global minima in Table 4 and by the two-dimensional distributions of the errors in minimum energy and position in Figures 10a and 10b. For structure, the statistics in Table 5 and the metric distributions in Figures 11-14 show that OpenFF remains slightly better than GAFF2 on RMSD and dihedral angle deviation, the two classes of metrics that matter most for force field quality.
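The source does not spell out how extrema are detected or matched. One possible reading, offered purely as an assumption, is sketched below: a scan point is a local minimum if it lies below both of its cyclic neighbors, and a QM minimum counts as matched if some MM minimum lies within a tolerance angle of it. The tolerance `tol_deg` and all function names are hypothetical.

```python
import numpy as np

def local_minima(energies):
    """Indices of strict local minima on a periodic scan.

    Assumes the scan covers the full 360 degrees without repeating the
    end point, so the first and last points are neighbors.
    """
    e = np.asarray(energies, dtype=float)
    return np.flatnonzero((e < np.roll(e, 1)) & (e < np.roll(e, -1)))

def angular_distance(a, b):
    """Smallest absolute difference between angles, in degrees."""
    d = np.abs(np.asarray(a, dtype=float) - np.asarray(b, dtype=float)) % 360.0
    return np.minimum(d, 360.0 - d)

def minima_match_rate(angles, e_qm, e_mm, tol_deg=30.0):
    """Fraction of QM minima that have an MM minimum within tol_deg."""
    angles = np.asarray(angles, dtype=float)
    qm_min = angles[local_minima(e_qm)]
    mm_min = angles[local_minima(e_mm)]
    if qm_min.size == 0:
        return float("nan")
    if mm_min.size == 0:
        return 0.0
    dist = angular_distance(qm_min[:, None], mm_min[None, :])
    return float(np.mean(dist.min(axis=1) <= tol_deg))

def global_minimum_matches(angles, e_qm, e_mm, tol_deg=30.0):
    """True if the global minima of the two surfaces lie within tol_deg."""
    angles = np.asarray(angles, dtype=float)
    a_qm = angles[int(np.argmin(e_qm))]
    a_mm = angles[int(np.argmin(e_mm))]
    return bool(angular_distance(a_qm, a_mm) <= tol_deg)
```

A strict comparison with neighbors misses flat plateaus, so a production version would likely need a more careful definition; the sketch is only meant to make the idea of a matching rate concrete.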
Sorting the potential energy surface correlation coefficients reveals that, for molecule 006-C12H12N2, the correlation coefficient R between the GAFF2 potential energy surface and the QM result is less than 0. The two surfaces are negatively correlated, so GAFF2 performs poorly here. Extracting the data for this surface and plotting the potential energy surface (Figure 2) and the molecular structure (Figure 3) shows that the dihedral angle, C3-N4-C11-C10 in this example, belongs to a secondary aromatic amine. The barrier on the potential energy surface is below 5 kcal/mol, so the rotational barrier of the dihedral angle is low, but GAFF2 places the extrema at the wrong positions. Users who need to run calculations on secondary aromatic amines can therefore choose another force field, and force field developers can consider optimizing the parameters for this class of molecules.
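The ranking step described above can be sketched with pandas and numpy. The column names, molecule labels, and energies below are invented for illustration; the sketch assumes R is the Pearson correlation between the QM and MM energies over the scan points of one surface.

```python
import numpy as np
import pandas as pd

def surface_correlations(table, qm_col="E_QM", mm_col="E_MM1", key="molecule"):
    """Pearson R between QM and MM energies per surface, sorted ascending."""
    rows = []
    for name, grp in table.groupby(key):
        r = np.corrcoef(grp[qm_col], grp[mm_col])[0, 1]
        rows.append({key: name, "R": float(r)})
    return pd.DataFrame(rows).sort_values("R").reset_index(drop=True)

# Made-up scans: mol_A tracks the QM curve, mol_B is anti-correlated with it
table = pd.DataFrame({
    "molecule": ["mol_A"] * 4 + ["mol_B"] * 4,
    "angle":    [0, 90, 180, 270] * 2,
    "E_QM":     [0.0, 2.0, 0.5, 2.2, 0.0, 3.0, 1.0, 3.1],
    "E_MM1":    [0.0, 1.5, 0.9, 2.0, 2.5, 0.2, 2.0, 0.0],
})

ranked = surface_correlations(table)
flagged = ranked[ranked["R"] < 0]   # surfaces whose shape the force field gets wrong
```

Sorting in ascending order puts the worst-described surfaces first, which makes it easy to pull out individual cases for plotting and inspection.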

Claims (3)

  1. A molecular force field quality control system, characterized in that it comprises four modules: a data acquisition module, an energy analysis module, a structure analysis module, and a data statistics module;
    the data acquisition module selects, according to the source of the test-set molecules, the source of the force field under test, and user preference, appropriate computational simulation tools to generate the reference quantum mechanical optimization data and the molecular mechanics optimization data for the force field under test, and organizes them into data tables with specified columns and molecular coordinate files, which serve as input data for the energy analysis module;
    the energy analysis module calculates the evaluation metrics in the metric set that relate to molecular conformational energy and the potential energy surface; the program reads the user-supplied molecule name, the atom indices of the dihedral angle constrained during optimization, the angle, and the conformational energy, computes the energy analysis metrics according to two test-method parameters, and generates a raw data table;
    the structure analysis module calculates the evaluation metrics in the metric set that relate to molecular structure; the program reads the user-supplied molecule name, the atom indices of the dihedral angle constrained during optimization, and the 3D coordinates, computes the structure analysis metrics, and generates a raw data table;
    the data statistics module comprises a set of statistics and plotting routines shared by the energy and structure analyses, and can read the raw data tables of the energy and structure analysis results, call the corresponding routines, compute summary statistics, print them, and draw statistical charts.
  2. The molecular force field quality control system according to claim 1, characterized in that the data acquisition module selects appropriate tools as the situation requires, and the remaining modules are implemented in Python using the numpy, scipy, pandas, matplotlib, and rdkit libraries, with the metric calculation, statistics, and plotting workflows integrated into a Jupyter notebook.
  3. A control method for the molecular force field quality control system according to claim 1 or 2, characterized in that it comprises the following steps:
    (1) preparation of high-accuracy quantum chemistry data, i.e., QM data: select the target test set and, using quantum chemistry software with a high-accuracy quantum chemical method, scan the potential energy surface of each rotatable dihedral angle of the molecules, typically at intervals of 15 or 30 degrees; the resulting energies E_QM and coordinates r_QM serve as the reference;
    (2) preparation of molecular mechanics data, i.e., MM data, for the force fields under test: taking the conformations from step (1) as input, perform a molecular mechanics optimization of at least 1000 steps with each of the one or more force fields under test; save the resulting energies E_MM1, E_MM2, … in tabular format, in units of kcal/mol, and save the structures in .mol or .sdf format;
    (3) energy shifting: take the data table as input and, for each rotatable dihedral angle, shift E_QM and E_MMi, i = 1, 2, …, so that the lowest point of E is 0; two shifting strategies are available for E_MMi, namely setting its lowest point to 0 or minimizing its variance with respect to E_QM;
    (4) the energy analysis module computes, in turn, the single-conformation energy metrics and the potential energy surface metrics, and individual potential energy surfaces can be extracted and plotted for inspection;
    (5) the structure analysis module computes, in turn, the RMSD and the bond length, bond angle, and dihedral angle metrics; force field developers can additionally supply a parameter-term data table to compute metrics per parameter term;
    (6) the data statistics module reads the results of steps (4) and (5), sets the colors and labels of the figures and tables, and outputs summary statistics and reports.
PCT/CN2020/126782 2020-11-05 2020-11-05 Molecular force field quality control system and control method therefor WO2022094873A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2020/126782 WO2022094873A1 (en) 2020-11-05 2020-11-05 Molecular force field quality control system and control method therefor

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2020/126782 WO2022094873A1 (en) 2020-11-05 2020-11-05 Molecular force field quality control system and control method therefor

Publications (1)

Publication Number Publication Date
WO2022094873A1 true WO2022094873A1 (en) 2022-05-12

Family

ID=81458536

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/126782 WO2022094873A1 (en) 2020-11-05 2020-11-05 Molecular force field quality control system and control method therefor

Country Status (1)

Country Link
WO (1) WO2022094873A1 (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120154440A1 (en) * 2010-11-11 2012-06-21 Openeye Scientific Software, Inc. Augmented 2d representation of molecular structures
CN103020413A (en) * 2011-09-28 2013-04-03 中国石油化工股份有限公司 Method for calculating activation energy and reaction rate constant in arene hydrogenation reaction by computer
CN106355025A (en) * 2016-09-06 2017-01-25 北京理工大学 Allele competing reaction QM/MM method in living system
CN108595906A (en) * 2018-04-03 2018-09-28 清华大学 The calculating design system and method for luminous organic material
CN108763852A (en) * 2018-05-09 2018-11-06 深圳晶泰科技有限公司 The automation conformational analysis method of class medicine organic molecule
CN110767267A (en) * 2019-09-30 2020-02-07 华中科技大学 Python-based method for processing ReaxFF force field calculation result data
CN111415710A (en) * 2020-03-06 2020-07-14 深圳晶泰科技有限公司 Potential energy surface scanning method and system for molecular conformation space analysis

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115527626A (en) * 2022-08-16 2022-12-27 腾讯科技(深圳)有限公司 Molecular processing method, molecular processing apparatus, electronic device, storage medium, and program product
CN115527626B (en) * 2022-08-16 2023-04-25 腾讯科技(深圳)有限公司 Molecular processing method, molecular processing device, electronic apparatus, storage medium, and program product
CN117423394A (en) * 2023-10-19 2024-01-19 中北大学 ReaxFF post-treatment method based on Python extraction product, cluster and chemical bond information
CN117423394B (en) * 2023-10-19 2024-05-03 中北大学 ReaxFF post-treatment method based on Python extraction product, cluster and chemical bond information

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20960334

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC

122 Ep: pct application non-entry in european phase

Ref document number: 20960334

Country of ref document: EP

Kind code of ref document: A1