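WO2021031545A1 - Free energy perturbation computation scheduling method used in heterogeneous cluster environment - Google Patents

Free energy perturbation computation scheduling method used in heterogeneous cluster environment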

Info

Publication number
WO2021031545A1
Authority
WO
WIPO (PCT)
Prior art keywords
calculation
file
free energy
energy
generate
Prior art date
Application number
PCT/CN2020/076599
Other languages
English (en)
French (fr)
Inventor
刘增辉
何咪
杨明俊
赖力鹏
马健
温书豪
Original Assignee
深圳晶泰科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳晶泰科技有限公司
Priority to PCT/CN2020/076599 priority Critical patent/WO2021031545A1/zh
Priority to US17/266,115 priority patent/US20220115094A1/en
Publication of WO2021031545A1 publication Critical patent/WO2021031545A1/zh

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16C COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
    • G16C20/00 Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
    • G16C20/50 Molecular design, e.g. of drugs
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16C COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
    • G16C10/00 Computational theoretical chemistry, i.e. ICT specially adapted for theoretical aspects of quantum chemistry, molecular mechanics, molecular dynamics or the like
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B15/00 ICT specially adapted for analysing two-dimensional or three-dimensional molecular structures, e.g. structural or functional relations or structure alignment
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16C COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
    • G16C20/00 Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
    • G16C20/90 Programming languages; Computing architectures; Database systems; Data warehousing

Definitions

  • the invention belongs to the technical field of high-performance computing and drug design, and specifically relates to a free energy perturbation computation scheduling method used in a heterogeneous cluster environment: applied to clusters with a heterogeneous (cpu+gpu) architecture, it provides efficient resource utilization and micro-scheduling for free energy perturbation computation services.
  • free energy perturbation (FEP) calculation is a high-precision method for evaluating the binding strength between small drug molecules and targets. It can effectively remove false-positive molecules, raise the design success rate, and accelerate the development of new drugs.
  • the FEP workflow, built on enhanced sampling algorithms, a high-precision molecular force field (Xforce) and rigorous statistical analysis, needs to accurately compute the binding free energies of hundreds of small-molecule drug candidates to their targets within a short time.
  • In dozens of test systems and a series of real projects, the prediction error (mean unsigned error) of XFEP for functional group replacement and scaffold hopping calculations is below 1.0 kcal/mol, and the predicted values correlate significantly with experimental data.
  • This high-precision method requires a large amount of computation, and industrial applications impose timeliness requirements on the computing system, so computation speed and accuracy must be improved as much as possible.
  • the present invention provides a free energy perturbation computation scheduling method used in a heterogeneous cluster environment, which makes full use of gpu+cpu heterogeneous cluster resources to carry out large batches of free energy perturbation calculations, in particular when using replica exchange based on solvent temperature (a replica exchange method which assumes that, when the temperature rises, the solvent molecules move while the solute molecules are locked).
  • Free energy perturbation is a common method for calculating free energy. Taking the canonical ensemble as an example, the free energy change from state A to state B can be calculated by the following formula:
  • $$\Delta F(A \to B) = F_B - F_A = -k_B T \ln \left\langle \exp\left( -\frac{H_B - H_A}{k_B T} \right) \right\rangle_A$$
  • T is the temperature
  • H_A and H_B are the Hamiltonians of state A and state B respectively
  • k_B is the Boltzmann constant
  • ⟨·⟩_A denotes the ensemble average taken in the ensemble of state A.
  • the Rest2 method is Hamiltonian-based replica exchange dynamics. It enhances sampling across neighboring replicas and allows the degrees of freedom of interest to be selected;
  • the biasing strength is related to exp(-V/kT), which speeds up the free energy calculation.
  • a free energy perturbation computation scheduling method used in a heterogeneous cluster environment comprises the following steps:
  • the method is applied to the cluster scheduler of a heterogeneous cluster that comprises many nodes, and the method includes:
  • the computational characteristics of a task include: running npt ensemble equilibration simulations, running remd parallel calculations, whether GPU support is needed, single-point energy calculation of structures, parsing of large numbers of small files, and so on;
  • the characteristics of a computing node include: the number of idle CPUs on the node, its memory usage, and its gpu utilization;
  • Step A: In the top-level directory, use a recursive algorithm to scan the nested folders for the already constructed structures and files (molecules, proteins, etc.). Use amber's characteristic input file (for example min.in) to determine the paths in which molecular dynamics calculations must be run, and save these paths. According to the current environment, write the environment variables needed at runtime into the calculation launch script (named run.sh). In each path the amber program is run to carry out npt ensemble dynamics and obtain the equilibrium structure of the system. The degree of parallelism is determined by the number of idle CPUs on the computing node, and the calculations are queued according to the path list. If the allocated node contains gpu cards, the number of tasks run simultaneously is determined by the number of idle gpu cards, and each task is randomly assigned to one of the gpu cards.
  • Step B: After the above step has finished, copy the task execution scripts into the directory, including the script used by the slurm scheduling system, the generator program used to construct the remd input, and the data processing scripts.
  • the task needs to be allocated to a computing node with 4 or 6 GPU cards.
  • all replicas have the same temperature, but each replica corresponds to a scaling factor (lambda, with a value between 0 and 1)
  • each lambda affects the bonded and non-bonded interactions of its replica
  • each replica runs a preset number of molecular dynamics steps
  • according to the metropolis criterion, the configurations of adjacent replicas can be exchanged, completing the sampling of phase space. This step has a long running time, generally several hours.
  • Step C: Generate a new slurm scheduling script and process the data in the cpu queue: generate the cpptraj input files, use the modified cpptraj program to parse the mdcrd trajectory files, and create a new process to parse the trajectory files in each folder, so that parsing runs as multiple processes.
  • after parsing, a large number of trajectory files in crd format are obtained, which are combined with each prmtop file to generate new amber calculation input files.
  • the number of files is the number of images multiplied by the number of pre-extracted trajectories, and these files form a new calculation queue for the single-point energy calculations of molecular dynamics.
  • the maximum number of simultaneous calculation tasks is determined by the number of idle cpu cores.
  • when a task finishes, a new task is taken in order from the queue and assigned to a new process, making full use of the multi-core environment. Remove the handle file from the previous step;
  • Step D: Use regular expressions to extract the energy values from the log files and write them to the corresponding energy-*.dat files, clean up redundant files, and substitute the data into the free energy formula to complete the calculation.
  • the computational load of step B is very large and must be carried out on the gpu, and a series of compile-level optimizations have been made to the amber program according to the cpu/gpu models of the cluster.
  • by adopting the above technical solution, the present invention has the following advantages. (1) Strong fault tolerance: the failure of a single task does not affect other tasks. In the previous workflow, an error at any point during a run could easily cause the entire calculation to fail. Although this could be repaired, it brought extra manual work or further code refactoring.
  • (2) The required results are obtained through the subsequent data processing module. If calculation queues are not distinguished at run time, more machine time is consumed.
  • (3) Reasonable disk usage: when processing large numbers of small files, intermediate files are cleared during parsing, which lowers peak disk usage so that the result parsing of multiple molecules can run simultaneously on a limited number of nodes.
  • flexible calculation configuration: a non-specialist only needs to write a simple configuration document, from which the runtime scripts are generated. All steps of the workflow can then be executed automatically, or selected steps can be run on their own, so the calculations are well decoupled.
  • a high-speed trajectory parsing module: on a multi-core machine, tens of thousands of trajectories can be extracted within a few minutes to generate the new molecular dynamics input files (used by amber), whereas this part previously could take 5 to 6 hours.
  • computing resources are fully utilized and disk occupancy is reduced; in particular, when multiple tasks run at the same time, disk usage is reasonable, peak disk usage is significantly lowered, and the overall cost-effectiveness is high.
  • Figure 1 shows the rest2 free energy calculation framework for a single molecule in the embodiment.
  • a free energy perturbation computation scheduling method used in a heterogeneous cluster environment, whose calculation steps are specifically:
  • Step (1): Configure the computing environment file (config.lsp) in key-value form; the parameters include the paths of the binary programs, the file directories, the queue characteristic parameters, and so on;
  • Step (2): Before calculating a batch of molecules, create a calculation handle file as needed in the directories to be calculated, such as the charge and vdw directories, for example named single-run-unit; the scheduler controls the subsequent calculation steps according to whether this file exists.
  • files in the calculation directory are named by lambda value, that is, as numeric strings, and are sorted by their numeric value so that their order is consistent with the other input templates.
  • Step (3): After preparation, the scheduler recursively scans the folders under the current directory, generates a run.sh file that calls the amber program in each subdirectory containing min.in and the template files, then collects the paths of these scripts and runs the run.sh under each path.
  • This step performs npt ensemble simulation of the structures to obtain the equilibrium structure at each lambda value. Each run.sh runs only briefly, so this step does not take much time.
  • Step (4): After the script finds the single-run-unit directory, it copies the task calculation templates into that directory, generates the group.dat file and the run script run_gpu.sh from the files in the current directory according to predefined rules, and executes run_gpu.sh to carry out the long remd calculation on a gpu node, finally obtaining the trajectory file mdcrd.
  • here one remd task uses 6 GPU cards for parallel calculation.
  • Step (5): After the gpu job has finished, the trajectory files need to be parsed.
  • data processing continues in the corresponding directories: first the input files required by the cpptraj program (a customized version, used for inpcrd parsing) are generated and all coordinate files are parsed out; at the same time the amber input files are generated in preparation for computing the single-point energy of each structure.
  • Step (6): After the above steps, tens of thousands of small files will have been generated for amber's single-point energy calculations.
  • serial calculation is time-consuming, so we use the classic producer-consumer model to queue and schedule the calculation tasks and make full use of the CPU multi-core resources within the node.
  • the calculation and parsing tasks generated by each molecule are gathered into a single data processing job, which simplifies use of the cluster scheduling system and later debugging.
  • Step (7): Use regular expressions to extract the required data from the log files and write them to the corresponding energy-*.dat files, deleting the intermediate files at the same time to reduce disk usage. After parsing, check data integrity and save and clean up the files, which ends the calculation workflow.
  • the script automatically generates the input files needed for the calculation, performs the gpu calculation, and starts the subsequent parsing and calculation process when it finishes.
  • Figure 1 also covers the recalculation and energy parsing part, which runs in the cpu queue. Because the recalculation process involves many independent tasks and generates a large number of intermediate files, the conventional producer-consumer parallel computing model is used; while the log files are parsed, earlier intermediate files are deleted to reduce disk usage. The pmemd.mpi.CUDA program is run.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computing Systems (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Chemical & Material Sciences (AREA)
  • Crystallography & Structural Chemistry (AREA)
  • Medicinal Chemistry (AREA)
  • Pharmacology & Pharmacy (AREA)
  • Biotechnology (AREA)
  • Evolutionary Biology (AREA)
  • Medical Informatics (AREA)
  • Biophysics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

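The present invention provides a free energy perturbation computation scheduling method used in a heterogeneous cluster environment, comprising the following steps. Step A: starting from pre-constructed structures (molecules, proteins, etc.) and input files, first run an npt ensemble dynamics simulation to obtain the equilibrium structure. Step B: run a Hamiltonian-based replica exchange dynamics calculation to obtain sufficient trajectory data. Step C: parse the trajectory files, combine them with the various prmtop files to generate new amber calculation inputs, and compute the single-point energy of each combined conformation. Step D: use regular expressions to extract the energy values from the log files while cleaning up intermediate temporary files, completing the calculation for a single molecule. The method is flexibly configured and can process the free energy calculations of multiple molecules at the same time; the user only needs to write a simple configuration document, from which runtime scripts are generated and executed automatically following the defined workflow, or selected steps can be run on their own, so the calculations are well decoupled.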

Description

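Free energy perturbation computation scheduling method used in heterogeneous cluster environment
Technical Field
The invention belongs to the technical field of high-performance computing and drug design, and specifically relates to a free energy perturbation computation scheduling method used in a heterogeneous cluster environment. Applied to clusters with a heterogeneous (cpu+gpu) architecture, it provides efficient resource utilization and micro-scheduling for free energy perturbation computation services.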
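Background
Successive generations of gpu hardware have given graphics cards powerful data-parallel computing capability. Combining cpus and gpus into a heterogeneous cluster readily yields high computing power and is especially suitable for compute-intensive applications, and more and more high-performance computing (HPC) users are migrating to GPU-based clusters to run their scientific and engineering applications. A heterogeneous computing environment allows the user's computing model to use CPUs and GPUs at the same time: the sequential part of an application runs on the CPU, and the compute-intensive part runs on the GPU. By exploiting the massive parallelism of the GPU, applications run much faster than in the traditional CPU-only mode.
Among the many drug design methods, free energy perturbation (FEP) calculation is a high-precision method for evaluating the binding strength between small drug molecules and targets. It can effectively remove false-positive molecules, raise the design success rate, and accelerate the development of new drugs. An FEP workflow built on enhanced sampling algorithms, a high-precision molecular force field (Xforce) and rigorous statistical analysis needs to accurately compute the binding free energies of hundreds of small-molecule drug candidates to their targets within a short time. In dozens of test systems and a series of real projects, the prediction error (mean unsigned error) of XFEP for functional group replacement and scaffold hopping calculations is below 1.0 kcal/mol, and the predicted values correlate significantly with experimental data. This high-precision method requires a large amount of computation, and industrial applications impose timeliness requirements on the computing system, so computation speed and accuracy must be improved as much as possible.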
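Summary of the Invention
To solve the above technical problems, the present invention provides a free energy perturbation computation scheduling method used in a heterogeneous cluster environment, which makes full use of gpu+cpu heterogeneous cluster resources to carry out large batches of free energy perturbation calculations, in particular when using replica exchange based on solvent temperature (a replica exchange method which assumes that, when the temperature rises, the solvent molecules move while the solute molecules are locked).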
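Free energy perturbation is a common method for calculating free energy. Taking the canonical ensemble as an example, the free energy change from state A to state B can be calculated by the following formula:
$$\Delta F(A \to B) = F_B - F_A = -k_B T \ln \left\langle \exp\left( -\frac{H_B - H_A}{k_B T} \right) \right\rangle_A$$
where T is the temperature, H_A and H_B are the Hamiltonians of state A and state B respectively, k_B is the Boltzmann constant, and ⟨·⟩_A denotes the ensemble average taken in the ensemble of state A. Put simply, to compute the free energy difference between state A and state B, the energy difference between the two states is sampled in the ensemble of state A and the ensemble average is then taken. Sampling can be simulated with molecular dynamics or Monte Carlo methods. Because accurate free energy calculation requires a large amount of sampling, the computational cost is high. To accelerate sampling in practice, a series of enhanced sampling methods have been developed, of which Replica Exchange with Solute Tempering (rest2) is one effective method. The Rest2 method is Hamiltonian-based replica exchange dynamics: it enhances sampling across neighboring replicas, allows the degrees of freedom of interest to be selected, and applies a biasing strength related to exp(-V/kT), thereby accelerating the free energy calculation.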
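As a small illustration of the exponential-averaging formula, the following Python sketch estimates the free energy difference from a list of sampled energy differences H_B - H_A collected in state A. The function name, the choice of kcal/mol units and the log-sum-exp stabilization are assumptions made for this example; they are not taken from the patent.

```python
import math

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol*K); unit choice is an assumption


def fep_delta_f(energy_diffs, temperature):
    """Estimate F_B - F_A from samples of H_B - H_A drawn in state A."""
    if not energy_diffs:
        raise ValueError("need at least one sample")
    beta = 1.0 / (K_B * temperature)
    exponents = [-beta * dh for dh in energy_diffs]
    # Shift by the largest exponent so that exp() cannot overflow (log-sum-exp).
    shift = max(exponents)
    mean_exp = sum(math.exp(e - shift) for e in exponents) / len(exponents)
    log_mean = shift + math.log(mean_exp)
    return -log_mean / beta
```

When every sample has the same energy difference, the estimate equals that difference. With scattered samples the estimate is never larger than their arithmetic mean, which is why low-energy samples dominate and extensive sampling is needed.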
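The specific steps are as follows:
A free energy perturbation computation scheduling method used in a heterogeneous cluster environment comprises the following steps.
The method is characterized in that it is applied to the cluster scheduler of a heterogeneous cluster, the heterogeneous cluster comprising many nodes, and the method includes:
Obtaining the computational characteristics of the tasks and the characteristic parameters of the computing nodes. The computational characteristics of a task include: running npt ensemble equilibration simulations, running remd parallel calculations, whether gpu support is needed, single-point energy calculation of structures, parsing of large numbers of small files, and so on. The characteristics of a computing node include: the number of idle cpus on the node, its memory usage, and its gpu utilization;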
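Step A: In the top-level directory, use a recursive algorithm to scan the nested folders for the already constructed structures and files (molecules, proteins, etc.). Use amber's characteristic input file (for example min.in) to determine the paths in which molecular dynamics calculations must be run, and save these paths. According to the current environment, write the environment variables needed at runtime into the calculation launch script (named run.sh). In each path the amber program is run to carry out npt ensemble dynamics and obtain the equilibrium structure of the system. The degree of parallelism is determined by the number of idle cpus on the computing node, and the calculations are queued according to the path list. If the allocated node contains gpu cards, the number of tasks run simultaneously is determined by the number of idle gpu cards, and each task is randomly assigned to one of the gpu cards.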
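The directory scan, script generation and random gpu assignment of Step A could be sketched as follows. This is only an illustration under stated assumptions: the helper names are hypothetical, the command written into run.sh is supplied by the caller as a placeholder (the patent does not disclose the actual amber invocation), and the list of idle gpu ids is assumed to come from elsewhere.

```python
import os
import random


def find_md_dirs(root, marker="min.in"):
    """Return, sorted, every directory under root that contains the marker file."""
    return sorted(d for d, _subdirs, files in os.walk(root) if marker in files)


def write_run_script(directory, env, command):
    """Write run.sh that exports the given environment, then runs the command."""
    path = os.path.join(directory, "run.sh")
    lines = ["#!/bin/sh"]
    lines += [f'export {key}="{value}"' for key, value in sorted(env.items())]
    lines.append(command)  # placeholder for the real amber call
    with open(path, "w") as handle:
        handle.write("\n".join(lines) + "\n")
    os.chmod(path, 0o755)
    return path


def assign_gpus(paths, free_gpu_ids, rng=random):
    """Randomly map each task path to one of the idle gpu ids."""
    return {path: rng.choice(free_gpu_ids) for path in paths}
```

A caller would typically pass the chosen gpu id to the task through its environment, for example by adding `CUDA_VISIBLE_DEVICES` to the `env` dictionary before writing run.sh.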
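Step B: After the above step has finished, copy the task execution scripts into the directory, including the script used by the slurm scheduling system, the generator program used to construct the remd input, and the data processing scripts. Run the scheduling module; the task needs to be allocated to a computing node with 4 or 6 gpu cards. According to the total number of lambdas, generate the corresponding group file, call the parallel version of the amber program, and run the Hamiltonian-based replica exchange dynamics calculation. All replicas have the same temperature, but each replica corresponds to a scaling factor (lambda, with a value between 0 and 1), and each lambda affects the bonded and non-bonded interactions of its replica. Each replica runs a preset number of molecular dynamics steps; according to the metropolis criterion, the configurations of adjacent replicas can be exchanged, completing the sampling of phase space. This step has a long running time, generally several hours. Afterwards, copy the trajectory files into the folders of their corresponding lambda values, generate the scripts for the subsequent data processing, and finally generate a handle file that marks the calculation state.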
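The patent does not spell out the exchange test, so the sketch below uses the textbook Metropolis criterion for Hamiltonian replica exchange at a single shared temperature: a swap between neighboring replicas i and j is accepted with probability min(1, exp(-beta * delta)), where delta compares the energies after and before the swap. The function and argument names are hypothetical, and in the described workflow the exchange itself is performed inside the amber program rather than in a script.

```python
import math
import random

K_B = 0.0019872041  # kcal/(mol*K); unit choice is an assumption


def exchange_probability(e_ii, e_jj, e_ij, e_ji, temperature):
    """Acceptance probability for swapping the configurations of replicas i and j.

    e_ab is the energy of replica b's configuration evaluated with the
    Hamiltonian (lambda value) of replica a.
    """
    beta = 1.0 / (K_B * temperature)
    delta = beta * ((e_ij + e_ji) - (e_ii + e_jj))
    return 1.0 if delta <= 0.0 else math.exp(-delta)


def attempt_exchange(e_ii, e_jj, e_ij, e_ji, temperature, rng=random.random):
    """Return True if the swap is accepted for one random draw."""
    return rng() < exchange_probability(e_ii, e_jj, e_ij, e_ji, temperature)
```

A swap that lowers the combined energy is always accepted; one that raises it is accepted only occasionally, which keeps each replica sampling its own lambda ensemble correctly.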
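Step C: Generate a new slurm scheduling script and process the data in the cpu queue: generate the cpptraj input files, use the modified cpptraj program to parse the mdcrd trajectory files, and create a new process to parse the trajectory files in each folder, so that parsing runs as multiple processes. After parsing, a large number of trajectory files in crd format are obtained; these are combined with each prmtop file to generate new amber calculation input files, the number of files being the number of images multiplied by the number of pre-extracted trajectories. These files form a new calculation queue for the single-point energy calculations of molecular dynamics. The maximum number of simultaneous calculation tasks is determined by the number of idle cpu cores; when a task finishes, a new task is taken in order from the queue and assigned to a new process, making full use of the multi-core environment. Remove the handle file from the previous step;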
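Step D: Use regular expressions to extract the energy values from the log files and write them to the corresponding energy-*.dat files, clean up redundant files, and substitute the data into the free energy formula to complete the calculation.
The computational load of step B is very large and must be carried out on the gpu, and a series of compile-level optimizations have been made to the amber program according to the cpu/gpu models of the cluster.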
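By adopting the above technical solution, the present invention has the following advantages. (1) Strong fault tolerance: the failure of a single task does not affect other tasks. In the previous workflow, an error at any point during a run could easily cause the entire calculation to fail. Although this could be repaired, it brought extra manual work or further code refactoring. Moreover, in practice, because of the instability of the cluster system, occasional unknown errors (for example insufficient disk space or io exceptions) often caused the whole workflow to fail, which was a limiting factor. (2) Parallel scheduling of calculations, maximizing the use of the gpu and cpu cluster: the original workflow was close to serial and could not conveniently use the cluster's multi-core environment, which wasted cores and time, and the trajectory files were parsed with a third-party python library that was very slow. The equilibration part of a molecular dynamics program often requires long simulation times (more than ten nanoseconds); it is accelerated on the gpu, and after it finishes the required results are obtained through the subsequent data processing module. If calculation queues are not distinguished at run time, more machine time is consumed. (3) Reasonable disk usage: when processing large numbers of small files, intermediate files are cleared during parsing, which lowers peak disk usage so that the result parsing of multiple molecules can run simultaneously on a limited number of nodes.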
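The present invention brings the following effects:
(1) Flexible calculation configuration: a non-specialist only needs to write a simple configuration document, from which the runtime scripts are generated; all steps of the workflow can then be executed automatically, or selected steps can be run on their own, so the calculations are well decoupled.
(2) The molecular dynamics equilibration calculations make full use of multiple gpu cards in a single machine; large batches of data analysis are carried out on the cpu side and make full use of the multi-core architecture.
(3) Using statistical data and the characteristics of the lisp language itself, a small amount of code handles the situation in which individual runtime errors would cause the overall workflow to fail.
(4) A high-speed trajectory parsing module: on a multi-core machine, tens of thousands of trajectories can be extracted within a few minutes to generate the new molecular dynamics input files (used by amber), whereas this part previously could take 5 to 6 hours.
(5) Computing resources are fully utilized and disk occupancy is reduced; in particular, when multiple tasks run at the same time, disk usage is reasonable, peak disk usage is significantly lowered, and the overall cost-effectiveness is high.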
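Brief Description of the Drawings
Figure 1 shows the rest2 free energy calculation framework for a single molecule in the embodiment.
Detailed Description
A preferred embodiment of the present invention is described in further detail below with reference to the accompanying drawing:
A free energy perturbation computation scheduling method used in a heterogeneous cluster environment, whose calculation steps are specifically: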
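Step (1): Configure the computing environment file (config.lsp) in key-value form; the parameters include the paths of the binary programs, the file directories, the queue characteristic parameters, and so on;
Step (2): Before calculating a batch of molecules, create a calculation handle file as needed in the directories to be calculated, such as the charge and vdw directories, for example named single-run-unit; the scheduler controls the subsequent calculation steps according to whether this file exists. Files in the calculation directory are named by lambda value, that is, as numeric strings, and are sorted by their numeric value so that their order is consistent with the other input templates.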
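The handle-file check and the numeric ordering of lambda-named entries in step (2) might look like the sketch below. It assumes, purely for illustration, that each lambda value is a subdirectory name; the function names are hypothetical, and `single-run-unit` is the example handle name given in the text.

```python
import os

HANDLE_NAME = "single-run-unit"  # example handle name from step (2)


def lambda_dirs(parent):
    """Subdirectories whose names parse as numbers, ordered by numeric value."""
    found = []
    for name in os.listdir(parent):
        if not os.path.isdir(os.path.join(parent, name)):
            continue
        try:
            found.append((float(name), name))
        except ValueError:
            continue  # skip templates and other non-numeric entries
    return [name for _value, name in sorted(found)]


def should_run(directory):
    """The scheduler proceeds only where the handle exists."""
    return os.path.exists(os.path.join(directory, HANDLE_NAME))
```

Sorting by the parsed number rather than by the raw string keeps the replica order stable even when names differ in padding, and it guards against purely lexical ordering of the strings.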
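Step (3): After preparation, the scheduler recursively scans the folders under the current directory, generates a run.sh file that calls the amber program in each subdirectory containing min.in and the template files, then collects the paths of these scripts and runs the run.sh under each path. This step performs npt ensemble simulation of the structures to obtain the equilibrium structure at each lambda value. Each run.sh runs only briefly, so this step does not take much time.
Step (4): After the script finds the single-run-unit directory, it copies the task calculation templates into that directory, generates the group.dat file and the run script run_gpu.sh from the files in the current directory according to predefined rules, and executes run_gpu.sh to carry out the long remd calculation on a gpu node, finally obtaining the trajectory file mdcrd. Here one remd task uses 6 gpu cards for parallel calculation.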
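The "predefined rules" for building group.dat are not disclosed. As a rough sketch, an amber group file for a multi-replica run conventionally holds one line of command-line arguments per replica, so a generator could look like the following. The per-replica file names (remd.in, remd.out, system.prmtop, equil.rst7, remd.rst7) are hypothetical placeholders; only the flag letters follow amber's usual command-line conventions, and the layout should be checked against the Amber manual before use.

```python
def group_file_lines(replica_dirs, prmtop="system.prmtop"):
    """Build one argument line per replica directory (file names are placeholders)."""
    lines = []
    for d in replica_dirs:
        lines.append(
            f"-O -i {d}/remd.in -o {d}/remd.out -p {d}/{prmtop} "
            f"-c {d}/equil.rst7 -r {d}/remd.rst7 -x {d}/mdcrd"
        )
    return lines


def write_group_file(path, replica_dirs, prmtop="system.prmtop"):
    """Write the group file and return the number of replicas listed in it."""
    lines = group_file_lines(replica_dirs, prmtop)
    with open(path, "w") as handle:
        handle.write("\n".join(lines) + "\n")
    return len(lines)
```

Passing the directories in numeric lambda order keeps neighboring lines in the group file adjacent in lambda, which is what exchanges between adjacent replicas rely on.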
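Step (5): After the gpu job has finished, the trajectory files need to be parsed, and data processing continues in the corresponding directories: first the input files required by the cpptraj program (a customized version, used for inpcrd parsing) are generated and all coordinate files are parsed out; at the same time the amber input files are generated in preparation for computing the single-point energy of each structure.
Step (6): After the above steps, tens of thousands of small files will have been generated for amber's single-point energy calculations. Serial calculation is time-consuming, so we use the classic producer-consumer model to queue and schedule the calculation tasks and make full use of the cpu multi-core resources within the node. In addition, the calculation and parsing tasks generated by each molecule are gathered into a single data processing job, which simplifies use of the cluster scheduling system and later debugging.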
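A minimal producer-consumer sketch of the queueing in step (6) is shown below, using Python's standard `queue` and `threading` modules. The patent assigns each task to a new process; here worker threads stand in for that, on the assumption that `worker_fn` would itself launch the external amber calculation (for example with `subprocess.run`). The function name `run_queue` and the result and failure dictionaries are illustrative, not part of the described system.

```python
import queue
import threading


def run_queue(tasks, worker_fn, max_workers):
    """Run worker_fn over tasks with at most max_workers concurrent consumers."""
    pending = queue.Queue()
    results, failures = {}, {}
    lock = threading.Lock()

    def consume():
        while True:
            task = pending.get()
            if task is None:  # sentinel: no more work for this consumer
                return
            try:
                value = worker_fn(task)
                with lock:
                    results[task] = value
            except Exception as exc:  # a failed task must not stop the others
                with lock:
                    failures[task] = exc

    workers = [threading.Thread(target=consume) for _ in range(max_workers)]
    for worker in workers:
        worker.start()
    for task in tasks:  # producer side: enqueue in order
        pending.put(task)
    for _ in workers:  # one sentinel per consumer
        pending.put(None)
    for worker in workers:
        worker.join()
    return results, failures
```

In practice `max_workers` would be derived from the number of idle cores, for instance starting from `os.cpu_count()`. Recording failures per task instead of raising mirrors the fault-tolerance goal stated earlier: one bad input does not abort the whole batch.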
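Step (7): Use regular expressions to extract the required data from the log files and write them to the corresponding energy-*.dat files, deleting the intermediate files at the same time to reduce disk usage. After parsing, check data integrity and save and clean up the files, which ends the calculation workflow.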
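The extract-then-delete pattern of step (7) can be sketched as follows. The exact log layout and regular expression used by the authors are not given; this example assumes, as a labeled guess, that each log contains potential-energy entries of the form `EPtot = <number>`, and the function names are hypothetical.

```python
import os
import re

# Assumed log format: "EPtot = <number>" entries; adjust to the real logs.
ENERGY_RE = re.compile(r"EPtot\s*=\s*(-?\d+(?:\.\d+)?)")


def extract_energies(log_text):
    """Return every matched energy value in the log text as a float."""
    return [float(value) for value in ENERGY_RE.findall(log_text)]


def process_log(log_path, dat_path):
    """Append the energies from one log to a .dat file, then delete the log."""
    with open(log_path) as handle:
        energies = extract_energies(handle.read())
    if not energies:
        raise ValueError(f"no energy found in {log_path}")
    with open(dat_path, "a") as out:
        for energy in energies:
            out.write(f"{energy:.4f}\n")
    os.remove(log_path)  # delete only after the data has been written
    return energies
```

Raising on a log without any match gives a simple integrity check, and removing each log right after its values are saved is what keeps peak disk usage low while tens of thousands of small files are processed.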
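As shown in Figure 1, the remd calculation: the script automatically generates the input files needed for the calculation, performs the gpu calculation, and starts the subsequent parsing and calculation process when it finishes.
As shown in Figure 1, the recalculation and energy parsing part: this part runs in the cpu queue. Because the recalculation process involves many independent tasks and generates a large number of intermediate files, the conventional producer-consumer parallel computing model is used; while the log files are parsed, earlier intermediate files are deleted to reduce disk usage. The pmemd.mpi.CUDA program is run.
The above is a further detailed description of the present invention in connection with specific preferred embodiments, and the specific implementation of the present invention should not be regarded as limited to these descriptions. For those of ordinary skill in the technical field of the invention, simple deductions or substitutions made without departing from the concept of the invention shall all be regarded as falling within the scope of protection of the present invention.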

Claims (1)

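  1. A free energy perturbation computation scheduling method used in a heterogeneous cluster environment, characterized by comprising the following steps:
    Step A: starting from pre-constructed structures (molecules, proteins, etc.) and input files, first run an npt ensemble dynamics simulation to obtain the equilibrium structure;
    Step B: run a Hamiltonian-based replica exchange dynamics calculation; all replicas have the same temperature, but each replica corresponds to a scaling factor, lambda, with a value between 0 and 1; each lambda affects the bonded and non-bonded interactions of its replica; each replica runs a preset number of molecular dynamics steps; according to the metropolis criterion, the configurations of adjacent replicas can be exchanged, sampling phase space and completing the calculation;
    Step C: afterwards parse the trajectory files, combine them with the various prmtop files to generate new amber calculation inputs, and compute the single-point energy of each combined trajectory to obtain the new energies;
    Step D: use regular expressions to extract the energy values from the log files, clean up intermediate temporary files, and substitute the values into the formula defining the free energy to obtain the free energy value.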

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/CN2020/076599 WO2021031545A1 (zh) 2020-02-25 2020-02-25 Free energy perturbation computation scheduling method used in heterogeneous cluster environment
US17/266,115 US20220115094A1 (en) 2020-02-25 2020-02-25 Free energy perturbation computation scheduling method used in heterogeneous cluster environment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2020/076599 WO2021031545A1 (zh) 2020-02-25 2020-02-25 Free energy perturbation computation scheduling method used in heterogeneous cluster environment

Publications (1)

Publication Number Publication Date
WO2021031545A1 (zh) 2021-02-25

Family

ID=74660382

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/076599 WO2021031545A1 (zh) 2020-02-25 2020-02-25 Free energy perturbation computation scheduling method used in heterogeneous cluster environment

Country Status (2)

Country Link
US (1) US20220115094A1 (zh)
WO (1) WO2021031545A1 (zh)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117423394B (zh) * 2023-10-19 2024-05-03 中北大学 ReaxFF post-processing method for extracting product, cluster and chemical bond information based on Python

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190026423A1 (en) * 2013-03-15 2019-01-24 Schrödinger, Llc Cycle Closure Estimation of Relative Binding Affinities and Errors
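CN108519856A (zh) * 2018-03-02 2018-09-11 西北大学 Data block replica placement method in a heterogeneous Hadoop cluster environment
CN109859806A (zh) * 2019-01-17 2019-06-07 中山大学 Absolute free energy perturbation method for predicting drug-target binding strength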

Also Published As

Publication number Publication date
US20220115094A1 (en) 2022-04-14

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20855707

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC

122 Ep: pct application non-entry in european phase

Ref document number: 20855707

Country of ref document: EP

Kind code of ref document: A1