CN117292759A - Protein-ligand affinity evaluation method based on domestic super-computing platform - Google Patents


Info

Publication number
CN117292759A
CN117292759A (application CN202311092537.0A)
Authority
CN
China
Prior art keywords
model
data
deep learning
protein
computing platform
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311092537.0A
Other languages
Chinese (zh)
Inventor
陈溟
谭华
苏亮
杨帅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Qingdao Guoshi Technology Group Co ltd
Original Assignee
Qingdao Guoshi Technology Group Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Qingdao Guoshi Technology Group Co ltd
Priority to CN202311092537.0A
Publication of CN117292759A
Legal status: Pending

Classifications

    • GPHYSICS
    • G16INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16BBIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B50/00ICT programming tools or database systems specially adapted for bioinformatics
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/044Recurrent networks, e.g. Hopfield networks
    • G06N3/0442Recurrent networks, e.g. Hopfield networks characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G16INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16BBIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Medical Informatics (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Bioethics (AREA)
  • Databases & Information Systems (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Biotechnology (AREA)
  • Public Health (AREA)
  • Epidemiology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention provides a protein-ligand affinity evaluation method based on a domestic supercomputing platform, which comprises the following steps. Construction of the domestic supercomputing production environment: compiling the dependency libraries required to run the deep learning model and completing the configuration of the SWPyTorch framework. Design and implementation of the deep learning model: constructing a deep learning model based on a protein-ligand affinity evaluation data set, implementing the model with the PyTorch framework on an X86 platform and, after the implementation is complete, porting the deep learning model to the domestic supercomputing platform. Parallel optimization of the deep learning model: on the domestic supercomputing platform, optimizing the deep learning model in terms of data parallelism, computation parallelism, communication parallelism, operator library optimization and SWPyTorch multi-node parallelism. Job submission and execution: configuring the computing node resources of the domestic supercomputing platform, activating its dependency environment and submitting the job for execution.

Description

Protein-ligand affinity evaluation method based on domestic super-computing platform
Technical Field
The invention relates to the technical field of high-performance computing, in particular to a protein-ligand affinity evaluation method based on a domestic super-computing platform.
Background
The supercomputer is an important indicator of a nation's level of scientific and technological development and overall national strength. It is characterized by powerful functionality, high computing speed and large storage capacity, supports the computation of large-scale, complex application problems, and can be widely applied in the medical and high-technology fields. In recent years, domestic supercomputers represented by the Sunway TaihuLight have developed rapidly; the Sunway TaihuLight was the first supercomputer in the world whose performance exceeded 100 PFLOPS.
The supercomputer plays an important role in the medical field, particularly in drug virtual screening. Protein-ligand affinity evaluation is a key stage of drug virtual screening and drug development, and accurate, efficient protein-ligand affinity evaluation can greatly reduce drug development cycles and costs. Existing supercomputer-based protein-ligand affinity evaluation methods rely on the Intel X86 instruction-set architecture in hardware and, in software, adopt molecular docking software developed by European universities, research institutions or companies.
Therefore, there is a need to develop a protein-ligand affinity evaluation method based on a domestic super computing platform.
Disclosure of Invention
The invention provides a protein-ligand affinity evaluation method based on a domestic supercomputing platform, aiming to solve the problem that existing supercomputer-based protein-ligand affinity evaluation methods cannot support large-scale protein-ligand affinity evaluation. It builds a production environment for the protein-ligand affinity evaluation method on a domestic supercomputing platform, thereby realizing the design, porting and optimization of a deep-learning-based protein-ligand affinity evaluation model on that platform.
In order to achieve the above object, the present invention provides a protein-ligand affinity evaluation method based on a domestic super computing platform, comprising:
the construction step of the domestic supercomputing production environment comprises: compiling the dependency libraries required to run the deep learning model and completing the configuration of the SWPyTorch framework;
the design and implementation step of the deep learning model comprises: constructing a deep learning model based on a protein-ligand affinity evaluation data set, implementing the model with the PyTorch framework on an X86 platform and, after the implementation is complete, porting the deep learning model to the domestic supercomputing platform;
the parallel optimization step of the deep learning model comprises: based on the domestic supercomputing platform, optimizing the deep learning model in terms of data parallelism, computation parallelism, communication parallelism, operator library optimization and SWPyTorch multi-node parallelism;
the job submission and execution step comprises: configuring the computing node resources of the domestic supercomputing platform, activating the dependency environment of the domestic supercomputing platform and submitting the job for execution.
Further, the deep learning model adopts a model based on a Bi-LSTM neural network with a fused attention mechanism.
Further, the large-scale protein-ligand affinity evaluation data set comprises 34.2 million protein-ligand interaction records, mainly comprising protein-compound interaction data and protein-protein interaction data, and also comprises structural sequence data of proteins and compounds, wherein the affinity-related features of the structural sequence data include pharmacophores, molecular scaffolds, hydrophobic groups, water solubility and lipid solubility.
Further, the model structure based on the Bi-LSTM neural network with the fused attention mechanism comprises:
after inputting structural sequence data of proteins and compounds, word2vec vectorization processing is carried out to obtain word embedding vectors;
inputting the word embedded vector into a convolution layer for multiple convolution operations to obtain a feature sequence;
inputting the characteristic sequence into a Bi-LSTM neural network structure, and taking the characteristic information output by the Bi-LSTM neural network structure as the input of the attention mechanism layer;
the characteristic information is processed by the attention mechanism layer, and the weights of the characteristic information are adjusted to obtain text representations covering the specific structural features of the molecules;
integrating the text representations to obtain comprehensive text representations, and taking the comprehensive text representations as the input of the Softmax classifier;
after the affinity between the protein and the ligand is classified by the Softmax classifier, the affinity evaluation result between the protein and the ligand is obtained.
Further, the dependency libraries comprise 13 Python libraries: torch, torchtext, sklearn, django, fire, os, numpy, random, json, jieba, collections, sys and opt.
Further, the building step of the domestic super computing production environment further comprises the steps of loading SWPyTorch and SWPython to a specified directory and configuring environment variables.
Further, the parallel optimization step of the deep learning model further comprises compiling the related main program, master-core program and slave-core programs in sequence through the sw9gcc compiler to finally generate an executable program, and the CPU is used as the compute device when the model is compiled.
Further, the parallel optimization step of the deep learning model specifically includes:
data parallelism, including setting the mini-batch size of each computing node to 64K;
computation parallelism, including master-slave core operation optimization and multi-node parallel optimization; the master-slave core operation optimization adds the -cgsp 64 option when the supercomputing cluster job is submitted so that the slave-core resources are invoked automatically;
the multi-node parallel optimization realizes distributed data parallelism of the model by calling algorithm libraries, and realizes distributed data-parallel operation across multiple nodes by adjusting the job submission command;
communication parallelism, namely optimizing the data transfer between the master core and the slave cores; the slave-core registers are used to complete the flow of data from memory to the computing units through accesses across the three-level REG-LDM-MEM memory hierarchy;
operator library optimization, which mainly adopts a multithreaded operation optimization method: multithreading and code blocks are added to the key matrix multiplication links, realizing optimization and acceleration of matrix multiplication and of the deep learning operator library;
SWPyTorch multi-node parallelism: first, the distributed data-parallel library in the SWPyTorch framework is loaded; model parallelism is then performed, i.e., the model is obtained and copied to multiple CPUs; data parallelism is then performed, comprising: obtaining the total number of CPUs in the distributed data parallelism, splitting the training set according to the total number of CPUs, distributing the training set data in parallel, distributing the test set data in parallel, loading the training set data in a distributed manner, and loading the test set data in a distributed manner; and finally the communication mode is set: the process group is initialized and the back-end communication mode is set to mpi.
Further, the job submitting and running steps specifically include:
the computing node resources comprise computing nodes, CPU core numbers and storage sizes;
activating the dependency environment of the domestic super computing platform comprises environment activation of SWPython and SWPyTorch and environment variable setting;
the job submission and execution is performed by submitting the job with the bsub command to a plurality of computing nodes to complete the distributed parallel training of the model, and model prediction is completed with the bsub command.
Further, the job submission and execution steps further include using the 64 slave cores of each node to improve the training performance of the distributed parallel training of the model.
Compared with the prior art, the invention has the following advantages and positive effects: the invention builds the production environment of the protein-ligand affinity evaluation method on a domestic supercomputing platform, so the core technology is not restricted by foreign manufacturers and carries fully independent intellectual property rights; the invention realizes the design, porting and optimization of a deep-learning-based protein-ligand affinity evaluation model on the domestic supercomputing platform, accelerates the training and prediction of the machine learning model, and improves the accuracy and efficiency of large-scale protein-ligand affinity evaluation.
Drawings
FIG. 1 is a schematic diagram of the steps of the method for porting a deep learning model based on a domestic supercomputing platform;
FIG. 2 is a structural diagram of the model based on a Bi-LSTM neural network with a fused attention mechanism according to the invention;
FIG. 3 is a directory structure of the invention after model implementation based on the PyTorch framework under the X86 platform;
FIG. 4 is a frame diagram of a computer device according to an embodiment of the invention;
in the above figures:
s1, building a domestic super-computing production environment; s2, designing and realizing a deep learning model; s3, parallel optimization of the deep learning model; s4, submitting and operating the operation; s11, compiling a dependency library; s12, configuring an SWPyTorch framework; s21, constructing a data set; s22, constructing a deep learning model; s23, realizing a model; s24, model transplanting; s31, data parallelism; s32, parallel calculation; s33, communication is parallel; s34, operator library optimization; s35, SWPyTorch multi-node parallelization; s41, configuring computing node resources; s42, activating a dependent environment; s43, submitting and operating the job; 81. a processor; 82. a memory; 83. a communication interface; 80. a bus.
Detailed Description
The following description of the embodiments of the present application will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are only some, but not all, of the embodiments of the present application. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present disclosure, are within the scope of the present disclosure.
In the description of the present application, it should be understood that the terms "center," "upper," "lower," "front," "rear," "left," "right," "vertical," "horizontal," "top," "bottom," "inner," "outer," and the like indicate orientations or positional relationships based on the orientation or positional relationships shown in the drawings, merely to facilitate description of the present application and simplify the description, and do not indicate or imply that the devices or elements referred to must have a specific orientation, be configured and operated in a specific orientation, and therefore should not be construed as limiting the present application. The terms "first," "second," and the like, are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defining "a first" or "a second" may explicitly or implicitly include one or more such feature. In the description of the present application, unless otherwise indicated, the meaning of "a plurality" is two or more.
In the description of the present application, it should be noted that, unless explicitly specified and limited otherwise, the terms "mounted," "connected," and "connected" are to be construed broadly, and may be either fixedly connected, detachably connected, or integrally connected, for example; can be mechanically or electrically connected; can be directly connected or indirectly connected through an intermediate medium, and can be communication between two elements. The specific meaning of the terms in this application will be understood by those of ordinary skill in the art in a specific context.
Embodiment one:
FIG. 1 is a schematic diagram of the steps of the method for porting a deep learning model based on a domestic supercomputing platform; as shown in FIG. 1, this embodiment discloses a specific implementation of such a method.
Specifically, the method disclosed in this embodiment mainly includes the following steps:
step S1: the construction of the domestic super-computing production environment comprises the following steps: compiling a dependency library required by the deep learning model operation, and completing the frame configuration of SWPyTorch;
specifically, the step S1 specifically includes the steps of:
step S11: the method comprises the steps of compiling dependency libraries, wherein 13 Python libraries are needed for realizing the deep learning model, the Python libraries and version information thereof are shown in the following table 1, and other Python libraries except os, random, collections, sys, opt and the like are needed to be installed and compiled. In addition, the version of Python is 3.5. Based on the homemade super calculation, the invention carries out homemade transplanting on frames such as Python, pyTorch and the like so as to meet the environmental requirements required by the model, the transplanted Python version is 3.5, and the PyTorr frame version is 1.5.
Dependency library Version number
torch ≥1.1.0
torchtext ≥0.4.0
sklearn ≥0.20.3
django ≥3.0.1
fire ≥0.2.1
os **
numpy ≥1.16.2
random **
json ≥2.0.9
jieba ≥0.39
collections **
sys **
opt **
TABLE 1
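As a convenience (not part of the claimed method), the environment can be checked against the version floors of Table 1 before the model is run. The following Python snippet is a minimal, illustrative sketch of such a check; the library names and thresholds are copied from Table 1, while the check itself is an assumption added for illustration.
# Illustrative environment check against the version floors of Table 1.
# Built-in modules (os, random, collections, sys) carry no version requirement.
import importlib

REQUIRED = {
    "torch": "1.1.0",
    "torchtext": "0.4.0",
    "sklearn": "0.20.3",
    "django": "3.0.1",
    "fire": "0.2.1",
    "numpy": "1.16.2",
    "json": "2.0.9",
    "jieba": "0.39",
}

def check_environment():
    for name, minimum in REQUIRED.items():
        try:
            module = importlib.import_module(name)
        except ImportError:
            print("missing: %s" % name)
            continue
        version = getattr(module, "__version__", "unknown")
        print("%s %s (required >= %s)" % (name, version, minimum))

if __name__ == "__main__":
    check_environment()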
Step S12: configuring the SWPyTorch framework. SWPyTorch is the ported version of PyTorch on the domestic supercomputer; PyTorch is a classical tensor library for multidimensional matrix operations that is widely used in machine learning and other mathematics-intensive applications. In theory, SWPyTorch supports all models commonly used on the X86 architecture; testing has confirmed that the models that converge mainly include image recognition models such as AlexNet, ResNet50, VGG16, VGG19 and LeNet; sequence models such as LSTM, GRU and Transformer; and models such as Faster R-CNN and YOLOv3. The SWPyTorch framework supports distributed data parallelism; after the deep learning model is ported to the domestic supercomputer, distributed data parallelism and model parallelism across multiple nodes can be realized by modifying the source code, improving the efficiency of model training and prediction.
The SWPyTorch framework used in the invention is compiled and ported from the PyTorch framework; the corresponding domestication work was completed in an earlier stage. During framework configuration, SWPyTorch and SWPython need to be loaded into the specified directory and the relevant environment variables configured, which completes the configuration of the SWPyTorch framework.
Step S2: design and implementation of the deep learning model, comprising: constructing a deep learning model based on a protein-ligand affinity evaluation data set, implementing the model with the PyTorch framework on an X86 platform and, after the implementation is complete, porting the deep learning model to the domestic supercomputing platform;
specifically, the step S2 specifically includes the steps of:
step S21: construction of data sets the present invention constructs a large scale protein-compound affinity evaluation data set (LPCBDataSet). In data volume, the LPCBDataSet contains 3420 thousands of protein-compound interaction data. Data components mainly cover: protein-compound interaction data, protein-protein interaction data. In the form of data, the information of proteins and compounds is structural sequence data, and certain specific sequence structures often contain active characteristics such as pharmacophores, molecular frameworks, hydrophobic groups, water solubility, fat solubility and the like, and the active characteristics often take key roles in molecular drug formation and drug molecular activity.
Step S22: constructing the deep learning model. Because the protein and compound information covered by the LPCBDataSet is structural sequence data, in some embodiments the deep learning model adopts an attention mechanism on top of a bidirectional multi-layer LSTM model, so that the activity-related features in the molecular structure sequences are better exploited and the performance of the deep learning model is improved.
As shown in FIG. 2, the deep learning model constructed by the invention is a model based on a Bi-LSTM neural network with a fused attention mechanism. The model structure is as follows: after the structural sequence data of the proteins and compounds are input, word2vec vectorization is performed to obtain word embedding vectors; the word embedding vectors are input into a convolution layer for multiple convolution operations to obtain feature sequences; the feature sequences are input into the Bi-LSTM neural network structure, and the feature information output by the Bi-LSTM is used as the input of the attention mechanism layer; the feature information is processed by the attention mechanism layer, and its weights are adjusted to obtain text representations covering the specific structural features of the molecules; the text representations are integrated into a comprehensive text representation, which is used as the input of the Softmax classifier; after the Softmax classifier classifies the affinity between protein and ligand, the protein-ligand affinity evaluation result is obtained.
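The following sketch shows one way the described architecture could be expressed with the PyTorch framework. It is a simplified reconstruction for illustration only: the layer sizes, the single shared input sequence and the exact attention formulation are assumptions, and FIG. 2 remains the authoritative description of the model.
import torch
import torch.nn as nn
import torch.nn.functional as F

class BiLSTMAttention(nn.Module):
    """Sketch of the Bi-LSTM + attention affinity classifier (illustrative only)."""

    def __init__(self, vocab_size, embed_dim=128, conv_channels=64,
                 hidden_dim=128, num_layers=2, num_classes=2):
        super().__init__()
        # Embedding table; in the described method the vectors come from word2vec.
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.conv = nn.Conv1d(embed_dim, conv_channels, kernel_size=3, padding=1)
        self.bilstm = nn.LSTM(conv_channels, hidden_dim, num_layers=num_layers,
                              batch_first=True, bidirectional=True)
        self.attention = nn.Linear(2 * hidden_dim, 1)    # scalar attention score per time step
        self.classifier = nn.Linear(2 * hidden_dim, num_classes)

    def forward(self, token_ids):
        x = self.embedding(token_ids)                    # (B, T, E) word embedding vectors
        x = F.relu(self.conv(x.transpose(1, 2)))         # convolution over the sequence -> (B, C, T)
        x = x.transpose(1, 2)                            # feature sequence (B, T, C)
        h, _ = self.bilstm(x)                            # Bi-LSTM output (B, T, 2H)
        weights = torch.softmax(self.attention(h), dim=1)    # attention weights over time steps
        context = (weights * h).sum(dim=1)               # weighted, integrated text representation
        return self.classifier(context)                  # Softmax applied in the loss / at inference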
Step S23: model implementation. The deep learning model is implemented with the PyTorch framework on an X86 platform; the directory structure after implementation is shown in FIG. 3. The data folder stores the data set, the run folder stores the model result files, and under the src folder, dataset.py is responsible for loading the data set and preprocessing the data, model.py is the bidirectional LSTM + attention model file, metrics.py is the model performance metrics file, and main.py is the main file of the project.
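For reference, the files named above correspond roughly to the following layout; FIG. 3 is authoritative and the top-level folder name is a placeholder.
project/
├── data/            # data set
├── run/             # model result files
└── src/
    ├── dataset.py   # data set loading and preprocessing
    ├── model.py     # bidirectional LSTM + attention model
    ├── metrics.py   # model performance metrics
    └── main.py      # main entry of the project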
Step S24: model porting. After debugging and implementation are completed under the X86 architecture, the deep learning model is ported to the domestic supercomputing platform. On the domestic supercomputing platform the PyTorch framework has already been ported (i.e., the SWPyTorch framework), and the other dependency libraries have also been ported successfully; therefore, the invention only needs to port the deep learning model itself. In the Sunway (Shenwei) environment, the relevant main program, master-core program and slave-core programs are compiled in turn with the sw9gcc compiler, finally generating an executable program.
For the deep learning model in the invention, since it is a Python program, it is only necessary for the environment dependency libraries to compile successfully in sequence. In addition, the domestic supercomputer does not support GPU acceleration, so the CPU is used as the compute device for the model. This completes the porting of the deep learning model.
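For illustration, pinning the computation to the CPU in PyTorch/SWPyTorch can be as simple as the following self-contained sketch; the LSTM dimensions are placeholder values, not the parameters of the actual model.
import torch
import torch.nn as nn

# The domestic supercomputer provides no GPU acceleration, so computation is pinned to the CPU.
device = torch.device("cpu")

model = nn.LSTM(input_size=64, hidden_size=128, bidirectional=True).to(device)
batch = torch.randn(35, 8, 64, device=device)   # (seq_len, batch, features) dummy input
output, _ = model(batch)
print(output.shape)                             # torch.Size([35, 8, 256])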
Step S3: the parallel optimization step of the deep learning model comprises the following steps: based on a domestic super computing platform, optimizing the deep learning model from data parallelism, computation parallelism, communication parallelism, operator library optimization and SWPyTorch multi-node parallelism;
specifically, the step S3 specifically includes the steps of:
step S31: and (3) data parallelism, namely partitioning training data. The size of the mini-batch cannot be infinitely enlarged, and increasing the size of the mini-batch with parallel data can reduce generalization of the neural network. Meanwhile, in order to ensure that each computing node has enough computing tasks to exert the computing capacity of the many-core architecture, the mini-batch of each node cannot be too small. Each slave core of the Shenwei many-core processor has a block of high-speed local data storage space LDM, which has a total space capacity of 256KB. The invention sets mini-batch to 64K in combination with the size of the slave core cache.
Step S32: computation parallelism, including master-slave core operation optimization and multi-node parallel optimization. The master-slave core operation optimization is based on the domestic many-core architecture: each node has 1 master core and 64 slave cores; the slave cores are the main computing resource, and fully using them can significantly improve the efficiency of model training. The invention makes full use of the slave-core resources by adding the -cgsp 64 option when the supercomputing cluster job is submitted; once this option is added, the operating mechanism of the domestic many-core architecture uses the slave-core resources automatically, accelerating the computing task. The multi-node parallel optimization realizes multi-node distributed model training on top of single-node operation and mainly covers 2 aspects: in terms of source code, distributed data parallelism of the model is realized by calling algorithm libraries such as torch.distributed, torch.nn.parallel and torch.utils.data.distributed; in terms of job submission, distributed data-parallel operation across multiple nodes is realized by adjusting the job submission command. In some embodiments, the following instruction is taken as an example:
bsub -I -akernel -b -o out.log -q q_swhfnl -node 26917-26918 -N 10 -cgsp 64 -ro_size 256 -share_size 11000 -mpecg 64 -cache_size 0 python3 main.py -train
This instruction submits the computing task to the q_swhfnl queue and runs it in parallel on 10 nodes within the range 26917-26918.
Step S33: communication parallelism, namely optimization of the data transfer between the master core and the slave cores in the Sunway (Shenwei) environment. The slave-core registers are used to complete the flow of data from memory to the computing units through accesses across the three-level REG-LDM-MEM memory hierarchy: the blocks of matrices A and B are loaded into the LDM by DMA, so that when each slave core computes a small block of C, the accesses to A and B are converted from direct accesses to main memory into accesses to the local LDM.
Step S34: operator library optimization mainly adopts a multithreaded operation optimization method: multithreading and code blocks are added to the key matrix multiplication links, realizing optimization and acceleration of matrix multiplication and of the deep learning operator library. In some embodiments, the added code block is as follows:
#pragma omp parallel for num_threads(CORE_NUM)
Step S35: SWPyTorch multi-node parallelism. First, the distributed data-parallel library in the SWPyTorch framework is loaded; model parallelism is then performed, i.e., the model is obtained and copied to multiple CPUs; data parallelism is then performed, comprising: obtaining the total number of CPUs in the distributed data parallelism, splitting the training set according to the total number of CPUs, distributing the training set data in parallel, distributing the test set data in parallel, loading the training set data in a distributed manner, and loading the test set data in a distributed manner; finally, the communication mode is set: the process group is initialized and the back-end communication mode is set to mpi.
In some embodiments, a specific implementation of SWPyTorch multi-node parallelism is as follows (a consolidated sketch follows step S354):
Step S351: the distributed data-parallel libraries of the SWPyTorch framework are loaded with the following instructions:
import torch.distributed as dist
from torch.utils.data import DataLoader  # needed for the distributed data loading below
from torch.utils.data.distributed import DistributedSampler
from torch.nn.parallel import DistributedDataParallel
step S352: model parallelism is performed:
obtaining a custom model
Model=model () # acquisition Model
Copying a model to multiple CPUs
model=torch.nn.parallel.distributed dataparallel # -distributed data parallelism
Step S353: performing data parallelism
# Obtain the total number of CPUs participating in distributed data parallelism
size = dist.get_world_size()
# Split the global batch size according to the total number of CPUs (batch_size is defined elsewhere)
bsz = int(batch_size / size)
# Parallel distribution of the training set data
train_sampler = DistributedSampler(train_dataset)
# Parallel distribution of the test set data
test_sampler = DistributedSampler(test_dataset)
# Distributed loading of the training set data
train_loader = DataLoader(train_dataset, batch_size=bsz, shuffle=(train_sampler is None), sampler=train_sampler)
# Distributed loading of the test set data
test_loader = DataLoader(test_dataset, batch_size=bsz, shuffle=(test_sampler is None), sampler=test_sampler)
Step S354: communication mode setting
# Initialize the process group and set the back-end communication mode to mpi
dist.init_process_group(backend='mpi')
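Putting steps S351 to S354 together, a minimal end-to-end training sketch might look as follows. The data set, model and hyperparameters are placeholders standing in for the LPCBDataSet and the Bi-LSTM + attention network, and the mpi backend matches step S354. One ordering detail is worth noting: in standard PyTorch the process group must be initialized before the model is wrapped in DistributedDataParallel, so the sketch calls dist.init_process_group first.
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel
from torch.utils.data import DataLoader, TensorDataset
from torch.utils.data.distributed import DistributedSampler

def main(batch_size=64, epochs=1):
    # Step S354 (performed first here): initialize the MPI process group.
    dist.init_process_group(backend="mpi")
    size = dist.get_world_size()                 # step S353: number of participating CPUs
    bsz = int(batch_size / size)

    # Placeholder data and model standing in for the LPCBDataSet and the Bi-LSTM network.
    train_dataset = TensorDataset(torch.randn(1024, 32), torch.randint(0, 2, (1024,)))
    train_sampler = DistributedSampler(train_dataset)
    train_loader = DataLoader(train_dataset, batch_size=bsz, sampler=train_sampler)

    model = DistributedDataParallel(nn.Linear(32, 2))    # step S352: wrap the model
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    loss_fn = nn.CrossEntropyLoss()

    for epoch in range(epochs):
        train_sampler.set_epoch(epoch)           # reshuffle shards every epoch
        for inputs, labels in train_loader:
            optimizer.zero_grad()
            loss = loss_fn(model(inputs), labels)
            loss.backward()                      # gradients are averaged across nodes
            optimizer.step()

if __name__ == "__main__":
    main()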
Step S4: job submission and execution, comprising: configuring computing node resources of the domestic super computing platform, activating the dependent environment of the domestic super computing platform, and submitting operation jobs.
Specifically, the step S4 specifically includes the steps of:
Step S41: configuring computing node resources. The computing nodes, the number of CPU cores and the storage size are configured according to the actual requirements of the task.
Step S42: activating the dependency environment, including environment activation of SWPython and SWPyTorch and environment variable setting, specifically comprising the following steps:
step S421: SWPython Environment activation
source /usr/sw/swpython/setenv-swpython
Step S422: SWPyTorch environmental activation
. /usr/sw/swpython/setenv-torch
Step S423: environment variable settings
export LD_LIBRARY_PATH=$dynamic_mpi_lib_path:$LD_LIBRARY_PATH
export LD_BIND_NOW=1
export MPIR_CVAR_ASYNC_PROGRESS=1
Step S43: job submission and execution, namely completing distributed parallel training of the model by submitting the job with the bsub command to a plurality of computing nodes, and completing model prediction with the bsub command, specifically comprising the following steps:
step S431 model distributed parallel training
Submitting to a plurality of computing nodes to run using a bsub command:
bsub -I -akernel -b -o out.log -q q_swhfnl -node 26917-26918 -N 10 -cgsp 64 -ro_size 256 -share_size 11000 -mpecg 64 -cache_size 0 python3 main.py -train
step S432 model prediction
Model prediction is completed using the bsub command:
bsub -b -I -o out.log -akernel -q q_sw_expr -shared -n 1 -cgsp 64 -share_size 11000 python3 main.py -predict
Further, the job submission and execution steps further include using the 64 slave cores of each node to improve the training performance of the distributed parallel training of the model. The more node resources are used in model training, the higher the training performance, and the 64 slave cores of each node are fully used to achieve training acceleration. However, the caches of the master and slave cores and the LDM communication between master and slave cores have capacity limits, which require corresponding optimization in the model code. Model performance is mainly influenced by the data set scale and the model structure; with the data set scale fixed, adopting a pre-trained model yields higher model performance.
Embodiment two:
referring to FIG. 4, this embodiment discloses a specific implementation of a computer device. The computer device may include a processor 81 and a memory 82 storing computer program instructions.
In particular, the processor 81 may include a Central Processing Unit (CPU), or an application specific integrated circuit (Application Specific Integrated Circuit, abbreviated as ASIC), or may be configured to implement one or more integrated circuits of embodiments of the present application.
Memory 82 may include, among other things, mass storage for data or instructions. By way of example, and not limitation, memory 82 may comprise a Hard Disk Drive (HDD), a floppy disk drive, a Solid State Drive (SSD), flash memory, an optical disk, a magneto-optical disk, tape, or a Universal Serial Bus (USB) drive, or a combination of two or more of the foregoing. The memory 82 may include removable or non-removable (or fixed) media, where appropriate. The memory 82 may be internal or external to the data processing apparatus, where appropriate. In a particular embodiment, the memory 82 is a non-volatile memory. In a particular embodiment, the memory 82 includes Read-Only Memory (ROM) and Random Access Memory (RAM). Where appropriate, the ROM may be a mask-programmed ROM, a Programmable ROM (PROM), an Erasable PROM (EPROM), an Electrically Erasable PROM (EEPROM), an Electrically Alterable ROM (EAROM), or a flash memory (FLASH), or a combination of two or more of these. The RAM may be a Static Random-Access Memory (SRAM) or a Dynamic Random-Access Memory (DRAM), where the DRAM may be a Fast Page Mode DRAM (FPMDRAM), an Extended Data Out DRAM (EDODRAM), a Synchronous DRAM (SDRAM), or the like, as appropriate.
Memory 82 may be used to store or cache various data files that need to be processed and/or communicated, as well as possible computer program instructions for execution by processor 81.
The processor 81 implements any of the deep learning model migration methods of the above embodiments by reading and executing computer program instructions stored in the memory 82.
In some of these embodiments, the computer device may also include a communication interface 83 and a bus 80. As shown in fig. 4, the processor 81, the memory 82, and the communication interface 83 are connected to each other through the bus 80 and perform communication with each other.
The communication interface 83 is used to implement communications between the various modules, devices, and/or units in embodiments of the present application.
The communication interface 83 may also perform data communication with other components, such as external devices, image/data acquisition devices, databases, external storage, image/data processing workstations, and the like.
Bus 80 includes hardware, software, or both, coupling the components of the computer device to each other. Bus 80 includes, but is not limited to, at least one of: a data bus, an address bus, a control bus, an expansion bus, and a local bus. By way of example, and not limitation, bus 80 may include an Accelerated Graphics Port (AGP) or other graphics bus, an Extended Industry Standard Architecture (EISA) bus, a Front Side Bus (FSB), a HyperTransport (HT) interconnect, an Industry Standard Architecture (ISA) bus, an InfiniBand interconnect, a Low Pin Count (LPC) bus, a memory bus, a Micro Channel Architecture (MCA) bus, a Peripheral Component Interconnect (PCI) bus, a PCI-Express (PCI-X) bus, a Serial Advanced Technology Attachment (SATA) bus, a Video Electronics Standards Association Local Bus (VLB), or other suitable bus, or a combination of two or more of the foregoing. Bus 80 may include one or more buses, where appropriate. Although embodiments of the present application describe and illustrate a particular bus, the present application contemplates any suitable bus or interconnect.
In addition, in combination with the deep learning model migration method in the above embodiment, the embodiment of the application may be implemented by providing a computer readable storage medium. The computer readable storage medium has stored thereon computer program instructions; the computer program instructions, when executed by a processor, implement any of the deep learning model migration methods of the above embodiments.
The present invention is not limited to the above-mentioned embodiments, and any equivalent embodiments which can be changed or modified by the technical content disclosed above can be applied to other fields, but any simple modification, equivalent changes and modification made to the above-mentioned embodiments according to the technical substance of the present invention without departing from the technical content of the present invention still belong to the protection scope of the technical solution of the present invention.

Claims (10)

1. A protein-ligand affinity evaluation method based on a domestic super computing platform, which is characterized by comprising the following steps:
the construction step of the domestic supercomputing production environment comprises: compiling the dependency libraries required to run the deep learning model and completing the configuration of the SWPyTorch framework;
the design and implementation step of the deep learning model comprises: constructing a deep learning model based on a protein-ligand affinity evaluation data set, implementing the model with the PyTorch framework on an X86 platform and, after the implementation is complete, porting the deep learning model to the domestic supercomputing platform;
the parallel optimization step of the deep learning model comprises: based on the domestic supercomputing platform, optimizing the deep learning model in terms of data parallelism, computation parallelism, communication parallelism, operator library optimization and SWPyTorch multi-node parallelism;
the job submission and execution step comprises: configuring the computing node resources of the domestic supercomputing platform, activating the dependency environment of the domestic supercomputing platform and submitting the job for execution.
2. The protein-ligand affinity evaluation method based on a domestic super computing platform according to claim 1, wherein the deep learning model adopts a model based on a Bi-LSTM neural network with a fused attention mechanism.
3. The method of claim 2, wherein the large-scale protein-ligand affinity evaluation data set comprises 34.2 million protein-ligand interaction records, mainly including protein-compound interaction data and protein-protein interaction data, and further comprising structural sequence data of proteins and compounds, wherein the affinity-related features of the structural sequence data include pharmacophores, molecular scaffolds, hydrophobic groups, water solubility and lipid solubility.
4. The method for evaluating protein-ligand affinity based on the domestic super computing platform according to claim 3, wherein the model structure based on the Bi-LSTM neural network with the fused attention mechanism comprises:
after inputting structural sequence data of proteins and compounds, word2vec vectorization processing is carried out to obtain word embedding vectors;
inputting the word embedded vector into a convolution layer for convolution operation for a plurality of times to obtain a characteristic sequence;
inputting the characteristic sequence into a Bi-LSTM neural network structure, wherein characteristic information output by the Bi-LSTM neural network structure is used as input of an attention mechanism layer;
the characteristic information is processed by the attention mechanism layer, and the weights of the characteristic information are adjusted to obtain text representations covering the specific structural features of the molecules;
integrating the text representations to obtain comprehensive text representations, and taking the comprehensive text representations as input of a Softmax classifier;
and after the affinity between the protein and the ligand is classified by the Softmax classifier, obtaining an affinity evaluation result between the protein and the ligand.
5. The method for evaluating protein-ligand affinity based on a domestic super computing platform according to claim 1, wherein the dependent libraries comprise 13 Python libraries, comprising: torch, torchtext, sklearn, django, fire, os, numpy, random, json, jieba, collections, sys and opt.
6. The method for evaluating protein-ligand affinity based on a domestic super computing platform according to claim 1, wherein the step of constructing a domestic super computing production environment further comprises loading SWPyTorch and SWPython into a specified directory and configuring environment variables.
7. The protein-ligand affinity evaluation method based on a domestic super computing platform according to claim 1, wherein the parallel optimization step of the deep learning model further comprises compiling the related main program, master-core program and slave-core programs in sequence through the sw9gcc compiler to finally generate an executable program, and the CPU is used as the compute device when the model is compiled.
8. The protein-ligand affinity evaluation method based on a domestic super computing platform according to claim 1, wherein the parallel optimization step of the deep learning model specifically comprises:
the data parallelism comprises setting the mini-batch size of each computing node to 64K;
the computation parallelism comprises master-slave core operation optimization and multi-node parallel optimization; the master-slave core operation optimization adds the -cgsp 64 option when the supercomputing cluster job is submitted so that the slave-core resources are invoked automatically;
the multi-node parallel optimization realizes the distributed data parallel of the model by calling an algorithm library, and realizes the distributed data parallel operation of multiple nodes by adjusting a job submitting command;
the communication parallelism optimizes the data transfer between the master core and the slave cores, and the slave-core registers are used to complete the flow of data from memory to the computing units through accesses across the three-level REG-LDM-MEM memory hierarchy;
the operator library optimization mainly adopts a multithreading operation optimization method, and multithreading operation and code blocks are added in a key matrix multiplication operation link, so that the optimization acceleration of matrix multiplication and deep learning operator libraries is realized;
the SWPyTorch multi-node parallelism is realized by first loading the distributed data-parallel library in the SWPyTorch framework; model parallelism is then performed, i.e., the model is obtained and copied to multiple CPUs; data parallelism is then performed, comprising: obtaining the total number of CPUs in the distributed data parallelism, splitting the training set according to the total number of CPUs, distributing the training set data in parallel, distributing the test set data in parallel, loading the training set data in a distributed manner, and loading the test set data in a distributed manner; and the communication mode is set: the process group is initialized and the back-end communication mode is set to mpi.
9. The method for evaluating protein-ligand affinity based on a domestic super computing platform according to claim 1, wherein the job submitting and running steps specifically comprise:
the computing node resources comprise computing nodes, CPU core numbers and storage sizes;
activating the dependency environment of the domestic super computing platform comprises environment activation of SWPython and SWPyTorch and environment variable setting;
the job submission and execution is performed by submitting the job with the bsub command to a plurality of computing nodes to complete the distributed parallel training of the model, and model prediction is performed with the bsub command.
10. The method of claim 9, wherein the job submission and execution step further comprises using the 64 slave cores of each node to increase the training performance of the distributed parallel training of the model.
CN202311092537.0A 2023-08-28 2023-08-28 Protein-ligand affinity evaluation method based on domestic super-computing platform Pending CN117292759A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311092537.0A CN117292759A (en) 2023-08-28 2023-08-28 Protein-ligand affinity evaluation method based on domestic super-computing platform

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311092537.0A CN117292759A (en) 2023-08-28 2023-08-28 Protein-ligand affinity evaluation method based on domestic super-computing platform

Publications (1)

Publication Number Publication Date
CN117292759A true CN117292759A (en) 2023-12-26

Family

ID=89239958

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311092537.0A Pending CN117292759A (en) 2023-08-28 2023-08-28 Protein-ligand affinity evaluation method based on domestic super-computing platform

Country Status (1)

Country Link
CN (1) CN117292759A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117831640A (en) * 2024-03-05 2024-04-05 青岛国实科技集团有限公司 Medical industry digital twin platform based on super calculation
CN117831640B (en) * 2024-03-05 2024-05-14 青岛国实科技集团有限公司 Medical industry digital twin platform based on super calculation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination