CN111243660B

CN111243660B - Parallel marine drug screening method based on heterogeneous many-core architecture

Info

Publication number: CN111243660B
Application number: CN202010010007.7A
Authority: CN
Inventors: 刘昊; 王新茹; 魏志强; 张志雨
Original assignee: Ocean University of China
Current assignee: Ocean University of China
Priority date: 2020-01-06
Filing date: 2020-01-06
Publication date: 2021-03-12
Anticipated expiration: 2040-01-06
Also published as: CN111243660A

Abstract

The invention relates to a parallel marine drug screening method based on a heterogeneous many-core architecture, and belongs to the technical field of drug screening. The method uses a heterogeneous many-core high-performance computing architecture to make drug screening more efficient and accurate, and the molecules finally screened out have simple structures and act well on their targets.

Description

Parallel marine drug screening method based on heterogeneous many-core architecture

Technical Field

The invention belongs to the technical field of drug screening, and particularly relates to a parallel marine drug screening method based on a heterogeneous many-core architecture.

Background

Marine drugs are drugs prepared from marine organisms and marine microorganisms by modern scientific and technological methods. Most existing marine drugs are natural drugs, that is, active ingredients extracted directly from marine organisms; some are obtained from the active ingredients of marine organisms by chemical synthesis or biotechnological conversion. Marine medicinal resources are abundant. They are of strategic importance for the high-quality development of the pharmaceutical industry and have become a biomedical resource of global interest. Since the 1940s, international marine drug research has produced a series of results: nearly 4 million marine compounds have been found, more than half of them are active, and 13 marine drugs have been successfully developed and brought to market. The screening of marine natural products targets cancer, cardiovascular and cerebrovascular diseases, viral infections (AIDS and the like) and other intractable diseases that seriously harm human health. At present, however, few marine drugs have actually entered clinical research. Research and development of marine drugs helps to improve China's capacity for new drug development. Focusing on major diseases that seriously threaten human health, such as heart disease and cancer, it promotes a group of innovative marine drugs that meet market and clinical needs. This provides a new source of drugs for improving China's international competitiveness in innovative drug development, for treating intractable diseases, for saving lives and for improving quality of life, and therefore has great social benefit.

Since marine natural products growing in special environments such as high salt and high pressure have complex structures and are difficult to collect and chemically synthesize, the docking calculation process of virtual screening is very important.

To handle the complex conformations of marine compounds, the conformation search part of virtual screening is carried out on a heterogeneous many-core architecture. The system adopts a main processor plus coprocessor architecture: the main processor (main core) handles complex logic and control tasks, and the coprocessor (slave cores) handles large-scale data-parallel tasks with high computational density and simple branching. The two cooperate to provide an efficient computing platform for specific applications. A heterogeneous many-core processor integrates, on a single chip, general-purpose processor cores with control and management functions and a large number of simplified computing cores for accelerating computation. It can achieve higher performance per watt and higher computational density, and it makes up for the shortcomings of homogeneous many-core processors.

To achieve high-throughput screening of marine drugs, MPI parallel technology is used. MPI (Message Passing Interface) is a cross-language communication protocol for writing parallel programs, and it supports both point-to-point and broadcast communication. MPI is a message-passing application programming interface that includes protocol and semantic descriptions specifying how its features must behave in any implementation. The goals of MPI are high performance, scalability and portability. MPI parallelism is process-based: processes have independent virtual address spaces and are scheduled independently on processors, and they execute independently of one another. MPI is designed to support cluster systems connected by a network, with communication carried out through message passing.

Given the particular structures of marine drugs, and in order to improve the accuracy of their virtual screening, the parameters of the scoring function used in screening are optimized with a machine learning algorithm. Protein-ligand docking is a computational method in virtual drug screening that predicts the most likely position, orientation and conformation of a ligand bound to a protein. Docking methods predict the binding free energy of a ligand to a protein in different ways and can be divided into three categories: 1. methods based on molecular force fields; 2. methods based on empirical regression parameters; 3. methods based on empirical scoring functions. An empirical scoring function calculates the affinity of protein-ligand binding by summing the contributions of many individual terms. Each term generally represents an important energy factor in protein-ligand binding, and each involves several parameters that can be modified to improve the prediction. Finally, each term is weighted before being summed into the final predicted binding affinity.

Although the speed and accuracy of current virtual drug screening programs have greatly improved, these programs still have many shortcomings.

1. In conventional virtual drug screening, the conformation search usually runs sequentially on a single processor, so the whole screening process is time-consuming.

2. The I/O of a conventional molecular docking method generally consists of reading a target data file, a small-molecule data file and a configuration file, and then writing a result file containing the scores produced by docking; that is, conventional virtual screening is mostly serial. For high-throughput virtual screening this is computationally inefficient and places heavy storage and I/O pressure on the system.

3. The scoring function for virtual drug screening is generally trained by regression between the experimental and predicted binding affinities of a selected data set. This approach ensures good scoring capability, but not necessarily good docking capability.

Disclosure of Invention

To address the shortcomings of the prior art, the invention provides a parallel high-precision marine drug screening method based on a heterogeneous many-core architecture. Using the high-performance computing architecture of heterogeneous many-core processors, the method makes drug screening more efficient and accurate through I/O optimization, parallelization of the virtual screening process and optimization of the scoring function. Marine compounds are used as the ligand data set, which helps to accelerate marine drug research and development.

To solve the above technical problems, the invention adopts the following technical solution:

A parallel high-precision marine drug screening method based on a heterogeneous many-core architecture comprises constructing a heterogeneous many-core high-performance computing architecture, input/output optimization and parallelization of a high-throughput drug screening program, scoring, and optimizing the molecular structure.

Constructing the heterogeneous many-core high-performance computing architecture is as follows: in the virtual screening calculation, an iterated local search global optimization algorithm is used to search the conformation of the ligand after a rotation; the binding free energy between the rotated conformation and the corresponding receptor is calculated according to a scoring function; the number of steps in the search is determined adaptively according to the docking complexity; and several searches are performed, each starting from a random conformation. The main control flow runs on the main core and the conformation search is computed on the slave cores: each slave core repeatedly performs the energy calculation for conformations and finally returns the results to the main core, which selects the best conformations. In the end, several top-scoring conformations are screened out, which achieves parallel acceleration inside the virtual drug screening program.

Further, the rotation includes changes in the position and orientation of the ligand, as well as changes in the torsion values of the active rotatable bonds and flexible residues in the ligand. The input/output optimization and parallelization of the high-throughput drug screening program are as follows. First, an automation module generates an input containing information on multiple ligands, and the docking calculations between the ligands in that input and the receptor are then performed in sequence, producing an output containing multiple calculation results. The method merges many small inputs of the virtual drug screening program into one large multi-ligand input, and merges the outputs of many runs into one large output; this reduces the I/O pressure on the many-core batch-processing computing system and improves the storage efficiency of the system.

Second, MPI parallel technology is used to achieve multi-task parallelization. The main process of MPI reads the multi-ligand input and distributes the ligand information to the subprocesses in turn. Each subprocess, on receiving ligand information, carries out an independent virtual screening run and, when the calculation finishes, produces an output containing multiple calculation results. Each subprocess then requests further ligand molecules from the main process until all ligand molecules in the list file have been calculated.

The scoring is as follows: marine natural products are used as the ligand data set, the scoring function is trained with a linear regression algorithm, and a large number of scoring functions are generated by adjusting the weights of the terms in the scoring function, including Gaussian steric attraction, quadratic steric repulsion, vacuum electrostatic interaction, hydrophobic interaction, hydrogen-bond interaction, Lennard-Jones potential energy, ionic-bond interaction energy and the pi-stacking effect of aromatic rings. To train the scoring function, each crystal structure in the marine natural product small-molecule data set is first minimized; top-ranked ligand conformations are then selected from the screening results, and the average RMSD between the ligand pose with the lowest binding free energy and the original structure is calculated. The scoring function with the smallest RMSD value is taken as the one with the best scoring capability.

Optimizing the molecular structure is as follows: for ligand molecules with good scoring results, the molecular structure is optimized by combining reaction-based fragment growing with scaffold hopping based on isosteric replacement. The scaffold of the lead compound is modified under the guidance of the receptor protein structure while the main functional groups that interact with the receptor protein are retained. An atom or group in the ligand molecule is selected as the reaction site, classical chemical reactions are used for fragment growing, and the resulting compounds are finally enumerated and optimized according to how well they match the protein pocket. Molecules designed in this way have a simpler structure than the lead compound and act better on the target. Scaffold hopping falls into four main types: heterocycle replacement, ring opening or ring closure, peptidomimetics, and topology-based hopping. Scaffold hopping is mainly used to find candidate compounds, but it generally cannot predict their potential, so it is often combined with computational simulation and virtual screening. Isosteric replacement substitutes single atoms or entire groups with equivalents of similar size, shape, charge distribution and physicochemical properties, which can yield new compounds with biological activity similar to that of the parent active ingredient.

Compared with the prior art, the invention has the beneficial effects that:

(1) The invention adopts a heterogeneous many-core high-performance computing architecture and assigns the computationally intensive molecular conformation search to the slave cores for parallel execution, which greatly shortens the conformation search time during docking and increases the docking speed.

(2) The invention improves the input and output of the drug screening program, reduces the I/O pressure on the computer system and improves storage efficiency.

(3) The invention provides a parallel method for virtual drug screening. MPI parallel technology is used to wrap the virtual drug screening program and parallelize the originally serial program, which greatly improves computational efficiency.

(4) The invention incorporates a machine learning algorithm into the original scoring function, so that the virtual drug screening program has both good scoring capability and good docking capability, which improves the accuracy of docking results.

(5) The invention optimizes the structures of ligand molecules with good scoring results, so that the resulting molecules are structurally simple and act better on the target.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the present invention will be further described with reference to the accompanying drawings.

FIG. 1 is a schematic flow diagram of the process of the present invention;

FIG. 2 is a flow chart of the present invention for optimizing a supercomputing system;

FIG. 3 is a parallel computational diagram of the heterogeneous many-core conformation search of the present invention.

FIG. 4 shows the molecular structures of the shikonin derivatives.

Detailed Description

The present invention will be further described with reference to specific embodiments thereof, it being understood that the embodiments described are only a few, and not all, of the embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present invention without making any creative effort, shall fall within the protection scope of the present invention.

Example 1

The parallel high-precision marine drug screening method based on a heterogeneous many-core architecture comprises constructing a heterogeneous many-core high-performance computing architecture, input/output optimization and parallelization of a high-throughput drug screening program, scoring, and optimizing the molecular structure.

1) Constructing the heterogeneous many-core high-performance computing architecture. In the virtual screening calculation, an iterated local search global optimization algorithm is used to search the conformation of the ligand after a rotation; a rotation includes changes in the position and orientation of the ligand, as well as changes in the torsion values of the active rotatable bonds and flexible residues in the ligand. The binding free energy between the rotated conformation and the corresponding receptor is calculated according to a scoring function, the number of steps in the search is determined adaptively according to the docking complexity, several searches are performed starting from random conformations, and finally several top-scoring conformations are screened out. The main control flow runs on the main core and the conformation search is computed on the slave cores: each slave core repeatedly performs the energy calculation for conformations and finally returns the results to the main core, which then selects the best conformations. This achieves parallel acceleration inside the virtual drug screening program.

After parallel acceleration inside the virtual drug screening program has been achieved, that is, after the main computation within a single docking task has been parallelized in the previous step, task-level parallel optimization and optimization of the program's input/output structure are carried out on that basis, making use of the many-core high-performance parallel characteristics of the batch-processing system.
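The division of labor in step 1) can be illustrated with a minimal Python sketch. It is not the patented implementation: `toy_energy` is a hypothetical stand-in for the scoring function, a conformation is reduced to a plain list of numbers, the rule `n_steps = 10 * n_dof` is an assumed heuristic for the adaptive step count, and a thread pool stands in for the slave cores. All function names are illustrative.

```python
import math
import random
from concurrent.futures import ThreadPoolExecutor


def toy_energy(conf):
    """Hypothetical stand-in for the scoring function (lower is better)."""
    return sum((x - 1.0) ** 2 + 0.1 * math.sin(5.0 * x) for x in conf)


def local_search(conf, energy, step=0.05, sweeps=30):
    """Greedy coordinate descent: keep a move only if it lowers the energy."""
    best, best_e = list(conf), energy(conf)
    for _ in range(sweeps):
        for i in range(len(best)):
            for delta in (step, -step):
                trial = list(best)
                trial[i] += delta
                e = energy(trial)
                if e < best_e:
                    best, best_e = trial, e
    return best, best_e


def iterated_local_search(seed, n_dof, energy):
    """One search from a random start; the work assigned to one slave core."""
    rng = random.Random(seed)
    n_steps = 10 * n_dof  # assumed heuristic: more degrees of freedom, more steps
    start = [rng.uniform(-3.0, 3.0) for _ in range(n_dof)]
    best, best_e = local_search(start, energy)
    for _ in range(n_steps):
        # Random perturbation, standing in for a rotation of the ligand.
        perturbed = [x + rng.gauss(0.0, 0.5) for x in best]
        cand, cand_e = local_search(perturbed, energy)
        if cand_e < best_e:
            best, best_e = cand, cand_e
    return best_e, best


def dock(n_dof=4, n_runs=8, n_workers=4, top_k=3):
    """Main-core role: dispatch searches, gather results, keep the best poses."""
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        results = list(pool.map(
            lambda seed: iterated_local_search(seed, n_dof, toy_energy),
            range(n_runs)))
    return sorted(results)[:top_k]


top = dock()
for energy_value, conformation in top:
    print(round(energy_value, 3), [round(x, 2) for x in conformation])
```

Each search is independent, so the main routine only dispatches work and ranks what comes back, which mirrors the split between control flow on the main core and energy calculation on the slave cores.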

2) Input/output optimization and parallelization of the high-throughput drug screening program. First, an automation module generates an input containing information on multiple ligands, and the docking calculations between the ligands in that input and the receptor are then performed in sequence, producing an output containing multiple calculation results. The method merges many small inputs of the virtual drug screening program into one large multi-ligand input, and merges the outputs of many runs into one large output; this reduces the I/O pressure on the many-core batch-processing computing system and improves the storage efficiency of the system.
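The merging of many small inputs into multi-ligand inputs can be sketched as follows. This is an illustrative sketch only: the ligand names, the helper names and the `fake_dock` placeholder are hypothetical, and the figures of 30000 ligands grouped ten per input are borrowed from Example 2.

```python
def batch_ligands(ligand_names, ligands_per_input):
    """Group ligand identifiers into multi-ligand inputs of a fixed size."""
    return [ligand_names[i:i + ligands_per_input]
            for i in range(0, len(ligand_names), ligands_per_input)]


def dock_batch(batch, dock_one):
    """Dock the ligands of one input in sequence; return one combined output."""
    return [(name, dock_one(name)) for name in batch]


def fake_dock(name):
    """Hypothetical placeholder score; a real run would call the docking program."""
    return -float(len(name))


ligands = [f"lig{i:05d}" for i in range(30000)]
batches = batch_ligands(ligands, 10)
combined = dock_batch(batches[0], fake_dock)
print(len(batches), len(combined))
```

With this grouping the program opens one input and writes one output per ten ligands instead of per ligand, which is where the reduction in I/O pressure comes from.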

Steps 1) and 2) achieve a two-level optimization of the virtual drug screening program, which markedly improves the speed and efficiency of marine drug screening. To account for the particular structures of marine drugs and to further improve the precision and accuracy of marine drug screening, the scoring function of the program is also optimized.

3) Scoring

Marine natural products are used as the ligand data set, the scoring function is trained with a linear regression algorithm, and a large number of scoring functions are generated by adjusting the weights of the terms in the scoring function, such as Gaussian steric attraction, quadratic steric repulsion, vacuum electrostatic interaction, hydrophobic interaction, hydrogen-bond interaction, Lennard-Jones potential energy, ionic-bond interaction energy and the pi-stacking effect of aromatic rings. To train the scoring function, each crystal structure in the marine natural product small-molecule data set is minimized, and the average RMSD between the ligand pose with the lowest binding free energy and the original structure is then calculated. The scoring function with the smallest RMSD value is taken as the one with the best scoring capability.
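The selection step described above, choosing among candidate weight sets by average RMSD, can be sketched in Python. The sketch covers only the selection, not the linear regression that produces the candidates. The term names, weights, coordinates and data layout are all hypothetical toy values.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally ordered atom lists."""
    sq = sum((a - b) ** 2
             for p, q in zip(coords_a, coords_b)
             for a, b in zip(p, q))
    return math.sqrt(sq / len(coords_a))


def score(terms, weights):
    """Empirical score: weighted sum of the per-term energy contributions."""
    return sum(weights[name] * value for name, value in terms.items())


def mean_rmsd_of_best_pose(weights, dataset):
    """Per ligand, take the lowest-scoring pose and compare it with the crystal pose."""
    total = 0.0
    for entry in dataset:
        best = min(entry["poses"], key=lambda pose: score(pose["terms"], weights))
        total += rmsd(best["coords"], entry["crystal"])
    return total / len(dataset)


def select_weights(candidates, dataset):
    """Pick the candidate weight set with the smallest average RMSD."""
    return min(candidates, key=lambda w: mean_rmsd_of_best_pose(w, dataset))


# Toy data: one ligand with a near-native pose and a decoy pose.
dataset = [{
    "crystal": [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    "poses": [
        {"coords": [(0.0, 0.0, 0.0), (1.0, 0.0, 0.1)],
         "terms": {"gauss": -5.0, "repulsion": 2.0, "hbond": -1.0}},
        {"coords": [(3.0, 0.0, 0.0), (4.0, 0.0, 0.0)],
         "terms": {"gauss": -6.0, "repulsion": 6.0, "hbond": 0.0}},
    ],
}]
w_plain = {"gauss": 1.0, "repulsion": 0.0, "hbond": 0.0}
w_tuned = {"gauss": 1.0, "repulsion": 0.1, "hbond": 1.0}
best_weights = select_weights([w_plain, w_tuned], dataset)
print(best_weights)
```

In this toy case `w_plain` ranks the decoy first and is penalized by a large RMSD, while `w_tuned` ranks the near-native pose first, so it is the one selected.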

4) Optimizing molecular structure

For ligand molecules with good scoring results, the molecular structure is optimized by combining reaction-based fragment growing with scaffold hopping based on isosteric replacement. Scaffold hopping falls into four main types: heterocycle replacement, ring opening or ring closure, peptidomimetics, and topology-based hopping. The scaffold of the lead compound is modified under the guidance of the receptor protein structure while the main functional groups that interact with the receptor protein are retained. Scaffold hopping is mainly used to find candidate compounds, but it cannot predict their potential, so it is often combined with computational simulation and virtual screening. Isosteric replacement substitutes single atoms or entire groups with equivalents of similar size, shape, charge distribution and physicochemical properties, which can yield new compounds with biological activity similar to that of the parent active ingredient. An atom or group in the ligand molecule is selected as the reaction site, classical chemical reactions are used for fragment growing, and the resulting compounds are finally enumerated and optimized according to how well they match the protein pocket. Molecules designed in this way have a simpler structure than the lead compound and act better on the target.

Example 2

The invention relates to a parallel high-precision marine drug screening method based on a heterogeneous many-core architecture, which comprises constructing a heterogeneous many-core high-performance computing architecture, input/output optimization and parallelization of a high-throughput drug screening program, scoring, and optimizing the molecular structure.

1) Constructing the heterogeneous many-core high-performance computing architecture. An iterated local search global optimization algorithm is used to search the conformation of the ligand after a rotation, the binding free energy between the rotated conformation and the corresponding receptor is calculated according to a scoring function, the number of steps in the search is determined adaptively according to the docking complexity, and several searches are performed starting from random conformations. Because the amount of computation in the search is very large, and in view of the architecture of a heterogeneous many-core computer system, the main control flow runs on the main core and the conformation search is computed on the slave cores: each slave core repeatedly performs the energy calculation for conformations and finally returns the results to the main core, which then selects the best conformations. Finally, several top-scoring conformations are screened out. The time of a single virtual screening run is improved by about 40 percent.

2) Input/output optimization and parallelization of the high-throughput drug screening program:

(1) First, an automation module generates an input containing information on multiple ligands, and the docking calculations between the ligands in that input and the receptor are then performed in sequence, producing an output containing multiple calculation results. This further parallel optimization of the virtual drug screening program builds on step 1). The 30000 marine compound ligand files are merged into 3000 large files, each containing information on 10 ligands, and these are used as the input files of the virtual screening program. The docking calculations between the ligands in each file and the receptor are then performed in sequence, producing an output file containing multiple calculation results.

(2) Second, the drug screening program is wrapped using MPI parallel technology. 3000 processes are started for parallel computation, and each process performs on average the docking calculations for 10 ligands. Specifically, the main process of MPI reads the ligand list file generated in sub-step (1) of step 2) and distributes the ligands to the subprocesses. On receiving ligand information, a subprocess performs an independent docking calculation against the receptor it has received. When the calculation finishes, it sends a further request to the main process, and continues until it receives a stop signal (once the ligand list has been fully distributed, the main process sends a stop signal to any subprocess that requests work). The running time of the whole program is thus reduced from the sum of the original 30000 virtual screening runs to the longest time any one process takes for its 10 docking calculations.
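The request-and-dispatch protocol between the main process and the subprocesses can be sketched in Python. To keep the sketch runnable without an MPI installation, threads and queues are swapped in for MPI processes and point-to-point messages; this is an assumption made for illustration, not the method's actual MPI code. The `STOP` sentinel plays the role of the stop signal, and all names are hypothetical.

```python
import queue
import threading

STOP = None  # sentinel playing the role of the stop signal


def master(batches, request_q, reply_qs, results):
    """Hand out one batch per request; send STOP once the list is exhausted."""
    pending = list(batches)
    active = len(reply_qs)
    while active:
        worker_id, finished = request_q.get()  # a worker reports in and asks for work
        if finished is not None:
            results.append(finished)
        if pending:
            reply_qs[worker_id].put(pending.pop(0))
        else:
            reply_qs[worker_id].put(STOP)
            active -= 1


def worker(worker_id, request_q, reply_q, dock_batch):
    """Request work, dock the received batch, and repeat until STOP arrives."""
    finished = None
    while True:
        request_q.put((worker_id, finished))
        batch = reply_q.get()
        if batch is STOP:
            return
        finished = dock_batch(batch)


def run(batches, n_workers, dock_batch):
    """Start the workers, run the master loop, and return all batch results."""
    request_q = queue.Queue()
    reply_qs = [queue.Queue() for _ in range(n_workers)]
    results = []
    threads = [threading.Thread(target=worker,
                                args=(i, request_q, reply_qs[i], dock_batch))
               for i in range(n_workers)]
    for t in threads:
        t.start()
    master(batches, request_q, reply_qs, results)
    for t in threads:
        t.join()
    return results


def fake_dock_batch(batch):
    """Hypothetical placeholder for docking every ligand in one input file."""
    return [(name, -float(len(name))) for name in batch]


batches = [[f"lig{b:02d}_{i}" for i in range(10)] for b in range(12)]
results = run(batches, 4, fake_dock_batch)
print(len(results))
```

Because work is handed out on request, a subprocess that finishes early simply asks again, so the total running time is bounded by the slowest subprocess and not by the sum of all docking runs.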

3) Scoring. Marine natural products are used as the ligand data set, the scoring function is trained with a linear regression algorithm, and a large number of scoring functions are generated by adjusting the weights of the terms in the scoring function, such as Gaussian steric attraction, quadratic steric repulsion, vacuum electrostatic interaction, hydrophobic interaction, hydrogen-bond interaction, Lennard-Jones potential energy, ionic-bond interaction energy and the pi-stacking effect of aromatic rings. The average RMSD between the ligand pose with the lowest binding free energy and the original structure is then calculated, and the scoring function with the smallest RMSD value is taken as the one with the best scoring capability. Tests show that the optimized scoring function has higher accuracy.

4) Optimizing the molecular structure. On the basis of the preceding steps, ligand conformations with a score below -8 are selected for molecular structure optimization. The scaffold of the lead compound is modified under the guidance of the receptor protein structure while the main functional groups that interact with the receptor protein are retained. An atom or group in the ligand molecule is selected as the reaction site, classical chemical reactions are used for fragment growing, and the resulting compounds are finally enumerated and optimized according to how well they match the protein pocket.
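The bookkeeping in step 4), filtering by score and enumerating candidate derivatives, can be sketched as follows. This is a schematic sketch under stated assumptions: the ligand names, scores, site label and fragment names are hypothetical, and derivatives are represented as plain labels where a real workflow would build molecular structures and re-dock them.

```python
from itertools import product


def select_hits(scores, threshold=-8.0):
    """Keep ligands whose docking score is below the threshold (lower is better)."""
    return sorted(name for name, s in scores.items() if s < threshold)


def enumerate_derivatives(lead, sites, fragments):
    """List every site/fragment combination as a candidate label for re-docking."""
    return [f"{lead}@{site}+{frag}" for site, frag in product(sites, fragments)]


scores = {"ligA": -9.1, "ligB": -7.5, "ligC": -8.4}
hits = select_hits(scores)
candidates = enumerate_derivatives("ligA", ["side_chain_OH"], ["fragment_1", "fragment_2"])
print(hits)
print(candidates)
```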

This embodiment optimized the STAT3 inhibitor shikonin using the method described above. STAT3 inhibitors, whether synthetic or naturally occurring, share a common structural feature: they contain a naphthoquinone scaffold, as does shikonin. First, shikonin was docked into the binding site of STAT3 and the interaction between shikonin and STAT3 was analyzed. Based on this interaction, a number of structurally diverse shikonin derivatives were designed, taking the hydroxyl group on the shikonin side chain as the starting point and marking the SHK chain as the substitution site. The shikonin derivatives were then docked into the binding site of STAT3, and for the higher-scoring molecules the RMSD between their conformation and that of the lead compound was calculated; the higher-scoring molecules were found to have smaller RMSD values (that is, a better scoring function). Higher-scoring molecules were selected for synthesis and biological activity evaluation, and the newly designed molecules were finally found to have better inhibitory activity against MDA-MB-231 cells and potential anti-breast-cancer activity. Molecules designed in this way have a simpler structure than the lead compound and act better on the target. The molecular structures of the shikonin derivatives are shown in FIG. 4.

In conclusion, the present invention uses existing marine compound molecules as ligand data and combines computer-aided drug design with bioinformatics methods to optimize molecular structures. Starting from an existing virtual screening algorithm, it exploits the computing power of a heterogeneous many-core system, and by optimizing the I/O structure it reduces the pressure on the system and improves its storage efficiency. MPI parallel technology improves the parallel efficiency of screening. Optimizing the scoring function and the conformation search improves the accuracy of docking results, shortens the conformation search time during docking and improves docking efficiency, so that both the effectiveness and the efficiency of virtual screening based on this method are improved.

The present invention is not limited to the above examples, and any modifications, substitutions, additions, etc. made within the spirit of the present invention are intended to be included within the scope of the present invention.

Claims

1. A parallel high-precision marine drug screening method based on a heterogeneous many-core architecture, characterized by comprising: constructing a heterogeneous many-core high-performance computing architecture, input/output optimization and parallelization of a high-throughput drug screening program, scoring, and optimizing the molecular structure;

constructing the heterogeneous many-core high-performance computing architecture is as follows: in the virtual screening calculation, an iterated local search global optimization algorithm is used to search the conformation of the ligand after a rotation; the binding free energy between the rotated conformation and the corresponding receptor is calculated according to a scoring function, the number of steps in the search is determined adaptively according to the docking complexity, and several searches are performed starting from random conformations; the main control flow runs on the main core, the conformation search is computed on the slave cores, each slave core repeatedly performs the energy calculation for conformations, the results are finally returned to the main core, the best conformations are then selected, and finally several top-scoring conformations are screened out, thereby achieving parallel acceleration inside the virtual drug screening program;

the input/output optimization and parallelization of the high-throughput drug screening program are as follows: first, an automation module generates an input containing information on multiple ligands, and the docking calculations between the ligands in that input and the receptor are then performed in sequence to produce an output containing multiple calculation results;

optimizing the molecular structure is as follows: for ligand molecules with good scoring results, the molecular structure is optimized by combining reaction-based fragment growing with scaffold hopping based on isosteric replacement; the scaffold of the lead compound is modified under the guidance of the receptor protein structure while the main functional groups that interact with the receptor protein are retained; an atom or group in the ligand molecule is selected as the reaction site, classical chemical reactions are used for fragment growing, and the resulting compounds are finally enumerated and optimized according to how well they match the protein pocket.

2. The parallel high-precision marine drug screening method based on a heterogeneous many-core architecture according to claim 1, wherein the rotation includes changes in the position and orientation of the ligand and changes in the torsion values of the active rotatable bonds and flexible residues in the ligand.