CN116504302A - Novel hepatitis B virus capsid assembly regulator de novo design and virtual screening method based on generation model and computational chemistry

Info

Publication number
CN116504302A
Authority
CN
China
Prior art keywords
hbv
capsid protein
capsid
hepatitis
screening
Legal status
Granted
Application number
CN202310736846.0A
Other languages
Chinese (zh)
Other versions
CN116504302B (en)
Inventor
克里斯托夫布奇
熊有金
王毅庆
Current Assignee
Nanjing University
Original Assignee
Nanjing University
Application filed by Nanjing University
Priority to CN202310736846.0A
Publication of CN116504302A
Application granted
Publication of CN116504302B
Status: Active

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B15/00 ICT specially adapted for analysing two-dimensional or three-dimensional molecular structures, e.g. structural or functional relations or structure alignment
    • G16B15/20 Protein or domain folding
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16C COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
    • G16C10/00 Computational theoretical chemistry, i.e. ICT specially adapted for theoretical aspects of quantum chemistry, molecular mechanics, molecular dynamics or the like
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00 Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10 Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Abstract

The invention discloses a method for de novo design and virtual screening of novel hepatitis B virus (HBV) capsid assembly modulators based on a generative model and computational chemistry. The method comprises the following steps: predicting the dimer structure of the full-length wild-type HBV core capsid protein; generating a candidate small molecule database with a pre-trained GENTRL generative model; performing scaffold-hopping screening using Lipinski's rule of five combined with structural similarity; screening for small molecules with favorable binding modes; and calculating the binding free energy between each small molecule and the HBV capsid protein to select capsid assembly modulators with anti-HBV activity. Through scaffold hopping, molecular docking, and capsid protein stability analysis based on molecular dynamics simulation, the invention screens for effective HBV capsid assembly modulators. It generates new molecules efficiently, broadens the chemical space covered by the candidate molecules, captures potential capsid assembly modulators more accurately through C-terminal domain (CTD) stability analysis, and markedly accelerates the discovery of lead compounds.

Description

Novel hepatitis B virus capsid assembly regulator de novo design and virtual screening method based on generation model and computational chemistry
Technical Field
The invention relates to the field of virtual screening of lead compounds, in particular to a novel hepatitis B virus capsid assembly regulator de novo design and virtual screening method based on generation model and computational chemistry.
Background
Hepatitis B virus (HBV) is an infectious virus. There are about 300 million HBV carriers worldwide, and about 1 million people die each year from cirrhosis, hepatocellular carcinoma and related complications caused by HBV infection. The anti-HBV drugs currently on the market are mainly interferons and nucleoside analogues: interferons act on the transcription process, and nucleoside analogues act on the reverse transcription process. Neither class can eradicate intracellular cccDNA, so long-term administration is required. Interferons are expensive and have side effects, nucleoside analogues readily induce drug resistance, and the prognosis of HBV infection remains unsatisfactory, so novel anti-HBV drugs are urgently needed.
The HBV capsid core protein monomer consists of 183 amino acids and comprises two distinct domains: residues 1-149 form the N-terminal domain (NTD) and residues 150-183 form the C-terminal domain (CTD). The CTD is highly flexible and plays multiple roles in the HBV life cycle. It contains several arginine-rich regions that interact with RNA to initiate capsid assembly. In the existing capsid protein structures the CTD is partially missing, and the mechanism of action between the full-length protein and capsid assembly modulators (CAMs) has not been explored.
Studies have shown that capsid assembly modulators (CAMs) are a new class of anti-HBV agents that act on the capsid protein, either accelerating capsid assembly or inducing aberrant assembly. Acting as the virus enters the cell or during capsid assembly, they leave the viral DNA exposed to the cytosol, where it is degraded enzymatically. This offers an opportunity to eradicate intracellular cccDNA and thereby cure hepatitis B.
Virtual screening combines structural biology and computational chemistry to accelerate the discovery of lead compounds. It follows ligand-based and receptor-based drug design approaches and can speed up the screening and discovery of molecules active against a target using existing small-molecule data and protein structures. In the prior art, CAMs have been discovered slowly, mainly by derivatizing known scaffolds. Binding modes between small molecules and the capsid protein are predicted mainly by docking small molecules into capsid protein structures that lack the CTD. Although the binding site is well defined, the mechanism of action between the small molecules and the capsid protein remains unclear, and docking scores show no correlation with small-molecule activity. The prior art therefore has limited ability to screen for CAMs and lacks unified, well-defined screening criteria, so constructing a new virtual screening method for potential anti-HBV drugs is of great significance.
Disclosure of Invention
To address the problems in the related art, the invention provides a method for de novo design and virtual screening of novel hepatitis B virus capsid assembly modulators based on a generative model and computational chemistry. Building on a new mechanism of action between CAMs and the HBV capsid protein, the method uses the GENTRL molecular generation model to learn the features of existing target molecules and generate a candidate small molecule database with the target properties. De novo design and virtual screening of novel CAMs are then achieved through scaffold hopping, molecular docking, molecular dynamics simulation and free energy calculation, which effectively overcomes the technical problems of the prior art.
For this purpose, the invention adopts the following specific technical scheme:
A method for de novo design and virtual screening of novel hepatitis B virus capsid assembly modulators based on a generative model and computational chemistry, the method comprising the following steps:
S1, constructing the full-length hepatitis B capsid protein structure: obtaining the amino acid sequence of the full-length wild-type core capsid protein of the hepatitis B virus, and predicting the dimer structure of the full-length wild-type HBV core capsid protein;
S2, generating and constructing a candidate small molecule database: training a GENTRL generative model on the obtained training compound set, and generating a candidate small molecule database with the trained GENTRL generative model;
S3, constructing a scaffold-hopping model and screening: pre-screening the candidate small molecule database with Lipinski's rule of five, calculating the Euclidean distance between each database molecule and the target molecule based on WHALES descriptors, and performing scaffold-hopping screening according to structural similarity;
S4, activity screening and binding mode prediction based on molecular docking: docking the small molecules to the HBV capsid protein with molecular docking software, predicting the binding mode between each small molecule and the HBV capsid protein, and selecting small molecules with favorable binding modes;
S5, structure-activity relationship prediction and screening based on molecular dynamics simulation: analyzing the stabilizing effect of each small molecule on the C-terminal domain of the HBV capsid protein using molecular dynamics simulation software together with a trajectory analysis package, constructing a structure-activity relationship model, predicting the EC50 of the small molecule, calculating the binding free energy between the small molecule and the HBV capsid protein, and selecting capsid assembly modulators with anti-HBV activity.
Preferably, obtaining the amino acid sequence of the full-length wild-type core capsid protein of the hepatitis B virus and predicting the dimer structure of the full-length wild-type HBV core capsid protein comprises the following steps:
S11, obtaining the amino acid sequence of the full-length wild-type core capsid protein of the hepatitis B virus from the NCBI bioinformatics database;
S12, predicting the dimer structure of the full-length wild-type HBV core capsid protein with the homomultimer prediction model of AlphaFold2, and performing energy minimization.
Preferably, the training compound set is obtained from the ChEMBL and ZINC databases;
the training data in the training compound set comprise extended-connectivity fingerprints of HBV capsid assembly modulators, general capsid modulators and random ZINC molecules;
training the GENTRL generative model involves a variational autoencoder, a latent-space probability distribution, a generator, and a reward function based on an SVM classification algorithm.
The activity threshold for the HBV capsid assembly modulators and the general capsid modulators is set to 10000 nM; the molecular weight and lipid-water partition coefficient distributions of the random ZINC molecules are matched to those of the capsid assembly modulators; and the extended-connectivity fingerprint is a Morgan fingerprint with a radius of 2 and 2048 bits.
Preferably, pre-screening the candidate small molecule database with Lipinski's rule of five, calculating the Euclidean distance between each database molecule and the target molecule based on WHALES descriptors, and performing scaffold-hopping screening according to structural similarity comprises the following steps:
S31, pre-screening the candidate small molecule database with Lipinski's rule of five, and generating 3D structures for the small molecules with RDKit and Open Babel;
S32, calculating 3D descriptors of the small molecules with WHALES, obtaining the Euclidean distance between each database molecule and the target compound, and performing scaffold hopping according to the Euclidean distance ranking.
Preferably, docking the small molecules to the HBV capsid protein with molecular docking software comprises:
using the full-length wild-type HBV core capsid protein dimer predicted by AlphaFold2 as the receptor structure, and pretreating the receptor with Chimera and Maestro, including three-dimensional structure optimization, hydrogen addition and atomic charge calculation;
pretreating the small-molecule ligands with RDKit and Open Babel;
docking with SMINA, which scores each small molecule by affinity and generates 9 different docking poses per molecule; the top-scoring pose gives the molecule's docking score, and the 10 top-ranked compounds are selected for subsequent molecular dynamics simulation screening.
The invention also relates to a novel mechanism of action between CAMs and the HBV capsid protein. Specifically, the stability of the CTD is the key determinant of the HBV capsid protein assembly rate: CAMs bind to the active site of the HBV capsid protein and accelerate its assembly by stabilizing the CTD, so that empty HBV capsids are formed and HBV replication is inhibited.
Preferably, the mechanism is validated with five small molecules known to be CAMs, namely AT-130, GLP-26, NVR-3-778, BAY-41-4109 and SPA, using 30 ns molecular dynamics simulations and trajectory analysis to calculate CTD stability.
Preferably, analyzing the stabilizing effect of small molecules on the C-terminal domain of the HBV capsid protein using molecular dynamics simulation software together with a trajectory analysis package comprises the following steps:
preparing the simulation input files with CHARMM-GUI, and simulating 30 ns with the CHARMM36 force field and OpenMM to produce a 300-frame trajectory file;
saving the interaction between the HBV capsid protein and the small molecule as a dcd file containing the 3D trajectory, wherein the dcd file records the position of every atom of the HBV capsid protein and the ligand in each of the 300 frames of the simulation;
reading the dcd file with MDTraj and calculating stability indices for the C-terminal domain of the HBV capsid protein, wherein the stability indices are the RMSD and RMSF of residues 150-183:
$$\mathrm{RMSD} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\delta_i^{2}}, \qquad \mathrm{RMSF}_i = \sqrt{\frac{1}{T}\sum_{t_j=1}^{T}\left|\mathbf{x}_i(t_j)-\tilde{\mathbf{x}}_i\right|^{2}}$$
wherein $N$ is the total number of atoms; $\delta_i^{2}$ is the squared positional offset between the i-th atom of the current frame and the i-th atom of the target frame, i.e. the sum of the squared offsets along the X, Y and Z axes; $T$ is the total simulation duration; $\mathbf{x}_i(t_j)$ is the Cartesian coordinates of atom $i$ at time $t_j$; and $\tilde{\mathbf{x}}_i$ is the Cartesian coordinates of atom $i$ at the initial moment.
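The two stability indices can be illustrated with a minimal pure-Python sketch. It assumes the frames have already been superposed and restricted to the atoms of residues 150-183, and it takes the first frame as the reference, following the "initial moment" wording above. The function names and the toy coordinates are hypothetical; in practice a trajectory library would supply the coordinates.

```python
import math

def rmsd(frame, reference):
    """RMSD between one frame and a reference frame (lists of (x, y, z) tuples)."""
    n_atoms = len(frame)
    squared_offsets = sum(
        (a - b) ** 2
        for atom, ref_atom in zip(frame, reference)
        for a, b in zip(atom, ref_atom)
    )
    return math.sqrt(squared_offsets / n_atoms)

def rmsf(trajectory, reference):
    """Per-atom RMSF over all frames, measured from the reference positions."""
    n_frames = len(trajectory)
    values = []
    for i, ref_atom in enumerate(reference):
        squared_offsets = sum(
            sum((a - b) ** 2 for a, b in zip(frame[i], ref_atom))
            for frame in trajectory
        )
        values.append(math.sqrt(squared_offsets / n_frames))
    return values

# Toy trajectory (hypothetical): 2 atoms, 3 frames; only atom 0 moves, in frame 1.
trajectory = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 3.0, 4.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
]
reference = trajectory[0]
rmsd_per_frame = [rmsd(frame, reference) for frame in trajectory]
rmsf_per_atom = rmsf(trajectory, reference)
```

RMSD gives one value per frame (how far the whole domain has drifted), while RMSF gives one value per atom (how much each atom fluctuates over the run), so the two indices capture complementary aspects of CTD stability.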
Preferably, the stability calculation is grounded in extensive preliminary molecular dynamics simulations of the HBV capsid protein bound to known CAMs, which revealed a new mechanism of action between CAMs and the HBV capsid protein: CAMs accelerate capsid assembly by stabilizing the C-terminal domain of the HBV capsid protein.
Preferably, the structure-activity relationship model performs a t-test between the C-terminal domain RMSD of the ligand-free protein simulation system and the C-terminal domain RMSD of the protein simulation system with a bound small-molecule ligand, calculates a p-value, and predicts the EC50 of the small molecule from the p-value:
$$t = \frac{\bar{x}-\bar{y}}{\sqrt{S_1^{2}/m + S_2^{2}/n}}$$
wherein $\bar{x}$ and $\bar{y}$ are the means of the two RMSD samples, $m$ and $n$ are the sizes of the two data sets, and $S_1^{2}$ and $S_2^{2}$ are unbiased estimates of the variances of the two data sets. The statistic $t$ is calculated with the formula, the p-value is obtained from a t-test table, and molecules with a t-test p-value below 0.05 proceed to the subsequent free energy calculation step.
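A minimal sketch of the two-sample t statistic follows, assuming the unequal-variance (Welch) form, which uses only the quantities defined above: the two sample means, the sizes m and n, and the unbiased variances. The function name and the RMSD values are hypothetical. The p-value lookup is deliberately left out, since it comes from a t-table or a statistics library using the returned degrees of freedom.

```python
import math
from statistics import mean, variance

def welch_t(sample_a, sample_b):
    """Return (t, degrees of freedom) for two independent samples.

    Assumption: Welch's unequal-variance form. statistics.variance
    is the unbiased (n - 1) estimator.
    """
    m, n = len(sample_a), len(sample_b)
    var_a, var_b = variance(sample_a), variance(sample_b)
    se_squared = var_a / m + var_b / n
    t = (mean(sample_a) - mean(sample_b)) / math.sqrt(se_squared)
    # Welch-Satterthwaite approximation for the degrees of freedom
    dof = se_squared ** 2 / (
        (var_a / m) ** 2 / (m - 1) + (var_b / n) ** 2 / (n - 1)
    )
    return t, dof

# Hypothetical CTD RMSD samples: ligand-free (apo) vs ligand-bound (holo)
apo_rmsd = [2.0, 4.0, 6.0, 8.0]
holo_rmsd = [1.0, 2.0, 3.0, 4.0]
t_stat, dof = welch_t(apo_rmsd, holo_rmsd)
```

A positive t here means the ligand-free system has the larger mean RMSD, which is the direction expected if the ligand stabilizes the CTD.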
Preferably, the binding free energy calculation is based on the dcd file and the simulation input files generated by the simulation: the binding free energy between the small-molecule ligand and the HBV capsid protein is calculated with ParmEd and AMBER, and final lead compounds are selected for biological activity verification by comparison with the binding free energy of known capsid assembly modulators.
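Wherein, the binding free energy is calculated as:
$$\Delta G_{\mathrm{bind,solv}} = \Delta G_{\mathrm{bind,vacuum}} + \Delta G_{\mathrm{solv,complex}} - \left(\Delta G_{\mathrm{solv,ligand}} + \Delta G_{\mathrm{solv,receptor}}\right)$$
wherein $\Delta G_{\mathrm{bind,solv}}$: binding free energy of the protein receptor and ligand in the solvent system;
$\Delta G_{\mathrm{bind,vacuum}}$: binding free energy of the protein receptor and ligand in the vacuum system;
$\Delta G_{\mathrm{solv,complex}}$: solvation free energy of the protein-ligand complex in the solvent system;
$\Delta G_{\mathrm{solv,ligand}}$: solvation free energy of the ligand in the solvent system;
$\Delta G_{\mathrm{solv,receptor}}$: solvation free energy of the protein receptor in the solvent system.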
Preferably, the binding free energy is compared with the binding free energy of the known capsid assembly modifier GLP-26 and the final lead compound is screened for biological activity verification.
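As a minimal sketch, the five terms defined above can be combined through the usual thermodynamic cycle, assuming the solvated binding free energy equals the vacuum binding free energy plus the solvation free energy of the complex minus those of the separated ligand and receptor. The function name and all numeric values, including the reference value standing in for a known CAM such as GLP-26, are hypothetical.

```python
def binding_free_energy_solvated(dg_bind_vacuum, dg_solv_complex,
                                 dg_solv_ligand, dg_solv_receptor):
    """Combine the cycle terms (all in kcal/mol) into the solvated binding free energy."""
    return dg_bind_vacuum + dg_solv_complex - (dg_solv_ligand + dg_solv_receptor)

# Hypothetical term values for one candidate molecule
dg_candidate = binding_free_energy_solvated(
    dg_bind_vacuum=-60.0,
    dg_solv_complex=-500.0,
    dg_solv_ligand=-20.0,
    dg_solv_receptor=-510.0,
)

# Hypothetical reference value for a known CAM; a more negative value
# means stronger predicted binding than the reference.
dg_reference = -25.0
binds_at_least_as_well = dg_candidate <= dg_reference
```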
The beneficial effects of the invention are as follows: by constructing a de novo design and screening method, a GENTRL model trained on existing small-molecule data generates a candidate data set of new small molecules that have not appeared before, and effective HBV capsid assembly modulators are selected through scaffold hopping, molecular docking, and capsid protein stability analysis based on molecular dynamics simulation. The method generates new molecules efficiently, broadens the chemical space covered by the candidate molecules, captures potential capsid assembly modulators more accurately through CTD stability analysis, and markedly accelerates the discovery of lead compounds.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are needed in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of the method for de novo design and virtual screening of novel hepatitis B virus capsid assembly modulators based on a generative model and computational chemistry according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of the method according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of the structure of the full-length wild-type HBV capsid protein in the method according to an embodiment of the present invention;
FIG. 4 is a structural diagram of the GENTRL generative model in the method according to an embodiment of the present invention;
FIG. 5 is a flow chart of scaffold hopping in the method according to an embodiment of the present invention;
FIG. 6 is a visualization of the novel mechanism of action between CAMs and the HBV capsid protein in the method according to an embodiment of the present invention;
FIG. 7 is a flow chart of the molecular dynamics simulation in the method according to an embodiment of the present invention;
FIG. 8 is a schematic diagram of the CTD stability calculation in the method according to an embodiment of the present invention;
FIG. 9 is a flow chart of the method according to the present invention.
Detailed Description
To further illustrate the embodiments, the present invention provides the accompanying drawings, which form part of the disclosure. The drawings mainly illustrate the embodiments and, together with the description, explain their operating principles. With reference to them, a person skilled in the art will recognize other possible implementations and the advantages of the present invention. Elements in the drawings are not drawn to scale, and like reference numerals generally designate like elements.
In accordance with embodiments of the present invention, a method for de novo design and virtual screening of novel hepatitis B virus capsid assembly modulators based on a generative model and computational chemistry is provided.
The present invention is further described below with reference to the accompanying drawings and the detailed description. As shown in FIGS. 1 to 9, a method for de novo design and virtual screening of novel hepatitis B virus capsid assembly modulators based on a generative model and computational chemistry according to an embodiment of the present invention comprises the following steps:
S1, constructing the full-length hepatitis B capsid protein structure: obtaining the amino acid sequence of the full-length wild-type core capsid protein of the hepatitis B virus and predicting the dimer structure of the full-length wild-type HBV core capsid protein, which comprises the following steps:
S11, obtaining the amino acid sequence of the full-length wild-type core capsid protein of the hepatitis B virus from the NCBI bioinformatics database;
S12, predicting the dimer structure of the full-length wild-type HBV core capsid protein with the homomultimer prediction model of AlphaFold2, and performing energy minimization;
The HBV capsid protein is a key component of the HBV structure. The HBV capsid is an icosahedron about 22 nm in size. Each core protein monomer consists of 183 amino acid residues and is divided into two domains: the N-terminal domain (NTD), residues 1-149, and the C-terminal domain (CTD), residues 150-183. The HBV capsid protein structures resolved so far contain only amino acids 1-155.
Using the AlphaFold2 multimer model, the two monomer amino acid sequences were input and the model's multimer_model_max_num_recycles parameter was set to 3. The model was then run and the top-ranked output model was selected. The resulting full-length HBV capsid protein structure is shown in FIG. 3.
S2, generating and constructing a candidate small molecule database: training a GENTRL generative model on the obtained training compound set, and generating a candidate small molecule database with the trained GENTRL generative model;
the training data in the training compound set comprise extended-connectivity fingerprints of HBV capsid assembly modulators, general capsid modulators and random ZINC molecules; training the GENTRL generative model involves a variational autoencoder, a latent-space probability distribution, a generator, and a reward function based on an SVM classification algorithm. The activity threshold for the HBV capsid assembly modulators and the general capsid modulators is set to 10000 nM; the molecular weight and lipid-water partition coefficient distributions of the random ZINC molecules are matched to those of the capsid assembly modulators; and the extended-connectivity fingerprint is a Morgan fingerprint with a radius of 2 and 2048 bits.
To generate novel small-molecule structures that do not yet exist while ensuring that they have properties similar to those of the target molecules, the invention designs a molecular generation algorithm model whose structure is shown in FIG. 4.
The training data were obtained as follows. Known assembly modulators of the HBV capsid were collected from the ChEMBL database as training data 1, and general capsid assembly modulators as training data 2. The mean molecular weight and mean lipid-water partition coefficient of data set 1 were then calculated with RDKit, and 100,000 random molecules matching these mean values were selected from the ZINC data set as training data 3. Structures containing atoms other than carbon, nitrogen, oxygen, sulfur, fluorine, chlorine, bromine and hydrogen were removed from all three data sets, and the conventional medicinal chemistry filters (MCF) and PAINS filters were applied to exclude compounds with potentially toxic or reactive groups. Finally, the SMILES of all molecules were normalized uniformly so that every molecule is written in the same SMILES encoding direction.
Next, a variational autoencoder and a prior distribution are trained on the three data sets. The lipid-water partition coefficient (logP) and the synthetic accessibility score (SAscore) are important molecular properties for judging whether a molecule is drug-like, and they matter in drug discovery, agrochemical discovery and related fields. The molecular property used in the model is therefore the penalized lipid-water partition coefficient (penalized logP), calculated as:
$$\text{penalized logP} = \log P - \mathrm{SAscore} - \mathrm{rings6}$$
wherein rings6 is a penalty for molecules whose carbocycles contain more than 6 atoms, which avoids indiscriminately generating unrealistic macrocycles. Training proceeds in two stages: first, the model is trained on the ZINC molecular data set so that it learns the features of conventional molecules; second, it is trained on data set 1 and data set 2 together so that it learns the target-specific features. Training yields a mapping from chemical space to latent space, and this mapping also captures the relationship between molecules and their properties.
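A minimal sketch of the penalized logP score follows, assuming the commonly used form (logP minus SAscore minus a ring penalty) and assuming the ring penalty equals the number of atoms by which the largest ring exceeds six. Both assumptions, the function names, and the example values are illustrative. In practice the logP, SAscore and ring sizes would come from a cheminformatics toolkit; here they are passed in directly.

```python
def ring_penalty(ring_sizes):
    """Penalty for oversized rings: atoms beyond six in the largest ring, else 0."""
    largest = max(ring_sizes, default=0)
    return max(largest - 6, 0)

def penalized_logp(logp, sa_score, ring_sizes):
    """Penalized logP: rewards lipophilicity, penalizes hard synthesis and macrocycles."""
    return logp - sa_score - ring_penalty(ring_sizes)

# Hypothetical descriptor values for two molecules with the same logP and SAscore
drug_like = penalized_logp(logp=3.2, sa_score=2.5, ring_sizes=[6, 5])
macrocycle = penalized_logp(logp=3.2, sa_score=2.5, ring_sizes=[6, 12])
```

With identical logP and SAscore, the molecule containing a 12-membered ring scores six units lower, which is how the penalty steers the generator away from unrealistic macrocycles.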
In this example, the models before and after optimization were each sampled randomly 50000 times from the latent space obtained by training. Structures containing atoms other than carbon, nitrogen, oxygen, sulfur, fluorine, chlorine, bromine and hydrogen were removed, and the conventional medicinal chemistry filters (MCF) and PAINS filters were applied to exclude compounds with potentially toxic or reactive groups, yielding the candidate small molecule data set.
S3, constructing a scaffold-hopping model and screening: pre-screening the candidate small molecule database with Lipinski's rule of five, calculating the Euclidean distance between each database molecule (i.e. a molecule in the candidate small molecule database) and the target molecule (i.e. the known capsid assembly modulator AT-130) based on WHALES descriptors, and performing scaffold-hopping screening according to structural similarity;
wherein pre-screening the candidate small molecule database with Lipinski's rule of five, calculating the Euclidean distance between each database molecule and the target molecule based on WHALES descriptors, and performing scaffold-hopping screening according to structural similarity comprises the following steps:
S31, pre-screening the candidate small molecule database with Lipinski's rule of five, and generating 3D structures for the small molecules with RDKit and Open Babel;
S32, calculating 3D descriptors of the small molecules with WHALES, obtaining the Euclidean distance between each database molecule and the target compound, and performing scaffold hopping according to the Euclidean distance ranking.
The flow of the scaffold-hopping model is shown in FIG. 5. Before scaffold hopping, the invention uses RDKit to calculate the number of hydrogen bond donors, the number of hydrogen bond acceptors, the molecular weight, the number of rotatable bonds and the lipid-water partition coefficient of each candidate small molecule.
Preferably, the rule-of-five screening thresholds are: hydrogen bond donors ≤ 5, hydrogen bond acceptors ≤ 10, molecular weight ≤ 500 Da, rotatable bonds ≤ 10, and lipid-water partition coefficient ≤ 5.
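The thresholds above translate directly into a filter. The sketch below assumes the five descriptors have already been computed for each molecule (for example with a cheminformatics toolkit) and only applies the cut-offs; the `Descriptors` class, the function name and the example molecules are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Descriptors:
    """Precomputed descriptors for one candidate molecule (hypothetical container)."""
    name: str
    h_donors: int
    h_acceptors: int
    mol_weight: float      # Da
    rotatable_bonds: int
    logp: float

def passes_rule_of_five(d):
    """Apply the five screening thresholds stated in the text."""
    return (
        d.h_donors <= 5
        and d.h_acceptors <= 10
        and d.mol_weight <= 500.0
        and d.rotatable_bonds <= 10
        and d.logp <= 5.0
    )

# Hypothetical candidates: mol_b has too many donors, mol_c is too heavy
candidates = [
    Descriptors("mol_a", 2, 5, 350.4, 4, 3.1),
    Descriptors("mol_b", 6, 5, 420.0, 3, 2.0),
    Descriptors("mol_c", 1, 4, 512.6, 8, 4.2),
]
kept = [d.name for d in candidates if passes_rule_of_five(d)]
```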
Optionally, constructing the scaffold-hopping model requires a 3D structural model of each molecule. The invention uses the RDKit package and Open Babel to generate three-dimensional structures for the molecules that pass the rule-of-five screen, optimizes them with RDKit's molecule embedding function and the MMFF94 force field, and then calculates the Gasteiger charge of each atom. AT-130 is used as the target molecule, and its charges and three-dimensional structure are prepared in the same way. The do_whales module is then used to calculate the Mahalanobis distances and the WHALES descriptors of the template target molecule and the candidate database molecules.
Based on the WHALES descriptors of the target molecule and the candidate database molecules, the Euclidean distance module Euclidean_distances is used to calculate the Euclidean distance between the target molecule and each candidate. The candidates are sorted by increasing distance, and the top 20% of compounds are selected for molecular docking screening.
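The ranking step can be sketched in a few lines of standard-library Python. The descriptor vectors below are hypothetical three-dimensional stand-ins for real WHALES descriptors, and the function name is illustrative; only the logic (Euclidean distance to the target, ascending sort, keep the closest 20%) follows the text.

```python
import math

def top_fraction_by_distance(target, library, fraction=0.2):
    """Rank library molecules by Euclidean distance to the target descriptor
    vector and return the names of the closest `fraction` of them."""
    ranked = sorted(library.items(), key=lambda item: math.dist(target, item[1]))
    n_keep = max(1, round(len(ranked) * fraction))
    return [name for name, _ in ranked[:n_keep]]

# Hypothetical descriptor vectors (real WHALES vectors are much longer)
target_descriptor = [0.0, 0.0, 0.0]
library = {
    "mol_a": [3.0, 4.0, 0.0],   # distance 5.0
    "mol_b": [1.0, 0.0, 0.0],   # distance 1.0
    "mol_c": [0.0, 2.0, 0.0],   # distance 2.0
    "mol_d": [6.0, 8.0, 0.0],   # distance 10.0
    "mol_e": [0.0, 0.0, 0.5],   # distance 0.5
}
selected = top_fraction_by_distance(target_descriptor, library)
```

A smaller distance means the candidate's 3D shape and charge distribution resemble the target's, even when the underlying scaffold differs, which is the point of scaffold hopping.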
S4, activity screening and binding mode prediction based on molecular docking: docking the small molecules to the HBV capsid protein with molecular docking software, predicting the binding mode between each small molecule and the HBV capsid protein, and selecting small molecules with favorable binding modes;
wherein docking the small molecules to the HBV capsid protein with molecular docking software comprises:
using the full-length wild-type HBV core capsid protein dimer predicted by AlphaFold2 as the receptor structure, and pretreating the receptor with Chimera and Maestro, including three-dimensional structure optimization, hydrogen addition and atomic charge calculation; pretreating the small-molecule ligands with RDKit and Open Babel;
docking with SMINA, which scores each small molecule by affinity and generates 9 different docking poses per molecule; the top-scoring pose gives the molecule's docking score, and the 10 top-ranked compounds are selected for subsequent molecular dynamics simulation screening.
Preferably, the protein receptor of the present invention is obtained from a model of full-length wild-type HBV capsid protein structure predicted by Alpha Fold2, and the monomer length is a dimer structure of 183 amino acid residues. The docking binding pocket was set using the position of small molecule ligand binding in the PDB ID 5T2P structure.
Protein pretreatment: hydrogens and Gasteiger charges are added to the protein receptor with the Dock Prep module of the Chimera software, and the protein energy is then minimized with the Minimization module. Small molecule ligand pretreatment: a 2D structure of each molecule is generated from its SMILES string with an RDKit function and hydrogens are added, with the maxAttempts parameter set to 100 and the random seed set to 0xf00D; the 2D structure is then optimized into a 3D structure using the UFF force field, with a maximum of 1000 iterations. The small molecule ligands are then charged and format-converted using OPENBABEL. Docking is performed with Vina and its extension program SMINA, with a random seed of 0, the original small molecule ligand binding site as the docking site, and an exhaustiveness parameter of 24. Each small molecule is scored by affinity, and the 10 top-ranked compounds are taken for subsequent molecular dynamics simulation screening.
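As an illustration of the docking settings above, the following sketch assembles a SMINA command line without running it. The seed, exhaustiveness, and number of poses come from the text; the file names are hypothetical, and defining the search box with `--autobox_ligand` around a reference ligand is an assumption about how the original binding site might be specified, not something the text states.

```python
import shlex

def build_smina_command(receptor, ligand, reference_ligand, out_path):
    """Assemble (but do not run) a smina invocation with the stated settings."""
    return [
        "smina",
        "--receptor", receptor,
        "--ligand", ligand,
        # Assumption: the search box is placed around a reference ligand pose.
        "--autobox_ligand", reference_ligand,
        "--exhaustiveness", "24",
        "--seed", "0",
        "--num_modes", "9",
        "--out", out_path,
    ]

# Hypothetical file names.
cmd = build_smina_command(
    "receptor.pdbqt", "ligand.pdbqt", "reference_ligand.pdb", "docked.sdf"
)
print(shlex.join(cmd))
```

Building the argument list separately from execution keeps the settings easy to inspect and log before a batch of ligands is docked.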
S5, structure-activity relationship prediction and screening based on molecular dynamics simulation: analyzing the stabilizing effect of the small molecules on the C-terminal domain (CTD, carbon end domain) of the HBV capsid protein using molecular dynamics simulation software combined with trajectory analysis packages, constructing a structure-activity relationship model, predicting the EC50 of the small molecules, calculating the binding free energy of the small molecules with the HBV capsid protein, and selecting capsid assembly regulators with anti-HBV activity.
Preferably, the input structure for the molecular dynamics simulation of the present invention is the protein-ligand complex predicted in S4, and the molecular dynamics simulation steps are shown in FIG. 7. The simulation input files are prepared with the Solution Builder of the CHARMM-GUI online server. Suitable periodic boundaries and a water box are created for each biological system, with the water box boundary set more than 10 Å from the protein boundary. The water box is filled using the TIP3P water model, and K⁺ and Cl⁻ ions are added to neutralize excess charge in the system, keeping the final KCl concentration at 0.15 M.
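To give a sense of scale for the solvation settings above, the sketch below estimates the box edge and the number of ion pairs corresponding to 0.15 M. It is a back-of-the-envelope calculation under assumptions not stated in the text: a cubic box, a hypothetical protein extent of 80 Å, exactly 10 Å of padding on each side, and no correction for the volume occupied by the solute or for extra neutralizing counterions. CHARMM-GUI's own ion placement may differ.

```python
AVOGADRO = 6.02214076e23            # particles per mole
LITERS_PER_CUBIC_ANGSTROM = 1e-27   # 1 Å^3 = 1e-30 m^3 = 1e-27 L

def box_edge(protein_extent, padding=10.0):
    """Edge length (Å) of a cubic box with `padding` Å on each side of the protein."""
    return protein_extent + 2 * padding

def ion_pairs(edge, molar=0.15):
    """Approximate number of ion pairs giving `molar` mol/L in a cubic box."""
    volume_liters = edge ** 3 * LITERS_PER_CUBIC_ANGSTROM
    return round(molar * AVOGADRO * volume_liters)

edge = box_edge(80.0)     # hypothetical protein extent -> 100 Å box edge
pairs = ion_pairs(edge)   # roughly 90 K+/Cl- pairs for this box
print(edge, pairs)
```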
The molecular simulation is run with the OPENMM software. The nonbonded method is Particle-Mesh Ewald (PME), constraints are applied to bonds involving hydrogen, the simulation temperature is 303.15 K, and the simulation pressure is normal atmospheric pressure. The simulation code is written entirely in Python, using packages such as OPENMM, pandas, and numpy. The energy of the system is first minimized with a minimization function, and an unrestrained 30 ns simulation of 15,000,000 steps is then run with OPENMM. A trajectory frame is saved every 50,000 steps, giving a 300-frame trajectory file: the interaction of the protein and the small molecule is written to a dcd file containing the 3D trajectory, that is, the positions of every atom of the HBV capsid protein and the ligand in each of the 300 frames. The energy during the simulation is recorded every 1,000 steps.
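The simulation numbers above are mutually consistent, which a few lines of arithmetic make explicit. Note that the 2 fs integration time step is implied by 30 ns over 15,000,000 steps and is not stated directly in the text; the variable names are illustrative.

```python
TOTAL_NS = 30
TOTAL_STEPS = 15_000_000
TRAJECTORY_INTERVAL = 50_000   # steps between saved frames
ENERGY_INTERVAL = 1_000        # steps between energy records

# 1 ns = 1,000,000 fs, so the implied time step is in femtoseconds.
timestep_fs = TOTAL_NS * 1_000_000 / TOTAL_STEPS
n_frames = TOTAL_STEPS // TRAJECTORY_INTERVAL
ns_per_frame = TOTAL_NS / n_frames
n_energy_records = TOTAL_STEPS // ENERGY_INTERVAL

print(timestep_fs, n_frames, ns_per_frame, n_energy_records)
```

Each saved frame therefore represents 0.1 ns of simulated time, which is the spacing used when stability indices are computed frame by frame.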
Preferably, all molecular system trajectory files generated by the OPENMM simulations are analyzed in Python using MDTraj, MDAnalysis, pandas, numpy, matplotlib, and other packages. Because of the periodic boundary conditions, each generated trajectory must first be structurally centered with MDTraj; MDTraj then reads the dcd file to calculate the stability indices of the capsid protein C-terminal domain (CTD, carbon end domain). The stability indices are the RMSD and RMSF of residues 150-183:

$$\mathrm{RMSD} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\delta_i^{2}}$$

$$\mathrm{RMSF}_i = \sqrt{\frac{1}{T}\sum_{t_j=1}^{T}\left|x_i(t_j)-\tilde{x}_i\right|^{2}}$$

where $N$ is the total number of atoms; $\delta_i^{2}$ is the sum of the squared position offsets between the $i$-th atom of the current frame and the $i$-th atom of the target frame, taken over the X, Y, and Z axes; $T$ is the total simulation duration; $x_i(t_j)$ is the Cartesian coordinates of atom $i$ at time $t_j$; and $\tilde{x}_i$ is the Cartesian coordinates of atom $i$ at the initial moment.
All trajectory frames are superimposed on the original structure and the RMSD over all atoms is calculated, while the CA atoms of the individual amino acids are selected to calculate the RMSF of each residue. The RMSD of the CTD is obtained by calculating the CTD RMSD between adjacent frame structures over successive time periods; the calculation process is shown in FIG. 8.
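The two stability indices can be illustrated with a small pure-Python sketch that follows the definitions in the text: RMSD averages squared atomic offsets over the N atoms of a frame, and RMSF averages one atom's squared deviation from its initial position over the frames. The sketch assumes the frames have already been centered and superimposed, uses the frame count in place of the total duration T, and works on hypothetical coordinates; in practice this analysis would be done with MDTraj on the dcd file.

```python
import math

def squared_offset(a, b):
    """Sum of squared X, Y, and Z offsets between two positions."""
    return sum((p - q) ** 2 for p, q in zip(a, b))

def rmsd(frame, reference):
    """Root-mean-square deviation of one frame from a reference frame."""
    n_atoms = len(frame)
    total = sum(squared_offset(a, r) for a, r in zip(frame, reference))
    return math.sqrt(total / n_atoms)

def rmsf(trajectory, atom_index, initial_position):
    """Root-mean-square fluctuation of one atom about its initial position."""
    n_frames = len(trajectory)
    total = sum(squared_offset(f[atom_index], initial_position) for f in trajectory)
    return math.sqrt(total / n_frames)

# Hypothetical two-atom system with two frames (coordinates in Å).
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
trajectory = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 3.0, 4.0), (1.0, 0.0, 0.0)],
]

print(rmsd(trajectory[1], reference))          # atom 0 moved by 5 Å, atom 1 did not move
print(rmsf(trajectory, 0, reference[0]))
print(rmsf(trajectory, 1, reference[1]))
```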
Preferably, the structure-activity relationship model performs a t-test between the C-terminal domain RMSD of the ligand-free protein simulation system and the C-terminal domain RMSD of the protein simulation system with a bound small molecule ligand, calculates the p-value, and predicts the small molecule EC50 from the p-value. The t statistic is calculated from the means of the two RMSD samples, the sizes m and n of the two data sets, and the unbiased estimates of the variances of the two data sets. The p-value is then obtained from a t-test table, and molecules with a t-test p-value below 0.05 are selected for the subsequent free energy calculation step.
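A minimal sketch of the two-sample comparison described above, assuming the unequal-variance (Welch) form of the t statistic; a pooled-variance form would use the same inputs (sample means, sizes m and n, and unbiased variance estimates), and the text does not settle which one is intended. The RMSD values are hypothetical, and the p-value lookup is left to a t table as in the text.

```python
import math
import statistics

def welch_t(x, y):
    """Return (t, degrees of freedom) for two independent samples,
    using the unbiased (n - 1) variance estimate of each sample."""
    m, n = len(x), len(y)
    var_x, var_y = statistics.variance(x), statistics.variance(y)
    se_squared = var_x / m + var_y / n
    t = (statistics.mean(x) - statistics.mean(y)) / math.sqrt(se_squared)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = se_squared ** 2 / (
        (var_x / m) ** 2 / (m - 1) + (var_y / n) ** 2 / (n - 1)
    )
    return t, df

# Hypothetical CTD RMSD samples (Å): ligand-free vs. ligand-bound system.
rmsd_apo = [2.0, 4.0, 6.0]
rmsd_bound = [1.0, 2.0, 3.0]

t_stat, dof = welch_t(rmsd_apo, rmsd_bound)
print(t_stat, dof)
```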
Preferably, the CTD stability indices are obtained by the above calculation. Based on the new mechanism of action between CAMs and the capsid protein shown in FIG. 6, 30 ns molecular dynamics simulations and trajectory analyses are carried out for five small molecules, AT-130, GLP-26, NVR-3-778, BAY-41-4109, and SPA, to construct a small molecule structure-activity relationship regression model, and the EC50 of candidate small molecules is predicted with that regression model.
The protein-ligand binding free energy calculation flow is shown in FIG. 9. Starting from the simulation input files, the psf and crd files generated by CHARMM-GUI are converted into the prmtop and inpcrd files of the simulation system using ParmEd. The residue number of the ligand to be separated and the residue names of the solvent and ions are then specified, and ante-MMPBSA is used to generate the prmtop and inpcrd files for the receptor, the ligand, the complex, and the solvated system. Finally, the MMPBSA calculation script computes the protein receptor-ligand binding free energy over frames 150-300 of the trajectory at an interval of 2 frames. The binding free energy of each candidate small molecule ligand with the protein is calculated and compared with that of GLP-26 with the protein; candidates are screened against the binding free energy of the target molecule to select lead compounds, which then proceed to biological experimental verification.
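The binding free energy is calculated by the following equation:

$$\Delta G_{\mathrm{bind,solv}} = \Delta G_{\mathrm{bind,vacuum}} + \Delta G_{\mathrm{solv,complex}} - \left(\Delta G_{\mathrm{solv,ligand}} + \Delta G_{\mathrm{solv,receptor}}\right)$$

where
$\Delta G_{\mathrm{bind,solv}}$: protein receptor-ligand binding free energy in the solvent system;
$\Delta G_{\mathrm{bind,vacuum}}$: protein receptor-ligand binding free energy in the vacuum system;
$\Delta G_{\mathrm{solv,complex}}$: solvation free energy of the protein-ligand complex in the solvent system;
$\Delta G_{\mathrm{solv,ligand}}$: solvation free energy of the ligand in the solvent system;
$\Delta G_{\mathrm{solv,receptor}}$: solvation free energy of the protein receptor in the solvent system.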
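The way the five quantities listed above combine can be sketched as follows, using the usual MM/PBSA thermodynamic cycle: the solvated binding free energy equals the vacuum binding free energy plus the solvation free energy of the complex, minus the solvation free energies of the ligand and the receptor. The energy values are hypothetical, the function names are illustrative, and treating frames 150 and 300 as both included in the analysis window is an assumption.

```python
def binding_free_energy_solvated(dg_bind_vacuum, dg_solv_complex,
                                 dg_solv_ligand, dg_solv_receptor):
    """Combine the cycle terms into the solvated binding free energy."""
    return dg_bind_vacuum + dg_solv_complex - (dg_solv_ligand + dg_solv_receptor)

def analysis_frames(start=150, end=300, interval=2):
    """Frame indices used for the calculation (both endpoints included)."""
    return list(range(start, end + 1, interval))

frames = analysis_frames()

# Hypothetical per-term averages in kcal/mol.
dg_bind = binding_free_energy_solvated(
    dg_bind_vacuum=-50.0,
    dg_solv_complex=-400.0,
    dg_solv_ligand=-20.0,
    dg_solv_receptor=-410.0,
)
print(len(frames), dg_bind)
```

In an actual calculation each term would be averaged over the selected frames by the MMPBSA script; the sketch only shows the bookkeeping.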
In summary, the above technical solution of the present invention provides a method for the de novo design and virtual screening of HBV capsid assembly regulators, built from the GENTRL generative model and a range of qualitative and quantitative structure-activity relationship analysis techniques: scaffold hopping (framework transition), molecular docking, molecular dynamics simulation, and ligand-protein binding free energy calculation.
The foregoing description of the preferred embodiments is not intended to limit the invention; all modifications, equivalents, alternatives, and improvements that fall within the spirit and scope of the invention are intended to be covered.

Claims (9)

1. A de novo design and virtual screening method for novel hepatitis B virus capsid assembly regulators based on a generative model and computational chemistry, characterized in that the method comprises the following steps:
S1, constructing a full-length hepatitis B capsid protein structure: acquiring the amino acid sequence of the full-length wild-type core capsid protein of the hepatitis B virus, and predicting the dimer structure of the HBV full-length wild-type core capsid protein;
S2, generating and constructing a candidate small molecule database: training a GENTRL generative model using the obtained training compound set, and generating a candidate small molecule database using the pre-trained GENTRL generative model;
S3, constructing and screening with a scaffold-hopping model: performing preliminary screening of the candidate small molecule database using the drug-likeness rule of five, calculating the Euclidean distance between each database molecule and the target molecule based on WHALES descriptors, and performing scaffold-hopping screening according to structural similarity;
S4, activity screening and binding mode prediction based on molecular docking: docking the small molecules with the HBV capsid protein using molecular docking software, predicting the binding mode of each small molecule with the HBV capsid protein, and selecting the small molecules with favorable binding modes;
S5, structure-activity relationship prediction and screening based on molecular dynamics simulation: analyzing the stabilizing effect of the small molecules on the C-terminal domain (CTD) of the HBV capsid protein using molecular dynamics simulation software combined with trajectory analysis packages, constructing a structure-activity relationship model, predicting the EC50 of the small molecules, calculating the binding free energy of the small molecules with the HBV capsid protein, and selecting capsid assembly regulators with anti-HBV activity.
2. The de novo design and virtual screening method for novel hepatitis B virus capsid assembly regulators based on a generative model and computational chemistry according to claim 1, wherein acquiring the amino acid sequence of the full-length wild-type core capsid protein of the hepatitis B virus and predicting the dimer structure of the HBV full-length wild-type core capsid protein comprises the following steps:
S11, obtaining the amino acid sequence of the full-length wild-type core capsid protein of the hepatitis B virus from the NCBI bioinformatics database;
S12, predicting the dimer structure of the HBV full-length wild-type core capsid protein using the homo-multimer prediction model of AlphaFold2, and performing energy optimization.
3. The de novo design and virtual screening method for novel hepatitis B virus capsid assembly regulators based on a generative model and computational chemistry according to claim 1, wherein the training compound set is obtained from the ChEMBL and ZINC databases;
the training data in the training compound set comprise extended connectivity fingerprints of HBV capsid assembly regulators, common capsid regulators, and random ZINC molecules;
the training of the GENTRL generative model involves a variational autoencoder, a latent space probability distribution, a generator, and a reward function based on an SVM classification algorithm.
4. The de novo design and virtual screening method for novel hepatitis B virus capsid assembly regulators based on a generative model and computational chemistry according to claim 1, wherein performing preliminary screening of the candidate small molecule database using the drug-likeness rule of five, calculating the Euclidean distance between each database molecule and the target molecule using WHALES descriptors, and performing scaffold-hopping screening according to structural similarity comprises the following steps:
S31, performing preliminary screening of the candidate small molecule database using Lipinski's rule of five for drug-likeness, and generating 3D structures for the small molecules using RDKit and OPENBABEL;
S32, calculating the small molecule 3D descriptors using WHALES, obtaining the Euclidean distance between each database molecule and the target compound, and performing scaffold hopping according to the Euclidean distance ranking.
5. The de novo design and virtual screening method for novel hepatitis B virus capsid assembly regulators based on a generative model and computational chemistry according to claim 1, wherein docking the small molecules with the HBV capsid protein using molecular docking software comprises:
using the HBV full-length wild-type core capsid protein dimer predicted by AlphaFold2 as the receptor structure, and pretreating the receptor using Chimera and Maestro;
pretreating the small molecule ligands using RDKit and OPENBABEL;
using SMINA as the docking software, scoring each small molecule by affinity, and taking the 10 top-ranked compounds for subsequent molecular dynamics simulation screening.
6. The de novo design and virtual screening method for novel hepatitis B virus capsid assembly regulators based on a generative model and computational chemistry according to claim 1, wherein analyzing the stabilizing effect of the small molecules on the C-terminal domain of the HBV capsid protein using molecular dynamics simulation software combined with trajectory analysis packages comprises the following steps:
preparing the simulation input files using CHARMM-GUI, simulating 30 ns using the CHARMM36 molecular force field and the OPENMM software, and generating a 300-frame trajectory file;
converting the interaction of the HBV capsid protein and the small molecule into a dcd file containing the 3D trajectory, wherein the dcd file contains the positions of every atom of the HBV capsid protein and the ligand in each of the 300 frames of the simulation;
reading the dcd file with MDTraj to calculate the stability indices of the C-terminal domain of the HBV capsid protein, wherein the stability indices are the RMSD and RMSF of residues 150-183:

$$\mathrm{RMSD} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\delta_i^{2}}$$

$$\mathrm{RMSF}_i = \sqrt{\frac{1}{T}\sum_{t_j=1}^{T}\left|x_i(t_j)-\tilde{x}_i\right|^{2}}$$

where $N$ is the total number of atoms; $\delta_i^{2}$ is the sum of the squared position offsets between the coordinates of the $i$-th atom of the current frame and the $i$-th atom of the target frame, taken over the X, Y, and Z axes; $T$ is the total simulation duration; $x_i(t_j)$ is the Cartesian coordinates of atom $i$ at time $t_j$; and $\tilde{x}_i$ is the Cartesian coordinates of atom $i$ at the initial moment.
7. The de novo design and virtual screening method for novel hepatitis B virus capsid assembly regulators based on a generative model and computational chemistry according to claim 6, wherein the stability calculation is based on a new mechanism of action of CAMs on the HBV capsid protein, found from a large number of prior molecular dynamics simulation systems of the HBV capsid protein bound to known CAMs, namely that CAMs accelerate capsid assembly by stabilizing the C-terminal domain of the HBV capsid protein.
8. The de novo design and virtual screening method for novel hepatitis B virus capsid assembly regulators based on a generative model and computational chemistry according to claim 1, wherein the structure-activity relationship model performs a t-test between the C-terminal domain RMSD of the ligand-free protein simulation system and the C-terminal domain RMSD of the protein simulation system with a bound small molecule ligand, calculates the p-value, and predicts the small molecule EC50 from the p-value.
9. The de novo design and virtual screening method for novel hepatitis B virus capsid assembly regulators based on a generative model and computational chemistry according to claim 1, wherein the binding free energy calculation is based on the dcd file generated by the simulation and the simulation input files; the binding free energy of the small molecule ligand with the HBV capsid protein is calculated using ParmEd and AMBER and compared with the binding free energy of known capsid assembly regulators, and the final lead compounds are selected for biological activity verification.
CN202310736846.0A 2023-06-21 2023-06-21 Novel hepatitis B virus capsid assembly regulator de novo design and virtual screening method based on generation model and computational chemistry Active CN116504302B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310736846.0A CN116504302B (en) 2023-06-21 2023-06-21 Novel hepatitis B virus capsid assembly regulator de novo design and virtual screening method based on generation model and computational chemistry


Publications (2)

Publication Number Publication Date
CN116504302A true CN116504302A (en) 2023-07-28
CN116504302B CN116504302B (en) 2023-11-17

Family

ID=87323355





Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant