WO2024063583A1

WO2024063583A1 - Method for generating derivatives using binding pocket structure of target protein through artificial intelligence drug discovery platform

Info

Publication number: WO2024063583A1
Application number: PCT/KR2023/014453
Authority: WO
Inventors: 정종선; 홍종희; 윤승민; 김보수
Original assignee: (주)신테카바이오
Priority date: 2022-09-21
Filing date: 2023-09-21
Publication date: 2024-03-28

Abstract

The present invention relates to a method for generating various forms of derivatives from selected active substances (Hit-compound), which is an in-silico prescreening technique applicable to the process of discovering new drug active substances in computer-aided drug discovery (CADD) or AI drug platforms for development of new drugs, the method comprising the steps of: (A) selecting an anchor atom as a substitution target binding site in the binding structure of a compound to be analyzed; (B) calculating a pocket space within a binding pocket of a target protein; (C) generating derivatives; and (D) filtering and selecting the generated derivatives. According to the present invention, the use of the artificial intelligence drug discovery platform in exploring candidate substances allows for the improvement of discovered active substances (hit compounds) and the generation of various derivatives for training the analysis AI algorithms.

Description

Method for generating derivatives using the binding pocket structure of the target protein through an artificial intelligence new drug platform

The present invention is an in silico prescreening technology applied to the process of discovering active substances for new drugs using CADD (computer-aided drug discovery) or an AI drug platform for new drug development. It relates to a method for generating various types of derivatives from selected active substances (hit compounds).

In general, new drug development is carried out through a process of discovery and screening of candidate substances, followed by optimization, non-clinical testing/toxicity testing, and clinical trials. Recently, computational analysis technologies (AI, etc.) have been applied to reduce the time and cost required to discover new drug candidates.

Accordingly, various analysis tools are currently being used in the field of candidate material analysis systems (computational screening), and representative analysis tools are shown in [Table 1] below.

Program name	Type	Principle	Analysis time for a pose
AutoDock	3D docking	Grid-based semi-empirical scoring function which considers van der Waals forces, electrostatic interactions, and desolvation	Minutes to hours
Glide	3D docking	Empirical scoring function (GlideScore) which considers terms like Coulombic, van der Waals, and solvation effects	0.2-2.4 min
GOLD (Genetic Optimisation for Ligand Docking)	3D docking	ChemScore-based scoring function considering multiple binding modes	Minutes to hours
FEP+	Molecular dynamics	Free Energy Perturbation	Hours to days (depending on system size & setup)
PLUMED/GROMACS	Molecular dynamics	Metadynamics	Hours to days
Gaussian	QM/MM	Quantum Mechanics/Molecular Mechanics	Hours to days (depending on QM region size)
AMBER	Molecular dynamics	Alchemical free energy (e.g., thermodynamic integration)	Hours to days
DeepChem	Deep learning	Neural networks for molecular systems	Seconds to minutes (once trained)
gnina	Deep learning	3D convolutional neural network based affinity prediction	2.5 minutes (once trained)
Phase, Discovery Studio, MOE	Pharmacophore modeling	Generate pharmacophore models from known active compounds or protein structures	Minutes to hours
LigandScout	Pharmacophore ensemble approach	Generate ensemble of pharmacophores from multiple active ligands or protein conformations	Hours
ROCS	3D alignment, 2D fingerprints, force field	Rapid overlay of chemical structures using shape and chemical features for virtual screening	Minutes to hours

Meanwhile, the process of virtually screening drug candidates from compounds is divided into a pre-screening step based on chemical properties or structural similarity and an in-depth screening step, such as 3D docking, that utilizes protein-ligand interaction information.

Since the pre-screening step is generally used to reduce the number of candidates from a large-scale set of substances, a simple count-based discrimination rule such as Lipinski's rule of five (a guideline for drug design) is used. Because the information used for screening is limited, it is generally known to be reliable only while the screening rate is kept at about 10%.
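As an illustration of such a count-based pre-screen, the following minimal sketch applies the commonly cited rule-of-five thresholds (molecular weight at most 500 Da, logP at most 5, at most 5 hydrogen-bond donors, at most 10 hydrogen-bond acceptors) to descriptors that are assumed to have been computed elsewhere. The `Descriptors` class, the function names, and the convention of tolerating one violation are assumptions made for this example and are not taken from the present disclosure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Descriptors:
    """Precomputed descriptors for one compound (hypothetical container)."""
    mol_weight: float       # molecular weight in Da
    log_p: float            # octanol-water partition coefficient
    h_bond_donors: int
    h_bond_acceptors: int


def rule_of_five_violations(d: Descriptors) -> int:
    """Count how many of the four rule-of-five thresholds are exceeded."""
    checks = (
        d.mol_weight > 500.0,
        d.log_p > 5.0,
        d.h_bond_donors > 5,
        d.h_bond_acceptors > 10,
    )
    return sum(checks)


def passes_prescreen(d: Descriptors, max_violations: int = 1) -> bool:
    """Keep a compound if it exceeds at most `max_violations` thresholds.

    Tolerating one violation is an assumed convention, not a value from the source.
    """
    return rule_of_five_violations(d) <= max_violations


# Illustrative descriptor values only; they do not describe a specific compound.
small = Descriptors(mol_weight=180.2, log_p=1.2, h_bond_donors=1, h_bond_acceptors=4)
large = Descriptors(mol_weight=900.0, log_p=7.0, h_bond_donors=6, h_bond_acceptors=12)
```

A filter of this kind only counts simple properties, which is why it is fast enough for very large libraries but carries limited information about binding.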

However, as computing technology develops, the demand for calculations on billions of analysis targets is increasing, resulting in the need to upgrade existing screening strategies.

To solve this problem, pre-screening methods such as comparative analysis of the similarity of features including chemical properties, or screening based on similarity between substances by characterizing the two-dimensional patterns of molecular structures, have been studied as advanced strategies. However, these had limitations in that the proportion of true positives (T/P) among the selection results was not high and the structures of all substances could not be patterned.

Meanwhile, in the case of 3D and 4D based protein-ligand binding affinity prediction technologies, which are known to have relatively high screening reliability, the analysis time required ranges from minutes to several hours, making it difficult to apply them to screening large-scale analysis targets.

The present invention was devised to solve the above problems. Its object is to provide a method for generating derivatives using the binding pocket structure of a target protein that, when candidate substances are discovered through an artificial intelligence new drug platform, can generate various derivatives for improving the discovered active substances (hit compounds) and for training the analysis AI algorithms.

Another object of the present invention is to provide a method for generating derivatives using the binding pocket structure of the target protein that reflects the shape of the binding space of the target protein when producing a derivative, thereby improving binding ability and generating derivatives with an improved likelihood of actual binding.

According to the features of the present invention for achieving the above-mentioned object, the present invention includes the steps of: (A) selecting an anchor atom, which is the binding site to be replaced, in the binding structure of the compound to be analyzed; (B) calculating the pocket space inside the binding pocket of the target protein; (C) generating derivatives; and (D) filtering and selecting the generated derivatives.

Step (A) may include the steps of: (A1) calculating an interaction (binding information) profile between the compound and the target protein; (A2) individually cutting each single bond of the compound bound to the target protein to generate atomic fragments on both sides of the cut site; (A3) filtering the atomic fragments according to the number of atoms in each generated atomic fragment; and (A4) calculating the interaction efficiency of the filtered atomic fragments and selecting, as an anchor, the cut site of an atomic fragment whose contribution to the binding interaction is less than a preset value.

Additionally, the interaction efficiency may be calculated as the average value of the bond energy of each atom constituting the atomic fragment.

Step (B) may include the steps of: (B1) extracting a region of a preset size centered on the anchor site of the target protein to create a cylinder filter (cylinder); (B2) setting dots arranged at equal intervals in the cylinder filter and distinguishing the dots by their interaction energy with protein atoms; (B3) placing the anchor portion of the scaffold, which is the compound with the atomic fragment removed, close to the anchor site of the cylinder filter; (B4) excluding from the cylinder filter the regions of dots whose interaction energy is greater than a preset value; (B5) clustering the remaining dots in spatial units (GMM clustering); and (B6) selecting only some of the clustered regions according to their proximity to the scaffold anchor and their size, deriving them as the pocket space (target volume) within the target protein, and calculating the size of the pocket space.

In addition, step (C) may include the steps of: (C1) selecting an R-group whose size corresponds to the size of the pocket space (target volume) in the target protein calculated in step (B); and (C2) binding the selected R-group to the anchor of the scaffold to generate a derivative.

In step (C2), the bonding position of the R-group bonded to the anchor of the scaffold may be changed to produce derivatives with a plurality of different bonding structures for the same R-group.

In addition, in step (C2), the bond form (angle) between the R-group and the scaffold may be varied to produce derivatives with a plurality of different bond forms for an R-group having the same bonding structure.

The bond form (angle) between the R-group and the scaffold may also be generated by extracting the portion adjacent to the anchor site to create a linker and then varying the bond form between the linker and the R-group.

In addition, the step (D) may include linker filtering in which the linkage form of the linker and the R-group is filtered by comparing it with an actual substance database.

Step (D) may also include bond form filtering, in which the derivative is placed in the cylinder filter (cylinder) for each bond form of its R-group and is then filtered according to the amount of collision (clash) generated in the pocket.

The following effects can be expected from the method of generating a derivative using the binding pocket structure of the target protein through the artificial intelligence new drug platform according to the present invention as seen above.

In other words, the present invention has the effect of improving the derived hit compound and generating various derivatives for learning the analysis AI algorithm when discovering candidate substances through the artificial intelligence new drug platform.

In addition, in the present invention, when producing a derivative, the binding ability is improved by reflecting the shape of the binding space of the target protein, and there is an effect of generating a derivative with an improved actual binding possibility.

In addition, the derivative created by the invention is derived from a binding form (pose) within the pocket, which has the effect of improving the possibility of deriving the optimal binding form when molecular dynamics simulation is performed on an artificial intelligence new drug platform.

Figure 1 is a configuration diagram showing the overall configuration of an artificial intelligence new drug platform (AI-drug platform) to which the present invention is applied.

Figure 2 is a conceptual diagram showing the cloud service structure of an artificial intelligence new drug platform to which the present invention is applied.

Figure 3 is a conceptual diagram showing the effective substance discovery process of the artificial intelligence new drug platform to which the present invention is applied.

Figure 4 is a conceptual diagram showing the lead material discovery process of the artificial intelligence new drug platform to which the present invention is applied.

Figure 5 is a flowchart showing a method for discovering a lead material through the generation of a derivative using the binding pocket structure of a target protein according to a specific embodiment of the present invention.

Figure 6 is a conceptual diagram showing a method of selecting an anchor atom, which is a position to be replaced, in the derivative generation process according to the present invention.

Figures 7 to 9 are conceptual diagrams showing the process of calculating the size of the binding pocket in the derivative generation process according to the present invention.

Figures 10 and 11 are conceptual diagrams showing a method of generating a derivative by selecting an R-group in the derivative generation process according to the present invention.

Figure 12 is an example diagram illustrating the linker filtering process, in which the bond form of the linker and the R-group is filtered by comparing it with the bond forms of existing substances, in the derivative generation process according to the present invention.

Figure 13 is an example diagram illustrating the bond form filtering process, in which a derivative generated in the derivative generation process according to the present invention is filtered according to whether a collision occurs within the pocket.

In order to achieve the above-mentioned object, the present invention includes the steps of: (A) selecting an anchor atom, which is the binding site to be replaced, in the binding structure of the compound to be analyzed; (B) calculating the pocket space inside the binding pocket of the target protein; (C) generating derivatives; and (D) filtering and selecting the generated derivatives. Step (A) may include the steps of: (A1) calculating an interaction (binding information) profile between the compound and the target protein; (A2) individually cutting each single bond of the compound bound to the target protein to generate atomic fragments on both sides of the cut site; (A3) filtering the atomic fragments according to the number of atoms in each generated atomic fragment; and (A4) calculating the interaction efficiency of the filtered atomic fragments and selecting, as an anchor, the cut site of an atomic fragment whose contribution to the binding interaction is less than a preset value. Step (B) may include the steps of: (B1) extracting a region of a preset size centered on the anchor site of the target protein to create a cylinder filter (cylinder); (B2) setting dots arranged at equal intervals in the cylinder filter and distinguishing the dots by their interaction energy with protein atoms; (B3) placing the anchor portion of the scaffold, which is the compound with the atomic fragment removed, close to the anchor site of the cylinder filter; (B4) excluding from the cylinder filter the regions of dots whose interaction energy is greater than a preset value; (B5) clustering the remaining dots in spatial units (GMM clustering); and (B6) selecting only some of the clustered regions according to their proximity to the scaffold anchor and their size, deriving them as the pocket space (target volume) within the target protein, and calculating the size of the pocket space.

Hereinafter, we will look at a method for producing a derivative using the binding pocket structure of a target protein according to a specific example of the present invention with reference to the attached drawings.

Prior to the description, the effects and features of the present invention, and the methods of achieving them, will become clear from the embodiments described in detail below together with the accompanying drawings. However, the present invention is not limited to the embodiments disclosed below and may be implemented in various different forms. The present embodiments are provided only to make the disclosure of the present invention complete and to fully inform those having ordinary skill in the technical field to which the present invention pertains of the scope of the invention, and the present invention is defined only by the scope of the claims.

In describing the embodiments of the present invention, a detailed description of a known function or configuration will be omitted if it is judged that it may unnecessarily obscure the gist of the present invention. The terms described below are defined in consideration of their functions in the embodiments of the present invention and may vary depending on the intention or custom of the user or operator. Therefore, their definitions should be made based on the contents throughout this specification.

Combinations of each block in the attached block diagram and each step in the flowchart may be performed by computer program instructions (an execution engine). Since these computer program instructions can be loaded onto a processor of a general-purpose computer, special-purpose computer, or other programmable data processing equipment, the instructions executed through the processor of the computer or other programmable data processing equipment create a means of performing the functions described in each block of the block diagram or each step of the flowchart.

In addition, since the computer program instructions can also be loaded onto a computer or other programmable data processing equipment, a series of operation steps may be performed on the computer or other programmable data processing equipment to create a computer-executed process, so that the instructions operating the computer or other programmable data processing equipment may also provide steps for executing the functions described in each block of the block diagram and each step of the flowchart.

Additionally, each block or each step may represent a module, segment, or portion of code containing one or more executable instructions for executing specified logical functions, and in some alternative embodiments the functions mentioned in the blocks or steps may occur out of order.

In the field of artificial intelligence new drug platforms to which the present invention is applied, most technical terms are not defined in Korean, and their English names are used as the general names. Therefore, where a technical term is written in Korean, it must be interpreted according to the meaning of the English name commonly used as the general name in the technical field.

Before explaining the method for generating a derivative using the binding pocket structure of the target protein according to the present invention, the entire artificial intelligence new drug platform to which the present invention is applied will be described.

Figure 1 is a configuration diagram showing the overall configuration of an artificial intelligence new drug platform (AI-drug platform) to which the present invention is applied, and Figure 2 is a conceptual diagram showing the cloud service structure of an artificial intelligence new drug platform to which the present invention is applied. Figure 3 is a conceptual diagram showing the active material discovery process of the artificial intelligence new drug platform to which the present invention is applied, and Figure 4 is a conceptual diagram showing the lead material discovery process of the artificial intelligence new drug platform to which the present invention is applied.

The AI-drug platform to which the present invention is applied is basically a platform that performs the entire process of discovering new drug candidates in the preclinical stage, and it can be provided as a service through the applicant's cloud (STB CLOUD).

At this time, new drugs include synthetic new drugs (small molecules) and antibody drugs, and the artificial intelligence new drug platform (AI-drug platform) according to the present invention provides a discovery process for all of them.

Meanwhile, for this purpose, the artificial intelligence new drug platform (AI-drug platform) according to the present invention, as shown in FIG. 1, includes a hit material automated discovery platform, a lead material automated discovery platform, and a drug reaction (ADMET; Absorption, Distribution, Metabolism, Excretion & Toxicity) automated analysis platform.

In other words, the artificial intelligence new drug platform (AI-drug platform) according to the present invention is an artificial intelligence platform designed to perform the entire new drug development process of selecting active substances, discovering lead substances among them, and then selecting candidate substances through drug reaction analysis.

Figure 2 shows the cloud service process of the AI-drug platform according to the present invention. As shown, the present invention provides all areas of the drug discovery and development process, from active substance discovery, lead substance generation, and ADMET/PK to pharmacogenetics and biomarkers.

In addition, in order to operate the platform for each discovery stage of new drug development, three individual artificial intelligence systems are applied to the artificial intelligence new drug platform (AI-drug platform) according to the present invention: a generative artificial intelligence system (GPT/BERT), a 3D structural artificial intelligence system (ED-CNN), and a molecular dynamics analysis system (Auto-MD simulation).

As specific methods for running the hit material automated discovery platform, the lead material automated discovery platform, and the drug reaction (ADMET; Absorption, Distribution, Metabolism, Excretion & Toxicity) automated analysis platform using the artificial intelligence systems of the AI-drug platform, the following are applied: discovery of active substances through 3D structural information between proteins and ligands (hereinafter referred to as 'DMC-PRE', the applicant's technical name); central atom vector-based analysis of the docking structure between proteins and ligands (hereinafter referred to as 'GAP-Dock', the applicant's technical name); prediction of the optimized binding structure between proteins and compounds using a 3D-CNN learning model (hereinafter referred to as 'DMC-SCR', the applicant's technical name); generation of derivatives through the binding pocket structure of the target protein (hereinafter referred to as 'LEAD-GEN', the applicant's technical name); analysis of protein-compound interaction stability through molecular dynamics simulation data (hereinafter referred to as 'DMC-MD', the applicant's technical name); and a generative artificial intelligence model trained on 3D interaction data between proteins and compounds (hereinafter referred to as '3bmGPT', the applicant's technical name).

Here, DMC-PRE and GAP-Dock are technologies applied at an early stage to discover active substances through the hit material automated discovery platform. DMC-SCR is applied to the molecular dynamics analysis system (Auto-MD simulation) and is a technology applied at a later stage to discover active substances through the hit material automated discovery platform. LEAD-GEN is a technology applied to discover lead materials through the lead material automated discovery platform.

DMC-MD is applied to the molecular dynamics analysis system (Auto-MD simulation) and is a technology that verifies the binding stability of the results derived from the hit material automated discovery platform, the lead material automated discovery platform, and the drug reaction (ADMET; Absorption, Distribution, Metabolism, Excretion & Toxicity) automated analysis platform. 3bmGPT is applied to the generative artificial intelligence system (GPT/BERT) and is a technology that selects the analysis targets on which calculations are run to identify active substances through the hit material automated discovery platform.

Specifically, in the active substance discovery process of the artificial intelligence new drug platform (AI-drug platform) according to the present invention, as shown in Figure 3, the substances to be analyzed that were selected through 3bmGPT undergo preliminary screening with DMC-PRE and GAP-Dock and in-depth screening with DMC-SCR, after which DMC-MD is applied to verify binding stability and derive the active substances.

In the lead material discovery process of the artificial intelligence new drug platform (AI-drug platform) according to the present invention, as shown in Figure 4, LEAD-GEN is applied to discover lead materials, and DMC-MD is applied to verify binding stability and derive the lead material.

The present invention relates to a derivative generation method (LEAD-GEN) applied to the artificial intelligence new drug platform as described above, and will be described in detail below.

Figure 5 is a flowchart showing a method of discovering a lead material through the generation of derivatives using the binding pocket structure of the target protein according to a specific embodiment of the present invention. Figure 6 is a conceptual diagram showing a method of selecting an anchor atom, which is the position to be replaced, in the derivative generation process according to the present invention. Figures 7 to 9 are conceptual diagrams showing the process of calculating the size of the binding pocket, and Figures 10 and 11 are conceptual diagrams showing a method of generating a derivative by selecting an R-group. Figure 12 is an example diagram showing the linker filtering process, in which the bond form of the linker and the R-group is filtered by comparing it with the bond forms of existing substances.

First, the method for generating a derivative using the binding pocket structure of the target protein according to the present invention largely comprises the steps of: (A) selecting an anchor atom, which is the binding site to be replaced, in the binding structure of the compound to be analyzed; (B) calculating the pocket space inside the binding pocket of the target protein; (C) generating derivatives; and (D) filtering and selecting the generated derivatives.

To describe each of these performance processes, first, (A) the anchor atom selection step is a process of selecting a binding site for substitution of the effective substance (Hit Compound) to be analyzed, as shown in FIG. 6.

Specifically, as shown in 'A' of Figure 6, with the compound that is the active substance bound to the target protein, the interaction (binding information) profile between the compound and the target protein is calculated.

At this time, the interaction (binding information) profile can be obtained with the applicant's software ENVA, but it can also be obtained with other commercially available software (FEP+, MMPBSA, etc.).

Then, as shown in 'B' of Figure 6, based on the interaction profile between the compound and the target protein, one of the single bonds of the mother compound bound to the target protein is cut to create atomic fragments on both sides of the cut site.

Such cutting and fragment generation are performed for every single bond of the mother compound, so the number of atomic fragments generated is twice the number of single bonds.

Then, among the generated atomic fragments, only those whose number of atoms falls within a preset range (1 to 12) are selected, and the remaining atomic fragments are excluded from the analysis.

In other words, the purpose of generating the atomic fragments is to find an atomic fragment that has little effect on the binding between the mother compound and the target protein and is therefore valuable as a replacement target. If the number of atoms constituting the fragment is too small, there is little possibility that a new compound will be derived through substitution. Conversely, if the number of atoms constituting the fragment is too large, there is a risk that the properties of the compound will differ significantly after substitution.
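The cutting and size filtering described above can be sketched as a graph operation. The following is a minimal illustration, assuming the compound is given as a connected graph of atom indices with `(i, j, bond_order)` tuples; the function names are hypothetical. It also assumes that single bonds inside rings are skipped, because cutting them does not split the molecule into two fragments; the source does not state how ring bonds are handled.

```python
from collections import deque


def component(adjacency, start, cut_bond):
    """Return the atoms reachable from `start` without crossing `cut_bond`."""
    seen = {start}
    queue = deque([start])
    while queue:
        atom = queue.popleft()
        for neighbor in adjacency[atom]:
            if {atom, neighbor} == cut_bond or neighbor in seen:
                continue
            seen.add(neighbor)
            queue.append(neighbor)
    return seen


def candidate_fragments(n_atoms, bonds, min_atoms=1, max_atoms=12):
    """Cut every single bond and keep fragments with 1 to 12 atoms by default.

    Returns (fragment_atoms, scaffold_atom) pairs, where `scaffold_atom` is the
    atom left on the other side of the cut bond.
    """
    adjacency = {i: set() for i in range(n_atoms)}
    for i, j, _order in bonds:
        adjacency[i].add(j)
        adjacency[j].add(i)

    candidates = []
    for i, j, order in bonds:
        if order != 1:
            continue  # only single bonds are cut
        side_i = component(adjacency, i, {i, j})
        if j in side_i:
            continue  # ring bond: cutting it does not separate the molecule (assumption)
        side_j = component(adjacency, j, {i, j})
        # Each cut yields two fragments, one on each side of the bond.
        for fragment, scaffold_atom in ((side_i, j), (side_j, i)):
            if min_atoms <= len(fragment) <= max_atoms:
                candidates.append((frozenset(fragment), scaffold_atom))
    return candidates


# A four-atom chain 0-1-2-3 with three single bonds.
chain = [(0, 1, 1), (1, 2, 1), (2, 3, 1)]
all_fragments = candidate_fragments(4, chain)
small_fragments = candidate_fragments(4, chain, max_atoms=2)
```

For the acyclic chain, three single bonds give six fragments before size filtering, which matches the statement that the fragment count is twice the number of single bonds.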

Next, as shown in 'B' of FIG. 6, the interaction efficiency is calculated for the selected atomic fragments, and the cut site of an atomic fragment that has a low impact on the binding interaction is selected as the anchor.

At this time, the interaction efficiency can be calculated as the average value of the binding energy of each atom constituting the atomic fragment.
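The averaging described here can be illustrated with a short sketch. It assumes that a per-atom binding energy is already available from the interaction profile as a mapping from atom index to energy, and that a fragment counts as weakly involved when the magnitude of its mean energy is below a threshold. The function names, the threshold value, and the use of the absolute value are assumptions made for illustration.

```python
from statistics import fmean


def interaction_efficiency(fragment_atoms, atom_energy):
    """Mean per-atom binding energy over the atoms of one fragment."""
    return fmean([atom_energy[a] for a in fragment_atoms])


def select_anchor_fragments(fragments, atom_energy, max_abs_efficiency=0.5):
    """Keep fragments whose mean binding energy is small in magnitude.

    `max_abs_efficiency` is a placeholder for the preset value mentioned in the
    text; its number and units here are illustrative only.
    """
    selected = []
    for fragment in fragments:
        efficiency = interaction_efficiency(fragment, atom_energy)
        if abs(efficiency) < max_abs_efficiency:
            selected.append((fragment, efficiency))
    return selected


# Illustrative per-atom energies (arbitrary units, more negative = stronger binding).
energies = {0: -2.0, 1: -1.0, 2: -0.1, 3: -0.3}
fragments = [frozenset({0, 1}), frozenset({2, 3})]
weak = select_anchor_fragments(fragments, energies)
```

In this toy input the fragment made of atoms 2 and 3 contributes little to binding, so its cut site would be the anchor candidate.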

Here, the compound from which the atomic fragments are cut is called a scaffold.

Next, in step (B) of calculating the pocket space within the target protein, as shown in Figures 7 to 9, the space inside the pocket is calculated for the state in which the scaffold, from which the atomic fragment has been removed, is bound to the pocket of the target protein.

For this purpose, in the present invention, as shown in FIG. 7, a cylindrical region of a preset size (length 10 Å, radius 10 Å) centered on the anchor site of the target protein is extracted to create a cylinder filter (cylinder).

The size of the cylinder filter can be changed depending on the structure of the scaffold and computational system resources, but it is preferably set to sufficiently cover the available area of the binding pocket of the scaffold.
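A cylinder filter of this kind can be sketched as a regular grid of dots clipped to a cylinder. In the minimal example below, the anchor is assumed to sit at the midpoint of the cylinder axis, the axis direction is supplied by the caller (for instance the original bond direction), and the 1 Å default spacing is an arbitrary placeholder; none of these choices is specified in the source, and the function name is hypothetical.

```python
import math


def cylinder_grid(anchor, axis, length=10.0, radius=10.0, spacing=1.0):
    """Return equally spaced dots inside a cylinder centered on `anchor`.

    `anchor` and `axis` are 3-tuples; `axis` need not be normalized.
    """
    norm = math.sqrt(sum(c * c for c in axis))
    if norm == 0:
        raise ValueError("axis must be a non-zero vector")
    unit = tuple(c / norm for c in axis)
    half = length / 2.0
    # Farthest any dot can be from the anchor, expressed in grid steps.
    steps = math.ceil(math.hypot(half, radius) / spacing)
    eps = 1e-9  # tolerance so that dots exactly on the surface are kept
    dots = []
    for ix in range(-steps, steps + 1):
        for iy in range(-steps, steps + 1):
            for iz in range(-steps, steps + 1):
                offset = (ix * spacing, iy * spacing, iz * spacing)
                axial = sum(o * u for o, u in zip(offset, unit))
                radial_sq = sum(o * o for o in offset) - axial * axial
                if abs(axial) <= half + eps and radial_sq <= radius * radius + eps:
                    dots.append(tuple(a + o for a, o in zip(anchor, offset)))
    return dots


# A deliberately tiny cylinder so the result is easy to check by hand.
tiny = cylinder_grid((1.0, 2.0, 3.0), (0.0, 0.0, 5.0), length=2.0, radius=1.0)
```

With length 2, radius 1, and unit spacing, the cylinder holds three layers of five dots each.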

Meanwhile, dots arranged at equal intervals are set in the cylinder filter, and the dots are distinguished by their interaction energy with protein atoms. In 'A' of Figure 7, the dots included in the cylinder filter (cylinder) are color-coded according to their interaction energy with protein atoms.

In the cylinder filter shown in FIG. 7, the dots are zoned using the r value obtained from the van der Waals (vdW) radius formula shown in 'B' of FIG. 7: dots within 0.7r are marked as the red zone (clash area), dots within 0.7r to 1r as the yellow zone (buffer area), dots within 1r to 1.3r as the green zone (contact area), and dots within 1.3r to 3 Å as the gray zone (close area).
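The zoning thresholds can be expressed as a small classifier. The sketch below assumes that each protein atom is supplied with its coordinates and its r value (the formula in FIG. 7 is not reproduced in the text, so r is treated as an input), that a dot near several atoms takes the most severe zone, and that dots farther than 3 Å from every atom fall into a "dark_gray" class. The last two points and all names are assumptions.

```python
import math

# Ordered from most severe (clash) to least severe (far from any atom).
ZONES = ("red", "yellow", "green", "gray", "dark_gray")


def zone_for_atom(distance, r, close_cutoff=3.0):
    """Zone of a dot relative to one protein atom with vdW-derived value `r`."""
    if distance < 0.7 * r:
        return "red"      # clash area
    if distance < r:
        return "yellow"   # buffer area
    if distance < 1.3 * r:
        return "green"    # contact area
    if distance <= close_cutoff:
        return "gray"     # close area
    return "dark_gray"    # beyond 3 Å (assumed label)


def classify_dot(dot, atoms):
    """Most severe zone over all atoms; `atoms` is a list of (xyz, r) pairs."""
    worst = len(ZONES) - 1
    for xyz, r in atoms:
        zone = zone_for_atom(math.dist(dot, xyz), r)
        worst = min(worst, ZONES.index(zone))
    return ZONES[worst]


# One illustrative atom at the origin with r = 2.0 Å.
one_atom = [((0.0, 0.0, 0.0), 2.0)]
two_atoms = one_atom + [((4.0, 0.0, 0.0), 2.0)]
```

Taking the most severe zone means a dot is treated as free space only if it is clear of every nearby protein atom.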

At this time, the gray zone, which has low interaction energy, can be judged highly likely to serve as a pocket region with adequate free space in which binding with the compound can be induced.

Thereafter, the anchor portion of the scaffold is placed close to the anchor portion of the cylinder filter. At this time, it is preferable that the scaffold bonding axis direction maintains the original bonding direction.

This process is shown in 'A' and 'B' of FIG. 8.

Next, as shown in 'C' of FIG. 8, the dots in areas with high interaction energy (red zone, yellow zone, green zone) are excluded from the cylinder filter, and only the dots in areas with low interaction energy (gray zone, dark gray zone) are left.

And the remaining areas are clustered (GMM clustering) in spatial units, as shown in 'D' of FIG. 8.

Next, among the clustered regions, only some are selected according to their proximity to the scaffold anchor and their size.

'E' of FIG. 8 shows an embodiment in which the two largest clustered regions are selected from among the clustered regions connected to the scaffold anchor.
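
As a non-limiting illustration, selecting the largest clustered regions connected to the anchor might look like the sketch below. The cluster labels, the coordinates, and the rule that a region is "connected" when one of its dots lies within a cutoff distance of the anchor are assumptions for this example; the clustering itself is taken as already done.

```python
import math

def select_pocket_clusters(clusters, anchor, contact_cutoff=2.0, keep=2):
    """Pick the largest clusters that touch the scaffold anchor.

    clusters: dict mapping a cluster label to its list of (x, y, z) dots.
    Assumption: a cluster counts as connected to the anchor when at least one
    of its dots lies within contact_cutoff (hypothetical value) of the anchor.
    """
    connected = {
        label: dots for label, dots in clusters.items()
        if any(math.dist(dot, anchor) <= contact_cutoff for dot in dots)
    }
    ranked = sorted(connected, key=lambda label: len(connected[label]),
                    reverse=True)
    return ranked[:keep]

# Hypothetical clustered dots around an anchor at the origin.
clusters = {
    "c0": [(1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 0.0)],
    "c1": [(0.0, 1.5, 0.0), (0.0, 2.5, 0.0)],
    "c2": [(0.0, 0.0, 1.0)],
    "c3": [(8.0, 8.0, 8.0)] * 5,   # large but far from the anchor
}
selected = select_pocket_clusters(clusters, anchor=(0.0, 0.0, 0.0))
```

Here the largest region, `c3`, is discarded because it is not connected to the anchor, which mirrors the use of both proximity and size as selection criteria.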

In this way, as shown in 'F' of FIG. 8, the selected clustering region is derived as a pocket space (target volume) within the target protein.

The pocket space (target volume) within the target protein is derived in this way so that an R-group (Replace atom group) to replace the atomic fragment can be selected according to the size of the pocket space (target volume).

Figure 9 shows an example of deriving the pocket space (target volume) within the actual target protein using the method described above.

In the case of the example shown in Figure 9, for compound 6op0, 9 anchors were derived and analyzed, and an analysis result (answer) matching the actual number of ligand atoms of the compound was derived.

Here, the size is expressed as the number of dots divided by 300, reflecting that, at the dot spacing used, the volume of one heavy atom (C, N, O) corresponds to about 300 dots.

Next, in the step (C) of generating a derivative, as shown in FIG. 10, an R-group that matches the size of the pocket space (target volume) in the target protein calculated in the step (B) is selected.

Here, R-group (Replace atom group) is an atomic group that replaces the atomic fragment removed from the scaffold, and is selected from a database storing various atomic configurations.
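
As a non-limiting illustration, the size calculation and the size-based R-group selection described above can be sketched together. The small R-group table and the reading of "matches the size" as "heavy-atom count not larger than the pocket size" are assumptions for this example.

```python
DOTS_PER_HEAVY_ATOM = 300  # from the description: one heavy atom ~ 300 dots

def pocket_size(n_dots):
    """Pocket size in heavy-atom units (number of dots divided by 300)."""
    return n_dots / DOTS_PER_HEAVY_ATOM

def select_r_groups(r_group_db, n_dots):
    """Return R-groups whose heavy-atom count fits the pocket.

    Assumption: "matches the size" is read as heavy-atom count <= pocket size.
    r_group_db is a hypothetical mapping of R-group SMILES to heavy-atom count.
    """
    limit = pocket_size(n_dots)
    return sorted(name for name, heavy in r_group_db.items() if heavy <= limit)

# Hypothetical R-group database and a pocket of 1350 dots (size 4.5).
r_group_db = {"C": 1, "CC": 2, "C(F)(F)F": 4, "c1ccccc1": 6}
candidates = select_r_groups(r_group_db, n_dots=1350)
```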

Afterwards, the selected R-group is bound to the anchor of the scaffold to generate a derivative.

At this time, as shown in FIG. 10, the position where one selected R-group is bonded to the anchor of the scaffold is changed to create derivatives with various bonding structures.

In addition, even in the case of derivatives with the same bonding structure, as shown in FIG. 11, a number of derivatives are created by varying the bonding form (bonding angle) of the R-group and the scaffold.

In this case, derivatives with various bond forms are created by extracting the linker of the anchor portion of the target protein, as shown in Figure 11, and varying the bond form between the R-group and the scaffold.

Here, the linker is obtained by extracting only the portion adjacent to the anchor, and can be created by extracting only a preset number of bonding atoms from the anchor.

Figure 11 shows an example in which three atomic bonds are extracted from the anchor to the bonding end.
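
As a non-limiting illustration, extracting the atoms within a preset number of bonds of the anchor can be written as a breadth-first search over the bond graph. The bond list and atom indices are hypothetical, and the sketch walks outward from the anchor in every direction, which is an assumption; the description does not state how the direction is restricted.

```python
from collections import deque

def extract_linker(bonds, anchor, max_bonds=3):
    """Return atoms within max_bonds bonds of the anchor (breadth-first search).

    bonds: iterable of (atom_i, atom_j) index pairs; a hypothetical bond list.
    """
    neighbors = {}
    for a, b in bonds:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    depth = {anchor: 0}
    queue = deque([anchor])
    while queue:
        atom = queue.popleft()
        if depth[atom] == max_bonds:
            continue  # do not expand beyond the preset bond count
        for nxt in neighbors.get(atom, ()):
            if nxt not in depth:
                depth[nxt] = depth[atom] + 1
                queue.append(nxt)
    return sorted(depth)

# Chain 0-1-2-3-4-5 with a branch 2-6; atom 0 is the anchor.
bonds = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (2, 6)]
linker_atoms = extract_linker(bonds, anchor=0)
```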

Afterwards, the bond form between the linker and the R-group is varied to produce derivatives with various bond forms.

Lastly, step (D) of filtering the derivatives filters the bond form of the anchor portion of the generated derivatives to exclude derivatives with a bond form that is unlikely to exist.

In the present invention, the derivative filtering is largely carried out by two methods. The first method is linker filtering, which filters the bond form of the linking group (linker) and the R-group by comparing it with an existing substance database. The second method is shape filtering, in which the derivatives are filtered according to the amount of collision generated within the pocket after binding to the cylinder filter for each bond form of the R-group.

As shown in FIG. 12, the linker filtering is performed by comparing the various bond forms of the linker and R-group against the bond structures of existing substances in existing compound (and protein) databases (ChEMBL, etc.) and selecting derivatives accordingly.

Specifically, bonding structures with the same composition as the linking group (linker) and R-group are retrieved from the compound database and their bond forms are analyzed, and only derivatives in which the linker and R-group are bound in a bond form that exists in actual substances are selected.

At this time, the bond form of a specific bonding portion can be extracted from actual compound data using SMARTS, as shown in FIG. 12.

Next, as shown in FIG. 13, the shape filtering is performed by binding the derivative to the cylinder filter for each R-group bond form and then calculating the amount of collision generated in the pocket.

Here, the collision refers to a collision between an atom of the R-group and an atom of the target protein.
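
As a non-limiting illustration, the amount of collision for each bond form could be counted as in the sketch below. Treating a collision as an R-group atom lying closer than 0.7r to a protein atom (the red zone of the cylinder filter), the coordinates, and the allowed number of collisions are assumptions for this example.

```python
import math

def count_clashes(r_group_atoms, protein_atoms, r, clash_factor=0.7):
    """Count R-group/protein atom pairs closer than clash_factor * r.

    Assumption: a collision is a pair inside the red zone (distance < 0.7 r)
    described for the cylinder filter; r is supplied by the caller.
    """
    limit = clash_factor * r
    return sum(
        1
        for a in r_group_atoms
        for p in protein_atoms
        if math.dist(a, p) < limit
    )

def shape_filter(poses, protein_atoms, r, max_clashes=0):
    """Keep the poses (bond forms) whose clash count is within max_clashes."""
    return [name for name, atoms in poses.items()
            if count_clashes(atoms, protein_atoms, r) <= max_clashes]

# Two hypothetical bond forms of the same R-group near two protein atoms.
protein = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0)]
poses = {
    "pose_1": [(3.0, 3.0, 0.0), (2.5, 4.0, 0.0)],   # clear of the protein
    "pose_2": [(1.0, 0.0, 0.0), (2.5, 4.0, 0.0)],   # first atom too close
}
kept = shape_filter(poses, protein, r=3.0)
```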

The rights of the present invention are not limited to the embodiments described above but are defined by the claims, and it is self-evident that those skilled in the art can make various changes and modifications within the scope of the claims.

The present invention is an in silico prescreening technology applied to the process of discovering active substances for new drugs using CADD (computer-aided drug discovery) or an AI drug platform for new drug development, and relates to a method of generating various types of derivatives from selected hit compounds. In discovering candidate substances through an artificial intelligence new drug platform, the invention has the effect of improving the derived active substances (hit compounds) and generating various derivatives for training analytical AI algorithms.

Claims

A method of generating derivatives using the binding pocket structure of a target protein through an artificial intelligence new drug platform, the method comprising the steps of:

(A) selecting an anchor atom, which is a binding site to be replaced, in the binding structure of the compound to be analyzed;

(B) calculating the pocket space inside the binding pocket in the target protein;

(C) generating derivatives; and

(D) filtering and selecting the generated derivatives.
The method according to claim 1, wherein step (A) comprises the steps of:

(A1) calculating an interaction (binding information) profile between the compound and the target protein;

(A2) individually cutting the single bond between the compound and the target protein to generate atomic fragments on both sides of the cut portion;

(A3) filtering the atomic fragments according to the number of atoms in the generated atomic fragments; and

(A4) for the filtered atomic fragments, calculating the interaction efficiency and selecting, as an anchor, the cut portion of an atomic fragment whose relevance to the binding interaction is less than a preset value.
The method according to claim 2, wherein the interaction efficiency is calculated as the average value of the binding energy of each atom constituting the atomic fragment.
The method according to claim 1, wherein step (B) comprises the steps of:

(B1) generating a cylinder filter by extracting a region of a preset size centered on the compound binding site (anchor site) of the target protein;

(B2) setting dots arranged at equal intervals in the cylinder filter and distinguishing the dots by their interaction energy with protein atoms;

(B3) placing the anchor portion of the scaffold, obtained by removing the atomic fragments from the compound, close to the anchor portion of the cylinder filter;

(B4) excluding from the cylinder filter the dots in regions where the interaction energy is greater than a preset value;

(B5) clustering (GMM clustering) the remaining dots in spatial units; and

(B6) selecting only some of the clustered regions according to their proximity to the scaffold anchor and their size, deriving the selected regions as the pocket space (target volume) within the target protein, and calculating the size of the pocket space.
The method according to claim 4, wherein step (C) comprises the steps of:

(C1) selecting an R-group whose size corresponds to the size of the pocket space (target volume) in the target protein calculated in step (B); and

(C2) binding the selected R-group to the anchor of the scaffold to generate a derivative.
The method according to claim 5, wherein, in step (C2), derivatives with multiple different bonding structures are generated for the same R-group by varying the position at which the R-group is bound to the anchor of the scaffold.
The method according to claim 6, wherein, in step (C2), derivatives with multiple different bond forms are generated for R-groups with the same bonding structure by varying the bond form (angle) of the R-group and the scaffold.
The method according to claim 7, wherein the bond form (angle) of the R-group and the scaffold is varied by extracting a linker, which is a portion adjacent to the anchor portion of the target protein, and changing the bond form of the linker and the R-group.
The method according to claim 1, wherein, in step (D), the bond form of the linking group (linker) and the R-group is filtered by comparing it with an existing substance database.
The method according to claim 1, wherein, in step (D), the derivative is bound to the cylinder filter for each bond form of the R-group, and the derivatives are then filtered according to the amount of collision generated in the pocket.