WO2024063583A1 - Method for generating derivatives using binding pocket structure of target protein through artificial intelligence drug discovery platform - Google Patents

Info

Publication number
WO2024063583A1
Authority
WO
WIPO (PCT)
Prior art keywords
binding
target protein
derivative
artificial intelligence
group
Prior art date
Application number
PCT/KR2023/014453
Other languages
French (fr)
Korean (ko)
Inventor
정종선
홍종희
윤승민
김보수
Original Assignee
(주)신테카바이오
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by (주)신테카바이오
Priority claimed from KR1020230126606A (KR20240040666A)
Priority claimed from KR1020230126611A (KR20240040671A)
Priority claimed from KR1020230126608A (KR20240040668A)
Priority claimed from KR1020230126609A (KR20240040669A)
Priority claimed from KR1020230126607A (KR20240040667A)
Priority claimed from KR1020230126610A (KR20240040670A)
Publication of WO2024063583A1

Classifications

    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16C: COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
    • G16C20/00: Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
    • G16C20/10: Analysis or design of chemical reactions, syntheses or processes
    • G16C20/30: Prediction of properties of chemical compounds, compositions or mixtures
    • G16C20/60: In silico combinatorial chemistry
    • G16C20/70: Machine learning, data mining or chemometrics

Definitions

  • the present invention is an in silico prescreening technology applied to the process of discovering active substances for new drugs using CADD (computer-aided drug discovery) or an AI drug platform for new drug development, and relates to a method for generating various types of derivatives from selected hit compounds.
  • CADD: computer-aided drug discovery
  • LigandScout: pharmacophore ensemble approach; generates an ensemble of pharmacophores from multiple active ligands or protein conformations.
  • screening is divided into a pre-screening step based on chemical properties or structural similarity, and an in-depth screening step that uses 3D docking and protein-ligand interaction information.
  • because the pre-screening step is generally used to reduce the number of candidates drawn from a large pool of substances, a simple numerical discrimination rule such as the rule of 5 (Lipinski's guideline for drug design) is used. Because the information used for screening is limited, it is generally considered reliable only while the screening rate is kept at 10%.
  • rule of 5: guideline for drug design (Lipinski)
  • pre-screening methods are used, such as comparative analysis of the similarity of features including chemical properties, or screening based on the similarity between substances obtained by characterizing the two-dimensional patterns of molecular structures.
  • accuracy: T/P (true/positive)
  • the present invention was created to solve the above problems.
  • the present invention aims to provide a method for generating derivatives using the binding pocket structure of a target protein, which can improve the derived active substances (hit compounds) and generate various derivatives for training the analysis AI algorithm.
  • the present invention also aims to provide a method for generating a derivative using the binding pocket structure of the target protein which, when producing a derivative, reflects the shape of the binding space of the target protein to improve binding ability and generate a derivative with an improved likelihood of actual binding.
  • the present invention includes the steps of (A) selecting an anchor atom, which is a binding site to be replaced, in the binding structure of the compound to be analyzed; (B) calculating the pocket space inside the binding pocket in the target protein; (C) producing a derivative; and (D) filtering and selecting the produced derivative.
  • step (A) may include: (A1) calculating an interaction (binding information) profile between the compound and the target protein; (A2) individually cutting the single bond between the compound and the target protein to generate atomic fragments on both sides of the cut portion; (A3) filtering the atomic fragments according to the number of atoms in the generated atomic fragments; and (A4) for the filtered atomic fragments, calculating the interaction efficiency and selecting, as the anchor, the cut portion of an atomic fragment whose relevance to the binding interaction is less than a preset value.
  • the interaction efficiency may be calculated as the average value of the bond energy of each atom constituting the atomic fragment.
  • step (B) may include: (B1) extracting a region of a preset size centered on the anchor site of the target protein to create a cylinder filter (cylinder); (B2) setting dots arranged at equal intervals in the cylinder filter and distinguishing the dots by their interaction energy with protein atoms; (B3) placing the anchor portion of the scaffold, obtained by removing the atomic fragment from the compound, close to the anchor portion of the cylinder filter; (B4) excluding from the cylinder filter the regions where the interaction energy of the dots is greater than a preset value; (B5) clustering the dots in spatial units (GMM clustering); and (B6) selecting clustered regions according to their proximity to the scaffold anchor and their size, deriving only some of the clustered regions as the pocket space (target volume) within the target protein, and calculating the size of the pocket region.
  • step (C) may include: (C1) selecting an R-group whose size corresponds to the size of the pocket space (target volume) in the target protein calculated in step (B); and (C2) binding the selected R-group to the anchor of the scaffold to generate a derivative.
  • in step (C2), the bonding position of the R-group bonded to the anchor of the scaffold may be changed to produce derivatives with a plurality of different bonding structures for the same R-group.
  • the bond form (angle) between the R-group and the scaffold may be varied to produce derivatives with a plurality of different bond forms for an R-group having the same bonding structure.
  • the bond form (angle) between the R-group and the scaffold may be varied by extracting a linker, created by extracting the part adjacent to the anchor part of the target protein, and changing the bond form between the linker and the R-group.
  • step (D) may include linker filtering in which the linkage form of the linker and the R-group is filtered by comparing it with an actual substance database.
  • step (D) may also include shape filtering, in which the derivative is bound to the cylinder filter (cylinder) for each bond type of its R-group and is then filtered according to the amount of collision generated in the pocket.
  • the present invention has the effect of improving the derived hit compound and generating various derivatives for learning the analysis AI algorithm when discovering candidate substances through the artificial intelligence new drug platform.
  • the binding ability is improved by reflecting the shape of the binding space of the target protein, and there is an effect of generating a derivative with an improved actual binding possibility.
  • the derivative created by the invention is derived from a binding form (pose) within the pocket, which has the effect of improving the possibility of deriving the optimal binding form when molecular dynamics simulation is performed on an artificial intelligence new drug platform.
  • AI-drug platform: an artificial intelligence drug platform
  • Figure 2 is a conceptual diagram showing the cloud service structure of an artificial intelligence new drug platform to which the present invention is applied.
  • Figure 3 is a conceptual diagram showing the effective substance discovery process of the artificial intelligence new drug platform to which the present invention is applied.
  • Figure 4 is a conceptual diagram showing the lead material discovery process of the artificial intelligence new drug platform to which the present invention is applied.
  • Figure 5 is a flowchart showing a method for discovering a lead material through the generation of a derivative using the binding pocket structure of a target protein according to a specific embodiment of the present invention.
  • Figure 6 is a conceptual diagram showing a method of selecting an anchor atom, which is a position to be replaced, in the derivative generation process according to the present invention.
  • Figures 7 to 9 are conceptual diagrams showing the process of calculating the size of the binding pocket in the derivative generation process according to the present invention.
  • Figures 10 and 11 are conceptual diagrams showing a method of generating a derivative by selecting an R-group in the derivative generation process according to the present invention.
  • Figure 12 is an example diagram illustrating the bonding end filtering process in which the bonding form of the bonding group and the R-group is filtered by comparing it with the bonding form of the existing material in the derivative generation process according to the present invention.
  • Figure 13 is an example diagram illustrating a combination type filtering process in which filtering is performed according to whether or not a collision occurs within the pocket of the derivative generated in the derivative generation process according to the present invention.
  • the present invention includes the steps of (A) selecting an anchor atom, which is a binding site to be replaced, in the binding structure of the compound to be analyzed; (B) calculating the pocket space inside the binding pocket in the target protein; (C) producing a derivative; and (D) filtering and selecting the produced derivative; wherein step (A) includes: (A1) calculating an interaction (binding information) profile between the compound and the target protein; (A2) individually cutting the single bond between the compound and the target protein to generate atomic fragments on both sides of the cut portion; (A3) filtering the atomic fragments according to the number of atoms in the generated atomic fragments; and (A4) for the filtered atomic fragments, calculating the interaction efficiency and selecting, as the anchor, the cut portion of an atomic fragment whose relevance to the binding interaction is less than a preset value.
  • step (B) may include: (B1) generating a cylinder filter by extracting a region of a preset size centered on the anchor site of the target protein; (B2) setting dots arranged at equal intervals in the cylinder filter and distinguishing the dots by their interaction energy with protein atoms; (B3) placing the anchor portion of the scaffold, obtained by removing the atomic fragment from the compound, close to the anchor portion of the cylinder filter; (B4) excluding from the cylinder filter (cylinder) the regions where the interaction energy of the dots is greater than a preset value; (B5) clustering the dots in spatial units (GMM clustering); and (B6) selecting clustered regions according to their proximity to the scaffold anchor and their size, deriving only some of the clustered regions as the pocket space (target volume) within the target protein, and calculating the size of the pocket region.
  • each block in the attached block diagrams and each step in the flowcharts may be performed by computer program instructions (an execution engine). Because these computer program instructions can be loaded onto a processor of a general-purpose computer, special-purpose computer, or other programmable data processing equipment, the instructions executed by that processor create a means of performing the functions described in each block of the block diagram or each step of the flowchart.
  • because the computer program instructions can also be loaded onto a computer or other programmable data processing equipment, a series of operational steps is performed on that equipment to create a computer-executed process, and the instructions that operate the computer or other programmable data processing equipment may also provide steps for executing the functions described in each block of the block diagram and each step of the flowchart.
  • each block or each step may represent a module, segment, or portion of code containing one or more executable instructions for executing the specified logical functions, and in some alternative embodiments the functions mentioned in the blocks or steps may occur out of order.
  • Figure 1 is a configuration diagram showing the overall configuration of an artificial intelligence new drug platform (AI-drug platform) to which the present invention is applied
  • the AI-drug platform to which the present invention is applied is basically a platform that performs the entire process of discovering new drug candidates in the preclinical stage, and it can be provided as a service through the applicant's cloud (STB CLOUD).
  • new drugs include synthetic new drugs (small molecules) and antibody drugs, and the artificial intelligence new drug platform (AI-drug platform) according to the present invention provides a discovery process for all of them.
  • the artificial intelligence new drug platform (AI-drug platform) according to the present invention, as shown in FIG. 1, includes a hit material automated discovery platform, a lead material automated discovery platform, and a drug reaction (ADMET: Absorption, Distribution, Metabolism, Excretion & Toxicity) automated analysis platform.
  • ADMET: drug reaction (Absorption, Distribution, Metabolism, Excretion & Toxicity)
  • the artificial intelligence new drug platform (AI-drug platform) according to the present invention is an artificial intelligence platform designed to perform the entire new drug development process of selecting active substances, discovering lead substances among them, and then selecting candidate substances through drug reaction analysis.
  • Figure 2 shows the cloud service process of the AI-drug platform according to the present invention. As shown, the present invention provides all areas of the drug discovery and development process, from active substance discovery, lead substance generation, and ADMET/PK to pharmacogenetics and biomarkers.
  • the artificial intelligence new drug platform applies three individual artificial intelligence systems: a generative artificial intelligence system (GPT/BERT), a 3D structural artificial intelligence system (ED-CNN), and a molecular dynamics analysis system (Auto-MD simulation).
  • GPT/BERT: generative artificial intelligence system
  • ED-CNN: 3D structural artificial intelligence system
  • Auto-MD simulation: molecular dynamics analysis system
  • as specific methods for running the hit material automated discovery platform, the lead material automated discovery platform, and the drug reaction (ADMET: Absorption, Distribution, Metabolism, Excretion & Toxicity) automated analysis platform: discovery of active substances through 3D structural information between proteins and ligands (hereinafter referred to as 'DMC-PRE', the technical name coined by the applicant); analysis of the central atom vector-based protein-ligand docking structure (hereinafter referred to as 'GAP-Dock', the technical name of the applicant); prediction of the optimized binding structure between proteins and compounds using a 3D-CNN learning model (hereinafter referred to as 'DMC-SCR', the technical name of the applicant); generation of derivatives through the binding pocket structure of the target protein (hereinafter referred to as 'LEAD-GEN', the technical name of the applicant); analysis of protein-compound interaction stability through molecular dynamics simulation data (hereinafter referred to as 'DMC-PRE', the technical
  • DMC-PRE and GAP-Dock are technologies applied in advance to discover active substances through the automated discovery platform for hit substances
  • DMC-SCR is applied to the molecular dynamics analysis system (Auto-MD simulation) and is a technology applied at a later stage to discover active substances through the hit material automated discovery platform.
  • LEAD-GEN is a technology applied to discover lead materials through the lead material automated discovery platform.
  • DMC-MD is applied to the molecular dynamics analysis system (Auto-MD simulation) and is a technology that verifies the binding stability of the results derived from the hit material automated discovery platform, the lead material automated discovery platform, and the drug reaction (ADMET: Absorption, Distribution, Metabolism, Excretion & Toxicity) automated analysis platform. 3bmGPT is applied to the generative artificial intelligence system (GPT/BERT) and is a technology that selects the substances to be analyzed when discovering active substances through the hit material automated discovery platform.
  • for the substances to be analyzed that are selected through 3bmGPT, DMC-PRE and GAP-Dock are applied for preliminary screening, DMC-SCR is applied for in-depth screening, and DMC-MD is applied to verify binding stability, thereby deriving active substances.
  • LEAD-GEN is applied to discover the lead material, and DMC-MD is applied to verify the binding stability, thereby deriving the lead material.
  • the present invention relates to a derivative generation method (LEAD-GEN) applied to the artificial intelligence new drug platform as described above, and will be described in detail below.
  • LEAD-GEN: derivative generation method
  • Figure 5 is a flowchart showing a method of discovering a lead material through the generation of a derivative using the binding pocket structure of the target protein according to a specific embodiment of the present invention.
  • Figure 6 is a conceptual diagram showing a method of selecting an anchor atom, which is the position to be replaced, in the derivative generation process according to the present invention.
  • Figures 7 to 9 are conceptual diagrams showing the process of calculating the size of the binding pocket in the derivative generation process according to the present invention.
  • Figures 10 and 11 are conceptual diagrams showing a method of generating a derivative by selecting an R-group in the derivative generation process according to the present invention.
  • Figure 12 shows the filtering process in which the bond form of the linker and the R-group is filtered by comparing it with the bond form of existing materials in the derivative generation process according to the present invention.
  • the method for generating a derivative using the binding pocket structure of the target protein broadly includes the steps of (A) selecting an anchor atom, which is a binding site to be replaced, in the binding structure of the compound to be analyzed, (B) calculating the pocket space inside the binding pocket within the target protein, (C) generating a derivative, and (D) filtering and selecting the generated derivative.
  • the anchor atom selection step is a process of selecting a binding site for substitution of the effective substance (Hit Compound) to be analyzed, as shown in FIG. 6.
  • the interaction (binding information) profile can be acquired with the applicant's software ENVA, but it can also be obtained through other commercially available software (FEP+, MMPBSA, etc.).
  • the cutting and generation of atomic fragments is performed across all single bonds of the mother compound, so the number of atomic fragments generated is twice the number of single bonds.
  • the purpose of calculating the atomic fragments is to derive an atomic fragment that has little effect on the binding between the mother compound and the target protein and is therefore valuable as a replacement target. The fragments are filtered by atom count because, if the number of atoms constituting the fragment is too small, there is little possibility that a new compound will be derived through substitution; conversely, if the number of atoms is too large, there is a risk that the properties of the compound will differ significantly after substitution.
  • for the filtered atomic fragments, the interaction efficiency is calculated, and the cut portion of an atomic fragment that has a low impact on the binding interaction is selected as the anchor.
  • the interaction efficiency can be calculated as the average value of the binding energy of each atom constituting the atomic fragment.
  • the compound from which the atomic fragments are cut is called a scaffold.
  • a cylindrical region of a preset size (length 10 Å, radius 10 Å) is extracted centering on the anchor site of the target protein to create a cylinder filter (cylinder).
  • the size of the cylinder filter can be changed depending on the structure of the scaffold and computational system resources, but it is preferably set to sufficiently cover the available area of the binding pocket of the scaffold.
  • dots arranged at equal intervals are set in the cylinder filter, and the dots are distinguished by their interaction energy with protein atoms.
  • the display points (dots) included in the cylinder filter (cylinder) are color-coded according to their interaction energy with protein atoms.
  • display points within 0.7r are marked as the red zone (clash area), where r is the value calculated from the vdW radius (r) formula shown in 'B' of FIG. 7.
  • display points between 0.7r and 1r are marked as the yellow zone (buffer area), display points between 1r and 1.3r as the green zone (contact area), and display points between 1.3r and 3 Å as the gray zone (close area).
  • the gray zone, which has low interaction energy, is highly likely to serve as a pocket region with adequate free space in which binding with the compound can be induced.
  • the anchor portion of the scaffold is placed close to the anchor portion of the cylinder filter. At this time, it is preferable that the scaffold bonding axis direction maintains the original bonding direction.
  • the areas with high interaction energy among the dots (red zone, yellow zone, green zone) are excluded from the cylinder filter (cylinder), and only the areas with low interaction energy (gray zone, dark gray zone) are left.
  • the remaining dots are clustered in spatial units (GMM clustering), as shown in 'D' of FIG. 8.
  • the selected clustering region is derived as a pocket space (target volume) within the target protein.
  • the pocket space (target volume) within the target protein is derived in order to select an R-group (replace atom group) to replace the atomic fragment, with the R-group chosen according to the size of the pocket space (target volume).
  • Figure 9 shows an example of deriving the pocket space (target volume) within the actual target protein using the method described above.
  • the size is expressed as the number of display points (dots) divided by 300, reflecting that, at the given spacing of the display points, the volume of one heavy atom (C, N, O) corresponds to a region of about 300 display points.
  • an R-group that matches the size of the pocket space (target volume) in the target protein calculated in the step (B) is selected.
  • R-group (Replace atom group) is an atomic group that replaces the atomic fragment removed from the scaffold, and is selected from a database storing various atomic configurations.
  • the selected R-group is bound to the anchor of the scaffold to generate a derivative.
  • the position where one selected R-group is bonded to the anchor of the scaffold is changed to create derivatives with various bonding structures.
  • derivatives with various bond forms are created by extracting the linker at the anchor portion of the target protein and changing the bond form between the R-group and the scaffold, as shown in Figure 11.
  • the linker is obtained by extracting only the portion adjacent to the anchor, and can be created by extracting only a preset number of bonding atoms from the anchor.
  • the bond form between the linker and the R-group is varied to produce derivatives with various bond forms.
  • the (D) derivative filtering step filters the bond form at the anchor portion of the generated derivatives to exclude derivatives with a bond form that is unlikely to exist.
  • the derivative filtering is largely carried out by two methods.
  • the first method is linker filtering, which filters the bond form of the linker and the R-group by comparing it with an existing substance database.
  • the second method is shape filtering, in which the derivatives are bound to the cylinder filter for each bond type of the R-group and are then filtered according to the amount of collision generated within the pocket.
  • linker filtering is performed by comparing the various linkage forms of the linker and R-group against the linkage structures of existing substances in existing compound (and protein) databases (ChEMBL, etc.) and selecting accordingly.
  • that is, binding structures with the same composition as the linker and R-group are retrieved from the compound database, their bond forms are analyzed, and only derivatives whose linker and R-group are bound in a form that exists in actual substances are selected.
  • the bond shape filtering (shape filtering) is performed by binding the derivative to the cylinder filter (cylinder) for each R-group bond type, and then measuring the amount of collision generated in the pocket.
  • the collision refers to a collision between an atom of the R-group and an atom of the target protein.
  • the present invention is an in silico prescreening technology applied to the process of discovering active substances for new drugs using CADD (computer-aided drug discovery) or an AI drug platform for new drug development, and relates to a method of generating various types of derivatives from selected hit compounds.
  • when candidate substances are discovered through an artificial intelligence new drug platform, the present invention has the effect of improving the derived active substances (hit compounds) and generating various derivatives for training the analysis AI algorithm.
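
The anchor selection of step (A) described in the definitions above (filter fragments by atom count, then take the interaction efficiency as the average per-atom binding energy) can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the `Fragment` class, the `select_anchors` function, the atom-count bounds, the threshold, and the choice to measure "relevance" as the magnitude of the mean energy are all assumptions made for the example.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List, Tuple


@dataclass
class Fragment:
    """One of the two atomic fragments produced by cutting a single bond."""
    cut_bond: Tuple[int, int]    # hypothetical (scaffold-side, fragment-side) atom indices
    atom_energies: List[float]   # per-atom binding energies taken from an interaction profile


def interaction_efficiency(fragment: Fragment) -> float:
    """Average binding energy over the atoms of the fragment, as in step (A4)."""
    return mean(fragment.atom_energies)


def select_anchors(fragments, min_atoms=2, max_atoms=10, max_relevance=0.5):
    """Return the cut bonds whose fragment passes the size filter (A3)
    and contributes little to binding (A4)."""
    anchors = []
    for frag in fragments:
        n_atoms = len(frag.atom_energies)
        # (A3) drop fragments that are too small or too large (bounds are assumed)
        if not (min_atoms <= n_atoms <= max_atoms):
            continue
        # (A4) assumption: relevance is the magnitude of the mean per-atom energy
        if abs(interaction_efficiency(frag)) < max_relevance:
            anchors.append(frag.cut_bond)
    return anchors


fragments = [
    Fragment((1, 2), [-0.1, -0.2, -0.3]),  # small contribution: kept as an anchor
    Fragment((2, 1), [-2.0] * 8),          # strong contribution: not replaced
    Fragment((3, 4), [-0.05]),             # too few atoms
    Fragment((4, 3), [-0.1] * 20),         # too many atoms
]
print(select_anchors(fragments))  # prints [(1, 2)]
```

Each cut single bond yields two `Fragment` entries (one per side), which matches the statement that the number of fragments is twice the number of single bonds.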

Abstract

The present invention relates to a method for generating various forms of derivatives from selected active substances (Hit-compound), which is an in-silico prescreening technique applicable to the process of discovering new drug active substances in computer-aided drug discovery (CADD) or AI drug platforms for development of new drugs, the method comprising the steps of: (A) selecting an anchor atom as a substitution target binding site in the binding structure of a compound to be analyzed; (B) calculating a pocket space within a binding pocket of a target protein; (C) generating derivatives; and (D) filtering and selecting the generated derivatives. According to the present invention, the use of the artificial intelligence drug discovery platform in exploring candidate substances allows for the improvement of discovered active substances (hit compounds) and the generation of various derivatives for training the analysis AI algorithms.
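
Steps (B) and (C) rest on simple arithmetic given in the definitions: dots are zoned by their distance to a protein atom relative to its vdW radius r (0.7r, 1r, 1.3r, 3 Å), and the pocket size is the number of remaining dots divided by 300. The sketch below illustrates that arithmetic under stated assumptions: the function names, the handling of zone boundaries, the treatment of dots beyond 3 Å as the "dark gray" zone, and the size-matching tolerance for R-groups are not specified in the source and are chosen here only for illustration.

```python
from typing import Dict, List


def classify_dot(distance: float, r: float) -> str:
    """Zone of a dot from its distance (in Å) to a protein atom of vdW radius r (in Å)."""
    if distance < 0.7 * r:
        return "red"        # clash area
    if distance < 1.0 * r:
        return "yellow"     # buffer area
    if distance < 1.3 * r:
        return "green"      # contact area
    if distance <= 3.0:
        return "gray"       # close area
    return "dark_gray"      # assumption: the source names this zone without defining it


def pocket_size(n_dots: int, dots_per_heavy_atom: int = 300) -> float:
    """Pocket size in heavy-atom units: number of dots divided by 300."""
    return n_dots / dots_per_heavy_atom


def select_r_groups(r_groups: Dict[str, int], target_size: float,
                    tolerance: float = 1.0) -> List[str]:
    """Names of R-groups whose heavy-atom count is within `tolerance` of the pocket size."""
    matches = [name for name, size in r_groups.items()
               if abs(size - target_size) <= tolerance]
    return sorted(matches, key=lambda name: abs(r_groups[name] - target_size))


# Example: a carbon-like radius of 1.7 Å and a pocket of 1500 free dots.
zones = [classify_dot(d, 1.7) for d in (1.0, 1.5, 2.0, 2.5, 3.5)]
size = pocket_size(1500)
candidates = select_r_groups({"methyl": 1, "isopropyl": 3, "phenyl": 6}, size)
print(zones, size, candidates)
```

In a fuller pipeline only the gray and dark gray dots would be kept and clustered (the source uses GMM clustering) before the dot count of the selected cluster is passed to `pocket_size`.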

Description

Method for generating derivatives using the binding pocket structure of a target protein through an artificial intelligence new drug platform
The present invention is an in silico prescreening technology applied to the process of discovering active substances for new drugs using CADD (computer-aided drug discovery) or an AI drug platform for new drug development, and relates to a method of generating various types of derivatives from selected active substances (hit compounds).
In general, new drug development proceeds through candidate substance discovery and screening, followed by optimization, non-clinical/toxicity testing, and clinical trials. Recently, computational analysis technologies (AI, etc.) have been applied to reduce the time and cost required to discover new drug candidates.
Accordingly, various analysis tools are currently used in the field of candidate substance analysis systems (computational screening), and representative analysis tools are shown in [Table 1] below.
| Program name | Type | Principle | Analysis time for a pose |
| --- | --- | --- | --- |
| AutoDock | 3D docking | Grid-based semi-empirical scoring function that considers van der Waals forces, electrostatic interactions, and desolvation | Minutes to hours |
| Glide | 3D docking | Empirical scoring function (GlideScore) that considers terms such as Coulombic, van der Waals, and solvation effects | 0.2-2.4 min |
| GOLD (Genetic Optimisation for Ligand Docking) | 3D docking | ChemScore-based scoring function considering multiple binding modes | Minutes to hours |
| FEP+ | Molecular dynamics | Free energy perturbation | Hours to days (depending on system size and setup) |
| PLUMED/GROMACS | Molecular dynamics | Metadynamics | Hours to days |
| Gaussian | QM/MM | Quantum mechanics/molecular mechanics | Hours to days (depending on QM region size) |
| AMBER | Molecular dynamics | Alchemical free energy (e.g., thermodynamic integration) | Hours to days |
| DeepChem | Deep learning | Neural networks for molecular systems | Seconds to minutes (once trained) |
| gnina | Deep learning | 3D convolutional neural network based affinity prediction | 2.5 minutes (once trained) |
| Phase, Discovery Studio, MOE | Pharmacophore modeling | Generate pharmacophore models from known active compounds or protein structures | Minutes to hours |
| LigandScout | Pharmacophore ensemble approach | Generate an ensemble of pharmacophores from multiple active ligands or protein conformations | Hours |
| ROCS | 3D align, 2D fingerprints, force field | Rapid overlay of chemical structures using shape and chemical features for virtual screening | Minutes to hours |
Meanwhile, the virtual screening of drug candidates from compounds is divided into a pre-screening step based on chemical properties or structural similarity and an in-depth screening step that uses protein-ligand interaction information obtained from 3D docking.
Because the pre-screening step is generally used to reduce the number of candidates from a large pool of substances, it uses simple number-based discrimination algorithms such as the rule of 5 (Lipinski's guideline for drug design). However, because the information used for screening is limited, such algorithms are generally known to be reliable only within the range in which the screening ratio is kept at about 10%.
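As an illustration of the kind of number-based discrimination mentioned above, the following is a minimal sketch of a rule-of-5 check. The `Descriptors` container and the function names are hypothetical, and the descriptor values are assumed to have been computed elsewhere; the thresholds are the commonly cited Lipinski limits, with at most one violation tolerated.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    """Precomputed molecular descriptors (hypothetical container for this sketch)."""
    mol_weight: float   # molecular weight in daltons
    logp: float         # octanol-water partition coefficient
    h_donors: int       # number of hydrogen-bond donors
    h_acceptors: int    # number of hydrogen-bond acceptors


def rule_of_five_violations(d: Descriptors) -> int:
    """Count how many of Lipinski's four limits the compound exceeds."""
    return sum([
        d.mol_weight > 500,
        d.logp > 5,
        d.h_donors > 5,
        d.h_acceptors > 10,
    ])


def passes_prescreen(d: Descriptors, max_violations: int = 1) -> bool:
    """Keep compounds that break at most max_violations of the four limits."""
    return rule_of_five_violations(d) <= max_violations
```

A filter of this kind is cheap enough to run over very large libraries, which is why it suits pre-screening, but it uses only four numbers per compound.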
However, as computing technology has advanced, demand has grown for computations over analysis targets numbering in the billions, which has created a need to upgrade existing screening strategies.
To address this, methods such as comparing the similarity of features including chemical properties, or screening based on similarity between substances by characterizing two-dimensional patterns of molecular structure, have been studied as strategies for advancing pre-screening. However, these methods were limited in that the accuracy (T/P, true positive) ratio among the selected results was not high and the structures of all substances could not be patterned.
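For context, similarity-based pre-screening of this kind is often implemented by comparing 2D fingerprints. The sketch below uses the Tanimoto coefficient over sets of fingerprint bit indices; this specific measure, the function names, and the 0.7 threshold are assumptions for illustration and are not taken from the present document.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two sets of fingerprint bit indices."""
    if not a and not b:
        return 0.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def similar_compounds(query, library, threshold=0.7):
    """Return names of library compounds whose similarity to the query meets the threshold."""
    return [name for name, bits in library.items() if tanimoto(query, bits) >= threshold]
```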
Meanwhile, 3D- and 4D-based protein-ligand binding affinity prediction techniques, which are known to offer relatively high screening reliability, require analysis times ranging from minutes to hours, which makes them difficult to apply to the screening of large-scale analysis targets.
The present invention was devised to solve the problems described above. An object of the present invention is to provide a method for generating derivatives using the binding pocket structure of a target protein which, in discovering candidate substances through an artificial intelligence new drug platform, can improve the derived active substances (hit compounds) and generate various derivatives for training the analysis AI algorithms.
Another object of the present invention is to provide a method for generating derivatives using the binding pocket structure of a target protein which, in generating derivatives, reflects the shape of the binding space of the target protein so as to generate derivatives with improved binding ability and an improved likelihood of actual binding.
According to a feature of the present invention for achieving the above objects, the present invention comprises the steps of: (A) selecting an anchor (anchor atom), which is the binding site to be substituted, in the binding structure of a compound to be analyzed; (B) calculating the pocket space inside a binding pocket of a target protein; (C) generating derivatives; and (D) filtering and selecting the generated derivatives.
Step (A) may comprise: (A1) calculating an interaction (binding information) profile between the compound and the target protein; (A2) individually cutting single bonds between the compound and the target protein to generate the atomic fragments on both sides of each cut; (A3) filtering the fragments according to the number of atoms constituting each generated fragment; and (A4) calculating an interaction efficiency for the filtered fragments and selecting, as the anchor, the cut site of a fragment whose relevance to the binding interaction is at or below a preset value.
In addition, the interaction efficiency may be calculated as the average of the binding energies of the atoms constituting the fragment.
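Steps (A2) to (A4) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the applicant's implementation: the compound is given as a list of `(atom_i, atom_j, bond_order)` tuples, the per-atom binding energies from the (A1) profile are supplied as non-negative magnitudes, the cuts are made on single bonds in the compound's own bond graph, and the anchor is taken to be the atom that stays on the scaffold side of the cut. All function names, parameters, and default thresholds are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def component(start, adjacency, blocked):
    """Atoms reachable from start without crossing the blocked (cut) bond."""
    seen, stack = {start}, [start]
    while stack:
        atom = stack.pop()
        for nxt in adjacency[atom]:
            if {atom, nxt} == blocked or nxt in seen:
                continue
            seen.add(nxt)
            stack.append(nxt)
    return seen


def select_anchors(bonds, atom_binding_energy, min_atoms=1, max_atoms=6, max_efficiency=0.5):
    """Return (anchor_atom, fragment_atoms, efficiency) for every qualifying cut."""
    adjacency = defaultdict(set)
    for i, j, _ in bonds:
        adjacency[i].add(j)
        adjacency[j].add(i)
    anchors = []
    for i, j, order in bonds:
        if order != 1:                      # (A2) only single bonds are cut
            continue
        for keep, leave in ((i, j), (j, i)):
            fragment = component(leave, adjacency, {i, j})
            if keep in fragment:            # ring bond: the cut does not split the compound
                continue
            if not (min_atoms <= len(fragment) <= max_atoms):   # (A3) size filter
                continue
            # (A4) interaction efficiency = mean per-atom binding energy of the fragment
            efficiency = mean(atom_binding_energy[a] for a in fragment)
            if efficiency <= max_efficiency:
                anchors.append((keep, sorted(fragment), efficiency))
    return anchors
```

Fragments that contribute little to binding are the ones proposed for replacement, so the scaffold keeps the atoms that matter most for the interaction.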
Step (B) may comprise: (B1) generating a cylinder filter (cylinder) by extracting a region of a preset size from the target protein, centered on the anchor site; (B2) setting dots arranged at equal intervals in the cylinder filter and classifying the dots according to their interaction energy with the protein atoms; (B3) placing the anchor site of the scaffold, which is the compound with the fragment removed, close to the anchor site of the cylinder filter; (B4) excluding from the cylinder filter the regions of dots whose interaction energy is at or above a preset value; (B5) clustering the dots by spatial unit (GMM clustering); and (B6) selecting the clustered regions according to their proximity to the scaffold anchor and their size, deriving only some of the clustered regions as the pocket space (target volume) within the target protein, and calculating the size of the pocket region.
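The flow of step (B) can be sketched as follows, under several loudly stated assumptions. The anchor is placed at the origin with the cylinder axis along +z (standing in for B1 and B3); `toy_energy` is a placeholder repulsive term, not a real force field; and face-connected grouping of grid dots is used as a simple stand-in for the GMM clustering named in (B5). All names, parameters, and thresholds are hypothetical.

```python
import itertools
import math


def cylinder_dots(radius, height, spacing):
    """(B1/B2) Integer grid indices of equally spaced dots inside the cylinder."""
    n_r, n_h = int(radius // spacing), int(height // spacing)
    return [
        (ix, iy, iz)
        for ix, iy in itertools.product(range(-n_r, n_r + 1), repeat=2)
        if (ix * ix + iy * iy) * spacing ** 2 <= radius ** 2
        for iz in range(n_h + 1)
    ]


def toy_energy(dot, spacing, protein_atoms, sigma=0.5):
    """Placeholder interaction energy of a dot with the protein atoms."""
    point = tuple(i * spacing for i in dot)
    return sum((sigma / max(math.dist(point, atom), 1e-6)) ** 12 for atom in protein_atoms)


def connected_clusters(dots):
    """(B5 stand-in) Group dots into face-connected clusters."""
    remaining, clusters = set(dots), []
    steps = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    while remaining:
        stack = [remaining.pop()]
        cluster = set(stack)
        while stack:
            x, y, z = stack.pop()
            for dx, dy, dz in steps:
                nxt = (x + dx, y + dy, z + dz)
                if nxt in remaining:
                    remaining.remove(nxt)
                    cluster.add(nxt)
                    stack.append(nxt)
        clusters.append(cluster)
    return clusters


def pocket_volume(protein_atoms, radius, height, spacing,
                  max_energy=1.0, max_gap=1.0, min_dots=3):
    """Estimate the free pocket volume next to the anchor at the origin."""
    dots = cylinder_dots(radius, height, spacing)
    # (B4) drop dots whose interaction energy is at or above the preset value
    free = [d for d in dots if toy_energy(d, spacing, protein_atoms) < max_energy]
    clusters = connected_clusters(free)
    # (B6) keep clusters that are large enough and close enough to the anchor
    kept = [
        c for c in clusters
        if len(c) >= min_dots
        and min(math.dist((0, 0, 0), d) for d in c) * spacing <= max_gap
    ]
    return sum(len(c) for c in kept) * spacing ** 3
```

Each kept dot is counted as one grid cell, so the returned volume is the number of kept dots multiplied by the cube of the grid spacing.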
In addition, step (C) may comprise: (C1) selecting an R-group whose size corresponds to the size of the pocket space (target volume) within the target protein calculated in step (B); and (C2) bonding the selected R-group to the anchor of the scaffold to generate a derivative.
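Step (C1) can be illustrated with a small size-matching routine. The matching criterion used here (R-group volume within a relative tolerance of the pocket volume) is an assumption, since the text only says that the sizes correspond; the library contents, names, and the 20% tolerance are hypothetical placeholders.

```python
def select_r_groups(library, pocket_volume, tolerance=0.2):
    """Return R-group names whose volume is within +/- tolerance of the pocket volume,
    closest match first. `library` maps an R-group name to its volume."""
    low = pocket_volume * (1 - tolerance)
    high = pocket_volume * (1 + tolerance)
    matches = [
        (abs(volume - pocket_volume), name)
        for name, volume in library.items()
        if low <= volume <= high
    ]
    return [name for _, name in sorted(matches)]
```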
In step (C2), derivatives having a plurality of different bonding structures may be generated for the same R-group by varying the position on the R-group at which it bonds to the anchor of the scaffold.
In addition, in step (C2), derivatives having a plurality of different bonding forms may be generated for an R-group having the same bonding structure by varying the bonding form (angle) between the R-group and the scaffold.
The bonding form (angle) between the R-group and the scaffold may be generated by extracting a linker, which is created by extracting the portion adjacent to the anchor portion of the target protein, and varying the bonding form between the linker and the R-group.
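The two kinds of variation described for step (C2), attachment position and bonding angle, amount to enumerating a Cartesian product. The sketch below shows only that bookkeeping; the record layout, the identifiers, and the use of a single angle in degrees to stand for the bonding form are assumptions, and no 3D coordinates are built here.

```python
from itertools import product


def enumerate_derivatives(scaffold, r_group, attach_atoms, angles_deg):
    """One candidate record per (attachment atom on the R-group, bonding angle) pair."""
    return [
        {"scaffold": scaffold, "r_group": r_group, "attach_atom": atom, "angle_deg": angle}
        for atom, angle in product(attach_atoms, angles_deg)
    ]
```

With two attachment atoms and three angles, for example, this yields six candidate records for the same R-group.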
In addition, step (D) may include linker filtering, in which the bonding form between the linker and the R-group is filtered by comparison against a database of existing substances.
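One simple way to picture linker filtering is a lookup of each candidate's bonding form in a set built from existing substances. The encoding below (linker type, R-group atom type, and a binned angle) is an assumption made for illustration; the source does not specify how bonding forms are represented or compared, and all names are hypothetical.

```python
def bonding_key(linker_type, r_atom_type, angle_deg, bin_width=30):
    """Discretise a linker/R-group bonding form so it can be looked up in a set."""
    return (linker_type, r_atom_type, int(angle_deg // bin_width) * bin_width)


def linker_filter(candidates, known_forms, bin_width=30):
    """Keep candidates (linker_type, r_atom_type, angle_deg) whose binned bonding
    form also occurs among the forms extracted from existing substances."""
    return [c for c in candidates if bonding_key(*c, bin_width=bin_width) in known_forms]
```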
Step (D) may also include bonding-form filtering (shape filtering), in which the derivative is placed into the cylinder filter for each bonding form of its R-group and is then filtered according to the amount of clash generated within the pocket.
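Shape filtering can be sketched as a clash count per pose. Here the "amount of clash" is assumed to be the number of derivative atoms lying closer than a cutoff distance to any protein atom; the 2.0 cutoff, the zero-clash default, and the function names are hypothetical choices for this sketch.

```python
import math


def clash_count(ligand_atoms, protein_atoms, clash_distance=2.0):
    """Number of ligand atoms lying closer than clash_distance to any protein atom."""
    return sum(
        any(math.dist(atom, p) < clash_distance for p in protein_atoms)
        for atom in ligand_atoms
    )


def shape_filter(poses, protein_atoms, max_clashes=0, clash_distance=2.0):
    """Keep the poses (name -> list of atom coordinates) with an acceptable clash count."""
    return [
        name for name, atoms in poses.items()
        if clash_count(atoms, protein_atoms, clash_distance) <= max_clashes
    ]
```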
The method for generating derivatives using the binding pocket structure of a target protein through an artificial intelligence new drug platform according to the present invention, as described above, can be expected to have the following effects.
First, in discovering candidate substances through an artificial intelligence new drug platform, the present invention can improve the derived active substances (hit compounds) and generate various derivatives for training the analysis AI algorithms.
Second, in generating derivatives, the present invention reflects the shape of the binding space of the target protein and can thereby generate derivatives with improved binding ability and an improved likelihood of actual binding.
Third, because a derivative generated by the present invention is produced together with its feasible binding form (pose) within the pocket, the likelihood of deriving the optimal binding form is improved when a molecular dynamics simulation is performed on the artificial intelligence new drug platform.
Figure 1 is a configuration diagram showing the overall configuration of the artificial intelligence new drug platform (AI-drug platform) to which the present invention is applied.
Figure 2 is a conceptual diagram showing the cloud service structure of the artificial intelligence new drug platform to which the present invention is applied.
Figure 3 is a conceptual diagram showing the active substance (hit) discovery process of the artificial intelligence new drug platform to which the present invention is applied.
Figure 4 is a conceptual diagram showing the lead substance discovery process of the artificial intelligence new drug platform to which the present invention is applied.
Figure 5 is a flowchart showing a method for discovering lead substances through the generation of derivatives using the binding pocket structure of a target protein according to a specific embodiment of the present invention.
Figure 6 is a conceptual diagram showing a method of selecting the anchor (anchor atom), which is the position to be substituted, in the derivative generation process according to the present invention.
Figures 7 to 9 are conceptual diagrams showing the process of calculating the size of the binding pocket in the derivative generation process according to the present invention.
Figures 10 and 11 are conceptual diagrams showing a method of generating derivatives by selecting an R-group in the derivative generation process according to the present invention.
Figure 12 is an exemplary diagram showing the linker filtering process, in which the bonding form between the linker and the R-group is filtered by comparison with the bonding forms of existing substances, in the derivative generation process according to the present invention.
Figure 13 is an exemplary diagram showing the bonding-form filtering process, in which the generated derivatives are filtered according to whether they clash within the pocket, in the derivative generation process according to the present invention.
To achieve the above objects, the present invention comprises the steps of: (A) selecting an anchor (anchor atom), which is the binding site to be substituted, in the binding structure of a compound to be analyzed; (B) calculating the pocket space inside a binding pocket of a target protein; (C) generating derivatives; and (D) filtering and selecting the generated derivatives. Step (A) comprises: (A1) calculating an interaction (binding information) profile between the compound and the target protein; (A2) individually cutting single bonds between the compound and the target protein to generate the atomic fragments on both sides of each cut; (A3) filtering the fragments according to the number of atoms constituting each generated fragment; and (A4) calculating an interaction efficiency for the filtered fragments and selecting, as the anchor, the cut site of a fragment whose relevance to the binding interaction is at or below a preset value. Step (B) may comprise: (B1) generating a cylinder filter (cylinder) by extracting a region of a preset size from the target protein, centered on the anchor site; (B2) setting dots arranged at equal intervals in the cylinder filter and classifying the dots according to their interaction energy with the protein atoms; (B3) placing the anchor site of the scaffold, which is the compound with the fragment removed, close to the anchor site of the cylinder filter; (B4) excluding from the cylinder filter the regions of dots whose interaction energy is at or above a preset value; (B5) clustering the dots by spatial unit (GMM clustering); and (B6) selecting the clustered regions according to their proximity to the scaffold anchor and their size, deriving only some of the clustered regions as the pocket space (target volume) within the target protein, and calculating the size of the pocket region.
Hereinafter, a method for generating derivatives using the binding pocket structure of a target protein according to a specific embodiment of the present invention is described with reference to the accompanying drawings.
Before the description, it is noted that the effects and features of the present invention, and the methods of achieving them, will become clear from the embodiments described in detail below together with the accompanying drawings. However, the present invention is not limited to the embodiments disclosed below and may be implemented in various different forms. The embodiments are provided only so that the disclosure of the present invention is complete and so that those of ordinary skill in the art to which the present invention pertains are fully informed of the scope of the invention, which is defined only by the scope of the claims.
In describing the embodiments of the present invention, detailed descriptions of known functions or configurations are omitted where they would unnecessarily obscure the gist of the present invention. The terms used below are defined in consideration of their functions in the embodiments and may vary according to the intention or custom of users or operators. Their definitions should therefore be based on the content of this specification as a whole.
Combinations of the blocks in the attached block diagrams and the steps in the flowcharts may be carried out by computer program instructions (an execution engine). Because these computer program instructions can be loaded onto a processor of a general-purpose computer, a special-purpose computer, or other programmable data processing equipment, the instructions executed by that processor create means for performing the functions described in each block of the block diagram or each step of the flowchart.
The computer program instructions can also be loaded onto a computer or other programmable data processing equipment, so that a series of operational steps is performed on the computer or equipment to produce a computer-executed process. The instructions that run on the computer or equipment can thereby provide steps for executing the functions described in each block of the block diagram and each step of the flowchart.
In addition, each block or step may represent a module, segment, or portion of code that includes one or more executable instructions for performing specified logical functions, and in some alternative embodiments the functions mentioned in the blocks or steps may occur out of order.
In the field of artificial intelligence new drug platforms to which the present invention is applied, most technical terms have no defined Korean equivalent and their English names are used as the general names. Accordingly, technical terms given in Korean alongside English are to be interpreted according to the meaning of the English name commonly used in the technical field.
Before describing the method for generating derivatives using the binding pocket structure of a target protein according to the present invention, the overall artificial intelligence new drug platform to which the present invention is applied is described.
Figure 1 is a configuration diagram showing the overall configuration of the artificial intelligence new drug platform (AI-drug platform) to which the present invention is applied, Figure 2 is a conceptual diagram showing the cloud service structure of the platform, Figure 3 is a conceptual diagram showing the active substance (hit) discovery process of the platform, and Figure 4 is a conceptual diagram showing the lead substance discovery process of the platform.
The artificial intelligence new drug platform (AI-drug platform) to which the present invention is applied is basically a platform that carries out the entire process of discovering new drug candidates at the preclinical stage, and it can be provided by the applicant as a cloud service (STB CLOUD).
Here, new drugs include both synthetic new drugs (small molecules) and antibody new drugs (antibodies), and the AI-drug platform according to the present invention provides a discovery process for both.
To this end, as shown in Figure 1, the AI-drug platform according to the present invention comprises an automated hit substance discovery platform, an automated lead substance discovery platform, and an automated drug response (ADMET: absorption, distribution, metabolism, excretion, and toxicity) analysis platform.
That is, the AI-drug platform according to the present invention is an artificial intelligence platform configured to carry out the entire new drug development process: selecting active substances, discovering lead substances among them, and then selecting candidate substances through drug response analysis.
Figure 2 shows the cloud service process of the AI-drug platform according to the present invention. As shown there, the present invention discovers active substances, generates lead substances, and covers all areas of the drug discovery and development process, from ADMET/PK to pharmacogenetic biomarkers.
In addition, to operate the platforms for each discovery stage of new drug development, three separate artificial intelligence systems are applied in the AI-drug platform according to the present invention: a generative artificial intelligence system (GPT/BERT), a 3D structure artificial intelligence system (ED-CNN), and a molecular dynamics analysis system (Auto-MD simulation).
Using these artificial intelligence systems of the AI-drug platform, the following specific techniques are applied to run the automated hit substance discovery platform, the automated lead substance discovery platform, and the automated drug response (ADMET) analysis platform, each name below being a technique name coined by the applicant: discovery of active substances through 3D structural information between protein and ligand (hereinafter 'DMC-PRE'); analysis of protein-ligand docking structures based on central atom vectors (hereinafter 'GAP-Dock'); prediction of optimized protein-compound binding structures using a 3D-CNN learning model (hereinafter 'DMC-SCR'); generation of derivatives through the binding pocket structure of the target protein (hereinafter 'LEAD-GEN'); analysis of protein-compound binding stability through molecular dynamics simulation data (hereinafter 'DMC-MD'); and a generative artificial intelligence model trained using 3D protein-compound interaction data (hereinafter '3bmGPT').
Here, DMC-PRE and GAP-Dock are techniques applied first when discovering active substances through the automated hit substance discovery platform; DMC-SCR is applied to the molecular dynamics analysis system (Auto-MD simulation) and is applied afterwards when discovering active substances through the automated hit substance discovery platform; and LEAD-GEN is applied when discovering lead substances through the automated lead substance discovery platform.
DMC-MD is applied to the molecular dynamics analysis system (Auto-MD simulation) and verifies the binding stability of the results derived from the automated hit substance discovery platform, the automated lead substance discovery platform, and the automated drug response (ADMET) analysis platform. 3bmGPT is applied to the generative artificial intelligence system (GPT/BERT) and selects the substances to be analyzed when active substances are derived through the automated hit substance discovery platform.
구체적으로, 본 발명에 의한 인공지능 신약플랫폼(AI-drug platform)의 유효물질 발굴과정을 살피면, 도 3에 도시된 바와 같이, 상기 3bmGPT를 통해 선별한 분석대상 물질을 상기 DMC-PRE 및 GAP-Dock을 적용하여 선행 스크리닝하고, 상기 DMC-SCR을 적용하여 심층 스크리닝을 한후, DMC-MD를 적용하여 결합 안정성을 검증하여 유효물질을 도출한다.Specifically, looking at the process of discovering effective substances of the artificial intelligence new drug platform (AI-drug platform) according to the present invention, as shown in Figure 3, the analyte substances selected through the 3bmGPT are classified into the DMC-PRE and GAP- Dock is applied for preliminary screening, DMC-SCR is applied for in-depth screening, and DMC-MD is applied to verify binding stability to derive effective substances.
그리고 본 발명에 의한 인공지능 신약플랫폼(AI-drug platform)의 선도물질 발굴과정을 살피면, 도 4에 도시된 바와 같이, 상기 LEAD-GEN을 적용하여, 선도물질을 발굴하고, DMC-MD를 적용하여 결합 안정성을 검증하여 선도물질을 도출한다.And looking at the lead material discovery process of the artificial intelligence new drug platform (AI-drug platform) according to the present invention, as shown in Figure 4, the LEAD-GEN is applied to discover the lead material, and DMC-MD is applied. By verifying the binding stability, the lead material is derived.
본 발명은 전술한 바와 같은 인공지능 신약 플랫폼에 적용되는 유도체 생성방법(LEAT-GEN)에 관한 것으로, 이하에서 상세히 설명하기로 한다.The present invention relates to a derivative generation method (LEAT-GEN) applied to the artificial intelligence new drug platform as described above, and will be described in detail below.
도 5는 본 발명의 구체적인 실시예에 의한 표적 단백질의 결합 포켓 구조를 이용한 유도체 생성을 통한 선도물질 발굴방법을 도시한 흐름도이고, 도 6은 본 발명에 의한 유도체 생성과정에서 치환대상 위치인 앵커(Anchor atom)을 선정하는 방법을 도시한 개념도이며, 도 7 내지 9는 본 발명에 의한 유도체 생성과정에서 결합 포켓의 크기를 산출하는 과정을 도시한 개념도이고, 도 10 및 도 11은 본 발명에 의한 유도체 생성과정에서 R-group을 선택하여 유도체를 생성하는 방법을 도시한 개념도이며, 도 12는 본 발명에 의한 유도체 생성과정에서 결합단과 R-group의 결합 형태를 기존 물질을 결합형태와 대비하여 필터링하는 결합단 필터링과정을 도시한 예시도이다.Figure 5 is a flowchart showing a method of discovering a leader material through the generation of a derivative using the binding pocket structure of the target protein according to a specific embodiment of the present invention, and Figure 6 shows the anchor (anchor), which is the position to be replaced in the derivative generation process according to the present invention. It is a conceptual diagram showing a method of selecting an anchor atom, and Figures 7 to 9 are conceptual diagrams showing the process of calculating the size of the binding pocket in the derivative generation process according to the present invention, and Figures 10 and 11 are conceptual diagrams showing the process of calculating the size of the binding pocket in the derivative generation process according to the present invention. It is a conceptual diagram showing a method of generating a derivative by selecting an R-group in the derivative generation process, and Figure 12 shows the bond form of the binding group and the R-group being filtered by comparing it with the bond form of the existing material in the derivative generation process according to the present invention. This is an example diagram showing the combined end filtering process.
먼저, 본 발명에 의한 표적 단백질의 결합 포켓 구조를 이용한 유도체 생성방법은 크게 (A) 분석대상 화합물의 결합구조에서 치환대상 결합부인 앵커(Anchor atom)를 선정하는 단계와, (B) 표적 단백질 내의 결합포켓 내부의 포켓 공간을 산출하는 단계와, (C) 유도체를 생성하는 단계와, (D) 생성된 유도체를 필터링하여 선별하는 단계;를 포함하여 수행된다.First, the method for generating a derivative using the binding pocket structure of the target protein according to the present invention largely includes the steps of (A) selecting an anchor atom, which is a binding site to be replaced, in the binding structure of the compound to be analyzed, and (B) within the target protein. It is performed including calculating the pocket space inside the binding pocket, (C) generating a derivative, and (D) filtering and selecting the generated derivative.
Each of these steps is described in turn. First, the (A) anchor (anchor atom) selection step, as shown in FIG. 6, selects the binding site to be substituted in the hit compound under analysis.
Specifically, as shown in 'A' of FIG. 6, when the hit compound is bound within the target protein, an interaction (binding information) profile between the compound and the target protein is calculated.
The interaction (binding information) profile can be obtained with the applicant's software ENVA, but it can also be obtained with other commercially available software (FEP+, MMPBSA, etc.).
Then, as shown in 'B' of FIG. 6, based on the interaction profile between the compound and the target protein, one of the bonds (a single bond) of the mother compound bound to the target protein is cut, generating a fragment on each side of the cut.
This cutting and fragment generation is performed over all single bonds of the mother compound, so the number of fragments produced is twice the number of single bonds.
Of the fragments produced, only those containing a number of atoms within a preset range (1 to 12) are selected, and the remaining fragments are excluded from the analysis.
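The cut-and-filter procedure above can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the molecule is assumed to be given as a plain atom count plus lists of bonded atom-index pairs, and the names `enumerate_fragments` and `_component` are hypothetical. One further assumption is that a single bond inside a ring, whose removal does not split the molecule, is skipped.

```python
from collections import deque


def _component(adjacency, start):
    """Return the set of atoms reachable from `start` in the bond graph."""
    seen = {start}
    queue = deque([start])
    while queue:
        atom = queue.popleft()
        for nxt in adjacency[atom]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def enumerate_fragments(n_atoms, single_bonds, other_bonds=(),
                        min_atoms=1, max_atoms=12):
    """Cut every single bond in turn and keep fragments of 1 to 12 atoms.

    Bonds are (i, j) tuples of atom indices. Returns a list of
    (cut_bond, anchor_atom, fragment_atoms), where anchor_atom is the atom
    on the remaining (scaffold) side of the cut bond.
    """
    all_bonds = list(single_bonds) + list(other_bonds)
    fragments = []
    for cut in single_bonds:
        # Build the bond graph with the current bond removed.
        adjacency = {i: set() for i in range(n_atoms)}
        for a, b in all_bonds:
            if (a, b) != cut:
                adjacency[a].add(b)
                adjacency[b].add(a)
        a, b = cut
        side_a = _component(adjacency, a)
        if b in side_a:
            continue  # assumption: ring bonds do not split the molecule
        side_b = _component(adjacency, b)
        # Each cut yields two candidate fragments, one per side.
        for fragment, anchor in ((side_a, b), (side_b, a)):
            if min_atoms <= len(fragment) <= max_atoms:
                fragments.append((cut, anchor, frozenset(fragment)))
    return fragments
```

For an acyclic molecule with every fragment inside the size range, the result contains two entries per single bond, matching the count stated in the text.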
In other words, the fragments are calculated in order to find fragments that contribute little to the binding between the mother compound and the target protein and are therefore worthwhile substitution targets. If a fragment contains too few atoms, substitution is unlikely to yield a new compound; conversely, if a fragment contains too many atoms, the properties of the compound may change markedly after substitution.
Next, for the selected fragments, as shown in 'B' of FIG. 6, an Interaction Efficiency is calculated, and the cut site of a fragment with low influence on the binding interaction is selected as the anchor.
The Interaction Efficiency can be calculated as the mean of the binding energies of the atoms that make up the fragment.
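As a rough sketch of this selection, the snippet below averages per-atom binding energies over each fragment and keeps the anchors of fragments whose mean energy is small in magnitude. The per-atom energy table, the use of the absolute value, and the threshold value are assumptions made for illustration; the source only states that the Interaction Efficiency is a mean of per-atom binding energies.

```python
def interaction_efficiency(fragment_atoms, atom_energy):
    """Mean binding energy of the atoms in one fragment."""
    return sum(atom_energy[i] for i in fragment_atoms) / len(fragment_atoms)


def select_anchors(fragments, atom_energy, threshold):
    """Keep anchors of fragments that contribute little to binding.

    `fragments` is a list of (anchor_atom, fragment_atoms) pairs.
    Assumption: "low influence" means |mean energy| <= threshold.
    """
    selected = []
    for anchor, atoms in fragments:
        efficiency = interaction_efficiency(atoms, atom_energy)
        if abs(efficiency) <= threshold:
            selected.append((anchor, efficiency))
    return selected
```

A fragment whose atoms carry most of the binding energy is therefore left in the scaffold, while weakly interacting fragments become substitution sites.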
Here, the compound that remains after the fragment is cut off is called the scaffold.
Next, the (B) step of calculating the pocket space within the target protein, as shown in FIGS. 7 to 9, calculates the space inside the pocket when the scaffold, from which the fragment has been removed, is bound in the pocket of the target protein.
To this end, in the present invention, as shown in FIG. 7, a cylindrical region of a preset size (length 10 Å, radius 10 Å) centered on the anchor site is extracted from the target protein to create a cylinder filter.
The size of the cylinder filter can be changed according to the structure of the scaffold and the available computing resources, but it is preferably set so that the possible region of the scaffold's binding pocket is sufficiently covered.
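A cylinder filter of this kind can be sketched as a regular grid of points clipped to a cylinder. In the sketch below the cylinder is assumed to be centered on the anchor and to extend half its length on each side along a supplied axis; the grid spacing of 1 Å and the function name `cylinder_dots` are also assumptions, since the source does not state them.

```python
import math


def cylinder_dots(anchor, axis, length=10.0, radius=10.0, spacing=1.0):
    """Return equally spaced grid points inside a cylinder around `anchor`.

    `anchor` and `axis` are (x, y, z) tuples; `axis` need not be normalised.
    """
    norm = math.sqrt(sum(c * c for c in axis))
    unit = [c / norm for c in axis]
    half = length / 2.0
    # Number of grid steps needed to reach the farthest point of the cylinder.
    reach = math.ceil(math.hypot(radius, half) / spacing)
    eps = 1e-9  # tolerance for points lying exactly on the surface
    dots = []
    for i in range(-reach, reach + 1):
        for j in range(-reach, reach + 1):
            for k in range(-reach, reach + 1):
                d = (i * spacing, j * spacing, k * spacing)
                along = sum(dc * uc for dc, uc in zip(d, unit))
                perp_sq = sum(dc * dc for dc in d) - along * along
                if abs(along) <= half + eps and perp_sq <= radius * radius + eps:
                    dots.append(tuple(ac + dc for ac, dc in zip(anchor, d)))
    return dots
```

A finer spacing gives a smoother pocket outline at a cubic cost in the number of dots, which is one way the size and resolution could be traded against computing resources.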
Equally spaced dots are set in the cylinder filter, and the dots are classified by their interaction energy with the protein atoms. In 'A' of FIG. 7, the dots in the cylinder filter are color-coded according to their interaction energy with the protein atoms.
For the cylinder filter shown in FIG. 7, using the value r obtained from the vdW radius (r) formula shown in 'B' of FIG. 7, dots within 0.7r are marked as the Red zone (clash region), dots between 0.7r and 1r as the Yellow zone (buffer region), dots between 1r and 1.3r as the Green zone (contact region), and dots between 1.3r and 3 Å as the Gray zone (close region).
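The zone boundaries above can be written as a small classifier. This sketch assumes that the distance in question is measured from a dot to a protein atom, that the radius r for each atom is supplied by the caller (the formula in 'B' of FIG. 7 is not reproduced here), and that a dot takes the most severe zone over all atoms. The label `dark_gray` for dots farther than 3 Å is also an assumption; the source mentions a Dark Gray zone without defining its bounds.

```python
import math

# Zones ordered from most to least severe.
ZONE_ORDER = ["red", "yellow", "green", "gray", "dark_gray"]


def classify_distance(distance, r):
    """Map one dot-to-atom distance (in Å) to a zone label."""
    if distance < 0.7 * r:
        return "red"       # clash region
    if distance < 1.0 * r:
        return "yellow"    # buffer region
    if distance < 1.3 * r:
        return "green"     # contact region
    if distance <= 3.0:
        return "gray"      # close region
    return "dark_gray"     # assumption: anything beyond 3 Å


def dot_zone(dot, protein_atoms):
    """Return the most severe zone of `dot` over all (xyz, r) protein atoms."""
    zones = [classify_distance(math.dist(dot, xyz), r) for xyz, r in protein_atoms]
    return min(zones, key=ZONE_ORDER.index)
```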
The Gray zone, where the interaction energy is low, can be judged highly likely to serve as a pocket region with adequate free space in which binding with the compound can be induced.
The anchor site of the scaffold is then brought up to the anchor site of the cylinder filter and placed there. The bond axis of the scaffold preferably keeps its original bonding direction.
This process is shown in 'A' and 'B' of FIG. 8.
Next, as shown in 'C' of FIG. 8, the regions of dots with high interaction energy (Red zone, Yellow zone, Green zone) are removed from the cylinder filter, leaving only the regions of dots with low interaction energy (Gray zone, Dark Gray zone).
The remaining regions are then clustered spatially (GMM clustering), as shown in 'D' of FIG. 8.
Next, from the clustered regions, only some clusters are retained, according to their proximity to the scaffold anchor and their size.
'E' of FIG. 8 shows an embodiment in which the two largest clusters among those connected to the scaffold anchor are selected.
The selected clusters are then derived as the pocket space (target volume) within the target protein, as shown in 'F' of FIG. 8.
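The clustering and selection steps can be illustrated with a simplified stand-in. The source uses GMM clustering; the sketch below deliberately swaps in face-connected grouping of grid dots, which needs no external library, so it shows the selection logic rather than the clustering method itself. The anchor distance cutoff of 2 Å, the choice of the two largest clusters, and both function names are assumptions.

```python
import math
from collections import deque

# The six face neighbours of a grid cell.
NEIGHBOURS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def cluster_dots(dots, spacing=1.0):
    """Group grid dots into face-connected clusters (stand-in for GMM)."""
    cells = {tuple(round(c / spacing) for c in d): d for d in dots}
    seen, clusters = set(), []
    for cell in cells:
        if cell in seen:
            continue
        seen.add(cell)
        queue, members = deque([cell]), []
        while queue:
            x, y, z = queue.popleft()
            members.append(cells[(x, y, z)])
            for dx, dy, dz in NEIGHBOURS:
                nxt = (x + dx, y + dy, z + dz)
                if nxt in cells and nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        clusters.append(members)
    return clusters


def select_target_volume(clusters, anchor, max_anchor_dist=2.0, top_k=2):
    """Keep the `top_k` largest clusters that lie close to the anchor."""
    near = [c for c in clusters
            if min(math.dist(p, anchor) for p in c) <= max_anchor_dist]
    near.sort(key=len, reverse=True)
    return near[:top_k]
```

The dots passed in are the low-energy dots that survive the zone filter; the union of the returned clusters plays the role of the target volume.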
The pocket space (target volume) within the target protein is derived so that, when selecting the R-group (Replace atom group) that will replace the fragment, R-groups can be screened and chosen according to the size of the pocket space (target volume).
FIG. 9 shows an example in which the pocket space (target volume) within an actual target protein was derived by the method described above.
In the example of FIG. 9, nine anchors were derived and analyzed for compound 6op0, and the analysis produced a result that matched the actual number of ligand atoms of that compound (answer).
Here, the size is expressed as the number of dots divided by 300 because, at the dot spacing used, the volume of one heavy atom (C, N, O) corresponds to a region of roughly 300 dots.
Next, in the (C) derivative generation step, as shown in FIG. 10, an R-group that fits the size of the pocket space (target volume) within the target protein calculated in step (B) is selected.
Here, the R-group (Replace atom group) is the atom group that replaces the fragment removed from the scaffold, and it is selected from a database that stores a variety of atomic compositions.
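The size matching described above can be sketched as follows. The conversion of 300 dots per heavy atom comes from the text; everything else is an assumption for illustration, including the R-group library (a hypothetical dictionary of names and heavy-atom counts) and the rule that an R-group fits when its heavy-atom count does not exceed the pocket capacity.

```python
DOTS_PER_HEAVY_ATOM = 300  # from the text: one heavy atom is roughly 300 dots


def pocket_capacity(n_dots):
    """Approximate number of heavy atoms that fit in the target volume."""
    return n_dots / DOTS_PER_HEAVY_ATOM


def select_r_groups(library, n_dots):
    """Return names of R-groups small enough for the pocket.

    `library` maps an R-group name to its heavy-atom count (hypothetical data).
    """
    capacity = pocket_capacity(n_dots)
    return sorted(name for name, heavy_atoms in library.items()
                  if heavy_atoms <= capacity)
```

A production version would presumably also apply a lower bound so that very small groups are not proposed for large pockets, but the source does not specify one.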
The selected R-group is then attached to the anchor of the scaffold to generate a derivative.
As shown in FIG. 10, a single selected R-group is attached to the scaffold anchor at different positions of the R-group, so that derivatives with various bonding structures are generated.
Furthermore, even for derivatives with the same bonding structure, as shown in FIG. 11, multiple derivatives are generated by varying the bond form (bond angle) between the R-group and the scaffold.
In this case, derivatives with varied bond forms are generated by extracting the linker at the anchor portion of the target protein, as shown in FIG. 11, and varying the bond form between the R-group and the scaffold.
Here, the linker is obtained by extracting only the portion close to the anchor, and it can be created by extracting only up to a preset number of bonded atoms from the anchor.
The example of FIG. 11 shows an embodiment in which three atomic bonds from the anchor are extracted as the linker.
The bond form between the linker and the R-group is then varied to generate derivatives with various bond forms.
Finally, the (D) derivative filtering step filters the bond form at the anchor portion of the generated derivatives and excludes derivatives whose bond form is unlikely to exist in reality.
In the present invention, derivative filtering is performed by two main methods. The first is linker filtering, which filters the bond form between the linker and the R-group by comparing it against a database of existing substances. The second is shape filtering, in which each bond form of the derivative's R-group is placed in the cylinder filter and derivatives are filtered according to the amount of clash generated within the pocket.
In linker filtering, as shown in FIG. 12, the various bond forms between the linker and the R-group are screened by comparing them with the bonding structures of real substances in a database of existing compounds (and proteins), such as ChEMBL.
Specifically, bonding structures with the same composition as the linker and R-group are retrieved from the compound database, their bond forms are analyzed, and only derivatives in which the linker and R-group are joined in a bond form that occurs in real substances are selected.
Here, the bond form of a specific bonding portion can be calculated from real compound data using SMARTS, as shown in FIG. 12.
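The comparison against real bond forms can be illustrated with a small angle-matching filter. This sketch assumes the bond form is summarised by a single angle in degrees and that the reference angles for the same linker and R-group pattern have already been collected from a compound database (for example by substructure matching, which is not shown). The tolerance of 15 degrees, the dictionary layout of a candidate, and the function names are hypothetical.

```python
def angle_difference(a, b):
    """Smallest absolute difference between two angles in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def linker_filter(candidates, observed_angles, tolerance=15.0):
    """Keep candidates whose angle is close to one seen in real compounds.

    `candidates` is a list of dicts with keys "name" and "angle".
    `observed_angles` holds reference angles for the same linker pattern.
    """
    kept = []
    for candidate in candidates:
        if any(angle_difference(candidate["angle"], ref) <= tolerance
               for ref in observed_angles):
            kept.append(candidate["name"])
    return kept
```

The wrap-around in `angle_difference` matters because an angle of -170 degrees is only 10 degrees away from 180 degrees.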
Next, in shape filtering, as shown in FIG. 13, each R-group bond form of the derivative is placed in the cylinder filter, and the amount of clash generated within the pocket is calculated.
Here, a clash means a collision between an atom of the R-group and an atom of the target protein.
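A clash count of this kind can be sketched as below. The clash criterion is an assumption: an R-group atom is counted as clashing when it lies closer than 0.7r to any protein atom, reusing the Red zone boundary described for the cylinder filter. The maximum number of allowed clashes and the function names are likewise hypothetical.

```python
import math


def clash_count(r_group_atoms, protein_atoms, clash_factor=0.7):
    """Count R-group atoms that collide with at least one protein atom.

    `r_group_atoms` is a list of (x, y, z); `protein_atoms` is a list of
    ((x, y, z), r) where r is the radius used for the zone boundaries.
    """
    count = 0
    for position in r_group_atoms:
        if any(math.dist(position, xyz) < clash_factor * r
               for xyz, r in protein_atoms):
            count += 1
    return count


def shape_filter(poses, protein_atoms, max_clashes=0):
    """Keep the names of poses whose clash count is within the limit.

    `poses` maps a derivative pose name to its R-group atom positions.
    """
    return [name for name, atoms in poses.items()
            if clash_count(atoms, protein_atoms) <= max_clashes]
```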
The scope of the present invention is not limited to the embodiments described above but is defined by the claims, and it is evident that a person of ordinary skill in the art can make various modifications and adaptations within the scope of the claims.
The present invention is an in silico prescreening technology applied to the hit discovery process of CADD (computer-aided drug discovery) or an AI drug platform for new drug development, and relates to a method for generating various forms of derivatives from a selected hit compound. When candidate substances are discovered through an AI drug platform, the invention makes it possible to improve the derived hit compound and to generate a variety of derivatives for training the analytical AI algorithms.

Claims (10)

  1. A method for generating derivatives using the binding pocket structure of a target protein through an artificial intelligence drug discovery platform, the method comprising:
    (A) selecting an anchor (anchor atom), which is the binding site to be substituted, in the binding structure of a compound to be analyzed;
    (B) calculating the pocket space inside the binding pocket of the target protein;
    (C) generating derivatives; and
    (D) filtering and selecting the generated derivatives.
  2. The method of claim 1,
    wherein step (A) comprises:
    (A1) calculating an interaction (binding information) profile between the compound and the target protein;
    (A2) individually cutting the single bonds between the compound and the target protein to generate fragments on both sides of each cut;
    (A3) filtering the fragments according to the number of atoms constituting each generated fragment; and
    (A4) calculating an Interaction Efficiency for the filtered fragments and selecting, as the anchor, the cut site of a fragment whose relevance to the binding interaction is at or below a preset value.
  3. The method of claim 2,
    wherein the Interaction Efficiency
    is calculated as the mean of the binding energies of the atoms constituting the fragment.
  4. The method of claim 1,
    wherein step (B) comprises:
    (B1) generating a cylinder filter by extracting a region of a preset size from the target protein, centered on the compound binding site (anchor site);
    (B2) setting equally spaced dots in the cylinder filter and classifying the dots by their interaction energy with protein atoms;
    (B3) bringing the anchor site of the scaffold, which is the compound with the fragment removed, up to the anchor site of the cylinder filter and placing it there;
    (B4) excluding from the cylinder filter the regions of dots whose interaction energy is at or above a preset value;
    (B5) clustering the dots in spatial units (GMM clustering); and
    (B6) selecting clusters according to their proximity to the scaffold anchor and their size, deriving only the selected clusters as the pocket space (target volume) within the target protein, and calculating the size of the pocket region.
  5. The method of claim 4,
    wherein step (C) comprises:
    (C1) selecting an R-group whose size corresponds to the size of the pocket space (target volume) within the target protein calculated in step (B); and
    (C2) attaching the selected R-group to the anchor of the scaffold to generate a derivative.
  6. The method of claim 5,
    wherein step (C2)
    generates, for the same R-group, a plurality of derivatives with different bonding structures by varying the position at which the R-group is attached to the anchor of the scaffold.
  7. The method of claim 6,
    wherein step (C2)
    generates, for an R-group with the same bonding structure, a plurality of derivatives with different bond forms by varying the bond form (angle) between the R-group and the scaffold.
  8. The method of claim 7,
    wherein the bond form (angle) between the R-group and the scaffold
    is generated by extracting a linker, formed by extracting the portion close to the anchor portion of the target protein, and varying the bond form between the linker and the R-group.
  9. The method of claim 1,
    wherein step (D)
    comprises linker filtering, which filters the bond form between the linker and the R-group by comparing it against a database of existing substances.
  10. The method of claim 1,
    wherein step (D)
    comprises shape filtering, in which each bond form of the R-group of the derivative is placed in the cylinder filter and derivatives are filtered according to the amount of clash generated within the pocket.
PCT/KR2023/014453 2022-09-21 2023-09-21 Method for generating derivatives using binding pocket structure of target protein through artificial intelligence drug discovery platform WO2024063583A1 (en)

Applications Claiming Priority (24)

Application Number Priority Date Filing Date Title
KR20220119657 2022-09-21
KR10-2022-0119657 2022-09-21
KR10-2022-0119661 2022-09-21
KR20220119666 2022-09-21
KR20220119660 2022-09-21
KR10-2022-0119660 2022-09-21
KR20220119655 2022-09-21
KR10-2022-0119666 2022-09-21
KR10-2022-0119659 2022-09-21
KR20220119659 2022-09-21
KR10-2022-0119655 2022-09-21
KR20220119661 2022-09-21
KR1020230126606A KR20240040666A (en) 2022-09-21 2023-09-21 Hit-compound discovery methods for AI drug platform using 3D Protein-Ligand interaction data
KR10-2023-0126611 2023-09-21
KR1020230126611A KR20240040671A (en) 2022-09-21 2023-09-21 A distributed and parallelized cloud platform system and service method for large-scale workflows
KR10-2023-0126610 2023-09-21
KR1020230126608A KR20240040668A (en) 2022-09-21 2023-09-21 Protein-compound docking stability Aanalysis methods for AI drug platform based on MD-simulation data
KR1020230126609A KR20240040669A (en) 2022-09-21 2023-09-21 Derivative creating methods using docking-pocket structure of target protein for AI drug platform
KR1020230126607A KR20240040667A (en) 2022-09-21 2023-09-21 Prediction method of protein-chemical complex structure using large scale conformer generation and 3D-CNN deep transfer learning model
KR10-2023-0126608 2023-09-21
KR10-2023-0126606 2023-09-21
KR10-2023-0126607 2023-09-21
KR1020230126610A KR20240040670A (en) 2022-09-21 2023-09-21 Aanalysis methods of protein-ligand docking structure based on vector for AI drug platform
KR10-2023-0126609 2023-09-21

Publications (1)

Publication Number Publication Date
WO2024063583A1 true WO2024063583A1 (en) 2024-03-28

Family

ID=90455003

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/KR2023/014453 WO2024063583A1 (en) 2022-09-21 2023-09-21 Method for generating derivatives using binding pocket structure of target protein through artificial intelligence drug discovery platform

Country Status (1)

Country Link
WO (1) WO2024063583A1 (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20200128710A (en) * 2018-03-05 2020-11-16 더 보드 어브 트러스티스 어브 더 리랜드 스탠포드 주니어 유니버시티 A method for improving binding and activity prediction based on machine learning and molecular simulation
KR102296188B1 (en) * 2019-10-21 2021-09-01 주식회사 스탠다임 Methods and apparatus for designing compounds

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
SHI WENTAO, SINGHA MANALI, SRIVASTAVA GOPAL, PU LIMENG, RAMANUJAM J., BRYLINSKI MICHAL: "Pocket2Drug: An Encoder-Decoder Deep Neural Network for the Target-Based Drug Design", FRONTIERS IN PHARMACOLOGY, FRONTIERS RESEARCH FOUNDATION, CH, vol. 13, 11 March 2022 (2022-03-11), CH , XP093148893, ISSN: 1663-9812, DOI: 10.3389/fphar.2022.837715 *
ZHANG HAIPING, GONG XIAOHUA, PENG YUN, SARAVANAN KONDA MANI, BIAN HENGWEI, ZHANG JOHN Z. H., WEI YANJIE, PAN YI, YANG YANG: "An Efficient Modern Strategy to Screen Drug Candidates Targeting RdRp of SARS-CoV-2 With Potentially High Selectivity and Specificity", FRONTIERS IN CHEMISTRY, FRONTIERS MEDIA, LAUSANNE, vol. 10, 12 July 2022 (2022-07-12), Lausanne , pages 933102, XP093148896, ISSN: 2296-2646, DOI: 10.3389/fchem.2022.933102 *
ZHANG HAIPING; SARAVANAN KONDA MANI; YANG YANG; HOSSAIN MD. TOFAZZAL; LI JUNXIN; REN XIAOHU; PAN YI; WEI YANJIE: "Deep Learning Based Drug Screening for Novel Coronavirus 2019-nCov", INTERDISCIPLINARY SCIENCES: COMPUTATIONAL LIFE SCIENCES, INTERNATIONAL ASSOCIATION OF SCIENTISTS IN THE INTERDISCIPLINARY AREAS, CA, vol. 12, no. 3, 1 June 2020 (2020-06-01), CA , pages 368 - 376, XP037212443, ISSN: 1913-2751, DOI: 10.1007/s12539-020-00376-6 *

Similar Documents

Publication Publication Date Title
Nikolsky et al. Biological networks and analysis of experimental data in drug discovery
Terwilliger et al. phenix. mr_rosetta: molecular replacement and model rebuilding with Phenix and Rosetta
Haghighi et al. High-dimensional gene expression and morphology profiles of cells across 28,000 genetic and chemical perturbations
WO2024063583A1 (en) Method for generating derivatives using binding pocket structure of target protein through artificial intelligence drug discovery platform
Xu et al. CPredictor3.0: detecting protein complexes from PPI networks with expression data and functional annotations
Kwoh et al. Genetic studies of diseases: Network analysis approach for biology
CN115240762A (en) Multi-scale small molecule virtual screening method and system
Fancello et al. An analysis of proteogenomics and how and when transcriptome-informed reduction of protein databases can enhance eukaryotic proteomics
WO2024063581A1 (en) Protein-compound optimal binding structure prediction method using large-capacity conformer generation and three-dimensional convolutional deep transfer learning model
WO2024063584A1 (en) Central-atom-vector-based protein-ligand binding structure analysis method of artificial intelligence new drug platform
CN111312342B (en) Electronic structure computer-aided drug design system
CN107832585A (en) An RNAseq data analysis method
Konc et al. Protein binding sites for drug design
WO2024063582A1 (en) Method for analyzing protein-compound inter-binding stability by artificial intelligence drug discovery platform using molecular dynamics simulation data
Prunier et al. Fast alignment of mass spectra in large proteomics datasets, capturing dissimilarities arising from multiple complex modifications of peptides
Heinzel et al. Data graphs for linking clinical phenotype and molecular feature space
WO2024063580A1 (en) Method for discovering effective substances of artificial intelligence new drug platform reflecting 3d structural information between protein and ligand
Kanehisa Prediction of higher order functional networks from genomic data
Burks et al. Integration of Competing Ancillary Assertions in Genome Assembly.
US8504302B2 (en) Template constrained fragment alignment used to identify fragments of similar shape and activity in drug development
Zhang et al. A nonparametric model for quality control of database search results in shotgun proteomics
KR20240040670A (en) Analysis methods of protein-ligand docking structure based on vector for AI drug platform
WO2023177171A1 (en) Retrosynthetic translation method using transformer and atomic environment, and device for performing same
Xie et al. Overview of Machine Learning Methods for Genome-Wide Association Analysis
Anandkumar et al. Computer applications making rapid advances in high throughput microbial proteomics (HTMP)

Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application

Ref document number: 23868635

Country of ref document: EP

Kind code of ref document: A1