WO2023155724A1 - Method and apparatus for designing ligand molecules - Google Patents

Method and apparatus for designing ligand molecules

Info

Publication number
WO2023155724A1
Authority
WO
WIPO (PCT)
Prior art keywords
molecular structure
editing
molecular
target
determining
Prior art date
Application number
PCT/CN2023/075067
Other languages
English (en)
French (fr)
Inventor
杨雨薇
卢家睿
张朔
周浩
Original Assignee
北京有竹居网络技术有限公司
脸萌有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京有竹居网络技术有限公司, 脸萌有限公司
Publication of WO2023155724A1

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16C COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
    • G16C20/00 Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
    • G16C20/50 Molecular design, e.g. of drugs

Definitions

  • Various implementations of the present disclosure relate to the field of computers, and more specifically, to methods, apparatuses, devices and computer storage media for designing ligand molecules.
  • a method for designing a ligand molecule comprises: editing a first 2D molecular structure to determine a second 2D molecular structure, the editing at least comprising: deleting a 2D structural fragment from the first 2D molecular structure, or adding a 2D structural fragment to the first 2D molecular structure; based on a first 3D molecular structure corresponding to the first 2D molecular structure and on the editing, determining a group of candidate 3D molecular structures corresponding to the second 2D molecular structure; based on the binding between the group of candidate 3D molecular structures and the target molecule, determining a second 3D molecular structure corresponding to the second 2D molecular structure; and based on the second 3D molecular structure, determining the target structure of the ligand molecule for the target molecule.
  • editing the first 2D molecular structure comprises: using an operation prediction model and based on a feature representation corresponding to the first 2D molecular structure, determining an editing operation to be applied to the first 2D molecular structure; and editing the first 2D molecular structure based on the determined editing operation.
  • determining the editing operation to be applied to the first 2D molecular structure comprises: using the operation prediction model and based on the feature representation, determining a set of probabilities associated with a set of predetermined editing operations, wherein the set of predetermined editing operations includes: adding a specific 2D structure fragment at a specific atom in the first 2D molecular structure, or deleting a specific bond in the first 2D molecular structure; and based on the set of probabilities, determining, from the set of predetermined editing operations, the editing operation to be applied to the first 2D molecular structure.
  • adding a 2D structure fragment includes: selecting a target 2D structure fragment from a fragment library, the fragment library including a plurality of 2D structure fragments; and adding the target 2D structure fragment to a specific atom in the first 2D molecular structure.
  • determining a set of candidate 3D molecular structures corresponding to the second 2D molecular structure comprises: based on the editing and using the first 3D molecular structure, determining the set of candidate 3D molecular structures, wherein the set of candidate 3D molecular structures has a partial 3D structure corresponding to the first 3D molecular structure, the partial 3D structure corresponding to the partial 2D structure not modified by the editing operation.
  • the editing is to add a target 2D structure fragment to the first 2D molecular structure, and determining the set of candidate 3D molecular structures comprises: determining a configurational constraint based on the first 3D molecular structure corresponding to the first 2D molecular structure; generating, based on the configurational constraint, a plurality of candidate 3D molecular structures corresponding to the editing, the configurational constraint being used to limit the degree to which the first 3D molecular structure is adjusted during generation of the plurality of candidate 3D molecular structures; and based on the configurational constraint, performing energy optimization on the plurality of candidate 3D molecular structures to determine the set of candidate 3D molecular structures.
  • the binding is determined based on the binding free energy between the set of candidate 3D molecular structures and the target molecule.
  • determining the target structure of the ligand molecule for the target molecule comprises: determining a first evaluation for the second 3D molecular structure, the first evaluation indicating at least one of the following: target binding between the second 3D molecular structure and the target molecule, drug-likeness (QED) of the second 3D molecular structure, or the synthesizability of the second 3D molecular structure; based on the first evaluation and a second evaluation for the first 3D molecular structure, determining the probability that the second 2D molecular structure is accepted; and according to the probability, determining the target structure based on the second 2D molecular structure and the second 3D molecular structure.
  • determining the target structure based on the second 2D molecular structure and the second 3D molecular structure comprises: in response to the first evaluation being superior to the second evaluation, training an editing model for predicting editing operations based on the edit to the first 2D molecular structure; using the trained editing model, editing the second 2D molecular structure to determine a third 2D molecular structure; and based on the third 2D molecular structure and the second 3D molecular structure, determining a target structure of a ligand molecule for the target molecule.
  • determining the first evaluation for the second 3D molecular structure comprises: based on the target binding, determining a first normalized value, which decreases as the binding free energy indicated by the target binding increases; based on the drug-likeness, determining a second normalized value, which increases as the drug-likeness increases; based on the synthesizability, determining a third normalized value, which decreases as the synthesis difficulty indicated by the synthesizability increases; and based on the first normalized value, the second normalized value, and the third normalized value, determining the first evaluation.
  • determining the first evaluation based on the first normalized value, the second normalized value, and the third normalized value includes: determining the first evaluation from the three normalized values based on a first weight associated with the first normalized value, a second weight associated with the second normalized value, and a third weight associated with the third normalized value.
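  • The weighted evaluation described above can be sketched as follows. This is a minimal illustration, not the disclosed implementation: the function and parameter names, the normalization ranges (an `energy_floor` of -12 kcal/mol, a synthesis-difficulty score on a 1 to 10 scale) and the use of a weighted average are all assumptions made for the example.

```python
from dataclasses import dataclass


def clamp01(value: float) -> float:
    """Clip a value into the closed interval [0, 1]."""
    return max(0.0, min(1.0, value))


@dataclass(frozen=True)
class Weights:
    """Hypothetical weights for the three normalized values."""
    binding: float = 1.0
    qed: float = 1.0
    sa: float = 1.0


def evaluate(binding_free_energy: float, qed: float, sa_score: float,
             weights: Weights = Weights(),
             energy_floor: float = -12.0) -> float:
    """Combine three normalized values into one evaluation (higher is better).

    Assumed conventions (not taken from the source):
      - binding_free_energy is in kcal/mol, more negative = stronger binding;
        energy_floor is an arbitrary value treated as the best case.
      - qed already lies in [0, 1].
      - sa_score lies in [1, 10], where 10 is the hardest to synthesize.
    """
    # First normalized value: decreases as the binding free energy increases.
    n_binding = clamp01(binding_free_energy / energy_floor)
    # Second normalized value: increases with drug-likeness.
    n_qed = clamp01(qed)
    # Third normalized value: decreases as synthesis difficulty increases.
    n_sa = clamp01((10.0 - sa_score) / 9.0)
    total = weights.binding + weights.qed + weights.sa
    return (weights.binding * n_binding
            + weights.qed * n_qed
            + weights.sa * n_sa) / total
```

  • Dividing by the sum of the weights keeps the evaluation in [0, 1], which makes the evaluations of successive molecular structures directly comparable.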
  • the first 2D molecular structure is generated by applying a first number of editing operations to the initial 2D molecular structure, and the probability is also based on the first number.
  • the first 2D molecular structure is generated by applying a first number of editing operations to the initial 2D molecular structure, and determining the target structure of the ligand molecule for the target molecule includes: incrementing the first number to determine a second number; and if the second number reaches a predetermined threshold, determining the second 3D molecular structure as the target structure.
  • an apparatus for designing ligand molecules includes: an editing module configured to edit a first 2D molecular structure to determine a second 2D molecular structure, the editing at least comprising: deleting a 2D structural fragment from the first 2D molecular structure, or adding a 2D structural fragment to the first 2D molecular structure; and a generation module configured to determine a set of candidate 3D molecular structures corresponding to the second 2D molecular structure based on the first 3D molecular structure corresponding to the first 2D molecular structure and the editing, and to determine the second 3D molecular structure corresponding to the second 2D molecular structure based on the binding between the set of candidate 3D molecular structures and the target molecule; wherein the editing module is further configured to determine the target structure of the ligand molecule for the target molecule based on the second 3D molecular structure.
  • the editing module is further configured to: using the operation prediction model and based on the feature representation corresponding to the first 2D molecular structure, determine an editing operation to be applied to the first 2D molecular structure; and edit the first 2D molecular structure based on the determined editing operation.
  • the editing module is further configured to determine, using the operation prediction model and based on the feature representation, a set of probabilities associated with a set of predetermined editing operations comprising: adding a specific 2D structural fragment at a specific atom in the first 2D molecular structure, or deleting a specific bond in the first 2D molecular structure; and based on the set of probabilities, determine an editing operation to be applied to the first 2D molecular structure from the set of predetermined editing operations.
  • the editing module is further configured to: select a target 2D structure fragment from a fragment library, the fragment library comprising a plurality of 2D structure fragments; and add the target 2D structure fragment to a specific atom in the first 2D molecular structure.
  • the generation module is further configured to: determine a set of candidate 3D molecular structures based on the editing and using the first 3D molecular structure, wherein the set of candidate 3D molecular structures has a partial 3D structure corresponding to the first 3D molecular structure, the partial 3D structure corresponding to the partial 2D structure not modified by the editing operation.
  • the editing is to add a target 2D structure fragment to the first 2D molecular structure, and the generation module is further configured to: determine a configurational constraint based on the first 3D molecular structure corresponding to the first 2D molecular structure; generate, based on the configurational constraint, a plurality of candidate 3D molecular structures corresponding to the editing, the configurational constraint being used to limit the degree to which the first 3D molecular structure is adjusted during generation of the plurality of candidate 3D molecular structures; and based on the configurational constraint, perform energy optimization on the plurality of candidate 3D molecular structures to determine a set of candidate 3D molecular structures.
  • the binding is determined based on the binding free energy between the set of candidate 3D molecular structures and the target molecule.
  • the editing module is further configured to: determine a first evaluation for the second 3D molecular structure, the first evaluation indicating at least one of: target binding between the second 3D molecular structure and the target molecule, the drug-likeness (QED) of the second 3D molecular structure, or the synthesizability of the second 3D molecular structure; based on the first evaluation and the second evaluation for the first 3D molecular structure, determine a probability of acceptance of the second 2D molecular structure; and according to the probability, determine the target structure based on the second 2D molecular structure and the second 3D molecular structure.
  • the editing module is further configured to: train an editing model for predicting editing operations based on edits to the first 2D molecular structure in response to the first evaluation being superior to the second evaluation; utilizing the trained editing model , editing the second 2D molecular structure to determine a third 2D molecular structure; and determining a target structure of a ligand molecule for the target molecule based on the third 2D molecular structure and the second 3D molecular structure.
  • the generation module is further configured to: determine a first normalized value based on the target binding, the first normalized value decreasing as the binding free energy indicated by the target binding increases; determine a second normalized value based on the drug-likeness, the second normalized value increasing as the drug-likeness increases; determine a third normalized value based on the synthesizability, the third normalized value decreasing as the synthesis difficulty indicated by the synthesizability increases; and determine the first evaluation based on the first normalized value, the second normalized value and the third normalized value.
  • the generation module is further configured to: based on the first weight associated with the first normalized value, the second weight associated with the second normalized value, and the third weight associated with the third normalized value, A first evaluation is determined based on the first normalized value, the second normalized value and the third normalized value.
  • the first 2D molecular structure is generated by applying a first number of editing operations to the initial 2D molecular structure, and the probability is also based on the first number.
  • the first 2D molecular structure is generated by applying a first number of editing operations to the initial 2D molecular structure, and the editing module is further configured to: increment the first number to determine a second number; and determine the second 3D molecular structure as the target structure if the second number reaches a predetermined threshold.
  • an electronic device including: a memory and a processor; wherein the memory is used to store one or more computer instructions, wherein the one or more computer instructions are executed by the processor to implement the method of the first aspect.
  • a computer-readable storage medium on which one or more computer instructions are stored, wherein the one or more computer instructions are executed by a processor to implement the method according to the first aspect of the present disclosure.
  • a computer program product comprising one or more computer instructions, wherein the one or more computer instructions are executed by a processor to implement the method according to the first aspect of the present disclosure.
  • the 3D molecular structure of the previous state can be used to construct a new 3D molecular structure for evaluating whether the edited 3D molecular structure (or its corresponding 2D molecular structure) is acceptable, so as to determine the target structure of the final ligand molecule.
  • the embodiments of the present disclosure can improve the construction efficiency of the 3D molecular structure, especially the search for the binding configuration between the 3D molecular structure and the target molecule, thereby improving the efficiency of determining the ligand molecule.
  • Figure 1 shows a schematic block diagram of a computing device capable of implementing some embodiments of the present disclosure
  • Figure 2 shows a schematic block diagram of a design module according to some embodiments of the present disclosure
  • Figure 3 shows a schematic diagram of building a 3D molecular structure according to some embodiments of the present disclosure
  • Figure 4 shows a schematic diagram of building a 3D molecular structure according to still other embodiments of the present disclosure.
  • Figure 5 shows a flowchart of an example method for designing ligand molecules according to some embodiments of the present disclosure.
  • a scheme for designing ligand molecules is provided.
  • the first 2D molecular structure can be edited to determine the second 2D molecular structure, wherein the editing at least includes: deleting 2D structural fragments from the first 2D molecular structure, or adding 2D structural fragments to the first 2D molecular structure.
  • a set of candidate 3D molecular structures corresponding to the second 2D molecular structure can be determined, and a second 3D molecular structure corresponding to the second 2D molecular structure can be determined based on the binding between the set of candidate 3D molecular structures and the target molecule.
  • the target structure of the ligand molecule for the target molecule can be determined based on the second 3D molecular structure.
  • Various embodiments of the present disclosure are able to use the 3D molecular structure of the prior state to construct a new 3D molecular structure for evaluating whether it can be used to identify ligand molecules. Based on this method, the embodiments of the present disclosure can improve the construction efficiency of the 3D molecular structure, especially the search for the binding configuration between the 3D molecular structure and the target molecule, thereby improving the efficiency of determining the ligand molecule.
  • Fig. 1 shows a schematic block diagram of an example device 100 that may be used to implement embodiments of the present disclosure. It should be understood that the device 100 shown in FIG. 1 is exemplary only and should not constitute any limitation on the functionality and scope of the implementations described in this disclosure. As shown in FIG. 1, components of device 100 may include, but are not limited to, one or more processors or processing units 110, memory 120, storage device 130, one or more communication units 140, one or more input devices 150, and one or more output devices 160.
  • the device 100 may be implemented as various user terminals or service terminals.
  • the service terminal may be a server, a large computing device, etc. provided by various service providers.
  • The user terminal may be any type of mobile, stationary or portable terminal, including mobile handsets, multimedia computers, multimedia tablets, Internet nodes, communicators, desktop computers, laptop computers, notebook computers, netbook computers, tablet computers, personal communication system (PCS) devices, personal navigation devices, personal digital assistants (PDAs), audio/video players, digital cameras/camcorders, pointing devices, television receivers, radio broadcast receivers, e-book devices, gaming devices, or any combination thereof, including accessories and peripherals for these devices.
  • device 100 can support any type of user-directed interface (such as "wear
  • the processing unit 110 may be an actual or virtual processor and is capable of performing various processes according to programs stored in the memory 120. In a multi-processor system, multiple processing units execute computer-executable instructions in parallel to increase the parallel processing capability of the device 100.
  • the processing unit 110 may also be called a central processing unit (CPU), a microprocessor, a controller, a microcontroller.
  • Device 100 typically includes a plurality of computer storage media. Such media can be any available media accessible by device 100, including but not limited to volatile and nonvolatile media, removable and non-removable media.
  • Memory 120 can be volatile memory (e.g., registers, cache, random access memory (RAM)), nonvolatile memory (e.g., read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory), or some combination thereof.
  • Memory 120 may include one or more program modules 125 configured to perform the functions of various implementations described herein. The design module 125 can be accessed and executed by the processing unit 110 to realize corresponding functions.
  • Storage device 130 may be a removable or non-removable medium, and may include machine-readable media that can be used to store information and/or data and that can be accessed within device 100 .
  • the functions of the components of device 100 may be implemented in a single computing cluster or as a plurality of computing machines capable of communicating via communication links.
  • device 100 may operate in a networked environment using logical connections to one or more other servers, personal computers (PCs), or another general network node.
  • the device 100 can also communicate, as required and through the communication unit 140, with one or more external devices (not shown), such as a database 145, other storage devices, servers, and display devices, with one or more devices that enable a user to interact with the device 100, or with any device (e.g., network card, modem) that enables the device 100 to communicate with one or more other computing devices. Such communication may be performed via an input/output (I/O) interface (not shown).
  • the input device 150 may be one or more various input devices, such as a mouse, a keyboard, a trackball, a voice input device, a camera, and the like.
  • Output device 160 may be one or more output devices, such as a display, speakers, printer, or the like.
  • device 100 may receive, e.g., via input device 150, an identification corresponding to a target molecule (e.g., a target protein molecule).
  • a user may input a PDB file via the input device 150 to indicate the corresponding target molecule.
  • the design module 125 can iteratively edit the molecular structure using the editing model to determine the target structure of the final ligand molecule 170 .
  • the process of determining the target structure of the ligand molecule 170 will be described in detail below.
  • the output ligand molecules 170 in FIG. 1 are shown as 2D molecular structures.
  • output device 160 may output a 3D molecular structure, for example.
  • FIG. 2 shows a block diagram of the design module 125 according to some embodiments of the present disclosure.
  • the design module 125 includes a plurality of modules for implementing an exemplary process of designing a ligand molecule according to some embodiments of the present disclosure.
  • the design module 125 includes an editing module 230 and a generation module 270.
  • editing module 230 can edit first 2D molecular structure 220 .
  • editing may include deleting a 2D structure fragment from the first 2D molecular structure 220, and such editing is also referred to as a "delete editing operation".
  • editing may also include adding a new 2D structure fragment to the first 2D molecular structure 220, and such editing is also referred to as an "add editing operation".
  • the editing module 230 can determine the bond to be deleted in the first 2D molecular structure 220, and correspondingly delete the 2D structure fragment associated with the bond to be deleted from the first molecular structure. Exemplarily, the editing module 230 may delete the group associated with the bond to be deleted from the first 2D molecular structure 220 .
  • the editing module 230 can determine the atoms to be edited in the first 2D molecular structure 220 , and accordingly select a 2D structure fragment from the fragment library 240 to append to the first 2D molecular structure 220 .
  • atoms to be edited in the first 2D molecular structure 220 can add new bonds with the selected 2D fragments to construct a new molecular structure.
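  • The two editing operations can be illustrated on a bare connectivity graph. The sketch below is an assumption-laden simplification: atoms are integer indices without element types, `add_fragment` and `delete_bond` are hypothetical names, and the rule "keep the larger side after deleting a bond" is chosen only for the example, since the text does not say which side counts as the fragment.

```python
from typing import Dict, Set, Tuple

Graph = Dict[int, Set[int]]  # atom index -> indices of bonded atoms


def add_fragment(mol: Graph, u: int, fragment: Graph, attach: int = 0) -> Graph:
    """Return a copy of mol with `fragment` bonded to atom u.

    The new bond joins atom u to the fragment atom `attach`; fragment
    atoms are renumbered so they do not collide with atoms of mol.
    """
    offset = max(mol) + 1
    new = {a: set(nbrs) for a, nbrs in mol.items()}
    for a, nbrs in fragment.items():
        new[a + offset] = {n + offset for n in nbrs}
    new[u].add(attach + offset)
    new[attach + offset].add(u)
    return new


def component(mol: Graph, start: int) -> Set[int]:
    """Atoms reachable from `start` by following bonds."""
    seen, stack = {start}, [start]
    while stack:
        for n in mol[stack.pop()]:
            if n not in seen:
                seen.add(n)
                stack.append(n)
    return seen


def delete_bond(mol: Graph, bond: Tuple[int, int]) -> Graph:
    """Return a copy of mol without `bond` and without the detached fragment.

    Assumption for this sketch: the smaller of the two resulting pieces is
    treated as the fragment and dropped. A ring bond detaches nothing.
    """
    w, v = bond
    new = {a: set(nbrs) for a, nbrs in mol.items()}
    new[w].discard(v)
    new[v].discard(w)
    side_w = component(new, w)
    if v in side_w:  # ring bond: the molecule stays connected
        return new
    side_v = component(new, v)
    keep = side_w if len(side_w) >= len(side_v) else side_v
    return {a: nbrs for a, nbrs in new.items() if a in keep}


# Heavy-atom graphs: ethane is C-C, the fragment is a single carbon.
ethane: Graph = {0: {1}, 1: {0}}
methyl: Graph = {0: set()}
propane = add_fragment(ethane, 1, methyl)
```

  • Here `propane` is the three-atom chain obtained by an add editing operation on ethane, and `delete_bond(propane, (1, 2))` reverses that edit.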
  • fragment library 240 may include a plurality of 2D structure fragments 250 .
  • the plurality of 2D structure fragments 250 may be determined, e.g., based on experimental knowledge.
  • the plurality of 2D structural fragments 250 may also be constructed based on existing drug molecules.
  • the first 2D molecular structure 220 may, for example, be obtained from the initial 2D molecular structure 210 (e.g., the ethane molecule C2H6 shown in FIG. 2) through at least one editing process as discussed above.
  • the first 2D molecular structure 220 may also be an initial 2D molecular structure.
  • the initial 2D molecular structure may, for example, be randomly selected by the editing module 230, or determined by the editing module 230 according to input.
  • editing module 230 may edit first 2D molecular structure 220 to obtain second 2D molecular structure 260 using the deployed editing model.
  • the editing model can be implemented based on a machine learning model, for example. Specific details about the editing module 230 and the editing model will be described in detail below.
  • the design module 125 may also include a generation module 270 .
  • generation module 270 may be used to determine a 3D molecular structure corresponding to second 2D molecular structure 260 .
  • the generation module 270 can efficiently construct the second 3D molecular structure 290 corresponding to the second 2D molecular structure 260.
  • the detailed process of constructing the second 3D molecular structure 290 will be described below in conjunction with FIG. 3 and FIG. 4 .
  • editing module 230 and/or generation module 270 may also determine an evaluation for second 3D molecular structure 290 (also referred to as a first evaluation for convenience of description). For example, editing module 230 may determine the first evaluation based on the binding between second 3D molecular structure 290 and the target molecule. Additionally, generation module 270 may also determine the first evaluation based on, for example, drug-likeness (QED) and/or synthesizability.
  • the editing module 230 may further determine whether the second 2D molecular structure 260 is acceptable based on the first evaluation for the second 3D molecular structure 290 and the second evaluation for the first 3D molecular structure 280. If the second 2D molecular structure 260 is determined to be acceptable, it can, for example, be determined as the next state of the Markov chain to iteratively determine the final target structure 170 of the ligand molecule.
  • If the second 2D molecular structure 260 is not accepted, the editing module 230 may discard the second 2D molecular structure and continue to use the first 2D molecular structure 220 as a basis to determine a new edit, so as to iteratively determine the target structure 170 of the final ligand molecule.
  • the editing module 230 can determine the second evaluation for the first 3D molecular structure 280. In some embodiments, if the first evaluation is better than the second evaluation, the editing module 230 may further train the editing model deployed in the editing module 230 based on the editing operations performed on the first 2D molecular structure 220.
  • editing module 230 may iteratively perform editing using a trained editing model based on second 2D molecular structure 260 until the target structure 170 of the ligand molecule for the target molecule is determined.
  • the editing module 230 may terminate the iteration after performing a predetermined number of edits on the initial 2D molecular structure 210 , and determine the final output 2D molecular structure as the target structure 170 of the ligand molecule. Alternatively, the editing module 230 may also determine the 3D molecular structure corresponding to the final 2D molecular structure as the target structure 170 of the ligand molecule.
  • the editing module 230 may also determine whether the process has converged based on the degree of change in the evaluation of the edited molecular structure at each iteration. For example, if the change in the evaluation after a predetermined number of iterations is less than a predetermined threshold, the editing module 230 may determine that convergence has been achieved, and determine the final output molecular structure as the target structure of the ligand molecule.
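  • The accept-or-discard iteration described above can be sketched as a generic Markov chain loop. The text only states that the acceptance probability depends on the two evaluations and on the number of edits applied so far; the Metropolis-style rule and the geometric temperature schedule below (`t0`, `decay`) are assumptions for illustration, as are all function names.

```python
import math
import random
from typing import Callable, Tuple, TypeVar

State = TypeVar("State")


def acceptance_probability(new_eval: float, old_eval: float, num_edits: int,
                           t0: float = 1.0, decay: float = 0.95) -> float:
    """Probability of accepting the edited structure (assumed Metropolis rule).

    The temperature shrinks with the number of edits already applied, so
    worse candidates are accepted less often late in the chain.
    """
    if new_eval >= old_eval:
        return 1.0
    temperature = max(t0 * decay ** num_edits, 1e-12)
    return math.exp((new_eval - old_eval) / temperature)


def run_chain(initial: State,
              propose: Callable[[State, random.Random], State],
              evaluate: Callable[[State], float],
              max_edits: int,
              rng: random.Random) -> Tuple[State, float]:
    """Iterate propose/evaluate/accept until max_edits edits were tried."""
    state, state_eval = initial, evaluate(initial)
    for num_edits in range(max_edits):
        candidate = propose(state, rng)
        cand_eval = evaluate(candidate)
        p = acceptance_probability(cand_eval, state_eval, num_edits)
        if rng.random() < p:
            # Accepted: the candidate becomes the next state of the chain.
            state, state_eval = candidate, cand_eval
        # Otherwise the candidate is discarded and the old state is reused.
    return state, state_eval
```

  • In the setting of this disclosure, `propose` would stand for one edit followed by construction of the corresponding 3D molecular structure, and `evaluate` for the first or second evaluation; both are left abstract here.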
  • editing module 230 is configured to edit first 2D molecular structure 220 using the deployed editing model.
  • the editing model may be implemented, for example, based on a suitable machine learning model.
  • the editing module 230 may first determine a feature representation of the first 2D molecular structure 220 .
  • the first 2D molecular structure 220 may be represented as a graph x, which may have n atoms and n bonds, for example.
  • editing module 230 may represent first 2D molecular structure 220 as:
  • where a represents the index of an atom in the first 2D molecular structure 220, for which a hidden-layer feature representation is determined; w and v represent the atoms connected by a bond b in the first 2D molecular structure 220, for which a hidden-layer feature representation is likewise determined; and the feature representations are produced by an MPNN (message passing neural network) whose model parameter is θ.
  • the editing module 230 may determine a set of probabilities associated with a set of predetermined editing operations using the operation prediction model and based on the feature representation determined according to equations (1) and/or (2).
  • Such predetermined editing operations include, for example: adding a specific 2D structure fragment at a specific atom in the first 2D molecular structure 220 , or deleting a specific bond in the first 2D molecular structure 220 .
  • where MLP denotes a multi-layer perceptron and σ(·) represents the Softmax operation.
  • the editing module 230 may determine probabilities corresponding to different predetermined editing operations based on the following formula: q(x′ (u, k)
  • x) p c (add
  • x′(u, k) represents the molecule obtained by adding the k-th 2D structure fragment in the fragment library 240 to the atom u; x′(b) represents the molecule obtained by deleting the bond b together with the fragment attached to it.
  • the editing module 230 may determine an editing operation to be applied to the first 2D molecular structure 220 from a set of predetermined editing operations based on the determined set of probabilities. Exemplarily, the editing module 230 may sample and determine the applied editing operations based on the determined set of probabilities.
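  • Sampling an editing operation from the predicted probabilities can be sketched as below. The raw scores stand in for the output of the operation prediction model, which is not reproduced here; the tuple encoding of operations and the names `softmax` and `sample_edit` are assumptions made for the example.

```python
import math
import random
from typing import List, Sequence, Tuple

# ("add", atom index u, fragment index k) or ("delete", bond index b)
EditOp = Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable Softmax over a list of raw scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sample_edit(operations: Sequence[EditOp], scores: Sequence[float],
                rng: random.Random) -> Tuple[EditOp, List[float]]:
    """Turn scores into probabilities and draw one editing operation."""
    probs = softmax(scores)
    op = rng.choices(list(operations), weights=probs, k=1)[0]
    return op, probs


# Hypothetical candidate set: two additions at atom 1 and one bond deletion.
ops = [("add", 1, 0), ("add", 1, 7), ("delete", 0)]
op, probs = sample_edit(ops, [0.2, 1.5, -0.3], random.Random(0))
```

  • Sampling, rather than always taking the most probable operation, keeps the chain able to explore edits the model currently rates as less likely.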
  • generation module 270 may construct a second 3D molecular structure 290 for second 2D molecular structure 260 based on first 3D molecular structure 280 corresponding to first 2D molecular structure 220, as discussed above.
  • the generation module 270 can determine a set of candidate 3D molecular structures based on the edits applied to the first 2D molecular structure 220 and using the first 3D molecular structure 280, wherein the set of candidate 3D molecular structures has a partial 3D structure corresponding to the first 3D molecular structure 280, the partial 3D structure corresponding to the partial 2D structure not modified by the editing operation.
  • the generating module 270 can construct a constrained 3D molecular structure based on the first 3D molecular structure 280 , so as to more efficiently determine the second 3D molecular structure 290 .
  • FIG. 3 shows a schematic diagram 300 of constructing a 3D molecular structure according to some embodiments of the present disclosure.
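  • as shown in FIG. 3, for an add edit operation that adds a target 2D structure fragment, and unlike a conventional generation process, the generation module 270 can take the first 3D molecular structure into account during generation, that is, introduce configuration constraints corresponding to the first 3D molecular structure.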
  • the generation module 270 may determine a configuration constraint based on the first 3D molecular structure, and the configuration constraint is used to limit the extent to which the first 3D molecular structure is adjusted during the subsequent generation process.
  • the generating module 270 may determine constraints related to interatomic distances based on the first 3D molecular structure (eg, the 3D molecular structure 330 in FIG. 3 , which corresponds to the 2D molecular structure 310 ).
  • the generation module 270 can generate multiple candidate 3D molecular structures based on the configuration constraints.
  • the generation module 270 may utilize appropriate configuration generation tools to generate multiple candidate 3D molecular structures under configuration constraints.
  • the generation module 270 may further perform energy optimization on multiple candidate 3D molecular structures based on configurational constraints, thereby determining a set of candidate 3D molecular structures (eg, candidate 3D molecular structures 340 in FIG. 3 ).
  • the generation module 270 can also determine the second 3D molecular structure 290 corresponding to the second 2D molecular structure 260 based on the binding between the set of candidate 3D molecular structures and the target molecule.
  • specifically, the generation module 270 can determine, within the set of candidate 3D molecular structures, the target 3D molecular structure that has the minimum binding free energy with the target molecule, and use it as the second 3D molecular structure (for example, the 3D molecular structure 350 in FIG. 3) corresponding to the second 2D molecular structure (for example, the 2D molecular structure 320 in FIG. 3, which is determined by performing an add edit operation on the 2D molecular structure 310).
  • Fig. 4 shows a schematic diagram of constructing a 3D molecular structure according to still other embodiments of the present disclosure.
  • as shown in FIG. 4, for a delete edit operation that deletes a target 2D structure fragment, the generation module 270 may retain the part of the first 3D molecular structure (for example, the 3D molecular structure 430 in FIG. 4, which corresponds to the 2D molecular structure 410) that is not removed by the delete edit operation.
  • the generation module 270 can release the retained partial 3D molecular structure and perform local energy optimization to determine a candidate 3D molecular structure (for example, the 3D molecular structure 440 in FIG. 4).
  • the generation module 270 can also determine the second 3D molecular structure 290 corresponding to the second 2D molecular structure 260 based on the binding between the candidate 3D molecular structure and the target molecule. Specifically, the generation module 270 can determine the target 3D molecular structure from the candidate 3D molecular structure by minimizing the binding free energy with the target molecule, and use it as the second 3D molecular structure (for example, the 3D molecular structure 450 in FIG. 4) corresponding to the second 2D molecular structure (for example, the 2D molecular structure 420 in FIG. 4, which is determined by performing a delete edit operation on the 2D molecular structure 410).
  • the embodiments of the present disclosure can greatly reduce the computational overhead required to construct the 3D molecular structure, thereby improving the efficiency of constructing the 3D molecular structure.
  • the construction process based on the constrained 3D molecular structure can greatly improve the computational efficiency of searching for the minimum binding energy.
  • the editing module 230 can also train the editing model in a self-supervised manner based on the editing operations applied to the first 2D molecular structure 220.
  • the editing operations applied to the first 2D molecular structure 220 are determined based on probabilistic sampling.
  • the design module 125 may, for example, perform multiple samplings in parallel to obtain multiple candidate 2D molecular structures based on the first 2D molecular structure 220 .
  • editing module 230 can determine an evaluation for each candidate 2D molecular structure.
  • the evaluation can be based on, for example: the binding between the 3D molecular structure corresponding to the candidate 2D molecular structure and the target molecule, the drug-likeness QED (Quantitative Estimate of Drug-likeness) of that 3D molecular structure, and/or the synthesizability of that 3D molecular structure.
  • embodiments of the present disclosure can simultaneously achieve multi-target ligand molecule generation.
  • editing module 230 can determine a second normalized value that increases based on an increase in drug-likeness.
  • QED( ⁇ ) represents the QED score, which can be calculated by RDKit, for example.
  • editing module 230 may determine a third normalized value that decreases as the synthesis difficulty indicated by the synthesizability increases.
  • editing module 230 may determine the first evaluation based on the first normalization value, the second normalization value and the third normalization value. In some embodiments, editing module 230 may, based on the first weight associated with the first normalized value, the second weight associated with the second normalized value, and the third weight associated with the third normalized value, according to the first normalized value, the second normalized value and the third normalized value determine the first evaluation.
  • the first evaluation can be expressed as:
  • w1, w2 and w3 respectively denote the weight corresponding to drug-likeness, the weight corresponding to synthesizability, and the weight corresponding to binding.
  • editing module 230 may determine a probability that second 2D molecular structure 260 is accepted based on the first evaluation and the second evaluation for first 2D molecular structure 220 . This probability can be expressed, for example, as:
  • π_α(x′) represents the first evaluation for the second 2D molecular structure 260
  • π_α(x) represents the second evaluation for the first 2D molecular structure 220
  • T represents the temperature coefficient, which is determined based on an annealing mechanism.
  • the temperature coefficient T is determined based on the number of editing operations the first 2D molecular structure has undergone. Exemplarily, if the first 2D molecular structure is generated by applying a first number of editing operations to the initial 2D molecular structure, the temperature coefficient T is associated with the first number.
  • the design module 125 can determine, based on equation (13), whether the second 2D molecular structure 260 is accepted or rejected. As discussed with reference to FIG. 2, if the second 2D molecular structure 260 is accepted, the design module 125 may continue to edit iteratively from the second 2D molecular structure 260 to determine the target structure 170 of the ligand molecule. Conversely, if the second 2D molecular structure is rejected, the design module 125 may continue to edit iteratively from the first 2D molecular structure 220 to determine the target structure 170 of the ligand molecule.
  • the editing module 230 may further train an editing model based on the editing operations corresponding to generating the candidate 2D molecular structures.
  • training the editing model may be based on maximum likelihood estimation (MLE).
  • the editing module 230 may terminate the iteration after performing a predetermined number of edits on the initial 2D molecular structure 210 , and determine the final output 2D molecular structure as the target structure 170 of the ligand molecule.
  • the edit module 230 may utilize the retrained edit model to generate a new third 2D molecular structure based on the second 2D molecular structure, and perform iteratively thereby. During the iterative process, the editing module 230 may increment the number of edited times, and exit the iteration until a predetermined number of times has been edited.
  • the editing module 230 may determine the second 3D molecular structure 290 and/or the second 2D molecular structure 260 as the target structure.
  • the editing module 230 may also determine whether to converge based on the degree of change in the evaluation of the edited molecular structure for each iteration. For example, if the estimated change after a predetermined number of iterations is less than a predetermined threshold, the editing module 230 may determine that convergence has been achieved, and determine the final output molecular structure as the target structure of the ligand molecule.
  • FIG. 5 shows a flowchart of a method 500 for designing ligand molecules according to some implementations of the present disclosure.
  • Method 500 may be implemented by computing device 100 , for example at design module 125 in memory 120 of computing device 100 .
  • at block 510, the computing device 100 edits the first 2D molecular structure to determine the second 2D molecular structure, the editing including at least: deleting a 2D structure fragment from the first 2D molecular structure, or adding a 2D structure fragment to the first 2D molecular structure.
  • the computing device 100 determines a set of candidate 3D molecular structures corresponding to the second 2D molecular structure based on the first 3D molecular structure corresponding to the first 2D molecular structure and the edit.
  • the computing device 100 determines a second 3D molecular structure corresponding to the second 2D molecular structure based on binding between the set of candidate 3D molecular structures and the target molecule.
  • the computing device 100 determines a target structure of the ligand molecule for the target molecule based on the second 3D molecular structure.
  • editing the first 2D molecular structure comprises: using an operation prediction model and based on feature representations corresponding to the first 2D molecular structure, determining an editing operation to be applied to the first 2D molecular structure; and based on the determined The editing operation edits the first 2D molecular structure.
  • determining the editing operations to be applied to the first 2D molecular structure comprises: using an operation prediction model and based on the feature representation, determining a set of probabilities associated with a predetermined set of editing operations, wherein the set of predetermined editing operations The operation includes: adding a specific 2D structure fragment at a specific atom in the first 2D molecular structure, or deleting a specific bond in the first 2D molecular structure; and based on a set of probabilities, determining from a set of predetermined editing operations to be applied to Editing operations of the first 2D molecular structure.
  • adding a 2D structure fragment includes: selecting a target 2D structure fragment from a fragment library, the fragment library including a plurality of 2D structure fragments; and adding the target 2D structure fragment to a specific atom in the first 2D molecular structure.
  • determining a set of candidate 3D molecular structures corresponding to the second 2D molecular structure comprises: based on the edit and using the first 3D molecular structure, determining a set of candidate 3D molecular structures, wherein the set of candidate structures has a partial 3D structure corresponding to the first 3D molecular structure, and the partial 3D structure corresponds to the partial 2D structure not modified by the editing operation.
  • the edit is adding a target 2D structure fragment to the first 2D molecular structure, and determining a set of candidate 3D molecular structures comprises: determining a configuration constraint based on the first 3D molecular structure corresponding to the first 2D molecular structure; generating, based on the configuration constraint, a plurality of candidate 3D molecular structures corresponding to the edit, the configuration constraint limiting the degree to which the first 3D molecular structure is adjusted while the plurality of candidate 3D molecular structures are generated; and performing, based on the configuration constraint, energy optimization on the plurality of candidate 3D molecular structures to determine the set of candidate 3D molecular structures.
  • binding is determined based on the free energy of binding between a set of candidate 3D structural fragments and a target molecule.
  • determining the target structure of the ligand molecule for the target molecule comprises: determining a first evaluation for the second 3D molecular structure, the first evaluation indicating at least one of the following: the target binding between the second 3D molecular structure and the target molecule, the drug-likeness (QED) of the second 3D molecular structure, or the synthesizability of the second 3D molecular structure; determining, based on the first evaluation and a second evaluation for the first 3D molecular structure, a probability that the second 2D molecular structure is accepted; and determining, according to the probability, the target structure based on the second 2D molecular structure and the second 3D molecular structure.
  • determining the target structure based on the second 2D molecular structure and the second 3D molecular structure comprises: in response to the first evaluation being superior to the second evaluation, training for predictive editing operations based on edits to the first 2D molecular structure an editing model; using the trained editing model, editing the second 2D molecular structure to determine a third 2D molecular structure; and based on the third 2D molecular structure and the second 2D molecular structure, determining a target structure of a ligand molecule for the target molecule .
  • determining the first evaluation for the second 3D molecular structure comprises: determining a first normalized value based on the target binding, the first normalized value decreasing as the binding free energy indicated by the target binding increases; determining a second normalized value based on the drug-likeness, the second normalized value increasing as the drug-likeness increases; determining a third normalized value based on the synthesizability, the third normalized value decreasing as the synthesis difficulty indicated by the synthesizability increases; and determining the first evaluation based on the first normalized value, the second normalized value, and the third normalized value.
  • determining the first rating based on the first normalized value, the second normalized value, and the third normalized value includes: based on a first weight associated with the first normalized value, a second weight associated with the second normalized value A weight and a third weight associated with the third normalized value, the first evaluation is determined based on the first normalized value, the second normalized value and the third normalized value.
  • the first 2D molecular structure is generated by applying a first number of editing operations to the initial 2D molecular structure, and the probability is also based on the first number.
  • the first 2D molecular structure is generated by applying a first number of editing operations to the initial 2D molecular structure, and determining the target structure of the ligand molecule for the target molecule comprises: incrementing the first number to determine a second number; and determining the second 3D molecular structure as the target structure if the second number reaches a predetermined threshold.
  • FPGA field programmable gate array
  • ASIC application specific integrated circuit
  • ASSP application specific standard product
  • SOC system on a chip
  • CPLD complex programmable logic device
  • Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general-purpose computer, a special-purpose computer, or another programmable data processing device, so that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowcharts and/or block diagrams to be implemented.
  • the program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine or entirely on the remote machine or server.
  • a machine-readable medium may be a tangible medium that may contain or store a program for use by or in conjunction with an instruction execution system, apparatus, or device.
  • a machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium.
  • a machine-readable medium may include, but is not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatus, or devices, or any suitable combination of the foregoing.
  • machine-readable storage media would include one or more wire-based electrical connections, portable computer discs, hard drives, random access memory (RAM), read only memory (ROM), erasable programmable read only memory (EPROM or flash memory), optical fiber, compact disk read only memory (CD-ROM), optical storage, magnetic storage, or any suitable combination of the foregoing.

Landscapes

  • Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Chemical & Material Sciences (AREA)
  • Medicinal Chemistry (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Pharmacology & Pharmacy (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Crystallography & Structural Chemistry (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computing Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

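A method (500), apparatus, device, storage medium, and program product for designing a ligand molecule. The method (500) comprises: editing a first 2D molecular structure to determine a second 2D molecular structure (510), the editing comprising at least: deleting a 2D structure fragment from the first 2D molecular structure, or adding a 2D structure fragment to the first 2D molecular structure; determining, based on the edit and a first 3D molecular structure corresponding to the first 2D molecular structure, a set of candidate 3D molecular structures corresponding to the second 2D molecular structure (520); determining, based on the binding between the set of candidate 3D molecular structures and a target molecule, a second 3D molecular structure corresponding to the second 2D molecular structure (530); and determining, based on the second 3D molecular structure, a target structure of the ligand molecule for the target molecule (540). The method can constrain the generation of a subsequent 3D molecular structure based on the 3D molecular structure of a prior state, thereby improving the efficiency of designing ligand molecules.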

Description

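Method and Apparatus for Designing Ligand Molecules
Cross-Reference to Related Applications
This application claims priority to Chinese invention patent application No. 202210152512.4, filed on February 18, 2022 and entitled "Method and Apparatus for Designing Ligand Molecules", the entire disclosure of which is incorporated herein by reference.
Technical Field
Implementations of the present disclosure relate to the field of computers, and more specifically to methods, apparatuses, devices, and computer storage media for designing ligand molecules.
Background
In drug discovery, an important task is to find small drug molecules (also called ligand molecules, or ligands) that bind effectively to a target molecule (for example, a target protein molecule). In recent years, with the development of computer technology, computer-aided techniques such as machine learning have gradually been applied to the drug discovery process.
In designing a ligand molecule, the bindability between the three-dimensional (3D) structure of the ligand molecule and the target molecule usually has to be considered. How to construct 3D molecular structures efficiently is an important challenge in ligand design.
Summary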
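In a first aspect of the present disclosure, a method for designing a ligand molecule is provided. The method comprises: editing a first 2D molecular structure to determine a second 2D molecular structure, the editing comprising at least: deleting a 2D structure fragment from the first 2D molecular structure, or adding a 2D structure fragment to the first 2D molecular structure; determining, based on the edit and a first 3D molecular structure corresponding to the first 2D molecular structure, a set of candidate 3D molecular structures corresponding to the second 2D molecular structure; determining, based on the binding between the set of candidate 3D molecular structures and a target molecule, a second 3D molecular structure corresponding to the second 2D molecular structure; and determining, based on the second 3D molecular structure, a target structure of the ligand molecule for the target molecule.
In some embodiments, editing the first 2D molecular structure comprises: determining, using an operation prediction model and based on a feature representation corresponding to the first 2D molecular structure, an editing operation to be applied to the first 2D molecular structure; and editing the first 2D molecular structure based on the determined editing operation.
In some embodiments, determining the editing operation to be applied to the first 2D molecular structure comprises: determining, using the operation prediction model and based on the feature representation, a set of probabilities associated with a set of predetermined editing operations, wherein the set of predetermined editing operations comprises: adding a specific 2D structure fragment at a specific atom in the first 2D molecular structure, or deleting a specific bond in the first 2D molecular structure; and determining, based on the set of probabilities, the editing operation to be applied to the first 2D molecular structure from the set of predetermined editing operations.
In some embodiments, adding a 2D structure fragment comprises: selecting a target 2D structure fragment from a fragment library that comprises a plurality of 2D structure fragments; and adding the target 2D structure fragment at a specific atom in the first 2D molecular structure.
In some embodiments, determining the set of candidate 3D molecular structures corresponding to the second 2D molecular structure comprises: determining the set of candidate 3D molecular structures based on the edit and using the first 3D molecular structure, wherein the set of candidate structures has a partial 3D structure corresponding to the first 3D molecular structure, the partial 3D structure corresponding to the partial 2D structure not modified by the editing operation.
In some embodiments, the edit is adding a target 2D structure fragment to the first 2D molecular structure, and determining the set of candidate 3D molecular structures comprises: determining a configuration constraint based on the first 3D molecular structure corresponding to the first 2D molecular structure; generating, based on the configuration constraint, a plurality of candidate 3D molecular structures corresponding to the edit, the configuration constraint limiting the degree to which the first 3D molecular structure is adjusted while the plurality of candidate 3D molecular structures are generated; and performing energy optimization on the plurality of candidate 3D molecular structures based on the configuration constraint, to determine the set of candidate 3D molecular structures.
In some embodiments, the binding is determined based on the binding free energy between the set of candidate 3D structure fragments and the target molecule.
In some embodiments, determining the target structure of the ligand molecule for the target molecule comprises: determining a first evaluation for the second 3D molecular structure, the first evaluation indicating at least one of: the target binding between the second 3D molecular structure and the target molecule, the drug-likeness (QED) of the second 3D molecular structure, or the synthesizability of the second 3D molecular structure; determining, based on the first evaluation and a second evaluation for the first 3D molecular structure, a probability that the second 2D molecular structure is accepted; and determining, according to the probability, the target structure based on the second 2D molecular structure and the second 3D molecular structure.
In some embodiments, determining the target structure based on the second 2D molecular structure and the second 3D molecular structure comprises: in response to the first evaluation being better than the second evaluation, training an editing model for predicting editing operations based on the edit to the first 2D molecular structure; editing the second 2D molecular structure using the trained editing model to determine a third 2D molecular structure; and determining the target structure of the ligand molecule for the target molecule based on the third 2D molecular structure and the second 3D molecular structure.
In some embodiments, determining the first evaluation for the second 3D molecular structure comprises: determining a first normalized value based on the target binding, the first normalized value decreasing as the binding free energy indicated by the target binding increases; determining a second normalized value based on the drug-likeness, the second normalized value increasing as the drug-likeness increases; determining a third normalized value based on the synthesizability, the third normalized value decreasing as the synthesis difficulty indicated by the synthesizability increases; and determining the first evaluation based on the first normalized value, the second normalized value, and the third normalized value.
In some embodiments, determining the first evaluation based on the first normalized value, the second normalized value, and the third normalized value comprises: determining the first evaluation from the first normalized value, the second normalized value, and the third normalized value, based on a first weight associated with the first normalized value, a second weight associated with the second normalized value, and a third weight associated with the third normalized value.
In some embodiments, the first 2D molecular structure is generated by applying a first number of editing operations to an initial 2D molecular structure, and the probability is further based on the first number.
In some embodiments, the first 2D molecular structure is generated by applying a first number of editing operations to an initial 2D molecular structure, and determining the target structure of the ligand molecule for the target molecule comprises: incrementing the first number to determine a second number; and, if the second number reaches a predetermined threshold, determining the second 3D molecular structure as the target structure.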
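In a second aspect of the present disclosure, an apparatus for designing a ligand molecule is provided. The apparatus comprises: an editing module configured to edit a first 2D molecular structure to determine a second 2D molecular structure, the editing comprising at least: deleting a 2D structure fragment from the first 2D molecular structure, or adding a 2D structure fragment to the first 2D molecular structure; and a generation module configured to determine, based on the edit and a first 3D molecular structure corresponding to the first 2D molecular structure, a set of candidate 3D molecular structures corresponding to the second 2D molecular structure, and to determine, based on the binding between the set of candidate 3D molecular structures and a target molecule, a second 3D molecular structure corresponding to the second 2D molecular structure; wherein the editing module is further configured to determine, based on the second 3D molecular structure, a target structure of the ligand molecule for the target molecule.
In some embodiments, the editing module is further configured to: determine, using an operation prediction model and based on a feature representation corresponding to the first 2D molecular structure, an editing operation to be applied to the first 2D molecular structure; and edit the first 2D molecular structure based on the determined editing operation.
In some embodiments, the editing module is further configured to: determine, using the operation prediction model and based on the feature representation, a set of probabilities associated with a set of predetermined editing operations, wherein the set of predetermined editing operations comprises: adding a specific 2D structure fragment at a specific atom in the first 2D molecular structure, or deleting a specific bond in the first 2D molecular structure; and determine, based on the set of probabilities, the editing operation to be applied to the first 2D molecular structure from the set of predetermined editing operations.
In some embodiments, the editing module is further configured to: select a target 2D structure fragment from a fragment library that comprises a plurality of 2D structure fragments; and add the target 2D structure fragment at a specific atom in the first 2D molecular structure.
In some embodiments, the generation module is further configured to: determine the set of candidate 3D molecular structures based on the edit and using the first 3D molecular structure, wherein the set of candidate structures has a partial 3D structure corresponding to the first 3D molecular structure, the partial 3D structure corresponding to the partial 2D structure not modified by the editing operation.
In some embodiments, the edit is adding a target 2D structure fragment to the first 2D molecular structure, and the generation module is further configured to: determine a configuration constraint based on the first 3D molecular structure corresponding to the first 2D molecular structure; generate, based on the configuration constraint, a plurality of candidate 3D molecular structures corresponding to the edit, the configuration constraint limiting the degree to which the first 3D molecular structure is adjusted while the plurality of candidate 3D molecular structures are generated; and perform energy optimization on the plurality of candidate 3D molecular structures based on the configuration constraint, to determine the set of candidate 3D molecular structures.
In some embodiments, the binding is determined based on the binding free energy between the set of candidate 3D structure fragments and the target molecule.
In some embodiments, the editing module is further configured to: determine a first evaluation for the second 3D molecular structure, the first evaluation indicating at least one of: the target binding between the second 3D molecular structure and the target molecule, the drug-likeness (QED) of the second 3D molecular structure, or the synthesizability of the second 3D molecular structure; determine, based on the first evaluation and a second evaluation for the first 3D molecular structure, a probability that the second 2D molecular structure is accepted; and determine, according to the probability, the target structure based on the second 2D molecular structure and the second 3D molecular structure.
In some embodiments, the editing module is further configured to: in response to the first evaluation being better than the second evaluation, train an editing model for predicting editing operations based on the edit to the first 2D molecular structure; edit the second 2D molecular structure using the trained editing model to determine a third 2D molecular structure; and determine the target structure of the ligand molecule for the target molecule based on the third 2D molecular structure and the second 3D molecular structure.
In some embodiments, the generation module is further configured to: determine a first normalized value based on the target binding, the first normalized value decreasing as the binding free energy indicated by the target binding increases; determine a second normalized value based on the drug-likeness, the second normalized value increasing as the drug-likeness increases; determine a third normalized value based on the synthesizability, the third normalized value decreasing as the synthesis difficulty indicated by the synthesizability increases; and determine the first evaluation based on the first normalized value, the second normalized value, and the third normalized value.
In some embodiments, the generation module is further configured to: determine the first evaluation from the first normalized value, the second normalized value, and the third normalized value, based on a first weight associated with the first normalized value, a second weight associated with the second normalized value, and a third weight associated with the third normalized value.
In some embodiments, the first 2D molecular structure is generated by applying a first number of editing operations to an initial 2D molecular structure, and the probability is further based on the first number.
In some embodiments, the first 2D molecular structure is generated by applying a first number of editing operations to an initial 2D molecular structure, and the editing module is further configured to: increment the first number to determine a second number; and, if the second number reaches a predetermined threshold, determine the second 3D molecular structure as the target structure.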
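In a third aspect of the present disclosure, an electronic device is provided, comprising a memory and a processor, wherein the memory stores one or more computer instructions, and the one or more computer instructions are executed by the processor to implement the method according to the first aspect of the present disclosure.
In a fourth aspect of the present disclosure, a computer-readable storage medium is provided, storing one or more computer instructions that, when executed by a processor, implement the method according to the first aspect of the present disclosure.
In a fifth aspect of the present disclosure, a computer program product is provided, comprising one or more computer instructions that, when executed by a processor, implement the method according to the first aspect of the present disclosure.
According to various embodiments of the present disclosure, the 3D molecular structure of a prior state can be used to construct a new 3D molecular structure, which is then used to evaluate whether the edited 3D molecular structure (or its corresponding 2D molecular structure) can be accepted for determining the final target structure of the ligand molecule. In this way, embodiments of the present disclosure can improve the efficiency of constructing 3D molecular structures, and in particular the efficiency of searching for the binding configuration between a 3D molecular structure and the target molecule, thereby improving the efficiency of determining ligand molecules.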
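Brief Description of the Drawings
The above and other features, advantages, and aspects of the embodiments of the present disclosure will become more apparent from the following detailed description taken in conjunction with the accompanying drawings. In the drawings, the same or similar reference numerals denote the same or similar elements, in which:
FIG. 1 shows a schematic block diagram of a computing device capable of implementing some embodiments of the present disclosure;
FIG. 2 shows a schematic block diagram of a design module according to some embodiments of the present disclosure;
FIG. 3 shows a schematic diagram of constructing a 3D molecular structure according to some embodiments of the present disclosure;
FIG. 4 shows a schematic diagram of constructing a 3D molecular structure according to still other embodiments of the present disclosure; and
FIG. 5 shows a flowchart of an example method for designing a ligand molecule according to some embodiments of the present disclosure.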
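Detailed Description
Embodiments of the present disclosure are described in more detail below with reference to the accompanying drawings. Although certain embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure can be implemented in various forms and should not be construed as limited to the embodiments set forth here; rather, these embodiments are provided for a more thorough and complete understanding of the present disclosure. It should be understood that the drawings and embodiments of the present disclosure are for illustrative purposes only and are not intended to limit the scope of protection of the present disclosure.
In the description of the embodiments of the present disclosure, the term "comprising" and similar terms should be understood as open-ended inclusion, that is, "including but not limited to". The term "based on" should be understood as "based at least in part on". The term "one embodiment" or "the embodiment" should be understood as "at least one embodiment". The terms "first", "second", and the like may refer to different or identical objects. Other explicit and implicit definitions may also be included below.
As discussed above, with the development of computer technology, computer-aided techniques such as machine learning have gradually been applied to the drug discovery process, and increasing attention is being paid to the efficiency of computer-aided drug discovery.
According to implementations of the present disclosure, a scheme for designing a ligand molecule is provided. In this scheme, a first 2D molecular structure may be edited to determine a second 2D molecular structure, where the editing comprises at least: deleting a 2D structure fragment from the first 2D molecular structure, or adding a 2D structure fragment to the first 2D molecular structure. Further, a set of candidate 3D molecular structures corresponding to the second 2D molecular structure may be determined based on the edit and a first 3D molecular structure corresponding to the first 2D molecular structure, and a second 3D molecular structure corresponding to the second 2D molecular structure may be determined based on the binding between the set of candidate 3D molecular structures and the target molecule. Further, the target structure of the ligand molecule for the target molecule may be determined based on the second 3D molecular structure.
Various embodiments of the present disclosure can use the 3D molecular structure of a prior state to construct a new 3D molecular structure, which is then used to evaluate whether it can be used to determine the ligand molecule. In this way, embodiments of the present disclosure can improve the efficiency of constructing 3D molecular structures, and in particular the efficiency of searching for the binding configuration between a 3D molecular structure and the target molecule, thereby improving the efficiency of determining ligand molecules.
The basic principles and several example implementations of the present disclosure are described below with reference to the drawings.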
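Example Device
FIG. 1 shows a schematic block diagram of an example device 100 that can be used to implement embodiments of the present disclosure. It should be understood that the device 100 shown in FIG. 1 is merely exemplary and should not constitute any limitation on the functionality or scope of the implementations described in the present disclosure. As shown in FIG. 1, the components of the device 100 may include, but are not limited to, one or more processors or processing units 110, a memory 120, a storage device 130, one or more communication units 140, one or more input devices 150, and one or more output devices 160.
In some implementations, the device 100 may be implemented as any of various user terminals or service terminals. A service terminal may be a server, a large-scale computing device, or the like provided by a service provider. A user terminal may be, for example, any type of mobile, fixed, or portable terminal, including a mobile phone, multimedia computer, multimedia tablet, Internet node, communicator, desktop computer, laptop computer, notebook computer, netbook computer, tablet computer, personal communication system (PCS) device, personal navigation device, personal digital assistant (PDA), audio/video player, digital camera/camcorder, positioning device, television receiver, radio broadcast receiver, e-book device, gaming device, or any combination thereof, including the accessories and peripherals of these devices or any combination thereof. It is also contemplated that the device 100 can support any type of user interface (such as "wearable" circuitry).
The processing unit 110 may be a real or virtual processor and can perform various processing according to programs stored in the memory 120. In a multiprocessor system, multiple processing units execute computer-executable instructions in parallel to increase the parallel processing capability of the device 100. The processing unit 110 may also be referred to as a central processing unit (CPU), microprocessor, controller, or microcontroller.
The device 100 typically includes multiple computer storage media. Such media may be any available media accessible to the device 100, including but not limited to volatile and non-volatile media, and removable and non-removable media. The memory 120 may be volatile memory (for example, registers, cache, random access memory (RAM)), non-volatile memory (for example, read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory), or some combination thereof. The memory 120 may include one or more design modules 125, which are program modules configured to perform the functions of the various implementations described herein. The design module 125 can be accessed and run by the processing unit 110 to implement the corresponding functions. The storage device 130 may be a removable or non-removable medium, and may include a machine-readable medium that can be used to store information and/or data and can be accessed within the device 100.
The functions of the components of the device 100 may be implemented in a single computing cluster or in multiple computing machines that can communicate over communication connections. The device 100 can therefore operate in a networked environment using logical connections to one or more other servers, personal computers (PCs), or other general network nodes. As needed, the device 100 can also communicate through the communication unit 140 with one or more external devices (not shown), such as a database 145, other storage devices, servers, or display devices; with one or more devices that enable a user to interact with the device 100; or with any device (for example, a network card or modem) that enables the device 100 to communicate with one or more other computing devices. Such communication may be performed via an input/output (I/O) interface (not shown).
The input device 150 may be one or more of various input devices, such as a mouse, keyboard, trackball, voice input device, or camera. The output device 160 may be one or more output devices, such as a display, speaker, or printer.
In some implementations, the device 100 may receive, for example through the input device 150, an identifier corresponding to the target molecule (for example, a target protein molecule). For example, a user may supply a PDB file through the input device 150 to indicate the corresponding target molecule.
In some implementations, the design module 125 may use an editing model to iteratively edit a molecular structure in order to determine the final target structure of the ligand molecule 170. The process of determining the target structure of the ligand molecule 170 is described in detail below.
It should be understood that, although the ligand molecule 170 output in FIG. 1 is shown as a 2D molecular structure, in some embodiments the output device 160 may, for example, output a 3D molecular structure.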
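Ligand Molecule Design
Reference is first made to FIG. 2, which shows a block diagram of the design module 125 according to some embodiments of the present disclosure. As shown in FIG. 2, the design module 125 includes multiple modules for implementing an example ligand-design process according to some embodiments of the present disclosure, including an editing module 230 and a generation module 270.
In some embodiments, the editing module 230 may edit a first 2D molecular structure 220. Specifically, the edit may include deleting a 2D structure fragment from the first 2D molecular structure 220; such an edit is also called a "delete edit operation". Alternatively, the edit may include adding a new 2D structure fragment to the first 2D molecular structure 220; such an edit is also called an "add edit operation".
For a delete edit operation, the editing module 230 may determine the bond to be deleted in the first 2D molecular structure 220 and accordingly delete from the first molecular structure the 2D structure fragment associated with that bond. For example, the editing module 230 may delete from the first 2D molecular structure 220 the group associated with the bond to be deleted.
For an add edit operation, the editing module 230 may determine the atom to be edited in the first 2D molecular structure 220 and accordingly select a 2D structure fragment from the fragment library 240 to attach to the first 2D molecular structure 220. During the add edit operation, a new bond is formed between the atom to be edited in the first 2D molecular structure 220 and the selected 2D fragment, thereby building a new molecular structure.
In some embodiments, the fragment library 240 may include a plurality of 2D structure fragments 250. In some embodiments, the plurality of 2D structure fragments 250 may be determined, for example, based on experimental knowledge. Alternatively, the plurality of 2D structure fragments 250 may be constructed from existing drug molecules.
In some embodiments, the first 2D molecular structure 220 may be obtained, for example, from an initial 2D molecular structure 210 (for example, the ethane molecule C2H6 shown in FIG. 2) through at least one editing process as discussed above. Alternatively, the first 2D molecular structure 220 may itself be the initial 2D molecular structure. The initial 2D molecular structure may, for example, be selected randomly by the editing module 230 or determined by the editing module 230 from input.
As shown in FIG. 2, the editing module 230 may use a deployed editing model to edit the first 2D molecular structure 220 and obtain a second 2D molecular structure 260. The editing model may be implemented, for example, based on a machine learning model. The editing module 230 and the editing model are described in detail below.
As shown in FIG. 2, the design module 125 may also include a generation module 270. In some embodiments, the generation module 270 may be used to determine the 3D molecular structure corresponding to the second 2D molecular structure 260.
In some embodiments, the generation module 270 may efficiently construct a second 3D molecular structure 290 corresponding to the second 2D molecular structure 260, based on, for example, a first 3D molecular structure 280 corresponding to the first 2D molecular structure 220 and the editing operation performed on the first 2D molecular structure 220 by the editing module 230. The detailed process of constructing the second 3D molecular structure 290 is described below with reference to FIG. 3 and FIG. 4.
In some embodiments, the editing module 230 and/or the generation module 270 may also determine an evaluation for the second 3D molecular structure 290 (for convenience, also called the first evaluation). For example, the editing module 230 may determine the first evaluation based on the binding between the second 3D molecular structure 290 and the target molecule. Additionally, the generation module 270 may determine the first evaluation based on, for example, drug-likeness (QED) and/or synthesizability.
Further, the editing module 230 may compare the first evaluation for the second 3D molecular structure 290 with a second evaluation for the first 3D molecular structure 280 to determine whether the second 2D molecular structure 260 can be accepted. If the second 2D molecular structure 260 is accepted, it may, for example, be taken as the next state of a Markov chain, so as to iteratively determine the final target structure 170 of the ligand molecule.
Conversely, if it is determined based on the first evaluation and the second evaluation that the second 2D molecular structure 260 is rejected, the editing module 230 may discard the second 2D molecular structure and continue from the first 2D molecular structure 220 to determine a new edit, thereby iteratively determining the final target structure 170 of the ligand molecule.
It should be understood that the editing module 230 may determine the second evaluation for the first 3D molecular structure 280 through a similar process. In some embodiments, if the first evaluation is better than the second evaluation, the editing module 230 may further train the editing model deployed in the editing module 230 based on the editing operation performed on the first 2D molecular structure 220.
In some embodiments, the editing module 230 may use the trained editing model to iteratively perform edits starting from the second 2D molecular structure 260 until the target structure 170 of the ligand molecule for the target molecule is determined.
In some embodiments, the editing module 230 may, for example, terminate the iteration after a predetermined number of edits have been performed on the initial 2D molecular structure 210, and determine the finally output 2D molecular structure as the target structure 170 of the ligand molecule. Alternatively, the editing module 230 may determine the 3D molecular structure corresponding to the final 2D molecular structure as the target structure 170 of the ligand molecule.
In some embodiments, the editing module 230 may also determine convergence based on how much the evaluation of the edited molecular structure changes from iteration to iteration. For example, if the change in the evaluation after a predetermined number of iterations is less than a predetermined threshold, the editing module 230 may determine that convergence has been reached and take the finally output molecular structure as the target structure of the ligand molecule.
The self-supervised training process is described in detail below.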
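Molecular Structure Editing
As discussed with reference to FIG. 2, the editing module 230 is configured to edit the first 2D molecular structure 220 using the deployed editing model. In some embodiments, the editing model may be implemented, for example, based on a suitable machine learning model.
Specifically, the editing module 230 may first determine a feature representation of the first 2D molecular structure 220. In some embodiments, the first 2D molecular structure 220 may be represented as a graph x, which may have, for example, n atoms and n bonds. In some embodiments, the editing module 230 may represent the first 2D molecular structure 220 as:

where a denotes the index of an atom in the first 2D molecular structure 220, and each atom has a corresponding hidden-layer feature representation; w and v denote the atoms connected by a bond b in the first 2D molecular structure 220, and each bond likewise has a corresponding hidden-layer feature representation; the representations are computed by an MPNN (Message Passing Neural Network) with model parameters θ.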
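To make the representation step concrete, the following is a minimal, unparameterized sketch of message passing on a molecular graph. It is not the disclosure's MPNN: the learned parameters θ are omitted, and the sum aggregation, the concatenation used to form bond features, and the names `message_passing` and `bond_features` are all assumptions made for illustration.

```python
def message_passing(atom_feats, bonds, rounds=1):
    """Toy sum-aggregation message passing (no learned weights).

    atom_feats: one feature vector (list of floats) per atom.
    bonds: list of (w, v) atom-index pairs.
    In each round, every atom adds its bonded neighbours' features to its own.
    """
    h = [list(f) for f in atom_feats]
    for _ in range(rounds):
        new_h = [list(f) for f in h]
        for w, v in bonds:
            for i in range(len(h[w])):
                new_h[w][i] += h[v][i]
                new_h[v][i] += h[w][i]
        h = new_h
    return h


def bond_features(h, bonds):
    # Assumption: a bond's representation is the concatenation of the
    # hidden representations of the two atoms it connects.
    return [h[w] + h[v] for w, v in bonds]


# Heavy atoms of ethanol as a toy graph: C(0)-C(1)-O(2), one-hot [is_C, is_O].
atom_feats = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
bonds = [(0, 1), (1, 2)]
h_atoms = message_passing(atom_feats, bonds, rounds=1)
h_bonds = bond_features(h_atoms, bonds)
```

A trained model would apply learned transformations inside each round; the sketch only shows how atom-level and bond-level representations arise from the graph.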
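Further, the editing module 230 may use the operation prediction model, based on the feature representation determined according to equations (1) and/or (2), to determine a set of probabilities associated with a set of predetermined editing operations. Such predetermined editing operations include, for example: adding a specific 2D structure fragment at a specific atom in the first 2D molecular structure 220, or deleting a specific bond in the first 2D molecular structure 220.
Such a process may be expressed, for example, as:


where the networks that produce the probabilities are independent multi-layer perceptrons (MLPs), and σ(·) denotes the softmax operation.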
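Further, the editing module 230 may determine the probabilities corresponding to the different predetermined editing operations based on the following formulas:

$$q(x'_{(u,k)} \mid x) = p_c(\mathrm{add} \mid x) \cdot p_{\mathrm{add}}(u \mid x) \cdot p_{\mathrm{frag}}(k \mid x, u) \tag{7}$$
$$q(x'_{(b)} \mid x) = p_c(\mathrm{del} \mid x) \cdot p_{\mathrm{del}}(b \mid x) \tag{8}$$
where $x'_{(u,k)}$ denotes the molecule obtained by adding the k-th 2D structure fragment in the fragment library 240 to atom u, and $x'_{(b)}$ denotes the molecule obtained by deleting bond b and its attached fragment from the first 2D molecular structure 220.
Further, the editing module 230 may determine, based on the determined set of probabilities, the editing operation to be applied to the first 2D molecular structure 220 from the set of predetermined editing operations. For example, the editing module 230 may sample the editing operation to apply according to the determined set of probabilities.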
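The factorization in equations (7) and (8) and the sampling step can be sketched as follows. The function names, the tuple encoding of edits, and the toy scores fed to the softmax are hypothetical; in the disclosure the scores would come from the MLP heads of the operation prediction model.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of raw scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def edit_distribution(p_c, p_add, p_frag, p_del):
    """Joint probability of every predetermined edit.

    p_c: {"add": ..., "del": ...}, p_add[u], p_frag[u][k], p_del[b].
    """
    q = {}
    for u, pu in enumerate(p_add):
        for k, pk in enumerate(p_frag[u]):
            q[("add", u, k)] = p_c["add"] * pu * pk  # equation (7)
    for b, pb in enumerate(p_del):
        q[("del", b)] = p_c["del"] * pb  # equation (8)
    return q


def sample_edit(q, rng):
    """Draw one edit in proportion to its probability."""
    ops = list(q)
    return rng.choices(ops, weights=[q[o] for o in ops], k=1)[0]


# Toy scores: 2 candidate atoms, 3 library fragments, 2 deletable bonds.
p_c = dict(zip(("add", "del"), softmax([0.3, -0.3])))
p_add = softmax([1.0, 0.0])
p_frag = [softmax([0.0, 0.5, 1.0]), softmax([0.2, 0.2, 0.2])]
p_del = softmax([0.1, 0.4])
q = edit_distribution(p_c, p_add, p_frag, p_del)
op = sample_edit(q, random.Random(0))
```

Because each factor is itself normalized, the joint probabilities over all add and delete edits sum to one, so they can be sampled from directly.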
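3D Molecular Structure Generation
In some embodiments, as discussed above with reference to FIG. 2, the generation module 270 may construct the second 3D molecular structure 290 for the second 2D molecular structure 260 based on the first 3D molecular structure 280 corresponding to the first 2D molecular structure 220.
In some embodiments, the generation module 270 may determine a set of candidate 3D molecular structures based on the edit applied to the first 2D molecular structure 220 and using the first 3D molecular structure 280, where the set of candidate 3D molecular structures has a partial 3D structure corresponding to the first 3D molecular structure 280, and the partial 3D structure corresponds to the partial 2D structure not modified by the editing operation.
In this way, the generation module 270 can perform constrained 3D molecular structure construction based on the first 3D molecular structure 280 and thus determine the second 3D molecular structure 290 more efficiently.
FIG. 3 shows a schematic diagram 300 of constructing a 3D molecular structure according to some embodiments of the present disclosure. As shown in FIG. 3, for an add edit operation that adds a target 2D structure fragment, and unlike a conventional generation process, the generation module 270 may take the first 3D molecular structure into account during generation, that is, introduce a configuration constraint corresponding to the first 3D molecular structure.
Specifically, the generation module 270 may determine a configuration constraint based on the first 3D molecular structure; the configuration constraint limits the degree to which the first 3D molecular structure is adjusted during subsequent generation. For example, the generation module 270 may determine constraints on interatomic distances from the first 3D molecular structure (for example, the 3D molecular structure 330 in FIG. 3, which corresponds to the 2D molecular structure 310).
Further, the generation module 270 may generate a plurality of candidate 3D molecular structures based on the configuration constraint. For example, the generation module 270 may use a suitable conformation generation tool to generate the plurality of candidate 3D molecular structures subject to the configuration constraint.
Additionally, the generation module 270 may further perform energy optimization on the plurality of candidate 3D molecular structures based on the configuration constraint, thereby determining a set of candidate 3D molecular structures (for example, the candidate 3D molecular structures 340 in FIG. 3).
Further, the generation module 270 may determine the second 3D molecular structure 290 corresponding to the second 2D molecular structure 260 based on the binding between the set of candidate 3D molecular structures and the target molecule. Specifically, the generation module 270 may determine, within the set of candidate 3D molecular structures, the target 3D molecular structure that has the minimum binding free energy with the target molecule, and use it as the second 3D molecular structure (for example, the 3D molecular structure 350 in FIG. 3) corresponding to the second 2D molecular structure (for example, the 2D molecular structure 320 in FIG. 3, which is determined by performing an add edit operation on the 2D molecular structure 310).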
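The add-edit flow above (derive distance constraints from the first 3D structure, keep only candidates that respect them, then take the candidate with the lowest binding free energy) can be sketched as below. The tolerance `tol`, the dictionary layout of coordinates, and the hard-coded energies are assumptions for illustration; real candidates would come from a conformation generation tool and real energies from docking.

```python
import math
from itertools import combinations


def distance_constraints(coords, kept_atoms, tol=0.2):
    """Bound each retained atom pair to within ±tol of its distance
    in the first 3D structure (tol is a hypothetical tolerance)."""
    cons = {}
    for i, j in combinations(sorted(kept_atoms), 2):
        d = math.dist(coords[i], coords[j])
        cons[(i, j)] = (d - tol, d + tol)
    return cons


def satisfies(coords, cons):
    """True if every constrained pair distance lies within its bounds."""
    return all(lo <= math.dist(coords[i], coords[j]) <= hi
               for (i, j), (lo, hi) in cons.items())


def pick_second_structure(candidates, energies, cons):
    """Among candidates that respect the constraints, return the one
    with the minimum binding free energy."""
    feasible = [(e, idx) for idx, (c, e) in enumerate(zip(candidates, energies))
                if satisfies(c, cons)]
    if not feasible:
        raise ValueError("no candidate satisfies the configuration constraints")
    return candidates[min(feasible)[1]]


# First 3D structure with atoms 0 and 1; the edit attaches a new atom 2.
first_3d = {0: (0.0, 0.0, 0.0), 1: (1.5, 0.0, 0.0)}
cons = distance_constraints(first_3d, kept_atoms={0, 1})
candidates = [
    {0: (0.0, 0.0, 0.0), 1: (1.5, 0.0, 0.0), 2: (2.2, 1.2, 0.0)},
    {0: (0.0, 0.0, 0.0), 1: (1.6, 0.0, 0.0), 2: (2.3, -1.2, 0.0)},
    {0: (0.0, 0.0, 0.0), 1: (2.5, 0.0, 0.0), 2: (3.2, 1.2, 0.0)},  # distorts the kept part
]
energies = [-6.1, -7.4, -9.0]  # made-up binding free energies
best = pick_second_structure(candidates, energies, cons)
```

The third candidate has the lowest energy but is filtered out because it moves the retained atoms too far from their positions in the first 3D structure, which is the role the configuration constraint plays in narrowing the search.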
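FIG. 4 shows a schematic diagram of constructing a 3D molecular structure according to still other embodiments of the present disclosure. As shown in FIG. 4, for a delete edit operation that deletes a target 2D structure fragment, the generation module 270 may retain the part of the first 3D molecular structure (for example, the 3D molecular structure 430 in FIG. 4, which corresponds to the 2D molecular structure 410) that is not removed by the delete edit operation.
Further, the generation module 270 may release the retained partial 3D molecular structure and perform local energy optimization to determine a candidate 3D molecular structure (for example, the 3D molecular structure 440 in FIG. 4).
Further, the generation module 270 may determine the second 3D molecular structure 290 corresponding to the second 2D molecular structure 260 based on the binding between the candidate 3D molecular structure and the target molecule. Specifically, the generation module 270 may determine the target 3D molecular structure from the candidate 3D molecular structure by minimizing the binding free energy with the target molecule, and use it as the second 3D molecular structure (for example, the 3D molecular structure 450 in FIG. 4) corresponding to the second 2D molecular structure (for example, the 2D molecular structure 420 in FIG. 4, which is determined by performing a delete edit operation on the 2D molecular structure 410).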
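For the delete edit, the retained part can be found by removing the chosen bond and keeping the connected component on one side of it. The sketch below shows only that bookkeeping; the `keep_side` parameter (which side of the bond is the scaffold rather than the deleted fragment) and the function name are hypothetical, and the subsequent local energy optimization is not shown.

```python
def retain_after_deletion(coords, bonds, deleted_bond, keep_side):
    """Drop `deleted_bond`, then keep the 3D coordinates of every atom still
    connected to `keep_side`; atoms on the other side form the deleted fragment."""
    remaining = [b for b in bonds if set(b) != set(deleted_bond)]
    adjacency = {}
    for w, v in remaining:
        adjacency.setdefault(w, set()).add(v)
        adjacency.setdefault(v, set()).add(w)
    seen, stack = {keep_side}, [keep_side]
    while stack:  # depth-first traversal from the kept side
        atom = stack.pop()
        for neighbour in adjacency.get(atom, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                stack.append(neighbour)
    return {a: xyz for a, xyz in coords.items() if a in seen}


# Chain 0-1-2-3; deleting bond (1, 2) while keeping the side containing atom 0.
coords = {0: (0.0, 0.0, 0.0), 1: (1.5, 0.0, 0.0),
          2: (2.3, 1.2, 0.0), 3: (3.8, 1.2, 0.0)}
bonds = [(0, 1), (1, 2), (2, 3)]
kept = retain_after_deletion(coords, bonds, deleted_bond=(1, 2), keep_side=0)
```

The retained coordinates are reused unchanged as the starting point, which is why only a local optimization is needed afterwards.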
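Through the constrained 3D molecular structure construction process, embodiments of the present disclosure can greatly reduce the computational overhead of constructing 3D molecular structures, thereby improving construction efficiency. In addition, when minimizing the binding energy with the target molecule, the constrained construction process can greatly improve the computational efficiency of searching for the minimum binding energy.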
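Self-Supervised Training
In some embodiments, as discussed above with reference to FIG. 2, the editing module 230 may also train the editing model in a self-supervised manner based on the editing operations applied to the first 2D molecular structure 220.
As discussed above, the editing operation applied to the first 2D molecular structure 220 is determined by probabilistic sampling. In some embodiments, the design module 125 may, for example, perform multiple samplings in parallel to obtain a plurality of candidate 2D molecular structures from the first 2D molecular structure 220.
In some embodiments, the editing module 230 may determine an evaluation for each candidate 2D molecular structure. As discussed above, the evaluation may be based on, for example: the binding between the 3D molecular structure corresponding to the candidate 2D molecular structure and the target molecule, the drug-likeness QED (Quantitative Estimate of Drug-likeness) of that 3D molecular structure, and/or the synthesizability of that 3D molecular structure.
In this way, embodiments of the present disclosure can generate ligand molecules for multiple objectives simultaneously.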
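In some embodiments, the editing module 230 may normalize the binding, the drug-likeness, and the synthesizability. For binding, the editing module 230 may determine the binding free energy D(x) between the molecular structure and the target molecule, which may be produced, for example, by molecular docking software. Further, the editing module 230 may determine a first normalized value based on the binding, where the first normalized value decreases as the binding free energy indicated by the target binding increases. For example, the first normalized value may be expressed as:
$$s_D(x) = e^{-D(x)} \tag{9}$$
For drug-likeness, the editing module 230 may determine a second normalized value, which increases as the drug-likeness increases. For example, the second normalized value may be expressed as:
$$s_{\mathrm{QED}}(x) = \mathrm{QED}(x) \tag{10}$$
where QED(·) denotes the QED score, which may be computed, for example, with RDKit.
For synthesizability, the editing module 230 may determine a third normalized value, which decreases as the synthesis difficulty indicated by the synthesizability increases. For example, the third normalized value may be expressed as:
$$s_{\mathrm{SA}}(x) = \frac{10 - \mathrm{SA}(x)}{9} \tag{11}$$
where SA(x) denotes the synthesis difficulty score.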
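Equations (9) to (11) translate directly into three small functions. In this sketch the raw inputs are passed in as plain numbers: the binding free energy would come from docking software and the QED score from RDKit, as the text says, while the assumption that the synthesis difficulty score lies between 1 and 10 is inferred from the form of equation (11).

```python
import math


def s_binding(binding_free_energy):
    """Equation (9): lower (more negative) free energy gives a larger value."""
    return math.exp(-binding_free_energy)


def s_qed(qed_score):
    """Equation (10): the QED score is used as-is."""
    return qed_score


def s_sa(sa_score):
    """Equation (11): map a difficulty score, assumed to lie in [1, 10],
    onto [0, 1], with easier synthesis giving a larger value."""
    return (10.0 - sa_score) / 9.0


easiest, hardest = s_sa(1.0), s_sa(10.0)
tighter_binder_wins = s_binding(-8.0) > s_binding(-6.0)
```

All three values grow as the molecule becomes more desirable, which lets them be combined into a single evaluation.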
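Further, the editing module 230 may determine the first evaluation based on the first normalized value, the second normalized value, and the third normalized value. In some embodiments, the editing module 230 may determine the first evaluation from the first normalized value, the second normalized value, and the third normalized value, based on a first weight associated with the first normalized value, a second weight associated with the second normalized value, and a third weight associated with the third normalized value.
For example, the first evaluation may be expressed as:
where w1, w2, and w3 respectively denote the weight corresponding to drug-likeness, the weight corresponding to synthesizability, and the weight corresponding to binding.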
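Since the exact form of the combined evaluation is not reproduced in this text, the following sketch assumes a plain weighted sum of the three normalized values. That functional form and the name `first_evaluation` are assumptions; only the pairing of w1, w2, and w3 with drug-likeness, synthesizability, and binding follows the description above.

```python
def first_evaluation(s_d, s_qed, s_sa, w1, w2, w3):
    """Assumed weighted-sum combination of the normalized values.

    w1 weights drug-likeness, w2 synthesizability, and w3 binding,
    matching the order given in the text.
    """
    return w1 * s_qed + w2 * s_sa + w3 * s_d


# Emphasizing binding twice as much as the other two objectives.
score = first_evaluation(s_d=0.5, s_qed=0.8, s_sa=0.6, w1=1.0, w2=1.0, w3=2.0)
```

Other monotone combinations (for example a weighted product) would serve the same purpose; the weights simply set the trade-off between the objectives.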
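In some embodiments, the editing module 230 may determine the probability that the second 2D molecular structure 260 is accepted based on the first evaluation and the second evaluation for the first 2D molecular structure 220. This probability may be expressed, for example, as:
where $\pi_\alpha(x')$ denotes the first evaluation for the second 2D molecular structure 260, $\pi_\alpha(x)$ denotes the second evaluation for the first 2D molecular structure 220, and T denotes a temperature coefficient determined by an annealing mechanism. In some embodiments, the temperature coefficient T is determined based on the number of editing operations the first 2D molecular structure has undergone. For example, if the first 2D molecular structure is generated by applying a first number of editing operations to the initial 2D molecular structure, the temperature coefficient T is associated with that first number.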
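The acceptance formula itself is not reproduced in this text, so the sketch below assumes a common Metropolis-style rule with a tempered ratio of evaluations and a geometric annealing schedule. Both choices, and the parameters `t0` and `decay`, are assumptions; the only properties taken from the description are that the probability depends on the two evaluations and on a temperature tied to the number of edits so far.

```python
import random


def temperature(num_edits, t0=1.0, decay=0.95):
    """Assumed geometric annealing: T shrinks as more edits are applied."""
    return t0 * decay ** num_edits


def acceptance_probability(pi_new, pi_old, T):
    """Assumed Metropolis-style rule; requires pi_old > 0 and T > 0."""
    return min(1.0, (pi_new / pi_old) ** (1.0 / T))


def accept(pi_new, pi_old, num_edits, rng):
    """Randomly accept or reject the edited structure."""
    return rng.random() < acceptance_probability(pi_new, pi_old, temperature(num_edits))


p_better = acceptance_probability(2.0, 1.0, T=0.5)   # improvement
p_worse_hot = acceptance_probability(0.5, 1.0, T=1.0)
p_worse_cold = acceptance_probability(0.5, 1.0, T=0.5)
decision = accept(0.9, 1.0, num_edits=10, rng=random.Random(0))
```

Under this assumed rule an improving edit is always accepted, while a worsening edit is accepted less often as the temperature falls, so exploration is wider early in the chain and narrower later.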
在一些实施例中,设计模块125可以基于公式(13)来确定第二2D分子结构260是被接受还是被拒绝的概率。如参考图2所讨论的,如果第二2D分子结构260被接受,则涉及模块125可以进一步基于第二2D分子结构260进行迭代地编辑,以确定配体分子的目标结构170。相反,如果第二2D分子结构被拒绝,则设计模块125可以进一步基于第一2D分子结构220进行迭代地编辑,以用于确定配体分子的目标结构170。
基于这样的方式,一些导致评价降低的编辑操作也可以被随机地保留,从而提高了药物分子生成的多样性。
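The acceptance rule is not reproduced in this text. As a hedged illustration of how an annealed accept/reject step of this kind typically works, the sketch below uses a Metropolis-style ratio raised to the power 1/T together with a geometric cooling schedule. Both the ratio form and the schedule (`t_initial`, `decay`) are assumptions, not the formula of the disclosure, and evaluations are assumed to be positive.

```python
import random


def annealed_temperature(num_edits: int, t_initial: float = 1.0, decay: float = 0.95) -> float:
    """Hypothetical schedule: the temperature shrinks as more edits are applied."""
    return t_initial * decay ** num_edits


def acceptance_probability(eval_new: float, eval_old: float, temperature: float) -> float:
    """Metropolis-style probability min(1, (eval_new / eval_old) ** (1 / T))."""
    if eval_new < 0 or eval_old < 0 or temperature <= 0:
        raise ValueError("evaluations must be non-negative and temperature positive")
    if eval_old == 0:
        return 1.0
    return min(1.0, (eval_new / eval_old) ** (1.0 / temperature))


def is_accepted(eval_new: float, eval_old: float, temperature: float, rng: random.Random) -> bool:
    """Accept the edited structure at random with the probability above."""
    return rng.random() < acceptance_probability(eval_new, eval_old, temperature)
```

Under this sketch an edit that improves the evaluation is always accepted, while an edit that halves it is accepted with probability 0.5 at T = 1 and 0.25 at T = 0.5, so worse edits become rarer as the temperature falls.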
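In some embodiments, for a candidate 2D molecular structure whose evaluation is better than that of the first 2D molecular structure 220, the editing module 230 may further train the editing model based on the editing operation that produced the candidate 2D molecular structure. In some embodiments, the editing model may be trained by maximum likelihood estimation (MLE).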
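To make the maximum-likelihood idea concrete, the sketch below treats the editing model as a bare vector of logits over a fixed set of editing operations and takes one gradient-ascent step on the log-likelihood of an edit that improved the evaluation. This tabular model, the function names, and the learning rate are simplifying assumptions; the editing model in the disclosure is a learned network rather than a logit table.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [value / total for value in exps]


def mle_step(logits: Sequence[float], chosen_index: int, learning_rate: float = 0.1) -> List[float]:
    """One gradient-ascent step on log p(chosen edit).

    For a softmax model the gradient of the log-likelihood with respect to
    the logits is (one_hot(chosen) - probabilities).
    """
    probabilities = softmax(logits)
    return [
        logit + learning_rate * ((1.0 if index == chosen_index else 0.0) - probability)
        for index, (logit, probability) in enumerate(zip(logits, probabilities))
    ]
```

Repeating this step for every improving edit raises the probability that the model proposes similar edits in later iterations.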
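In some embodiments, the editing module 230 may, for example, terminate the iteration once a predetermined number of edits has been applied to the initial 2D molecular structure 210, and take the finally output 2D molecular structure as the target structure 170 of the ligand molecule.
If the predetermined number of edits has not yet been performed, the editing module 230 may use the retrained editing model to generate a new, third 2D molecular structure from the second 2D molecular structure, and iterate in this manner. During the iteration, the editing module 230 may increment the count of edits performed, and exit the iteration only once the predetermined number of edits has been reached.
Conversely, if the predetermined number of edits has already been performed in generating the second 2D molecular structure 260 (e.g., the count reaches a predetermined threshold), the editing module 230 may determine the second 3D molecular structure 290 and/or the second 2D molecular structure 260 as the target structure.
In some embodiments, the editing module 230 may also decide whether the process has converged based on how much the evaluation of the molecular structure changes after each editing iteration. For example, if the change in the evaluation over a predetermined number of iterations is smaller than a predetermined threshold, the editing module 230 may determine that the process has converged and take the finally output molecular structure as the target structure of the ligand molecule.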
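The two stopping criteria described above, an edit budget and a convergence test on the evaluation, can be combined as in the following sketch. The parameter names `window` and `tolerance` and their default values are hypothetical, as is the choice to measure change as the spread of the most recent evaluations.

```python
from typing import Sequence


def should_stop(
    num_edits: int,
    max_edits: int,
    evaluations: Sequence[float],
    window: int = 5,
    tolerance: float = 1e-3,
) -> bool:
    """Return True when iteration should end.

    Stops when the edit budget is exhausted, or when the evaluation has
    varied by less than `tolerance` across the last `window` iterations.
    """
    if num_edits >= max_edits:
        return True
    if len(evaluations) > window:
        recent = evaluations[-(window + 1):]
        return max(recent) - min(recent) < tolerance
    return False
```

The caller would append the evaluation of the current structure after each edit and increment `num_edits` before checking the condition.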
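Example process
FIG. 5 shows a flowchart of a method 500 for designing a ligand molecule according to some implementations of the present disclosure. The method 500 may be implemented by the computing device 100, for example at the design module 125 in the memory 120 of the computing device 100.
As shown in FIG. 5, at block 510, the computing device 100 edits a first 2D molecular structure to determine a second 2D molecular structure, where the editing at least comprises: deleting a 2D structural fragment from the first 2D molecular structure, or adding a 2D structural fragment to the first 2D molecular structure.
At block 520, the computing device 100 determines a set of candidate 3D molecular structures corresponding to the second 2D molecular structure, based on the first 3D molecular structure corresponding to the first 2D molecular structure and on the editing.
At block 530, the computing device 100 determines a second 3D molecular structure corresponding to the second 2D molecular structure, based on the binding affinity between the set of candidate 3D molecular structures and the target molecule.
At block 540, the computing device 100 determines, based on the second 3D molecular structure, the target structure of the ligand molecule for the target molecule.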
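The flow of blocks 510 to 540 can be outlined as a single function that wires together three pluggable steps. Everything here is schematic: structures are represented as plain strings, and the `edit`, `build_candidates`, and `binding_free_energy` callables are hypothetical stand-ins for the editing model, the constrained 3D construction, and the docking score.

```python
from typing import Callable, Sequence, Tuple


def method_500_step(
    first_2d: str,
    first_3d: str,
    edit: Callable[[str], Tuple[str, str]],
    build_candidates: Callable[[str, str, str], Sequence[str]],
    binding_free_energy: Callable[[str], float],
) -> Tuple[str, str]:
    """Run one pass of blocks 510 to 530 and return the inputs to block 540."""
    # Block 510: edit the first 2D structure; also keep a record of the edit.
    second_2d, operation = edit(first_2d)
    # Block 520: build candidate 3D structures from the first 3D structure and the edit.
    candidates = build_candidates(first_3d, operation, second_2d)
    if not candidates:
        raise ValueError("no candidate 3D structures were generated")
    # Block 530: keep the candidate that binds best (lowest free energy).
    second_3d = min(candidates, key=binding_free_energy)
    # Block 540 would derive the target structure from this pair.
    return second_2d, second_3d
```

In an iterative design loop this function would be called repeatedly, with the accepted pair from one pass becoming the first 2D and 3D structures of the next.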
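Some example implementations of the present disclosure are listed below.
In some embodiments, editing the first 2D molecular structure comprises: determining, using an operation prediction model and based on a feature representation corresponding to the first 2D molecular structure, an editing operation to be applied to the first 2D molecular structure; and editing the first 2D molecular structure based on the determined editing operation.
In some embodiments, determining the editing operation to be applied to the first 2D molecular structure comprises: determining, using the operation prediction model and based on the feature representation, a set of probabilities associated with a set of predetermined editing operations, wherein the set of predetermined editing operations comprises: adding a specific 2D structural fragment at a specific atom in the first 2D molecular structure, or deleting a specific bond in the first 2D molecular structure; and determining, based on the set of probabilities, the editing operation to be applied to the first 2D molecular structure from the set of predetermined editing operations.
In some embodiments, adding a 2D structural fragment comprises: selecting a target 2D structural fragment from a fragment library, the fragment library comprising a plurality of 2D structural fragments; and adding the target 2D structural fragment at a specific atom in the first 2D molecular structure.
In some embodiments, determining the set of candidate 3D molecular structures corresponding to the second 2D molecular structure comprises: determining the set of candidate 3D molecular structures based on the editing and using the first 3D molecular structure, wherein the set of candidate structures has a partial 3D structure corresponding to the first 3D molecular structure, the partial 3D structure corresponding to the partial 2D structure not modified by the editing operation.
In some embodiments, the editing is adding a target 2D structural fragment to the first 2D molecular structure, and determining the set of candidate 3D molecular structures comprises: determining a conformational constraint based on the first 3D molecular structure corresponding to the first 2D molecular structure; generating, based on the conformational constraint, a plurality of candidate 3D molecular structures corresponding to the editing, the conformational constraint being used to limit the extent to which the first 3D molecular structure is adjusted while the plurality of candidate 3D molecular structures are generated; and performing energy optimization on the plurality of candidate 3D molecular structures based on the conformational constraint, to determine the set of candidate 3D molecular structures.
In some embodiments, the binding affinity is determined based on the binding free energy between the set of candidate 3D structural fragments and the target molecule.
In some embodiments, determining the target structure of the ligand molecule for the target molecule comprises: determining a first evaluation for the second 3D molecular structure, the first evaluation indicating at least one of: a target binding affinity between the second 3D molecular structure and the target molecule, the drug-likeness (QED) of the second 3D molecular structure, or the synthesizability of the second 3D molecular structure; determining a probability that the second 2D molecular structure is accepted, based on the first evaluation and a second evaluation for the first 3D molecular structure; and determining the target structure based on the second 2D molecular structure and the second 3D molecular structure according to the probability.
In some embodiments, determining the target structure based on the second 2D molecular structure and the second 3D molecular structure comprises: in response to the first evaluation being better than the second evaluation, training an editing model for predicting editing operations based on the editing applied to the first 2D molecular structure; editing the second 2D molecular structure using the trained editing model to determine a third 2D molecular structure; and determining the target structure of the ligand molecule for the target molecule based on the third 2D molecular structure and the second 2D molecular structure.
In some embodiments, determining the first evaluation for the second 3D molecular structure comprises: determining a first normalized value based on the target binding affinity, the first normalized value decreasing as the binding free energy indicated by the target binding affinity increases; determining a second normalized value based on the drug-likeness, the second normalized value increasing as the drug-likeness increases; determining a third normalized value based on the synthesizability, the third normalized value decreasing as the synthesis difficulty indicated by the synthesizability increases; and determining the first evaluation based on the first normalized value, the second normalized value, and the third normalized value.
In some embodiments, determining the first evaluation based on the first normalized value, the second normalized value, and the third normalized value comprises: determining the first evaluation from the first, second, and third normalized values according to a first weight associated with the first normalized value, a second weight associated with the second normalized value, and a third weight associated with the third normalized value.
In some embodiments, the first 2D molecular structure is generated by applying a first number of editing operations to an initial 2D molecular structure, and the probability is further based on the first number.
In some embodiments, the first 2D molecular structure is generated by applying a first number of editing operations to an initial 2D molecular structure, and determining the target structure of the ligand molecule for the target molecule comprises: incrementing the first number to determine a second number; and if the second number reaches a predetermined threshold, determining the second 3D molecular structure as the target structure.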
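The functions described herein above may be performed, at least in part, by one or more hardware logic components. For example, and without limitation, exemplary types of hardware logic components that may be used include: field-programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), application-specific standard products (ASSPs), systems on chip (SoCs), complex programmable logic devices (CPLDs), and so on.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general-purpose computer, a special-purpose computer, or another programmable data processing apparatus, such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowcharts and/or block diagrams to be implemented. The program code may execute entirely on a machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine, or entirely on the remote machine or server.
In the context of the present disclosure, a machine-readable medium may be a tangible medium that can contain or store a program for use by, or in connection with, an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
Furthermore, although operations are depicted in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve desirable results. In certain circumstances, multitasking and parallel processing may be advantageous. Likewise, although several specific implementation details are contained in the above discussion, these should not be construed as limitations on the scope of the present disclosure. Certain features that are described in the context of separate implementations may also be implemented in combination in a single implementation. Conversely, various features that are described in the context of a single implementation may also be implemented in multiple implementations separately or in any suitable sub-combination.
Although the subject matter has been described in language specific to structural features and/or methodological acts, it should be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are merely example forms of implementing the claims.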

Claims (17)

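  1. A method for designing a ligand molecule, comprising:
    editing a first 2D molecular structure to determine a second 2D molecular structure, the editing at least comprising: deleting a 2D structural fragment from the first 2D molecular structure, or adding a 2D structural fragment to the first 2D molecular structure;
    determining a set of candidate 3D molecular structures corresponding to the second 2D molecular structure, based on a first 3D molecular structure corresponding to the first 2D molecular structure and on the editing;
    determining a second 3D molecular structure corresponding to the second 2D molecular structure, based on the binding affinity between the set of candidate 3D molecular structures and a target molecule; and
    determining, based on the second 3D molecular structure, a target structure of the ligand molecule for the target molecule.
  2. The method according to claim 1, wherein editing the first 2D molecular structure comprises:
    determining, using an operation prediction model and based on a feature representation corresponding to the first 2D molecular structure, an editing operation to be applied to the first 2D molecular structure; and
    editing the first 2D molecular structure based on the determined editing operation.
  3. The method according to claim 2, wherein determining the editing operation to be applied to the first 2D molecular structure comprises:
    determining, using the operation prediction model and based on the feature representation, a set of probabilities associated with a set of predetermined editing operations, wherein the set of predetermined editing operations comprises: adding a specific 2D structural fragment at a specific atom in the first 2D molecular structure, or deleting a specific bond in the first 2D molecular structure; and
    determining, based on the set of probabilities, the editing operation to be applied to the first 2D molecular structure from the set of predetermined editing operations.
  4. The method according to claim 1, wherein adding a 2D structural fragment comprises:
    selecting a target 2D structural fragment from a fragment library, the fragment library comprising a plurality of 2D structural fragments; and
    adding the target 2D structural fragment at a specific atom in the first 2D molecular structure.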
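  5. The method according to claim 1, wherein determining the set of candidate 3D molecular structures corresponding to the second 2D molecular structure comprises:
    determining the set of candidate 3D molecular structures based on the editing and using the first 3D molecular structure, wherein the set of candidate structures has a partial 3D structure corresponding to the first 3D molecular structure, the partial 3D structure corresponding to the partial 2D structure not modified by the editing operation.
  6. The method according to claim 5, wherein the editing is adding a target 2D structural fragment to the first 2D molecular structure, and determining the set of candidate 3D molecular structures comprises:
    determining a conformational constraint based on the first 3D molecular structure corresponding to the first 2D molecular structure;
    generating, based on the conformational constraint, a plurality of candidate 3D molecular structures corresponding to the editing, the conformational constraint being used to limit the extent to which the first 3D molecular structure is adjusted while the plurality of candidate 3D molecular structures are generated; and
    performing energy optimization on the plurality of candidate 3D molecular structures based on the conformational constraint, to determine the set of candidate 3D molecular structures.
  7. The method according to claim 1, wherein the binding affinity is determined based on the binding free energy between the set of candidate 3D structural fragments and the target molecule.
  8. The method according to claim 1, wherein determining the target structure of the ligand molecule for the target molecule comprises:
    determining a first evaluation for the second 3D molecular structure, the first evaluation indicating at least one of: a target binding affinity between the second 3D molecular structure and the target molecule, the drug-likeness (QED) of the second 3D molecular structure, or the synthesizability of the second 3D molecular structure;
    determining a probability that the second 2D molecular structure is accepted, based on the first evaluation and a second evaluation for the first 3D molecular structure; and
    determining the target structure based on the second 2D molecular structure and the second 3D molecular structure according to the probability.
  9. The method according to claim 8, wherein determining the target structure based on the second 2D molecular structure and the second 3D molecular structure comprises:
    in response to the first evaluation being better than the second evaluation, training an editing model for predicting editing operations based on the editing applied to the first 2D molecular structure;
    editing the second 2D molecular structure using the trained editing model to determine a third 2D molecular structure; and
    determining the target structure of the ligand molecule for the target molecule based on the third 2D molecular structure and the second 2D molecular structure.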
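  10. The method according to claim 8, wherein determining the first evaluation for the second 3D molecular structure comprises:
    determining a first normalized value based on the target binding affinity, the first normalized value decreasing as the binding free energy indicated by the target binding affinity increases;
    determining a second normalized value based on the drug-likeness, the second normalized value increasing as the drug-likeness increases;
    determining a third normalized value based on the synthesizability, the third normalized value decreasing as the synthesis difficulty indicated by the synthesizability increases; and
    determining the first evaluation based on the first normalized value, the second normalized value, and the third normalized value.
  11. The method according to claim 10, wherein determining the first evaluation based on the first normalized value, the second normalized value, and the third normalized value comprises:
    determining the first evaluation from the first normalized value, the second normalized value, and the third normalized value according to a first weight associated with the first normalized value, a second weight associated with the second normalized value, and a third weight associated with the third normalized value.
  12. The method according to claim 8, wherein the first 2D molecular structure is generated by applying a first number of editing operations to an initial 2D molecular structure, and the probability is further based on the first number.
  13. The method according to claim 1, wherein the first 2D molecular structure is generated by applying a first number of editing operations to an initial 2D molecular structure, and determining the target structure of the ligand molecule for the target molecule comprises:
    incrementing the first number to determine a second number; and
    if the second number reaches a predetermined threshold, determining the second 3D molecular structure as the target structure.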
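  14. An apparatus for designing a ligand molecule, comprising:
    an editing module configured to edit a first 2D molecular structure to determine a second 2D molecular structure, the editing at least comprising: deleting a 2D structural fragment from the first 2D molecular structure, or adding a 2D structural fragment to the first 2D molecular structure; and
    a generation module configured to determine a set of candidate 3D molecular structures corresponding to the second 2D molecular structure, based on a first 3D molecular structure corresponding to the first 2D molecular structure and on the editing; and to determine a second 3D molecular structure corresponding to the second 2D molecular structure, based on the binding affinity between the set of candidate 3D molecular structures and a target molecule;
    wherein the editing module is further configured to determine, based on the second 3D molecular structure, a target structure of the ligand molecule for the target molecule.
  15. An electronic device, comprising:
    a memory and a processor;
    wherein the memory is configured to store one or more computer instructions, and the one or more computer instructions are executed by the processor to implement the method according to any one of claims 1 to 13.
  16. A computer-readable storage medium having one or more computer instructions stored thereon, wherein the one or more computer instructions are executed by a processor to implement the method according to any one of claims 1 to 13.
  17. A computer program product, comprising one or more computer instructions, wherein the one or more computer instructions are executed by a processor to implement the method according to any one of claims 1 to 13.
PCT/CN2023/075067 2022-02-18 2023-02-08 Method and apparatus for designing ligand molecules WO2023155724A1 (zh)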

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210152512.4A CN114530215B (zh) 2022-02-18 2022-02-18 Method and apparatus for designing ligand molecules
CN202210152512.4 2022-02-18

Publications (1)

Publication Number Publication Date
WO2023155724A1 true WO2023155724A1 (zh) 2023-08-24

Family

ID=81622009

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/075067 WO2023155724A1 (zh) 2022-02-18 2023-02-08 Method and apparatus for designing ligand molecules

Country Status (2)

Country Link
CN (1) CN114530215B (zh)
WO (1) WO2023155724A1 (zh)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114530215B (zh) * 2022-02-18 2023-03-28 北京有竹居网络技术有限公司 Method and apparatus for designing ligand molecules

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105308602A (zh) * 2013-06-13 2016-02-03 Ucb生物制药私人有限公司 Obtaining improved therapeutic ligands
US20190073452A1 (en) * 2015-09-25 2019-03-07 Bioanalytix, Inc. Method for determining the in vivo comparability of a biologic drug and a reference drug
CN113096723A (zh) * 2021-03-24 2021-07-09 北京晶派科技有限公司 General molecular library construction platform for small-molecule drug screening
CN113611376A (zh) * 2021-07-01 2021-11-05 苏州创腾软件有限公司 Molecular structure construction method and apparatus, computer device, and storage medium
CN113838541A (zh) * 2021-09-29 2021-12-24 脸萌有限公司 Method and apparatus for designing ligand molecules
CN114530215A (zh) * 2022-02-18 2022-05-24 北京有竹居网络技术有限公司 Method and apparatus for designing ligand molecules

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB0028157D0 (en) * 2000-11-17 2001-01-03 Amedis Pharm Ltd Method for predicting a biological target characteristic of a molecule
CN107657146B (zh) * 2017-09-20 2020-05-05 广州市爱菩新医药科技有限公司 Drug molecule comparison method based on three-dimensional substructures
CN108536999A (zh) * 2018-03-21 2018-09-14 南京邮电大学 Method and apparatus for screening key substructures of small-molecule ligands
US11615867B2 (en) * 2018-11-15 2023-03-28 Openeye Scientific Software, Inc. Molecular structure editor with version control and simultaneous editing operations
CN112201313B (zh) * 2020-09-15 2024-02-23 北京晶泰科技有限公司 Automated small-molecule drug screening method and computing device
CN117373563A (zh) * 2021-01-21 2024-01-09 北京晶泰科技有限公司 Molecular screening method and computing device
CN113241126B (zh) * 2021-05-18 2023-08-11 百度时代网络技术(北京)有限公司 Method and apparatus for training a prediction model for determining molecular binding force
CN113409898B (zh) * 2021-06-30 2022-05-27 北京百度网讯科技有限公司 Molecular structure acquisition method and apparatus, electronic device, and storage medium

Also Published As

Publication number Publication date
CN114530215B (zh) 2023-03-28
CN114530215A (zh) 2022-05-24


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23755727

Country of ref document: EP

Kind code of ref document: A1