CN113223637A - Drug molecule generator training method based on domain knowledge and deep reinforcement learning - Google Patents

Drug molecule generator training method based on domain knowledge and deep reinforcement learning Download PDF

Info

Publication number
CN113223637A
Authority
CN
China
Prior art keywords
pharmacophore
molecules
domain knowledge
drug
drug molecule
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110496113.5A
Other languages
Chinese (zh)
Other versions
CN113223637B (en)
Inventor
黄向生
蔡金儒
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Automation of Chinese Academy of Science
Original Assignee
Institute of Automation of Chinese Academy of Science
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Automation of Chinese Academy of Science filed Critical Institute of Automation of Chinese Academy of Science
Priority to CN202110496113.5A priority Critical patent/CN113223637B/en
Publication of CN113223637A publication Critical patent/CN113223637A/en
Application granted granted Critical
Publication of CN113223637B publication Critical patent/CN113223637B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16C COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
    • G16C20/00 Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
    • G16C20/50 Molecular design, e.g. of drugs
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16C COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
    • G16C20/00 Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
    • G16C20/70 Machine learning, data mining or chemometrics

Abstract

The invention belongs to the field of drug molecule generation, and particularly relates to a drug molecule generator training method based on domain knowledge and deep reinforcement learning, aiming at solving the problem of limited samples when drug molecules are generated through deep learning. The invention comprises: constructing an active pharmacophore molecular group by using domain knowledge; using curriculum learning under a dynamic strategy to randomly insert pharmacophores from the active pharmacophore molecular group into the molecule generation process, so as to generate pharmacophore structural molecules with specific targets; and maximizing a hybrid reward function by a reinforcement learning method to obtain the trained drug molecule generator. The invention solves the problem of an insufficient sample size, prevents the generated molecules from becoming overly uniform as a result of excessive imitation learning, and enables the generated molecules to converge more quickly to pharmacophore structures with a specific target.

Description

Drug molecule generator training method based on domain knowledge and deep reinforcement learning
Technical Field
The invention belongs to the field of medicine molecule generation, and particularly relates to a medicine molecule generator training method based on domain knowledge and deep reinforcement learning.
Background
In recent years, deep learning methods have been widely used in computational chemistry; for example, many studies have demonstrated the feasibility of designing new compounds with desirable chemical properties by deep learning. While deep learning methods perform well in generating compounds with target chemical properties, they typically require large numbers of samples in practice. However, the number of drugs discovered to date is small, so obtaining large numbers of samples is difficult. This presents a challenge to general deep learning approaches, and how to complete drug design under limited-sample conditions is an important issue. Therefore, a reinforcement learning method based on domain knowledge is proposed by combining prior knowledge. The method integrates domain knowledge into the training process on the basis of the original method, and is the first to use various generalized same-property features for curriculum learning.
Disclosure of Invention
In order to solve the above problem in the prior art, namely the problem of limited samples when drug molecules are generated through deep learning, the invention provides a drug molecule generator training method based on domain knowledge and deep reinforcement learning, which comprises the following steps:
Based on the drug molecule generator, by utilizing curriculum-based learning under a dynamic strategy, the pharmacophore in the active pharmacophore molecular group is randomly inserted into the generation process of the molecule to generate a pharmacophore structural molecule with a specific target;
calculating the reward value of the acquired pharmacophore structural molecules with the specific target by using a pre-constructed hybrid reward function; the hybrid reward function is constructed based on a task reward function and an imitation reward function;
a reinforcement learning method is utilized, a hybrid reward function is maximized, and a trained drug molecule generator is obtained;
wherein:
the active pharmacophore molecular group is constructed on the basis of a given drug sample by utilizing domain knowledge;
the character strings output by the drug molecule generator conform to the SMILES syntax for drug character strings.
In some preferred embodiments, the method for obtaining the molecular structural group of active pharmacophores is:
step S110, extracting a pharmacophore from a given drug sample;
step S120, splitting the pharmacophore into a plurality of atomic groups by utilizing chemical bond connection;
step S130, obtaining atoms or elements with the same property from the domain knowledge;
step S140, replacing a plurality of atomic groups of the pharmacophore molecular structure with atoms or elements from domain knowledge;
s150, carrying out chemical activity comparison on the new medicinal effect group molecules obtained after replacement and the original medicinal effect group molecules, and selecting the new medicinal effect group molecules with the chemical activity difference smaller than a preset value and the original medicinal effect group molecules to form a medicinal effect group molecular group;
step S160, selecting new pharmacophore molecules and original pharmacophore molecules to form an active pharmacophore molecular group if the chemical activity difference of the new pharmacophore molecules and the original pharmacophore molecules is not large after comparison, and establishing a relationship network;
step S170, repeating the above steps, and expanding the active pharmacophore molecular group.
In some preferred embodiments, the drug molecule generator is constructed based on an RNN, and training is performed on training samples constructed from drug compound character string data;
the training samples comprise input samples and output samples; each output sample is a character in a drug compound character string, and the corresponding input sample is the portion of that character string preceding the character.
In some preferred embodiments, based on the stack-memory mechanism provided in the drug molecule generator, in the process of generating the drug compound character string, the existing character string sequence is used as a prefix segment, and the generation of the next character is performed cyclically by the drug molecule generator until the complete drug compound character string is obtained.
In some preferred embodiments, the "random insertion of the pharmacophore into the molecule generation process" is performed by:
randomly selecting active pharmacophore molecules from the active pharmacophore molecule group according to the probability P of each active pharmacophore molecule as an initial state of reinforcement learning;
adjusting the probability P based on the reward value obtained from the hybrid reward function: if the generator reaches the preset convergence condition within a preset iteration threshold, the probability P is lowered by a preset first adjustment value; and if the generator does not reach the preset convergence condition within the preset iteration threshold, the probability P is lowered by a preset second adjustment value, the second adjustment value being smaller than the first adjustment value.
In some preferred embodiments, the hybrid reward function is
r(s_t, a_t) = r_task(s_t, a_t) + λ·r_imitation(s_t, a_t),  λ ∈ [0, 1]
where r(s_t, a_t) is the hybrid reward function when the state is s_t and the action is a_t; r_task(s_t, a_t) is the task reward function, whose reward is obtained when the generated pharmacophore structural molecule is active; r_imitation(s_t, a_t) is the imitation reward function, whose reward is obtained when the similarity between the generated pharmacophore structural molecule and the active pharmacophore molecule selected as the initial state is greater than a set threshold.
In some preferred embodiments, the loss function used to train for the optimal parameters θ* is
θ* = argmax_θ E_{τ~p_θ(τ)} [ Σ_t ( γ^t·r(s_t, a_t) - b + α·H(a_t|s_t) ) ]
where E[·] denotes the expectation over each round (a trajectory τ sampled with probability p_θ(τ)), γ^t·r(s_t, a_t) is the discounted reward at time t, b is the error term, α is a hyperparameter, and H(a_t|s_t) is the probability of action a_t in state s_t.
In a second aspect of the present invention, a system for training a drug molecule generator based on domain knowledge and deep reinforcement learning is provided, which includes:
a pharmacophore storage unit configured to store a group of active pharmacophore molecular structures; the active pharmacophore molecular structure group is constructed on the basis of a given drug sample by utilizing domain knowledge;
a drug molecule generator configured to randomly insert pharmacophores from the active pharmacophore molecular group into the molecule generation process based on curriculum learning under a dynamic strategy, so as to generate pharmacophore structural molecules with a specific target; the character strings output by the drug molecule generator conform to the SMILES syntax for drug character strings;
a predictor configured to perform chemical activity judgment on the generated pharmacophore structure molecule;
the hybrid reward calculation unit is configured to calculate the reward value of the acquired pharmacophore structural molecules with specific targets based on a pre-constructed hybrid reward function; the hybrid reward function is constructed based on a task reward function and an imitation reward function.
In a third aspect of the present invention, an apparatus is provided, which includes:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the processor for execution by the processor to implement the method of drug molecule generator training based on domain knowledge and deep reinforcement learning described above.
In a fourth aspect of the present invention, a computer-readable storage medium is provided, wherein the computer-readable storage medium stores computer instructions for execution by the computer to implement the method for training a drug molecule generator based on domain knowledge and deep reinforcement learning.
The invention has the beneficial effects that:
(1) The invention establishes a specific pharmacophore structure group based on domain knowledge, which solves the problem of an insufficient sample size.
(2) Curriculum learning under a dynamic strategy introduces a guidance probability P that controls how often pharmacophore fragments are used during training, preventing the generated molecules from becoming overly uniform as a result of excessive imitation learning.
(3) A hybrid reward function is constructed that gives an intermediate reward for generating molecules containing structures similar to active pharmacophores, increasing the probability that such structures occur and enabling the generated molecules to converge more quickly to pharmacophore structures with a specific target.
Drawings
Other features, objects and advantages of the present application will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings in which:
FIG. 1 is a schematic flow chart of a method for training a drug molecule generator based on domain knowledge and deep reinforcement learning according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a drug molecule generator training system framework based on domain knowledge and deep reinforcement learning according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a method for constructing a cluster of active pharmacophore molecules using domain knowledge in one embodiment of the invention;
FIG. 4 is a diagram illustrating a method for establishing a relationship network using domain knowledge according to an embodiment of the present invention.
Detailed Description
The present application will be described in further detail with reference to the following drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the relevant invention and not restrictive of the invention. It should be noted that, for convenience of description, only the portions related to the related invention are shown in the drawings.
It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present application will be described in detail below with reference to the embodiments with reference to the attached drawings.
The invention provides a drug molecule generator training method based on domain knowledge and deep reinforcement learning, which is shown in figure 1 and comprises the following steps
Based on the drug molecule generator, by utilizing curriculum-based learning under a dynamic strategy, the pharmacophore in the active pharmacophore molecular group is randomly inserted into the generation process of the molecule to generate a pharmacophore structural molecule with a specific target;
calculating the reward value of the acquired pharmacophore structural molecules with the specific target by using a pre-constructed hybrid reward function; the hybrid reward function is constructed based on a task reward function and an imitation reward function;
a reinforcement learning method is utilized, a hybrid reward function is maximized, and a trained drug molecule generator is obtained;
wherein:
the active pharmacophore molecular group is constructed on the basis of a given drug sample by utilizing domain knowledge;
the character strings output by the drug molecule generator conform to the SMILES syntax for drug character strings.
In order to more clearly explain the present invention, the following detailed description of the embodiments of the present invention is provided with reference to the accompanying drawings. In order to make the technical solution description clearer, the following description is made first of the composition of the training system, and then the training method is described.
The system for training a drug molecule generator based on domain knowledge and deep reinforcement learning according to the first embodiment of the present invention, as shown in fig. 2, includes a pharmacophore storage unit, a drug molecule generator (abbreviated as generator in fig. 2), a predictor, and a hybrid reward calculation unit (abbreviated as hybrid reward in fig. 2).
1. Pharmacophore storage unit
A pharmacophore storage unit configured to store an active pharmacophore molecular structure group (active pharmacophore group in fig. 2); the active pharmacophore molecular structure group is constructed based on a given drug sample by using domain knowledge.
The method for acquiring the active pharmacophore molecular structure group comprises the following steps (a code sketch of these steps is given after step S170):
step S110, extracting a pharmacophore from a given drug sample;
step S120, splitting the pharmacophore into a plurality of atomic groups by utilizing chemical bond connection;
step S130, obtaining atoms or elements with the same property from the domain knowledge;
step S140, replacing a plurality of atomic groups of the pharmacophore molecular structure with atoms or elements from domain knowledge;
s150, carrying out chemical activity comparison on the new medicinal effect group molecules obtained after replacement and the original medicinal effect group molecules, and selecting the new medicinal effect group molecules with the chemical activity difference smaller than a preset value and the original medicinal effect group molecules to form a medicinal effect group molecular group;
step S160, selecting new pharmacophore molecules and original pharmacophore molecules to form an active pharmacophore molecular group if the chemical activity difference of the new pharmacophore molecules and the original pharmacophore molecules is not large after comparison, and establishing a relationship network;
step S170, repeating the above steps, and expanding the active pharmacophore molecular group (using S110-S160 for different drug samples of the same drug, respectively, to obtain the pharmacophore molecular group of the drug).
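As a concrete illustration of steps S110-S160, the following is a minimal sketch assuming RDKit is available; the BRICS fragmentation, the hand-written same-property table and the activity_score oracle are illustrative assumptions standing in for the domain knowledge and the activity comparison of the invention, not its exact procedure.

```python
# Sketch of steps S110-S160 (assumptions: RDKit installed; activity_score is a
# user-supplied oracle standing in for the chemical-activity comparison of S150).
from rdkit import Chem
from rdkit.Chem import BRICS

# Toy "domain knowledge": atoms treated as having the same property (bioisosteres).
SAME_PROPERTY = {"O": ["S"], "F": ["Cl", "Br"], "N": ["P"]}

def fragment_pharmacophore(smiles):
    """S110/S120: break a drug molecule into bond-connected atomic groups."""
    mol = Chem.MolFromSmiles(smiles)
    return list(BRICS.BRICSDecompose(mol))

def substitute(fragment_smiles):
    """S130/S140: replace atoms with same-property atoms from the table above."""
    variants = []
    for src, targets in SAME_PROPERTY.items():
        for dst in targets:
            new = fragment_smiles.replace(src, dst)   # crude string-level swap
            if new != fragment_smiles and Chem.MolFromSmiles(new) is not None:
                variants.append(new)
    return variants

def build_active_group(drug_smiles, activity_score, eps=0.1):
    """S150/S160: keep variants whose activity stays close to the original fragment."""
    group = set()
    for frag in fragment_pharmacophore(drug_smiles):
        base = activity_score(frag)
        group.add(frag)
        for variant in substitute(frag):
            if abs(activity_score(variant) - base) < eps:
                group.add(variant)
    return group
```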
2. Drug molecule generator
A drug molecule generator configured to randomly insert pharmacophores from the active pharmacophore molecular group into the molecule generation process based on curriculum learning under a dynamic strategy, so as to generate pharmacophore structural molecules with a specific target; the character strings generated by the drug molecule generator conform to the syntax of SMILES (Simplified Molecular Input Line Entry System) drug character strings.
The drug molecule generator G is constructed based on an RNN; in some embodiments a Stack-RNN model (stack-augmented recurrent neural network) may be employed. Before the training based on domain knowledge and deep reinforcement learning described in the invention, the molecule generator G must first be trained to learn the syntax of the SMILES character string format.
In this embodiment, the ChEMBL21 database, which contains about 1.5 million drug compound character strings, is used as the training set for the drug molecule generator G, so that the trained generator G can produce a complete drug compound character string from an input prefix fragment.
To train the drug molecule generator G on the ChEMBL21 data, training samples are first constructed from the drug compound character strings; each training sample comprises an input sample and an output sample. The output sample is one character of a drug compound character string (the output character), and the corresponding input sample is the part of the character string preceding that character (the prefix input).
The training method comprises the following steps:
the drug molecule generator G obtains the probability distribution of the next character based on the input sample;
predicting the next character according to the obtained probability distribution to be used as a predicted character;
and calculating a cross-entropy loss function based on the predicted characters and the output characters corresponding to the input samples, and updating the parameters of the drug molecule generator G (a code sketch of this pretraining loop is given below).
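A minimal sketch of this supervised pretraining step, assuming PyTorch; a plain GRU stands in for the stack-augmented RNN described below, and the encoding of SMILES characters into integer indices is assumed to be done elsewhere.

```python
# Pretraining sketch (assumptions: PyTorch; batches are already encoded as
# LongTensors of character indices, e.g. from ChEMBL SMILES strings).
import torch
import torch.nn as nn

class CharRNN(nn.Module):
    def __init__(self, vocab_size, hidden=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.rnn = nn.GRU(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, x, h=None):
        y, h = self.rnn(self.embed(x), h)
        return self.out(y), h              # logits over the next character

def pretrain_step(model, optimizer, batch):
    """batch: (B, L) LongTensor; input = all but last char, target = all but first."""
    inputs, targets = batch[:, :-1], batch[:, 1:]
    logits, _ = model(inputs)
    loss = nn.functional.cross_entropy(
        logits.reshape(-1, logits.size(-1)), targets.reshape(-1))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```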
In this embodiment, the next character is predicted from the obtained probability distribution, and the hidden state is computed as
h_t = σ(W_i·x_t + W_h·h_{t-1})
where h_t is the hidden-layer vector, h_{t-1} is the hidden vector from the previous time step, W_i is the input-layer parameter matrix, W_h is the hidden-layer parameter matrix, σ is the activation function, and x_t is the input at the current time node.
In this embodiment, the drug molecule generator G introduces a stack memory (stack-memory mechanism) to maintain and transmit hidden-layer information at each time step, computed as
s_t[0] = a_t[PUSH]·σ(D·h_t) + a_t[POP]·s_{t-1}[1] + a_t[NO-OP]·s_{t-1}[0]
where D is a 1 × m matrix, a_t = [a_t[PUSH], a_t[POP], a_t[NO-OP]] is the vector of stack control variables, s_t[0] is the top of the stack at the current time node, s_{t-1}[0] is the top of the stack at the previous time node, s_{t-1}[1] is the second element of the stack at the previous time node, σ(D·h_t) is the value to be pushed, a_t[PUSH] is the agent's action that pushes a value onto the stack, a_t[POP] is the agent's action that removes the top value from the stack, and a_t[NO-OP] is the action that leaves the stack unchanged.
If a_t[POP] equals 1, the topmost element of the stack is removed and replaced by the element below it; if a_t[PUSH] equals 1, a new value is added to the top of the stack and the remaining values move down; if a_t[NO-OP] equals 1, the values in the stack remain unchanged.
A similar computation is applied to the stack elements at the other depths i:
s_t[i] = a_t[PUSH]·s_{t-1}[i-1] + a_t[POP]·s_{t-1}[i+1] + a_t[NO-OP]·s_{t-1}[i]
Thus, the hidden layer h_t can be calculated as
h_t = σ(U·x_t + R·h_{t-1} + D·s_{t-1}[1:k])
where s_{t-1}[1:k] denotes the k elements from the top of the stack at time step t-1, σ(·) is the activation function, U and R are parameter matrices, D is a 1 × m matrix acting on the stack contents, and x_t is the input at the current time node.
In this embodiment, owing to the stack-memory mechanism, the trained drug molecule generator G supports the following operations (see the sketch below):
a POP operation, which deletes the topmost element in the stack, the element below it becoming the new top;
a PUSH operation, which pushes a new element onto the top of the stack;
a NO-OP operation, which keeps the elements in the stack unchanged and performs no operation.
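A minimal sketch of the soft stack update described above, assuming PyTorch; the stack is modeled as a fixed-size vector, and the PUSH/POP/NO-OP weights are assumed to come from a softmax over the hidden state.

```python
# Soft stack update sketch (assumption: PyTorch; a = [push, pop, no-op] weights).
import torch

def stack_update(stack, a, push_val):
    """stack: (k,) tensor of stack slots; a: (3,) soft action weights summing to 1;
    push_val: 0-dim tensor to write on PUSH (sigma(D @ h_t) in the text)."""
    push, pop, noop = a[0], a[1], a[2]
    pushed = torch.cat([push_val.view(1), stack[:-1]])   # all elements move down
    popped = torch.cat([stack[1:], torch.zeros(1)])      # all elements move up
    return push * pushed + pop * popped + noop * stack   # convex combination
```

For slot 0 this reproduces s_t[0] = a_t[PUSH]·σ(D·h_t) + a_t[POP]·s_{t-1}[1] + a_t[NO-OP]·s_{t-1}[0], and for the deeper slots it reproduces the depth-i formula above.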
The method for generating a complete drug compound character string from an input prefix fragment is as follows: the sequence generated so far (the drug molecule character string generated at present) is used as the prefix input; the probability distribution of the next character is predicted; the generated character is determined from that probability distribution; and the next character is generated cyclically until a complete drug compound character string is obtained.
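A minimal sampling-loop sketch corresponding to this generation procedure, assuming the CharRNN from the pretraining sketch above; the start token '<', the end token '$' and the stoi/itos vocabularies are illustrative assumptions.

```python
# Generation sketch: extend a prefix character by character until the end token.
import torch

def generate(model, stoi, itos, prefix="<", max_len=120):
    tokens = [stoi[c] for c in prefix]
    logits, h = model(torch.tensor([tokens]))        # consume the whole prefix first
    for _ in range(max_len):
        probs = torch.softmax(logits[0, -1], dim=-1) # distribution over next character
        nxt = torch.multinomial(probs, 1).item()     # sample the next character
        if itos[nxt] == "$":                         # stop at the end-of-SMILES token
            break
        tokens.append(nxt)
        logits, h = model(torch.tensor([[nxt]]), h)
    return "".join(itos[t] for t in tokens[len(prefix):])
```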
3. Predictor
And a predictor configured to determine the chemical activity of the generated pharmacophore structural molecules. The determination method comprises comparing the chemical activity of a generated pharmacophore structural molecule with that of the original pharmacophore molecule, and determining that the generated molecule has chemical activity when its activity exceeds a set threshold.
4. Hybrid reward calculation unit
The hybrid reward calculation unit is configured to calculate the reward value of the acquired pharmacophore structural molecules with specific targets based on a pre-constructed hybrid reward function; the hybrid reward function is constructed based on a task reward function and an imitation reward function.
The hybrid reward function is
r(s_t, a_t) = r_task(s_t, a_t) + λ·r_imitation(s_t, a_t),  λ ∈ [0, 1]
where r(s_t, a_t) is the hybrid reward function when the state is s_t and the action is a_t; r_task(s_t, a_t) is the task reward function, whose reward is obtained when the generated pharmacophore structural molecule is active; r_imitation(s_t, a_t) is the imitation reward function, whose reward is obtained when the similarity between the generated pharmacophore structural molecule and the active pharmacophore molecule selected as the initial state is greater than a set threshold.
(1) The task reward function is determined by the specific task; the reward is obtained when the predictor judges that the generated pharmacophore structural molecules are active.
(2) The imitation reward function; the reward is obtained when the similarity between the generated pharmacophore structural molecules and the active pharmacophore molecule selected as the initial state is greater than a set threshold.
In the imitation reward function, the degree of similarity between molecules is calculated using the intersection formula
Sim(M_1, M_2) = |M_1 ∩ M_2| / |M_1 ∪ M_2|
and the result is compared with a threshold G: if it is greater than the threshold, the reward is obtained; if it is smaller than the threshold, the reward is not obtained. Here M_1 and M_2 are, respectively, the sample molecule for which a reward can be obtained and the currently generated new molecule.
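A minimal sketch of this similarity comparison, assuming RDKit; Tanimoto similarity over Morgan (ECFP-like) bit fingerprints is used as a stand-in for the intersection-over-union comparison, and the threshold G is an illustrative value.

```python
# Similarity-gate sketch (assumption: RDKit; Tanimoto over bit fingerprints).
from rdkit import Chem
from rdkit.Chem import AllChem, DataStructs

def similar_enough(sample_smiles, generated_smiles, G=0.5):
    fp1 = AllChem.GetMorganFingerprintAsBitVect(Chem.MolFromSmiles(sample_smiles), 2, nBits=2048)
    fp2 = AllChem.GetMorganFingerprintAsBitVect(Chem.MolFromSmiles(generated_smiles), 2, nBits=2048)
    return DataStructs.TanimotoSimilarity(fp1, fp2) > G   # reward gate
```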
The imitation reward function is expressed in terms of the ratio of common features possessed,
f(s_T) = (1/|M_s|) · Σ_{m_i ∈ M_s} T(ECFP(s_T), ECFP(m_i))
where M_s is the set of structural features extracted from the molecule S, m_i is an element of M_s, ECFP(s_T) is the molecular (extended-connectivity) fingerprint of the state s_T, ECFP(m_i) is the fingerprint of the structural feature m_i, T(ECFP(s_T), ECFP(m_i)) is the similarity between the two fingerprints, and k1 is a hyperparameter, initially set to k1 = 0.25.
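A minimal sketch of the imitation and hybrid rewards, assuming RDKit; the common-feature ratio below, gated by k1, is a stand-in for the exact expressions of the original, and `features` is assumed to hold the structural-feature set M_s as SMILES strings.

```python
# Imitation / hybrid reward sketch (assumptions: RDKit; features = M_s as SMILES).
from rdkit import Chem
from rdkit.Chem import AllChem, DataStructs

def ecfp(smiles):
    return AllChem.GetMorganFingerprintAsBitVect(Chem.MolFromSmiles(smiles), 2, nBits=2048)

def imitation_reward(generated_smiles, features, k1=0.25):
    """Ratio of structural features whose fingerprint similarity to the
    generated molecule exceeds the hyperparameter k1."""
    fp_gen = ecfp(generated_smiles)
    fps = [ecfp(m) for m in features if Chem.MolFromSmiles(m) is not None]
    hits = sum(DataStructs.TanimotoSimilarity(fp_gen, fp) > k1 for fp in fps)
    return hits / max(len(fps), 1)

def hybrid_reward(task_reward, generated_smiles, features, lam=0.5):
    # r = r_task + lambda * r_imitation, lambda in [0, 1]
    return task_reward + lam * imitation_reward(generated_smiles, features)
```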
The method for training the drug molecule generator based on the domain knowledge and the deep reinforcement learning comprises the following steps
(1) Based on the drug molecule generator, the pharmacophore in the active pharmacophore molecular group is randomly inserted into the generation process of the molecule by utilizing curriculum learning under a dynamic strategy to generate the pharmacophore structural molecule with a specific target.
Randomly selecting active pharmacophore molecules from the active pharmacophore molecule group according to the probability P of each active pharmacophore molecule as an initial state of reinforcement learning; thus, the drug molecule generator does not interact only with one active pharmacophore molecule, but with a plurality of molecules of the group consisting of the active pharmacophore molecules.
Each active pharmacophore molecule in the initial state carries a probability P, and the interaction of the drug molecule generator with the initial state is adjusted by controlling the size of P. During training, the probability P is dynamically adjusted according to the reward obtained by the drug molecule generator, i.e. the probability P of an active pharmacophore molecule is adjusted based on the reward value obtained from the hybrid reward function: if the generator quickly meets the reward requirement (the preset convergence condition is reached within a preset iteration threshold), the probability P is reduced quickly (lowered by a preset first adjustment value); otherwise, if the generator cannot quickly meet the reward requirement (the preset convergence condition is not reached within the preset iteration threshold), the probability P is reduced slowly (lowered by a preset second adjustment value, the second adjustment value being smaller than the first adjustment value).
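A minimal sketch of this dynamic adjustment of the guidance probability P; the concrete decrement values and the convergence test are illustrative assumptions.

```python
# Curriculum sketch: lower P quickly when the reward requirement is met within the
# iteration budget, slowly otherwise (second adjustment value < first adjustment value).
def update_guidance_prob(P, converged_quickly, first_adj=0.10, second_adj=0.02, P_min=0.0):
    step = first_adj if converged_quickly else second_adj
    return max(P - step, P_min)
```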
(2) Calculating the reward value of the acquired pharmacophore structural molecules with specific targets by using the pre-constructed hybrid reward function.
(3) Maximizing the hybrid reward function by a reinforcement learning method to obtain the trained drug molecule generator.
The parameters θ* that maximize the hybrid reward function are given by
θ* = argmax_θ E_{τ~p_θ(τ)} [ Σ_t ( γ^t·r(s_t, a_t) - b ) ]
where τ ~ p_θ(τ) denotes a round (trajectory) whose actions occur with probability p_θ(τ), E[·] is the expectation over rounds sampled with this probability, γ^t·r(s_t, a_t) is the discounted reward at time t, and b is the error term.
To make the molecules produced by the drug molecule generator more diverse, an entropy constraint is applied, and the formula for maximizing the hybrid reward function becomes
θ* = argmax_θ E_{τ~p_θ(τ)} [ Σ_t ( γ^t·r(s_t, a_t) - b + α·H(a_t|s_t) ) ]
where α is a hyperparameter and H(a_t|s_t) is the probability of action a_t in state s_t.
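A minimal sketch of the corresponding policy-gradient update with the entropy term, assuming PyTorch; the per-step log-probabilities and entropies are assumed to be collected while sampling one SMILES string, and the baseline b, discount γ and weight α are illustrative values.

```python
# REINFORCE-with-entropy sketch (assumption: PyTorch; minimize the negative objective).
import torch

def policy_gradient_loss(log_probs, entropies, reward, b=0.0, gamma=0.97, alpha=0.01):
    """log_probs, entropies: lists of 0-dim tensors collected during sampling;
    reward: scalar hybrid reward for the finished molecule."""
    R = reward - b
    loss = torch.zeros(())
    for t, (lp, ent) in enumerate(zip(log_probs, entropies)):
        loss = loss - (gamma ** t) * R * lp - alpha * ent   # maximize reward + entropy
    return loss

# Typical use: loss = policy_gradient_loss(...); loss.backward(); optimizer.step()
```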
Fig. 3 is a schematic diagram of a method for constructing an active pharmacophore molecular cluster by using domain knowledge in an embodiment of the present invention, which extracts active drug molecular fragments from an existing sample, obtains atomic groups with similar characteristics by combining the domain knowledge, establishes the atomic groups into atomic group characteristic clusters, and establishes a pharmacophore cluster by using the atomic group characteristic clusters and the pharmacophore.
Fig. 4 is a schematic diagram of a method for establishing a relationship network by using domain knowledge in an embodiment of the present invention. The diagram contains two pharmacophores (pharmacophore 1 and pharmacophore 2); Frag1 and Frag2 are, respectively, fragment 1 and fragment 2 that do not contain a pharmacophore; dFrag1, dFrag2, dFrag3 and dFrag4 are fragments 1, 2, 3 and 4 that contain a common molecular part; and mF is a complete molecule with an intact structure.
The invention provides a medicine molecule generating method based on domain knowledge and deep reinforcement learning, which generates a specific pharmacophore molecular structure based on a trained medicine molecule generator obtained by the medicine molecule generator training system and method based on the domain knowledge and the deep reinforcement learning.
It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes and related descriptions of the training method for a drug molecule generator based on domain knowledge and deep reinforcement learning, and the method for generating a drug molecule based on domain knowledge and deep reinforcement learning described above may refer to the corresponding processes in the foregoing system embodiments, and are not described herein again.
It should be noted that, the medicine molecule generator training system based on domain knowledge and deep reinforcement learning provided in the foregoing embodiment is only illustrated by the division of the above functional modules, and in practical applications, the above functions may be allocated to different functional modules according to needs, that is, the modules or steps in the embodiment of the present invention are further decomposed or combined, for example, the modules in the foregoing embodiment may be combined into one module, or may be further split into multiple sub-modules, so as to complete all or part of the functions described above. The names of the modules and steps involved in the embodiments of the present invention are only for distinguishing the modules or steps, and are not to be construed as unduly limiting the present invention.
An apparatus of a third embodiment of the invention comprises:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the processor for execution by the processor to implement the method of drug molecule generator training based on domain knowledge and deep reinforcement learning described above.
A computer-readable storage medium of a fourth embodiment of the present invention stores computer instructions for execution by the computer to implement the method for training a drug molecule generator based on domain knowledge and deep reinforcement learning described above.
It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes and related descriptions of the storage device and the processing device described above may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication section, and/or installed from a removable medium. The computer program, when executed by a Central Processing Unit (CPU), performs the above-described functions defined in the method of the present application. It should be noted that the computer readable medium mentioned above in the present application may be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present application, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In this application, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations for aspects of the present application may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C + + or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The terms "first," "second," and the like are used for distinguishing between similar elements and not necessarily for describing or implying a particular order or sequence.
The terms "comprises," "comprising," or any other similar term are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.
So far, the technical solutions of the present invention have been described in connection with the preferred embodiments shown in the drawings, but it is easily understood by those skilled in the art that the scope of the present invention is obviously not limited to these specific embodiments. Equivalent changes or substitutions of related technical features can be made by those skilled in the art without departing from the principle of the invention, and the technical scheme after the changes or substitutions can fall into the protection scope of the invention.

Claims (10)

1. A drug molecule generator training method based on domain knowledge and deep reinforcement learning, characterized by comprising the following steps:
Based on the drug molecule generator, by utilizing curriculum-based learning under a dynamic strategy, the pharmacophore in the active pharmacophore molecular group is randomly inserted into the generation process of the molecule to generate a pharmacophore structural molecule with a specific target;
calculating the reward value of the acquired pharmacophore structural molecules with the specific target by using a pre-constructed hybrid reward function; the hybrid reward function is constructed based on a task reward function and an imitation reward function;
a reinforcement learning method is utilized, a hybrid reward function is maximized, and a trained drug molecule generator is obtained;
wherein:
the active pharmacophore molecular group is constructed on the basis of a given drug sample by utilizing domain knowledge;
the character strings output by the drug molecule generator conform to the SMILES syntax for drug character strings.
2. The method for training a drug molecule generator based on domain knowledge and deep reinforcement learning of claim 1, wherein the method for obtaining the molecular structure group of the active pharmacophore is as follows:
step S110, extracting a pharmacophore from a given drug sample;
step S120, splitting the pharmacophore into a plurality of atomic groups by utilizing chemical bond connection;
step S130, obtaining atoms or elements with the same property from the domain knowledge;
step S140, replacing a plurality of atomic groups of the pharmacophore molecular structure with atoms or elements from domain knowledge;
s150, carrying out chemical activity comparison on the new medicinal effect group molecules obtained after replacement and the original medicinal effect group molecules, and selecting the new medicinal effect group molecules with the chemical activity difference smaller than a preset value and the original medicinal effect group molecules to form a medicinal effect group molecular group;
step S160, selecting new pharmacophore molecules and original pharmacophore molecules to form an active pharmacophore molecular group if the chemical activity difference of the new pharmacophore molecules and the original pharmacophore molecules is not large after comparison, and establishing a relationship network;
step S170, repeating the above steps, and expanding the active pharmacophore molecular group.
3. The method for training a drug molecule generator based on domain knowledge and deep reinforcement learning of claim 1, wherein the drug molecule generator is constructed based on RNN network, and is trained based on training samples constructed from character string data of drug compounds;
the training samples comprise input samples and output samples; the output sample is a character in the string of pharmaceutical compounds and the corresponding input sample is the string of pharmaceutical compounds preceding the character.
4. The method for training a drug molecule generator based on domain knowledge and deep reinforcement learning of claim 3, wherein based on a stack-memory mechanism provided in the drug molecule generator, in the process of generating the drug compound character string, an existing character string sequence is used as a prefix segment, and the generation of the next character is performed cyclically by the drug molecule generator until a complete drug compound character string is obtained.
5. The method for training a drug molecule generator based on domain knowledge and deep reinforcement learning of claim 1, wherein the method for randomly inserting the pharmacophore into the generation process of the molecule comprises:
randomly selecting active pharmacophore molecules from the active pharmacophore molecule group according to the probability P of each active pharmacophore molecule as an initial state of reinforcement learning;
adjusting the probability P based on the reward value obtained from the hybrid reward function: if the generator reaches the preset convergence condition within a preset iteration threshold, the probability P is lowered by a preset first adjustment value; and if the generator does not reach the preset convergence condition within the preset iteration threshold, the probability P is lowered by a preset second adjustment value, the second adjustment value being smaller than the first adjustment value.
6. The method of claim 1, wherein the hybrid reward function is
r(s_t, a_t) = r_task(s_t, a_t) + λ·r_imitation(s_t, a_t),  λ ∈ [0, 1]
where r(s_t, a_t) is the hybrid reward function when the state is s_t and the action is a_t; r_task(s_t, a_t) is the task reward function, whose reward is obtained when the generated pharmacophore structural molecules are active; r_imitation(s_t, a_t) is the imitation reward function, whose reward is obtained when the similarity between the generated pharmacophore structural molecules and the active pharmacophore molecule selected as the initial state is greater than a set threshold.
7. The method of claim 1, wherein the loss function used to train for the optimal parameters θ* is
θ* = argmax_θ E_{τ~p_θ(τ)} [ Σ_t ( γ^t·r(s_t, a_t) - b + α·H(a_t|s_t) ) ]
where E[·] denotes the expectation over each round (a trajectory τ sampled with probability p_θ(τ)), γ^t·r(s_t, a_t) is the discounted reward at time t, b is the error term, α is a hyperparameter, and H(a_t|s_t) is the probability of action a_t in state s_t.
8. A drug molecule generator training system based on domain knowledge and deep reinforcement learning is characterized by comprising
A pharmacophore storage unit configured to store a group of active pharmacophore molecular structures; the active pharmacophore molecular structure group is constructed on the basis of a given drug sample by utilizing domain knowledge;
a drug molecule generator configured to randomly insert pharmacophores from the active pharmacophore molecular group into the molecule generation process based on curriculum learning under a dynamic strategy, so as to generate pharmacophore structural molecules with a specific target; the character strings output by the drug molecule generator conform to the SMILES syntax for drug character strings;
a predictor configured to perform chemical activity judgment on the generated pharmacophore structure molecule;
the hybrid reward calculation unit is configured to calculate the reward value of the acquired pharmacophore structural molecules with specific targets based on a pre-constructed hybrid reward function; the hybrid reward function is constructed based on a task reward function and an imitation reward function.
9. An apparatus, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the processor for execution by the processor to implement the domain knowledge and deep reinforcement learning based drug molecule generator training method of any one of claims 1-7.
10. A computer-readable storage medium storing computer instructions for execution by the computer to implement the domain knowledge and deep reinforcement learning-based drug molecule generator training method of any one of claims 1-7.
CN202110496113.5A 2021-05-07 2021-05-07 Medicine molecular generator training method based on domain knowledge and deep reinforcement learning Active CN113223637B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110496113.5A CN113223637B (en) 2021-05-07 2021-05-07 Medicine molecular generator training method based on domain knowledge and deep reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110496113.5A CN113223637B (en) 2021-05-07 2021-05-07 Medicine molecular generator training method based on domain knowledge and deep reinforcement learning

Publications (2)

Publication Number Publication Date
CN113223637A true CN113223637A (en) 2021-08-06
CN113223637B CN113223637B (en) 2023-07-25

Family

ID=77091711

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110496113.5A Active CN113223637B (en) 2021-05-07 2021-05-07 Medicine molecular generator training method based on domain knowledge and deep reinforcement learning

Country Status (1)

Country Link
CN (1) CN113223637B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114974461A (en) * 2022-06-15 2022-08-30 烟台国工智能科技有限公司 Multi-target attribute molecule generation method and system based on strategy learning

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170161635A1 (en) * 2015-12-02 2017-06-08 Preferred Networks, Inc. Generative machine learning systems for drug design
WO2019018780A1 (en) * 2017-07-20 2019-01-24 The University Of North Carolina At Chapel Hill Methods, systems and non-transitory computer readable media for automated design of molecules with desired properties using artificial intelligence
CN110534164A (en) * 2019-09-26 2019-12-03 广州费米子科技有限责任公司 Drug molecule generation method based on deep learning
CN111508568A (en) * 2020-04-20 2020-08-07 腾讯科技(深圳)有限公司 Molecule generation method and device, computer readable storage medium and terminal equipment
CN112270951A (en) * 2020-11-10 2021-01-26 四川大学 Brand-new molecule generation method based on multitask capsule self-encoder neural network

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170161635A1 (en) * 2015-12-02 2017-06-08 Preferred Networks, Inc. Generative machine learning systems for drug design
WO2019018780A1 (en) * 2017-07-20 2019-01-24 The University Of North Carolina At Chapel Hill Methods, systems and non-transitory computer readable media for automated design of molecules with desired properties using artificial intelligence
US20200168302A1 (en) * 2017-07-20 2020-05-28 The University Of North Carolina At Chapel Hill Methods, systems and non-transitory computer readable media for automated design of molecules with desired properties using artificial intelligence
CN110534164A (en) * 2019-09-26 2019-12-03 广州费米子科技有限责任公司 Drug molecule generation method based on deep learning
CN111508568A (en) * 2020-04-20 2020-08-07 腾讯科技(深圳)有限公司 Molecule generation method and device, computer readable storage medium and terminal equipment
CN112270951A (en) * 2020-11-10 2021-01-26 四川大学 Brand-new molecule generation method based on multitask capsule self-encoder neural network

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
CHEN Xin et al., "Research progress in drug representation learning", Journal of Tsinghua University (Science and Technology) *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114974461A (en) * 2022-06-15 2022-08-30 烟台国工智能科技有限公司 Multi-target attribute molecule generation method and system based on strategy learning

Also Published As

Publication number Publication date
CN113223637B (en) 2023-07-25

Similar Documents

Publication Publication Date Title
CN109496320B (en) Artificial intelligence engine with architect module
US20210103823A1 (en) Training neural networks using a variational information bottleneck
CN111078836A (en) Machine reading understanding method, system and device based on external knowledge enhancement
KR20200014510A (en) Method for providing prediction service based on mahcine-learning and apparatus thereof
EP3568811A1 (en) Training machine learning models
CN110659678B (en) User behavior classification method, system and storage medium
CN111340221B (en) Neural network structure sampling method and device
CN112083933A (en) Service function chain deployment method based on reinforcement learning
US20190295688A1 (en) Processing biological sequences using neural networks
US9536206B2 (en) Method and apparatus for improving resilience in customized program learning network computational environments
Meshram et al. College enquiry chatbot using rasa framework
CN113778871A (en) Mock testing method, device, equipment and storage medium
CN112925926B (en) Training method and device of multimedia recommendation model, server and storage medium
Sayed et al. DiSH simulator: Capturing dynamics of cellular signaling with heterogeneous knowledge
CN113223637B (en) Medicine molecular generator training method based on domain knowledge and deep reinforcement learning
WO2021117180A1 (en) Dialog processing device, learning device, dialog processing method, learning method, and program
Mehmood et al. From information society to knowledge society: The Asian perspective
CN110955765A (en) Corpus construction method and apparatus of intelligent assistant, computer device and storage medium
CN116964678A (en) Prediction of protein amino acid sequence using generative model conditioned on protein structure intercalation
CN111090740B (en) Knowledge graph generation method for dialogue system
Rohr Discrete-time leap method for stochastic simulation
CN110580528A (en) network model generation method and device
US20140250034A1 (en) Method and apparatus for improving resilience in customized program learning network computational environments
CN115496431A (en) Order form and transport capacity matching method and device and electronic equipment
Bompiani et al. High-performance computing with terastat

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant