CN113223637A - Drug molecule generator training method based on domain knowledge and deep reinforcement learning - Google Patents

Drug molecule generator training method based on domain knowledge and deep reinforcement learning Download PDF

Info

Publication number
CN113223637A
Authority
CN
China
Prior art keywords
pharmacophore
molecules
domain knowledge
drug
drug molecule
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110496113.5A
Other languages
Chinese (zh)
Other versions
CN113223637B (en)
Inventor
黄向生
蔡金儒
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Automation of Chinese Academy of Science
Original Assignee
Institute of Automation of Chinese Academy of Science
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Automation of Chinese Academy of Science filed Critical Institute of Automation of Chinese Academy of Science
Priority to CN202110496113.5A priority Critical patent/CN113223637B/en
Publication of CN113223637A publication Critical patent/CN113223637A/en
Application granted granted Critical
Publication of CN113223637B publication Critical patent/CN113223637B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16C COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
    • G16C20/00 Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
    • G16C20/50 Molecular design, e.g. of drugs
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16C COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
    • G16C20/00 Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
    • G16C20/70 Machine learning, data mining or chemometrics

Abstract

The invention belongs to the field of drug molecule generation, and particularly relates to a drug molecule generator training method based on domain knowledge and deep reinforcement learning, aiming at solving the problem of limited samples when drug molecules are generated through deep learning. The invention comprises: constructing an active pharmacophore molecular group by using domain knowledge; using curriculum learning under a dynamic strategy to randomly insert pharmacophores from the active pharmacophore molecular group into the molecule generation process, so as to generate pharmacophore structural molecules with specific targets; and maximizing a hybrid reward function by a reinforcement learning method to obtain the trained drug molecule generator. The invention solves the problem of an insufficient sample size, prevents the generated molecules from becoming overly uniform as a result of excessive imitation learning, and enables the generated molecules to converge more quickly to pharmacophore structures with a specific target.

Description

Drug molecule generator training method based on domain knowledge and deep reinforcement learning
Technical Field
The invention belongs to the field of medicine molecule generation, and particularly relates to a medicine molecule generator training method based on domain knowledge and deep reinforcement learning.
Background
In recent years, deep learning methods have been widely used in computational chemistry; for example, many studies have demonstrated the feasibility of designing new compounds with desirable chemical properties by deep learning. While deep learning methods perform well in generating compounds with target chemical properties, they typically require large numbers of samples in practice. However, the number of drugs discovered to date is small, so obtaining large numbers of samples is difficult. This presents a challenge to general deep learning approaches, and how to complete drug design under limited-sample conditions is an important issue. Therefore, a reinforcement learning method based on domain knowledge is proposed by combining prior knowledge. The method integrates domain knowledge into the training process on the basis of the original method, and is the first to use various generalized same-property features for curriculum learning.
Disclosure of Invention
In order to solve the above problem in the prior art, namely the problem of limited samples when drug molecules are generated through deep learning, the invention provides a drug molecule generator training method based on domain knowledge and deep reinforcement learning, which comprises the following steps:
Based on the drug molecule generator, by utilizing curriculum-based learning under a dynamic strategy, the pharmacophore in the active pharmacophore molecular group is randomly inserted into the generation process of the molecule to generate a pharmacophore structural molecule with a specific target;
calculating the reward value of the acquired pharmacophore structural molecules with the specific target by using a pre-constructed hybrid reward function; the hybrid reward function is constructed based on a task reward function and an imitation reward function;
a reinforcement learning method is utilized, a hybrid reward function is maximized, and a trained drug molecule generator is obtained;
wherein:
the active pharmacophore molecular group is constructed on the basis of a given drug sample by utilizing domain knowledge;
the character strings output by the drug molecule generator conform to the SMILES syntax for drug character strings.
In some preferred embodiments, the method for obtaining the molecular structural group of active pharmacophores is:
step S110, extracting a pharmacophore from a given drug sample;
step S120, splitting the pharmacophore into a plurality of atomic groups by utilizing chemical bond connection;
step S130, obtaining atoms or elements with the same property from the domain knowledge;
step S140, replacing a plurality of atomic groups of the pharmacophore molecular structure with atoms or elements from domain knowledge;
s150, carrying out chemical activity comparison on the new medicinal effect group molecules obtained after replacement and the original medicinal effect group molecules, and selecting the new medicinal effect group molecules with the chemical activity difference smaller than a preset value and the original medicinal effect group molecules to form a medicinal effect group molecular group;
step S160, selecting new pharmacophore molecules and original pharmacophore molecules to form an active pharmacophore molecular group if the chemical activity difference of the new pharmacophore molecules and the original pharmacophore molecules is not large after comparison, and establishing a relationship network;
step S170, repeating the above steps, and expanding the active pharmacophore molecular group.
In some preferred embodiments, the drug molecule generator is constructed based on an RNN, and training is performed on training samples constructed from drug compound character string data;
the training samples comprise input samples and output samples; each output sample is a character in a drug compound character string, and the corresponding input sample is the portion of that character string preceding the character.
In some preferred embodiments, based on the stack-memory mechanism provided in the drug molecule generator, in the process of generating the drug compound character string, the existing character string sequence is used as a prefix segment, and the generation of the next character is performed cyclically by the drug molecule generator until the complete drug compound character string is obtained.
In some preferred embodiments, the "random insertion of the pharmacophore into the molecule generation process" is performed by:
randomly selecting active pharmacophore molecules from the active pharmacophore molecule group according to the probability P of each active pharmacophore molecule as an initial state of reinforcement learning;
adjusting the probability P based on the reward value obtained from the hybrid reward function: if the generator reaches the preset convergence condition within a preset iteration threshold, the probability P is lowered by a preset first adjustment value; and if the generator does not reach the preset convergence condition within the preset iteration threshold, the probability P is lowered by a preset second adjustment value, the second adjustment value being smaller than the first adjustment value.
In some preferred embodiments, the hybrid reward function is
r(s_t, a_t) = r_task(s_t, a_t) + λ·r_imitation(s_t, a_t),  λ ∈ [0, 1]
where r(s_t, a_t) is the hybrid reward function when the state is s_t and the action is a_t; r_task(s_t, a_t) is the task reward function, whose reward is obtained when the generated pharmacophore structural molecule is active; r_imitation(s_t, a_t) is the imitation reward function, whose reward is obtained when the similarity between the generated pharmacophore structural molecule and the active pharmacophore molecule selected as the initial state is greater than a set threshold.
In some preferred embodiments, the loss function used to train for the optimal parameters θ* is
θ* = argmax_θ E_{τ~p_θ(τ)} [ Σ_t ( γ^t·r(s_t, a_t) - b + α·H(a_t|s_t) ) ]
where E[·] denotes the expectation over each round (a trajectory τ sampled with probability p_θ(τ)), γ^t·r(s_t, a_t) is the discounted reward at time t, b is the error term, α is a hyperparameter, and H(a_t|s_t) is the probability of action a_t in state s_t.
In a second aspect of the present invention, a system for training a drug molecule generator based on domain knowledge and deep reinforcement learning is provided, which includes:
a pharmacophore storage unit configured to store a group of active pharmacophore molecular structures; the active pharmacophore molecular structure group is constructed on the basis of a given drug sample by utilizing domain knowledge;
a drug molecule generator configured to randomly insert pharmacophores from the active pharmacophore molecular group into the molecule generation process based on curriculum learning under a dynamic strategy, so as to generate pharmacophore structural molecules with a specific target; the character strings output by the drug molecule generator conform to the SMILES syntax for drug character strings;
a predictor configured to perform chemical activity judgment on the generated pharmacophore structure molecule;
the hybrid reward calculation unit is configured to calculate the reward value of the acquired pharmacophore structural molecules with specific targets based on a pre-constructed hybrid reward function; the hybrid reward function is constructed based on a task reward function and an imitation reward function.
In a third aspect of the present invention, an apparatus is provided, which includes:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the processor for execution by the processor to implement the method of drug molecule generator training based on domain knowledge and deep reinforcement learning described above.
In a fourth aspect of the present invention, a computer-readable storage medium is provided, wherein the computer-readable storage medium stores computer instructions for execution by the computer to implement the method for training a drug molecule generator based on domain knowledge and deep reinforcement learning.
The invention has the beneficial effects that:
(1) The invention establishes a specific pharmacophore structure group based on domain knowledge, which solves the problem of an insufficient sample size.
(2) Curriculum learning under a dynamic strategy introduces a guidance probability P that controls how often pharmacophore fragments are used during training, preventing the generated molecules from becoming overly uniform as a result of excessive imitation learning.
(3) A hybrid reward function is constructed that gives an intermediate reward for generating molecules containing structures similar to active pharmacophores, increasing the probability that such structures occur and enabling the generated molecules to converge more quickly to pharmacophore structures with a specific target.
Drawings
Other features, objects and advantages of the present application will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings in which:
FIG. 1 is a schematic flow chart of a method for training a drug molecule generator based on domain knowledge and deep reinforcement learning according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a drug molecule generator training system framework based on domain knowledge and deep reinforcement learning according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a method for constructing a cluster of active pharmacophore molecules using domain knowledge in one embodiment of the invention;
FIG. 4 is a diagram illustrating a method for establishing a relationship network using domain knowledge according to an embodiment of the present invention.
Detailed Description
The present application will be described in further detail with reference to the following drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the relevant invention and not restrictive of the invention. It should be noted that, for convenience of description, only the portions related to the related invention are shown in the drawings.
It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present application will be described in detail below with reference to the embodiments with reference to the attached drawings.
The invention provides a drug molecule generator training method based on domain knowledge and deep reinforcement learning, which is shown in figure 1 and comprises the following steps
Based on the drug molecule generator, by utilizing curriculum-based learning under a dynamic strategy, the pharmacophore in the active pharmacophore molecular group is randomly inserted into the generation process of the molecule to generate a pharmacophore structural molecule with a specific target;
calculating the reward value of the acquired pharmacophore structural molecules with the specific target by using a pre-constructed hybrid reward function; the hybrid reward function is constructed based on a task reward function and an imitation reward function;
a reinforcement learning method is utilized, a hybrid reward function is maximized, and a trained drug molecule generator is obtained;
wherein:
the active pharmacophore molecular group is constructed on the basis of a given drug sample by utilizing domain knowledge;
the character strings output by the drug molecule generator conform to the SMILES syntax for drug character strings.
In order to more clearly explain the present invention, the following detailed description of the embodiments of the present invention is provided with reference to the accompanying drawings. In order to make the technical solution description clearer, the following description is made first of the composition of the training system, and then the training method is described.
The system for training a drug molecule generator based on domain knowledge and deep reinforcement learning according to the first embodiment of the present invention, as shown in fig. 2, includes a pharmacophore storage unit, a drug molecule generator (abbreviated as generator in fig. 2), a predictor, and a hybrid reward calculation unit (abbreviated as hybrid reward in fig. 2).
1. Pharmacophore storage unit
A pharmacophore storage unit configured to store an active pharmacophore molecular structure group (active pharmacophore group in fig. 2); the active pharmacophore molecular structure group is constructed based on a given drug sample by using domain knowledge.
The method for acquiring the active pharmacophore molecular structure group comprises the following steps (a code sketch of these steps is given after step S170):
step S110, extracting a pharmacophore from a given drug sample;
step S120, splitting the pharmacophore into a plurality of atomic groups by utilizing chemical bond connection;
step S130, obtaining atoms or elements with the same property from the domain knowledge;
step S140, replacing a plurality of atomic groups of the pharmacophore molecular structure with atoms or elements from domain knowledge;
s150, carrying out chemical activity comparison on the new medicinal effect group molecules obtained after replacement and the original medicinal effect group molecules, and selecting the new medicinal effect group molecules with the chemical activity difference smaller than a preset value and the original medicinal effect group molecules to form a medicinal effect group molecular group;
step S160, selecting new pharmacophore molecules and original pharmacophore molecules to form an active pharmacophore molecular group if the chemical activity difference of the new pharmacophore molecules and the original pharmacophore molecules is not large after comparison, and establishing a relationship network;
step S170, repeating the above steps, and expanding the active pharmacophore molecular group (using S110-S160 for different drug samples of the same drug, respectively, to obtain the pharmacophore molecular group of the drug).
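As a concrete illustration of steps S110-S160, the following is a minimal sketch assuming RDKit is available; the BRICS fragmentation, the hand-written same-property table and the activity_score oracle are illustrative assumptions standing in for the domain knowledge and the activity comparison of the invention, not its exact procedure.

```python
# Sketch of steps S110-S160 (assumptions: RDKit installed; activity_score is a
# user-supplied oracle standing in for the chemical-activity comparison of S150).
from rdkit import Chem
from rdkit.Chem import BRICS

# Toy "domain knowledge": atoms treated as having the same property (bioisosteres).
SAME_PROPERTY = {"O": ["S"], "F": ["Cl", "Br"], "N": ["P"]}

def fragment_pharmacophore(smiles):
    """S110/S120: break a drug molecule into bond-connected atomic groups."""
    mol = Chem.MolFromSmiles(smiles)
    return list(BRICS.BRICSDecompose(mol))

def substitute(fragment_smiles):
    """S130/S140: replace atoms with same-property atoms from the table above."""
    variants = []
    for src, targets in SAME_PROPERTY.items():
        for dst in targets:
            new = fragment_smiles.replace(src, dst)   # crude string-level swap
            if new != fragment_smiles and Chem.MolFromSmiles(new) is not None:
                variants.append(new)
    return variants

def build_active_group(drug_smiles, activity_score, eps=0.1):
    """S150/S160: keep variants whose activity stays close to the original fragment."""
    group = set()
    for frag in fragment_pharmacophore(drug_smiles):
        base = activity_score(frag)
        group.add(frag)
        for variant in substitute(frag):
            if abs(activity_score(variant) - base) < eps:
                group.add(variant)
    return group
```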
2. Drug molecule generator
A drug molecule generator configured to randomly insert pharmacophores from the active pharmacophore molecular group into the molecule generation process based on curriculum learning under a dynamic strategy, so as to generate pharmacophore structural molecules with a specific target; the character strings generated by the drug molecule generator conform to the syntax of SMILES (Simplified Molecular Input Line Entry System) drug character strings.
The drug molecule generator G is constructed based on an RNN; in some embodiments a Stack-RNN model (stack-augmented recurrent neural network) may be employed. Before the training based on domain knowledge and deep reinforcement learning described in the invention, the molecule generator G must first be trained to learn the syntax of the SMILES character string format.
In this embodiment, the ChEMBL21 database, which contains about 1.5 million drug compound character strings, is used as the training set for the drug molecule generator G, so that the trained generator G can produce a complete drug compound character string from an input prefix fragment.
To train the drug molecule generator G on the ChEMBL21 data, training samples are first constructed from the drug compound character strings; each training sample comprises an input sample and an output sample. The output sample is one character of a drug compound character string (the output character), and the corresponding input sample is the part of the character string preceding that character (the prefix input).
The training method comprises the following steps:
the drug molecule generator G obtains the probability distribution of the next character based on the input sample;
predicting the next character according to the obtained probability distribution to be used as a predicted character;
and calculating a cross-entropy loss function based on the predicted characters and the output characters corresponding to the input samples, and updating the parameters of the drug molecule generator G (a code sketch of this pretraining loop is given below).
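A minimal sketch of this supervised pretraining step, assuming PyTorch; a plain GRU stands in for the stack-augmented RNN described below, and the encoding of SMILES characters into integer indices is assumed to be done elsewhere.

```python
# Pretraining sketch (assumptions: PyTorch; batches are already encoded as
# LongTensors of character indices, e.g. from ChEMBL SMILES strings).
import torch
import torch.nn as nn

class CharRNN(nn.Module):
    def __init__(self, vocab_size, hidden=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.rnn = nn.GRU(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, x, h=None):
        y, h = self.rnn(self.embed(x), h)
        return self.out(y), h              # logits over the next character

def pretrain_step(model, optimizer, batch):
    """batch: (B, L) LongTensor; input = all but last char, target = all but first."""
    inputs, targets = batch[:, :-1], batch[:, 1:]
    logits, _ = model(inputs)
    loss = nn.functional.cross_entropy(
        logits.reshape(-1, logits.size(-1)), targets.reshape(-1))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```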
In this embodiment, the next character is predicted from the obtained probability distribution, and the hidden state is computed as
h_t = σ(W_i·x_t + W_h·h_{t-1})
where h_t is the hidden-layer vector, h_{t-1} is the hidden vector from the previous time step, W_i is the input-layer parameter matrix, W_h is the hidden-layer parameter matrix, σ is the activation function, and x_t is the input at the current time node.
In this embodiment, the drug molecule generator G introduces a stack memory (stack-memory mechanism) to maintain and transmit hidden-layer information at each time step, computed as
s_t[0] = a_t[PUSH]·σ(D·h_t) + a_t[POP]·s_{t-1}[1] + a_t[NO-OP]·s_{t-1}[0]
where D is a 1 × m matrix, a_t = [a_t[PUSH], a_t[POP], a_t[NO-OP]] is the vector of stack control variables, s_t[0] is the top of the stack at the current time node, s_{t-1}[0] is the top of the stack at the previous time node, s_{t-1}[1] is the second element of the stack at the previous time node, σ(D·h_t) is the value to be pushed, a_t[PUSH] is the agent's action that pushes a value onto the stack, a_t[POP] is the agent's action that removes the top value from the stack, and a_t[NO-OP] is the action that leaves the stack unchanged.
If a_t[POP] equals 1, the topmost element of the stack is removed and replaced by the element below it; if a_t[PUSH] equals 1, a new value is added to the top of the stack and the remaining values move down; if a_t[NO-OP] equals 1, the values in the stack remain unchanged.
A similar computation is applied to the stack elements at the other depths i:
s_t[i] = a_t[PUSH]·s_{t-1}[i-1] + a_t[POP]·s_{t-1}[i+1] + a_t[NO-OP]·s_{t-1}[i]
Thus, the hidden layer h_t can be calculated as
h_t = σ(U·x_t + R·h_{t-1} + D·s_{t-1}[1:k])
where s_{t-1}[1:k] denotes the k elements from the top of the stack at time step t-1, σ(·) is the activation function, U and R are parameter matrices, D is a 1 × m matrix acting on the stack contents, and x_t is the input at the current time node.
In this embodiment, owing to the stack-memory mechanism, the trained drug molecule generator G supports the following operations (see the sketch below):
a POP operation, which deletes the topmost element in the stack, the element below it becoming the new top;
a PUSH operation, which pushes a new element onto the top of the stack;
a NO-OP operation, which keeps the elements in the stack unchanged and performs no operation.
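A minimal sketch of the soft stack update described above, assuming PyTorch; the stack is modeled as a fixed-size vector, and the PUSH/POP/NO-OP weights are assumed to come from a softmax over the hidden state.

```python
# Soft stack update sketch (assumption: PyTorch; a = [push, pop, no-op] weights).
import torch

def stack_update(stack, a, push_val):
    """stack: (k,) tensor of stack slots; a: (3,) soft action weights summing to 1;
    push_val: 0-dim tensor to write on PUSH (sigma(D @ h_t) in the text)."""
    push, pop, noop = a[0], a[1], a[2]
    pushed = torch.cat([push_val.view(1), stack[:-1]])   # all elements move down
    popped = torch.cat([stack[1:], torch.zeros(1)])      # all elements move up
    return push * pushed + pop * popped + noop * stack   # convex combination
```

For slot 0 this reproduces s_t[0] = a_t[PUSH]·σ(D·h_t) + a_t[POP]·s_{t-1}[1] + a_t[NO-OP]·s_{t-1}[0], and for the deeper slots it reproduces the depth-i formula above.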
The method for generating a complete drug compound character string from an input prefix fragment is as follows: the sequence generated so far (the drug molecule character string generated at present) is used as the prefix input; the probability distribution of the next character is predicted; the generated character is determined from that probability distribution; and the next character is generated cyclically until a complete drug compound character string is obtained.
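A minimal sampling-loop sketch corresponding to this generation procedure, assuming the CharRNN from the pretraining sketch above; the start token '<', the end token '$' and the stoi/itos vocabularies are illustrative assumptions.

```python
# Generation sketch: extend a prefix character by character until the end token.
import torch

def generate(model, stoi, itos, prefix="<", max_len=120):
    tokens = [stoi[c] for c in prefix]
    logits, h = model(torch.tensor([tokens]))        # consume the whole prefix first
    for _ in range(max_len):
        probs = torch.softmax(logits[0, -1], dim=-1) # distribution over next character
        nxt = torch.multinomial(probs, 1).item()     # sample the next character
        if itos[nxt] == "$":                         # stop at the end-of-SMILES token
            break
        tokens.append(nxt)
        logits, h = model(torch.tensor([[nxt]]), h)
    return "".join(itos[t] for t in tokens[len(prefix):])
```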
3. Predictor
And a predictor configured to determine the chemical activity of the generated pharmacophore structural molecules. The determination method comprises comparing the chemical activity of a generated pharmacophore structural molecule with that of the original pharmacophore molecule, and determining that the generated molecule has chemical activity when its activity exceeds a set threshold.
4. Hybrid reward calculation unit
The hybrid reward calculation unit is configured to calculate the reward value of the acquired pharmacophore structural molecules with specific targets based on a pre-constructed hybrid reward function; the hybrid reward function is constructed based on a task reward function and an imitation reward function.
The hybrid reward function is
r(s_t, a_t) = r_task(s_t, a_t) + λ·r_imitation(s_t, a_t),  λ ∈ [0, 1]
where r(s_t, a_t) is the hybrid reward function when the state is s_t and the action is a_t; r_task(s_t, a_t) is the task reward function, whose reward is obtained when the generated pharmacophore structural molecule is active; r_imitation(s_t, a_t) is the imitation reward function, whose reward is obtained when the similarity between the generated pharmacophore structural molecule and the active pharmacophore molecule selected as the initial state is greater than a set threshold.
(1) The task reward function is determined by the specific task; the reward is obtained when the predictor judges that the generated pharmacophore structural molecules are active.
(2) The imitation reward function; the reward is obtained when the similarity between the generated pharmacophore structural molecules and the active pharmacophore molecule selected as the initial state is greater than a set threshold.
In the imitation reward function, the degree of similarity between molecules is calculated using the intersection formula
Sim(M_1, M_2) = |M_1 ∩ M_2| / |M_1 ∪ M_2|
and the result is compared with a threshold G: if it is greater than the threshold, the reward is obtained; if it is smaller than the threshold, the reward is not obtained. Here M_1 and M_2 are, respectively, the sample molecule for which a reward can be obtained and the currently generated new molecule.
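A minimal sketch of this similarity comparison, assuming RDKit; Tanimoto similarity over Morgan (ECFP-like) bit fingerprints is used as a stand-in for the intersection-over-union comparison, and the threshold G is an illustrative value.

```python
# Similarity-gate sketch (assumption: RDKit; Tanimoto over bit fingerprints).
from rdkit import Chem
from rdkit.Chem import AllChem, DataStructs

def similar_enough(sample_smiles, generated_smiles, G=0.5):
    fp1 = AllChem.GetMorganFingerprintAsBitVect(Chem.MolFromSmiles(sample_smiles), 2, nBits=2048)
    fp2 = AllChem.GetMorganFingerprintAsBitVect(Chem.MolFromSmiles(generated_smiles), 2, nBits=2048)
    return DataStructs.TanimotoSimilarity(fp1, fp2) > G   # reward gate
```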
The imitation reward function is expressed in terms of the ratio of common features possessed,
f(s_T) = (1/|M_s|) · Σ_{m_i ∈ M_s} T(ECFP(s_T), ECFP(m_i))
where M_s is the set of structural features extracted from the molecule S, m_i is an element of M_s, ECFP(s_T) is the molecular (extended-connectivity) fingerprint of the state s_T, ECFP(m_i) is the fingerprint of the structural feature m_i, T(ECFP(s_T), ECFP(m_i)) is the similarity between the two fingerprints, and k1 is a hyperparameter, initially set to k1 = 0.25.
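A minimal sketch of the imitation and hybrid rewards, assuming RDKit; the common-feature ratio below, gated by k1, is a stand-in for the exact expressions of the original, and `features` is assumed to hold the structural-feature set M_s as SMILES strings.

```python
# Imitation / hybrid reward sketch (assumptions: RDKit; features = M_s as SMILES).
from rdkit import Chem
from rdkit.Chem import AllChem, DataStructs

def ecfp(smiles):
    return AllChem.GetMorganFingerprintAsBitVect(Chem.MolFromSmiles(smiles), 2, nBits=2048)

def imitation_reward(generated_smiles, features, k1=0.25):
    """Ratio of structural features whose fingerprint similarity to the
    generated molecule exceeds the hyperparameter k1."""
    fp_gen = ecfp(generated_smiles)
    fps = [ecfp(m) for m in features if Chem.MolFromSmiles(m) is not None]
    hits = sum(DataStructs.TanimotoSimilarity(fp_gen, fp) > k1 for fp in fps)
    return hits / max(len(fps), 1)

def hybrid_reward(task_reward, generated_smiles, features, lam=0.5):
    # r = r_task + lambda * r_imitation, lambda in [0, 1]
    return task_reward + lam * imitation_reward(generated_smiles, features)
```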
The method for training the drug molecule generator based on the domain knowledge and the deep reinforcement learning comprises the following steps
(1) Based on the drug molecule generator, the pharmacophore in the active pharmacophore molecular group is randomly inserted into the generation process of the molecule by utilizing curriculum learning under a dynamic strategy to generate the pharmacophore structural molecule with a specific target.
Randomly selecting active pharmacophore molecules from the active pharmacophore molecule group according to the probability P of each active pharmacophore molecule as an initial state of reinforcement learning; thus, the drug molecule generator does not interact only with one active pharmacophore molecule, but with a plurality of molecules of the group consisting of the active pharmacophore molecules.
Each active pharmacophore molecule in the initial state carries a probability P, and the interaction of the drug molecule generator with the initial state is adjusted by controlling the size of P. During training, the probability P is dynamically adjusted according to the reward obtained by the drug molecule generator, i.e. the probability P of an active pharmacophore molecule is adjusted based on the reward value obtained from the hybrid reward function: if the generator quickly meets the reward requirement (the preset convergence condition is reached within a preset iteration threshold), the probability P is reduced quickly (lowered by a preset first adjustment value); otherwise, if the generator cannot quickly meet the reward requirement (the preset convergence condition is not reached within the preset iteration threshold), the probability P is reduced slowly (lowered by a preset second adjustment value, the second adjustment value being smaller than the first adjustment value).
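A minimal sketch of this dynamic adjustment of the guidance probability P; the concrete decrement values and the convergence test are illustrative assumptions.

```python
# Curriculum sketch: lower P quickly when the reward requirement is met within the
# iteration budget, slowly otherwise (second adjustment value < first adjustment value).
def update_guidance_prob(P, converged_quickly, first_adj=0.10, second_adj=0.02, P_min=0.0):
    step = first_adj if converged_quickly else second_adj
    return max(P - step, P_min)
```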
(2) Calculating the reward value of the acquired pharmacophore structural molecules with specific targets by using the pre-constructed hybrid reward function.
(3) Maximizing the hybrid reward function by a reinforcement learning method to obtain the trained drug molecule generator.
The parameters θ* that maximize the hybrid reward function are given by
θ* = argmax_θ E_{τ~p_θ(τ)} [ Σ_t ( γ^t·r(s_t, a_t) - b ) ]
where τ ~ p_θ(τ) denotes a round (trajectory) whose actions occur with probability p_θ(τ), E[·] is the expectation over rounds sampled with this probability, γ^t·r(s_t, a_t) is the discounted reward at time t, and b is the error term.
To make the molecules produced by the drug molecule generator more diverse, an entropy constraint is applied, and the formula for maximizing the hybrid reward function becomes
θ* = argmax_θ E_{τ~p_θ(τ)} [ Σ_t ( γ^t·r(s_t, a_t) - b + α·H(a_t|s_t) ) ]
where α is a hyperparameter and H(a_t|s_t) is the probability of action a_t in state s_t.
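A minimal sketch of the corresponding policy-gradient update with the entropy term, assuming PyTorch; the per-step log-probabilities and entropies are assumed to be collected while sampling one SMILES string, and the baseline b, discount γ and weight α are illustrative values.

```python
# REINFORCE-with-entropy sketch (assumption: PyTorch; minimize the negative objective).
import torch

def policy_gradient_loss(log_probs, entropies, reward, b=0.0, gamma=0.97, alpha=0.01):
    """log_probs, entropies: lists of 0-dim tensors collected during sampling;
    reward: scalar hybrid reward for the finished molecule."""
    R = reward - b
    loss = torch.zeros(())
    for t, (lp, ent) in enumerate(zip(log_probs, entropies)):
        loss = loss - (gamma ** t) * R * lp - alpha * ent   # maximize reward + entropy
    return loss

# Typical use: loss = policy_gradient_loss(...); loss.backward(); optimizer.step()
```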
Fig. 3 is a schematic diagram of a method for constructing an active pharmacophore molecular cluster by using domain knowledge in an embodiment of the present invention, which extracts active drug molecular fragments from an existing sample, obtains atomic groups with similar characteristics by combining the domain knowledge, establishes the atomic groups into atomic group characteristic clusters, and establishes a pharmacophore cluster by using the atomic group characteristic clusters and the pharmacophore.
Fig. 4 is a schematic diagram of a method for establishing a relationship network by using domain knowledge in an embodiment of the present invention. The diagram contains two pharmacophores (pharmacophore 1 and pharmacophore 2); Frag1 and Frag2 are, respectively, fragment 1 and fragment 2 that do not contain a pharmacophore; dFrag1, dFrag2, dFrag3 and dFrag4 are fragments 1, 2, 3 and 4 that contain a common molecular part; and mF is a complete molecule with an intact structure.
The invention provides a medicine molecule generating method based on domain knowledge and deep reinforcement learning, which generates a specific pharmacophore molecular structure based on a trained medicine molecule generator obtained by the medicine molecule generator training system and method based on the domain knowledge and the deep reinforcement learning.
It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes and related descriptions of the training method for a drug molecule generator based on domain knowledge and deep reinforcement learning, and the method for generating a drug molecule based on domain knowledge and deep reinforcement learning described above may refer to the corresponding processes in the foregoing system embodiments, and are not described herein again.
It should be noted that, the medicine molecule generator training system based on domain knowledge and deep reinforcement learning provided in the foregoing embodiment is only illustrated by the division of the above functional modules, and in practical applications, the above functions may be allocated to different functional modules according to needs, that is, the modules or steps in the embodiment of the present invention are further decomposed or combined, for example, the modules in the foregoing embodiment may be combined into one module, or may be further split into multiple sub-modules, so as to complete all or part of the functions described above. The names of the modules and steps involved in the embodiments of the present invention are only for distinguishing the modules or steps, and are not to be construed as unduly limiting the present invention.
An apparatus of a third embodiment of the invention comprises:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the processor for execution by the processor to implement the method of drug molecule generator training based on domain knowledge and deep reinforcement learning described above.
A computer-readable storage medium of a fourth embodiment of the present invention stores computer instructions for execution by the computer to implement the method for training a drug molecule generator based on domain knowledge and deep reinforcement learning described above.
It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes and related descriptions of the storage device and the processing device described above may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication section, and/or installed from a removable medium. The computer program, when executed by a Central Processing Unit (CPU), performs the above-described functions defined in the method of the present application. It should be noted that the computer readable medium mentioned above in the present application may be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present application, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In this application, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations for aspects of the present application may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C + + or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The terms "first," "second," and the like are used for distinguishing between similar elements and not necessarily for describing or implying a particular order or sequence.
The terms "comprises," "comprising," or any other similar term are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.
So far, the technical solutions of the present invention have been described in connection with the preferred embodiments shown in the drawings, but it is easily understood by those skilled in the art that the scope of the present invention is obviously not limited to these specific embodiments. Equivalent changes or substitutions of related technical features can be made by those skilled in the art without departing from the principle of the invention, and the technical scheme after the changes or substitutions can fall into the protection scope of the invention.

Claims (10)

1. A drug molecule generator training method based on domain knowledge and deep reinforcement learning, characterized by comprising the following steps:
Based on the drug molecule generator, by utilizing curriculum-based learning under a dynamic strategy, the pharmacophore in the active pharmacophore molecular group is randomly inserted into the generation process of the molecule to generate a pharmacophore structural molecule with a specific target;
calculating the reward value of the acquired pharmacophore structural molecules with the specific target by using a pre-constructed hybrid reward function; the hybrid reward function is constructed based on a task reward function and an imitation reward function;
a reinforcement learning method is utilized, a hybrid reward function is maximized, and a trained drug molecule generator is obtained;
wherein:
the active pharmacophore molecular group is constructed on the basis of a given drug sample by utilizing domain knowledge;
the character strings output by the drug molecule generator conform to the SMILES syntax for drug character strings.
2. The method for training a drug molecule generator based on domain knowledge and deep reinforcement learning of claim 1, wherein the method for obtaining the molecular structure group of the active pharmacophore is as follows:
step S110, extracting a pharmacophore from a given drug sample;
step S120, splitting the pharmacophore into a plurality of atomic groups by utilizing chemical bond connection;
step S130, obtaining atoms or elements with the same property from the domain knowledge;
step S140, replacing a plurality of atomic groups of the pharmacophore molecular structure with atoms or elements from domain knowledge;
s150, carrying out chemical activity comparison on the new medicinal effect group molecules obtained after replacement and the original medicinal effect group molecules, and selecting the new medicinal effect group molecules with the chemical activity difference smaller than a preset value and the original medicinal effect group molecules to form a medicinal effect group molecular group;
step S160, selecting new pharmacophore molecules and original pharmacophore molecules to form an active pharmacophore molecular group if the chemical activity difference of the new pharmacophore molecules and the original pharmacophore molecules is not large after comparison, and establishing a relationship network;
step S170, repeating the above steps, and expanding the active pharmacophore molecular group.
3. The method for training a drug molecule generator based on domain knowledge and deep reinforcement learning of claim 1, wherein the drug molecule generator is constructed based on RNN network, and is trained based on training samples constructed from character string data of drug compounds;
the training samples comprise input samples and output samples; the output sample is a character in the string of pharmaceutical compounds and the corresponding input sample is the string of pharmaceutical compounds preceding the character.
4. The method for training a drug molecule generator based on domain knowledge and deep reinforcement learning of claim 3, wherein based on a stack-memory mechanism provided in the drug molecule generator, in the process of generating the drug compound character string, an existing character string sequence is used as a prefix segment, and the generation of the next character is performed cyclically by the drug molecule generator until a complete drug compound character string is obtained.
5. The method for training a drug molecule generator based on domain knowledge and deep reinforcement learning of claim 1, wherein the method for randomly inserting the pharmacophore into the generation process of the molecule comprises:
randomly selecting active pharmacophore molecules from the active pharmacophore molecule group according to the probability P of each active pharmacophore molecule as an initial state of reinforcement learning;
adjusting the probability P based on the reward value obtained from the hybrid reward function: if the generator reaches the preset convergence condition within a preset iteration threshold, the probability P is lowered by a preset first adjustment value; and if the generator does not reach the preset convergence condition within the preset iteration threshold, the probability P is lowered by a preset second adjustment value, the second adjustment value being smaller than the first adjustment value.
6. The method of claim 1, wherein the hybrid reward function is
r(s_t, a_t) = r_task(s_t, a_t) + λ·r_imitation(s_t, a_t),  λ ∈ [0, 1]
where r(s_t, a_t) is the hybrid reward function when the state is s_t and the action is a_t; r_task(s_t, a_t) is the task reward function, whose reward is obtained when the generated pharmacophore structural molecules are active; r_imitation(s_t, a_t) is the imitation reward function, whose reward is obtained when the similarity between the generated pharmacophore structural molecules and the active pharmacophore molecule selected as the initial state is greater than a set threshold.
7. The method of claim 1, wherein the loss function used to train for the optimal parameters θ* is
θ* = argmax_θ E_{τ~p_θ(τ)} [ Σ_t ( γ^t·r(s_t, a_t) - b + α·H(a_t|s_t) ) ]
where E[·] denotes the expectation over each round (a trajectory τ sampled with probability p_θ(τ)), γ^t·r(s_t, a_t) is the discounted reward at time t, b is the error term, α is a hyperparameter, and H(a_t|s_t) is the probability of action a_t in state s_t.
8. A drug molecule generator training system based on domain knowledge and deep reinforcement learning is characterized by comprising
A pharmacophore storage unit configured to store a group of active pharmacophore molecular structures; the active pharmacophore molecular structure group is constructed on the basis of a given drug sample by utilizing domain knowledge;
a drug molecule generator configured to randomly insert pharmacophores from the active pharmacophore molecular group into the molecule generation process based on curriculum learning under a dynamic strategy, so as to generate pharmacophore structural molecules with a specific target; the character strings output by the drug molecule generator conform to the SMILES syntax for drug character strings;
a predictor configured to perform chemical activity judgment on the generated pharmacophore structure molecule;
the hybrid reward calculation unit is configured to calculate the reward value of the acquired pharmacophore structural molecules with specific targets based on a pre-constructed hybrid reward function; the hybrid reward function is constructed based on a task reward function and an imitation reward function.
9. An apparatus, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the processor for execution by the processor to implement the domain knowledge and deep reinforcement learning based drug molecule generator training method of any one of claims 1-7.
10. A computer-readable storage medium storing computer instructions for execution by the computer to implement the domain knowledge and deep reinforcement learning-based drug molecule generator training method of any one of claims 1-7.
CN202110496113.5A 2021-05-07 2021-05-07 Medicine molecular generator training method based on domain knowledge and deep reinforcement learning Active CN113223637B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110496113.5A CN113223637B (en) 2021-05-07 2021-05-07 Medicine molecular generator training method based on domain knowledge and deep reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110496113.5A CN113223637B (en) 2021-05-07 2021-05-07 Medicine molecular generator training method based on domain knowledge and deep reinforcement learning

Publications (2)

Publication Number Publication Date
CN113223637A true CN113223637A (en) 2021-08-06
CN113223637B CN113223637B (en) 2023-07-25

Family

ID=77091711

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110496113.5A Active CN113223637B (en) 2021-05-07 2021-05-07 Medicine molecular generator training method based on domain knowledge and deep reinforcement learning

Country Status (1)

Country Link
CN (1) CN113223637B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114974461A (en) * 2022-06-15 2022-08-30 烟台国工智能科技有限公司 Multi-target attribute molecule generation method and system based on strategy learning

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170161635A1 (en) * 2015-12-02 2017-06-08 Preferred Networks, Inc. Generative machine learning systems for drug design
WO2019018780A1 (en) * 2017-07-20 2019-01-24 The University Of North Carolina At Chapel Hill Methods, systems and non-transitory computer readable media for automated design of molecules with desired properties using artificial intelligence
CN110534164A (en) * 2019-09-26 2019-12-03 广州费米子科技有限责任公司 Drug molecule generation method based on deep learning
CN111508568A (en) * 2020-04-20 2020-08-07 腾讯科技(深圳)有限公司 Molecule generation method and device, computer readable storage medium and terminal equipment
CN112270951A (en) * 2020-11-10 2021-01-26 四川大学 Brand-new molecule generation method based on multitask capsule self-encoder neural network

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170161635A1 (en) * 2015-12-02 2017-06-08 Preferred Networks, Inc. Generative machine learning systems for drug design
WO2019018780A1 (en) * 2017-07-20 2019-01-24 The University Of North Carolina At Chapel Hill Methods, systems and non-transitory computer readable media for automated design of molecules with desired properties using artificial intelligence
US20200168302A1 (en) * 2017-07-20 2020-05-28 The University Of North Carolina At Chapel Hill Methods, systems and non-transitory computer readable media for automated design of molecules with desired properties using artificial intelligence
CN110534164A (en) * 2019-09-26 2019-12-03 广州费米子科技有限责任公司 Drug molecule generation method based on deep learning
CN111508568A (en) * 2020-04-20 2020-08-07 腾讯科技(深圳)有限公司 Molecule generation method and device, computer readable storage medium and terminal equipment
CN112270951A (en) * 2020-11-10 2021-01-26 四川大学 Brand-new molecule generation method based on multitask capsule self-encoder neural network

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
CHEN Xin et al., "Research progress in drug representation learning", Journal of Tsinghua University (Science and Technology) *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114974461A (en) * 2022-06-15 2022-08-30 烟台国工智能科技有限公司 Multi-target attribute molecule generation method and system based on strategy learning

Also Published As

Publication number Publication date
CN113223637B (en) 2023-07-25

Similar Documents

Publication Publication Date Title
CN109496320B (en) Artificial intelligence engine with architect module
US20210103823A1 (en) Training neural networks using a variational information bottleneck
CN111078836A (en) Machine reading understanding method, system and device based on external knowledge enhancement
KR20200014510A (en) Method for providing prediction service based on mahcine-learning and apparatus thereof
EP3568811A1 (en) Training machine learning models
CN110659678B (en) User behavior classification method, system and storage medium
CN111340221B (en) Neural network structure sampling method and device
CN112083933A (en) Service function chain deployment method based on reinforcement learning
US20190295688A1 (en) Processing biological sequences using neural networks
US9536206B2 (en) Method and apparatus for improving resilience in customized program learning network computational environments
Meshram et al. College enquiry chatbot using rasa framework
CN113778871A (en) Mock testing method, device, equipment and storage medium
CN112925926B (en) Training method and device of multimedia recommendation model, server and storage medium
Sayed et al. DiSH simulator: Capturing dynamics of cellular signaling with heterogeneous knowledge
CN113223637B (en) Medicine molecular generator training method based on domain knowledge and deep reinforcement learning
WO2021117180A1 (en) Dialog processing device, learning device, dialog processing method, learning method, and program
Mehmood et al. From information society to knowledge society: The Asian perspective
CN110955765A (en) Corpus construction method and apparatus of intelligent assistant, computer device and storage medium
CN116964678A (en) Prediction of protein amino acid sequence using generative model conditioned on protein structure intercalation
CN111090740B (en) Knowledge graph generation method for dialogue system
Rohr Discrete-time leap method for stochastic simulation
CN110580528A (en) network model generation method and device
US20140250034A1 (en) Method and apparatus for improving resilience in customized program learning network computational environments
CN115496431A (en) Order form and transport capacity matching method and device and electronic equipment
Bompiani et al. High-performance computing with terastat

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant