CN113223637B - Medicine molecular generator training method based on domain knowledge and deep reinforcement learning - Google Patents


Info

Publication number
CN113223637B
CN113223637B (application CN202110496113.5A)
Authority
CN
China
Prior art keywords
pharmacophore
molecule
active
generator
function
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110496113.5A
Other languages
Chinese (zh)
Other versions
CN113223637A (en)
Inventor
黄向生
蔡金儒
Current Assignee
Institute of Automation of Chinese Academy of Science
Original Assignee
Institute of Automation of Chinese Academy of Science
Priority date
Filing date
Publication date
Application filed by Institute of Automation of Chinese Academy of Science filed Critical Institute of Automation of Chinese Academy of Science
Priority: CN202110496113.5A
Publication of CN113223637A
Application granted
Publication of CN113223637B


Classifications

    • G - PHYSICS
    • G16 - INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16C - COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
    • G16C20/00 - Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
    • G16C20/50 - Molecular design, e.g. of drugs
    • G16C20/70 - Machine learning, data mining or chemometrics


Abstract

The invention belongs to the field of drug molecule generation, and in particular relates to a training method for a drug molecule generator based on domain knowledge and deep reinforcement learning, aiming to solve the problem of limited samples when generating drug molecules by deep learning. The invention includes: constructing an active pharmacophore molecular group using domain knowledge; using curriculum learning under a dynamic strategy to randomly insert pharmacophores from the active pharmacophore molecular group into the molecule-generation process, producing pharmacophore-structure molecules with a specific target; and maximizing a hybrid reward function with a reinforcement learning method to obtain a trained drug molecule generator. The invention solves the problem of an overly small sample size, prevents the loss of molecular diversity caused by excessive imitation learning, and lets the generated molecules converge more quickly to a pharmacophore structure with a specific target.

Description

Medicine molecular generator training method based on domain knowledge and deep reinforcement learning
Technical Field
The invention belongs to the field of drug molecule generation, and particularly relates to a training method of a drug molecule generator based on domain knowledge and deep reinforcement learning.
Background
In recent years, deep learning has been widely used in computational chemistry; for example, many studies have demonstrated the feasibility of designing new compounds with desirable chemical properties by deep learning. Although deep learning methods perform well at generating compounds with target chemical properties, they typically require a large number of samples. Such samples are hard to obtain, however, because the number of discovered drugs is small, which poses a challenge to common deep learning methods. How to accomplish drug design under limited-sample conditions is therefore an important problem. We propose a reinforcement learning method that combines domain knowledge with prior knowledge: it integrates domain knowledge into the training process of the original method and, for the first time, uses various generalized equivalent features in curriculum learning.
Disclosure of Invention
In order to solve the above problem in the prior art, namely the limited-sample problem in generating drug molecules by deep learning, the invention provides a training method for a drug molecule generator based on domain knowledge and deep reinforcement learning, comprising:
based on a drug molecule generator, randomly inserting pharmacophores from an active pharmacophore molecular group into the molecule-generation process using curriculum learning under a dynamic strategy, generating pharmacophore-structure molecules with a specific target;
calculating the reward value of the obtained pharmacophore-structure molecules with a specific target using a pre-constructed hybrid reward function; the hybrid reward function is constructed from a task reward function and an imitation reward function;
maximizing the hybrid reward function with a reinforcement learning method to obtain a trained drug molecule generator;
wherein the active pharmacophore molecular group is constructed from a given drug sample using domain knowledge, and the string output by the drug molecule generator follows the grammar of the SMILES string format.
In some preferred embodiments, the active pharmacophore molecular structure group is obtained as follows:
step S110, extracting a pharmacophore from a given drug sample;
step S120, splitting the pharmacophore into several atomic groups along its chemical-bond connections;
step S130, obtaining atoms or elements with equivalent properties from domain knowledge;
step S140, replacing atomic groups of the pharmacophore molecular structure with those atoms or elements;
step S150, comparing the chemical activity of each replaced new pharmacophore molecule with that of the original pharmacophore molecule, and keeping the new molecules whose chemical activity differs from the original by less than a preset value;
step S160, combining the retained new pharmacophore molecules with the original pharmacophore molecule to form an active pharmacophore molecular group, and establishing a relation network;
step S170, repeating the above steps to expand the active pharmacophore molecular group.
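Steps S110 to S170 can be sketched as follows. This is a minimal illustrative sketch, not the patent's implementation: the single-character substitution table standing in for "domain knowledge", the `activity_difference` stub, and all names are assumptions.

```python
# Hypothetical sketch of steps S110-S170: expand an active pharmacophore group
# by swapping chemically similar atoms (step S140) and keeping variants whose
# predicted activity stays close to the seed's (step S150).
SIMILAR_ATOMS = {"O": ["S"], "S": ["O"], "N": ["O"]}  # toy domain-knowledge table

def substitute_variants(pharmacophore: str) -> list:
    """Generate single-substitution variants of a pharmacophore SMILES string."""
    variants = []
    for i, ch in enumerate(pharmacophore):
        for repl in SIMILAR_ATOMS.get(ch, []):
            variants.append(pharmacophore[:i] + repl + pharmacophore[i + 1:])
    return variants

def expand_group(seed: str, activity_difference, threshold: float = 0.1) -> list:
    """Keep the seed plus variants whose activity difference is below the preset value."""
    group = [seed]
    for v in substitute_variants(seed):
        if activity_difference(seed, v) < threshold:
            group.append(v)
    return group

# Example: pretend every substitution preserves activity
group = expand_group("c1ccccc1O", lambda a, b: 0.0)
```

A real implementation would parse SMILES with a cheminformatics toolkit rather than treat strings character-by-character, and would use a trained activity predictor for step S150.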
In some preferred embodiments, the drug molecule generator is built on an RNN and trained on samples constructed from drug-compound string data;
the training samples comprise input samples and output samples: each output sample is one character of a drug-compound string, and the corresponding input sample is the substring preceding that character.
In some preferred embodiments, using the stack-memory mechanism provided in the drug molecule generator, the existing string sequence serves as a prefix fragment during generation, and the generator produces the next character in a loop until the complete drug-compound string is obtained.
In some preferred embodiments, the random insertion of pharmacophores into the molecule-generation process works as follows:
an active pharmacophore molecule is randomly selected from the active pharmacophore molecular group according to each molecule's probability P and used as the initial state for reinforcement learning;
the probability P of the active pharmacophore molecule is then adjusted based on the reward value from the hybrid reward function: if the generator reaches a preset convergence condition within a preset iteration-count threshold, P is lowered by a preset first adjustment value; otherwise, if the generator does not reach the convergence condition within the threshold, P is lowered by a preset second adjustment value, where the second adjustment value is smaller than the first.
In some preferred embodiments, the hybrid reward function is

$$r(s_{t}, a_{t}) = r_{task}(s_{t}, a_{t}) + \lambda \cdot r_{imitation}(s_{t}, a_{t}), \quad \lambda \in [0, 1]$$

where $r(s_{t}, a_{t})$ is the hybrid reward for action $a_{t}$ in state $s_{t}$; $r_{task}(s_{t}, a_{t})$ is the task reward function, which grants a reward when the generated pharmacophore-structure molecule is active; and $r_{imitation}(s_{t}, a_{t})$ is the imitation reward function, granted when the similarity between the generated pharmacophore-structure molecule and the active pharmacophore molecule selected as the initial state exceeds a set threshold.
In some preferred embodiments, the training objective $\theta^{*}$ is

$$\theta^{*} = \arg\max_{\theta}\, \mathbb{E}_{\tau \sim p_{\theta}(\tau)}\Big[\sum_{t} \gamma^{t} r_{t} - b\Big] + \alpha H(a_{t} \mid s_{t})$$

where $\mathbb{E}[\cdot]$ denotes the expectation over each round, $\gamma^{t} r_{t}$ is the discounted reward at time $t$, $b$ is the error (baseline) term, $\alpha$ is a hyper-parameter, and $H(a_{t} \mid s_{t})$ is the entropy term over actions $a_{t}$ in state $s_{t}$.
In a second aspect of the present invention, a training system for a drug molecule generator based on domain knowledge and deep reinforcement learning is provided, comprising:
a pharmacophore storage unit configured to store active pharmacophore molecular structure groups; the active pharmacophore molecular structure group is constructed from a given drug sample using domain knowledge;
a drug molecule generator configured to randomly insert pharmacophores from the active pharmacophore molecular group into the molecule-generation process based on curriculum learning under a dynamic strategy, generating pharmacophore-structure molecules with a specific target; the string output by the drug molecule generator follows the grammar of the SMILES string format;
a predictor configured to determine the chemical activity of the generated pharmacophore-structure molecule;
a hybrid reward calculation unit configured to calculate the reward value of the obtained pharmacophore-structure molecules with a specific target based on a pre-constructed hybrid reward function; the hybrid reward function is constructed from a task reward function and an imitation reward function.
In a third aspect of the invention, an apparatus is presented comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the processor for execution by the processor to implement the domain knowledge and deep reinforcement learning based drug molecule generator training method described above.
In a fourth aspect of the present invention, a computer readable storage medium is provided, wherein the computer readable storage medium stores computer instructions for execution by the computer to implement the above-described domain knowledge and deep reinforcement learning based drug molecule generator training method.
The invention has the beneficial effects that:
(1) The invention builds a group of specific pharmacophore structures from domain knowledge, solving the problem of an overly small sample size.
(2) A guidance probability P is introduced into curriculum learning under a dynamic strategy to control how often pharmacophore fragments are used during training, preventing the loss of molecular diversity caused by excessive imitation learning.
(3) A hybrid reward function is constructed that grants an intermediate reward for generating molecules containing structures similar to active pharmacophores, increasing the frequency of such structures and letting the generated molecules converge more quickly to a pharmacophore structure with a specific target.
Drawings
Other features, objects and advantages of the present application will become more apparent upon reading the following detailed description of non-limiting embodiments, made with reference to the accompanying drawings:
FIG. 1 is a schematic flow diagram of a method for training a drug molecule generator based on domain knowledge and deep reinforcement learning in accordance with one embodiment of the present invention;
FIG. 2 is a schematic diagram of a drug molecule generator training system framework based on domain knowledge and deep reinforcement learning in accordance with one embodiment of the present invention;
FIG. 3 is a schematic diagram of a method for constructing groups of active pharmacophore molecules using domain knowledge in one embodiment of the invention;
FIG. 4 is a schematic diagram of a method for building a relationship network using domain knowledge in one embodiment of the invention.
Detailed Description
The present application is described in further detail below with reference to the drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention. It should be noted that, for convenience of description, only the portions related to the present invention are shown in the drawings.
It should be noted that, where no conflict arises, the embodiments and the features in the embodiments may be combined with one another. The present application will be described in detail below with reference to the accompanying drawings and in conjunction with the embodiments.
The invention provides a training method for a drug molecule generator based on domain knowledge and deep reinforcement learning, as shown in figure 1, comprising:
based on a drug molecule generator, randomly inserting pharmacophores from an active pharmacophore molecular group into the molecule-generation process using curriculum learning under a dynamic strategy, generating pharmacophore-structure molecules with a specific target;
calculating the reward value of the obtained pharmacophore-structure molecules with a specific target using a pre-constructed hybrid reward function; the hybrid reward function is constructed from a task reward function and an imitation reward function;
maximizing the hybrid reward function with a reinforcement learning method to obtain a trained drug molecule generator;
wherein the active pharmacophore molecular group is constructed from a given drug sample using domain knowledge, and the string output by the drug molecule generator follows the grammar of the SMILES string format.
In order to more clearly illustrate the present invention, the following detailed description of the various parts of the embodiments of the present invention will be provided with reference to the accompanying drawings. In order to make the description of the technical solution clearer, the following description will be given of the constitution of the training system, and then the training method will be described.
The training system for a drug molecule generator based on domain knowledge and deep reinforcement learning according to the first embodiment of the present invention, as shown in fig. 2, includes a pharmacophore storage unit, a drug molecule generator (abbreviated as generator in fig. 2), a predictor, and a hybrid reward calculation unit (abbreviated as hybrid reward in fig. 2).
1. Pharmacophore storage unit
A pharmacophore storage unit configured to store active pharmacophore molecular structure groups (active pharmacophore groups in fig. 2); the active pharmacophore molecular structure group is constructed based on a given drug sample using domain knowledge.
The method for obtaining the active pharmacophore molecular structure group comprises the following steps:
step S110, extracting a pharmacophore from a given drug sample;
step S120, splitting the pharmacophore into several atomic groups along its chemical-bond connections;
step S130, obtaining atoms or elements with equivalent properties from domain knowledge;
step S140, replacing atomic groups of the pharmacophore molecular structure with those atoms or elements;
step S150, comparing the chemical activity of each replaced new pharmacophore molecule with that of the original pharmacophore molecule, and keeping the new molecules whose chemical activity differs from the original by less than a preset value;
step S160, combining the retained new pharmacophore molecules with the original pharmacophore molecule to form an active pharmacophore molecular group, and establishing a relation network;
step S170, repeating the above steps to expand the active pharmacophore molecular group (steps S110-S160 are applied to different drug samples of the same drug type to obtain the pharmacophore molecular group for that type).
2. Drug molecule generator
A drug molecule generator configured to randomly insert pharmacophores from the active pharmacophore molecular group into the molecule-generation process based on curriculum learning under a dynamic strategy, generating pharmacophore-structure molecules with a specific target; the string generated by the drug molecule generator follows the grammar of the SMILES (Simplified Molecular Input Line Entry Specification) string format.
The drug molecule generator G is built on an RNN, and in some embodiments may adopt a Stack-RNN model (stack-based recurrent neural network). Before the domain-knowledge and deep-reinforcement-learning training of this invention, the generator G must first be trained to learn the grammar of SMILES strings.
In this embodiment, the ChEMBL21 database, containing about 1.5 million drug-compound strings, is used as the training set for the drug molecule generator G, so that the trained generator can produce a complete drug-compound string from an input prefix-fragment string.
To train the generator G on this database, training samples are first constructed: drug-compound strings are taken from ChEMBL21, and an input sample and an output sample are derived from each; the output sample is one character of the drug-compound string (the output character), and the corresponding input sample is the substring preceding that character (the prefix input).
The training method comprises the following steps:
the drug molecule generator G obtains probability distribution of the next character based on the input sample;
predicting the next character according to the obtained probability distribution, and taking the next character as a predicted character;
based on the predicted character and the output character corresponding to the input sample, a cross entropy loss function is calculated and the parameters in the drug molecule generator G are updated.
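The training steps above can be illustrated with a deliberately simplified stand-in: a single linear layer (instead of the Stack-RNN) maps the last prefix character to a distribution over the next SMILES character, which is scored with cross-entropy. The toy vocabulary and random weights are assumptions for illustration only.

```python
# Simplified sketch of one scoring step for generator G: predict a
# distribution over the next SMILES character and compute the cross-entropy
# loss against the true next character.
import numpy as np

VOCAB = list("CNOcn1()=#")           # toy SMILES alphabet
V = len(VOCAB)
rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(V, V))  # stand-in for the generator's output layer

def softmax(z):
    z = z - z.max()                  # numerical stability
    e = np.exp(z)
    return e / e.sum()

def next_char_loss(prefix_last: str, target: str) -> float:
    """Cross-entropy of the true next character given the last prefix character."""
    x = np.zeros(V)
    x[VOCAB.index(prefix_last)] = 1.0   # one-hot encoding of the input character
    probs = softmax(W @ x)              # predicted distribution over the next character
    return -float(np.log(probs[VOCAB.index(target)]))

loss = next_char_loss("C", "1")
```

In actual training this loss would be backpropagated through the recurrent network to update its parameters.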
In this embodiment, the next character is predicted from the obtained probability distribution. The hidden state is computed as

$$h_{t} = \sigma(W_{i} x_{t} + W_{h} h_{t-1})$$

where $h_{t}$ is the hidden-layer vector, $h_{t-1}$ is the vector from the previous time step, $W_{i}$ holds the input-layer parameters, $W_{h}$ holds the hidden-layer parameters, $\sigma$ is the activation function, and $x_{t}$ is the input at the current time step.
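One step of that recurrence can be sketched as follows, using a sigmoid activation; the dimensions and random weights are illustrative assumptions.

```python
# One step of h_t = sigma(W_i x_t + W_h h_{t-1}) with a sigmoid activation.
import numpy as np

def rnn_step(x_t, h_prev, W_i, W_h):
    """New hidden state from the current input and the previous hidden state."""
    return 1.0 / (1.0 + np.exp(-(W_i @ x_t + W_h @ h_prev)))

rng = np.random.default_rng(1)
x_t = rng.normal(size=4)                 # current input vector
h = rnn_step(x_t, np.zeros(3),
             rng.normal(size=(3, 4)),    # W_i: input-layer parameters
             rng.normal(size=(3, 3)))    # W_h: hidden-layer parameters
```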
In this embodiment, the drug molecule generator G introduces a stack memory (stack-memory mechanism) to hold and transfer hidden-layer information at each time step:

$$s_{t}[0] = a_{t}[\mathrm{PUSH}]\,\sigma(D h_{t}) + a_{t}[\mathrm{POP}]\,s_{t-1}[1] + a_{t}[\mathrm{NO\text{-}OP}]\,s_{t-1}[0]$$

where $D$ is a $1 \times m$ matrix; $a_{t} = [a_{t}[\mathrm{PUSH}], a_{t}[\mathrm{POP}], a_{t}[\mathrm{NO\text{-}OP}]]$ is the vector of stack-control variables; $s_{t}[0]$ is the top stack element at time $t$; $s_{t-1}[0]$ and $s_{t-1}[1]$ are the top and second stack elements at the previous time step; $\sigma(D h_{t})$ is the candidate value to push; and $a_{t}[\mathrm{PUSH}]$, $a_{t}[\mathrm{POP}]$ and $a_{t}[\mathrm{NO\text{-}OP}]$ are the agent's push, pop and no-op actions.
If $a_{t}[\mathrm{POP}]$ equals 1, the topmost element of the stack is replaced by the one below it; if $a_{t}[\mathrm{PUSH}]$ equals 1, a new value is added to the top and the remaining elements move down in the stack; if $a_{t}[\mathrm{NO\text{-}OP}]$ equals 1, the values in the stack remain unchanged.
An analogous update applies to the stack elements at depths $i > 0$:

$$s_{t}[i] = a_{t}[\mathrm{PUSH}]\,s_{t-1}[i-1] + a_{t}[\mathrm{POP}]\,s_{t-1}[i+1] + a_{t}[\mathrm{NO\text{-}OP}]\,s_{t-1}[i]$$

Thus the hidden layer $h_{t}$ can be computed as

$$h_{t} = \sigma\big(U x_{t} + R h_{t-1} + P s_{t-1}[0{:}k]\big)$$

where $s_{t-1}[0{:}k]$ are the top $k$ elements of the stack in time step $t-1$, $\sigma(\cdot)$ is the activation function, $U$ is the input parameter matrix, $R$ is the recurrent parameter matrix, $P$ is the stack read-out matrix, and $x_{t}$ is the input at the current time step.
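The soft stack update can be sketched as follows. The PUSH / POP / NO-OP gates blend three candidate stacks; gate values would normally come from a softmax over $a_t$, but here they are set by hand for illustration.

```python
# Sketch of the continuous stack update: each depth is a gated mixture of the
# pushed candidate, the element below (pop), and the unchanged element (no-op).
import numpy as np

def stack_update(stack, push, pop, no_op, new_top):
    """Apply the PUSH/POP/NO-OP mixture at every stack depth."""
    k = len(stack)
    new = np.zeros(k)
    # depth 0 (top): pushed candidate, exposed second element, or unchanged top
    new[0] = push * new_top + pop * stack[1] + no_op * stack[0]
    for i in range(1, k - 1):
        new[i] = push * stack[i - 1] + pop * stack[i + 1] + no_op * stack[i]
    new[k - 1] = push * stack[k - 2] + no_op * stack[k - 1]  # bottom: nothing below to pop up
    return new

# A hard PUSH: 0.9 lands on top and the old elements shift down
s = stack_update(np.array([0.5, 0.2, 0.0]), push=1.0, pop=0.0, no_op=0.0, new_top=0.9)
```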
In this embodiment, the trained drug molecule generator G supports:
a POP operation, which deletes the topmost element in the stack so that it is replaced by the element below it;
a PUSH operation, which pushes a new element onto the top of the stack;
a NO-OP operation, which holds the elements in the stack without performing any operation.
The method for generating a complete drug-compound string from an input prefix-fragment string is: use the existing generated sequence (the drug-molecule string generated so far) as the prefix input; predict the probability distribution of the next character; determine the generated character from that distribution; and loop, generating the next character until a complete drug-compound string is obtained.
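That generation loop can be sketched as follows. The uniform `next_char_probs` is a placeholder for the trained Stack-RNN, and the vocabulary, end token and length cap are illustrative assumptions.

```python
# Sketch of autoregressive generation: use the string generated so far as the
# prefix, sample the next character from the predicted distribution, and
# repeat until an end token or a length cap is reached.
import random

GEN_VOCAB = ["C", "N", "O", "(", ")", "=", "<END>"]

def next_char_probs(prefix: str) -> list:
    # placeholder: uniform; a real generator conditions on the prefix
    return [1.0 / len(GEN_VOCAB)] * len(GEN_VOCAB)

def generate(prefix: str, max_len: int = 40, seed: int = 0) -> str:
    rng = random.Random(seed)
    out = prefix
    while len(out) < max_len:
        ch = rng.choices(GEN_VOCAB, weights=next_char_probs(out))[0]
        if ch == "<END>":            # end token terminates the molecule
            break
        out += ch
    return out

smiles = generate("CC")
```

With the uniform placeholder the output is random; a trained generator would concentrate probability on syntactically valid SMILES continuations.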
3. Predictor(s)
A predictor configured to determine the chemical activity of the generated pharmacophore-structure molecule. The judgment compares the chemical activity of the generated pharmacophore-structure molecule with that of the original pharmacophore molecule; the molecule is judged to be chemically active when the result exceeds a set threshold.
4. Hybrid reward calculation unit
A hybrid reward calculation unit configured to calculate the reward value of the obtained pharmacophore-structure molecules with a specific target based on a pre-constructed hybrid reward function; the hybrid reward function is constructed from a task reward function and an imitation reward function.
The hybrid reward function is

$$r(s_{t}, a_{t}) = r_{task}(s_{t}, a_{t}) + \lambda \cdot r_{imitation}(s_{t}, a_{t}), \quad \lambda \in [0, 1]$$

where $r(s_{t}, a_{t})$ is the hybrid reward for action $a_{t}$ in state $s_{t}$; $r_{task}(s_{t}, a_{t})$ is the task reward function, which grants a reward when the generated pharmacophore-structure molecule is active; and $r_{imitation}(s_{t}, a_{t})$ is the imitation reward function, granted when the similarity between the generated pharmacophore-structure molecule and the active pharmacophore molecule selected as the initial state exceeds a set threshold.
(1) The task reward function is determined by the specific task; a reward is granted when the predictor judges the generated pharmacophore-structure molecule to be active.
(2) The imitation reward function grants a reward when the similarity between the generated pharmacophore-structure molecule and the active pharmacophore molecule selected as the initial state exceeds a set threshold.
Within the imitation reward function, the intersection formula

$$T(M_{1}, M_{2}) = \frac{|M_{1} \cap M_{2}|}{|M_{1} \cup M_{2}|}$$

is used to compute the similarity between molecules; the result is compared with a threshold $G$, and the imitation reward is granted if the similarity exceeds the threshold, otherwise it is not. Here $M_{1}$ and $M_{2}$ are the reward-eligible sample molecule and the newly generated molecule, respectively.
The imitation reward function can be expressed as

$$r_{imitation}(s_{T}) = \frac{k_{1}}{|M_{s}|} \sum_{i=1}^{|M_{s}|} T\big(\mathrm{ECFP}(s_{T}), \mathrm{ECFP}(m_{i})\big)$$

where $\frac{1}{|M_{s}|}\sum_{i} T(\cdot)$ represents the ratio of common features; $M_{s}$ is the set of structural features extracted from molecule $S$, and $m_{i}$ is its $i$-th structural feature; $\mathrm{ECFP}(s_{T})$ is the molecular fingerprint of state $s_{T}$, and $\mathrm{ECFP}(m_{i})$ is the molecular fingerprint of structural feature $m_{i}$; $T(\mathrm{ECFP}(s_{T}), \mathrm{ECFP}(m_{i}))$ is the similarity of the two fingerprints; and $k_{1}$ is a hyper-parameter, initially set to $k_{1} = 0.25$.
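A hedged sketch of this imitation reward: fingerprints are modeled as plain Python sets (a stand-in for ECFP bit sets), $T$ is the intersection (Tanimoto/Jaccard) formula, and the reward is the $k_1$-scaled mean similarity, granted only above a threshold. The threshold value is an illustrative assumption.

```python
# Sketch of the imitation reward: k1-scaled mean Tanimoto similarity between
# the generated molecule's fingerprint and each active structural feature's
# fingerprint, granted only above a similarity threshold.
def tanimoto(fp_a: set, fp_b: set) -> float:
    """Intersection formula: |A ∩ B| / |A ∪ B|."""
    if not fp_a and not fp_b:
        return 0.0
    return len(fp_a & fp_b) / len(fp_a | fp_b)

def imitation_reward(gen_fp: set, feature_fps: list, k1: float = 0.25,
                     threshold: float = 0.5) -> float:
    """Reward only when the mean similarity to the active features exceeds the threshold."""
    mean_sim = sum(tanimoto(gen_fp, f) for f in feature_fps) / len(feature_fps)
    return k1 * mean_sim if mean_sim > threshold else 0.0

r = imitation_reward({1, 2, 3, 4}, [{1, 2, 3}, {2, 3, 4}])  # mean similarity 0.75
```

In practice the fingerprints would be real ECFP bit vectors computed by a cheminformatics toolkit rather than toy sets.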
The training method of the drug molecule generator based on domain knowledge and deep reinforcement learning of this embodiment comprises the following steps:
(1) Based on the drug molecule generator, pharmacophores from the active pharmacophore molecular group are randomly inserted into the molecule-generation process using curriculum learning under a dynamic strategy, generating pharmacophore-structure molecules with a specific target.
An active pharmacophore molecule is randomly selected from the active pharmacophore molecular group according to each molecule's probability P and used as the initial state for reinforcement learning; the drug molecule generator thus interacts not with one active pharmacophore molecule but with many molecules of the group.
Each active pharmacophore molecule used as an initial state carries a probability P, and the generator's interaction with that initial state is adjusted by controlling the size of P. During training, P is adjusted dynamically according to the rewards the generator obtains, i.e. based on the reward value from the hybrid reward function: if the generator quickly meets the reward requirement (reaching a preset convergence condition within a preset iteration-count threshold), P is reduced quickly (by a preset first adjustment value); otherwise, if the generator cannot quickly meet the reward requirement (not reaching the convergence condition within the threshold), P is reduced slowly (by a preset second adjustment value, which is smaller than the first).
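The dynamic-curriculum rule above can be sketched as follows; the concrete adjustment values are illustrative assumptions, not taken from the patent.

```python
# Sketch of the guidance-probability update: each active pharmacophore carries
# a probability P that is lowered quickly when the generator converges within
# the iteration threshold, and slowly otherwise.
def update_guidance_p(p: float, converged_fast: bool,
                      first_adjust: float = 0.2, second_adjust: float = 0.05) -> float:
    """Lower P by the first (large) or second (small) adjustment value."""
    step = first_adjust if converged_fast else second_adjust
    return max(0.0, p - step)

p = 1.0
p = update_guidance_p(p, converged_fast=True)    # fast convergence: drop P quickly
p = update_guidance_p(p, converged_fast=False)   # slow convergence: drop P slowly
```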
(2) The reward value of the obtained pharmacophore-structure molecules with a specific target is calculated using the pre-constructed hybrid reward function.
(3) The hybrid reward function is maximized with a reinforcement learning method to obtain a trained drug molecule generator.
The formula for maximizing the hybrid reward function, yielding $\theta^{*}$, is

$$\theta^{*} = \arg\max_{\theta}\, \mathbb{E}_{\tau \sim p_{\theta}(\tau)}\Big[\sum_{t} \gamma^{t} r_{t} - b\Big]$$

where $\tau \sim p_{\theta}(\tau)$ is the probability of each action sequence within a round, $\mathbb{E}[\cdot]$ denotes the expectation over rounds under that probability, $\gamma^{t} r_{t}$ is the discounted reward at time $t$, and $b$ is the error (baseline) term.
To enrich the diversity of the molecules produced by the drug molecule generator, an entropy constraint is added, and the maximization formula becomes

$$\theta^{*} = \arg\max_{\theta}\, \mathbb{E}_{\tau \sim p_{\theta}(\tau)}\Big[\sum_{t} \gamma^{t} r_{t} - b\Big] + \alpha H(a_{t} \mid s_{t})$$

where $\alpha$ is a hyper-parameter and $H(a_{t} \mid s_{t})$ is the entropy of the distribution over actions $a_{t}$ in state $s_{t}$.
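The entropy-regularized objective can be evaluated for one sampled episode as below; the baseline and $\alpha$ values are illustrative assumptions.

```python
# Sketch of the objective for one episode: discounted return minus a baseline
# b, plus alpha times the summed policy entropy (the diversity bonus).
import numpy as np

def episode_objective(rewards, action_probs, gamma=0.99, baseline=0.0, alpha=0.01):
    """Return sum_t gamma^t r_t - b + alpha * sum_t H(pi(.|s_t))."""
    ret = sum((gamma ** t) * r for t, r in enumerate(rewards)) - baseline
    entropy = sum(-float(np.sum(p * np.log(p))) for p in action_probs)
    return ret + alpha * entropy

# Episode of 3 steps: reward only at the end, uniform 2-action policy each step
obj = episode_objective([0.0, 0.0, 1.0], [np.array([0.5, 0.5])] * 3)
```

Maximizing this objective with a policy-gradient method pushes the generator toward high-reward molecules while the entropy term keeps the output distribution from collapsing onto a single molecule.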
FIG. 3 is a schematic diagram of constructing active pharmacophore molecular groups using domain knowledge in one embodiment of the invention: active drug-molecule fragments are extracted from existing samples, atomic groups with similar characteristics are obtained by combining domain knowledge and collected into atomic-group feature groups, and these feature groups together with the pharmacophore groups are used to build a pharmacophore cluster.
FIG. 4 is a schematic diagram of establishing a relation network using domain knowledge in one embodiment of the invention. The diagram contains two pharmacophores (pharmacophore 1 and pharmacophore 2); Frag1 and Frag2 are fragments 1 and 2 without a pharmacophore, dFrag1, dFrag2, dFrag3 and dFrag4 are fragments 1 to 4 sharing common molecular parts, and mF is a complete molecule with an intact structure.
The drug molecule generation method based on domain knowledge and deep reinforcement learning generates a specific pharmacophore molecular structure using the trained drug molecule generator obtained from the training system and training method described above.
It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working process and related description of the above-described method for training a drug molecule generator based on domain knowledge and deep reinforcement learning, and the drug molecule generating method based on domain knowledge and deep reinforcement learning may refer to the corresponding process in the foregoing system embodiment, which is not repeated herein.
It should be noted that, in the training system for a drug molecular generator based on domain knowledge and deep reinforcement learning provided in the above embodiment, only the division of the above functional modules is illustrated, in practical application, the above functional allocation may be performed by different functional modules according to needs, that is, the modules or steps in the embodiment of the present invention are decomposed or combined again, for example, the modules in the embodiment may be combined into one module, or may be further split into a plurality of sub-modules to complete all or part of the functions described above. The names of the modules and steps related to the embodiments of the present invention are merely for distinguishing the respective modules or steps, and are not to be construed as unduly limiting the present invention.
An apparatus of a fourth embodiment of the present invention includes:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the processor, and the instructions are executed by the processor to implement the domain knowledge and deep reinforcement learning based drug molecule generator training method described above.
A computer readable storage medium of a fifth embodiment of the present invention stores computer instructions for execution by a computer to implement the domain knowledge and deep reinforcement learning based drug molecule generator training method described above.
It will be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working process of the storage device and the processing device described above and the related description may refer to the corresponding process in the foregoing method embodiment, which is not repeated herein.
In particular, according to embodiments of the present disclosure, the processes described above with reference to flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method shown in the flowcharts. In such embodiments, the computer program may be downloaded and installed from a network via a communication portion, and/or installed from a removable medium. The above-described functions defined in the method of the present application are performed when the computer program is executed by a Central Processing Unit (CPU). It should be noted that the computer readable medium described in the present application may be a computer readable signal medium or a computer readable storage medium, or any combination of the two. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. 
In the present application, however, a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, with computer-readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations of the present application may be written in one or more programming languages, including an object oriented programming language such as Java, smalltalk, C ++ and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The terms "first," "second," and the like, are used for distinguishing between similar objects and not for describing a particular sequential or chronological order.
The terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus.
Thus far, the technical solution of the present invention has been described in connection with the preferred embodiments shown in the drawings, but it is easily understood by those skilled in the art that the scope of protection of the present invention is not limited to these specific embodiments. Equivalent modifications and substitutions for related technical features may be made by those skilled in the art without departing from the principles of the present invention, and such modifications and substitutions will be within the scope of the present invention.

Claims (6)

1. A method for training a drug molecule generator based on domain knowledge and deep reinforcement learning, characterized by comprising the following steps:
based on a drug molecule generator, using curriculum learning under a dynamic strategy to randomly insert pharmacophores from an active pharmacophore molecule group into the molecule generation process, generating a pharmacophore structure molecule with a specific target;
the active pharmacophore molecular group is obtained by the following steps:
step S110, extracting a pharmacophore from a given drug sample;
step S120, splitting the pharmacophore into a plurality of atomic groups according to its chemical bond connections;
step S130, obtaining atoms or elements with the same properties from domain knowledge;
step S140, replacing a plurality of atomic groups of the pharmacophore molecular structure with atoms or elements from the domain knowledge;
step S150, comparing the chemical activity of the replaced new pharmacophore molecule with that of the original pharmacophore molecule, and selecting the new pharmacophore molecule with the chemical activity difference smaller than a preset value and the original pharmacophore molecule to form a pharmacophore molecule group;
step S160, if the comparison shows that the chemical activity difference between the new pharmacophore molecule and the original pharmacophore molecule is small, combining the new pharmacophore molecule with the original pharmacophore molecule to form an active pharmacophore molecule group, and establishing a relational network;
step S170, repeating the steps S110-S160, and expanding the active pharmacophore molecular group;
the process of randomly inserting pharmacophores from the active pharmacophore molecule group into molecule generation comprises the following steps:
randomly selecting active pharmacophore molecules from the active pharmacophore molecule group according to the probability P of each active pharmacophore molecule as an initial state of reinforcement learning;
the probability P of each active pharmacophore molecule is adjusted based on the reward value obtained from the hybrid reward function: if the generator reaches a preset convergence condition within a preset iteration-count threshold, the probability P is lowered by a preset first adjustment value; otherwise, if the generator does not reach the preset convergence condition within the preset iteration-count threshold, the probability P is lowered by a preset second adjustment value, the second adjustment value being smaller than the first adjustment value;
calculating the reward value of the obtained pharmacophore structure molecule with a specific target by using a pre-constructed hybrid reward function; the hybrid reward function is constructed based on a task reward function and an imitation reward function;
the hybrid reward function is:

r(s_t, a_t) = r_task(s_t, a_t) + λ * r_imitation(s_t, a_t), λ ∈ [0, 1]

wherein r(s_t, a_t) denotes the hybrid reward function for state s_t and action a_t; r_task(s_t, a_t) is the task reward function, which yields a reward when the generated pharmacophore structure molecule is active; r_imitation(s_t, a_t) is the imitation reward function, which yields a reward when the similarity between the generated pharmacophore structure molecule and the active pharmacophore molecule selected as the initial state is larger than a set threshold;
the generator is trained through the loss function θ*, wherein E denotes the expectation over each episode, γ^t is the reward at time t, b is the error term, α is a hyperparameter, and H(a_t|s_t) is the probability of action a_t in state s_t;
maximizing a mixed rewarding function by using a reinforcement learning method to obtain a trained drug molecule generator;
wherein
the active pharmacophore molecular group is constructed based on a given drug sample by using domain knowledge;
the character string output by the drug molecule generator conforms to the grammar of the pharmacophore character string SMILES.
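Steps S110 to S160 above can be sketched as a small search over atomic-group replacements. The bioisostere table, the activity oracle, and the list-of-groups encoding below are illustrative assumptions of ours, not the patent's actual domain knowledge or activity predictor.

```python
# Illustrative sketch of steps S120-S160: split a pharmacophore into atomic
# groups, swap single groups for property-equivalent atoms drawn from domain
# knowledge, and keep replacements whose chemical activity differs from the
# original by less than a preset value. All tables and functions are
# hypothetical stand-ins.

EQUIVALENT_GROUPS = {"O": ["S"], "N": ["P"], "F": ["Cl", "Br"]}  # assumed bioisosteres

def enumerate_replacements(groups):
    """Step S140: yield group lists with one group swapped for an equivalent atom."""
    for i, g in enumerate(groups):
        for alt in EQUIVALENT_GROUPS.get(g, []):
            yield groups[:i] + [alt] + groups[i + 1:]

def expand_active_group(groups, activity_fn, max_diff=0.1):
    """Steps S150-S160: keep variants whose activity difference is below max_diff."""
    base = activity_fn(groups)
    active_group = [groups]  # the original pharmacophore stays in the group
    for variant in enumerate_replacements(groups):
        if abs(activity_fn(variant) - base) < max_diff:
            active_group.append(variant)
    return active_group
```

For example, with a toy activity oracle that slightly penalizes sulfur and strongly penalizes phosphorus, an O→S swap survives the activity filter while an N→P swap is discarded, which mirrors the selection criterion of step S150.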
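The hybrid reward and the dynamic-strategy curriculum described above can be sketched as follows. The concrete reward values, similarity threshold, and adjustment step sizes are illustrative assumptions; the claim fixes only their relationships (λ ∈ [0, 1], second adjustment value smaller than the first).

```python
import random

# Sketch of the hybrid reward r(s_t, a_t) = r_task + λ · r_imitation and of
# the curriculum that samples an active pharmacophore molecule, with
# probability P each, as the initial reinforcement-learning state.

def mixed_reward(task_reward, similarity, lam=0.5, sim_threshold=0.7):
    """Hybrid reward: imitation term fires only above the similarity threshold."""
    imitation_reward = 1.0 if similarity > sim_threshold else 0.0
    return task_reward + lam * imitation_reward

def sample_initial_pharmacophore(probs):
    """Randomly pick an active pharmacophore molecule according to probability P."""
    names = list(probs)
    weights = [probs[n] for n in names]
    return random.choices(names, weights=weights, k=1)[0]

def adjust_probability(probs, name, converged, first_adj=0.2, second_adj=0.05):
    """Lower P by the larger step if the generator converged within the
    iteration budget, by the smaller step otherwise (second_adj < first_adj)."""
    step = first_adj if converged else second_adj
    probs[name] = max(probs[name] - step, 0.0)
    return probs
```

The design intent, as we read the claim, is that a pharmacophore the generator has already mastered is sampled less often, freeing training toward pharmacophores that have not yet converged.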
2. The domain knowledge and deep reinforcement learning based drug molecule generator training method of claim 1, wherein the drug molecule generator is constructed based on RNN networks and is trained based on training samples constructed from drug compound string data;
the training samples comprise input samples and output samples; the output sample is one character in the character string of the medicine compound, and the corresponding input sample is the character string of the medicine compound before the character.
3. The training method of a drug molecule generator based on domain knowledge and deep reinforcement learning according to claim 2, wherein, based on a stack-memory mechanism provided in the drug molecule generator, during generation of the drug compound character string the character sequence produced so far is used as a prefix segment, and the next character is generated iteratively by the drug molecule generator until the complete drug compound character string is obtained.
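The prefix-extension loop of claim 3 can be sketched as below. The toy next-character rule stands in for the patent's stack-augmented RNN, and the end-of-sequence marker is our assumption; only the loop structure (existing string as prefix, one character per step, stop at the end token) reflects the claim.

```python
# Minimal sketch of claim 3's generation loop: keep the string produced so
# far as a prefix and repeatedly emit the next character until an
# end-of-sequence marker appears or a length cap is reached.

END = "\n"  # assumed end-of-sequence token

def toy_next_char(prefix):
    """Hypothetical stand-in for the RNN: spell out one fixed SMILES, then stop."""
    target = "c1ccccc1"  # benzene, as a simple SMILES example
    return target[len(prefix)] if len(prefix) < len(target) else END

def generate_smiles(next_char_fn, max_len=100):
    """Iteratively extend the prefix one character at a time."""
    prefix = ""
    for _ in range(max_len):
        ch = next_char_fn(prefix)
        if ch == END:
            break
        prefix += ch
    return prefix
```

In the patent's setting, `next_char_fn` would be the trained stack-memory generator conditioned on the prefix; here it is replaced by a deterministic rule so the loop can be exercised in isolation.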
4. A drug molecule generator training system based on domain knowledge and deep reinforcement learning, characterized by comprising:
A pharmacophore storage unit configured to store active pharmacophore molecular structure groups; the active pharmacophore molecular structure group is constructed based on a given drug sample by using domain knowledge;
a drug molecule generator configured to randomly insert pharmacophores from the active pharmacophore molecule group into the molecule generation process based on curriculum learning under a dynamic strategy, to generate a pharmacophore structure molecule with a specific target; the character string output by the drug molecule generator conforms to the grammar of the pharmacophore character string SMILES;
the active pharmacophore molecular group is obtained by the following steps:
step S110, extracting a pharmacophore from a given drug sample;
step S120, splitting the pharmacophore into a plurality of atomic groups according to its chemical bond connections;
step S130, obtaining atoms or elements with the same properties from domain knowledge;
step S140, replacing a plurality of atomic groups of the pharmacophore molecular structure with atoms or elements from the domain knowledge;
step S150, comparing the chemical activity of the replaced new pharmacophore molecule with that of the original pharmacophore molecule, and selecting the new pharmacophore molecule with the chemical activity difference smaller than a preset value and the original pharmacophore molecule to form a pharmacophore molecule group;
step S160, if the comparison shows that the chemical activity difference between the new pharmacophore molecule and the original pharmacophore molecule is small, combining the new pharmacophore molecule with the original pharmacophore molecule to form an active pharmacophore molecule group, and establishing a relational network;
step S170, repeating the steps S110-S160, and expanding the active pharmacophore molecular group;
the process of randomly inserting pharmacophores from the active pharmacophore molecule group into molecule generation comprises the following steps:
randomly selecting active pharmacophore molecules from the active pharmacophore molecule group according to the probability P of each active pharmacophore molecule as an initial state of reinforcement learning;
the probability P of each active pharmacophore molecule is adjusted based on the reward value obtained from the hybrid reward function: if the generator reaches a preset convergence condition within a preset iteration-count threshold, the probability P is lowered by a preset first adjustment value; otherwise, if the generator does not reach the preset convergence condition within the preset iteration-count threshold, the probability P is lowered by a preset second adjustment value, the second adjustment value being smaller than the first adjustment value;
a predictor configured to determine chemical activity of the generated pharmacophore structural molecule;
a hybrid reward calculation unit configured to calculate the reward value of the obtained pharmacophore structure molecule with a specific target based on a pre-constructed hybrid reward function; the hybrid reward function is constructed based on a task reward function and an imitation reward function;
the hybrid reward function is:

r(s_t, a_t) = r_task(s_t, a_t) + λ * r_imitation(s_t, a_t), λ ∈ [0, 1]

wherein r(s_t, a_t) denotes the hybrid reward function for state s_t and action a_t; r_task(s_t, a_t) is the task reward function, which yields a reward when the generated pharmacophore structure molecule is active; r_imitation(s_t, a_t) is the imitation reward function, which yields a reward when the similarity between the generated pharmacophore structure molecule and the active pharmacophore molecule selected as the initial state is larger than a set threshold;
the generator is trained through the loss function θ*, wherein E denotes the expectation over each episode, γ^t is the reward at time t, b is the error term, α is a hyperparameter, and H(a_t|s_t) is the probability of action a_t in state s_t.
5. An apparatus, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the processor for execution by the processor to implement the domain knowledge and deep reinforcement learning based drug molecule generator training method of any one of claims 1-3.
6. A computer readable storage medium storing computer instructions for execution by the computer to implement the domain knowledge and deep reinforcement learning based drug molecule generator training method of any one of claims 1-3.
CN202110496113.5A 2021-05-07 2021-05-07 Medicine molecular generator training method based on domain knowledge and deep reinforcement learning Active CN113223637B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110496113.5A CN113223637B (en) 2021-05-07 2021-05-07 Medicine molecular generator training method based on domain knowledge and deep reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110496113.5A CN113223637B (en) 2021-05-07 2021-05-07 Medicine molecular generator training method based on domain knowledge and deep reinforcement learning

Publications (2)

Publication Number Publication Date
CN113223637A CN113223637A (en) 2021-08-06
CN113223637B true CN113223637B (en) 2023-07-25

Family

ID=77091711

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110496113.5A Active CN113223637B (en) 2021-05-07 2021-05-07 Medicine molecular generator training method based on domain knowledge and deep reinforcement learning

Country Status (1)

Country Link
CN (1) CN113223637B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114974461A (en) * 2022-06-15 2022-08-30 烟台国工智能科技有限公司 Multi-target attribute molecule generation method and system based on strategy learning

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10776712B2 (en) * 2015-12-02 2020-09-15 Preferred Networks, Inc. Generative machine learning systems for drug design
US20200168302A1 (en) * 2017-07-20 2020-05-28 The University Of North Carolina At Chapel Hill Methods, systems and non-transitory computer readable media for automated design of molecules with desired properties using artificial intelligence
CN110534164A (en) * 2019-09-26 2019-12-03 广州费米子科技有限责任公司 Drug molecule generation method based on deep learning
CN111508568B (en) * 2020-04-20 2023-08-29 腾讯科技(深圳)有限公司 Molecule generation method, molecule generation device, computer readable storage medium and terminal device
CN112270951B (en) * 2020-11-10 2022-11-01 四川大学 Brand-new molecule generation method based on multitask capsule self-encoder neural network

Also Published As

Publication number Publication date
CN113223637A (en) 2021-08-06

Similar Documents

Publication Publication Date Title
KR102302609B1 (en) Neural Network Architecture Optimization
CN111386537B (en) Attention-based decoder-only sequence conversion neural network
CN113544703B (en) Efficient off-policy credit allocation
US10984319B2 (en) Neural architecture search
CN118194960A (en) Regularized neural network architecture search
KR20200014510A (en) Method for providing prediction service based on mahcine-learning and apparatus thereof
WO2019155064A1 (en) Data compression using jointly trained encoder, decoder, and prior neural networks
EP3568811A1 (en) Training machine learning models
CN110659678B (en) User behavior classification method, system and storage medium
US20210097443A1 (en) Population-based training of machine learning models
KR102388215B1 (en) Apparatus and method for predicting drug-target interaction using deep neural network model based on self-attention
WO2021117180A1 (en) Dialog processing device, learning device, dialog processing method, learning method, and program
Khan et al. Physics-inspired deep learning to characterize the signal manifold of quasi-circular, spinning, non-precessing binary black hole mergers
CN110717582B (en) Sampling from a generator neural network using a discriminator neural network
CN112925926B (en) Training method and device of multimedia recommendation model, server and storage medium
CN113223637B (en) Medicine molecular generator training method based on domain knowledge and deep reinforcement learning
CN112669215A (en) Training text image generation model, text image generation method and device
CN117121016A (en) Granular neural network architecture search on low-level primitives
CN113436686A (en) Artificial intelligence-based compound library construction method, device, equipment and storage medium
CN116738371A (en) User learning portrait construction method and system based on artificial intelligence
US20240152809A1 (en) Efficient machine learning model architecture selection
CN110955765A (en) Corpus construction method and apparatus of intelligent assistant, computer device and storage medium
CN116964678A (en) Prediction of protein amino acid sequence using generative model conditioned on protein structure intercalation
CN111090740B (en) Knowledge graph generation method for dialogue system
Rohr Discrete-time leap method for stochastic simulation

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant