WO2014034577A1 - Compound design device, compound design method, and computer program - Google Patents
Compound design device, compound design method, and computer program
- Publication number
- WO2014034577A1 (PCT/JP2013/072630)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- compound
- information
- protein
- score
- combination
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B15/00—ICT specially adapted for analysing two-dimensional or three-dimensional molecular structures, e.g. structural or functional relations or structure alignment
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B15/00—ICT specially adapted for analysing two-dimensional or three-dimensional molecular structures, e.g. structural or functional relations or structure alignment
- G16B15/30—Drug targeting using structural data; Docking or binding prediction
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B40/00—ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B40/00—ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
- G16B40/20—Supervised data analysis
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16C—COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
- G16C20/00—Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
- G16C20/50—Molecular design, e.g. of drugs
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16C—COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
- G16C20/00—Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
- G16C20/70—Machine learning, data mining or chemometrics
Definitions
- The present invention relates to a compound design apparatus that designs a compound that interacts with a protein, a compound design method that designs such a compound using a computer, and a computer program that causes a computer to design such a compound.
- As a method of predicting the interaction between a compound and a protein that is a target for drug discovery, there is, for example, a method that uses three-dimensional structure information of the protein obtained experimentally by NMR or X-ray crystal structure analysis and evaluates the binding site of the protein by docking with a compound (see, for example, Patent Documents 1 to 3).
- One of the methods for designing a compound having a novel structure using a computer is de novo design.
- As de novo design using a particle swarm optimization method as the optimization method, for example, the technique described in Non-Patent Document 1 is known.
- Patent Literature a pattern recognition technology such as a support vector machine
- However, conventional interaction prediction by docking predicts the interactions of existing compounds and cannot design new compounds.
- In addition, three-dimensional structure information of the protein is required for the prediction, and obtaining it takes enormous cost and time.
- In Non-Patent Document 1, de novo design is performed based on the structural similarity of ligands, but the designed compounds were not actually synthesized and assayed to verify the computational predictions, so the reliability of the accuracy remained a problem.
- The compound design device of the present invention comprises input means for inputting, for one or a plurality of query proteins, protein information corresponding to the proteins, and processing means that executes the steps of: (a) generating one or more pieces of compound information; (b) calculating a score indicating the possibility of interaction between the compound corresponding to the compound information and the query protein; (c) updating the compound information by an optimization method, based on the score calculated in step (b), so as to increase the possibility of interaction; and (d) repeating step (b) and step (c) multiple times. The score calculated in step (b) is obtained by machine learning using, as teacher data, at least a first combination of protein information and compound information corresponding to a first interacting protein and compound.
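The generate, score, update, repeat cycle of steps (a) to (d) can be sketched as a generic loop. This is only an illustration under stated assumptions: `design_loop`, the toy score (negative squared distance to a hidden target vector) and the toy update rule (move toward the best candidate with noise) are hypothetical stand-ins, not the scoring model or the optimization method of the patent.

```python
import random


def design_loop(generate, score, update, n_candidates=8, n_iterations=30, seed=0):
    """Generic generate/score/update loop mirroring steps (a) to (d)."""
    rng = random.Random(seed)
    candidates = [generate(rng) for _ in range(n_candidates)]   # step (a)
    best = max(candidates, key=score)
    history = [score(best)]
    for _ in range(n_iterations):                               # step (d)
        scores = [score(c) for c in candidates]                 # step (b)
        candidates = [update(c, s, best, rng)                   # step (c)
                      for c, s in zip(candidates, scores)]
        best = max(candidates + [best], key=score)              # keep the best seen so far
        history.append(score(best))
    return best, history


# Toy stand-ins (assumptions): compound information is a 2-D vector and the
# "interaction score" is higher the closer the vector is to a hidden target.
TARGET = [0.3, -0.7]


def toy_generate(rng):
    return [rng.uniform(-1.0, 1.0) for _ in TARGET]


def toy_score(candidate):
    return -sum((a - b) ** 2 for a, b in zip(candidate, TARGET))


def toy_update(candidate, current_score, best, rng):
    # current_score is unused by this toy rule; a real optimizer would use it.
    return [a + 0.5 * (b - a) + rng.gauss(0.0, 0.05) for a, b in zip(candidate, best)]


best, history = design_loop(toy_generate, toy_score, toy_update)
```

Because the best candidate seen so far is always retained, the recorded best score never decreases from one iteration to the next.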
- The machine learning is a support vector machine.
- In addition to the first combination, a second combination of protein information and compound information corresponding to a second protein and compound is used as teacher data.
- A separation plane that separates the first combination from the second combination is obtained from this teacher data, and the score represents the distance from the separation plane of the combination of the compound information and the protein information to be scored.
- one or a plurality selected from the group consisting of a swarm intelligence optimization method, an evolutionary calculation method, and a particle swarm optimization method is employed as the optimization method.
- Following step (c), the processing means executes step (c1): selecting, from the compound information that approximates the compound information updated in step (c), compound information that corresponds to a compound, and using the selected compound information as the updated compound information.
- Another compound design apparatus of the present invention includes a storage unit that stores the updated compound information as a history, and the processing unit, following step (c1), executes: (c2) referring to the history stored in the storage unit and determining whether the selected compound information is the same as compound information included in the history; and (c3) if step (c2) determines that they are the same, selecting other compound information and executing step (c2) again, and if step (c2) determines that they are not the same, using the selected compound information as the updated compound information.
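Steps (c1) to (c3) can be read as snapping an updated vector to the nearest known compound vector that has not been used before. The sketch below assumes Euclidean distance as the measure of approximation and a Python set as the history; `snap_to_known` and the sample vectors are hypothetical.

```python
import math


def snap_to_known(updated, known_vectors, history):
    """Pick the nearest known vector that is not yet in the history.

    (c1) rank known vectors by closeness to the updated vector,
    (c2) check each choice against the stored history,
    (c3) on a match move on to the next-nearest choice, otherwise accept it.
    """
    ranked = sorted(known_vectors, key=lambda v: math.dist(v, updated))  # (c1)
    for candidate in ranked:
        if tuple(candidate) in history:       # (c2) already used
            continue                          # (c3) try another one
        history.add(tuple(candidate))         # store the accepted update
        return candidate
    return None                               # every known vector was already used


known = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
history = set()
first = snap_to_known((0.1, 0.1), known, history)
second = snap_to_known((0.1, 0.1), known, history)
third = snap_to_known((0.1, 0.1), known, history)
fourth = snap_to_known((0.1, 0.1), known, history)
```

Repeated calls with the same updated vector walk outward through the known vectors, so the search does not return to a compound it has already visited.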
- the compound information includes fragment information corresponding to fragments generated by cutting the chemical structure of the compound based on a predetermined rule.
- the predetermined rule is preferably a rule that, when a plurality of cleavage positions exist in the chemical structure of the same compound, fragments are generated by all possible combinations of the plurality of cleavage positions.
- The compound information is represented as a direct sum of vectors existing in a space whose axes are one or more principal components obtained by principal component analysis of a plurality of fragment information.
- The optimization method is a particle swarm optimization method, the number of structural units of the compound to be designed is set, and the position X of the particle representing the compound information is expressed as a matrix, where m is the maximum number of elements of a fragment and n is the number of structural units.
- The particle velocity V is likewise expressed in terms of m, the maximum number of elements of a fragment, and n, the number of structural units.
- The score calculated in step (b) combines the score obtained by machine learning using, as teacher data, the first combination of protein information and compound information corresponding to the first interacting protein and compound, with one or more scores selected from: a score obtained from activity value prediction, a score obtained from selectivity prediction, a score obtained from docking calculation, a score obtained from synthesizability prediction, a score obtained from ADME-Tox prediction, a score obtained from physical property prediction, and a score obtained from binding free energy prediction by a molecular dynamics method.
- The compound design method using a computer of the present invention includes: (a) inputting, to input means provided in the computer, query protein information corresponding to one or a plurality of query proteins; (b) generating, in processing means provided in the computer, one or more pieces of compound information; (c) calculating, in the processing means, a score indicating the possibility of interaction between the compound corresponding to the compound information and the query protein; and (d) updating, in the processing means, the compound information by an optimization method, based on the score calculated in step (c), so as to increase the possibility of interaction. Step (c) and step (d) are repeated a plurality of times, and the score calculated in step (c) is obtained by machine learning using, as teacher data, at least a first combination of protein information and compound information corresponding to a first interacting protein and compound.
- The machine learning is a support vector machine, and in addition to the first combination, a second combination of protein information and compound information corresponding to a second protein and compound is used as teacher data.
- A separation plane that separates the first combination from the second combination is obtained from this teacher data, and the score represents the distance from the separation plane of the combination of the compound information and the protein information to be scored.
- The computer program of the present invention causes a computer to design a compound by executing: (i) receiving, for one or more query proteins, input of query protein information corresponding to the proteins; (ii) generating one or more pieces of compound information; (iii) calculating a score indicating the possibility of interaction between the compound corresponding to the compound information and the query protein; (iv) updating the compound information by an optimization method, based on the score calculated in step (iii), so as to increase the possibility of interaction; and (v) repeating step (iii) and step (iv) multiple times. The score calculated in step (iii) is obtained by machine learning using, as teacher data, at least a first combination of protein information and compound information corresponding to a protein and a compound that interact with each other.
- The machine learning is a support vector machine, and in addition to the first combination, a second combination of protein information and compound information corresponding to a second protein and compound is used as teacher data.
- A separation plane that separates the first combination from the second combination is obtained from this teacher data, and the score represents the distance from the separation plane of the combination of the compound information and the protein information to be scored.
- According to the present invention, three-dimensional structure information of the protein is not required, and a new compound structure can be obtained from easily available information such as the protein name and amino acid sequence together with structure information of fragmented compounds, that is, fragments. Furthermore, the calculation can be performed in a short time, and the verification experiments described in the Examples showed that the resulting compounds interact with the target protein with high probability.
- FIG. 4 is a plot of predicted activity values against measured values obtained by the QSAR model in Example 3. A further figure shows the results of compound design according to the present invention.
- the present invention is a compound design apparatus, a compound design method using a computer, and a computer program for causing a computer to design a compound.
- When the computer program of the present invention is executed by a computer, the computer functions as the compound design device, and a compound can be designed by the compound design method of the present invention.
- The compound design apparatus of the present invention includes at least an input unit and a processing unit. It may further include a storage unit.
- Input means: for one or a plurality of query proteins, query protein information corresponding to the proteins is input through the input means, and the compound design apparatus of the present invention receives this information.
- A target protein is used as the query protein, and protein information corresponding to that protein is input through the input means, whereby a compound that interacts with the protein is designed.
- The compound is designed by updating the compound information stored in the storage means by an optimization method. The compound corresponding to the compound information updated by the optimization method is presumed to have a high possibility of interacting with the query protein.
- Protein information is information representing the characteristics of a protein, and specifically includes protein name, amino acid sequence, three-dimensional structure, and the like. Protein information is expressed as a protein descriptor. In addition, protein information is vectorized as a multidimensional feature vector, and a relative difference between two or more proteins is represented as a similarity index such as a distance between vectors.
- As the protein information, an amino acid sequence is preferably used. For example, according to the known spectrum method, an amino acid sequence is decomposed into subsequences of fixed length k, and the frequency of each amino acid sequence pattern of length k, allowing up to m mismatches, can be used as the descriptor.
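A descriptor of this kind can be sketched as a sparse count of k-mer patterns, where each window of the sequence also contributes to every pattern within a given number of mismatches. The function names and the 20-letter alphabet below are assumptions for illustration; the patent does not specify an implementation.

```python
from collections import Counter
from itertools import combinations, product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # standard 20-letter alphabet (assumption)


def neighbours(kmer, max_mismatches):
    """All patterns within Hamming distance <= max_mismatches of kmer."""
    result = {kmer}
    for d in range(1, max_mismatches + 1):
        for positions in combinations(range(len(kmer)), d):
            for letters in product(AMINO_ACIDS, repeat=d):
                chars = list(kmer)
                for pos, letter in zip(positions, letters):
                    chars[pos] = letter
                result.add("".join(chars))  # the set removes repeated patterns
    return result


def mismatch_spectrum(sequence, k=3, max_mismatches=0):
    """Count, for each length-k pattern, the windows that match it within the mismatch limit."""
    counts = Counter()
    for i in range(len(sequence) - k + 1):
        for pattern in neighbours(sequence[i:i + k], max_mismatches):
            counts[pattern] += 1
    return counts


exact = mismatch_spectrum("MKVMKV", k=3)                   # plain spectrum, no mismatches
fuzzy = mismatch_spectrum("MKV", k=3, max_mismatches=1)    # one window, up to 1 mismatch
```

The sparse `Counter` avoids materializing all 20^k patterns; a fixed-length feature vector can be obtained by indexing it against a chosen pattern list.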
- the protein information input from the input means is preferably simple information.
- the conversion to the protein descriptor can be executed by the processing means as one of the steps.
- the protein name or amino acid sequence corresponding to the query protein is input from the input means, and the processing means generates a protein descriptor from the protein name or amino acid sequence corresponding to the query protein.
- Compound information on one or a plurality of compounds may also be input through the input means, and the compound design apparatus of the present invention may receive this information.
- When the core (mother nucleus) structure of a compound expected to interact with the query protein is known in advance, the accuracy of the prediction can be improved by inputting information on that structure as query compound information.
- A novel compound can also be designed by inputting information on the structure as query compound information. The compound information is described later.
- The processing means executes step (a) of generating one or more pieces of compound information, step (b) of calculating a score indicating the possibility of interaction between the compound corresponding to the compound information and the query protein, step (c) of updating the compound information by an optimization method, based on the score calculated in step (b), so as to increase the possibility of interaction, and step (d) of repeating step (b) and step (c) a plurality of times.
- In step (a), at least one piece of compound information is generated.
- Other protein information may be generated based on the protein information corresponding to the query protein input through the input means.
- When compound information is input through the input means, other compound information may be generated based on that compound information.
- the compound information is information representing the characteristics of the compound, and specifically represents the compound name, chemical structure, physical properties, and the like.
- Compound information is expressed as a compound descriptor or chemical descriptor.
- the compound information is converted into a multidimensional feature vector, and the relative difference between two or more compounds is expressed as a similarity index such as a distance between vectors.
- the compound information is composed of information relating to fragments obtained by fragmenting the chemical structure of the compound.
- The fragments of the compound may be obtained by a known method such as the Retrosynthetic Combinatorial Analysis Procedure (RECAP) rule, by fragmentation based on an original rule, or by random fragmentation.
- The RECAP rule is a method of setting cleavage positions on the chemical structure of a compound based on chemical reactions. Using fragment information obtained by this method has the advantage that compounds that cannot be chemically synthesized are less likely to be designed.
- The known method uses fragment information of the fragments obtained by cutting at all of the cleavage positions.
- The present inventors found that, for compound design in the present invention, a larger number of fragments is preferable. They therefore found that the variety of fragments can be increased by generating fragments from all possible combinations of the cleavage positions when a plurality of cleavage positions exist on the chemical structure of the same compound.
- It is preferable to combine a method that sets cleavage positions on the chemical structure of a compound based on chemical reactions, such as the RECAP rule, with the method of generating fragments from all possible combinations of the cleavage positions when a plurality of cleavage positions exist on the chemical structure of the same compound. When fragment information obtained in this way is used, diverse compounds that are not difficult to synthesize can be designed.
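The effect of cutting at every combination of cleavage positions, rather than at all positions at once, can be illustrated on a simplified model. The sketch assumes a molecule represented as a linear chain of building blocks with one cleavable bond between each adjacent pair; real RECAP fragmentation works on molecular graphs, and the function names here are hypothetical.

```python
from itertools import combinations


def fragments_from_full_cut(blocks):
    """Known approach: cut at every cleavage position at once."""
    return set(blocks)


def fragments_from_all_cut_combinations(blocks):
    """Cut at every non-empty subset of the cleavage positions and pool the fragments."""
    n_bonds = len(blocks) - 1          # bond i sits between blocks[i] and blocks[i + 1]
    fragments = set()
    for r in range(1, n_bonds + 1):
        for cuts in combinations(range(n_bonds), r):
            start = 0
            for cut in cuts:
                fragments.add("-".join(blocks[start:cut + 1]))
                start = cut + 1
            fragments.add("-".join(blocks[start:]))   # piece after the last cut
    return fragments


chain = ["A", "B", "C", "D"]
full_cut = fragments_from_full_cut(chain)
all_combinations = fragments_from_all_cut_combinations(chain)
```

For this four-block chain the full cut yields 4 fragments, while the combinations also yield the larger pieces `A-B`, `B-C`, `C-D`, `A-B-C` and `B-C-D`, for 9 in total.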
- Compound fragments are classified into mother nucleus fragments (sometimes referred to as parent fragments) and substituent fragments.
- The mother nucleus fragment and the substituent fragments can be linked based on an arbitrary rule. For example, chemically reasonable bonding sites and bonding patterns may be set for each fragment, and the fragments may be linked to each other according to them.
- Structural units (hereinafter sometimes referred to as units) and frames, which are combinations of structural units, are described in detail with reference to the drawings.
- The compound to be designed is expressed as a frame in which fragments are combined as structural units.
- A frame is composed of one or a plurality of units, and a fragment is assigned to a unit having the same bonding sites.
- The number of units constituting the frame and the topology of the units can be freely set.
- The topologies that can be set are determined by the number of units. For example, when the number of units is two or three, there is one possible topology in each case, but when the number of units is four, there are two possible topologies.
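The counts quoted above (one topology for two or three units, two for four) match the number of distinct unlabeled trees on that many nodes. Assuming, for illustration only, that a topology is a tree connecting the units, the counts can be reproduced by enumerating labeled trees through Prüfer sequences and discarding isomorphic duplicates. All names below are hypothetical.

```python
from itertools import product


def prufer_to_edges(seq, n):
    """Decode a Pruefer sequence into the edge list of a labeled tree on n nodes."""
    degree = [1] * n
    for v in seq:
        degree[v] += 1
    edges = []
    for v in seq:
        leaf = min(u for u in range(n) if degree[u] == 1)
        edges.append((leaf, v))
        degree[leaf] -= 1
        degree[v] -= 1
    u, w = [x for x in range(n) if degree[x] == 1]   # the last two remaining nodes
    edges.append((u, w))
    return edges


def canonical_form(edges, n):
    """Label-independent string: the smallest rooted encoding over all choices of root."""
    adjacency = {i: [] for i in range(n)}
    for a, b in edges:
        adjacency[a].append(b)
        adjacency[b].append(a)

    def encode(node, parent):
        children = sorted(encode(c, node) for c in adjacency[node] if c != parent)
        return "(" + "".join(children) + ")"

    return min(encode(root, None) for root in range(n))


def count_topologies(n_units):
    """Number of non-isomorphic trees connecting n_units units (n_units >= 2)."""
    forms = {canonical_form(prufer_to_edges(seq, n_units), n_units)
             for seq in product(range(n_units), repeat=n_units - 2)}
    return len(forms)


topology_counts = {n: count_topologies(n) for n in (2, 3, 4)}
```

With four units the two topologies are the linear chain and the star, in which one central unit carries the other three.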
- In the compound design device of the present invention, once the number of units of the compound to be designed is set, the fragment corresponding to each unit is updated within a frame based on a topology available for that number of units.
- The compound information may be limited to compounds having a given central skeleton so that only the substituent fragments are updated, which can improve the accuracy of the compound design.
- A novel compound can also be designed by updating only the substituent fragments.
- The compound information can be represented numerically in a continuous form or a discrete form.
- The continuous form, also called the continuous vector representation, expresses compound information as a direct sum of vectors existing in a space whose axes are one or more principal components obtained by principal component analysis of a plurality of fragment information.
- The discrete form, also called the discrete matrix representation, expresses compound information as a matrix using scores corresponding to the frequency of use of fragments.
- Expressing compound information as such a direct sum of principal-component vectors is preferable because the dimension of the vector representing the compound information is reduced.
- Specifically, compound descriptors are calculated for the fragments, principal component analysis is performed on the calculated descriptors, and several principal components are extracted in descending order of contribution.
- Each fragment is represented by a vector x in the space spanned by the extracted principal components, and the direct sum of these vectors is the vector representing the compound information.
- The number of extracted principal components is preferably 3 to 10 in consideration of calculation efficiency.
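The continuous representation can be sketched in three steps: extract principal components from fragment descriptors, project each fragment onto them, and concatenate (direct-sum) the projections of the fragments that make up a compound. The pure-Python power iteration below is an assumption chosen to keep the example dependency-free; the descriptor values and function names are hypothetical, and a numerical library would normally be used instead.

```python
import math


def top_principal_components(rows, n_components=1, n_iter=200):
    """Leading principal components by power iteration with deflation."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(d)]
           for a in range(d)]
    components = []
    for _ in range(n_components):
        # Fixed start vector; it must not be orthogonal to the sought component.
        v = [1.0 / math.sqrt(d)] * d
        for _ in range(n_iter):
            w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        eigenvalue = sum(v[a] * sum(cov[a][b] * v[b] for b in range(d)) for a in range(d))
        components.append(v)
        # Deflate: remove the found direction before searching for the next one.
        cov = [[cov[a][b] - eigenvalue * v[a] * v[b] for b in range(d)] for a in range(d)]
    return means, components


def project(row, means, components):
    """Coordinates of one fragment descriptor on the principal axes."""
    return [sum((row[j] - means[j]) * c[j] for j in range(len(row))) for c in components]


def direct_sum(vectors):
    """Concatenate per-unit vectors into one compound vector."""
    return [x for v in vectors for x in v]


# Hypothetical 2-D descriptors for three fragments (illustrative values only).
fragment_descriptors = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]
means, components = top_principal_components(fragment_descriptors, n_components=1)
# A two-unit compound built from fragment 0 and fragment 2.
compound_vector = direct_sum([project(fragment_descriptors[0], means, components),
                              project(fragment_descriptors[2], means, components)])
```

With k principal components and n units, the compound vector has k × n entries, which is how the representation keeps the dimension low.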
- In the continuous representation, the vectors representing compound information exist only discretely in the space, so updated compound information may not correspond to an actual compound. Therefore, as a discrete representation, compound information can also be represented by a matrix directly associated with the fragments.
- In this case, the position X of the particle representing the compound information is represented by a matrix $X = (x_{ij})$.
- Each element $x_{ij}$ of the matrix $X$ indicates a fragment selection state, where 0 means not selected and 1 means selected. Since one fragment is selected for each structural unit, every column vector of $X$ is a unit vector, that is, $x_{ij} \in \{0, 1\}$ and $\sum_{i} x_{ij} = 1$ for each column $j$.
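The discrete representation can be sketched as a 0/1 matrix with one column per structural unit and one row per candidate fragment, where exactly one entry in each column is 1. The helper names and the list-of-lists layout are assumptions for illustration.

```python
def selection_matrix(selected, n_fragments):
    """Build the 0/1 matrix: rows are candidate fragments, columns are structural units.

    selected[j] is the index of the fragment chosen for unit j.
    """
    return [[1 if selected[j] == i else 0 for j in range(len(selected))]
            for i in range(n_fragments)]


def decode(matrix):
    """Recover the chosen fragment index for each unit (row of the 1 in each column)."""
    n_units = len(matrix[0])
    return [max(range(len(matrix)), key=lambda i: matrix[i][j]) for j in range(n_units)]


X = selection_matrix([2, 0, 1], n_fragments=4)
column_sums = [sum(row[j] for row in X) for j in range(3)]
```

Every column sums to 1, which is the unit-vector condition stated above, and `decode` inverts the encoding.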
- Protein information corresponding to the query protein input through the input means may be converted into protein information of another format.
- For example, a protein name corresponding to the query protein is input through the input means, the processing means generates amino acid sequence information from the protein name, and a protein descriptor is then generated from the amino acid sequence information.
- the processing means executes a step (b) of calculating a score indicating the possibility of interaction between the compound corresponding to the compound information and the query protein.
- The score indicating the possibility of interaction calculated in step (b) is obtained by machine learning using, as teacher data, at least the first combination of protein information and compound information corresponding to the first interacting protein and compound.
- As machine learning using the first combination as teacher data, for example, Support Vector Regression (SVR) or Partial Least Squares (PLS) regression can be used.
- The score indicating the possibility of interaction calculated in step (b) may also be obtained by machine learning that additionally uses, as teacher data, a second combination of protein information and compound information corresponding to a second protein and compound. As machine learning using the first combination and the second combination as teacher data, for example, a Support Vector Machine (SVM) can be used.
- The first pair refers to, for example, a pair of a protein and a compound that are known to interact.
- The second pair is, for example, a pair of a protein and a compound that are not known to interact, or a randomly chosen pair of a protein and a compound.
- The first pair is a positive example, and the second pair is a negative example.
- Information on these pairs may be obtained from literature such as papers or from databases, or may be obtained by experimental verification.
- A learning model is constructed by analyzing the first pair, or the first pair and the second pair, using a machine learning method such as a support vector machine. Using the learning model, it can be determined whether the pair of a query compound and the query protein belongs to the first pair or the second pair, and the likelihood can be expressed as a score.
- a support vector machine is a kind of machine learning.
- a space constructed by feature vectors is called a feature space.
- the support vector machine uses a kernel function to map a vector to a finite-dimensional or infinite-dimensional feature space, and a learning model is constructed by performing linear separation on the feature space.
- A separation plane that separates the vectors with the maximum margin is obtained, and the vectors are divided into two classes by it. The separation plane therefore determines which class a queried vector belongs to.
- The protein descriptor containing the protein information of the protein in the first pair and the compound descriptor containing the compound information of the compound in the first pair are combined. This is called the first combination.
- The protein descriptor containing the protein information of the protein in the second pair and the compound descriptor containing the compound information of the compound in the second pair are combined. This is called the second combination.
- A hyperplane that classifies the first combination and the second combination is obtained by computing kernels for these combinations and training a support vector machine.
- The kernel method can be used as an effective means of integrating the compound vector and the protein vector. Specifically, for a compound vector x and a protein vector y, the kernel of the pair is defined using the compound kernel K_c and the protein kernel K_p.
- the interaction relationship between a compound and a protein may be quantified using a synthesis method using a tensor product kernel that is particularly known to be effective.
- a feature vector combining a protein descriptor and a compound descriptor is defined by the following expression.
- a kernel of a combination of a protein descriptor and a compound descriptor can be defined as follows.
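In its standard form, a tensor product kernel for (compound, protein) pairs is the product of a compound kernel and a protein kernel, K((x, y), (x', y')) = K_c(x, x') · K_p(y, y'). The sketch below assumes that standard form, since the expressions themselves are not reproduced in this text; the function names and vectors are hypothetical. With linear base kernels the product equals the dot product of the explicit tensor-product feature vectors, which the example checks.

```python
import math


def linear_kernel(u, v):
    return sum(a * b for a, b in zip(u, v))


def rbf_kernel(u, v, gamma=1.0):
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))


def pair_kernel(pair_a, pair_b, k_compound=linear_kernel, k_protein=linear_kernel):
    """Tensor product kernel: compound kernel times protein kernel."""
    (compound_a, protein_a), (compound_b, protein_b) = pair_a, pair_b
    return k_compound(compound_a, compound_b) * k_protein(protein_a, protein_b)


def tensor_features(compound, protein):
    """Explicit feature vector of a pair: all products of one compound and one protein entry."""
    return [c * p for c in compound for p in protein]


pair_1 = ([1.0, 2.0], [3.0, 4.0, 5.0])   # (compound vector, protein vector), toy values
pair_2 = ([0.0, 1.0], [1.0, 0.0, 2.0])
kernel_value = pair_kernel(pair_1, pair_2)
explicit_value = linear_kernel(tensor_features(*pair_1), tensor_features(*pair_2))
```

The kernel form never builds the product feature vector, whose length is the product of the two descriptor lengths, so it stays cheap even for long descriptors or nonlinear base kernels such as `rbf_kernel`.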
- The distance of a queried vector from the separation plane can serve as an indicator of the likelihood of interaction. In other words, even when a vector is classified into a given class, a vector close to the separation plane has a high possibility of being misclassified, whereas a vector far from the separation plane has a low possibility of being misclassified. The possibility that the combination of the protein and the compound corresponding to the queried vector interacts is therefore represented by the distance of the vector from the separation plane, and in step (b) this distance is calculated as the score indicating the possibility of interaction.
- The score indicating the likelihood of interaction is the value s_c obtained by converting the decision function value x of the support vector machine with a sigmoid function.
- The two parameters of the sigmoid function are determined based on the score distribution obtained from cross-validation of the support vector machine.
- These parameters are determined so as to minimize a function F of the two parameters.
- Here, i is an index over the learning data.
- y_i represents the presence or absence of interaction. When there is an interaction, that is, for the first combination, y_i is +1. When there is no interaction, that is, for the second combination, y_i is -1.
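One common way to turn SVM decision values into a score between 0 and 1 is Platt-style sigmoid calibration. The sketch below assumes that form, s_c = 1 / (1 + exp(alpha·x + beta)), with alpha and beta fitted by minimizing the cross-entropy over labeled decision values; the exact sigmoid and the function F of the patent are not reproduced in this text, so the parameter names, the loss and the plain gradient descent are assumptions.

```python
import math


def sigmoid_score(x, alpha, beta):
    """Assumed Platt-style mapping from a decision value x to a score in (0, 1)."""
    return 1.0 / (1.0 + math.exp(alpha * x + beta))


def fit_sigmoid(decision_values, labels, learning_rate=0.1, n_steps=2000):
    """Fit alpha and beta by gradient descent on the cross-entropy loss.

    labels are +1 (interacting pair) or -1 (non-interacting pair).
    """
    alpha, beta = -1.0, 0.0
    targets = [(y + 1) / 2 for y in labels]          # map -1/+1 to 0/1
    n = len(targets)
    for _ in range(n_steps):
        grad_alpha = grad_beta = 0.0
        for x, t in zip(decision_values, targets):
            p = sigmoid_score(x, alpha, beta)
            grad_alpha += (t - p) * x                # dF/dalpha for this form of sigmoid
            grad_beta += (t - p)                     # dF/dbeta
        alpha -= learning_rate * grad_alpha / n
        beta -= learning_rate * grad_beta / n
    return alpha, beta


# Toy cross-validation output (illustrative values, not data from the patent).
decision_values = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
labels = [-1, -1, +1, -1, +1, +1]
alpha, beta = fit_sigmoid(decision_values, labels)
```

With a negative fitted alpha, pairs further on the positive side of the separation plane receive scores closer to 1.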
- The score calculated in step (b) combines the score obtained by machine learning using, as teacher data, at least a first combination of protein information and compound information corresponding to the first interacting protein and compound, with one or more scores selected from: a score obtained from activity value prediction, a score obtained from selectivity prediction, a score obtained from docking calculation, a score obtained from synthesizability prediction, a score obtained from ADME-Tox prediction, a score obtained from physical property prediction, and a score obtained from binding free energy prediction by a molecular dynamics method.
- The method of combining the machine-learning score with the other scores is not limited.
- For example, the score obtained by machine learning may be combined, by multiplication or addition, with one or more scores selected from the score obtained from activity value prediction, the score obtained from selectivity prediction, the score obtained from docking calculation, the score obtained from synthesizability prediction, the score obtained from ADME-Tox prediction, the score obtained from physical property prediction, and the score obtained from binding free energy prediction by a molecular dynamics method.
- Activity value prediction is the prediction of the concentration of a compound that produces a biological activity.
- When the concentration of the compound that produces the biological activity is C, the score obtained from activity value prediction is expressed as -log C.
- A specific example of activity value prediction is the quantitative structure-activity relationship (QSAR).
- QSAR is a technique that uses statistical methods such as regression analysis to analyze the correlation between the molecular structural features of compounds and their biological activity and to obtain a simple quantitative correlation equation.
- the score calculated in step (b) is a score obtained by machine learning using at least a first combination of protein information and compound information corresponding to the first interacting protein and compound as teacher data. (Hereinafter referred to as “s c ”) and a score obtained from the activity value prediction (hereinafter referred to as “s q ”) will be described below.
- the evaluation function s is defined by a two-variable function of s c and s q .
- s q is a value calculated by the QSAR model and quantitatively predicts the strength of biological activity (minus logC, which is a value obtained from the concentration C of the compound that causes the biological activity of interest), It is defined as follows.
- the evaluation function s is represented by multiplication of s c and s q as shown in the following equation.
- the weighting coefficient w is set to 1 or an arbitrary number.
- In step (c), the compound information is updated by the optimization method so as to increase the evaluation function s.
- The larger the value of the evaluation function s, the higher the possibility of interaction with the query protein and the stronger the biological activity.
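The combination of s_c and s_q described above can be illustrated with a minimal sketch. The equation itself is not reproduced in this text, so two points are assumptions made only for illustration: the logarithm is taken to base 10, and the weighting coefficient w is applied as a plain multiplier on s_q. The function names are hypothetical.

```python
import math


def activity_score(concentration_molar: float) -> float:
    """s_q = minus log C. Base 10 is an assumption of this sketch."""
    if concentration_molar <= 0:
        raise ValueError("concentration must be positive")
    return -math.log10(concentration_molar)


def evaluation(s_c: float, s_q: float, w: float = 1.0) -> float:
    """Evaluation function s as the product of s_c and s_q.

    Assumption: w scales s_q linearly; the source only states that
    w is set to 1 or an arbitrary number.
    """
    return s_c * (w * s_q)


# A compound active at 1 micromolar with an interaction score of 0.5.
s_q = activity_score(1e-6)
s = evaluation(0.5, s_q, w=1.0)
```

With w = 1 the sketch reduces to the plain product of the two scores, which is the case used in Example 3.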
- Selectivity prediction is a technique for predicting the selectivity of binding.
- There are compounds that bind selectively to a specific subtype and compounds that bind non-selectively to a plurality of subtypes.
- a learning model is constructed by analyzing these pieces of compound information using a machine learning method such as a support vector machine. With the learning model, it is possible to predict whether or not the compound corresponding to the compound information selectively binds to the query protein.
- a compound that selectively binds to a protein to be predicted is defined as a first compound.
- a compound that binds non-selectively to the protein to be predicted, that is, binds to a protein other than the protein to be predicted is defined as the second compound.
- The case where the score calculated in step (b) combines a score obtained by machine learning that uses as teacher data at least a first combination of protein information and compound information corresponding to the first interacting protein and compound (hereinafter referred to as “s_c”) with a score obtained from the selectivity prediction (hereinafter referred to as “s_s”) is described below.
- The evaluation function s is defined as a two-variable function of s_c and s_s.
- The evaluation function s is represented by the multiplication of s_s and s_c, as in the following equation.
- the weighting coefficient w is set to 1 or an arbitrary number.
- In step (c), the compound information is updated by the optimization method so that the possibility of interaction with the query protein is increased.
- the larger the numerical value of the evaluation function s the higher the possibility of interaction with the query protein and the higher the selectivity for the query protein.
- ADME-Tox is an abbreviation of absorption, distribution, metabolism, excretion, and toxicity; ADME-Tox prediction calculates the pharmacokinetics and toxicity of a compound in the living body as scores.
- In step (c), the compound information is updated by the optimization method, with the score calculated in step (b) as a reference, so that the possibility of interaction with the query protein is increased.
- As the optimization method, an evolutionary algorithm or swarm intelligence (SI) can be used.
- Particle swarm optimization (PSO) is preferably used.
- Particle swarm optimization (PSO) is an optimization method in which a swarm of particles, each having a position and a velocity in a multidimensional search space, efficiently and comprehensively searches for the position corresponding to the optimum solution.
- the mathematical formula of the particle swarm optimization method is generally expressed by the following mathematical formula.
- the compound information is defined as particle i.
- a score indicating the possibility of interaction between the compound corresponding to the position of the particle i and the query protein is calculated in step (b).
- According to this score, the best score obtained so far by particle i (the best solution found by particle i), and the best score obtained so far among all particles (the best solution found by all particles), the position of particle i (the position vector of particle i) and its velocity (the velocity vector of particle i) are updated.
- In step (d), step (b) and step (c) are repeated. That is, for the particle i updated in step (c), a score indicating the possibility of interaction between the compound corresponding to the updated position of particle i and the query protein is calculated, and the position and velocity of particle i are updated according to that score, the best score of particle i, and the best score among all particles. By repeating step (b) and step (c) a plurality of times, the position and velocity of particle i are updated so that the score indicating the possibility of interaction becomes higher, and the particle finally reaches the position corresponding to the optimal solution.
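The loop of steps (b) to (d) can be sketched with the textbook continuous form of particle swarm optimization. This is an illustration under stated assumptions, not the patent's own formula: the update rule is the standard one (inertia term plus attraction to the particle best and the global best), the parameter values are arbitrary, and `toy_score` is a hypothetical stand-in for the interaction score of step (b).

```python
import random


def pso(score, dim, n_particles=20, n_iter=100, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Maximize `score` with standard PSO. Returns (best_position, best_score)."""
    rng = random.Random(seed)
    xs = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(n_particles)]
    vs = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(n_particles)]
    pbest = [x[:] for x in xs]
    pbest_score = [score(x) for x in xs]          # step (b) for the initial swarm
    g = max(range(n_particles), key=lambda i: pbest_score[i])
    gbest, gbest_score = pbest[g][:], pbest_score[g]

    for _ in range(n_iter):                       # step (d): repeat (b) and (c)
        for i in range(n_particles):
            for d in range(dim):                  # step (c): update velocity, position
                r1, r2 = rng.random(), rng.random()
                vs[i][d] = (w * vs[i][d]
                            + c1 * r1 * (pbest[i][d] - xs[i][d])
                            + c2 * r2 * (gbest[d] - xs[i][d]))
                xs[i][d] += vs[i][d]
            s = score(xs[i])                      # step (b): score the new position
            if s > pbest_score[i]:
                pbest[i], pbest_score[i] = xs[i][:], s
                if s > gbest_score:
                    gbest, gbest_score = xs[i][:], s
    return gbest, gbest_score


def toy_score(x):
    """Hypothetical score with its maximum (0.0) at x = (1, 1, ...)."""
    return -sum((xi - 1.0) ** 2 for xi in x)


best, best_score = pso(toy_score, dim=2)
```

In the setting described here, `score` would wrap the learned interaction model, and each position would be a compound-information vector rather than a point of a toy function.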
- A compound is generated by combining fragments. The compound information updated in step (c) is hereinafter referred to as vector X.
- Compound information that corresponds to a compound is hereinafter referred to as vector Y.
- Because the vectors Y, which are compound information corresponding to compounds, are scattered discontinuously in the compound space, the vector X may not coincide with any vector Y.
- In that case, a vector Y that approximates the vector X updated in step (c) is selected from the plurality of vectors Y, and a score indicating the possibility of interaction between the compound corresponding to that compound information and the query protein input from the input means is calculated.
- The compound information that approximates the updated compound information is, among the compound information corresponding to compounds, the compound information closest to the updated compound information.
- In another aspect, the processing means of the present invention executes, following step (c), step (c1) of selecting compound information corresponding to a compound from among compound information that approximates the compound information updated in step (c), and setting the selected compound information as the updated compound information.
- In step (c1), compound information that approximates the updated compound information is selected from the compound information corresponding to compounds.
- That is, among the compound information corresponding to compounds, the compound information closest to the updated compound information is selected.
- the compound design apparatus of the present invention includes a storage means to be described later.
- The storage means stores the updated compound information as a history, and the processing means executes, following step (c1): step (c2) of referring to the history stored in the storage means and determining whether the selected compound information is the same as compound information included in the history; and step (c3) of, if they are determined to be the same in step (c2), selecting other compound information and executing step (c2) again, and, if they are determined not to be the same in step (c2), setting that compound information as the updated compound information.
- In step (c1), compound information corresponding to a compound is selected from compound information that approximates the updated compound information, and the selected compound information may be the same as compound information selected in the past.
- Therefore, so that compound information not identical to any selected in the past is chosen, the history stored in the storage means is referred to and it is determined whether or not they are the same.
- When selecting compound information, the compound information closest to the updated compound information is selected first from the compound information corresponding to compounds; if it is determined to be the same as an entry in the history, the other compound information selected is the next-closest compound information after the one determined to be the same.
- the approximate compound information is a position close to the updated position vector, and a similarity index such as a distance is used for the calculation.
- Similarity indices include the Euclidean distance, the Mahalanobis distance, the Tanimoto coefficient, and the like; the Euclidean distance can be preferably used.
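The selection of steps (c1) to (c3) can be sketched as a nearest-neighbor search that skips entries already in the history. This is a minimal illustration using the Euclidean distance; the function name, the representation of the history as a set of candidate indices, and the toy vectors are assumptions of the sketch.

```python
import math


def nearest_unseen(x, candidates, history):
    """Return the index of the candidate vector Y closest to x that is not
    yet in `history`, and record it there. Returns None if all were used."""
    # Rank every candidate by Euclidean distance to the updated vector X.
    order = sorted(range(len(candidates)), key=lambda j: math.dist(x, candidates[j]))
    for j in order:
        if j not in history:      # step (c2): compare against the history
            history.add(j)        # step (c3): accept and record
            return j
    return None


candidates = [(0.0, 0.0), (1.0, 0.0), (3.0, 3.0)]  # vectors Y of real compounds
history = set()
first = nearest_unseen((0.9, 0.1), candidates, history)   # closest candidate
second = nearest_unseen((0.9, 0.1), candidates, history)  # next closest
```

Calling the function twice with the same vector X returns the closest candidate and then the next-closest one, which mirrors the fallback described above.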
- When the compound information is expressed as a matrix X using a discrete representation, the general formula of the particle swarm optimization method described above cannot be applied.
- the velocity V of the particle X is expressed by the following mathematical formula.
- the step of updating compound information when the compound information is represented as a matrix X will be specifically described.
- The initial position X_0 of each particle is set by randomly selecting fragments.
- The initial velocity V_0 is also set randomly.
- V_pbest is defined as the velocity matrix corresponding to the best solution found by each particle.
- V_gbest is defined as the velocity matrix corresponding to the best solution found among all particles.
- V_{t+1} is updated according to the following formula.
- w is an inertia constant.
- r_1 and r_2 are uniform random numbers from 0 to 1.
- c_1 and c_2 are constants representing the strength with which particles are attracted to the best solutions.
- X_{t+1} is updated stochastically using roulette selection, ranking selection, tournament selection, elite selection, or the like.
- As the selection method, roulette selection is preferably used.
- The selection probability Pr(X_t) of the position X_t based on the Boltzmann distribution is expressed by the following equation.
- T (> 0) is a constant called the temperature parameter, which determines the degree of randomness of the selection.
- In the limit T → 0, the position is updated to the X that maximizes V.
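Roulette selection with Boltzmann weights can be sketched as follows. The source equation is not reproduced in this text, so the form Pr(i) proportional to exp(V_i / T) is an assumption chosen to match the stated behavior (low T favors the entry with the largest V). The function names are hypothetical.

```python
import math
import random


def boltzmann_probs(values, T):
    """Selection probabilities proportional to exp(v / T), with T > 0.

    Subtracting the maximum before exponentiating avoids overflow and
    does not change the normalized probabilities.
    """
    if T <= 0:
        raise ValueError("temperature T must be positive")
    m = max(values)
    weights = [math.exp((v - m) / T) for v in values]
    total = sum(weights)
    return [wt / total for wt in weights]


def roulette_select(values, T, rng):
    """Pick an index by spinning a roulette wheel with Boltzmann weights."""
    probs = boltzmann_probs(values, T)
    r = rng.random()
    acc = 0.0
    for i, p in enumerate(probs):
        acc += p
        if r < acc:
            return i
    return len(probs) - 1  # guard against floating-point round-off


rng = random.Random(0)
velocities = [1.0, 2.0, 3.0]                      # velocity entries for three fragments
greedy = roulette_select(velocities, 0.01, rng)   # low T: near-deterministic
explore = roulette_select(velocities, 100.0, rng) # high T: near-uniform
```

A small T makes the choice almost greedy, while a large T spreads the probability nearly uniformly over the candidates.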
- the compound corresponding to the updated position of the particle i may be the same as the compound selected in the past. Therefore, referring to the history, it is determined whether or not it is the same as the compound selected in the past, and if it is determined to be the same, it approximates the position of the updated particle i and corresponds to another compound. Let the position be the position of a new particle i.
- In the particle swarm optimization method, the particles may converge to a local solution at an early stage, in which case the global optimal solution may be missed. It is therefore preferable to adjust the behavior of the particles so that they do not converge to a local solution, and to take measures for escaping from a converged state.
- The particles used in the particle swarm optimization method are divided. Specifically, the particle swarm is divided into a plurality of groups, each containing only adjacent particles. Information is exchanged between particles only within the same group, and gbest is overwritten with the better solution between adjacent groups.
- A second layer for performing a wide-area search is also set, and when the solution of the second layer is better, the solution of the first layer is overwritten with it.
- Wide diversity is maintained by initializing gbest of the second layer at an arbitrary cycle.
- Step (e) may be a step of executing step (b) and step (c) when the number of executions of step (b) and step (c) is less than the prescribed number, and ending the process when the number of executions reaches the prescribed number.
- a prescribed score value indicating the possibility of interaction may be determined, and step (b) and step (c) may be repeated until the score reaches the prescribed value. That is, in step (e), when the score indicating the possibility of interaction is less than the specified value, steps (b) and (c) are executed, and the score indicating the possibility of interaction is When the specified value is reached, this may be a step of ending the processing.
- Step (e) may be performed before step (b) or may be performed before step (c).
- When step (c1), step (c2), or step (c3) is executed following step (c), these steps are also executed repeatedly. That is, step (d) may be a step of repeating steps (b), (c), and (c1) a plurality of times; a step of repeating steps (b), (c), (c1), and (c2) a plurality of times; or a step of repeating steps (b), (c), (c1), (c2), and (c3) a plurality of times.
- Storage means: The compound design device of the present invention may comprise a storage means.
- The storage means stores at least a learning model obtained by machine learning that uses as teacher data a first combination of protein information and compound information corresponding to the first interacting protein and compound.
- the processing means accesses the storage means in which the learning model is stored, and calculates a score.
- The storage means may also store prediction models of chemical properties of compounds, such as an activity value prediction model, a selectivity prediction model, a docking calculation model, a synthesis possibility prediction model, an ADME-Tox prediction model, a physical property prediction model, and a molecular dynamics model.
- the processing means accesses the storage means in which the prediction model is stored, and calculates a score.
- the storage means stores the compound information selected in step (b1) as a history.
- In step (d1), the history stored in the storage means is referred to, and it is determined whether the selected compound information is the same as compound information included in the history.
- In step (d3), if it is determined in step (d2) that they are the same, other compound information is selected and step (d2) is executed again; if it is determined in step (d2) that they are not the same, that compound information is set as the updated compound information. In step (d4), the compound information selected in step (d3) is stored in the storage means as a history.
- the storage means may store a database consisting of fragment information constituting the compound information.
- the processing means can access the fragment database stored in the storage means and generate one or a plurality of compound information from the fragment information contained in the library.
- Output means: The compound design device of the present invention may include an output means.
- the output means outputs the compound information determined to have the highest possibility of interacting with the query protein by the processing means or the chemical structure corresponding to the compound information.
- the compound output by the output means may be a compound having a new chemical structure as well as a compound having a known chemical structure.
- the present invention also provides a compound design method using a computer and a computer program for causing a computer to design a compound.
- FIG. 2 shows an embodiment and a flowchart of the compound design apparatus 1 of the present invention.
- the compound design apparatus 1 includes an input unit 2, a processing unit 3, a storage unit 4, and an output unit 5.
- FIG. 3 is a flowchart of the processing executed by the processing means 3 of the compound design device 1 of the present invention, and the relationship between each flow and the storage means 4.
- Protein information corresponding to the query protein is input to the input unit 2 of the compound design apparatus 1. Further, the processing means 3 generates another protein information corresponding to the query protein based on the input protein information.
- the protein name of the query protein is input from the input means 2, and the processing means 3 searches the amino acid sequence corresponding to the protein name with reference to the protein database stored in the storage means 4.
- The amino acid sequence is decomposed into subsequences of fixed length k, and the frequency counts of the amino acid sequence patterns of length k, allowing up to m mismatches, are generated as the protein descriptor.
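The descriptor described above resembles a mismatch spectrum over k-mers. The following sketch counts, for every window of length k, all patterns within Hamming distance m of that window. It is a simple illustration rather than the implementation used in the patent; the function names are hypothetical, and the brute-force neighbor enumeration is only practical for small k and m.

```python
from collections import Counter
from itertools import combinations, product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def neighbors(kmer, m, alphabet=AMINO_ACIDS):
    """All patterns of the same length within Hamming distance <= m of kmer."""
    result = {kmer}
    for d in range(1, m + 1):
        for positions in combinations(range(len(kmer)), d):
            for letters in product(alphabet, repeat=d):
                chars = list(kmer)
                for pos, letter in zip(positions, letters):
                    chars[pos] = letter
                result.add("".join(chars))
    return result


def mismatch_spectrum(sequence, k, m, alphabet=AMINO_ACIDS):
    """Count, for each length-k pattern, the windows of `sequence` that
    match it with at most m mismatches."""
    counts = Counter()
    for i in range(len(sequence) - k + 1):
        for pattern in neighbors(sequence[i:i + k], m, alphabet):
            counts[pattern] += 1
    return counts


# Toy two-letter alphabet so the result is easy to inspect by hand.
spectrum = mismatch_spectrum("ABA", k=2, m=1, alphabet="AB")
```

Ordering the counts over a fixed list of all patterns would turn the counter into a fixed-length descriptor vector.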
- the storage unit 4 stores a fragment database.
- the fragment database stores fragment descriptors and chemical structures obtained by fragmenting the chemical structures of known compounds at the cutting positions according to the RECAP rule. For chemical structures having a plurality of cutting positions, fragment descriptors and chemical structures obtained from all possible combinations of the plurality of cutting positions are stored.
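The effect of storing fragments from all possible combinations of cutting positions can be shown with a toy model. The sketch below treats a molecule as a linear chain of named building blocks joined by cleavable bonds, which is a strong simplification: real RECAP fragmentation operates on chemical bond types in a molecular graph, and nothing here implements the RECAP rule itself. All names are hypothetical.

```python
from itertools import combinations


def fragments_for_cuts(blocks, cuts):
    """Split a linear chain of blocks at the given bond indices.
    Bond i joins blocks[i] and blocks[i + 1]."""
    pieces, start = [], 0
    for c in sorted(cuts):
        pieces.append("-".join(blocks[start:c + 1]))
        start = c + 1
    pieces.append("-".join(blocks[start:]))
    return pieces


def all_fragments(blocks, cleavable_bonds):
    """Collect fragments from every non-empty subset of cleavable bonds."""
    found = set()
    for r in range(1, len(cleavable_bonds) + 1):
        for cuts in combinations(cleavable_bonds, r):
            found.update(fragments_for_cuts(blocks, cuts))
    return found


blocks = ["A", "B", "C"]
only_full_cut = set(fragments_for_cuts(blocks, [0, 1]))  # cut every bond at once
every_combination = all_fragments(blocks, [0, 1])        # all subsets of cuts
```

Cutting every bond at once yields only the three single blocks, whereas enumerating all subsets also yields the larger pieces "A-B" and "B-C".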
- the processing means 3 randomly generates a compound descriptor from the fragment descriptor and chemical structure stored in the fragment database based on the prescribed frame.
- the processing means 3 calculates a score indicating the possibility of interaction between the compound corresponding to the generated compound information and the query protein.
- the storage unit 4 stores an interaction learning model.
- The interaction learning model is obtained by machine learning that uses as teacher data at least a first combination of interacting protein and compound, that is, combinations of proteins and compounds known to interact, as positive examples.
- In the interaction learning model, a feature vector is formed by combining the descriptors of a protein and a compound that are known to interact, and is used as a positive example.
- A separation plane that separates positive and negative examples is constructed in the feature space by a support vector machine, and a score indicating the possibility of interaction is calculated based on the distance from the separation plane.
- The processing unit 3 refers to the interaction learning model stored in the storage unit 4, calculates the descriptors of the compounds corresponding to the generated plural pieces of compound information, and calculates a score indicating the possibility of interaction based on the distance from the separation plane of each feature vector obtained by combining a compound descriptor with the query protein descriptor.
- When the feature vector is classified on the positive-example side, the farther it is from the separation plane, the higher the predicted possibility of interaction.
- When the feature vector is classified on the negative-example side, the closer it is to the separation plane, the higher the predicted possibility of interaction.
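For a linear separation plane, the scoring rule above amounts to a signed distance: positive on the positive-example side and growing with distance, negative on the negative-example side and rising toward zero near the plane. The sketch below assumes a linear model with weight vector `w` and bias `b` that have already been trained. Those values, the function names, and the toy descriptors are hypothetical, and a kernel SVM would use its decision function instead.

```python
import math


def signed_distance(w, b, x):
    """Signed Euclidean distance of x from the hyperplane w . x + b = 0."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm


def interaction_score(w, b, protein_desc, compound_desc):
    """Concatenate the protein and compound descriptors into one feature
    vector and score it by its signed distance from the separation plane."""
    feature = list(protein_desc) + list(compound_desc)
    return signed_distance(w, b, feature)


w, b = [3.0, 4.0], -5.0                           # hypothetical trained parameters
likely = interaction_score(w, b, [1.0], [3.0])    # positive side, far from plane
unlikely = interaction_score(w, b, [0.0], [0.0])  # negative side
```

Ranking candidate compounds by this single number reproduces both rules at once, because a larger signed distance always means a higher predicted possibility of interaction.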
- the processing means 3 confirms whether or not the update of the compound information has reached the specified number of times. If the number of updates of the compound information is less than the specified number, the compound information is updated by an optimization method based on a score indicating the possibility of interaction.
- the compound information is updated by an optimization method.
- the particle swarm optimization method is adopted as the optimization method.
- the position and velocity are updated by the particle swarm optimization method based on a score indicating the possibility of each interaction.
- the updated compound information is recorded in the compound information update history of the storage unit 4 and processed so as not to newly select the same compound as the compound selected in the past.
- compound information is expressed as a direct sum of vectors existing in a space in which one or a plurality of principal components obtained as a result of principal component analysis of a plurality of fragment information is assigned to an axis.
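The direct-sum representation can be read as a concatenation: each fragment slot of the frame contributes its own principal-component coordinates, and the compound vector stacks them in order. The sketch below assumes that the fragment coordinates have already been computed by principal component analysis. The fragment names and coordinate values are made up for illustration.

```python
def direct_sum(fragment_vectors):
    """Concatenate per-fragment coordinate vectors into one compound vector."""
    compound = []
    for vec in fragment_vectors:
        compound.extend(vec)
    return compound


# Hypothetical principal-component coordinates (three components per fragment).
fragment_pcs = {
    "frag_a": [0.2, -1.1, 0.5],
    "frag_b": [1.4, 0.3, -0.7],
}

frame = ["frag_a", "frag_b"]  # a frame with two fragment slots
compound_vector = direct_sum([fragment_pcs[name] for name in frame])
```

A frame with n slots and p principal components per fragment therefore gives a compound vector of dimension n times p.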
- the flow after confirmation of reaching the specified number of times will be described in detail.
- the processing unit 3 updates the compound information by an optimization method.
- The space in which the compound information exists is continuous, whereas compound information that has a corresponding compound exists only discretely in that space, so the updated compound information may not correspond to any compound.
- the processing means 3 selects compound information corresponding to the compound from the compound information that approximates the updated compound information.
- the processing means 3 refers to the update history of the compound information stored in the storage means 4 and confirms whether or not it is the same as the compound information in the update history. If they are the same, the process returns to the compound information selection step to select another compound information. If they are not the same, the compound information is recorded in the update history of the compound information.
- FIG. 4 is another aspect of the compound design apparatus of this invention.
- the storage unit 4 stores a model indicating chemical characteristics of a compound such as an activity value prediction model and a selectivity prediction model.
- the processing means 3 calculates a score of the chemical characteristics of the compound corresponding to the compound information with reference to the model stored in the storage means 4 (b *). Further, with reference to a score (b **) that combines the score and a score indicating the possibility of interaction, the compound information is updated by an optimization method in the following steps.
- this invention provides the method of designing a compound by performing the above-mentioned process using a computer.
- the compound design method using a computer of the present invention is as follows.
- (D) a step in which the processing means provided in the computer updates the compound information by the optimization method so as to increase the possibility of interaction, based on the score calculated in the score calculation step (c);
- wherein step (c) and step (d) are repeated a plurality of times, and the score calculated in step (c) is obtained by machine learning that uses as teacher data at least a first combination of protein information and compound information corresponding to the first interacting protein and compound.
- The machine learning is a support vector machine that, in addition to the first combination, uses as teacher data a second combination of protein information and compound information corresponding to the second interacting protein and compound; a separation plane separating the first combination from the second combination is obtained, and the score represents the distance from the separation plane of the combination of the compound information and the protein information to be scored.
- As the optimization method, one or more selected from the group consisting of a swarm intelligence optimization method, an evolutionary calculation method, and a particle swarm optimization method is employed.
- Following step (d), the method includes (D1) a step of selecting compound information corresponding to a compound from compound information that approximates the compound information updated in step (D), and setting the selected compound information as the updated compound information.
- The storage means provided in the computer stores the updated compound information as a history.
- Following step (d), the processing means provided in the computer executes: (D2) a step of referring to the history stored in the storage means and determining whether the selected compound information is the same as compound information included in the history; and (D3) a step of, if it is determined in step (D2) that they are the same, selecting other compound information and executing step (D2) again, and, if it is determined in step (D2) that they are not the same, setting that compound information as the updated compound information.
- the compound information is composed of fragment information corresponding to fragments generated by cleaving the chemical structure of the compound based on a predetermined rule.
- the predetermined rule is that, when a plurality of cleavage positions are present in the chemical structure of the same compound, it is preferable that fragments are generated by combinations of a plurality of cleavage positions.
- The compound information is expressed as a direct sum of vectors existing in a space in which one or more principal components, obtained as a result of principal component analysis of a plurality of pieces of fragment information, are assigned to the axes.
- When the particle swarm optimization method is adopted as the optimization method, the number of structural units of the fragments of the compound to be designed is set, and the position X of the particle representing the compound information is represented by the following mathematical formula.
- m is the maximum number of elements of a fragment
- n is the number of structural units.
- the particle velocity V is expressed by the following equation.
- m is the maximum number of elements of a fragment
- n is the number of structural units.
- the present invention also provides a program that causes a computer to execute processing related to compound design by the above-described method.
- When the computer executes the program, the computer functions as a compound design apparatus.
- The computer program for causing a computer to design a compound according to the present invention causes the computer to execute: (i) for one or more query proteins, receiving input of query protein information corresponding to the proteins; (ii) generating one or more pieces of compound information; (iii) calculating a score indicating the possibility of interaction between the compound corresponding to the compound information and the query protein; (iv) updating the compound information by the optimization method so as to increase the possibility of interaction, based on the score calculated in step (iii); and (v) repeating step (iii) and step (iv) a plurality of times; wherein the score calculated in step (iii) is obtained by machine learning that uses as teacher data at least a first combination of protein information and compound information corresponding to a protein and a compound that interact with each other.
- The machine learning is a support vector machine that, in addition to the first combination, uses as teacher data a second combination of protein information and compound information corresponding to the second interacting protein and compound; a separation plane separating the first combination from the second combination is obtained, and the score represents the distance from the separation plane of the combination of the compound information and the protein information to be scored.
- One or more selected from the group consisting of a swarm intelligence optimization method, an evolutionary calculation method, and a particle swarm optimization method are employed as the optimization method.
- Following step (iv), the program includes step (iv-1) of selecting compound information corresponding to a compound from compound information that approximates the compound information updated in step (iv), and setting the selected compound information as the updated compound information.
- The storage means provided in the computer stores the updated compound information as a history.
- Following step (iv-1), the processing means provided in the computer executes: (iv-2) a step of referring to the history stored in the storage means and determining whether the selected compound information is the same as compound information included in the history; and (iv-3) a step of, if it is determined in step (iv-2) that they are the same, selecting other compound information and executing step (iv-2) again, and, if it is determined in step (iv-2) that they are not the same, setting that compound information as the updated compound information.
- the compound information is composed of fragment information corresponding to fragments generated by cleaving the chemical structure of the compound based on a predetermined rule.
- the predetermined rule is that, when a plurality of cleavage positions exist in the chemical structure of the same compound, it is preferable that a fragment is generated by a possible combination of the plurality of cleavage positions.
- The compound information is expressed as a direct sum of vectors existing in a space in which one or more principal components, obtained as a result of principal component analysis of a plurality of pieces of fragment information, are assigned to the axes.
- When the particle swarm optimization method is adopted as the optimization method, the number of structural units of the fragments of the compound to be designed is set, and the position X of the particle representing the compound information is represented by the following mathematical formula.
- m is the maximum number of elements of a fragment
- n is the number of structural units.
- the particle velocity V is expressed by the following equation.
- m is the maximum number of elements of a fragment
- n is the number of structural units.
- Example 1 Cross validation was performed using 4,700 Cyclin-Dependent Kinase 2 (CDK2) known active compounds. Of 4,700 compounds, 600 were used as learning data and used to construct an interaction learning model. The active compound descriptors were calculated using the DRAGON6 program, and the target protein descriptors were calculated by the spectral method.
- DRAGON6 ver.6.0.30 (Talete srl) was used to calculate descriptors related to the structure and physical properties of the compounds. Specifically, a total of 894 descriptors were calculated, comprising block 1-2 (Constitutional descriptors and Ring descriptors), block 4-5 (Walk and path counts and Connectivity indices), block 8 (2D autocorrelations), block 10-11 (P_VSA-like descriptors and ETA indices), block 22-24 (Atom-centred fragments, Atom-type E-state indices, and CATS 2D), and block 28 (Molecular properties).
- a feature vector was constructed by combining descriptors of each interacting pair, and an interaction learning model was constructed using the LIBSVM program as a support vector machine.
- fragment information is combined to form compound information.
- (1) Fragment generation based on the known RECAP rule and (2) fragment generation based on all possible combinations of cleavage positions, when a plurality of cleavage positions defined by the RECAP rule exist in the chemical structure of the same compound, were each performed, and the fragments and the combinations of fragments were obtained as follows.
- A compound is expressed as a combination of fragments. It was found that method (2) can yield 56 times as many compounds as method (1). This indicates that generating compound information by method (2) may allow compounds to be designed with higher accuracy than method (1).
- As in the above-described method, the processing means calculated fragment descriptors using DRAGON6 ver.6.0.30 (Talete srl) to construct a fragment database.
- the query protein was entered as Cyclin-Dependent Kinase 2 (CDK2).
- the processing means searched for the amino acid sequence of CDK2, and based on this, calculated the protein descriptor of CDK2 by the spectrum method.
- Principal component analysis was performed on all the fragment descriptors, three principal components were extracted in descending order of contribution ratio, fragments corresponding to the frame were selected at random, and vectors representing compound information were generated. In total, 990 vectors representing compound information were generated.
- For the chemical substances corresponding to the 990 pieces of compound information, the distance from the separation plane of the feature vector combining each descriptor with the protein descriptor of CDK2 was calculated as a score, and the position and velocity of each vector representing compound information were updated by the particle swarm optimization method.
- the prescribed number of updates was 5,000.
- Example 2 Using the present invention, an antagonist was designed with the β2 adrenergic receptor (β2AR) as the query protein.
- An assay experiment was conducted on the designed compounds (R1: A to H, R2: 1 to 13) to examine whether the compounds designed by the present invention interact with β2AR, the query protein.
- The results are shown in FIG. Among the compounds that were assayed, when the hit threshold was set to less than 30 μM, the hit rate was as high as 38%. When the hit threshold was set to less than 150 μM, an even higher hit rate of 74% was obtained.
- Example 3 As another aspect of the present invention, compounds were designed based on a score obtained by multiplying the score obtained in step (b) by machine learning, which uses as teacher data the first combination of protein information and compound information corresponding to the first interacting protein and compound, by the score obtained from the activity value prediction. A compound database targeting CDK2 and V1b was used.
- a QSAR model was constructed as a model for activity value prediction.
- the linear ⁇ -SVR (Support Vector Regression) method was used to construct the QSAR model.
- The calculation parameters were set so that the 5-fold cross-validation value was maximized.
- Table 2 shows the calculation conditions and results for CDK2 and V1b.
- FIG. 7 plots the values calculated by the constructed QSAR model (predicted activity values) against the measured values. The closer a compound is plotted to the straight line, the closer its predicted activity value is to the measured value.
- The query protein CDK2 or V1b was input, the descriptors of each interacting pair were combined to construct a feature vector, and an interaction learning model was constructed using the LIBSVM program as the support vector machine.
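To make the scoring idea concrete, the sketch below concatenates a compound descriptor with a protein descriptor and computes the signed distance of the resulting feature vector from a separating plane. It assumes a linear model with already-trained weights and bias; the kernel actually used with LIBSVM is not stated here, and the weights, bias, and descriptor values are invented for illustration.

```python
import math


def combine(compound_descriptor, protein_descriptor):
    """Build the pair feature vector by concatenating both descriptors."""
    return list(compound_descriptor) + list(protein_descriptor)


def signed_distance(feature, weights, bias):
    """Signed distance of `feature` from the plane w.x + b = 0.

    Positive values lie on the side of the interacting (first) combination.
    """
    norm = math.sqrt(sum(w * w for w in weights))
    return (sum(w * x for w, x in zip(weights, feature)) + bias) / norm


# Hypothetical trained plane and descriptors (not taken from the disclosure).
weights = [3.0, 0.0, 4.0, 0.0]
bias = -5.0
feature = combine([1.0, 2.0], [3.0, 4.0])
score = signed_distance(feature, weights, bias)  # (3 + 12 - 5) / 5 = 2.0
```

A larger positive distance is read as a higher likelihood that the compound interacts with the query protein, which is why this value can serve directly as the quantity to be maximized.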
- The evaluation function s is calculated in step (b) as follows.
- s_q is the predicted activity value obtained by the QSAR model.
- s_c is the score calculated, in the interaction learning model, as the distance from the separating plane of the feature vector that combines the descriptors of the chemical substance corresponding to the compound information with the protein descriptor of the query protein.
- The weighting factor w was set to 1.
- The processing means updates the position and velocity of the vector representing the compound information by particle swarm optimization so as to maximize the evaluation function s.
- The number of particles was 128, and the specified number of updates was 10,000.
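The multiplication of the two scores can be sketched as follows. The exact form of the evaluation function is not reproduced in this text, so the placement of the weighting factor w below is an assumption; with w = 1, as used in this example, the function reduces to the plain product of s_c and s_q. The candidate names and values are hypothetical.

```python
def evaluation_function(s_c, s_q, w=1.0):
    """Combine the interaction score s_c with the predicted activity s_q.

    Assumption: w scales the activity term. With w = 1 this is simply
    the product s_c * s_q described in the text.
    """
    return s_c * (w * s_q)


# Hypothetical candidates: name -> (s_c, s_q)
candidates = {
    "cand_1": (1.2, 6.5),
    "cand_2": (0.4, 8.0),
    "cand_3": (-0.3, 9.0),
}

# Rank candidates from highest to lowest combined score.
ranked = sorted(
    candidates,
    key=lambda name: evaluation_function(*candidates[name]),
    reverse=True,
)
```

Note that a candidate with a high predicted activity but a negative interaction score (here `cand_3`) is ranked last, which is the intended effect of multiplying rather than adding the two scores.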
- The F-measure (F value) was adopted as the index for performance evaluation.
- Recall: recall rate
- Precision: precision rate
- The F value is a single index that summarizes recall and precision, which are in a trade-off relationship.
- Each evaluation value is defined by the following mathematical formula.
- Recall = tp / (tp + fn), Precision = tp / (tp + fp), F = 2 × Precision × Recall / (Precision + Recall)
- Recall represents the proportion of known ligands that are correctly determined to be positive by the calculation.
- Precision represents the proportion of known ligands among the compounds predicted to be positive by the calculation.
- The F value is defined as the harmonic mean of precision and recall; as both precision and recall increase, the F value increases and approaches 1.
- tp, fn, fp, and tn are the numbers of compounds in the corresponding cells (TP, FN, FP, TN) of Table 4.
- Table 4 is a 2 × 2 contingency table showing the relationship between experimental results and calculation results regarding biological activity.
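The three evaluation values can be computed directly from the counts in the contingency table. The sketch below uses the standard definitions of recall, precision, and their harmonic mean; the counts passed in are made-up numbers for illustration and are not results from this disclosure.

```python
def f_measure(tp, fn, fp):
    """Return (recall, precision, F value) from contingency-table counts."""
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    if precision + recall == 0.0:
        return recall, precision, 0.0
    # Harmonic mean of precision and recall.
    f_value = 2.0 * precision * recall / (precision + recall)
    return recall, precision, f_value


# Hypothetical counts: 30 true positives, 10 false negatives, 20 false positives.
recall, precision, f_value = f_measure(tp=30, fn=10, fp=20)
```

The count tn does not enter any of the three values, which is why the F value is convenient when inactive compounds vastly outnumber known ligands.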
- Example 4 Furthermore, for the purpose of designing a novel compound that has the β2 adrenergic receptor as its target molecule and is selectively active relative to the other adrenergic receptors, compounds were designed based on a score obtained by multiplying, in step (b), the score obtained by machine learning that uses as teacher data the first combination of protein information and compound information corresponding to a protein and a compound having the first interaction, by the score obtained from selectivity prediction.
- Adrenergic receptors are classified into three types, α1, α2, and β, each of which has three subtypes (α1A, α1B, α1D, α2A, α2B, α2C, β1, β2, and β3).
- The number of known ligands for each subtype is shown in Table 5. These data are derived from commercially available compound databases, public databases (ChEMBL, etc.), and databases that the inventors compiled themselves from papers and patents. In every case, a compound showing the target activity with an IC50 value of 30 μM or less in an assay experiment was defined as a known ligand.
- The processing means calculates the evaluation function s in step (b) as follows.
- s_s is the selectivity probability value obtained by the selectivity prediction model.
- s_c is the score calculated, in the interaction learning model, as the distance from the separating plane of the feature vector that combines the descriptors of the chemical substance corresponding to the compound information with the protein descriptor of the query protein.
- The weighting factor w was set to 1.
- The position and velocity of the vector representing the compound information are updated by particle swarm optimization so as to maximize the evaluation function s.
- The number of particles was 128, and the specified number of updates was 10,000.
- The F-measure (F value) was adopted as the index for performance evaluation.
- Table 6 shows the results of verification targeting the β2 adrenergic receptor.
- The results of the method of Example 4 were compared with those of the same method as in Example 1.
- The method of Example 4, which combined the selectivity prediction model with the interaction learning model, showed higher compound design performance than the other methods. This suggests that the generation of false-positive compounds can be suppressed and that highly selective compounds can be designed more efficiently.
- Incorporating the selectivity prediction model into the evaluation function of the optimization method and using it together with the interaction prediction model is therefore effective for real-time structure optimization that takes selectivity into account.
Abstract
Description
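(a) a step of generating one or more pieces of compound information;
(b) a step of calculating a score indicating the likelihood of interaction between the compound corresponding to the compound information and the query protein;
(c) a step of updating the compound information by an optimization method, using the score calculated in step (b) as a criterion, so that the likelihood of interaction increases; and
(d) a step of repeating step (b) and step (c) multiple times;
and processing means for executing these steps. Furthermore, the score calculated in step (b) is obtained by machine learning that uses, as teacher data, at least a first combination of protein information and compound information corresponding to a protein and a compound that have a first interaction.
(c1) a step of selecting, from among pieces of compound information that approximate the compound information updated in step (c), compound information that corresponds to a compound, and taking that compound information as the updated compound information, is executed.
(c2) a step of referring to the history stored in the storage means and determining whether the selected compound information is identical to compound information contained in the history; and
(c3) a step of, when the compound information is determined in step (c2) to be identical, selecting other compound information and executing step (c2) again, and, when it is determined in step (c2) not to be identical, taking that compound information as the updated compound information,
are executed.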
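Steps (c1) to (c3) can be sketched as a nearest-neighbor lookup with a history check. The following is a minimal illustration under stated assumptions: the library of compound vectors, the use of squared Euclidean distance as the measure of "approximate", and all names are hypothetical and are not specified in this disclosure.

```python
def snap_to_real_compound(updated, library, history):
    """Map an updated vector to the nearest library compound not yet used.

    library: dict of compound id -> vector for actual compounds
    history: set of compound ids already selected (mutated in place)
    Returns (compound id, vector), or None if every compound was used.
    """
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    # (c1) candidates ordered from nearest to farthest from the update.
    ranked = sorted(library, key=lambda cid: sq_dist(library[cid], updated))
    for cid in ranked:
        # (c2) is this compound already in the history?
        if cid in history:
            continue  # (c3) identical: try another compound
        history.add(cid)  # record the selection in the history
        return cid, library[cid]  # (c3) not identical: accept it
    return None


# Hypothetical library and history.
library = {"cmpd_a": (0.0, 0.0), "cmpd_b": (1.0, 0.0), "cmpd_c": (5.0, 5.0)}
history = {"cmpd_b"}

first = snap_to_real_compound((0.9, 0.1), library, history)
second = snap_to_real_compound((0.9, 0.1), library, history)
third = snap_to_real_compound((0.9, 0.1), library, history)
```

In this run the nearest compound `cmpd_b` is skipped because it is already in the history, so the first call returns `cmpd_a`, the second returns `cmpd_c`, and the third returns `None` because the library is exhausted.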
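(A) a step in which query protein information corresponding to at least one or more query proteins is input to input means of a computer;
(B) a step in which one or more pieces of compound information are generated in processing means of the computer;
(C) a step in which a score indicating the likelihood of interaction between the compound corresponding to the compound information and the query protein is calculated in the processing means of the computer; and
(D) a step in which, in the processing means of the computer, the compound information is updated by an optimization method, using the score calculated in the score calculation step (C) as a criterion, so that the likelihood of interaction increases;
are included, step (C) and step (D) are repeated multiple times, and the score calculated in step (C) is obtained by machine learning that uses, as teacher data, at least a first combination of protein information and compound information corresponding to a protein and a compound that have a first interaction.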
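(i) a step of accepting input of query protein information corresponding to one or more query proteins;
(ii) a step of generating one or more pieces of compound information;
(iii) a step of calculating a score indicating the likelihood of interaction between the compound corresponding to the compound information and the query protein;
(iv) a step of updating the compound information by an optimization method, using the score calculated in step (iii) as a criterion, so that the likelihood of interaction increases; and
(v) a step of repeating step (iii) and step (iv) multiple times;
are executed, and the score calculated in step (iii) is obtained by machine learning that uses, as teacher data, at least a first combination of protein information and compound information corresponding to a protein and a compound that have a first interaction.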
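At the input means, query protein information corresponding to one or more query proteins is input, and the compound design device of the present invention accepts this information.
The processing means executes step (a) of generating one or more pieces of compound information; step (b) of calculating a score indicating the likelihood of interaction between the compound corresponding to the compound information and the query protein; step (c) of updating the compound information by an optimization method, using the score calculated in step (b) as a criterion, so that the likelihood of interaction increases; and step (d) of repeating step (b) and step (c) multiple times.
Optimization) is preferably used.
The compound design device of the present invention may also include storage means. The storage means stores a learning model obtained by machine learning that uses, as teacher data, at least a first combination of protein information and compound information corresponding to a protein and a compound that have a first interaction. In step (b), the processing means accesses the storage means in which the learning model is stored and calculates the score.
The compound design device of the present invention may also include output means. The output means outputs the compound information that the processing means has judged most likely to interact with the query protein, or the chemical structure corresponding to that compound information. The compound output by the output means may be not only a compound with a known chemical structure but also a compound with a novel chemical structure.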
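(A) a step in which query protein information corresponding to at least one or more query proteins is input to input means of a computer;
(B) a step in which one or more pieces of compound information are generated in processing means of the computer;
(C) a step in which a score indicating the likelihood of interaction between the compound corresponding to the compound information and the query protein is calculated in the processing means of the computer; and
(D) a step in which, in the processing means of the computer, the compound information is updated by an optimization method, using the score calculated in the score calculation step (C) as a criterion, so that the likelihood of interaction increases;
are included,
and this is a compound design method in which step (C) and step (D) are repeated multiple times and the score calculated in step (C) is obtained by machine learning that uses, as teacher data, at least a first combination of protein information and compound information corresponding to a protein and a compound that have a first interaction.
(D1) a step of selecting, from among pieces of compound information that approximate the compound information updated in step (D), compound information that corresponds to a compound, and taking that compound information as the updated compound information, is included.
(D2) a step of referring to the history stored in the storage means and determining whether the selected compound information is identical to compound information contained in the history; and
(D3) a step of, when the compound information is determined in step (D2) to be identical, selecting other compound information and executing step (D2) again, and, when it is determined in step (D2) not to be identical, taking that compound information as the updated compound information, are included.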
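(i) a step of accepting input of query protein information corresponding to one or more query proteins;
(ii) a step of generating one or more pieces of compound information;
(iii) a step of calculating a score indicating the likelihood of interaction between the compound corresponding to the compound information and the query protein;
(iv) a step of updating the compound information by an optimization method, using the score calculated in step (iii) as a criterion, so that the likelihood of interaction increases; and
(v) a step of repeating step (iii) and step (iv) multiple times;
are executed, and this is a compound design computer program in which the score calculated in step (iii) is obtained by machine learning that uses, as teacher data, at least a first combination of protein information and compound information corresponding to a protein and a compound that have a first interaction.
(iv-1) a step of selecting, from among pieces of compound information that approximate the compound information updated in step (iv), compound information that corresponds to a compound, and taking that compound information as the updated compound information, is included.
(iv-2) a step of referring to the history stored in the storage means and determining whether the selected compound information is identical to compound information contained in the history; and
(iv-3) a step of, when the compound information is determined in step (iv-2) to be identical, selecting other compound information and executing step (iv-2) again, and, when it is determined in step (iv-2) not to be identical, taking that compound information as the updated compound information, are included.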
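Cross-validation was performed using 4,700 known active compounds of Cyclin-Dependent Kinase 2 (CDK2). Of the 4,700 compounds, 600 were used as training data to construct the interaction learning model. The descriptors of the active compounds were calculated using the DRAGON6 program, and the descriptors of their target protein were calculated by the spectrum method.
descriptors), blocks 4-5 (Walk and path counts and Connectivity indices), block 8 (2D autocorrelations), blocks 10-11 (P_VSA-like descriptors and ETA indices), blocks 22-24 (Atom-centred fragments, Atom-type E-state indices, and CATS 2D), and block 28 (Molecular properties), for a total of 894 descriptors, were calculated.
ver.6.0.30 (Talete srl) was used to calculate the descriptors of these fragments, and a fragment database was constructed.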
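Using the present invention, an antagonist was designed with the β2 adrenergic receptor (β2AR) as the query protein. As the frame of the compound to be designed, a frame in which three structural units are connected in series was selected. The central structural unit was fixed as the core scaffold, and only the substituent fragments at both ends (R1 and R2) were updated.
As another aspect of the present invention, compounds were designed based on a score obtained by multiplying, in step (b), the score obtained by machine learning that uses as teacher data the first combination of protein information and compound information corresponding to a protein and a compound having the first interaction, by the score obtained from activity value prediction. A compound database targeting CDK2 and V1b was used.
Furthermore, for the purpose of designing a novel compound that has the β2 adrenergic receptor as its target molecule and is selectively active relative to the other adrenergic receptors, compounds were designed based on a score obtained by multiplying, in step (b), the score obtained by machine learning that uses as teacher data the first combination of protein information and compound information corresponding to a protein and a compound having the first interaction, by the score obtained from selectivity prediction.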
2 Input means
3 Processing means
4 Storage means
5 Output means
Claims (14)
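- A compound design device comprising:
input means to which at least protein information corresponding to one or more query proteins is input; and
processing means that executes:
(a) a step of generating one or more pieces of compound information;
(b) a step of calculating a score indicating the likelihood of interaction between the compound corresponding to the compound information and the query protein;
(c) a step of updating the compound information by an optimization method, using the score calculated in step (b) as a criterion, so that the likelihood of interaction increases; and
(d) a step of repeating step (b) and step (c) multiple times,
wherein the score calculated in step (b) is obtained by machine learning that uses, as teacher data, at least a first combination of protein information and compound information corresponding to a protein and a compound that have a first interaction.
- The compound design device according to claim 1, wherein the machine learning is a support vector machine,
in addition to the first combination,
a second combination of protein information and compound information corresponding to a protein and a compound that have a second interaction is used as teacher data,
a separating plane that separates the first combination from the second combination is obtained,
and the score represents the distance, from the separating plane, of the combination of the compound information and the protein information for which the score is calculated.
- The compound design device according to claim 1 or 2, wherein the optimization method is one or more selected from the group consisting of swarm intelligence optimization methods, evolutionary computation methods, and particle swarm optimization methods.
- The compound design device according to any one of claims 1 to 3, wherein the processing means executes, following step (c),
(c1) a step of selecting, from among pieces of compound information that approximate the compound information updated in step (c), compound information that corresponds to a compound, and taking that compound information as the updated compound information.
- The compound design device according to claim 4, further comprising storage means,
wherein the storage means stores the updated compound information as a history,
and the processing means executes, following step (c1),
(c2) a step of referring to the history stored in the storage means and determining whether the selected compound information is identical to compound information contained in the history; and
(c3) a step of, when the compound information is determined in step (c2) to be identical, selecting other compound information and executing step (c2) again, and, when it is determined in step (c2) not to be identical, taking that compound information as the updated compound information.
- The compound design device according to any one of claims 1 to 5, wherein the compound information is composed of fragment information corresponding to fragments generated by cutting the chemical structure of a compound according to a predetermined rule.
- The compound design device according to claim 6, wherein the predetermined rule is a rule under which, when the chemical structure of a single compound has multiple cutting positions, fragments are generated from the possible combinations of those cutting positions.
- The compound design device according to claim 6 or 7, wherein the compound information is expressed as a direct sum of vectors in a space whose axes are assigned one or more principal components obtained as a result of principal component analysis of multiple pieces of fragment information.
- The compound design device, wherein the score calculated in step (b) is
a combination of the score obtained by machine learning that uses, as teacher data, the first combination of protein information and compound information corresponding to a protein and a compound that have the first interaction, with one or more selected from a score obtained from activity value prediction, a score obtained from selectivity prediction, a score obtained from docking calculation, a score obtained from synthesizability prediction, a score obtained from ADME-Tox prediction, a score obtained from physical property prediction, and a score obtained from binding free energy prediction by molecular dynamics.
- A compound design method using a computer, the method comprising: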
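(A) a step in which query protein information corresponding to at least one or more query proteins is input to input means of the computer;
(B) a step in which one or more pieces of compound information are generated;
(C) a step in which a score indicating the likelihood of interaction between the compound corresponding to the compound information and the query protein is calculated; and
(D) a step in which the compound information is updated by an optimization method, using the score calculated in step (C) as a criterion, so that the likelihood of interaction increases,
wherein step (C) and step (D) are repeated multiple times,
and the score calculated in step (C) is
obtained by machine learning that uses, as teacher data, at least a first combination of protein information and compound information corresponding to a protein and a compound that have a first interaction.
- The compound design method according to claim 11, wherein the machine learning is a support vector machine,
in addition to the first combination,
a second combination of protein information and compound information corresponding to a protein and a compound that have a second interaction is used as teacher data,
a separating plane that separates the first combination from the second combination is obtained,
and the score represents the distance, from the separating plane, of the combination of the compound information and the protein information for which the score is calculated.
- A compound design computer program that causes a computer to design a compound,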
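the program causing the computer to execute:
(i) a step of accepting input of query protein information corresponding to one or more query proteins;
(ii) a step of generating one or more pieces of compound information;
(iii) a step of calculating a score indicating the likelihood of interaction between the compound corresponding to the compound information and the query protein;
(iv) a step of updating the compound information by an optimization method, using the score calculated in step (iii) as a criterion, so that the likelihood of interaction increases; and
(v) a step of repeating step (iii) and step (iv) multiple times,
wherein
the score calculated in step (iii) is
obtained by machine learning that uses, as teacher data, at least a first combination of protein information and compound information corresponding to a protein and a compound that have a first interaction.
- The compound design computer program according to claim 13, wherein the machine learning is a support vector machine,
in addition to the first combination,
a second combination of protein information and compound information corresponding to a protein and a compound that have a second interaction is used as teacher data,
a separating plane that separates the first combination from the second combination is obtained,
and the score represents the distance, from the separating plane, of the combination of the compound information and the protein information for which the score is calculated.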
Priority Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US14/424,701 US20150310162A1 (en) | 2012-08-27 | 2013-08-24 | Compound Design Device, Compound Design Method, And Computer Program |
EP13832325.8A EP2889791A4 (en) | 2012-08-27 | 2013-08-24 | DEVICE FOR DESIGNING A CONNECTION PROCESS FOR THE DESIGN OF A CONNECTION AND COMPUTER PROGRAM |
JP2014532989A JP5946045B2 (ja) | 2012-08-27 | 2013-08-24 | Compound design device, compound design method, and computer program |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2012-186072 | 2012-08-27 | ||
JP2012186072 | 2012-08-27 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2014034577A1 true WO2014034577A1 (ja) | 2014-03-06 |
Family
ID=50183390
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2013/072630 WO2014034577A1 (ja) | 2012-08-27 | 2013-08-24 | Compound design device, compound design method, and computer program |
Country Status (4)
Country | Link |
---|---|
US (1) | US20150310162A1 (ja) |
EP (1) | EP2889791A4 (ja) |
JP (1) | JP5946045B2 (ja) |
WO (1) | WO2014034577A1 (ja) |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
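WO2020054841A1 (ja) * | 2018-09-14 | 2020-03-19 | 富士フイルム株式会社 | Compound search method, compound search program, recording medium, and compound search device |
WO2020166486A1 (ja) * | 2019-02-12 | 2020-08-20 | Jsr株式会社 | Data processing method, data processing device, and data processing system |
WO2020213417A1 (ja) * | 2019-04-16 | 2020-10-22 | 富士フイルム株式会社 | Feature quantity calculation method, feature quantity calculation program, feature quantity calculation device, screening method, screening program, and compound creation method |
WO2021033695A1 (ja) * | 2019-08-19 | 2021-02-25 | Jsr株式会社 | Chemical structure generation device, chemical structure generation program, and chemical structure generation method |
JP2021121927A (ja) * | 2015-12-02 | 2021-08-26 | 株式会社Preferred Networks | Generative machine learning system for drug design |
WO2021251413A1 (ja) * | 2020-06-09 | 2021-12-16 | 株式会社 Preferred Networks | Estimation device, estimation method, chemical structural formula, and program |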
Families Citing this family (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11144832B2 (en) * | 2017-11-03 | 2021-10-12 | Cognizant Technology Solutions India Pvt. Ltd | System and method for determining optimal solution in a swarm of solutions using swarm intelligence |
US10847254B2 (en) * | 2017-12-05 | 2020-11-24 | Toyota Research Institute, Inc. | Artificial intelligence based stable materials discovery process |
CN111819441B (zh) | 2018-03-09 | 2022-08-09 | 昭和电工株式会社 | Polymer physical property prediction device, storage medium, and polymer physical property prediction method |
JP7109339B2 (ja) | 2018-11-02 | 2022-07-29 | 昭和電工株式会社 | Polymer design device, program, and method |
CN109935278B (zh) * | 2019-02-28 | 2023-04-07 | 深圳晶泰科技有限公司 | Method for rapid detection of crystal structure collisions |
US10515715B1 (en) * | 2019-06-25 | 2019-12-24 | Colgate-Palmolive Company | Systems and methods for evaluating compositions |
CN110610742B (zh) * | 2019-09-20 | 2023-12-19 | 福建工程学院 | Functional module detection method based on a protein interaction network |
CN116157680A (zh) | 2020-09-30 | 2023-05-23 | 富士胶片株式会社 | Feature quantity calculation method, screening method, and compound creation method |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2007139037A1 (ja) | 2006-05-26 | 2007-12-06 | Kyoto University | Prediction of protein-compound interactions and rational design of compound libraries based on chemical genomic information |
JP2008081435A (ja) | 2006-09-27 | 2008-04-10 | Nec Corp | Method and device for virtual screening of compounds |
WO2008053924A1 (fr) | 2006-10-31 | 2008-05-08 | Keio University | Method for classifying protein/compound pairs |
JP2008217594A (ja) | 2007-03-06 | 2008-09-18 | Nec Corp | Compound screening method and screening system therefor |
JP2009007302A (ja) | 2007-06-28 | 2009-01-15 | Nec Corp | Virtual screening method and device |
-
2013
- 2013-08-24 JP JP2014532989A patent/JP5946045B2/ja not_active Expired - Fee Related
- 2013-08-24 US US14/424,701 patent/US20150310162A1/en not_active Abandoned
- 2013-08-24 EP EP13832325.8A patent/EP2889791A4/en not_active Withdrawn
- 2013-08-24 WO PCT/JP2013/072630 patent/WO2014034577A1/ja active Application Filing
Non-Patent Citations (5)
Title |
---|
FABIAN DEY ET AL.: "Fragment-Based de Novo Ligand Design by Multiobjective Evolutionary Optimization", J. CHEM. INF. MODEL, vol. 48, no. 3, 29 February 2008 (2008-02-29), pages 679 - 690, XP008094137 * |
HARTENFELLER, M.; SCHNEIDER G. ET AL.: "Concept of combinatorial de novo design of drug-like molecules by particle swarm optimization", CHEMICALBIOLOGY & DRUG DESIGN, vol. 72, 2008, pages 16 - 26 |
HASSEN MOHAMMED ALSAFI ET AL.: "Rational Drug Design using Genetic Algorithm : Case of Malaria Disease", JOURNAL OF EMERGING TRENDS IN COMPUTING AND INFORMATION SCIENCES, vol. 3, no. 7, July 2012 (2012-07-01), pages 1093 - 1102, XP055188611 * |
IQBAL MUDASSAR ET AL.: "Protein Interaction Inference Using Particle Swarm Optimization Algorithm", LECT NOTES COMPUT SCI, vol. 4973, 2008, pages 61 - 70, XP019087791 * |
See also references of EP2889791A4 |
Cited By (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
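JP2021121927A (ja) * | 2015-12-02 | 2021-08-26 | 株式会社Preferred Networks | Generative machine learning system for drug design |
US11900225B2 (en) | 2015-12-02 | 2024-02-13 | Preferred Networks, Inc. | Generating information regarding chemical compound based on latent representation |
JP7247258B2 (ja) | 2015-12-02 | 2023-03-28 | 株式会社Preferred Networks | Computer system, method, and program |
JPWO2020054841A1 (ja) * | 2018-09-14 | 2021-08-30 | 富士フイルム株式会社 | Compound search method, compound search program, recording medium, and compound search device |
WO2020054841A1 (ja) * | 2018-09-14 | 2020-03-19 | 富士フイルム株式会社 | Compound search method, compound search program, recording medium, and compound search device |
JP7116186B2 (ja) | 2018-09-14 | 2022-08-09 | 富士フイルム株式会社 | Compound search method, compound search program, recording medium, and compound search device |
JP7351317B2 (ja) | 2019-02-12 | 2023-09-27 | Jsr株式会社 | Data processing method, data processing device, and data processing system |
WO2020166486A1 (ja) * | 2019-02-12 | 2020-08-20 | Jsr株式会社 | Data processing method, data processing device, and data processing system |
JPWO2020213417A1 (ja) * | 2019-04-16 | 2020-10-22 | ||
WO2020213417A1 (ja) * | 2019-04-16 | 2020-10-22 | 富士フイルム株式会社 | Feature quantity calculation method, feature quantity calculation program, feature quantity calculation device, screening method, screening program, and compound creation method |
JP7297057B2 (ja) | 2019-04-16 | 2023-06-23 | 富士フイルム株式会社 | Feature quantity calculation method, feature quantity calculation program, feature quantity calculation device, screening method, screening program, and compound creation method |
WO2021033695A1 (ja) * | 2019-08-19 | 2021-02-25 | Jsr株式会社 | Chemical structure generation device, chemical structure generation program, and chemical structure generation method |
WO2021251413A1 (ja) * | 2020-06-09 | 2021-12-16 | 株式会社 Preferred Networks | Estimation device, estimation method, chemical structural formula, and program |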
Also Published As
Publication number | Publication date |
---|---|
US20150310162A1 (en) | 2015-10-29 |
EP2889791A4 (en) | 2016-04-13 |
JPWO2014034577A1 (ja) | 2016-08-08 |
EP2889791A1 (en) | 2015-07-01 |
JP5946045B2 (ja) | 2016-07-05 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| | 121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 13832325; Country of ref document: EP; Kind code of ref document: A1 |
| | REEP | Request for entry into the European phase | Ref document number: 2013832325; Country of ref document: EP |
| | WWE | WIPO information: entry into national phase | Ref document number: 2013832325; Country of ref document: EP |
| | ENP | Entry into the national phase | Ref document number: 2014532989; Country of ref document: JP; Kind code of ref document: A |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | WWE | WIPO information: entry into national phase | Ref document number: 14424701; Country of ref document: US |