WO2023033027A1 - Compound safety prediction device, compound safety prediction program, and compound safety prediction method - Google Patents
- Publication number
- WO2023033027A1 (application PCT/JP2022/032725)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- prediction
- safety
- molecule
- safety evaluation
- unit
- Prior art date
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16C—COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
- G16C20/00—Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
- G16C20/30—Prediction of properties of chemical compounds, compositions or mixtures
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16C—COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
- G16C20/00—Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
- G16C20/40—Searching chemical structures or physicochemical data
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16C—COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
- G16C20/00—Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
- G16C20/70—Machine learning, data mining or chemometrics
Definitions
- the present invention relates to a safety prediction device, a safety prediction program, and a safety prediction method for compounds.
- As a compound safety prediction device for predicting the safety of a compound, for example, a device has been proposed that includes means for learning and analyzing descriptors effective for a specific evaluation of cosmetic materials from among descriptors calculated using information on the cosmetic materials, and means for searching for an evaluation model that is effective for the specific evaluation using the analyzed descriptors and obtaining predictive values for irritation, sensitization, or repeated dose toxicity of cosmetic ingredients (see, for example, Patent Document 1).
- However, Patent Document 1 is limited to predicting the irritation, sensitization, or repeated dose toxicity of cosmetic materials. Depending on the type of compound, such as a new compound that differs from conventional ones, there is therefore a high possibility that the safety of the compound cannot be predicted with high accuracy.
- Patent Document 2 requires calculating the degree of similarity for all drug molecules registered in the database and referring to the safety data of similar molecules, so there is a problem that user convenience is low.
- An object of one aspect of the present invention is to provide a compound safety prediction device capable of performing a highly accurate safety evaluation of a compound while enhancing user convenience.
- the present invention has the following configurations.
- [1] A compound safety prediction device comprising: an input unit for inputting structural formulas of one or more molecules; a safety prediction unit that predicts the safety evaluation of the molecule and calculates the confidence of the prediction; a similar molecule data search unit that acquires safety evaluation data of similar molecules that are similar to the molecule; and an output unit that outputs a prediction result of the safety evaluation of the molecule, the certainty of the prediction, and the safety evaluation data of the similar molecule.
- [2] The compound safety prediction device according to [1], wherein, when the confidence of the prediction is high, the output unit outputs a message regarding the prediction result of the safety evaluation of the molecule and the confidence of the prediction, and when the confidence of the prediction is low, the output unit outputs a message regarding the prediction result of the safety evaluation of the molecule, the confidence of the prediction, and the safety evaluation data of the similar molecule.
- the compound safety prediction device comprising a verification unit that determines [4] The compound safety prediction device according to [3], wherein, when the confidence of the prediction is high, the output unit outputs a message regarding the prediction result of the safety evaluation of the molecule and the confidence of the prediction, and when the confidence of the prediction is low, the output unit outputs a message regarding the prediction result of the safety evaluation of the molecule, the confidence of the prediction, and the safety evaluation data of the similar molecule.
- The compound safety prediction device according to any one of [1] to [5], wherein the safety prediction unit includes: a feature amount calculation unit that calculates a feature amount of the molecule based on the structural formula of the molecule; and a prediction unit that predicts the safety evaluation of the molecule based on the feature amount and calculates the certainty of the prediction.
- The compound safety prediction device according to [6], wherein the feature amount calculation unit calculates the feature amount of the molecule using one or more of: a fingerprint based on the structural formula of the molecule; a physical property value calculated by quantum chemical calculation based on the structural formula of the molecule; a physical property value estimated by quantitative structure-activity correlation; and a predicted value from a trained model that has learned the relationship between the structural formula and the physical property value of the molecule.
- The compound safety prediction device according to any one of [1] to [7], wherein the similar molecule data search unit includes: a similarity evaluation unit that calculates the degree of similarity between the structural formula of the molecule input in the input unit and the structural formulas of a plurality of evaluated molecules in a safety evaluation database storing the safety evaluation results of previously evaluated molecules; and a data search unit that acquires the safety evaluation result of an evaluated molecule with a high degree of similarity as the safety evaluation data of the similar molecule.
- a compound safety prediction program that causes a computer to execute [10] an input step of inputting structural formulas of one or more molecules; a safety prediction step of predicting the safety evaluation of the molecule and calculating the confidence of the prediction; a similar molecule data search step of acquiring safety evaluation data of similar molecules similar to the molecule; an output step of outputting the prediction result of the safety evaluation of the molecule, the confidence of the prediction, and the safety evaluation data of the similar molecule;
- a method for predicting the safety of a compound comprising:
- One aspect of the compound safety prediction device, safety prediction program, and safety prediction method according to the present invention is to quantify the degree of confidence in molecular safety prediction, so that the safety of a compound can be appropriately evaluated.
- When the degree of certainty is high, the prediction result can be used as it is, so the safety of the compound can be evaluated quickly and easily as well as with high accuracy.
- One aspect of the compound safety prediction device, safety prediction program, and safety prediction method according to the present invention can evaluate the safety of a compound with high accuracy while enhancing user convenience.
- FIG. 1 is a block diagram showing a schematic configuration of a compound safety prediction device according to a first embodiment of the present invention.
- FIG. 2 is a diagram showing an example of a table describing structural formulas (SMILES).
- FIG. 3 is an explanatory diagram showing an example in which the prediction certainty is regarded as high when it is 50% or more.
- FIG. 4 is a diagram showing an example of a table describing prediction results of molecular safety evaluation.
- FIG. 5 is a diagram showing an example of evaluation data of similar molecules.
- FIG. 6 is a diagram showing another example of evaluation data of similar molecules.
- FIG. 7 is a diagram showing an example of an integrated file.
- FIG. 8 is a diagram showing an example of a learning data table.
- FIG. 4 is a schematic diagram showing the configuration of a model learning unit;
- FIG. 4 is a flowchart for explaining a model learning method;
- FIG. 12 is a flow chart illustrating a compound safety prediction method according to a first embodiment of the present invention.
- FIG. 13 is a flowchart for explaining a confirmation step (step S22) in FIG. 12;
- FIG. 13 is a flow chart for explaining a step of predicting the safety evaluation of a molecule in FIG. 12 and calculating the degree of certainty of the prediction (step S23).
- FIG. 13 is a flowchart for explaining a similar molecule safety evaluation data search step (step S24) in FIG. 12.
- FIG. 13 is a flowchart for explaining an integration step (step S25) in FIG. 12;
- FIG. FIG. 2 is a block diagram showing a schematic configuration of a compound safety prediction device according to a second embodiment of the present invention;
- FIG. 2 is a flow chart illustrating a compound safety prediction method according to a second embodiment of the present invention.
- FIG. 1 is a block diagram showing the hardware configuration of a compound safety prediction device;
- FIG. 1 is a block diagram showing a schematic configuration of a compound safety prediction device according to this embodiment.
- A compound safety prediction device (hereinafter simply referred to as "safety prediction device") 1A includes an input unit 10, a safety prediction unit 20, a similar molecule data search unit 30, an integration unit 40, a storage unit 50, a model learning unit 60, a characteristic prediction model 70, and an output unit 80.
- The safety prediction device 1A outputs the prediction result of the molecular safety evaluation obtained by the safety prediction unit 20, the degree of certainty of the prediction, and the safety evaluation data obtained by the similar molecule data search unit 30. The user can thereby adopt the prediction result as it is when the degree of certainty is high, and consider whether to adopt the prediction result or the safety evaluation data when the degree of certainty is low. Because the safety prediction device 1A quantifies and outputs the degree of certainty, the user can judge the safety of the compound based on at least one of the prediction result obtained by the safety prediction unit 20 and the safety evaluation data obtained by the similar molecule data search unit 30. The safety prediction device 1A can therefore enhance user convenience and improve the accuracy of compound safety evaluation.
- the output includes display on the screen, sound, etc., as described later.
- High confidence and low confidence mean the same as the high and low prediction certainty described later, and the threshold that separates them can be set as appropriate. For example, when the threshold is set to 50%, the certainty is considered high if it is equal to or higher than the threshold.
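- The output behavior described above can be sketched as follows. This is a minimal illustration, not the device's actual implementation: the `SafetyReport` class, the `build_report` function, and the message wording are hypothetical, and the 50% threshold is the example value given in the text.

```python
from dataclasses import dataclass, field


@dataclass
class SafetyReport:
    prediction: str                      # predicted safety evaluation, e.g. "OK" or "NG"
    certainty: float                     # prediction certainty in percent
    similar_data: list = field(default_factory=list)  # evaluation data of similar molecules
    message: str = ""


def build_report(prediction, certainty, similar_data, threshold=50.0):
    """Bundle the three outputs and attach a message that depends on the certainty."""
    if certainty >= threshold:
        message = (f"Prediction {prediction} with high certainty ({certainty:.0f}%): "
                   "the prediction result can be adopted as it is.")
    else:
        message = (f"Prediction {prediction} with low certainty ({certainty:.0f}%): "
                   "also consult the safety evaluation data of similar molecules.")
    return SafetyReport(prediction, certainty, list(similar_data), message)


high = build_report("OK", 80.0, [{"ID": "E1", "result": "OK"}])
low = build_report("NG", 20.0, [{"ID": "E2", "result": "NG"}])
```

- In this sketch the similar-molecule data is always carried in the report, and only the message changes with the certainty, which mirrors the two cases described for the output unit.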
- Safety is an index that expresses the magnitude of the burden that a compound has on humans and the environment, and includes biodegradability, bioaccumulation, mutagenicity, acute toxicity, chronic toxicity, inhibitory toxicity, and repeated toxicity.
- the input unit 10 inputs the structural formulas of one or more molecules that are evaluation targets for safety evaluation.
- SMILES is a character string representation of the molecular structure of a compound.
- FIG. 2 shows an example of a table describing structural formulas (SMILES). As shown in FIG. 2, the table assigns ID numbers A1, ... to the compounds and lists the SMILES of each compound.
- A table containing the structural formula of each molecule may be obtained from data in a format such as CSV or an Excel spreadsheet.
- the input unit 10 may input a table in which SMILES of each molecule are described as shown in FIG.
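- Reading such a table from CSV data might look like the sketch below. The column names `ID` and `SMILES` and the function name `read_structure_table` are assumptions made for illustration; the source does not specify a file layout.

```python
import csv
import io


def read_structure_table(csv_text):
    """Parse CSV text with 'ID' and 'SMILES' columns into (id, smiles) pairs."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [(row["ID"].strip(), row["SMILES"].strip()) for row in reader]


# Hypothetical input table: ethanol and benzene.
sample_csv = "ID,SMILES\nA1,CCO\nA2,c1ccccc1\n"
structure_table = read_structure_table(sample_csv)
```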
- the input unit 10 may check whether there are any mistakes in the structural formula of the input molecule. When the user inputs the structural formula, there is a possibility of inputting it incorrectly. The input unit 10 can determine that the structural formula of the input molecule is incorrect by confirming the erroneous input of the structural formula.
- The input unit 10 may determine whether there is an error in the structural formula of the input molecule by converting the structural formula into a molecular Mol object using, for example, RDKit included in a library such as Anaconda (registered trademark), which is software distributed by Anaconda, Inc. in the United States, and confirming whether the Mol object is created. If the structural formula is SMILES, MolFromSmiles included in RDKit is used to read the SMILES character string as the structural formula of the molecule. When the SMILES is converted and a Mol object is created normally, it can be determined that there is no entry error in the structural formula of the input molecule. On the other hand, if the SMILES is not converted and a Mol object is not created, it can be determined that the structural formula of the input molecule is incorrect.
- the input unit 10 may separately create a table containing structural formulas without description errors and a table containing structural formulas with description errors, and output the tables by the output unit 80, which will be described later. Thereby, even if the user fails to input the structural formula, the safety evaluation can be predicted without abnormal termination of the safety prediction device 1A.
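- Splitting the input into a table of usable structural formulas and a table of erroneous ones can be sketched as below. The parser is passed in as a function so the example runs without RDKit; `split_by_validity` and `toy_parse` are hypothetical names, and `toy_parse` is only a stand-in that checks parenthesis balance, not a real SMILES parser.

```python
def split_by_validity(rows, parse):
    """Separate (id, smiles) rows by whether `parse` yields an object or None."""
    valid, invalid = [], []
    for mol_id, smiles in rows:
        try:
            mol = parse(smiles)
        except Exception:
            mol = None  # treat a parser exception the same as a failed conversion
        (valid if mol is not None else invalid).append((mol_id, smiles))
    return valid, invalid


def toy_parse(smiles):
    """Stand-in parser: rejects empty strings and unbalanced parentheses."""
    if not smiles:
        return None
    depth = 0
    for ch in smiles:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return None
    return smiles if depth == 0 else None


rows = [("A1", "CC(C)O"), ("A2", "CC(C"), ("A3", "")]
valid_rows, invalid_rows = split_by_validity(rows, toy_parse)
```

- With RDKit installed, `rdkit.Chem.MolFromSmiles` could be passed as `parse`, since it returns `None` for a string it cannot convert; processing then continues with the valid rows instead of terminating abnormally.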
- the safety prediction unit 20 predicts the safety evaluation of a molecule and calculates the degree of certainty of the prediction.
- the safety prediction unit 20 includes a feature quantity calculation unit 21 and a prediction unit 22 .
- the feature amount calculation unit 21 calculates feature amounts based on the molecular structural formula.
- the feature value can be obtained based on the structural formula of a molecule that does not contain any writing errors.
- The feature amount can be a fingerprint based on the structural formula of the molecule, such as the Morgan fingerprint (circular fingerprint) implemented in RDKit, which is equivalent to Extended Connectivity Fingerprints (ECFP); other fingerprints, such as atom-pair-based fingerprints, can also be used.
- The feature amount may be a physical property such as the octanol/water partition coefficient (logP), which represents the lipophilicity of the molecule. A fingerprint may express each partial structure as 1 or 0 according to its presence or absence, as the number of occurrences of the partial structure, or as a ratio obtained by dividing the number of occurrences by the number of constituent atoms.
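- The three fingerprint encodings mentioned above (presence/absence, count, and ratio) can be illustrated with a small sketch. The substructure keys and counts are made-up inputs, `fingerprint_views` is a hypothetical helper, and using the heavy-atom count as the "number of constituent atoms" is an assumption.

```python
def fingerprint_views(substructure_counts, n_atoms):
    """Return the same fingerprint as binary, count, and ratio vectors."""
    if n_atoms <= 0:
        raise ValueError("n_atoms must be positive")
    keys = sorted(substructure_counts)            # fixed column order
    counts = [substructure_counts[k] for k in keys]
    binary = [1 if c > 0 else 0 for c in counts]  # presence or absence
    ratio = [c / n_atoms for c in counts]         # count divided by number of atoms
    return keys, binary, counts, ratio


# Made-up substructure counts for a three-atom molecule.
keys, binary, counts, ratio = fingerprint_views({"C-O": 1, "C-C": 2, "C=O": 0}, n_atoms=3)
```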
- The feature amount may be calculated using any one or more of: physical property values calculated by quantum chemical calculation based on the molecular structural formula; physical property values obtained by quantitative structure-activity correlation between the molecular structural formula and physical property values; and predicted values from a trained model that has learned the relationship between the molecular structural formula and physical property values.
- Examples of physical property values calculated by quantum chemical calculation include HOMO, LUMO, charge, refractive index, and frequency.
- the structure-activity relationship refers to the correlation between chemical structural features (or physicochemical constants) of a substance and biological activity (eg, degradability, accumulation, various toxicity endpoints, etc.).
- the feature amount may be a physical property value measurable by experiment, such as melting point, viscosity, and specific surface area.
- the prediction unit 22 predicts the safety evaluation of the molecule based on the feature amount calculated by the feature amount calculation unit 21 and calculates the certainty of the prediction.
- For example, biochemical oxygen demand (BOD) can be used as an index for molecular safety evaluation.
- the safety of the molecule can be evaluated as good.
- Prediction confidence can be calculated using the property prediction model 70 .
- the prediction unit 22 inputs the feature amount calculated by the feature amount calculation unit 21 as an explanatory variable to the characteristic prediction model 70, and outputs the classification probability P(OK) that the classification result is "OK".
- The prediction unit 22 calculates the prediction certainty (unit: %) from the classification probability P(OK) that the classification result is "OK" using the following equation (1): $\text{prediction certainty (\%)} = 100 \times 2 \times \lvert P(\mathrm{OK}) - 0.5 \rvert$ ... (1)
- Prediction confidence ranges from 0% to 100%, and the closer the prediction confidence is to 100%, the higher the accuracy rate of the prediction results. Therefore, the user can easily determine whether or not the prediction result is reliable from the certainty of the prediction.
- the prediction confidence level corresponds to the classification probability, and the prediction confidence level changes according to the magnitude of the classification probability.
- FIG. 3 shows an example of the case where the prediction certainty is regarded as high when the prediction certainty is 50% or more.
- When the classification probability is 0 or more and 0.25 or less, the prediction confidence is 50% or more and 100% or less, and the result is regarded as "NG with high confidence".
- When the classification probability is greater than 0.25 and less than 0.50, the prediction confidence is greater than 0% and less than 50%, and the result is regarded as "NG with low confidence".
- When the classification probability is 0.50 or more and less than 0.75, the prediction confidence is 0% or more and less than 50%, and the result is regarded as "OK with low confidence".
- When the classification probability is 0.75 or more and 1.00 or less, the prediction confidence is 50% or more and 100% or less, and the result is regarded as "OK with high confidence".
- the threshold for judging high confidence and low confidence can be set as appropriate according to the type of molecule whose safety is to be evaluated, and is preferably 50%, for example.
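- The mapping from classification probability to prediction certainty and confidence band can be sketched as follows. The sketch assumes that equation (1) has the form 100 × 2 × |P(OK) − 0.5|, which is consistent with the probability bands listed above; the function names `prediction_certainty` and `classify` are hypothetical.

```python
def prediction_certainty(p_ok):
    """Prediction certainty in percent, assuming 100 * 2 * |P(OK) - 0.5|."""
    if not 0.0 <= p_ok <= 1.0:
        raise ValueError("p_ok must be a probability between 0 and 1")
    return 100.0 * 2.0 * abs(p_ok - 0.5)


def classify(p_ok, threshold=50.0):
    """Return (label, confidence level, certainty) for a classification probability."""
    certainty = prediction_certainty(p_ok)
    label = "OK" if p_ok >= 0.5 else "NG"
    level = "high" if certainty >= threshold else "low"
    return label, level, certainty


# One probability from each of the four bands.
bands = [classify(p)[:2] for p in (0.1, 0.3, 0.6, 0.9)]
```

- Keeping the threshold as a parameter reflects the statement that it can be set as appropriate for the type of molecule being evaluated.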
- the prediction unit 22 can create a table of molecular safety evaluation prediction results that includes the structural formula of each molecule, the prediction result, and the prediction confidence.
- FIG. 4 shows an example of a table describing prediction results of molecular safety evaluation.
- In FIG. 4, SMILES is used for the molecular structural formula, and the ID number (A1, ...) and the SMILES of each compound are listed.
- BOD is used as an index for molecular safety evaluation. Molecules were evaluated as good (OK) for safety when the BOD was 60% or more, and as poor (NG) for safety when the BOD was less than 60%.
- the feature quantity calculation unit 21 creates a table of the molecular safety evaluation prediction results including the molecular safety evaluation prediction results and the confidence of the prediction, as shown in FIG. can be output. As a result, the user can easily grasp the prediction results regarding the molecular safety evaluation.
- the similar molecule data search unit 30 acquires safety evaluation data of similar molecules similar to the molecule to be evaluated.
- the similar molecule data search unit 30 includes a similarity evaluation unit 31 and a data search unit 32 .
- the similarity evaluation unit 31 calculates and evaluates the degree of similarity between the structural formula of the molecule input by the input unit 10 and the structural formulas of multiple evaluated molecules stored in the safety evaluation database 33 . Note that the similarity evaluation unit 31 may use SMILES as the molecular structural formula.
- the safety evaluation database 33 stores safety evaluation data of previously evaluated molecules.
- The degree of similarity can be obtained by calculating the Tanimoto coefficient using BulkTanimotoSimilarity implemented in RDKit.
- the similarity may be Dice coefficient, cosine similarity, or the like.
- The similarity evaluation unit 31 can change the number of safety evaluation data items of similar molecules to be acquired as appropriate according to the purpose, ease of use, and the like. Data from the highest degree of similarity down to a predetermined number (for example, the top 20 cases) may be obtained as the safety evaluation data of similar molecules (similar molecule data).
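- The similarity ranking can be illustrated without RDKit by treating each fingerprint as a set of "on" bit positions. This is a sketch under that assumption; the database contents and the helper names `tanimoto`, `dice`, `cosine`, and `top_similar` are hypothetical.

```python
import math


def tanimoto(a, b):
    """Tanimoto coefficient: shared bits divided by bits set in either fingerprint."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def dice(a, b):
    """Dice coefficient: twice the shared bits divided by the total bits set."""
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 0.0


def cosine(a, b):
    """Cosine similarity for binary fingerprints."""
    denom = math.sqrt(len(a) * len(b))
    return len(a & b) / denom if denom else 0.0


def top_similar(query, database, k=20, metric=tanimoto):
    """Return up to k (id, score) pairs, highest similarity first."""
    scored = [(metric(query, bits), mol_id) for mol_id, bits in database.items()]
    scored.sort(key=lambda item: (-item[0], item[1]))  # ties broken by id
    return [(mol_id, score) for score, mol_id in scored[:k]]


query_bits = {1, 2, 3, 4}
evaluated = {"E1": {1, 2, 3, 4}, "E2": {1, 2, 5}, "E3": {7, 8}}
ranking = top_similar(query_bits, evaluated, k=2)
```

- Passing `metric=dice` or `metric=cosine` switches the similarity measure, matching the alternatives named in the text.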
- Permanent change substances refer to change substances that remain after the biodegradation test under the Chemical Substances Control Law, etc.
- the structural formula of the molecule whose ID is A5 in FIG. 4 is displayed in the first row as the molecule to be evaluated.
- information on similar molecules recorded in the past data is displayed on the second and subsequent lines.
- the similarity evaluation unit 31 collectively displays the information about the molecule to be evaluated and the information about the similar molecule in a table containing the safety evaluation data of the similar molecule, so that the molecule to be evaluated and the similar molecule can be visualized. Therefore, the user can easily determine which of the similar molecules the safety evaluation data should be referred to.
- the similarity evaluation unit 31 may create a table containing the safety evaluation data of similar molecules as shown in FIGS. 5 and 6, and output it from the output unit 80 described later. This allows the user to grasp information about similar molecules.
- the data search unit 32 acquires safety evaluation data of similar molecules with a high degree of similarity.
- The integration unit 40 integrates a prediction result file, which contains the prediction result of the safety evaluation of the molecule to be evaluated and the prediction certainty obtained by the safety prediction unit 20, with an evaluation data file, which contains the safety evaluation data obtained by the similar molecule data search unit 30. That is, the integration unit 40 integrates the prediction result file (see FIG. 4) obtained by the safety prediction unit 20 and the evaluation data file (see FIGS. 5 and 6) obtained by the similar molecule data search unit 30 to create an integrated file.
- the prediction sheet describes the contents of the prediction result file, and the A1 sheet, A2 sheet, .
- the integration unit 40 may cause the output unit 80, which will be described later, to output the integrated file.
- the user can easily comprehend the information on the molecule to be evaluated and the information on the safety evaluation of similar molecules, which are included in the integrated file.
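- One way to picture the integrated file is as a workbook with a prediction sheet plus one sheet per evaluated molecule, where each molecule's sheet starts with the molecule itself and continues with its similar molecules. The sketch below models sheets as plain lists of dictionaries; the `integrate` function and the field names are assumptions, not the format used by the device.

```python
def integrate(prediction_rows, similar_by_id):
    """Combine prediction results and similar-molecule data into named sheets."""
    workbook = {"prediction": list(prediction_rows)}
    for row in prediction_rows:
        mol_id = row["ID"]
        # First row: the molecule being evaluated; following rows: similar molecules.
        workbook[mol_id] = [row] + list(similar_by_id.get(mol_id, []))
    return workbook


predictions = [
    {"ID": "A1", "SMILES": "CCO", "prediction": "OK", "certainty": 80.0},
    {"ID": "A2", "SMILES": "c1ccccc1", "prediction": "NG", "certainty": 20.0},
]
similar = {"A2": [{"ID": "E7", "SMILES": "Cc1ccccc1", "similarity": 0.6, "result": "NG"}]}
workbook = integrate(predictions, similar)
```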
- the storage unit 50 stores related data in which the molecular structural formula of the compound, the safety evaluation, the feature amount of the compound, the characteristics of the compound, etc. are associated with each other as learning data.
- An example of the learning data table is shown in FIG.
- The learning data includes correspondences among the CAS registration number and SMILES of the molecule of the compound, the BOD that is the safety evaluation result serving as the objective variable of the compound, the judgment result under the Chemical Substances Control Law as a characteristic of the compound, the type of persistent change substance, and the like. Note that "-" in FIG. 8 indicates "not applicable".
- the feature amount of a compound is calculated from the SMILES of the corresponding compound by a technique such as ECFP.
- the feature amount of a compound is expressed in numerical matrix form as feature amounts 1 and 2, etc. calculated by ECFP.
- the storage unit 50 may input the molecular structural formula of the compound (for example, SMILES, etc.), the characteristic amount of the compound, the characteristics of the compound, etc. to the related data, and update the related data.
- the model learning unit 60 uses the related data stored in the storage unit 50 as learning data to learn the model.
- The model learning unit 60 uses the molecular structural formula (for example, SMILES, etc.) of the compound stored in the storage unit 50 and the feature amount of the compound as explanatory variables, and uses the desired property of the compound as the objective variable. Thereby, the model learning unit 60 learns a model that identifies the correspondence between the feature amount of the compound and the property of the compound, and generates a learned model (property prediction model 70). Through machine learning, the model learning unit 60 trains the model so that this correspondence approaches the correspondence in the learning data.
- Examples of the machine learning include supervised learning (for example, linear regression, logistic regression, random forest, boosting, support vector machine (SVM), neural network) and the like.
- a neural network can use deep learning with a neural network having more than three layers.
- Types of neural networks that can be used include, for example, Convolutional Neural Network (CNN), Recurrent Neural Network (RNN), and General Regression Neural Network.
- the model may be represented by a formula such as a function.
- Anaconda (registered trademark) includes a group of libraries used in machine learning, such as scikit-learn, and the model learning unit 60 can use one or more of these to perform machine learning.
- The model learning unit 60 may retrain the trained model using, from the safety evaluation data newly stored in the storage unit 50, the molecular structural formula of the compound (for example, SMILES) and the feature amount of the compound as explanatory variables and the characteristics of the compound as objective variables.
- FIG. 10 is a schematic diagram showing the configuration of the model learning unit 60.
- The model learning unit 60 has a first acquisition unit 61, a second acquisition unit 62, a function unit 63, a determination unit 64, a model 65, and a storage unit 66.
- the first acquisition unit 61 acquires learning data including a table listing molecular structural formulas of compounds (for example, SMILES, etc.) and a table listing properties of compounds.
- Learning data can be saved as files in formats such as CSV and spreadsheet software Excel.
- the second acquisition unit 62 acquires the molecular structure of one molecule from the learning data acquired by the first acquisition unit 61.
- The molecular structure of one molecule is preferably the SMILES of that molecule.
- the function unit 63 calculates feature amounts based on the molecular structure of one molecule acquired by the second acquisition unit 62 . Since the feature calculation method can be performed in the same manner as the feature amount calculation unit 21, details thereof will be omitted.
- the determination unit 64 determines whether or not the feature values of all molecules included in the learning data have been calculated.
- the model 65 is learned by the model learning unit 60 using the molecular structural formula of the compound and the feature amount of the compound stored in the storage unit 50 as explanatory variables and the characteristics of the compound as objective variables.
- the storage unit 66 stores the learned model generated by the model learning unit 60 having the model 65 perform learning.
- the characteristic prediction model 70 is a trained model generated by the model learning unit 60 causing the model 65 to learn.
- the degree of certainty of prediction can be appropriately set according to a predetermined value of the classification probability.
- High prediction certainty means, for example, the case where the certainty of prediction is 50% or more, and low prediction certainty means, for example, the case where the certainty of prediction is less than 50%.
- the output unit 80 outputs the prediction result of the safety evaluation of the molecule, the degree of certainty of the prediction, and the safety evaluation data of the similar molecule obtained by the integration unit 40 . That is, the output unit 80 outputs the integrated file.
- the output includes display on a monitor, etc., sound, etc., and any method that can notify the user may be used.
- the output unit 80 may output a table of structural formulas (for example, SMILES) without writing errors and a table of structural formulas with writing errors created by the input unit 10 .
- the output unit 80 may output a table of molecular safety evaluation prediction results, which includes the molecular safety evaluation prediction results and the certainty of the prediction, created by the safety prediction unit 20.
- The output unit 80 may output the similar molecule safety evaluation data, including information on the similar molecules, created by the similarity evaluation unit 31.
- the output unit 80 may refer to the integrated file and output safety evaluation data of similar molecules when the prediction confidence of the safety evaluation of the molecule is low.
- The output unit 80 may output a message regarding the molecular safety evaluation prediction result and the prediction confidence when the prediction confidence of the molecular safety evaluation is high (high confidence), and may output a message regarding the prediction result of the safety evaluation of the molecule, the confidence of the prediction, and the safety evaluation data of the similar molecule when the prediction confidence is low (low confidence).
- When the prediction confidence is high, the content of the message is, for example, "The prediction result of the safety evaluation of the molecule is high, and the prediction confidence is 50% or more." When the prediction confidence is low, the content is, for example, "The prediction result of the safety evaluation of the molecule is low, and the prediction confidence is less than 50%."
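- The confidence-dependent output described above can be sketched as follows. This is a minimal illustration, not the implementation of the output unit 80: the function names `confidence_level` and `build_message`, the message wording, and the representation of similar-molecule data as (name, evaluation) pairs are all assumptions made for the example. Only the 50% boundary is taken from the text.

```python
HIGH_CONFIDENCE_THRESHOLD = 0.5  # 50%, the example boundary given in the text


def confidence_level(confidence, threshold=HIGH_CONFIDENCE_THRESHOLD):
    """Return "high" when confidence >= threshold, otherwise "low"."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must lie in [0, 1]")
    return "high" if confidence >= threshold else "low"


def build_message(prediction, confidence, similar_data=None):
    """Compose an output message; the wording here is illustrative only."""
    level = confidence_level(confidence)
    message = (
        f"Prediction: {prediction}; "
        f"prediction confidence: {confidence:.0%} ({level} confidence)."
    )
    if level == "low" and similar_data:
        # Low confidence: also report the safety evaluation data of similar molecules.
        listed = ", ".join(f"{name}={evaluation}" for name, evaluation in similar_data)
        message += f" Similar molecules: {listed}."
    return message
```

- With this sketch, a confidence of exactly 0.5 counts as high, matching the "50% or more" wording, and similar-molecule data is appended only in the low-confidence case.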
- Safety prediction program: a program having the following configuration can be used as the safety prediction program for the compound according to the present embodiment (hereinafter simply referred to as "safety prediction program").
- The safety prediction program causes a computer to execute at least: an input step of inputting structural formulas of one or more molecules; a safety prediction step of predicting the safety evaluation of the molecule and calculating the confidence of the prediction; a similar molecule data search step of acquiring safety evaluation data of similar molecules similar to the molecule; and an output step of outputting the prediction result of the safety evaluation of the molecule, the confidence of the prediction, and the safety evaluation data of the similar molecule.
- the safety prediction method to which the safety prediction device according to the present embodiment is applied is a method of predicting the safety evaluation of a compound using the safety prediction device 1A having the configuration as shown in FIG.
- the learning method of the property prediction model 70 used in the safety prediction method will be explained. Since the model 65 constructed by the model learning unit 60 is applied to the characteristic prediction model 70 as described above, the learning method of the characteristic prediction model 70 will be described as the learning method of the model 65 .
- FIG. 11 is a flowchart explaining the model learning method.
- The model learning method is a method in which the model learning unit 60, configured as shown in FIG. 10, learns a model using learning data in which explanatory variables, including the molecular structural formulas of compounds and the feature amounts of the compounds, are associated with objective variables including the properties of the compounds.
- the safety prediction device 1A acquires learning data using the first acquisition unit 61 (learning data acquisition step: step S11).
- the learning data includes a table listing the molecular structural formulas of compounds (for example, SMILES, etc.) and a table listing the properties of the compounds.
- the safety prediction device 1A uses the second acquisition unit 62 to acquire the structural formula of one molecule from the learning data (step of acquiring the structural formula of one molecule: step S12).
- the structural formula of one molecule may be SMILES of one molecule.
- The safety prediction device 1A uses the function unit 63 to calculate feature amounts from the structural formula of one molecule acquired by the second acquisition unit 62, using libraries included in Anaconda (registered trademark) such as scikit-learn and RDKit (feature amount calculation step: step S13).
- the safety prediction device 1A uses the determination unit 64 to determine whether or not the feature values of all molecules included in the learning data have been calculated (step of determining feature values of all molecules: step S14).
- If the feature values of all molecules have not been calculated (step S14: No), the process returns to the step of acquiring the structural formula of one molecule (step S12), and the structural formula of a remaining molecule whose feature values have not been calculated is acquired.
- When the feature values of all molecules have been calculated (step S14: Yes), the model learning unit 60 performs learning using learning data in which the explanatory variables including the feature values of all molecules are associated with the objective variables including the characteristics of all molecules, and constructs the model 65 (learning step: step S15).
- the learning unit 15 causes the model to learn so that the output matches the objective variable linked to the explanatory variable according to the input of the explanatory variable included in the learning data.
- the safety prediction device 1A uses the storage unit 66 to store the model constructed by the learning unit 15 (storage step: step S16).
- FIG. 12 is a flowchart for explaining the safety prediction method according to this embodiment.
- the input unit 10 of the safety prediction device 1A inputs structural formulas of one or more molecules, which are evaluation targets for safety evaluation (input step: step S21).
- the safety prediction device 1A uses the safety prediction unit 20 to check for entry errors in the input structural formula (confirmation step: step S22).
- Details of the confirmation step (step S22) will be described later. Note that the confirmation step (step S22) may be omitted.
- The safety prediction device 1A uses the safety prediction unit 20 to predict the safety evaluation of the molecule and calculate the confidence of the prediction, and obtains a table of molecular safety evaluation prediction results that includes the prediction and its confidence (step of predicting molecular safety evaluation and calculating certainty of the prediction: step S23).
- The details of the step of predicting the safety evaluation of molecules and calculating the certainty of the prediction (step S23) will be described later.
- the safety prediction device 1A searches and acquires the safety evaluation data of similar molecules of the molecule whose safety is to be evaluated by the similar molecule data search unit 30 (similar molecule safety evaluation data search step: step S24).
- The details of the similar molecule safety evaluation data search step (step S24) will be described later.
- The safety prediction device 1A uses the integration unit 40 to integrate the table of molecular safety evaluation prediction results obtained in the step of predicting molecular safety evaluation and calculating the certainty of the prediction (step S23) with the similar molecule safety evaluation data obtained in the similar molecule safety evaluation data search step (step S24), thereby obtaining integrated data (integration step: step S25).
- Details of the integration step (step S25) will be described later.
- the safety prediction device 1A uses the output unit 80 to output the integrated data integrated by the integration unit 40 (output step: step S26).
- The safety prediction device 1A may use the output unit 80 to output, by display or the like, the prediction result and the prediction confidence in the integrated data when the prediction confidence is high, and the safety evaluation data of similar molecules when the prediction confidence is low.
- The step of predicting the safety evaluation of a molecule and calculating the degree of certainty of the prediction (step S23) may be performed at the same time as the similar molecule safety evaluation data search step (step S24), or may be performed after the similar molecule safety evaluation data search step (step S24).
- FIG. 13 is a flow chart for explaining the confirmation step (step S22) of FIG.
- The safety prediction device 1A uses the safety prediction unit 20 to input all structural formulas of molecules to be evaluated for safety evaluation (step of inputting structural formulas of all molecules to be evaluated: step S221).
- As the structural formulas, SMILES as shown in FIG. 2 may be obtained.
- The safety prediction device 1A uses the safety prediction unit 20 to acquire the structural formula of one molecule out of all the molecules input as evaluation targets (step of acquiring the structural formula of one molecule: step S222).
- the safety prediction device 1A uses the safety prediction unit 20 to check for entry errors in the structural formula of one molecule (entry error confirmation step: step S223).
- The safety prediction device 1A uses the safety prediction unit 20 to determine whether or not the structural formulas of all molecules have been checked for entry errors (description error determination step: step S224).
- If the check has not been completed for all molecules (step S224: No), the structural formula of an unchecked molecule is acquired again (step S222).
- When the check has been completed for all molecules (step S224: Yes), the safety prediction device 1A uses the safety prediction unit 20 to output a table of structural formulas without description errors to a file (step of outputting a table of structural formulas without description errors: step S225).
- The safety prediction device 1A uses the safety prediction unit 20 to output a table of structural formulas with description errors to a file (step of outputting a table of structural formulas with description errors: step S226).
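- The confirmation loop of steps S221 to S226 can be sketched as below. The text does not state how an entry error is detected, so `has_entry_error` is a hypothetical stand-in that only flags empty strings, whitespace, and unbalanced brackets; in practice a cheminformatics parser (the text mentions RDKit elsewhere) would be the more likely choice. The function name `split_by_entry_error` is also an assumption.

```python
def has_entry_error(smiles):
    """Hypothetical stand-in check, NOT a real SMILES validator.

    Flags empty strings, embedded whitespace, and unbalanced () or [] only.
    """
    if not smiles or any(ch.isspace() for ch in smiles):
        return True
    closing = {")": "(", "]": "["}
    stack = []
    for ch in smiles:
        if ch in "([":
            stack.append(ch)
        elif ch in closing:
            if not stack or stack.pop() != closing[ch]:
                return True
    return bool(stack)  # leftover openers mean an unbalanced formula


def split_by_entry_error(formulas, check=has_entry_error):
    """Return (table without entry errors, table with entry errors)."""
    valid, invalid = [], []
    for smiles in formulas:  # S222 and S224: take one molecule at a time until all are checked
        if check(smiles):    # S223: check this structural formula
            invalid.append(smiles)
        else:
            valid.append(smiles)
    return valid, invalid    # S225 and S226: the two tables to be written to files
```

- Passing the check as a parameter keeps the loop independent of whichever validation method is actually used.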
- FIG. 14 is a flow chart for explaining the step of predicting the safety evaluation of molecules in FIG. 12 and calculating the certainty of the prediction (step S23).
- the safety prediction device 1A acquires the model obtained by the model learning unit 60 as the property prediction model 70 by the safety prediction unit 20 (property prediction model acquisition step: step S231).
- the safety prediction device 1A uses the safety prediction unit 20 to acquire a table of structural formulas without writing errors (structural formula acquisition step: step S232).
- the safety prediction device 1A uses the safety prediction unit 20 to acquire the structural formula of one molecule out of all the molecules listed in the table of structural formulas that have no description errors (the Structural Formula Acquisition Step: Step S233).
- the safety prediction device 1A uses the safety prediction unit 20 to generate a feature amount of one molecule (step of generating a feature amount of one molecule: S234).
- The safety prediction device 1A uses the safety prediction unit 20 to predict the safety evaluation of one molecule and calculate the certainty of the prediction (step of predicting the safety evaluation of the molecule and calculating the certainty of the prediction: step S235).
- The safety prediction device 1A uses the safety prediction unit 20 to determine whether or not the safety evaluation has been predicted and the confidence of the prediction calculated for all molecules (step of determining whether the safety evaluation prediction and the certainty calculation have been performed for all molecules: step S236).
- If the prediction of the safety evaluation and the calculation of the certainty of the prediction have not been performed for all molecules (step S236: No), the structural formula of an unprocessed molecule is acquired again (step S232).
- When the prediction of the safety evaluation and the calculation of the confidence of the prediction have been performed for all molecules (step S236: Yes), a table of molecular safety evaluation prediction results, containing the predictions of the safety evaluation of all molecules and the confidence of each prediction, is output to a file (step of outputting a table of molecular safety evaluation prediction results: step S237).
- FIG. 15 is a flow chart for explaining the step of obtaining safety evaluation data for similar molecules (step S24) in FIG.
- The safety prediction device 1A uses the similar molecule data search unit 30 to acquire the safety evaluation data of all molecules from the safety evaluation database (step of acquiring the safety evaluation data of all molecules: step S241).
- the safety prediction device 1A uses the similar molecule data search unit 30 to acquire a structural formula table that does not contain description errors (structural formula table acquisition step: step S242).
- The safety prediction device 1A uses the similar molecule data search unit 30 to acquire the structural formula of one molecule out of all the molecules listed in the table of structural formulas without description errors (step of acquiring the structural formula of one molecule: step S243).
- The safety prediction device 1A uses the similar molecule data search unit 30 to calculate the degree of similarity between the acquired molecule and all the molecules in the safety evaluation database (similarity degree calculation step: step S244).
- The safety prediction device 1A uses the similar molecule data search unit 30 to acquire a predetermined number of safety evaluation data (step of acquiring a predetermined number of safety evaluation data: step S245).
- The safety prediction device 1A uses the similar molecule data search unit 30 to determine whether or not similar molecules have been searched for all molecules listed in the table of structural formulas without description errors (step of determining whether similar molecules have been searched for all molecules: step S246).
- If similar molecules have not been searched for all molecules (step S246: No), the structural formula of an unprocessed molecule is acquired again (step S243).
- When similar molecules have been searched for all molecules (step S246: Yes), a table of safety evaluation data for each similar molecule of all molecules is output (step S247).
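- Steps S244 and S245 can be illustrated with the following sketch. The similarity measure is not specified in this passage, so the Tanimoto coefficient on fingerprint bit sets is used here purely as an assumption, as is taking the records in descending order of similarity. The record layout (`id`, `fp`, `evaluation`) and the function names are hypothetical.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    union = len(fp_a | fp_b)
    return len(fp_a & fp_b) / union if union else 0.0


def top_similar(query_fp, database, k):
    """Return the k database records most similar to the query fingerprint.

    database: list of dicts with hypothetical keys "id", "fp", "evaluation".
    """
    # S244: similarity between the query molecule and every database molecule
    scored = [(tanimoto(query_fp, record["fp"]), record) for record in database]
    # Assumed ordering: most similar first (sorted() is stable for ties)
    scored = sorted(scored, key=lambda pair: pair[0], reverse=True)
    # S245: keep a predetermined number (k) of safety evaluation records
    return [
        {"id": record["id"], "similarity": score, "evaluation": record["evaluation"]}
        for score, record in scored[:k]
    ]
```

- Repeating `top_similar` for each molecule in the table of structural formulas corresponds to the loop closed by step S246.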
- FIG. 16 is a flow chart for explaining the integration step (step S25) of FIG.
- The safety prediction device 1A uses the integration unit 40 to acquire, from the safety prediction unit 20, the table of molecular safety evaluation prediction results, including the prediction and the confidence of the prediction, obtained in the step of predicting the safety evaluation of molecules and calculating the certainty of the prediction (step S23) (step of acquiring the table of molecular safety evaluation prediction results: step S251).
- The safety prediction device 1A uses the integration unit 40 to acquire, from the similar molecule data search unit 30, the table of safety evaluation data of each similar molecule of all molecules obtained in the step of acquiring the safety evaluation data of similar molecules (step S24) (step of acquiring safety evaluation data of similar molecules: step S252).
- The safety prediction device 1A uses the integration unit 40 to integrate the table of molecular safety evaluation prediction results and the table of safety evaluation data of all similar molecules into one table, thereby creating an integrated file (table integration step: step S253).
- the safety prediction device 1A uses the output unit 80 to output an integrated file as shown in FIG. 7 (integrated file output process: step S254).
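- As a rough sketch of steps S251 to S253, the integrated file can be modeled as a mapping from sheet names to tables. The layout is an assumption: one "prediction" sheet followed by one sheet per evaluated molecule named "A1", "A2", and so on, loosely following the sheet names mentioned earlier. The function name `integrate` and the row keys are hypothetical, and writing the result to an actual spreadsheet file is left out.

```python
def integrate(prediction_rows, similar_by_molecule):
    """Combine the prediction table and the similar-molecule tables.

    prediction_rows: list of dicts with hypothetical keys
        "smiles", "prediction", "confidence"            (acquired in S251)
    similar_by_molecule: dict mapping a SMILES string to the list of
        similar-molecule safety evaluation records      (acquired in S252)
    """
    integrated = {"prediction": list(prediction_rows)}
    # S253: one sheet per evaluated molecule, in the order of the prediction table
    for index, row in enumerate(prediction_rows, start=1):
        integrated[f"A{index}"] = list(similar_by_molecule.get(row["smiles"], []))
    return integrated
```

- A molecule with no retrieved similar molecules simply gets an empty sheet in this sketch.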
- a safety prediction device 1A includes an input unit 10, a safety prediction unit 20, a similar molecule data search unit 30, and an output unit 80.
- The safety prediction unit 20 predicts the safety evaluation of the molecule and calculates the degree of certainty of the prediction, and the similar molecule data search unit 30 acquires the safety evaluation data of the similar molecule.
- the safety prediction device 1A can appropriately provide the user with the prediction result of the safety evaluation of the compound by quantifying and outputting the certainty of the prediction of the safety evaluation of the molecule.
- When the confidence of the prediction is high, the user can use the prediction result as it is, so that the safety of the compound can be evaluated quickly, accurately, and easily.
- When the prediction confidence is low, the user can quickly and easily evaluate the safety of the compound by considering whether to adopt the prediction result or the safety evaluation data. Therefore, the safety prediction device 1A can highly accurately evaluate the safety of a compound while enhancing user convenience.
- When the prediction certainty is high, the output unit 80 outputs a message regarding the prediction result of the molecular safety evaluation and the prediction certainty, and when the prediction certainty is low, it outputs a message regarding the prediction result of the molecular safety evaluation, the confidence of the prediction, and the safety evaluation data.
- the user can accurately determine the safety evaluation content of the compound. Therefore, the safety prediction device 1A can appropriately and highly accurately evaluate the safety of a compound while improving user's convenience.
- the safety prediction unit 20 can include the feature value calculation unit 21 and the prediction unit 22. Thereby, the safety prediction device 1A can calculate the feature amount based on the structural formula of the molecule by the feature amount calculation unit 21 and predict the safety of the molecule based on the feature amount calculated by the prediction unit 22 . Therefore, the safety prediction device 1A can more accurately evaluate the safety of compounds.
- the safety prediction device 1A can input the structural formula of the molecule to the characteristic prediction model 70 in the feature quantity calculation unit 21 to calculate the feature quantity of the molecule.
- the safety prediction unit 20 can easily and accurately predict the safety evaluation of a molecule from the structural formula of the molecule and the degree of certainty of the prediction, and can reduce the burden and time required for calculation. Therefore, the safety prediction device 1A can predict the safety evaluation of a compound with high accuracy, simply, and at a low computational cost.
- the similar molecule data search unit 30 can include a similarity evaluation unit 31 and a data search unit 32.
- The safety prediction device 1A uses the similarity evaluation unit 31 to evaluate the degree of similarity between the input molecule and a plurality of molecules listed in the safety evaluation database 33, and uses the data search unit 32 to acquire safety evaluation data for molecules with a high degree of similarity. Therefore, the safety prediction device 1A can more accurately evaluate the safety of compounds.
- the safety prediction device 1A can include an output unit 80.
- The safety prediction device 1A can visually present information on the prediction result of the safety evaluation of the compound and information on similar molecule data to the user, so that the user can easily grasp the information on the compound.
- Because the safety prediction device 1A can predict the safety of a compound with high accuracy, simply, and at a low calculation cost, it can be suitably used for safe research and development, product manufacturing, and the like.
- the safety prediction device 1A can be effectively used for evaluation tests such as biodegradability, bioaccumulation, mutagenicity, fish acute toxicity, crustacean immobility toxicity, algae growth inhibition toxicity, mammalian repeated toxicity, and the like.
- Mutagenicity evaluation tests include reverse mutation tests (Ames test), chromosomal aberration tests, and the like.
- Fish acute toxicity evaluation tests include measurement of LC50 (median lethal concentration) according to "Fish acute toxicity test-JIS K 0102.71-".
- Evaluation tests for crustacean immobilization toxicity include measurement of 50% immobilization concentration (EC50) and the like.
- Evaluation tests for algae growth inhibitory toxicity include measurement of 50% growth inhibitory concentration (EC50) and the like.
- Evaluation tests for mammalian repeated toxicity include measurement of the no observed adverse effect level (NOAEL) and the like.
- FIG. 17 is a block diagram showing a schematic configuration of a safety prediction device according to this embodiment.
- the safety prediction device 1B further includes a verification unit 110 in addition to the configuration of the safety prediction device 1A according to the first embodiment. Since the components other than the verification unit 110 are the same as the safety prediction device 1A according to the above-described first embodiment, the details are omitted.
- the verification unit 110 determines the validity of the molecular safety evaluation prediction results by determining the degree of matching between the molecular safety evaluation prediction results and the safety evaluation data.
- The verification unit 110 determines the degree of matching between the prediction result of the safety evaluation of the molecule and the safety evaluation data of the similar molecule. If the prediction result matches the safety evaluation data of the similar molecule, the verification unit 110 regards the prediction result as valid although the prediction confidence is low (low confidence OK). If the prediction result does not match the safety evaluation data of the similar molecule, the verification unit 110 regards the prediction result as having low confidence and being invalid (low confidence NG). Because the verification unit 110 refers to the safety evaluation data of the similar molecule only when the prediction confidence is low, the safety evaluation data of the similar molecule is used less frequently, which improves convenience for the user.
- the validity of the prediction result of molecular safety evaluation may be determined based on the majority of the safety evaluation data of a plurality (for example, 20) of similar molecules from the safety evaluation data.
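- For example, when a predetermined number (e.g., 11) or more of the similar molecules have a safety evaluation of OK, the verification unit 110 may determine that the molecule to be predicted has a safety evaluation of OK and good degradability, and may regard the prediction as OK with low confidence.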
- the safety evaluation of the molecule to be predicted is OK and exhibits good degradability. Therefore, the prediction result of the safety evaluation of the molecule obtained from the safety evaluation data is consistent with the safety evaluation data of the similar molecule. Therefore, the verification unit 110 can determine that the predicted result of molecular safety evaluation is valid.
- the verification unit 110 determines that the molecule to be predicted is difficult to decompose, and can be regarded as NG with a low degree of confidence.
- the safety evaluation of the molecule to be predicted is OK, indicating good degradability, but when referring to the safety evaluation data of similar molecules, the safety evaluation of the molecule to be predicted is NG, Because of its persistence, the prediction results of the safety evaluation of the molecule do not match the safety evaluation data of similar molecules. Therefore, the verification unit 110 can determine that the predicted result of molecular safety evaluation is invalid.
- Although the verification unit 110 makes the determination by majority vote on the number of safety evaluation data of the similar molecules, the determination may instead be made based on the sum of the similarities of the similar molecules, or the sum of the values obtained by multiplying the similarities of the similar molecules by weights.
- the weight may be the same value for each similar molecule, or may be a different value.
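- The three determination methods named above (majority vote on counts, sum of similarities, and sum of weighted similarities) can be compared with a small sketch. The function names `vote` and `is_valid`, the "OK"/"NG" labels as strings, and the rule that a tie falls to "NG" are assumptions for illustration; the text does not say how ties are handled.

```python
def vote(similar, mode="count", weights=None):
    """Aggregate similar-molecule evaluations into "OK" or "NG".

    similar: list of (evaluation, similarity) pairs, evaluation in {"OK", "NG"}
    mode: "count" (majority vote), "similarity" (sum of similarities),
          or "weighted" (sum of weight * similarity)
    """
    if weights is None:
        weights = [1.0] * len(similar)  # same weight for every similar molecule
    if len(weights) != len(similar):
        raise ValueError("one weight per similar molecule is required")
    totals = {"OK": 0.0, "NG": 0.0}
    for (evaluation, similarity), weight in zip(similar, weights):
        if mode == "count":
            totals[evaluation] += 1.0
        elif mode == "similarity":
            totals[evaluation] += similarity
        elif mode == "weighted":
            totals[evaluation] += weight * similarity
        else:
            raise ValueError(f"unknown mode: {mode}")
    # Assumption: a tie is treated conservatively as "NG"
    return "OK" if totals["OK"] > totals["NG"] else "NG"


def is_valid(prediction, similar, mode="count", weights=None):
    """True when the prediction agrees with the aggregated similar-molecule data."""
    return prediction == vote(similar, mode=mode, weights=weights)
```

- For example, with one highly similar OK molecule and two weakly similar NG molecules, the count-based vote gives NG while the similarity sum can give OK, which is why the choice of method matters.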
- The output unit 80 may output a message indicating that the prediction result of the safety evaluation of the molecule matches the safety evaluation data of the similar molecule when the prediction certainty is low and the degree of matching is high, and may output a message indicating that the prediction result does not match the safety evaluation data of the similar molecule when both the prediction certainty and the degree of matching are low.
- the content of the message may be: The consistency with the evaluation data is high.”
- When the prediction confidence and the degree of matching are low, the content of the message may be, for example, "The prediction confidence is less than 50%, and the consistency between the prediction result of the safety evaluation of the molecule and the safety evaluation data of the similar molecule is also low."
- a safety prediction method to which the safety prediction device according to this embodiment is applied is a method of predicting the safety of a compound using a safety prediction device 1B having a configuration as shown in FIG.
- FIG. 18 is a flowchart for explaining the safety prediction method according to this embodiment.
- the input unit 10 of the safety prediction device 1B inputs structural formulas of one or more molecules that are evaluation targets for safety evaluation (input step: step S31).
- the safety prediction device 1B uses the safety prediction unit 20 to check for entry errors in the input structural formula (confirmation step: step S32).
- the confirmation step (step S32) is the same as the confirmation step (step S22) of the safety prediction method according to the first embodiment shown in FIG. 12, so details will be omitted. Note that the confirmation step (step S32) may not be performed.
- The safety prediction device 1B uses the safety prediction unit 20 to predict the molecular safety evaluation and calculate the confidence of the prediction, and obtains a table of molecular safety evaluation prediction results that includes the prediction and its confidence (step of predicting molecular safety evaluation and calculating certainty of the prediction: step S33).
- The step of predicting the molecular safety evaluation and calculating the prediction confidence (step S33) is the same as the corresponding step (step S23) of the safety prediction method according to the first embodiment shown in FIG. Therefore, the details are omitted.
- the safety prediction device 1B uses the similar molecule data search unit 30 to search and acquire safety evaluation data of similar molecules of the molecule to be evaluated for safety (similar molecule safety evaluation data search step: step S34).
- the similar molecule safety evaluation data search step (step S34) is the same as the similar molecule safety evaluation data search step (step S24) of the safety prediction method according to the first embodiment shown in FIG. Therefore, details are omitted.
- After the step of predicting the molecular safety evaluation and calculating the prediction confidence (step S33), the safety prediction device 1B uses the verification unit 110 to determine whether the prediction confidence is 50% or more (prediction confidence determination step: step S35).
- If the prediction confidence is 50% or more in step S35 (step S35: Yes), the safety prediction device 1B causes the output unit 80 to output the table of prediction results of the molecular safety evaluation (prediction result table output step: step S36).
- If the prediction confidence is less than 50% in step S35 (step S35: No), the safety prediction device 1B, after the similar molecule safety evaluation data search step (step S34), causes the verification unit 110 to determine whether the degree of matching between the prediction result of the molecular safety evaluation and the safety evaluation data of the similar molecules is high (matching degree determination step: step S37).
- If the degree of matching between the molecular safety evaluation prediction result and the similar molecule safety evaluation data is high in step S37 (step S37: Yes), the safety prediction device 1B causes the verification unit 110 to regard the prediction result as valid even though the prediction confidence is low (low confidence, OK), and causes the output unit 80 to output the table of prediction results of the molecular safety evaluation (prediction result table output step: step S36).
- If the degree of matching between the molecular safety evaluation prediction result and the similar molecule safety evaluation data is low in step S37 (step S37: No), the safety prediction device 1B causes the verification unit 110 to regard the prediction result as invalid because the prediction confidence is low (low confidence, NG).
- The safety prediction device 1B then causes the integration unit 40 to integrate the table of molecular safety evaluation prediction results obtained in the prediction and confidence calculation step (step S33) with the safety evaluation data of similar molecules obtained in the similar molecule safety evaluation data search step (step S34), yielding integrated data (integration step: step S38).
- The integration step (step S38) is the same as the integration step (step S25) of the safety prediction method according to the first embodiment shown in FIG. 12, so details are omitted.
- The safety prediction device 1B uses the output unit 80 to output the integrated data (see FIG. 7) produced by the integration unit 40 (output step: step S39).
- The step of predicting the molecular safety evaluation and calculating the prediction confidence (step S33) may be performed simultaneously with the similar molecule safety evaluation data search step (step S34), or may be performed after it.
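The branching in steps S35 to S39 can be summarized as a small decision function. The following is a minimal Python sketch under stated assumptions, not the patent's implementation: the names `Action`, `decide_action`, and `match_is_high` are hypothetical, and how the degree of matching is judged to be "high" is left to the caller because the text above does not specify it.

```python
from enum import Enum


class Action(Enum):
    """Possible outcomes of the verification flow (names are illustrative)."""
    OUTPUT_PREDICTION = "S36: output prediction table"
    OUTPUT_PREDICTION_LOW_CONFIDENCE_OK = "S36: output prediction table (low confidence, OK)"
    OUTPUT_INTEGRATED = "S38-S39: integrate with similar-molecule data and output"


# Threshold used in step S35, in percent.
CONFIDENCE_THRESHOLD = 50.0


def decide_action(confidence_percent: float, match_is_high: bool) -> Action:
    """Mirror the branching of steps S35 and S37.

    match_is_high is an assumed boolean supplied by the caller; it stands in
    for the matching-degree judgment made by the verification unit.
    """
    if confidence_percent >= CONFIDENCE_THRESHOLD:  # step S35: Yes
        return Action.OUTPUT_PREDICTION
    if match_is_high:  # step S37: Yes (low confidence, OK)
        return Action.OUTPUT_PREDICTION_LOW_CONFIDENCE_OK
    return Action.OUTPUT_INTEGRATED  # step S37: No (low confidence, NG)
```

Keeping the matching judgment outside the function means the 50% rule and the match rule can be changed independently.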
- The safety prediction device 1B according to the present embodiment further includes a verification unit 110 in addition to the configuration of the safety prediction device 1A according to the first embodiment.
- The verification unit 110 verifies the validity of the prediction results against the safety evaluation data of similar molecules and determines the degree of matching between the molecular safety evaluation prediction results and the safety evaluation data.
- Even when the prediction confidence is low, the safety prediction device 1B refers to the safety evaluation data of similar molecules and determines the degree of matching between the prediction result of the molecular safety evaluation and that data. As a result, the safety of a compound can be evaluated with higher accuracy even for compounds whose safety evaluation is difficult to predict. Therefore, the safety prediction device 1B can evaluate compound safety with higher accuracy while further improving convenience for the user.
- The output unit 80 outputs a message regarding the prediction result of the molecular safety evaluation and the prediction confidence when the prediction confidence is high, and outputs a message regarding the prediction result of the molecular safety evaluation, the prediction confidence, and the safety evaluation data when the prediction confidence is low.
- The user can judge the safety evaluation of the compound more accurately by checking the content of the output message. Therefore, the safety prediction device 1B can also evaluate compound safety appropriately and accurately while improving convenience for the user.
- When the prediction confidence is low, the safety prediction device 1B can output a message indicating that the prediction result of the molecular safety evaluation matches the safety evaluation data if the degree of matching is high, and a message indicating that the prediction result does not match the safety evaluation data if the degree of matching is low.
- The safety prediction device 1B can thus provide the user with the predicted safety evaluation of the compound and its degree of matching with the safety evaluation data. By checking the content of the output message, the user can judge the safety evaluation of the compound more accurately. Therefore, the safety prediction device 1B can evaluate the safety of a compound more appropriately and accurately, especially for a compound whose safety evaluation is difficult to predict, while improving convenience for the user.
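The message rules described above (high confidence, low confidence with a match, low confidence without a match) could be composed as follows. This is an illustrative Python sketch only: the function name `build_message`, the message wording, and the `threshold` parameter are assumptions, since the text does not give the actual message strings.

```python
def build_message(prediction: str, confidence_percent: float,
                  match_is_high: bool, threshold: float = 50.0) -> str:
    """Compose an output message for one predicted safety evaluation.

    The wording is hypothetical; only the three-way case split follows the text.
    """
    base = (f"Predicted safety evaluation: {prediction} "
            f"(confidence {confidence_percent:.0f}%).")
    if confidence_percent >= threshold:
        # High confidence: report the prediction and its confidence only.
        return base
    if match_is_high:
        # Low confidence, but consistent with similar-molecule data.
        return base + (" Low confidence, but the prediction matches the "
                       "safety evaluation data of similar molecules.")
    # Low confidence and inconsistent with similar-molecule data.
    return base + (" Low confidence, and the prediction does not match the "
                   "safety evaluation data of similar molecules.")
```

For example, `build_message("OK", 90, False)` returns only the base sentence, because the match flag is ignored when the confidence is at or above the threshold.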
- The safety prediction device 1B can predict the safety of a compound easily and with high accuracy at a low calculation cost. Because the safety of a compound can be predicted with high accuracy, the device can be suitably used for safe research and development, product manufacturing, and the like.
- Like the safety prediction device 1A, the safety prediction device 1B can be used effectively for tests that evaluate biodegradability, bioaccumulation, mutagenicity, acute toxicity to fish, crustacean immobilization toxicity, algae growth inhibition toxicity, repeated-dose toxicity in mammals, and the like.
- FIG. 19 is a block diagram showing the hardware configuration of safety prediction devices 1A and 1B.
- As shown in FIG. 19, the safety prediction devices 1A and 1B are configured by an information processing device (computer) and can physically be configured as a computer system including a CPU (Central Processing Unit: processor) 101 serving as an arithmetic processing unit, a RAM (Random Access Memory) 102 and a ROM (Read Only Memory) 103 serving as main storage devices, an input device 104, an output device 105, a communication module 106, an auxiliary storage device 107 such as a hard disk, and the like. These are interconnected by a bus 108.
- The output device 105 and the auxiliary storage device 107 may be provided externally.
- The CPU 101 controls the overall operation of the safety prediction devices 1A and 1B and performs various types of information processing.
- The CPU 101 executes a safety prediction program stored in the ROM 103 or the auxiliary storage device 107 to control display operations of the measurement recording screen and the analysis screen.
- The RAM 102 is used as a work area for the CPU 101 and may include a non-volatile RAM that stores main control parameters and information.
- The ROM 103 stores basic input/output programs and the like.
- The safety prediction program may be stored in the ROM 103.
- The input device 104 is a keyboard, mouse, operation buttons, touch panel, or the like.
- The output device 105 is a monitor display or the like.
- The output device 105 displays prediction results and the like, and the screen is updated according to input/output operations via the input device 104 and the communication module 106.
- The communication module 106 is a data transmission/reception device such as a network card, and functions as a communication interface that takes in information from an external data recording server or the like and outputs analysis information to other electronic devices.
- The auxiliary storage device 107 is a storage device such as an SSD (Solid State Drive) or an HDD (Hard Disk Drive), and stores, for example, various data and files necessary for the operation of the safety prediction devices 1A and 1B.
- Each function of the safety prediction devices 1A and 1B is realized by the CPU 101 executing a safety prediction program or the like stored in the ROM 103 or the auxiliary storage device 107, operating the input device 104, the output device 105, and the communication module 106, and reading and writing data in the RAM 102, the ROM 103, the auxiliary storage device 107, and the like. That is, by executing the safety prediction program according to the present embodiment on a computer, the safety prediction devices 1A and 1B can realize the functions of the respective processing units in FIGS. 1 and 17.
- The safety prediction program is stored, for example, in the storage device of the computer.
- Part or all of the safety prediction program may be transmitted via a transmission medium such as a communication line, received by the communication module 106 or the like provided in the computer, and recorded (including installation).
- Alternatively, part or all of the safety prediction program may be stored in a portable storage medium such as a CD-ROM, DVD-ROM, or flash memory and recorded (including installation) in the computer from that medium.
- The program executed by the information processing device has a module configuration including each processing unit of the safety prediction devices 1A and 1B described above. Each processing unit is generated on a memory such as the RAM 102.
- The safety prediction devices 1A and 1B may be configured as a system in which a plurality of information processing devices are communicatively connected, and the processing units described above may be distributed across the plurality of information processing devices.
- The safety prediction devices 1A and 1B may also be virtual machines that operate on a cloud system.
- 1A, 1B compound safety prediction device; 10 input unit; 20 safety prediction unit; 21 feature amount calculation unit; 22 prediction unit; 30 similar molecule data search unit; 31 similarity evaluation unit; 32 data search unit; 33 safety evaluation database; 40 integration unit; 50 storage unit; 60 model learning unit; 70 characteristic prediction model; 80 output unit; 110 verification unit
Landscapes
- Chemical & Material Sciences (AREA)
- Engineering & Computer Science (AREA)
- Crystallography & Structural Chemistry (AREA)
- Theoretical Computer Science (AREA)
- Computing Systems (AREA)
- Bioinformatics & Computational Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Data Mining & Analysis (AREA)
- Software Systems (AREA)
- Medical Informatics (AREA)
- General Health & Medical Sciences (AREA)
- Evolutionary Computation (AREA)
- Databases & Information Systems (AREA)
- Health & Medical Sciences (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Artificial Intelligence (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Description
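[1] A compound safety prediction device comprising:
an input unit for inputting the structural formula of one or more molecules;
a safety prediction unit that predicts the safety evaluation of the molecule and calculates the confidence of the prediction;
a similar molecule data search unit that acquires safety evaluation data of similar molecules that are similar to the molecule; and
an output unit that outputs the prediction result of the safety evaluation of the molecule, the confidence of the prediction, and the safety evaluation data of the similar molecules.
[2] The compound safety prediction device according to [1], wherein the output unit outputs a message regarding the prediction result of the safety evaluation of the molecule and the confidence of the prediction when the confidence of the prediction is high, and
outputs a message regarding the prediction result of the safety evaluation of the molecule, the confidence of the prediction, and the safety evaluation data of the similar molecules when the confidence of the prediction is low.
[3] The compound safety prediction device according to [1], further comprising a verification unit that verifies the validity of the prediction result of the safety evaluation of the molecule based on the safety evaluation data of the similar molecules, and determines the degree of matching between the prediction result of the safety evaluation of the molecule and the safety evaluation data of the similar molecules.
[4] The compound safety prediction device according to [3], wherein the output unit outputs a message regarding the prediction result of the safety evaluation of the molecule and the confidence of the prediction when the confidence of the prediction is high, and
outputs a message regarding the prediction result of the safety evaluation of the molecule, the confidence of the prediction, and the safety evaluation data of the similar molecules when the confidence of the prediction is low.
[5] The compound safety prediction device according to [4], wherein, when the confidence of the prediction is low,
the output unit outputs a message indicating that the prediction result of the safety evaluation of the molecule is consistent with the safety evaluation data of the similar molecules when the degree of matching is high, and
outputs a message indicating that the prediction result of the safety evaluation of the molecule is not consistent with the safety evaluation data of the similar molecules when the degree of matching is low.
[6] The compound safety prediction device according to any one of [1] to [5], wherein the safety prediction unit comprises:
a feature amount calculation unit that calculates a feature amount of the molecule based on the structural formula of the molecule; and
a prediction unit that predicts the safety evaluation of the molecule based on the feature amount and calculates the confidence of the prediction.
[7] The compound safety prediction device according to [6], wherein the feature amount calculation unit calculates the feature amount of the molecule using a fingerprint based on the structural formula of the molecule, or using one or more of the following obtained from the structural formula of the molecule: a physical property value calculated by quantum chemical calculation, a physical property value estimated by a quantitative structure-activity relationship, and a value predicted by a trained model that has learned the relationship between structural formulas and physical property values.
[8] The compound safety prediction device according to any one of [1] to [7], wherein the similar molecule data search unit comprises:
a similarity evaluation unit that calculates the similarity between the structural formula of the molecule input via the input unit and the structural formulas of a plurality of evaluated molecules in a safety evaluation database that stores the safety evaluation results of previously evaluated molecules; and
a data search unit that acquires the safety evaluation results of the evaluated molecules with high similarity as the safety evaluation data of the similar molecules.
[9] A compound safety prediction program that causes a computer to execute:
an input step of inputting the structural formula of one or more molecules;
a safety prediction step of predicting the safety evaluation of the molecule and calculating the confidence of the prediction;
a similar molecule data search step of acquiring safety evaluation data of similar molecules that are similar to the molecule; and
an output step of outputting the prediction result of the safety evaluation of the molecule, the confidence of the prediction, and the safety evaluation data of the similar molecules.
[10] A compound safety prediction method comprising:
an input step of inputting the structural formula of one or more molecules;
a safety prediction step of predicting the safety evaluation of the molecule and calculating the confidence of the prediction;
a similar molecule data search step of acquiring safety evaluation data of similar molecules that are similar to the molecule; and
an output step of outputting the prediction result of the safety evaluation of the molecule, the confidence of the prediction, and the safety evaluation data of the similar molecules.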
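The similar molecule search of item [8] (compute a similarity against every evaluated molecule in the database, then keep the highly similar ones) could look like the sketch below. It assumes, purely for illustration, that each structure is already encoded as a set of "on" fingerprint bit indices and that similarity is the Tanimoto coefficient; item [8] does not name a specific similarity measure. The names `tanimoto`, `search_similar`, and `min_similarity`, and the toy database, are hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple


def tanimoto(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Tanimoto (Jaccard) similarity of two sets of 'on' fingerprint bits."""
    if not a and not b:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / len(a | b)


def search_similar(query: FrozenSet[int],
                   database: Dict[str, Tuple[FrozenSet[int], str]],
                   min_similarity: float = 0.5) -> List[Tuple[str, float, str]]:
    """Return (name, similarity, stored evaluation) for sufficiently similar
    evaluated molecules, most similar first."""
    hits = [(name, tanimoto(query, fingerprint), result)
            for name, (fingerprint, result) in database.items()]
    hits = [hit for hit in hits if hit[1] >= min_similarity]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Toy database: name -> (fingerprint bits, past safety evaluation result).
toy_database = {
    "mol_A": (frozenset({1, 2, 3, 4}), "OK"),
    "mol_B": (frozenset({1, 2, 9}), "NG"),
    "mol_C": (frozenset({7, 8}), "OK"),
}
similar = search_similar(frozenset({1, 2, 3}), toy_database)
# similar == [("mol_A", 0.75, "OK"), ("mol_B", 0.5, "NG")]
```

In practice the fingerprints would come from a cheminformatics toolkit and the cutoff would need tuning; both are outside what the text specifies.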
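<Compound safety prediction device>
A compound safety prediction device according to the first embodiment of the present invention will be described. FIG. 1 is a block diagram showing the schematic configuration of the compound safety prediction device according to this embodiment. As shown in FIG. 1, the compound safety prediction device (hereinafter simply referred to as the "safety prediction device") 1A includes an input unit 10, a safety prediction unit 20, a similar molecule data search unit 30, an integration unit 40, a storage unit 50, a model learning unit 60, a characteristic prediction model 70, and an output unit 80.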
Prediction confidence (%) ≡ 100 × 2 × |0.5 − P(OK)| ... (1)
(In equation (1), P(OK) is the classification probability that the classification result is "OK".)
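Equation (1) maps the classification probability P(OK) to a confidence between 0% (P(OK) = 0.5, the classifier is undecided) and 100% (P(OK) = 0 or 1). A minimal Python sketch follows; the function name `prediction_confidence` and the input validation are additions for illustration and are not part of the source.

```python
def prediction_confidence(p_ok: float) -> float:
    """Prediction confidence in percent, following equation (1):
    100 * 2 * |0.5 - P(OK)|."""
    if not 0.0 <= p_ok <= 1.0:
        raise ValueError("p_ok must be a probability in [0, 1]")
    return 100.0 * 2.0 * abs(0.5 - p_ok)


# Worked values:
#   P(OK) = 0.5  -> 0.0   (undecided)
#   P(OK) = 0.75 -> 50.0
#   P(OK) = 0.25 -> 50.0  (equally confident, but in the "not OK" direction)
#   P(OK) = 1.0  -> 100.0
```

Under this definition, a confidence of 50% or more corresponds to P(OK) ≥ 0.75 or P(OK) ≤ 0.25.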
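The compound safety prediction program according to this embodiment (hereinafter simply referred to as the "safety prediction program") may be a program that causes a computer to execute at least the following steps:
an input step of inputting the structural formula of one or more molecules;
a safety prediction step of predicting the safety evaluation of the molecule and calculating the confidence of the prediction;
a similar molecule data search step of acquiring safety evaluation data of similar molecules that are similar to the molecule; and
an output step of outputting the prediction result of the safety evaluation of the molecule, the confidence of the prediction, and the safety evaluation data of the similar molecules.
Next, a compound safety prediction method (hereinafter simply referred to as the "safety prediction method") to which the safety prediction device according to this embodiment is applied will be described. This safety prediction method predicts the safety evaluation of a compound using the safety prediction device 1A having the configuration shown in FIG. 1.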
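<Safety prediction device>
A safety prediction device according to the second embodiment of the present invention will be described. FIG. 17 is a block diagram showing the schematic configuration of the safety prediction device according to this embodiment. As shown in FIG. 17, the safety prediction device 1B further includes a verification unit 110 in addition to the configuration of the safety prediction device 1A according to the first embodiment described above. Except for the verification unit 110, it is the same as the safety prediction device 1A according to the first embodiment, so details are omitted.
Next, a safety prediction method to which the safety prediction device according to this embodiment is applied will be described. This safety prediction method predicts the safety of a compound using the safety prediction device 1B having the configuration shown in FIG. 17.
Next, an example of the hardware configuration of the safety prediction devices 1A and 1B will be described. FIG. 19 is a block diagram showing the hardware configuration of the safety prediction devices 1A and 1B. As shown in FIG. 19, the safety prediction devices 1A and 1B are configured by an information processing device (computer) and can physically be configured as a computer system including a CPU (Central Processing Unit: processor) 101 serving as an arithmetic processing unit, a RAM (Random Access Memory) 102 and a ROM (Read Only Memory) 103 serving as main storage devices, an input device 104 serving as an input device, an output device 105, a communication module 106, an auxiliary storage device 107 such as a hard disk, and the like. These are interconnected by a bus 108. The output device 105 and the auxiliary storage device 107 may be provided externally.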
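10 Input unit
20 Safety prediction unit
21 Feature amount calculation unit
22 Prediction unit
30 Similar molecule data search unit
31 Similarity evaluation unit
32 Data search unit
33 Safety evaluation database
40 Integration unit
50 Storage unit
60 Model learning unit
70 Characteristic prediction model
80 Output unit
110 Verification unit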
Claims (10)
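- 1. A compound safety prediction device comprising:
an input unit for inputting the structural formula of one or more molecules;
a safety prediction unit that predicts the safety evaluation of the molecule and calculates the confidence of the prediction;
a similar molecule data search unit that acquires safety evaluation data of similar molecules that are similar to the molecule; and
an output unit that outputs the prediction result of the safety evaluation of the molecule, the confidence of the prediction, and the safety evaluation data of the similar molecules.
- 2. The compound safety prediction device according to claim 1, wherein the output unit outputs a message regarding the prediction result of the safety evaluation of the molecule and the confidence of the prediction when the confidence of the prediction is high, and
outputs a message regarding the prediction result of the safety evaluation of the molecule, the confidence of the prediction, and the safety evaluation data of the similar molecules when the confidence of the prediction is low.
- 3. The compound safety prediction device according to claim 1, further comprising a verification unit that verifies the validity of the prediction result of the safety evaluation of the molecule based on the safety evaluation data of the similar molecules, and determines the degree of matching between the prediction result of the safety evaluation of the molecule and the safety evaluation data of the similar molecules.
- 4. The compound safety prediction device according to claim 3, wherein the output unit outputs a message regarding the prediction result of the safety evaluation of the molecule and the confidence of the prediction when the confidence of the prediction is high, and
outputs a message regarding the prediction result of the safety evaluation of the molecule, the confidence of the prediction, and the safety evaluation data of the similar molecules when the confidence of the prediction is low.
- 5. The compound safety prediction device according to claim 4, wherein, when the confidence of the prediction is low,
the output unit outputs a message indicating that the prediction result of the safety evaluation of the molecule is consistent with the safety evaluation data of the similar molecules when the degree of matching is high, and
outputs a message indicating that the prediction result of the safety evaluation of the molecule is not consistent with the safety evaluation data of the similar molecules when the degree of matching is low.
- 6. The compound safety prediction device according to any one of claims 1 to 5, wherein the safety prediction unit comprises:
a feature amount calculation unit that calculates a feature amount of the molecule based on the structural formula of the molecule; and
a prediction unit that predicts the safety evaluation of the molecule based on the feature amount and calculates the confidence of the prediction.
- 7. The compound safety prediction device according to claim 6, wherein the feature amount calculation unit calculates the feature amount of the molecule using a fingerprint based on the structural formula of the molecule, or using one or more of the following obtained from the structural formula of the molecule: a physical property value calculated by quantum chemical calculation, a physical property value estimated by a quantitative structure-activity relationship, and a value predicted by a trained model that has learned the relationship between structural formulas and physical property values.
- 8. The compound safety prediction device according to any one of claims 1 to 7, wherein the similar molecule data search unit comprises:
a similarity evaluation unit that calculates the similarity between the structural formula of the molecule input via the input unit and the structural formulas of a plurality of evaluated molecules in a safety evaluation database that stores the safety evaluation results of previously evaluated molecules; and
a data search unit that acquires the safety evaluation results of the evaluated molecules with high similarity as the safety evaluation data of the similar molecules.
- 9. A compound safety prediction program that causes a computer to execute:
an input step of inputting the structural formula of one or more molecules;
a safety prediction step of predicting the safety evaluation of the molecule and calculating the confidence of the prediction;
a similar molecule data search step of acquiring safety evaluation data of similar molecules that are similar to the molecule; and
an output step of outputting the prediction result of the safety evaluation of the molecule, the confidence of the prediction, and the safety evaluation data of the similar molecules.
- 10. A compound safety prediction method comprising:
an input step of inputting the structural formula of one or more molecules;
a safety prediction step of predicting the safety evaluation of the molecule and calculating the confidence of the prediction;
a similar molecule data search step of acquiring safety evaluation data of similar molecules that are similar to the molecule; and
an output step of outputting the prediction result of the safety evaluation of the molecule, the confidence of the prediction, and the safety evaluation data of the similar molecules.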
Priority Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202280058866.0A CN117882139A (zh) | 2021-09-06 | 2022-08-31 | 化合物的安全性预测装置、化合物的安全性预测程序及化合物的安全性预测方法 |
EP22864609.7A EP4401082A1 (en) | 2021-09-06 | 2022-08-31 | Compound safety prediction device, compound safety prediction program, and compound safety prediction method |
JP2023545632A JP7485229B2 (ja) | 2021-09-06 | 2022-08-31 | 化合物の安全性予測装置、化合物の安全性予測プログラム及び化合物の安全性予測方法 |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2021144755 | 2021-09-06 | ||
JP2021-144755 | 2021-09-06 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2023033027A1 true WO2023033027A1 (ja) | 2023-03-09 |
Family
ID=85411351
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2022/032725 WO2023033027A1 (ja) | 2021-09-06 | 2022-08-31 | 化合物の安全性予測装置、化合物の安全性予測プログラム及び化合物の安全性予測方法 |
Country Status (4)
Country | Link |
---|---|
EP (1) | EP4401082A1 (ja) |
JP (1) | JP7485229B2 (ja) |
CN (1) | CN117882139A (ja) |
WO (1) | WO2023033027A1 (ja) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2007153767A (ja) | 2005-12-01 | 2007-06-21 | Univ Of Tokushima | 化学構造の類似度を算出し化合物の安全性を評価する方法及びこれを用いた医薬品安全性情報システム |
WO2009025045A1 (ja) * | 2007-08-22 | 2009-02-26 | Fujitsu Limited | 化合物の物性予測装置、物性予測方法およびその方法を実施するためのプログラム |
JP5512077B2 (ja) | 2006-11-22 | 2014-06-04 | 株式会社 資生堂 | 安全性評価方法、安全性評価システム及び安全性評価プログラム |
KR20200072585A (ko) * | 2018-11-30 | 2020-06-23 | 이율희 | 인공지능에 기반한 대상 물질의 유해성과 위해성 예측 방법 |
JP2021144755A (ja) | 2019-10-15 | 2021-09-24 | 明豊ファシリティワークス株式会社 | マンアワーシステム |
-
2022
- 2022-08-31 WO PCT/JP2022/032725 patent/WO2023033027A1/ja active Application Filing
- 2022-08-31 EP EP22864609.7A patent/EP4401082A1/en active Pending
- 2022-08-31 JP JP2023545632A patent/JP7485229B2/ja active Active
- 2022-08-31 CN CN202280058866.0A patent/CN117882139A/zh active Pending
Also Published As
Publication number | Publication date |
---|---|
EP4401082A1 (en) | 2024-07-17 |
JP7485229B2 (ja) | 2024-05-16 |
JPWO2023033027A1 (ja) | 2023-03-09 |
CN117882139A (zh) | 2024-04-12 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 22864609; Country of ref document: EP; Kind code of ref document: A1 |
 | WWE | Wipo information: entry into national phase | Ref document number: 2023545632; Country of ref document: JP |
 | WWE | Wipo information: entry into national phase | Ref document number: 202280058866.0; Country of ref document: CN |
 | WWE | Wipo information: entry into national phase | Ref document number: 2022864609; Country of ref document: EP |
 | NENP | Non-entry into the national phase | Ref country code: DE |
 | ENP | Entry into the national phase | Ref document number: 2022864609; Country of ref document: EP; Effective date: 20240408 |