WO2025033296A1 - Information processing system, information processing method, program, and method for producing a molecular compound - Google Patents
- Publication number
- WO2025033296A1 (PCT/JP2024/027412)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- molecule
- prediction
- building block
- value
- block combination
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16C—COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
- G16C20/00—Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
- G16C20/50—Molecular design, e.g. of drugs
Definitions
- One aspect of the present disclosure relates to an information processing system, a molecular design device, an information processing method, a program, and a method for producing a molecular compound.
- Information processing technology based on machine learning has been used in the pharmaceutical field to reduce the burden of drug discovery (see Patent Document 1). Attempts are being made to use predictions made by machine learning to efficiently discover molecules with the properties required for drugs.
- Machine learning predictions involve uncertainty, so using machine learning predictions without taking this into account can lead to unexpected results.
- One aspect of the present disclosure aims to provide technology that further reduces the burden required for drug discovery.
- An information processing system for identifying molecules suitable as drug candidates, comprising: a property prediction unit that, using a prediction model for predicting a property of a molecule from building block combination information of the molecule, calculates a property prediction value of a molecule that is an element of a building block combination information set (a set of building block combination information of a plurality of different molecules) and estimates the uncertainty of the prediction; and a candidate molecule identifying unit that searches for candidates of molecules having desired properties based on the predicted property values and an estimated value of the uncertainty of the prediction.
- [A2] The information processing system according to A1, wherein the building block combination information is sequence information of the molecule.
- the molecule is at least one of a nucleic acid, a peptide, a cyclic peptide, a protein, an antibody, and a small molecule compound.
- the molecule is a protein, an antibody, a peptide, or a cyclic peptide, and the building block combination information is amino acid sequence information.
- the property is at least one of binding ability, pharmacological activity, physical properties, kinetics, and safety. An information processing system according to any one of A1 to A4.
- the molecule is a molecule that binds to a target molecule, and the property is the ability to bind to the target molecule.
- An information processing system according to any one of A1 to A5.
- The information processing system according to any one of A1 to A6, further comprising a prediction information processing unit that calculates a characteristic quality value based on the characteristic prediction value and an estimate of the uncertainty of the prediction, wherein the candidate molecule identifying unit identifies at least one candidate molecule from the elements of the building block combination information set based on the characteristic quality value.
- the characteristic quality value increases as the characteristic prediction value increases and as the prediction uncertainty estimate decreases; An information processing system according to any one of A1 to A7.
- the characteristic quality value is an output value of a predetermined function having two input variables, the characteristic prediction value and an estimate of the uncertainty of the prediction.
- the prediction information processing unit calculates a mean variance as an objective function that gives the characteristic quality value.
- the prediction uncertainty estimate is the standard deviation of the property prediction; An information processing system according to any one of A1 to A10.
- the property quality value increases in response to an increase in the property prediction value and in response to a decrease in the prediction uncertainty estimate.
- the building block combination information set is obtained using a combinatorial optimization algorithm, and an extraction parameter of the combinatorial optimization algorithm is updated based on the property prediction value of the molecule or the property quality value.
- the combinatorial optimization algorithm is a tree-structured Parzen estimator.
- the prediction model is a prediction model generated by learning based on building block combination information of a plurality of molecules for training and the results of the property evaluation of the molecules.
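The property prediction unit described above returns both a prediction value and an uncertainty estimate, and one aspect takes the uncertainty to be a standard deviation. A minimal Python sketch of one possible way to obtain both is to average the outputs of several models and use their spread as the uncertainty. The ensemble approach, the toy weight-table models, and the names `make_toy_model` and `predict_with_uncertainty` are assumptions made for illustration and are not taken from the disclosure.

```python
import random
import statistics

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def make_toy_model(seed):
    """Return a hypothetical scoring model backed by a random per-residue weight table."""
    rng = random.Random(seed)
    weights = {aa: rng.gauss(0.0, 1.0) for aa in AMINO_ACIDS}

    def model(sequence):
        # The mean per-residue weight stands in for a trained property predictor.
        return sum(weights[aa] for aa in sequence) / len(sequence)

    return model


def predict_with_uncertainty(models, sequence):
    """Return (prediction value, standard deviation across the ensemble)."""
    outputs = [m(sequence) for m in models]
    return statistics.mean(outputs), statistics.stdev(outputs)


# Five independently seeded toy models form the (assumed) ensemble.
models = [make_toy_model(seed) for seed in range(5)]
mu, sigma = predict_with_uncertainty(models, "ACDKW")
print(f"prediction={mu:.3f}, uncertainty={sigma:.3f}")
```

When every model in the ensemble agrees, the standard deviation is zero, which is the sense in which the spread acts as an uncertainty estimate here.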
- A molecular design apparatus comprising a control unit for inferring a property of a molecule from building block combination information of the molecule using a predetermined prediction model, wherein the control unit: obtains a building block combination information set, which is a set of building block combination information of a plurality of different molecules; calculates a predicted property value of a molecule that is an element of the building block combination information set from the building block combination information of the molecule using the prediction model, and estimates the uncertainty of the prediction; and searches for candidate molecules having desired properties based on the predicted property values and an estimate of the uncertainty of the predictions.
- the molecular design apparatus according to B1 wherein the building block combination information is sequence information of the molecule.
- the molecule is at least one of a nucleic acid, a peptide, a cyclic peptide, a protein, an antibody, and a small molecule compound;
- the property is at least one of binding ability, pharmacological activity, physical properties, kinetics, and safety.
- the molecule is a molecule that binds to a target molecule, and the property is the ability to bind to the target molecule.
- the molecular design apparatus according to any one of B1 to B5.
- the control unit further calculates a characteristic quality value based on the characteristic prediction value and an estimate of the uncertainty of the prediction; identifying at least one candidate molecule from the elements of the building block combination information set based on the characteristic quality value;
- the characteristic quality value increases as the characteristic prediction value increases and as the prediction uncertainty estimate decreases;
- the characteristic quality value is an output value of a predetermined function having two input variables, the characteristic prediction value and an estimate of the uncertainty of the prediction. The molecular design apparatus according to B8.
- the control unit calculates a mean variance as an objective function that gives the characteristic quality value.
- the prediction uncertainty estimate is the standard deviation of the property prediction;
- the control unit acquires the building block combination information set using a combinatorial optimization algorithm, and updates an extraction parameter of the combinatorial optimization algorithm based on the characteristic predicted value or the characteristic quality value.
- the combinatorial optimization algorithm is a tree-structured Parzen estimator.
- An output unit that outputs building block combination information of the candidate molecule is further provided.
- the prediction model is a prediction model generated by learning based on building block combination information of a plurality of molecules for training and the results of the property evaluation of the molecules.
- A molecular design apparatus according to any one of B1 to B14.
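The characteristic quality value is described as the output of a function of two inputs, the prediction value and its uncertainty, with a mean variance objective named as one option. The sketch below assumes the common mean-variance form, in which the score is the prediction minus a weighted variance, so that it falls as the uncertainty grows. The weight `risk_weight`, the function name `quality_value`, and the numbers are hypothetical and not taken from the disclosure.

```python
def quality_value(mu, sigma, risk_weight=1.0):
    """Mean-variance score: rises with the prediction, falls with its variance."""
    return mu - risk_weight * sigma ** 2


# Hypothetical (prediction value, standard deviation) pairs for three sequences.
predictions = {
    "ACDKW": (0.80, 0.40),
    "GHILM": (0.70, 0.05),
    "NPQRS": (0.75, 0.20),
}

scores = {seq: quality_value(mu, sigma) for seq, (mu, sigma) in predictions.items()}
best = max(scores, key=scores.get)
print(best, round(scores[best], 4))
```

In this made-up data the sequence with the highest raw prediction is not selected, because its larger uncertainty lowers its score below that of a slightly weaker but more certain prediction.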
- An information processing system for identifying molecules suitable as drug candidates comprising: a building block combination information processing unit that acquires a building block combination information set, which is a set of building block combination information of a plurality of different molecules, in accordance with a combinatorial optimization algorithm; a property prediction unit that calculates a property prediction value of a molecule that is an element of the building block combination information set, using a prediction model for predicting the property of the molecule from the building block combination information of the molecule; a candidate molecule identifying unit that searches for candidates of molecules having desired properties based on the predicted property values;
- the building block combination information processing unit updates extraction parameters of the combinatorial optimization algorithm based on the predicted property values of the molecules so that molecules having more desired properties are included.
- the candidate molecule identifying unit searches for candidate molecules having desired properties based on a first property predicted value calculated for molecules that are elements of a first building block combination information set, which is a building block combination information set obtained before the extraction parameters are updated, and a second property predicted value calculated for molecules that are elements of a second building block combination information set, which is a building block combination information set obtained after the extraction parameters are updated.
- The information processing system according to C1. [C3]
- the first building block combination information set and the second building block combination information set are sets including building block combination information of different molecules.
- the building block combination information is sequence information of the molecule. An information processing system according to any one of C1 to C3.
- the molecule is at least one of a nucleic acid, a peptide, a cyclic peptide, a protein, an antibody, and a small molecule compound.
- An information processing system according to any one of C1 to C4.
- the molecule is a protein, an antibody, a peptide, or a cyclic peptide, and the building block combination information is amino acid sequence information.
- the property is at least one of binding ability, pharmacological activity, physical properties, kinetics, and safety.
- the molecule is a molecule that binds to a target molecule, and the property is the ability to bind to the target molecule.
- the characteristic prediction unit calculates the characteristic prediction value and calculates an estimate of the uncertainty of the prediction.
- the building block combination information processing unit updates extraction parameters of the combinatorial optimization algorithm based on the predicted property values of the molecules and the estimate of the uncertainty of the predictions so that molecules having more desirable properties are included; An information processing system according to any one of C1 to C9.
- the candidate molecule identifying unit identifies at least one molecule from the building block combination information set based on the characteristic quality value.
- the characteristic quality value increases as the characteristic prediction value increases and as the prediction uncertainty estimate decreases;
- the characteristic quality value is an output value of a predetermined function having two input variables, the characteristic prediction value and an estimate of the uncertainty of the prediction.
- the prediction information processing unit calculates a mean variance as an objective function that gives the characteristic quality value.
- the prediction uncertainty estimate is the standard deviation of the property prediction;
- a tree-structured Parzen estimator is used as the combinatorial optimization algorithm.
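The loop described in this group (acquire a set, predict, update the extraction parameters, acquire a second set, then search across both) can be sketched in Python with a much simplified sampler in the spirit of a tree-structured Parzen estimator. The sketch makes several assumptions that are not in the disclosure: each sequence position is modelled as an independent categorical variable, the "extraction parameters" are the per-position probability tables refit each round, and `toy_predictor` is a made-up stand-in for the prediction model. It is not a full TPE implementation.

```python
import math
import random

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"
LENGTH = 6


def toy_predictor(seq):
    """Hypothetical stand-in for the prediction model: fraction of 'W' residues."""
    return seq.count("W") / len(seq)


def fit_position_probs(seqs, prior=1.0):
    """Per-position categorical densities with additive smoothing."""
    tables = []
    for pos in range(LENGTH):
        counts = {aa: prior for aa in ALPHABET}
        for s in seqs:
            counts[s[pos]] += 1.0
        total = sum(counts.values())
        tables.append({aa: c / total for aa, c in counts.items()})
    return tables


def log_density(seq, tables):
    return sum(math.log(tables[i][aa]) for i, aa in enumerate(seq))


def sample(tables, rng):
    return "".join(
        rng.choices(ALPHABET, weights=[t[aa] for aa in ALPHABET])[0] for t in tables
    )


def tpe_round(history, rng, batch=20, pool=200, gamma=0.25):
    """Refit the 'good' and 'bad' densities, then propose a new set of sequences."""
    ranked = sorted(history, key=lambda item: item[1], reverse=True)
    n_good = max(1, int(gamma * len(ranked)))
    good = fit_position_probs([s for s, _ in ranked[:n_good]])
    bad = fit_position_probs([s for s, _ in ranked[n_good:]])
    # Draw from the 'good' density; dict.fromkeys removes duplicates in a stable order.
    candidates = list(dict.fromkeys(sample(good, rng) for _ in range(pool)))
    # Keep the candidates with the largest good-to-bad density ratio (in log space).
    candidates.sort(
        key=lambda s: log_density(s, good) - log_density(s, bad), reverse=True
    )
    return candidates[:batch]


rng = random.Random(0)
# First set: drawn before any update of the sampling tables.
first_set = ["".join(rng.choice(ALPHABET) for _ in range(LENGTH)) for _ in range(40)]
history = [(s, toy_predictor(s)) for s in first_set]
for _ in range(5):
    # Later sets: drawn after the tables are refit on all predictions so far.
    new_set = tpe_round(history, rng)
    history.extend((s, toy_predictor(s)) for s in new_set)

# The search looks at predictions from the sets obtained before and after updating.
best_seq, best_value = max(history, key=lambda item: item[1])
print(best_seq, best_value)
```

Because the final search runs over the accumulated history, the best candidate can never be worse than the best member of the first set.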
- A molecular design apparatus comprising a control unit for inferring a property of a molecule from building block combination information of the molecule using a predetermined prediction model, wherein the control unit: obtains a building block combination information set according to a combinatorial optimization algorithm, the building block combination information set being a set of building block combination information of a plurality of different molecules; calculates the predicted property value for a molecule that is an element of the building block combination information set using the prediction model; updates extraction parameters of the combinatorial optimization algorithm, based on the property prediction values for each molecule obtained by the prediction model, so as to include molecules having more desirable properties; and searches for candidate molecules having desired properties based on the predicted property values.
- The molecular design apparatus according to D1, wherein the control unit: obtains a first building block combination information set, which is a set of building block combination information of a plurality of different molecules, before updating the extraction parameters; calculates a first property prediction value for a molecule that is an element of the first building block combination information set using the prediction model; further obtains, after updating the extraction parameters, a second building block combination information set, which is a set of building block combination information of a plurality of different molecules; calculates a second property prediction value for a molecule that is an element of the second building block combination information set using the prediction model; and searches for candidate molecules having desired properties based on the first property prediction value and the second property prediction value.
- the first building block combination information set and the second building block combination information set are sets including sequence information of different molecules.
- The molecular design apparatus according to D2.
- the building block combination information is sequence information of the molecule.
- the molecule is at least one of a nucleic acid, a peptide, a cyclic peptide, a protein, an antibody, and a small molecule compound.
- the molecule is a protein, an antibody, a peptide, or a cyclic peptide, and the building block combination information is amino acid sequence information.
- a molecular design apparatus according to any one of D1 to D5.
- the property is at least one of binding ability, pharmacological activity, physical properties, kinetics, and safety.
- a molecular design apparatus according to any one of D1 to D6.
- the molecule is a molecule that binds to a target molecule, and the property is the ability to bind to the target molecule.
- the control unit calculates the characteristic prediction value and further calculates an estimate of the uncertainty of the prediction.
- a molecular design apparatus according to any one of D1 to D8.
- the control unit acquires the building block combination information set so as to include molecules having more desirable properties based on a predicted value of the molecular properties and an estimated value of the uncertainty of the prediction.
- A molecular design apparatus according to any one of D1 to D9.
- the control unit further calculates a characteristic quality value based on the characteristic prediction value and an estimate of the uncertainty of the prediction;
- the candidate molecule identifying unit identifies at least one molecule from the building block combination information set based on the characteristic quality value.
- the characteristic quality value increases as the characteristic prediction value increases and as the prediction uncertainty estimate decreases;
- the characteristic quality value is an output value of a predetermined function having two input variables, the characteristic prediction value and an estimate of the uncertainty of the prediction.
- the control unit calculates a mean variance as an objective function that gives the characteristic quality value.
- the prediction uncertainty estimate is the standard deviation of the property prediction;
- a tree-structured Parzen estimator is used as the combinatorial optimization algorithm.
- the prediction model is a model trained based on building block combination information that differs for each molecule and training data that indicates characteristics of the molecule.
- An information processing method in an information processing system for identifying molecules suitable as drug candidates comprising the steps of: calculating a predicted property value of a molecule that is an element of a building block combination information set, which is a set of building block combination information of a plurality of different molecules, using a prediction model for predicting the properties of the molecule from the building block combination information of the molecule, and estimating the uncertainty of the prediction; and searching for candidate molecules having more desirable properties based on the predicted property values and the uncertainty of the predictions.
- the building block combination information is information on the sequence of the molecule. The information processing method according to E1.
- the molecule is at least one of a nucleic acid, a peptide, a cyclic peptide, a protein, an antibody, and a small molecule compound.
- the molecule is a protein, an antibody, a peptide, or a cyclic peptide, and the building block combination information is information of an amino acid sequence.
- the property is at least one of binding ability, pharmacological activity, physical properties, kinetics, and safety.
- the molecule is a molecule that binds to a target molecule, and the property is the ability to bind to the target molecule.
- An information processing method according to any one of E1 to E5.
- the information processing method further comprises the step of calculating a characteristic quality value based on the characteristic prediction value and an estimate of the uncertainty of the prediction, and the searching step includes identifying at least one molecule from the building block combination information set based on the characteristic quality value.
- the characteristic quality value increases as the characteristic prediction value increases and as the prediction uncertainty estimate decreases; The information processing method according to E7.
- the characteristic quality value is an output value of an arbitrary function having two input variables, the characteristic prediction value and an estimate of the uncertainty of the prediction.
- the step of calculating the characteristic quality value calculates a mean variance as an objective function that gives the characteristic quality value.
- the estimate of the uncertainty of the prediction is the standard deviation of the prediction;
- the searching step includes a step of identifying molecules having the characteristic quality value equal to or greater than a predetermined value from the building block combination information set.
- the searching step includes the steps of: determining a ranking of the characteristic quality values for each molecule; and identifying molecules within a predetermined ranking.
- [E17] The building block combination information set is obtained using a combinatorial optimization algorithm, and an extraction parameter of the combinatorial optimization algorithm is updated based on the characteristic prediction value or the characteristic quality value; An information processing method according to any one of E1 to E16.
- the combinatorial optimization algorithm is a tree-structured Parzen estimator.
- the prediction model is a model trained based on building block combination information that differs for each molecule and training data that indicates characteristics of the molecule. An information processing method according to any one of E1 to E18.
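Two selection rules are described for the searching step in this group: keep molecules whose characteristic quality value is at or above a predetermined value, or keep molecules within a predetermined ranking. A minimal Python sketch of both rules follows. The function names, the threshold, the rank cutoff, and the quality values are hypothetical.

```python
def select_by_threshold(quality, threshold):
    """Molecules whose quality value is equal to or greater than a predetermined value."""
    return [seq for seq, q in quality.items() if q >= threshold]


def select_by_rank(quality, top_k):
    """Molecules within a predetermined ranking, best quality value first."""
    ranked = sorted(quality.items(), key=lambda item: item[1], reverse=True)
    return [seq for seq, _ in ranked[:top_k]]


# Hypothetical characteristic quality values keyed by sequence.
quality = {"ACDKW": 0.64, "GHILM": 0.6975, "NPQRS": 0.71, "TVWYA": 0.30}

print(select_by_threshold(quality, 0.65))
print(select_by_rank(quality, 2))
```

The threshold rule returns a variable number of candidates, while the ranking rule always returns a fixed number, which may matter when downstream synthesis capacity is limited.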
- A program for executing: a step of calculating a predicted property value of a molecule that is an element of a building block combination information set, which is a set of building block combination information of a plurality of different molecules, using a prediction model for predicting properties of the molecule from the building block combination information of the molecule, and estimating the uncertainty of the prediction; and a step of searching for candidate molecules having more desirable properties based on the predicted property values and the uncertainty of the predictions.
- the building block combination information is sequence information of the molecule. The program described in F1.
- the molecule is at least one of a nucleic acid, a peptide, a cyclic peptide, a protein, an antibody, and a small molecule compound.
- the molecule is a protein, an antibody, a peptide, or a cyclic peptide, and the building block combination information is information of an amino acid sequence.
- the property is at least one of binding ability, pharmacological activity, physical properties, kinetics, and safety.
- the molecule is a molecule that binds to a target molecule, and the property is the ability to bind to the target molecule.
- the method further comprises the step of calculating a characteristic quality value based on the characteristic prediction value and an estimate of the uncertainty of the prediction,
- the searching step includes a step of identifying at least one molecule from the building block combination information set based on the characteristic quality value.
- the program according to any one of F1 to F6.
- [F8] the characteristic quality value increases as the characteristic prediction value increases and as the prediction uncertainty estimate decreases; The program described in F7.
- [F9] the characteristic quality value is an output value of a predetermined function having two input variables, the characteristic prediction value and an estimate of the uncertainty of the prediction.
- the step of calculating the characteristic quality value includes calculating a mean variance as an objective function that gives the characteristic quality value.
- the uncertainty estimate is the standard deviation of the predicted value;
- [F12] a step of searching for candidate molecules having more desired properties and outputting building block combination information of the identified candidate molecules;
- [F13] Searching for candidate molecules having more desired properties and further outputting information on predicted property values of the identified candidate molecules;
- the searching step includes a step of identifying a molecule having a characteristic quality value equal to or greater than a predetermined value from the building block combination information set.
- the searching step includes the steps of: determining a ranking of the characteristic quality values for each molecule; and identifying molecules within a predetermined ranking.
- the method further includes a step of updating an extraction parameter for obtaining the building block combination information set based on the predicted property value or the property quality value of the molecule using a combinatorial optimization algorithm.
- the program according to any one of F1 to F16.
- the combinatorial optimization algorithm is a tree-structured Parzen estimator.
- the prediction model is a model trained based on building block combination information that differs for each molecule and training data that indicates characteristics of the molecule.
- [G1] A method for producing a molecular compound, comprising: accessing a building block combination information set which is a set of building block combination information of a plurality of different molecules; an input step of inputting the building block combination information set into a prediction model; an inference step of searching for molecules having more desired properties from the building block combination information set based on the property prediction value and the prediction uncertainty estimate value for each molecule included in the building block combination information set output from the prediction model, and identifying the molecules as candidate molecules; an output step of outputting building block combination information relating to the candidate molecule; a generating step of generating the molecular compound having a molecular sequence indicated in the building block combination information;
- [G2] The method according to G1, wherein the candidate molecule has a biological sequence, and the building block combination information is sequence information.
- the molecule is at least one of a nucleic acid, a peptide, a cyclic peptide, a protein, an antibody, and a small molecule compound.
- the molecule is a protein, an antibody, a peptide, or a cyclic peptide, and the building block combination information is amino acid sequence information.
- the property is at least one of binding ability, pharmacological activity, physical properties, kinetics, and safety. A method according to any one of G1 to G4.
- the molecule is a molecule that binds to a target molecule, and the property is the ability to bind to the target molecule.
- a method according to any one of G1 to G5. The method further comprises the steps of: calculating a characteristic quality value based on the characteristic prediction value and an estimate of the uncertainty of the prediction; and identifying at least one candidate molecule from the building block combination information set based on the characteristic quality value.
- the characteristic quality value increases as the characteristic prediction value increases and as the prediction uncertainty estimate decreases; The method described in G7.
- the characteristic quality value is an output value of a predetermined function having two input variables, the characteristic prediction value and an estimate of the uncertainty of the prediction.
- the method described in G8.
- [G10] The step of calculating the characteristic quality value calculates a mean variance as an objective function that gives the characteristic quality value. A method according to any one of G7 to G9.
- [G11] the uncertainty estimate is the standard deviation of the predicted value; A method according to any one of G1 to G10.
- [G12] The information on the candidate molecule includes building block combination information of the candidate molecule. A method according to any one of G1 to G11.
- [G13] The information on the candidate molecule includes a predicted characteristic value of the candidate molecule. A method according to any one of G1 to G12.
- the step of searching for a candidate molecule includes a step of identifying a molecule having a characteristic quality value equal to or greater than a predetermined value as the candidate molecule from the building block combination information set. A method according to any one of G7 to G13.
- the step of searching for candidate molecules includes a step of determining a rank of the characteristic quality value for each molecule, and determining a molecule within a predetermined rank as the candidate molecule. A method according to any one of G7 to G13.
- the step of searching for a candidate molecule includes a step of selecting, from the building block combination information set, at least one molecule for which the predicted property value and the estimated value of the prediction uncertainty each satisfy a predetermined condition.
- a method according to any one of G1 to G15 [G17]
- the building block combination information set is obtained using a combinatorial optimization algorithm; The method further includes updating an extraction parameter of the combinatorial optimization algorithm based on the predicted property value or the property quality value of the molecule.
- the combinatorial optimization algorithm is a tree-structured Parzen estimator. The method described in G17.
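One of the search variants in this group selects molecules for which the predicted property value and the uncertainty estimate each satisfy a predetermined condition. The Python sketch below assumes the simplest form of such conditions, a lower bound on the prediction and an upper bound on the uncertainty. The bounds, the function name `satisfies_conditions`, and the data are hypothetical and are not specified in the disclosure.

```python
def satisfies_conditions(mu, sigma, min_prediction, max_uncertainty):
    """True when both the prediction and its uncertainty meet their own condition."""
    return mu >= min_prediction and sigma <= max_uncertainty


# Hypothetical (prediction value, standard deviation) pairs keyed by sequence.
predictions = {
    "ACDKW": (0.80, 0.40),  # high prediction, but too uncertain
    "GHILM": (0.70, 0.05),  # certain, but prediction too low
    "NPQRS": (0.75, 0.20),  # meets both conditions
}

candidates = [
    seq
    for seq, (mu, sigma) in predictions.items()
    if satisfies_conditions(mu, sigma, min_prediction=0.72, max_uncertainty=0.25)
]
print(candidates)
```

Unlike a single combined quality value, this rule never lets a very high prediction compensate for an uncertainty above the cutoff.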
- An information processing method in an information processing system for identifying molecules suitable as drug candidates, comprising: a step of obtaining a building block combination information set, which is a set of building block combination information of a plurality of different molecules, according to a combinatorial optimization algorithm; a step of calculating a predicted property value of a molecule that is an element of the building block combination information set, using a prediction model for predicting the property of the molecule from the building block combination information of the molecule; a step of searching for molecules having more desirable properties based on the predicted property values; and a step of updating extraction parameters of the combinatorial optimization algorithm based on the predicted property values of the molecules so that molecules having more desirable properties are included.
- the first building block combination information set and the second building block combination information set are sets including building block combination information of different molecules.
- the building block combination information is information on the sequence of the molecule.
- the molecule is at least one of a nucleic acid, a peptide, a cyclic peptide, a protein, an antibody, and a small molecule compound.
- An information processing method according to any one of H1 to H4.
- the molecule is a protein, an antibody, a peptide, or a cyclic peptide, and the building block combination information is amino acid sequence information.
- the property is at least one of binding ability, pharmacological activity, physical properties, kinetics, and safety.
- the molecule is a molecule that binds to a target molecule, and the property is the ability to bind to the target molecule.
- the information processing method according to any one of H1 to H7.
- [H9] calculating the predicted characteristic and an estimate of the uncertainty of the prediction; The information processing method according to any one of H1 to H8.
- [H10] calculating a characteristic quality value based on the characteristic prediction value and an estimate of the uncertainty of the prediction; the searching step includes identifying at least one molecule from the building block combination information set based on the characteristic quality value;
- the characteristic quality value increases as the characteristic prediction value increases and as the prediction uncertainty estimate decreases;
- the characteristic quality value is an output value of a predetermined function having two input variables, the characteristic predicted value and the prediction uncertainty.
- the step of calculating the characteristic quality value includes a step of calculating a mean variance as an objective function that gives the characteristic quality value.
- the prediction uncertainty estimate is the standard deviation of the property prediction;
- a tree-structured Parzen estimator is used as a combinatorial optimization algorithm for updating the extraction parameters.
- an acquisition step of acquiring a building block combination information set which is a set of building block combination information of a plurality of different molecules, according to a combinatorial optimization algorithm; a prediction step of calculating a predicted property value of a molecule that is an element of the building block combination information set, using a prediction model for predicting the property of the molecule from the building block combination information of the molecule; an updating step of updating extraction parameters of the combinatorial optimization algorithm based on the predicted property values for each molecule so that molecules having more desired properties are included; a search procedure for searching for molecules having desired properties based on the predicted property values; A program for executing the above.
- the acquisition step includes: A step of acquiring a first building block combination information set which is a set of building block combination information of a plurality of different molecules before updating the extraction parameters; and a step of further acquiring a second building block combination information set, which is a set of building block combination information of a plurality of different molecules, after updating the extraction parameters
- the prediction step comprises: calculating a first property prediction value for a molecule that is an element of the first building block combination information set using the prediction model; and calculating a second predicted property value for a molecule that is an element of the second building block combination information set using the prediction model
- the search procedure includes: The program described in I1, further comprising a search procedure for searching for candidate molecules having desired properties from the first building block combination information set and the second building block combination information set based on the first property predicted value and the second property predicted value.
- the first building block combination information set and the second building block combination information set are sets including building block combination information of different molecules.
- the building block combination information is information on the sequence of the molecule.
- the molecule is at least one of a nucleic acid, a peptide, a cyclic peptide, a protein, an antibody, and a small molecule compound.
- the molecule is a protein, an antibody, a peptide, or a cyclic peptide, and the building block combination information is amino acid sequence information.
- the property is at least one of binding ability, pharmacological activity, physical properties, kinetics, and safety.
- the molecule is a molecule that binds to a target molecule, and the property is the ability to bind to the target molecule.
- [I9] calculating the predicted characteristic value and calculating an estimate of the uncertainty of the prediction; A program according to any one of I1 to I8.
- the method further comprises the step of calculating a characteristic quality value based on the characteristic prediction value and an estimate of the uncertainty of the prediction,
- the search procedure includes: identifying at least one molecule from the building block combination information set based on the characteristic quality value;
- the characteristic quality value increases with increasing characteristic prediction value and decreases with decreasing prediction uncertainty estimate;
- the characteristic quality value is an output value of a predetermined function having two input variables, the characteristic prediction value and an estimate of the uncertainty of the prediction.
- the step of calculating the characteristic quality value includes calculating a mean variance as an objective function that gives the characteristic quality value.
- the program according to any one of I10 to I12.
- the prediction uncertainty estimate is the standard deviation of the property prediction;
- a tree-structured Parzen estimator is used as a combinatorial optimization algorithm for updating the building block combination information set.
- the prediction model is a model trained based on building block combination information that differs for each molecule and training data that indicates characteristics of the molecule.
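The acquisition, prediction, updating, and search steps described for the program can be sketched as a loop. The sketch below is not a tree-structured Parzen estimator: as a plainly simpler stand-in, it keeps one weight vector per sequence position (playing the role of the extraction parameters) and shifts the weights toward residues found in the top-scoring sequences of each round. `ALPHABET`, `LENGTH`, `toy_predict`, and all parameter values are hypothetical placeholders, and `toy_predict` merely substitutes for a trained prediction model.

```python
import random

ALPHABET = "ACDE"  # hypothetical building blocks
LENGTH = 4         # hypothetical number of variable positions


def toy_predict(seq):
    """Hypothetical stand-in for the trained prediction model."""
    return sum(1.0 for ch in seq if ch == "A")


def sample(params, rng, n):
    """Acquisition step: draw n sequences from per-position weights."""
    return [
        "".join(rng.choices(ALPHABET, weights=params[i])[0] for i in range(LENGTH))
        for _ in range(n)
    ]


def update(params, scored, top_fraction=0.25, smoothing=0.5):
    """Updating step: re-estimate per-position weights from the
    top-scoring sequences of the current round (smoothed counts)."""
    ranked = sorted(scored, key=scored.get, reverse=True)
    elite = ranked[: max(1, int(len(ranked) * top_fraction))]
    new_params = []
    for i in range(LENGTH):
        counts = [sum(1 for s in elite if s[i] == ch) + smoothing for ch in ALPHABET]
        total = sum(counts)
        new_params.append([c / total for c in counts])
    return new_params


def search(rounds=5, n=40, seed=0):
    """Run acquisition, prediction and updating for several rounds, then
    search all evaluated sequences for the best predicted value."""
    rng = random.Random(seed)
    params = [[1.0 / len(ALPHABET)] * len(ALPHABET) for _ in range(LENGTH)]
    all_scored = {}
    for _ in range(rounds):
        batch = sample(params, rng, n)                 # acquisition
        scored = {s: toy_predict(s) for s in batch}    # prediction
        all_scored.update(scored)
        params = update(params, scored)                # updating
    best = max(all_scored, key=all_scored.get)         # search
    return best, all_scored[best], params
```

The sets sampled before and after each update correspond loosely to the first and second building block combination information sets, and the final search runs over their union.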
- a method for producing a molecular compound comprising: An acquisition step of acquiring a building block combination information set, which is a set of building block combination information of a plurality of different molecules, according to a combinatorial optimization algorithm; an input step of inputting the building block combination information set into a prediction model; an updating step of updating extraction parameters of the combinatorial optimization algorithm based on the property prediction values of each molecule included in the building block combination information set output from the prediction model so that molecules having more desired properties are included; searching for molecules having more desirable properties from the building block combination information set based on the property prediction value, and identifying the molecules as candidate molecules; an output step of outputting building block combination information relating to the candidate molecule; a generating step of generating the molecular compound having a molecular sequence indicated in the building block combination information;
- the method according to claim 1 [J2] The method according to claim 1, further comprising a step of searching for the candidate molecule based on a first property predicted value calculated for a molecule that is an element of
- the first building block combination information set and the second building block combination information set are sets including building block combination information of different molecules.
- the candidate molecule has a biological sequence.
- the building block combination information is sequence information of the molecule.
- the molecule is at least one of a nucleic acid, a peptide, a cyclic peptide, a protein, an antibody, and a small molecule compound.
- the molecule is a protein, an antibody, a peptide, or a cyclic peptide, and the building block combination information is amino acid sequence information.
- the property is at least one of binding ability, pharmacological activity, physical properties, kinetics, and safety.
- the molecule is a molecule that binds to a target molecule, and the property is the ability to bind to the target molecule.
- a method according to any one of J1 to J8. [J10] calculating the predicted characteristic and an estimate of the uncertainty of the prediction; A method according to any one of J1 to J9.
- the inference step of searching for the candidate molecule includes a step of identifying at least one molecule from the building block combination information set as the candidate molecule based on the characteristic quality value.
- the method described in J10. [J12] the characteristic quality value increases with increasing characteristic prediction value and decreases with decreasing prediction uncertainty estimate; The method according to claim J11.
- the characteristic quality value is an output value of a predetermined function having two input variables, the characteristic prediction value and an estimate of the uncertainty of the prediction.
- the step of calculating the characteristic quality value includes a step of calculating a mean variance as an objective function that gives the characteristic quality value.
- the prediction uncertainty estimate is the standard deviation of the property prediction;
- a tree-structured Parzen estimator is used as a combinatorial optimization algorithm for updating the building block combination information set.
- a computer system comprising a processor and a memory, the memory configured to store one or more instructions;
- the instructions include causing the processor to: calculating a predicted property value of a molecule that is an element of a building block combination information set, which is a set of building block combination information of a plurality of different molecules, using a prediction model for predicting properties of the molecule from the building block combination information of the molecule, and estimating the uncertainty of the prediction; searching for candidate molecules having more desirable properties based on the predicted property values and the uncertainty of the predictions; Computer system.
- a non-transitory computer-readable storage medium storing one or more instructions, comprising: The instructions are sent to the computer: calculating a predicted property value of a molecule that is an element of a building block combination information set, which is a set of building block combination information of a plurality of different molecules, using a prediction model for predicting properties of the molecule from the building block combination information of the molecule, and estimating the uncertainty of the prediction; searching for candidate molecules having more desirable properties based on the predicted property values and the uncertainty of the predictions; A non-transitory computer-readable storage medium.
- a computer system comprising a processor and a memory, the memory configured to store one or more instructions;
- the instructions include causing the processor to: obtaining a building block combination information set, which is a set of building block combination information of a plurality of different molecules, according to a combinatorial optimization algorithm; calculating a predicted property value of a molecule that is an element of the building block combination information set using a prediction model for predicting the property of the molecule from the building block combination information of the molecule; updating extraction parameters of the combinatorial optimization algorithm based on the predicted property values for each molecule so that molecules having more desirable properties are included; searching for molecules having desired properties based on the predicted property values; Computer system.
- a non-transitory computer-readable storage medium storing one or more instructions, comprising: The instructions are sent to the computer: obtaining a building block combination information set, which is a set of building block combination information of a plurality of different molecules, according to a combinatorial optimization algorithm; calculating a predicted property value of a molecule that is an element of the building block combination information set using a prediction model for predicting the property of the molecule from the building block combination information of the molecule; updating extraction parameters of the combinatorial optimization algorithm based on the predicted property values for each molecule so that molecules having more desirable properties are included; searching for molecules having desired properties based on the predicted property values; A non-transitory computer-readable storage medium.
- One aspect of the present disclosure can further reduce the burden of drug discovery.
- FIG. 1 is a diagram showing an example of a drug discovery system including a molecular design device according to a first embodiment.
- FIG. 13 is a diagram showing an example of a drug discovery system including a molecular design device according to a second embodiment.
- FIG. 2 is an explanatory diagram for explaining an example of a plurality of pieces of molecular sequence information according to an embodiment of the present application.
- FIG. 1 is a diagram showing an example of a hardware configuration of a molecular design apparatus according to an embodiment of the present application.
- FIG. 10 is a flowchart showing an example of a flow of a process executed by a molecular design device according to a second embodiment.
- 13 is a flowchart showing an example of a combinatorial optimization process in the second embodiment.
- 13 is a flowchart showing another example of the combinatorial optimization process in the second embodiment.
- FIG. 1 is a schematic diagram of an optimization problem setting according to an application example of an embodiment of the present application.
- FIG. 13 is a diagram showing an example of calculation of an objective function value by optimization in TPE according to the first verification example.
- FIG. 13 is a graph showing the relationship between the predicted average value and the predicted standard deviation value of the array obtained by sampling using TPE in the first verification example.
- FIG. 13 is a graph showing the density distribution of edit distances of sequences obtained by sampling using TPE in the first verification example.
- FIG. 13 is a graph showing the distribution of pseudo-correct model scores for the proposed sequence in the first validation example.
- FIG. 13 is a diagram showing edit distances of proposed sequences in the first verification example.
- FIG. 13 is a graph showing predicted standard deviation values of proposed sequences in the first verification example.
- 13 is a table showing examples of amino acid candidates for each mutation candidate site in a search space according to a first verification example.
- 13 is a table showing an example of parameter settings of the TPE according to the first verification example.
- FIG. 13 is a diagram illustrating the distribution of predicted mean values and predicted standard deviation values of an array obtained by sampling using TPE in the second verification example.
- FIG. 13 is a t-SNE visualization diagram illustrating an array based on sampling of TPEs for the second validation example.
- FIG. 13 is a diagram illustrating a predicted average value of a proposed sequence according to a second verification example.
- FIG. 13 is a diagram illustrating the predicted variance values of the proposed sequences in the second verification example.
- FIG. 13 is a diagram illustrating the distribution of expression levels of proposed sequences in the second verification example.
- FIG. 13 is a diagram illustrating the distribution of octet values for each sequence according to the second verification example.
- Amino acids: In the present specification, the term "amino acid" may include natural amino acids and non-natural amino acids.
- the amino acids are represented by one-letter code or three-letter code, or both, such as Ala/A, Leu/L, Arg/R, Lys/K, Asn/N, Met/M, Asp/D, Phe/F, Cys/C, Pro/P, Gln/Q, Ser/S, Glu/E, Thr/T, Gly/G, Trp/W, His/H, Tyr/Y, Ile/I, and Val/V.
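As a small illustration of the two notations, the twenty pairs listed above can be held in a lookup table and used to convert between three-letter and one-letter representations. The helper names are hypothetical, and non-natural amino acids are deliberately not covered by this sketch.

```python
# Three-letter to one-letter codes, exactly the pairs listed in the text.
THREE_TO_ONE = {
    "Ala": "A", "Leu": "L", "Arg": "R", "Lys": "K", "Asn": "N",
    "Met": "M", "Asp": "D", "Phe": "F", "Cys": "C", "Pro": "P",
    "Gln": "Q", "Ser": "S", "Glu": "E", "Thr": "T", "Gly": "G",
    "Trp": "W", "His": "H", "Tyr": "Y", "Ile": "I", "Val": "V",
}
ONE_TO_THREE = {one: three for three, one in THREE_TO_ONE.items()}


def to_one_letter(residues):
    """Convert a list of three-letter codes into a one-letter string."""
    return "".join(THREE_TO_ONE[r] for r in residues)


def to_three_letter(sequence):
    """Convert a one-letter string into a list of three-letter codes."""
    return [ONE_TO_THREE[c] for c in sequence]
```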
- Amino acid modification: For modifying an amino acid in the amino acid sequence of an antigen-binding molecule, known methods such as site-directed mutagenesis (Kunkel et al. (Proc. Natl. Acad. Sci. USA (1985) 82, 488-492)) and overlap extension PCR may be appropriately employed. In addition, as a method for modifying an amino acid by substituting an amino acid other than a natural amino acid, several known methods may also be employed (Annu. Rev. Biophys. Biomol. Struct. (2006) 35, 225-249, Proc. Natl. Acad. Sci. USA (2003) 100 (11), 6353-6357).
- a cell-free translation system (Clover Direct (Protein Express)) containing a tRNA in which a non-natural amino acid is bound to a complementary amber suppressor tRNA of the UAG codon (amber codon), which is one of the stop codons, may also be used.
- the term "antigen" is not limited to a specific structure as long as it contains an epitope to which an antigen-binding domain binds.
- the antigen is a peptide, polypeptide, or protein of 4 or more amino acids. Examples of the above antigens include membrane molecules that are expressed on the cell membrane, and soluble molecules that are secreted outside the cells.
- Antigen-binding domain may refer to any domain of any structure as long as it binds to the target antigen.
- domains include the variable regions of the heavy and light chains of an antibody, a module called an A domain of about 35 amino acids contained in Avimer, a cell membrane protein present in the body (International Publication WO2004/044011, WO2005/040229), Adnectin containing the 10Fn3 domain, which is a domain that binds to a protein in fibronectin, a glycoprotein expressed in the cell membrane (International Publication WO2002/032925), Affibody using an IgG-binding domain consisting of a bundle of three helices consisting of 58 amino acids of Protein A as a scaffold (International Publication WO1995/001937), and DARPins (Designed Ankyrin Repeats (AR)), which are regions exposed on the molecular surface of ankyrin repeats (AR) having a structure in which a turn
- anticalin which is a four loop region supporting one side of a barrel structure in which eight highly conserved antiparallel strands twist toward the center in lipocalin molecules such as lipocalin (NGAL) (International Publication WO2003/029462), and a concave region of a parallel sheet structure inside a horseshoe-shaped structure in which leucine-rich-repeat (LRR) modules are repeatedly stacked in the variable lymphocyte receptor (VLR) that does not have an immunoglobulin structure and serves as the adaptive immune system of jawless animals such as lampreys and hagfish (International Publication WO2008/016854).
- an antigen-binding molecule containing an antigen-binding domain is used in the broadest sense, and specifically, various molecular types are included as long as they contain an antigen-binding domain.
- the antigen-binding molecule may be a molecule consisting of only an antigen-binding domain, or may be a molecule containing an antigen-binding domain and other domains.
- examples include a complete antibody and an antibody fragment.
- the antibody may include a single monoclonal antibody (including agonist and antagonist antibodies), a human antibody, a humanized antibody, a chimeric antibody, and the like.
- a scaffold molecule in which a three-dimensional structure such as an existing stable α/β barrel protein structure is used as a scaffold (foundation), and only a part of the structure is made into a library for constructing an antigen-binding domain, may also be included in the antigen-binding molecule of the present disclosure.
- antibody is used herein in the broadest sense and encompasses a variety of antibody structures, including, but not limited to, monoclonal antibodies, polyclonal antibodies, multispecific antibodies (e.g., bispecific antibodies), and antibody fragments, so long as they exhibit the desired antigen-binding activity.
- Antibodies may be isolated from natural sources such as plasma or serum in which they naturally occur, or from the culture supernatant of hybridoma cells that produce the antibodies, or may be partially or completely synthesized by using techniques such as genetic recombination. Examples of antibodies include immunoglobulin isotypes and their isotype subclasses.
- the antibodies of the present disclosure may include IgG1, IgG2, IgG3, and IgG4 among these isotypes.
- For the constant regions of human IgG1, human IgG2, human IgG3, and human IgG4, multiple allotype sequences due to genetic polymorphism are described in Sequences of proteins of immunological interest, NIH Publication No. 91-3242, and any of these may be used in the present disclosure.
- the amino acid sequence at positions 356-358 as represented by EU numbering may be DEL or EEM.
- For the human Igκ (Kappa) constant region and the human Igλ (Lambda) constant region, multiple allotype sequences due to genetic polymorphisms are described in Sequences of proteins of immunological interest, NIH Publication No. 91-3242, and any of these may be used in the present disclosure.
- Antibody fragment refers to a molecule other than a complete antibody that contains a portion of the complete antibody that binds to the antigen to which the complete antibody binds.
- Examples of antibody fragments include, but are not limited to, Fv, Fab, Fab', Fab'-SH, F(ab')2; diabodies; linear antibodies; single-chain antibody molecules (e.g., scFv); and multispecific antibodies formed from antibody fragments.
- Full length antibody: The terms "full length antibody," "complete antibody," and "whole antibody" are used interchangeably herein and refer to an antibody having a structure substantially similar to a native antibody structure or having a heavy chain that includes an Fc region as defined herein.
- variable region refers to the domain of an antibody heavy or light chain that is involved in binding the antibody to an antigen.
- the heavy and light chain variable domains (VH and VL, respectively) of natural antibodies usually have a similar structure, with each domain containing four conserved framework regions (FR) and three hypervariable regions (HVR).
- antibodies that bind to a particular antigen may be isolated by screening a complementary library of VL or VH domains, respectively, with a VH or VL domain from an antibody that binds to that antigen. See, e.g., Portolano et al., J. Immunol. 150:880-887 (1993); Clarkson et al., Nature 352:624-628 (1991).
- molecular weight means the sum of the atomic weights of the atoms that make up a compound molecule (unit: "g/mol"), and is obtained by calculating the sum of the atomic weights of the atoms included in the molecular structure. In this specification, the unit of molecular weight may be omitted. Note that molecular weight can be measured, for example, by liquid chromatography mass spectrometry (LC/MS).
- a medium molecular compound, or a medium molecule is a compound having a molecular weight of 500 g/mol or more and less than 30,000 g/mol.
- the medium molecular weight compound may be, for example, a compound having a molecular weight of 500 g/mol or more and less than 6000 g/mol, or may be a compound having a molecular weight of 500 g/mol or more and less than 4000 g/mol, or may be a compound having a molecular weight of 600 g/mol or more and less than 4000 g/mol, or may be a compound having a molecular weight of 700 g/mol or more and less than 3000 g/mol.
- the medium molecular weight compound may be, for example, a peptide compound containing a peptide chain, a nucleic acid, or a sugar chain, and may be a peptide compound, a peptide compound containing 5 to 30 amino acid residues, a peptide compound containing 7 to 25 amino acid residues, or a peptide compound containing 9 to 20 amino acid residues.
- the medium molecular weight compound is, for example, a peptide compound having a molecular weight of 500 g/mol or more and less than 6000 g/mol, or may be a peptide compound having a molecular weight of 500 g/mol or more and less than 4000 g/mol, or may be a peptide compound having a molecular weight of 600 g/mol or more and less than 4000 g/mol, or may be a peptide compound having a molecular weight of 700 g/mol or more and less than 3000 g/mol.
- a low molecular weight compound is a compound with a molecular weight of less than 500 g/mol.
- a high molecular weight compound is a compound with a molecular weight of 30,000 g/mol or more.
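The three molecular weight classes defined above can be summarized as a simple classification. This is a minimal sketch using only the broadest thresholds stated in the text (500 g/mol and 30,000 g/mol); the function name and the returned labels are hypothetical.

```python
def classify_by_molecular_weight(mw_g_per_mol):
    """Classify a compound using the thresholds stated in the text:
    low: < 500, medium: 500 to < 30,000, high: >= 30,000 (g/mol)."""
    if mw_g_per_mol < 0:
        raise ValueError("molecular weight must be non-negative")
    if mw_g_per_mol < 500:
        return "low"
    if mw_g_per_mol < 30000:
        return "medium"
    return "high"
```

The narrower medium molecular weight ranges mentioned in the text (for example 500 to less than 6000 g/mol) are optional refinements and are not encoded here.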
- a prediction model is generated by performing machine learning based on molecular sequence information and evaluation result information of the molecular property evaluation.
- molecular property evaluation include, but are not limited to, molecular binding ability evaluation, pharmacological activity evaluation, physical property evaluation, kinetic evaluation, and safety evaluation.
- the method of evaluating the binding ability of a target molecule-binding molecule to a target molecule is not particularly limited, but it is possible to quantitatively evaluate the binding of the target molecule-binding molecule to the target molecule.
- the target molecule is, for example, a target protein.
- the target molecule-binding molecule is, for example, an antigen-binding molecule, and the target molecule is, for example, an antigen.
- when the target molecule is an antigen, the binding ability can be evaluated by measuring the binding activity between the antigen-binding molecule and the antigen.
- Binding activity refers to the total strength of non-covalent interactions between one or more binding sites of a molecule (e.g., an antibody) and the binding partner of the molecule (e.g., an antigen).
- binding activity is not strictly limited to 1:1 interactions between members of a binding pair (e.g., an antibody and an antigen).
- the binding activity refers to an inherent binding affinity (sometimes simply referred to as "affinity").
- binding activity of molecule X to its partner Y can generally be expressed by the dissociation constant (KD) or "analyte binding amount per unit amount of ligand". Binding activity can be measured by conventional methods known in the art, including those described herein. Conditions other than the concentration of the target tissue-specific compound can be appropriately determined by those skilled in the art.
- when the antigen-binding molecule provided herein is an antibody, the binding activity of the antibody is a dissociation constant (KD) of ≤ 1 μM, ≤ 100 nM, ≤ 10 nM, ≤ 1 nM, ≤ 0.1 nM, ≤ 0.01 nM, or ≤ 0.001 nM (e.g., 10⁻⁸ M or less, e.g., 10⁻⁸ M to 10⁻¹³ M, e.g., 10⁻⁹ M to 10⁻¹³ M).
- the binding activity of the antibody is measured by a ligand capture method using, for example, a BIACORE® T200 or BIACORE® 4000 (GE Healthcare, Uppsala, Sweden) based on the principle of surface plasmon resonance analysis.
- the instrument is operated using, for example, BIACORE® Control Software.
- an amine coupling kit (GE Healthcare, Uppsala, Sweden) is used according to the supplier's instructions to immobilize ligand capture molecules, such as anti-tag antibodies, anti-IgG antibodies, protein A, etc., on a carboxymethyldextran-coated sensor chip (GE Healthcare, Uppsala, Sweden).
- the ligand capture molecules are diluted with 10 mM sodium acetate solution of appropriate pH and injected at an appropriate flow rate and injection time.
- the binding activity is measured using a buffer containing 0.05% polysorbate 20 (also known as Tween (registered trademark) 20) as the measurement buffer, at a flow rate of 10-30 μL/min, and at a measurement temperature of, for example, 25°C or 37°C.
- the antibody is injected to capture the desired amount, and then a serial dilution (analyte) of the antigen and/or Fc receptor prepared using the measurement buffer is injected.
- the antigen and/or Fc receptor is injected to capture the desired amount, and then a serial dilution (analyte) of the antibody prepared using the measurement buffer is injected.
- the measurement results are analyzed using BIACORE® Evaluation Software.
- Kinetics parameter calculations are performed by simultaneously fitting the binding and dissociation sensorgrams using a 1:1 binding model, and the binding rate (kon or ka), dissociation rate (koff or kd), and equilibrium dissociation constant (KD) can be calculated.
- when the binding activity is weak, particularly when dissociation is rapid and kinetic parameter calculation is difficult, the equilibrium dissociation constant (KD) may be calculated using a steady state model.
- the "amount of analyte bound per unit amount of ligand" can also be calculated by dividing the amount of analyte bound (RU) at a specific concentration by the amount of ligand captured (RU).
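The two quantities used to express binding activity can be written out as short calculations. For a 1:1 binding model, the equilibrium dissociation constant is conventionally the ratio of the dissociation rate to the binding rate, KD = kd / ka; the second function is the division described in the text. The function names, argument names, and example values are hypothetical, and the sketch does not perform any sensorgram fitting.

```python
def equilibrium_dissociation_constant(ka_per_molar_second, kd_per_second):
    """KD (M) for a 1:1 binding model: dissociation rate kd (1/s)
    divided by binding rate ka (1/(M*s))."""
    if ka_per_molar_second <= 0:
        raise ValueError("binding rate ka must be positive")
    return kd_per_second / ka_per_molar_second


def analyte_bound_per_unit_ligand(analyte_bound_ru, ligand_captured_ru):
    """Amount of analyte bound (RU) at a specific concentration divided
    by the amount of ligand captured (RU)."""
    if ligand_captured_ru <= 0:
        raise ValueError("captured ligand amount must be positive")
    return analyte_bound_ru / ligand_captured_ru
```

For instance, a binding rate of 1e5 1/(M·s) and a dissociation rate of 1e-3 1/s give a KD of about 1e-8 M, that is, roughly 10 nM.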
- a selection method of antigen-binding molecules using a display library can be mentioned.
- panning using phage display can be mentioned.
- in affinity evaluation using phage display, a phage library displaying multiple different antigen-binding molecules is prepared, the target antigen is contacted with the prepared phages, and then unbound phages are washed away to concentrate the phages displaying antigen-binding molecules that interact with the target antigen.
- By analyzing the nucleic acid sequence encoding the antigen-binding molecule contained in the concentrated phage, it is possible to identify a sequence that has affinity for the target antigen.
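One way such sequence analysis might be carried out is to compare how often each sequence occurs before and after panning. The sketch below is an illustrative assumption rather than the procedure of the disclosure: it counts sequence reads in two samples and reports a smoothed frequency ratio, where a ratio above 1 suggests concentration during panning. The function name, the pseudocount, and the input format are hypothetical.

```python
from collections import Counter


def enrichment_ratios(reads_before, reads_after, pseudocount=1.0):
    """Return {sequence: frequency_after / frequency_before}, with a
    pseudocount added to every count to avoid division by zero."""
    before = Counter(reads_before)
    after = Counter(reads_after)
    sequences = set(before) | set(after)
    total_before = len(reads_before) + pseudocount * len(sequences)
    total_after = len(reads_after) + pseudocount * len(sequences)
    ratios = {}
    for seq in sequences:
        freq_before = (before[seq] + pseudocount) / total_before
        freq_after = (after[seq] + pseudocount) / total_after
        ratios[seq] = freq_after / freq_before
    return ratios
```

Sequences can then be ranked by ratio, with the caveat that read counts from small samples are noisy and the pseudocount strongly influences rare sequences.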
- panning using mammalian cell display can be mentioned.
- a library containing multiple different antigen-binding molecules is expressed in a target mammalian cell, and reporter activity, etc. is changed depending on the action that it shows on the same cell, so that cells having antigen-binding molecule genes with the desired pharmacological activity can be isolated using a flow cytometer, etc.
- a library containing multiple different antigen-binding molecules is expressed in target mammalian cells, and the expression level is stained with an antibody specific to the antigen-binding molecule, allowing cells having an antigen-binding molecule gene that can be stably and highly expressed to be isolated using a flow cytometer or the like.
- Characterization of antigen-binding molecules by panning is not limited to the above-mentioned method using phages or mammalian cells, and various methods can be used as long as the antigen-binding molecule can be displayed, including, but not limited to, a method of displaying the antigen-binding molecule on ribosomes, a method of displaying the antigen-binding molecule on mRNA, a method of displaying the antigen-binding molecule on viruses other than phages, and a method of displaying the antigen-binding molecule on bacteria such as E. coli.
- an antibody gene sequence from immune cells derived from an individual, or a method of obtaining an antibody protein sequence from serum.
- in affinity evaluation in which an antibody gene sequence is extracted from immune cells, it is possible to identify a sequence having affinity for the target antigen by inducing immune sensitization through administration of a target antigen protein to an individual and extracting genes from immune cells having an antibody gene that binds to the target antigen.
- the antigen that induces immune sensitization is not limited to the above-mentioned protein, but may be a gene encoding the protein or a cell expressing the protein.
- the target individual may be, but is not limited to, a human, a mouse, a rat, a hamster, a rabbit, a monkey, a chicken, a camel, a llama, or an alpaca.
- methods for analyzing the nucleic acid sequence or occurrence frequency include, but are not limited to, a method of cloning a genetically modified organism having the nucleic acid sequence of each antigen-binding molecule and analyzing the cloned organism by the Sanger method using capillary electrophoresis, and a method of analyzing the cloned organism using a next-generation sequencer.
- the techniques for obtaining information on antigen-binding molecules derived from the display library or an individual can be applied to various property evaluations and are not limited to those mentioned above.
- Pharmacological activity evaluation: The method of evaluating the pharmacological activity of a molecule is not particularly limited, and the activity can be evaluated, for example, by measuring the neutralizing activity, agonist activity, or cytotoxic activity exhibited by the molecule.
- examples include antibody-dependent cell-mediated cytotoxicity (ADCC) activity, complement-dependent cytotoxicity (CDC) activity, T-cell-dependent cytotoxicity (TDCC) activity, and antibody-dependent cellular phagocytosis (ADCP) activity.
- ADCC activity means an activity in which immune cells or the like bind to the Fc region of an antigen-binding molecule containing an antigen-binding domain that binds to a membrane-type molecule expressed on the cell membrane of a target cell via an Fcγ receptor expressed on the immune cell, and the immune cell inflicts damage on the target cell.
- TDCC activity means the activity of a T cell damaging a target cell by bringing the target cell and the T cell into close proximity using a bi-specific antibody containing an antigen-binding domain that binds to a membrane molecule expressed on the cell membrane of the target cell and an antigen-binding domain for any of the constituent subunits of the T cell receptor (TCR) complex on the T cell, particularly an antigen-binding domain that binds to the CD3 epsilon chain.
- Neutralizing activity refers to an activity that inhibits the biological activity of a ligand that has biological activity against cells, such as a virus or a toxin. That is, a substance that has neutralizing activity is a substance that binds to the ligand, or to a receptor to which the ligand binds, and inhibits the binding of the ligand to the receptor. A ligand whose binding to the receptor has been blocked by neutralizing activity can no longer exert biological activity through that receptor.
- when the antigen-binding molecule is an antibody, an antibody having such neutralizing activity is generally called a neutralizing antibody, and the neutralizing activity can be determined by measuring the activity of inhibiting the binding of the ligand to the receptor.
- Ligands that have biological activity against cells are not limited to viruses and toxins, and the inhibitory activity of physiological actions caused by the binding of endogenous ligands such as cytokines and chemokines to receptors is also understood as neutralizing activity.
- neutralizing activity is not limited to the case of inhibiting the binding of a ligand to a receptor, but also includes the activity of inhibiting the function of a protein having biological activity, and an example of the function of the protein is enzyme activity.
- the method of evaluating the physical properties of a molecule is not particularly limited, but examples of physical properties include thermal stability, chemical stability, solubility, viscosity, light stability, long-term storage stability, non-specific adsorption, lipophilicity, and membrane permeability, and the various physical property evaluations exemplified above can be measured by methods known to those skilled in the art.
- the evaluation method is not particularly limited, but stability evaluations such as thermal stability, chemical stability, light stability, stability against mechanical stimuli, and long-term storage stability can be evaluated by measuring the decomposition, chemical modification, and association of the molecule before and after the treatment such as heat treatment, exposure to a low pH environment, light exposure, mechanical stirring, and long-term storage that are the purpose of the stability evaluation.
- Non-limiting examples of measurement methods for such stability evaluation include methods using chromatography such as ion exchange chromatography and size exclusion chromatography, mass spectrometry, and electrophoresis; the measurement can be performed by various methods known to those skilled in the art.
- Other non-limiting examples of physical property evaluations include evaluation of protein solubility using the polyethylene glycol precipitation method, evaluation of viscosity using the small-angle X-ray scattering method, and evaluation of non-specific binding based on binding to extracellular matrix (ECM).
- physical property evaluations such as protein expression level, binding to a purification resin or purification ligand, and surface charge can be performed as long as they are measurable by methods known to those skilled in the art.
- the method of evaluating the kinetics of a molecule is not particularly limited; for example, the molecule can be administered to animals such as mice, rats, monkeys, or dogs and its amount in the blood measured over time after administration, a method widely known to those skilled in the art as pharmacokinetic (PK) evaluation.
- the method of evaluating the safety of a molecule is not particularly limited, and examples thereof include immunogenicity prediction tools such as ISPRI Web-Based Immunogenicity Screening (EpiVax), HLA binding evaluation of fragment peptides of antigen-binding molecules, detection of T cell epitopes and evaluation of immunogenicity using MAPPs (MHC-Associated Peptide Proteomics) or T cell proliferation evaluation, etc.
- evaluation can be performed as long as it is measurable by methods known to those skilled in the art, such as binding to rheumatoid factor (RF), evaluation of immune responses using PBMC or whole blood, and platelet aggregation evaluation.
- MBO Model Based Optimization
- TPE (Tree-structured Parzen Estimator) is a type of Bayesian optimization. TPE involves the process of calculating the expected improvement of an output value for a certain input value based on the conditional probability of the output value for the input value and the probability for the output value for the function to be optimized. In other words, TPE is a method of optimizing a function using input values that maximize the expected improvement calculated in this way.
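For reference, the expected-improvement calculation that TPE relies on can be written out as below. This is the formulation commonly given for TPE in a minimization setting, not one stated in this text; the symbols $y^*$ (a threshold on the output value), $\gamma$ (the fraction of observations below the threshold), and the densities $\ell(x)$ and $g(x)$ are assumptions introduced here for illustration.

```latex
\mathrm{EI}_{y^*}(x)
  = \int_{-\infty}^{y^*} (y^* - y)\, p(y \mid x)\, dy
  = \int_{-\infty}^{y^*} (y^* - y)\, \frac{p(x \mid y)\, p(y)}{p(x)}\, dy,
\qquad
p(x \mid y) =
  \begin{cases}
    \ell(x) & \text{if } y < y^* \\
    g(x)    & \text{if } y \ge y^*
  \end{cases}

\gamma = p(y < y^*), \qquad
p(x) = \gamma\, \ell(x) + (1 - \gamma)\, g(x)
\quad\Longrightarrow\quad
\mathrm{EI}_{y^*}(x) \;\propto\; \left( \gamma + \frac{g(x)}{\ell(x)}\,(1 - \gamma) \right)^{-1}
```

Under these assumptions, maximizing the expected improvement amounts to choosing input values for which the ratio $\ell(x)/g(x)$ is large, that is, inputs that are likely among the good observations and unlikely among the rest.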
- a method of making an antibody includes culturing a host cell containing a nucleic acid encoding the antibody under conditions suitable for expression of the antibody candidate molecular compound described herein, and optionally recovering the antibody from the host cell (or host cell culture medium).
- the isolated nucleic acid encoding the antibody may encode an amino acid sequence comprising the VL and/or an amino acid sequence comprising the VH of the antibody (e.g., the light and/or heavy chains of the antibody).
- the host cell containing such a nucleic acid may contain (e.g., transformed with) (1) a vector containing a nucleic acid encoding an amino acid sequence comprising the VL and an amino acid sequence comprising the VH of the antibody, or (2) a first vector containing a nucleic acid encoding an amino acid sequence comprising the VL of the antibody and a second vector containing a nucleic acid encoding an amino acid sequence comprising the VH of the antibody.
- the host cell is a eukaryotic cell, for example a Chinese Hamster Ovary (CHO) cell or a lymphoid cell (e.g., Y0, NS0, Sp2/0 cells).
- Suitable host cells for cloning or expressing antibody-encoding vectors include prokaryotic or eukaryotic cells.
- antibodies may be produced in bacteria, particularly if glycosylation and Fc effector functions are not required.
- U.S. Patent Nos. 5,648,237, 5,789,199, and 5,840,523. See also Charlton, Methods in Molecular Biology, Vol.
- the antibody may be isolated in a soluble fraction from the bacterial cell paste and can be further purified.
- when the target substance is a peptide compound or a cyclic peptide compound, it can be produced by liquid phase synthesis, solid phase synthesis using Fmoc synthesis, Boc synthesis, etc., or a combination of these.
- Liquid phase synthesis and solid phase synthesis can be carried out by methods well known to those skilled in the art.
- Solid phase synthesis is a method in which a compound is bound to a solid resin and chemically reacted with a reagent on the resin to synthesize the target compound.
- Solid phase peptide synthesis is a method in which a desired amino acid or peptide is bound to a solid resin, and further desired amino acids or peptides are sequentially linked to the amino acids or peptides bound to the solid resin to extend the peptide chain and synthesize the peptide.
- the peptide bound to this solid resin can be separated from the solid resin to obtain the target peptide.
- FIG. 1 is a block diagram showing an example of a drug discovery system 100 including a molecular design device 1 according to the first embodiment.
- the drug discovery system 100 is a system for creating new objects suitable as drug candidates.
- the system provides a method for generating new objects having predetermined properties, such as a specific biological activity (e.g., binding to a specific protein).
- Drugs include, but are not limited to, small molecule drugs, medium molecule drugs, biological drugs, cells, nucleic acid drugs, biopharmaceuticals, and other potential active agents.
- Objects include molecular structures that have a desired or defined biological activity (e.g., binding to a specific protein preferentially over other proteins).
- Molecules that are drug candidates include biomolecules and compounds, including various molecules such as nucleic acids, peptides, cyclic peptides, proteins, antibodies, target molecule binding molecules, polymer compounds, medium molecule compounds, and small molecule compounds.
- the drug discovery system 100 may include a selection device for a molecule that interacts with a drug target, a lead molecule creation device, etc.
- the drug discovery system 100 may be, for example, an information processing system configured to incorporate the disclosure of WO2020/246617.
- the drug discovery system 100 includes a molecular design device 1.
- the molecular design device 1 searches for candidate molecules having desired properties and outputs information on the identified candidates.
- the output information is building block combination information of the candidate molecules.
- the molecular design device 1 identifies candidate molecules and outputs building block combination information of the identified candidate molecules.
- a candidate molecule is a molecule that is expected to have desired properties.
- the candidate molecule building block combination information is information about the candidate molecule, and is building block combination information of some or all of the candidate molecule.
- the output candidate molecule building block combination information may include information indicating one candidate molecule, or may include information indicating multiple candidate molecules.
- the drug discovery system 100 uses the candidate molecule building block combination information output from the molecular design device 1 to select a new object suitable as a drug candidate. For example, the drug discovery system 100 generates candidate molecules based on the candidate molecule building block combination information output from the molecular design device 1, experimentally evaluates the properties of the candidate molecules, and selects a molecule having the desired properties as a new object suitable as a drug candidate based on the results of the property evaluation. In other words, the drug discovery system 100 can create a new object suitable as a drug candidate based on the results of the property evaluation performed by actually generating candidate molecules using the candidate molecule building block combination information output from the molecular design device 1. In such cases, the candidate molecule can be said to be a molecule that can be a verification target for narrowing down candidates for the main components of a pharmaceutical product.
- the building block combination information of a molecule is information on a combination of some or all of the building blocks of the molecule.
- the range of the sequence may be arbitrarily settable.
- a building block is a unit that constitutes a molecule.
- Building block combination information of a molecule relates to a combination of building blocks that constitute the molecule.
- an array containing individual components may be referred to as a "combination."
- sequence may also be used as an example of a building block combination.
- the molecule to be designed is, for example, a protein.
- the building blocks are amino acids
- the building block combination information of the molecule is, for example, information of the amino acid sequence of the protein.
- the molecule to be designed is, for example, a nucleic acid.
- the building blocks are nucleotides.
- the sequence of the molecule is, for example, the nucleotide sequence of the nucleic acid.
- the molecular building block combination information is information about the nucleotide sequence. More specifically, when the designed molecule is an antibody, the molecular sequence is an amino acid sequence, and the building block is an amino acid.
- the building block combination information of the molecule is, for example, the full-length antibody sequence, for example, the amino acid sequence of VH or VL, or the sequence of a part of the antibody, such as CDR, FR, etc.
- the molecular sequence is an amino acid sequence containing unnatural amino acids
- the building blocks are natural amino acids and unnatural amino acids.
- the building block combination information of the molecule is information on the amino acid sequence containing the unnatural amino acids.
- the building block combination of the molecule is a combination of fragments, and the building blocks are fragments (fragment molecules that make up a small molecule).
- the molecular sequence is the base sequence and the building blocks are bases.
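The examples above (amino acids for proteins and antibodies, nucleotides for nucleic acids, fragments for small molecules) share one shape: an ordered combination of building blocks covering part or all of a molecule. The following is a minimal sketch of one way to hold that information; the class name `BuildingBlockCombination`, its fields, and the identifier `MeAla` are hypothetical and are not taken from this text.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class BuildingBlockCombination:
    """Building block combination information for part or all of a molecule."""

    molecule_type: str            # e.g. "protein", "nucleic_acid", "small_molecule"
    blocks: Tuple[str, ...]       # ordered building blocks (amino acids, nucleotides, fragments)
    region: Optional[str] = None  # e.g. "VH" or "CDR"; None means the full-length molecule

    def sub_range(self, start: int, stop: int, region: str) -> "BuildingBlockCombination":
        """Return the combination restricted to an arbitrarily chosen range."""
        return BuildingBlockCombination(self.molecule_type, self.blocks[start:stop], region)


# A peptide containing an unnatural amino acid (the identifier "MeAla" is illustrative only).
peptide = BuildingBlockCombination("peptide", ("G", "MeAla", "S", "Y", "W", "K"))
segment = peptide.sub_range(1, 4, region="example_segment")
```

Storing the blocks as a tuple of identifiers rather than a single string lets multi-character building blocks, such as unnatural amino acids or fragments, sit alongside single-letter residues.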
- the desired property is a property required for a new object suitable for a drug candidate, and can be set arbitrarily.
- Non-limiting examples of the property include the binding ability to a specific in vivo target, binding ability, pharmacological activity, physical properties, kinetics, and safety.
- when the molecule to be designed is an antibody, the property is, for example, the binding ability of the drug to a specific antigen.
- the property is, for example, the translation ability of the protein.
- the molecular design device 1 is an example of an inference device that infers a molecule having desired properties from prediction information described below.
- the desired properties include the ability to bind to a target molecule, efficacy, and drug-like properties including membrane permeability.
- inferring a molecule having desired properties from prediction information may be rephrased as identifying a candidate molecule.
- a molecule expected to have a desired property is one for which a predictive model for the desired property gives a good predicted value with low prediction uncertainty.
- the molecular design device 1 has an inference unit 111 including, for example, a sequence information processing unit 111a, a property prediction unit 111b, a prediction information processing unit 111c, and a candidate molecule specifying unit 111d.
- the sequence information processing unit 111a prepares a sequence information set and outputs the prepared sequence information set to the property prediction unit 111b.
- the sequence information set is a set of sequence information of multiple molecules.
- the sequence information processing unit 111a may autonomously generate sequence information for each molecule, or may input sequence information for each molecule from another device.
- the sequence information set may be referred to as a building block combination information set, and the sequence information processing unit 111a may be referred to as a building block combination information processing unit.
- the property prediction unit 111b predicts the properties of a molecule for each element of the sequence information set of molecules input from the sequence information processing unit 111a, and outputs prediction information relating to the predicted properties to the prediction information processing unit 111c.
- the prediction information processing unit 111c acquires prediction information for each molecule from the property prediction unit 111b, and outputs the acquired prediction information to the candidate molecule identifying unit 111d.
- the candidate molecule specifying unit 111d specifies sequence information of at least one candidate molecule based on the prediction information.
- the candidate molecule specifying unit 111d may output output data indicating the specified sequence information to another device, or may store the output data in the storage unit 14 (described later).
- the sequence information set may be a set of sequence information indicating a virtual molecular sequence (sometimes called a "virtual sequence") generated by machine learning, or may be a set of sequence information of a real molecular sequence (sometimes called a "real sequence"), or may be a set including both sequence information indicating a virtual sequence and sequence information indicating a real sequence.
- the sequence information set may have sequence information of a virtual sequence generated by a virtual sequence generation model.
- the sequence information set may also have sequence information indicating a real sequence obtained, for example, from an existing database or as an experimental result. It may also be sequence information extracted from all candidate combinations of building blocks by combinatorial optimization, which will be described later.
- the sequence information processing unit 111a performs a process of preparing a sequence information set.
- the sequence information processing unit 111a may further include a sequence information set acquisition unit (not shown) that executes an acquisition process for acquiring sequence information that is output for certain input information using a machine learning model.
- the sequence information processing unit 111a acquires a sequence information set and outputs the acquired sequence information set to the property prediction unit 111b.
- the property prediction unit 111b receives sequence information of a plurality of molecules from the sequence information processing unit 111a.
- the property prediction unit 111b calculates a property prediction value and an estimate of the prediction uncertainty for each of the input sequence information of a plurality of molecules.
- the property prediction unit 111b includes a predicted value calculation unit 111x and a prediction uncertainty estimation unit 111y.
- the predicted value calculation unit 111x inputs molecular sequence information and calculates a property predicted value using a prediction model.
- the property predicted value is a property value predicted using the prediction model.
- the prediction uncertainty estimation unit 111y estimates the uncertainty of the property predicted value calculated by the predicted value calculation unit 111x. A value indicating the estimated uncertainty is called an estimated value of prediction uncertainty.
- the predictive model is generated, for example, by learning based on training data containing multiple sets, each pairing the sequence information of an individual molecule with a property evaluation of that molecule.
- the predicted value calculation unit 111x may calculate property predicted values for a plurality of properties. When property values are predicted for a plurality of properties, the prediction uncertainty estimation unit 111y may estimate the prediction uncertainty for the property predicted value of each of the properties, or may estimate the prediction uncertainty for any one of the property predicted values.
- the property prediction unit 111b outputs, as prediction information for each molecule, a property prediction value and an estimate of the prediction uncertainty of the property prediction value to the prediction information processing unit 111c.
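As a rough illustration of what the property prediction unit outputs for each molecule, the sketch below returns a prediction value together with an uncertainty estimate. The text does not specify how the uncertainty is estimated; using the spread of an ensemble of models is an assumption made here, and the toy models and all function names are hypothetical stand-ins for a trained prediction model.

```python
from statistics import mean, pstdev
from typing import Callable, Dict, Iterable, Sequence, Tuple

Model = Callable[[str], float]


def predict_with_uncertainty(sequence: str, ensemble: Sequence[Model]) -> Tuple[float, float]:
    """Return (property prediction value, estimated prediction uncertainty).

    Assumption: the uncertainty is taken as the standard deviation of the
    ensemble members' outputs; other estimators could be substituted.
    """
    outputs = [model(sequence) for model in ensemble]
    return mean(outputs), pstdev(outputs)


def predict_set(sequences: Iterable[str], ensemble: Sequence[Model]) -> Dict[str, Tuple[float, float]]:
    """Build prediction information for every element of a sequence information set."""
    return {seq: predict_with_uncertainty(seq, ensemble) for seq in sequences}


def make_toy_model(residue: str, weight: float) -> Model:
    """Toy stand-in for a trained model: scores the fraction of one residue."""
    return lambda seq: weight * seq.count(residue) / len(seq)


toy_ensemble = [make_toy_model("W", 1.0), make_toy_model("W", 1.2), make_toy_model("Y", 0.8)]
prediction_info = predict_set(["WWYA", "AAAA"], toy_ensemble)
```

Here the models disagree on `"WWYA"`, so its uncertainty estimate is positive, while they agree exactly on `"AAAA"`, giving zero uncertainty.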
- the candidate molecule identifying unit 111d performs a process of inferring molecules having desired properties based on the prediction information, and identifies candidate molecules.
- the candidate molecule specifying unit 111d infers a molecule having a desired property based on the property prediction value and the estimated value of the prediction uncertainty for the sequence information of each molecule output from the property prediction unit 111b.
- the candidate molecule specifying unit 111d specifies at least one candidate molecule from a plurality of molecules whose sequence information is included in the sequence information set from the sequence information processing unit 111a based on the prediction information obtained from the prediction information processing unit 111c.
- the candidate molecule specifying unit 111d may use a property prediction value for at least one property, or may use property prediction values for a plurality of properties. In specifying the candidate molecule, the candidate molecule specifying unit 111d may use a plurality of property prediction values and an estimated value of the prediction uncertainty for at least one property prediction value as prediction information. That is, in specifying the candidate molecule, the constraint of the second property value (the prediction uncertainty for the property prediction value) may be taken into consideration when optimizing the first property value (the property prediction value). Furthermore, the candidate molecule specifying unit 111d may identify molecules having desired properties as candidate molecules based on property quality values, which will be described later.
- the candidate molecule specifying unit 111d may select, for example, at least one molecule whose predicted property value and prediction uncertainty each satisfy a predetermined condition as a candidate molecule.
- the candidate molecule specifying unit 111d may select, for example, at least one molecule whose predicted property values and an estimated value of the prediction uncertainty for at least one predicted property value each satisfy a predetermined condition as a candidate molecule.
- the candidate molecule identifying unit 111d may identify at least one candidate molecule based on a property quality value calculated from the property prediction value and the estimated value of the prediction uncertainty.
- the property quality value may be calculated, for example, in the prediction information processing unit 111c or in the property prediction unit 111b.
- the property quality value is a value based on the property prediction value and the estimated value of the prediction uncertainty.
- the property quality value can also be regarded as a response variable calculated by a predetermined function whose explanatory variables are the property prediction value and the estimated value of the prediction uncertainty.
- the property quality value may be an index value that becomes smaller as the estimated value of the prediction uncertainty becomes larger, and larger as the property prediction value becomes larger.
- the property quality value may be any value that increases as the property prediction value increases and decreases as the estimated value of the prediction uncertainty increases.
- the property quality value is, for example, in the linear domain, the difference obtained by subtracting the product of a predetermined coefficient and the estimated value of the prediction uncertainty from the property prediction value, or the difference obtained by subtracting the estimated value of the prediction uncertainty from the product of the predetermined coefficient and the property prediction value.
- the predetermined coefficient is a positive constant and indicates the degree to which the property prediction value or the estimated value of the prediction uncertainty contributes to the property quality value.
- the property quality value is an index that shows a larger value as the property prediction value increases and the estimated value of the prediction uncertainty decreases.
- a larger property quality value means that the molecule actually generated is more likely to exhibit the expected property.
- the property quality value does not have to be the linear-domain difference between the property prediction value and the product of the predetermined coefficient and the estimated value of the prediction uncertainty.
- the property quality value may be a value obtained by raising either the property prediction value or the estimated value of the prediction uncertainty to a predetermined power and taking the difference or quotient with the other value.
- the property quality value may be determined by dividing the property prediction value by the estimated value of the prediction uncertainty, or by other types of calculation such as calculation in the logarithmic domain.
- it suffices that the property quality value is calculated using both the prediction value and its uncertainty; there are no limitations on the functions, procedures, etc.
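The quality-value variants described above can be sketched as small functions. The function names, the default coefficient `k = 1.0`, the exponent `p`, and the small constant `eps` that guards against division by zero are all assumptions for illustration; the text fixes none of these choices.

```python
import math


def quality_linear(pred: float, unc: float, k: float = 1.0) -> float:
    """Prediction minus k times the uncertainty estimate (k is a positive constant)."""
    return pred - k * unc


def quality_scaled_prediction(pred: float, unc: float, k: float = 1.0) -> float:
    """k times the prediction minus the uncertainty estimate."""
    return k * pred - unc


def quality_power(pred: float, unc: float, p: float = 2.0) -> float:
    """Prediction raised to a power, minus the uncertainty (assumes pred >= 0)."""
    return pred ** p - unc


def quality_ratio(pred: float, unc: float, eps: float = 1e-12) -> float:
    """Prediction divided by the uncertainty estimate (assumes pred >= 0)."""
    return pred / (unc + eps)


def quality_log(pred: float, unc: float, k: float = 1.0) -> float:
    """Log-domain difference (assumes pred > 0 and unc > 0)."""
    return math.log(pred) - k * math.log(unc)
```

Over the input ranges noted in the docstrings, each variant rises with the prediction value and falls with the uncertainty estimate, which is the only behavior the description requires.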
- the candidate molecule identifying unit 111d identifies molecules having desired properties based on the calculated property quality value.
- among the molecules for which property prediction values have been obtained by the property prediction unit 111b, the candidate molecule identifying unit 111d makes a molecule more likely to be identified as a molecule having a desired property the better its property quality value is.
- a good property quality value is a property quality value indicating that the molecule is predicted to be likely to have the desired property, or to have that property at a high level.
- the candidate molecule specifying unit 111d may specify, as a molecule having a desired property, a molecule whose property quality value satisfies a predetermined condition among the molecules whose property predicted values have been calculated in the property prediction unit 111b.
- the candidate molecule specifying unit 111d may rank and rearrange a plurality of molecules whose property predicted values have been calculated in the property prediction unit 111b using the property quality value, and specify, from the rearranged molecules, molecules within a predetermined range as molecules having a desired property.
- the candidate molecule specifying unit 111d may specify a certain number of a plurality of molecules as molecules having a desired property so that molecules having better property quality values are given priority.
- the candidate molecule specifying unit 111d may specify, as a molecule having a desired property, a molecule whose property predicted value and an estimated value of the prediction uncertainty each satisfy a predetermined condition.
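Two of the selection strategies described above, per-value conditions and ranking by property quality value, can be sketched as follows. The thresholds, the linear quality value with coefficient `k`, and the example data are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Tuple

# sequence -> (property prediction value, estimated prediction uncertainty)
PredictionInfo = Dict[str, Tuple[float, float]]


def select_by_conditions(info: PredictionInfo, min_pred: float, max_unc: float) -> List[str]:
    """Keep molecules whose prediction and uncertainty each satisfy a condition."""
    return [seq for seq, (pred, unc) in info.items() if pred >= min_pred and unc <= max_unc]


def select_top_n(info: PredictionInfo, n: int, k: float = 1.0) -> List[str]:
    """Rank molecules by a linear quality value (pred - k * unc) and keep the best n."""
    ranked = sorted(info, key=lambda seq: info[seq][0] - k * info[seq][1], reverse=True)
    return ranked[:n]


info = {"SEQ_A": (0.90, 0.40), "SEQ_B": (0.75, 0.05), "SEQ_C": (0.50, 0.02)}

print(select_by_conditions(info, min_pred=0.7, max_unc=0.1))  # ['SEQ_B']
print(select_top_n(info, n=2, k=1.0))                         # ['SEQ_B', 'SEQ_A']
print(select_top_n(info, n=2, k=0.0))                         # ['SEQ_A', 'SEQ_B']
```

With `k = 0.0` the ranking ignores uncertainty and `SEQ_A` comes first; with `k = 1.0` the confidently predicted `SEQ_B` overtakes it, which is the effect the uncertainty-aware selection is meant to achieve.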
- the accuracy of the property prediction value calculated by the property prediction unit 111b affects the identification of the molecule.
- differences occur in the accuracy of property prediction based on sequence information.
- the accuracy of prediction may be rephrased as the reliability of prediction or the certainty of prediction.
- Low prediction accuracy can be rephrased as low prediction reliability, low prediction certainty, or high prediction uncertainty.
- High prediction accuracy can be rephrased as high prediction reliability, high prediction certainty, or low prediction uncertainty.
- not only the predicted property value but also the estimated value of the uncertainty of the prediction is used as an index in the inference process of a molecule having a desired property.
- when identifying a candidate molecule, the candidate molecule identifying unit uses a mathematical model that both predicts the property value and estimates the uncertainty of the prediction, rather than a mathematical model that only predicts the property value. This prevents the inference process from identifying as a candidate a molecule with a high predicted property value but low prediction certainty. That is, it prevents the identification of a molecule that, when actually generated, is highly likely not to have the expected properties. This makes it possible to further reduce the burden required for drug discovery.
- Second Embodiment: FIG. 2 is a block diagram showing an example of a drug discovery system 100 including a molecular design device 1 according to the second embodiment.
- the following description will focus on the differences from the first embodiment. Unless otherwise specified, the description of the first embodiment will be used for the points in common.
- the molecular design device 1 is an example of an inference device that infers molecules expected to have desired properties from prediction information, which will be described later.
- the molecular design device 1 has an inference unit 111 including, for example, a sequence information processing unit 111a, a property prediction unit 111b, a prediction information processing unit 111c, and a candidate molecule specifying unit 111d.
- the inference unit 111 executes a combinatorial optimization algorithm (sometimes simply referred to as "combinatorial optimization" in this application) to infer molecules that are expected to have desired properties.
- the sequence information processing unit 111a executes combinatorial optimization to extract sequence information of an arbitrary number of molecules from a preset search space. That is, the sequence information processing unit 111a repeats a process of extracting, by combinatorial optimization using updated extraction parameters, sequence information of some of the molecules expected to have desired properties from the candidate building block sequences in the search space.
- the search space can be set arbitrarily. For example, the search space may be set based on training data used for training a prediction model used for property prediction.
- the number of times of sequence information acquisition may be any natural number equal to or greater than 1.
- the number of times of sequence information acquisition may be set in advance in the inference unit 111.
- the inference unit 111 may count the number of times that the sequence information processing unit 111a actually extracts sequence information as the number of repetitions. In this embodiment, it means that the sequence information of the molecule included in the search space can be an element of a sequence information set.
- the sequence information processing unit 111a outputs the extracted sequence information of the multiple molecules to the property prediction unit 111b.
- the property prediction unit 111b uses a prediction model to generate prediction information indicating the properties of a molecule based on the sequence information of each molecule input from the sequence information processing unit 111a, and outputs the prediction information generated for each molecule to the prediction information processing unit 111c.
- the prediction model is generated by learning based on training data configured to include, for example, multiple sets of sequence information for each molecule and property evaluation results for the molecule.
- the prediction information for each molecule includes, for example, a property prediction value for each molecule.
- the property prediction unit 111b may include a predicted value calculation unit 111x that calculates the property prediction value from the sequence information for each molecule.
- the prediction information for each molecule may include, for example, a property prediction value for each molecule and an estimate of the uncertainty of the property prediction value.
- the property prediction unit 111b may be configured to include a prediction uncertainty estimation unit 111y that calculates an estimate of the uncertainty of the property prediction value.
- the prediction information for each molecule output from the property prediction unit 111b may include not only the property prediction value for each molecule, but also the estimate of the uncertainty of the property prediction value.
- the prediction information processing unit 111c acquires prediction information for each molecule from the property prediction unit 111b.
- when the prediction information includes a property prediction value and an estimated value of the uncertainty of the property prediction value, the prediction information processing unit 111c may calculate, for each molecule whose sequence information is included in the sequence information set, a property quality value based on the property prediction value and the estimated uncertainty of the property prediction value.
- the sequence information processing unit 111a uses the property quality value calculated by the predicted information processing unit 111c as predicted information to update the extraction parameters. Specific examples of the extraction parameters will be described later. If the counted number of repetitions is less than the preset number of times of acquiring sequence information, the sequence information processing unit 111a updates the extraction parameters based on the acquired prediction information.
- the sequence information processing unit 111a stops the processing of the combinatorial optimization algorithm, and the prediction information processing unit 111c outputs the prediction information for each molecule input from the property prediction unit 111b to the candidate molecule identification unit 111d.
- the candidate molecule identification unit 111d identifies at least one candidate molecule from the sequence information set in each round based on the prediction information for each molecule input from the prediction information processing unit 111c.
- the prediction information processing unit 111c identifies at least one candidate molecule from the sequence information of the molecules extracted N times.
- the candidate molecule identification unit 111d may also identify at least one candidate molecule from the sequence information set of the molecules extracted N times. In addition, when the prediction information includes a property quality value, the candidate molecule identification unit 111d can identify a candidate molecule based on the property quality value for each molecule. The candidate molecule identifying unit 111d outputs sequence information of the identified candidate molecule.
- Bayesian optimization can be applied, but is not limited to this.
- any algorithm may be used, for example TPE (Tree-Structured Parzen Estimator) or, as an evolutionary algorithm method, NSGA II (Elitist Non-dominated Sorting Genetic Algorithm).
- the molecular design device 1 may use a mathematical model for estimating characteristic values from sequence information as the objective function of combinatorial optimization, or may use a mathematical model for estimating characteristic quality values. The output of the mathematical model is then optimized by combinatorial optimization.
- when a mathematical model for estimating characteristic quality values is used, a molecule with low prediction certainty can be prevented from being identified as a candidate molecule even if the characteristics indicated by its characteristic prediction value are favorable. In other words, it is possible to reduce the possibility of identifying, as a candidate molecule, a molecule that is unlikely to have the expected characteristics when it is actually generated. This can further reduce the burden required for drug discovery.
- FIG. 3 is an explanatory diagram illustrating an example of a sequence information set in the first and second embodiments.
- the "sequence ID" in FIG. 3 is an identifier that identifies the sequence information of each molecule included in the sequence information set.
- FIG. 3 shows sequence information for each molecule included in the sequence information set.
- the sequence information of each molecule shows multiple building blocks and the arrangement of each building block.
- the arrangement of each building block is shown between information D101 and information D102.
- information D101 is information on the building block at position H1 of the sequence of each molecule.
- VHL0001, VHL0002, VHL0003, VHL0004, etc. all indicate a molecule in which the building block at position H1 is M.
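As a rough illustration of the structure described for FIG. 3, the following sketch stores each molecule as a sequence ID plus a mapping from position to building block. Only the IDs VHL0001 to VHL0004, the position H1, and the building block M come from the text above; the class name, the position H2, and its values are hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass
class SequenceInfo:
    sequence_id: str  # identifier of one molecule's sequence information
    blocks: dict      # position label -> building block at that position


# Hypothetical sequence information set; "H2" and its values are made up.
sequence_set = [
    SequenceInfo("VHL0001", {"H1": "M", "H2": "A"}),
    SequenceInfo("VHL0002", {"H1": "M", "H2": "G"}),
    SequenceInfo("VHL0003", {"H1": "M", "H2": "S"}),
    SequenceInfo("VHL0004", {"H1": "M", "H2": "T"}),
]


def blocks_at(seq_set, position):
    """Return the building block at one position for every sequence ID."""
    return {s.sequence_id: s.blocks[position] for s in seq_set}


h1_blocks = blocks_at(sequence_set, "H1")
```

Here `h1_blocks` maps every listed sequence ID to the building block M, mirroring the statement about position H1.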
- FIG. 4 is a diagram showing an example of the hardware configuration of the molecular design device 1 according to the first and second embodiments.
- the molecular design device 1 has a control unit 11 including a processor 91 such as a CPU and a memory 92 connected to each other by a bus, and has a computer system that executes a predetermined program.
- the computer system can also be considered to function as the molecular design device 1 including the control unit 11, input unit 12, communication unit 13, storage unit 14 and output unit 15 by executing the program.
- "executing a program" or "running a program" includes the meaning of executing a process instructed by each of one or more commands written in the program.
- the processor 91 reads out a program stored in the storage unit 14 and stores the read out program in the memory 92.
- the processor 91 executes the program stored in the memory 92, thereby functioning as the molecular design device 1 that includes a control unit 11, an input unit 12, a communication unit 13, a storage unit 14, and an output unit 15.
- the program is stored in advance in the storage unit 14, for example.
- the control unit 11 controls the operation of various functional units of the molecular design device 1.
- the control unit 11 has the function of an inference unit 111 that executes an inference process to infer molecules having desired properties, for example.
- the input unit 12 includes input devices such as a mouse, a keyboard, and a touch panel.
- the input unit 12 may be configured as an interface that connects these input devices to the molecular design device 1.
- the input unit 12 accepts input of various information to the molecular design device 1. For example, training data for a predictive model is input to the input unit 12.
- the communication unit 13 includes a communication interface for connecting the molecular design device 1 to an external device.
- the communication unit 13 communicates with the external device via wired or wireless communication.
- the external device is, for example, a device that transmits training data for a predictive model.
- the external device may also be a device to which candidate molecule sequence information is transmitted.
- the molecular design device 1 may be realized as a server device that is connected to the Internet and capable of communicating with one or more terminal devices.
- the storage unit 14 is configured using a non-transitory computer-readable recording medium such as a magnetic hard disk device or a semiconductor storage device.
- the storage unit 14 stores various information related to the molecular design device 1.
- the storage unit 14 stores information input via the input unit 12 or the communication unit 13, for example.
- the storage unit 14 stores various information generated by the processing executed by the control unit 11, for example.
- the storage unit 14 stores, for example, a predictive model, sequence information, various parameters used in combinatorial optimization (which may include the above-mentioned extracted parameters), and the like.
- the storage unit 14 stores, for example, the above-mentioned programs.
- the output unit 15 outputs various types of information.
- the output unit 15 includes a display device such as a CRT (Cathode Ray Tube) display, a liquid crystal display, or an organic EL (Electro-Luminescence) display.
- the output unit 15 may be configured as an interface that connects these display devices to the molecular design device 1.
- the output unit 15 outputs information input to the input unit 12 or the communication unit 13, for example.
- the output unit 15 may output various types of information generated by the processing executed by the control unit 11, for example.
- FIG. 5 is a diagram showing an example of the configuration of the control unit 11 according to each of the first and second embodiments.
- the control unit 11 includes an inference unit 111, an input control unit 112, a communication control unit 113, a memory control unit 114, and an output control unit 115.
- the inference unit 111 executes the above-mentioned inference process.
- the input control unit 112 controls the operation of the input unit 12.
- the communication control unit 113 controls the operation of the communication unit 13.
- the memory control unit 114 controls the operation of the memory unit 14.
- the output control unit 115 controls the operation of the output unit 15.
- Fig. 6 is a flowchart showing an example of the flow of the process executed by the molecular design device 1 according to the first embodiment.
- the sequence information processing unit 111a acquires a sequence information set including sequence information of a plurality of molecules (step S101).
- the property prediction unit 111b calculates a predicted property value for each molecule in the sequence information set, and calculates an estimate of the uncertainty of the property value (step S102).
- the candidate molecule identifying unit 111d identifies a candidate molecule based on the predicted property value and the estimated value of the uncertainty of the prediction calculated for each molecule (step S103).
- the method is not limited.
- the standard deviation of the predicted property value may be calculated as the estimated value of the uncertainty of the prediction.
- conformal prediction may be used in the quantification of the uncertainty.
- the characteristic predicting unit 111b may divide the learning data into a plurality of parts, calculate an error (prediction error) for the prediction for each part, and quantify the uncertainty based on the distribution of the errors.
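The division-based approach above might be sketched as follows. This is a minimal sketch under assumptions, not the method of the disclosure: the learning data are split into folds, a stand-in model (a one-variable least-squares line, chosen only to keep the example self-contained) is fitted on the remaining folds, and the absolute errors on each held-out fold are pooled. Taking an upper quantile of the pooled errors as the uncertainty is likewise an assumption; all function names and data are hypothetical.

```python
import statistics


def fit_line(pairs):
    """Least-squares line through (x, y) pairs; stand-in for the prediction model."""
    mx = statistics.fmean(x for x, _ in pairs)
    my = statistics.fmean(y for _, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    slope = sxy / sxx
    return slope, my - slope * mx


def fold_errors(xs, ys, n_folds=3):
    """Pool absolute prediction errors over held-out parts of the learning data."""
    data = list(zip(xs, ys))
    errors = []
    for k in range(n_folds):
        train = [p for i, p in enumerate(data) if i % n_folds != k]
        held = [p for i, p in enumerate(data) if i % n_folds == k]
        slope, intercept = fit_line(train)
        errors += [abs(y - (slope * x + intercept)) for x, y in held]
    return errors


# Hypothetical toy data: a line with an alternating offset of +/- 0.5.
xs = [float(x) for x in range(9)]
ys = [2.0 * x + 1.0 + (0.5 if int(x) % 2 else -0.5) for x in xs]

errors = fold_errors(xs, ys)
# Approximate 90% quantile of the error distribution, used as the uncertainty.
uncertainty = sorted(errors)[int(0.9 * (len(errors) - 1))]
```

A quantile of held-out errors is also the basic ingredient of split conformal prediction, although a full conformal procedure involves more than this sketch shows.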
- the candidate molecule specifying unit 111d outputs the sequence information of the specified candidate molecule as candidate molecule sequence information using the output unit 15 (step S104). After that, the control unit 11 ends the process of FIG.
- the prediction information processing unit 111c may calculate a characteristic quality value based on the characteristic prediction value obtained in step S102 and the estimated value of the prediction uncertainty prior to step S103.
- the prediction information processing unit 111c may calculate the mean variance (MV) as an example of the characteristic quality value.
- the risk tolerance parameter is a positive real value indicating the tolerance of the characteristic prediction value f'(x) to the uncertainty g(x), that is, the reliability of the predicted value f'(x).
- the prediction information processing unit 111c calculates the standard deviation as an example of the uncertainty estimate g(x).
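A minimal sketch of an MV-style characteristic quality value follows, assuming the penalized form in which the uncertainty g(x), here a standard deviation, is scaled by a coefficient and subtracted from the predicted value f'(x). The function name, the parameter name `risk_coef`, and the numbers are hypothetical.

```python
def mean_variance(pred_mean, pred_std, risk_coef):
    """Predicted value penalized by its uncertainty (assumed form: f'(x) - coef * g(x))."""
    return pred_mean - risk_coef * pred_std


# Two hypothetical candidates: A has a higher prediction but is less certain.
candidate_a = {"mean": 2.3, "std": 1.0}
candidate_b = {"mean": 1.5, "std": 0.3}

score_a = mean_variance(candidate_a["mean"], candidate_a["std"], risk_coef=2.0)
score_b = mean_variance(candidate_b["mean"], candidate_b["std"], risk_coef=2.0)
```

With a coefficient of 0 the ranking follows the predicted value alone, so candidate A ranks first; with the coefficient of 2.0 used here, the more certain candidate B ranks first.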
- MV is sometimes applied to portfolio optimization in financial engineering. In financial engineering, MV is sometimes used as an index that combines the expected reward as the average and the variance of the reward as the risk. MV is an index that can be used in various fields and purposes regardless of the optimization algorithm (see, for example, Q. Zhu and V. Y. F. Tan: Thompson Sampling Algorithms for Mean-Variance Bandits (2020 ICML), S. Takemori: Distributionally-Aware Kernelized Bandit Problems for Risk Aversion (2022 ICML)).
- the prediction information processing unit 111c may execute calculations using uncertainty g(x) as a constraint condition, which is an example of a characteristic quality value. For example, the prediction information processing unit 111c may set a threshold value ⁇ for a certain uncertainty g(x), and when the uncertainty g(x) is equal to or smaller than ⁇ , set the characteristic quality value to the characteristic predicted value f'(x), and when the uncertainty g(x) is greater than ⁇ , set the characteristic quality value to a sufficiently small value (for example, the minimum value that f'(x) can take).
- the prediction information processing unit 111c may calculate the characteristic quality value based on a plurality of physical property values. For example, the prediction information processing unit 111c sets a threshold value ⁇ 1 for the first physical property value g1(x) and a threshold value ⁇ 2 for the second physical property value g2(x). When the first physical property value g1(x) is smaller than ⁇ 1 and the second physical property value g2(x) is smaller than ⁇ 2, the prediction information processing unit 111c may set the characteristic quality value to the characteristic prediction value f'(x).
- the prediction information processing unit 111c may set the characteristic quality value to a sufficiently small value (for example, the minimum value that f'(x) can take).
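The threshold-based rule can be sketched as below. The function and argument names are hypothetical, and the strict comparison follows the multi-property description; the single-uncertainty variant described earlier uses "equal to or smaller than" instead.

```python
def thresholded_quality(pred_value, constraint_values, thresholds, floor_value):
    """Return f'(x) if every constrained value is below its threshold, else a floor.

    floor_value stands for the "sufficiently small value", for example the
    minimum value that f'(x) can take.
    """
    satisfied = all(g < t for g, t in zip(constraint_values, thresholds))
    return pred_value if satisfied else floor_value


# Hypothetical values: two constrained quantities g1(x), g2(x) and their thresholds.
accepted = thresholded_quality(1.2, [0.2, 0.1], [0.5, 0.5], floor_value=-10.0)
rejected = thresholded_quality(1.2, [0.2, 0.9], [0.5, 0.5], floor_value=-10.0)
```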
- the prediction information processing unit 111c may calculate the characteristic quality value based on the predicted value Pr(x ⁇ F
- the characteristic quality value approaches 0, and the higher the probability of satisfying the constraint (the closer to 1), the closer the characteristic quality value approaches the value of the objective index itself.
- such a characteristic quality value is expressed by Equation (2).
- the number of constraints may be one or more.
- the prediction information processing unit 111c may set constraint conditions for each physical property value and multiply the characteristic prediction value f'(x) by the prediction model by the probability that all conditions are satisfied.
- the probability that all conditions are satisfied may be the product of the probabilities that the constraint conditions for each physical property value are satisfied.
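One possible reading of the probability-weighted quality value is sketched below. It assumes, purely for illustration, that each physical property has a Gaussian predictive distribution and that each constraint has the form "property below threshold"; neither assumption is stated in the text, and all names are hypothetical.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def prob_below(mean, std, threshold):
    """Pr(property < threshold) under an assumed Gaussian prediction."""
    return normal_cdf((threshold - mean) / std)


def probability_weighted_quality(pred_value, properties):
    """Multiply f'(x) by the product of per-constraint satisfaction probabilities.

    properties: iterable of (predicted mean, predicted std, threshold).
    """
    p_all = 1.0
    for mean, std, threshold in properties:
        p_all *= prob_below(mean, std, threshold)
    return pred_value * p_all


# A constraint whose threshold equals the predicted mean is met with probability 0.5.
one_constraint = probability_weighted_quality(2.0, [(0.0, 1.0, 0.0)])
two_constraints = probability_weighted_quality(2.0, [(0.0, 1.0, 0.0), (3.0, 2.0, 3.0)])
```

Taking the product treats the constraints as independent, which matches the statement that the joint probability may be the product of the individual probabilities.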
- FIG. 7 is a flowchart showing an example of the flow of processing executed by the molecular design device 1 in the second embodiment.
- the sequence information processing unit 111a acquires a data set D including a plurality of data pairs, each of which is a pair of an input value x and a measurement value y, as training data (step S201).
- the storage unit 14 stores the data set D input from the input unit 12 or the communication unit 13 in advance.
- the sequence information processing unit 111a reads out the data set D from the storage unit 14.
- the sequence information processing unit 111a uses the acquired training data to, for example, execute Gaussian process regression to learn a prediction model for predicting a measurement value for an input value (step S202).
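To make step S202 concrete, here is a minimal Gaussian process regression sketch in plain Python. It assumes a zero-mean prior, an RBF kernel with unit variance, and a small noise term; these choices, the function names, and the toy data are assumptions for illustration and are not taken from the disclosure. The point is that one model yields both a predicted value and a standard deviation usable as the uncertainty estimate.

```python
import math


def rbf(a, b, length=1.0):
    """RBF kernel between two input vectors."""
    d2 = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-0.5 * d2 / length ** 2)


def cholesky(A):
    """Lower-triangular Cholesky factor of a symmetric positive definite matrix."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(A[i][i] - s) if i == j else (A[i][j] - s) / L[j][j]
    return L


def solve_lower(L, b):
    """Solve L y = b by forward substitution."""
    y = []
    for i in range(len(b)):
        y.append((b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
    return y


def solve_lower_transpose(L, y):
    """Solve L^T x = y by back substitution."""
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def gp_fit(X, y, noise=1e-6):
    """Precompute the quantities needed for prediction from data set D = (X, y)."""
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(X)]
         for i, a in enumerate(X)]
    L = cholesky(K)
    alpha = solve_lower_transpose(L, solve_lower(L, y))
    return X, L, alpha


def gp_predict(model, x):
    """Return (predictive mean, predictive standard deviation) at input x."""
    X, L, alpha = model
    k = [rbf(x, xi) for xi in X]
    mean = sum(ki * ai for ki, ai in zip(k, alpha))
    v = solve_lower(L, k)
    var = max(rbf(x, x) - sum(vi * vi for vi in v), 0.0)
    return mean, math.sqrt(var)


# Hypothetical toy data set D with one-dimensional inputs.
X_train = [[0.0], [1.0], [2.0]]
y_train = [0.5, 1.5, 1.0]
model = gp_fit(X_train, y_train)
mean_near, std_near = gp_predict(model, [1.0])   # at a training input
mean_far, std_far = gp_predict(model, [50.0])    # far from all training inputs
```

Near the training data the standard deviation is small, while far from it the standard deviation returns to the prior value of 1, which is the behaviour that an uncertainty-aware objective relies on.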
- the inference unit 111 executes combinatorial optimization to identify candidate molecules (step S203).
- the candidate molecule identification unit 111d outputs sequence information of the identified candidate molecules using the output unit 15 (step S204). Then, the process of FIG. 7 ends.
- Offline MBO is a method for searching for optimal molecules within a "proxy" predictive model generated from acquired data.
- the method involves black-box optimization, also known as inverse analysis, which treats the representative model as a black box.
- MBO is a technique in which a predictive model is trained using previously accumulated experimental data as training data, drug candidate molecules are evaluated based on characteristic values obtained using the trained predictive model, and the predictive model is updated based on the evaluation results.
- the predictive model may be referred to as a representative model (proxy model).
- discrete input values can be the optimization target of MBO.
- in MBO, a measurement value y for a discrete input value x ∈ X is given by an unknown function f(x).
- a data set D is given that has n sets of measurement values y for discrete input values x (n is an integer equal to or greater than 2), and it is assumed that measurement values y for additional input values x cannot be obtained.
- the measurement value y for the discrete input value x provides the correct answer (also called an oracle) for the function f(x).
- a general MBO aims to find an input value x that maximizes the function f(x) in the discrete value space X.
- the above method can be considered as a problem of finding an input value x that maximizes the characteristic quality value f(x)- ⁇ g(x) instead of the function f(x).
- a sequence is applied as the input value x, and the function f(x) corresponds to a function for predicting a real value corresponding to the characteristic value.
- the penalty function g(x) is a function that gives a real value corresponding to the uncertainty of the value of the function f(x).
- ⁇ corresponds to the coefficient ⁇ above.
- the input value x is expressed using a vector representation.
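One simple vector representation of a building block combination is one-hot encoding per position, sketched below. This is an assumed encoding chosen for illustration; the text does not specify which representation is used here, and the names and example blocks are hypothetical.

```python
def one_hot_encode(blocks, choices_per_position):
    """Concatenate one one-hot vector per position of the building block combination."""
    vec = []
    for block, choices in zip(blocks, choices_per_position):
        vec.extend(1.0 if block == c else 0.0 for c in choices)
    return vec


# Hypothetical two-position search space with 2 and 3 candidate building blocks.
choices = [["A", "M"], ["G", "S", "T"]]
x_vec = one_hot_encode(["M", "S"], choices)
```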
- the sequence information processing unit 111a trains a prediction model in advance using a data set D having multiple pairs of measured values y for discrete input values x as training data.
- the sequence information processing unit 111a sets the function value of the objective function for the discrete input value x as a predicted value y', and determines the parameters of the prediction model so as to minimize an index value representing the magnitude of the difference between the predicted value y' and the measured value y for the discrete input value x.
- the inference unit 111 samples the discrete input value x, and updates the parameters of the combinatorial optimization based on the prediction results obtained using the prediction model for each sample.
- the inference unit 111 can use, for example, the above-mentioned TPE in combinatorial optimization.
- TPE is a black-box optimization algorithm that combines Bayesian optimization with Parzen window density estimation.
- TPE can handle categorical parameters, making it possible to apply it to building block-based molecular design.
- the inference unit 111 implements the function of TPE and is applied in molecular design as follows.
- TPE is a method aiming at maximizing the expected improvement (EI) of an objective function.
- the sequence information processing unit 111a applies a building block combination that constitutes a drug candidate molecule to a search space, and assigns the output from the prediction model to an objective score set Y corresponding to an input molecule set X sampled from the search space.
- the sequence information processing unit 111a sets the characteristic prediction value obtained by the prediction model as the objective score of TPE.
- the sequence information processing unit 111a samples the input molecule set X from the search space, that is, assigns building block candidates for each modification position in the building block combination.
- when the drug candidate molecule is a protein, a candidate amino acid is assigned for each modification position in the protein sequence.
- an Optuna implementation can be used to execute the TPE sampler.
- the characteristic prediction unit 111b calculates (evaluates) an estimated value of the objective function using the prediction model for the input values determined by sampling.
- the prediction information processing unit 111c outputs the selected input value and the calculated estimated value to the sequence information processing unit 111a, and the sequence information processing unit 111a updates the parameters of the optimization algorithm using the input value and the estimated value input from the prediction information processing unit 111c (update).
- the sequence information processing unit 111a assigns the selected input value and the calculated estimated value to an input molecule set X sampled from the search space and a set Y of objective scores, respectively.
- the inference unit 111 repeats the processes of data division, sampling, evaluation, and update, so that the evaluation results of the input values (sample values) obtained by sampling are reflected in the prediction model, and a combination of building blocks with a larger estimated value of the objective function is probabilistically searched for.
- FIG. 8 shows an example of the flow of a combinatorial optimization process executed by the molecular design device 1 according to this application example.
- the inference unit 111 sets the initial value of the number of times sequence information is acquired to 0.
- Loop R20 includes the processes of steps S203a to S203c.
- the control unit 11 sets the execution condition of loop R20 as the case where the number of repetitions is equal to or less than a predetermined number of samplings N. The number of samplings corresponds to the number of times sequence information is acquired.
- the sequence information processing unit 111a obtains sequence information of a plurality of molecules according to a combinatorial optimization algorithm (step S203a). For example, when TPE is used as the combinatorial optimization algorithm, the sequence information processing unit 111a divides a set of input values into two sets based on an estimated value (corresponding to the above-mentioned predicted characteristic value) obtained using a prediction model for each input value and a predetermined threshold value ⁇ . One set (called the "first set”) is composed of input values whose estimated value is equal to or greater than the threshold value ⁇ . The other set (called the "second set”) is composed of input values whose estimated value is less than the threshold value ⁇ . The sequence information processing unit 111a samples input values that maximize the expected improvement of the objective function.
- the expected improvement corresponds to the increase in the expected value of the objective function before and after the update, and is known to be proportional to p(x|y1)/p(x|y2).
- p(x|y1) indicates the density distribution of the input value x for the first set.
- p(x|y2) represents the density distribution of the input value x for the second set. That is, EI is an example of an extraction parameter.
- the sequence information processing unit 111a samples the input value that maximizes the calculated EI.
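The division and sampling steps described above can be sketched in plain Python for categorical building blocks. This is a simplified illustration rather than the TPE implementation of any particular library: the densities are smoothed per-position frequency counts, positions are treated independently, and candidates are drawn from the first-set density and ranked by the ratio of the first-set density to the second-set density. All names and data are hypothetical.

```python
import random
from collections import Counter


def split_by_threshold(history, threshold):
    """Divide evaluated inputs into the first set (estimate >= threshold) and the second set."""
    first = [x for x, y in history if y >= threshold]
    second = [x for x, y in history if y < threshold]
    return first, second


def categorical_density(samples, position, choices, prior=1.0):
    """Smoothed frequency of each building block at one position."""
    counts = Counter(x[position] for x in samples)
    total = len(samples) + prior * len(choices)
    return {c: (counts[c] + prior) / total for c in choices}


def density_ratio(x, first, second, search_space):
    """Ratio of first-set density to second-set density, multiplied over positions."""
    ratio = 1.0
    for pos, choices in enumerate(search_space):
        p1 = categorical_density(first, pos, choices)
        p2 = categorical_density(second, pos, choices)
        ratio *= p1[x[pos]] / p2[x[pos]]
    return ratio


def propose(history, threshold, search_space, n_candidates=24, seed=0):
    """Draw candidates from the first-set density and return the one with the largest ratio."""
    rng = random.Random(seed)
    first, second = split_by_threshold(history, threshold)
    candidates = []
    for _ in range(n_candidates):
        cand = []
        for pos, choices in enumerate(search_space):
            p1 = categorical_density(first, pos, choices)
            cand.append(rng.choices(choices, weights=[p1[c] for c in choices])[0])
        candidates.append(tuple(cand))
    return max(candidates, key=lambda x: density_ratio(x, first, second, search_space))


# Hypothetical history: block "A" at position 0 is associated with high estimates.
search_space = [["A", "M"], ["G", "S"]]
history = [(("A", "G"), 2.0), (("A", "S"), 1.8), (("M", "G"), 0.4), (("M", "S"), 0.2)]

ratio_good = density_ratio(("A", "G"), *split_by_threshold(history, 1.0), search_space)
next_input = propose(history, 1.0, search_space)
```

In this toy history the ratio for a combination starting with "A" is 3.0, so such combinations are preferred in the next round.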
- the characteristic prediction unit 111b performs sequence evaluation using a prediction model. Here, the characteristic prediction unit 111b uses the prediction model to calculate an estimate of an objective function for the sampled input values (step S203b).
- the sequence information processing unit 111a assigns the selected input value and the calculated estimated value to an input molecule set X sampled from a search space and a set of objective scores Y.
- a new input value is added to the input molecule set X sampled from the search space in association with the objective function, thereby updating extraction parameters of the combinatorial optimization algorithm related to extraction of a sequence information set (step S203c).
- the inference unit 111 updates (increments) the number of repetitions by adding 1. When the number of repetitions is equal to or less than the sampling count N, the inference unit 111 repeats the process of steps S203a to S203c.
- the inference unit 111 ends the process of loop R20 when the number of repetitions exceeds the number of samplings N.
- the candidate molecule specification unit 111d may output a predetermined number of building block combinations in descending order from the one having the highest estimated value of the objective function as candidate molecule sequence information using the output unit 15. Then, the process of FIG. 8 ends.
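The overall loop and the final selection can be sketched as follows. The sketch runs the sample, evaluate, and update steps a fixed number of times, which is one reading of the loop condition, and then returns the building block combinations with the highest estimates. The random sampler is only a placeholder for the combinatorial optimization algorithm, and every name and value here is hypothetical.

```python
import random


def run_search(sample_fn, evaluate_fn, n_samplings):
    """Repeat sampling (S203a), evaluation (S203b), and update (S203c)."""
    history = []
    for _ in range(n_samplings):
        x = sample_fn(history)   # choose the next input from what is known so far
        y = evaluate_fn(x)       # estimate of the objective function for x
        history.append((x, y))   # the stored pair is what the next sampling step sees
    return history


def top_candidates(history, k):
    """Return up to k distinct inputs in descending order of their estimate."""
    best = {}
    for x, y in history:
        best[x] = max(y, best.get(x, float("-inf")))
    return sorted(best, key=best.get, reverse=True)[:k]


# Placeholder sampler and scorer over a made-up two-position search space.
rng = random.Random(0)
space = [["A", "M"], ["G", "S"]]


def sample_fn(history):
    return tuple(rng.choice(choices) for choices in space)


def evaluate_fn(x):
    return float(x.count("M") + x.count("S"))


history = run_search(sample_fn, evaluate_fn, n_samplings=20)
candidates = top_candidates(history, k=2)
```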
- in step S203d, the inference unit 111 sets the sampling count N and the initial value 0 of the repetition count.
- in step S203e, the inference unit 111 adds 1 to the repetition count at that time, and determines whether the repetition count has reached N. If it is determined that it has reached N (step S203e YES), the process of FIG.
- if it is determined that it has not reached N (step S203e NO), the process proceeds to step S203c. After the process of step S203c, the process proceeds to step S203a. Therefore, the process in FIG. 9 differs from the process in FIG. 8 in that the inference unit 111 does not update the extraction parameters when the number of repetitions exceeds the number of samplings N.
- the termination condition is that the number of repetitions exceeds the number of times sequence information is obtained, but the condition is not limited to this.
- the termination condition may be, for example, that the estimated value of the objective function reaches a target value set in advance.
- MV-TPE refers to TPE that uses MV as the objective function.
- Mean-TPE refers to TPE that uses the predicted mean of Gaussian process regression, that is, the characteristic value itself, as the objective function.
- a prediction model was trained using a Gaussian process (GP) based on the GFP sequence with a mutation number of 2 or less from the parent sequence avGFP (Aequorea victoria GFP) as training data.
- the prediction model obtained by this training may be called a proxy model.
- the parent sequence avGFP may also be simply called the parent sequence or template GFP. This procedure verifies sequences with fewer residue substitutions from the parent sequence, thus realizing a practical molecular optimization process. In this verification, a GFP sequence with a residue substitution (edit distance) of 2 or less was adopted, as illustrated in FIG. 10.
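Selecting training sequences within two residue substitutions of the parent sequence can be sketched as below. Because only substitutions are considered, the edit distance is computed as a position-wise mismatch count between equal-length sequences. The short strings are toy examples and not the real avGFP sequence; the function names are hypothetical.

```python
def substitution_count(seq, parent):
    """Number of positions at which seq differs from the parent sequence."""
    if len(seq) != len(parent):
        raise ValueError("substitution-only distance needs equal-length sequences")
    return sum(a != b for a, b in zip(seq, parent))


def within_distance(sequences, parent, max_subs=2):
    """Keep sequences with at most max_subs residue substitutions from the parent."""
    return [s for s in sequences if substitution_count(s, parent) <= max_subs]


# Toy parent and variants (hypothetical, for illustration only).
parent = "MSKGEE"
variants = ["MSKGEE", "MSKGAE", "ASKGAA", "AAAGEE"]
training_subset = within_distance(variants, parent)
```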
- as a pseudo-ground-truth model, a Light Gradient Boosting Machine (LightGBM) was trained using all data in the GFP dataset. LightGBM is based on decision tree algorithms and may be used for ranking, classification, and other tasks. With this setting, the representative model covers the GFP sequences around the parent sequence, and the pseudo-ground-truth model covers a wider GFP space as a search space.
- the search space was defined as the mutations in the top 100 sequences of the training data.
- the search space included 2 to 5 amino acid candidates for each of the 37 candidate mutation sites, as shown in Figure 17.
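One way to derive such a search space from the top-ranked training sequences is sketched below: every position at which a top sequence differs from the parent becomes a candidate mutation site, and the amino acids observed there become its candidates. Including the parent residue among the candidates is an assumption made here, and the sequences are toy strings rather than real GFP data.

```python
def build_search_space(top_sequences, parent):
    """Map each mutated position to the sorted amino acid candidates seen there."""
    space = {}
    for seq in top_sequences:
        for pos, (res, parent_res) in enumerate(zip(seq, parent)):
            if res != parent_res:
                # Start from the parent residue so that "no mutation" stays available.
                space.setdefault(pos, {parent_res}).add(res)
    return {pos: sorted(residues) for pos, residues in sorted(space.items())}


# Toy parent and top-ranked sequences (hypothetical).
parent_seq = "MSKGEE"
top_seqs = ["MSKGAE", "MSRGAE", "MSKGDE"]
mutation_space = build_search_space(top_seqs, parent_seq)
```

In this toy case the result has two candidate mutation sites, position 2 with candidates K and R and position 4 with candidates A, D, and E.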
- as parameters for TPE, as shown in Figure 18, the number of samplings was set to 3000 and multivariate was set to "none" (i.e., the objective function has one variable).
- the top 10 sequences of the training data were used for warm start initialization for both MV-TPE and Mean-TPE.
- the processes of Figs. 7 and 8 were performed using MV and the average as the objective function.
- the average was used as a comparative example.
- MV and the average each include a characteristic value, and the brightness of GFP was used as the characteristic value.
- Fig. 11 shows the average as the estimated value of the objective function in the upper row as the optimization trajectory by TPE, and MV in the lower row. Both show the results of 10 optimization processes.
- the dashed lines show the estimated values for each sample in each round.
- Figure 12 shows the relationship between the mean (GP Mean) and standard deviation (GP Std) of the sample values in each round.
- Figure 14 shows the distribution of scores for proposed sequences obtained using the pseudo-ground-truth model.
- the scores indicate estimates of the brightness of the fluorescence emitted by GFP. It is shown that using MV as the objective function results in higher brightness than using the mean.
- Fig. 15 shows the edit distance of the proposed sequence based on the template GFP. Using MV as the objective function results in a smaller edit distance than using the average. This shows that using MV allows sequences with fewer mutations to be sampled, i.e., safe optimization can be achieved.
- Fig. 16 shows the standard deviation (GP Std) of the proposed sequence. Using MV as the objective function results in a smaller standard deviation than using the average. This supports the idea that using MV allows for sampling of sequences with less uncertainty in the brightness used as a characteristic value, thereby realizing safe optimization.
- <Second verification example> Next, a second verification example will be described.
- data on bispecific antibodies was used as training data, and a Gaussian process was executed to learn a prediction model.
- the training data used in this verification example is data showing the sequence and characteristic value of binding ability of a bispecific antibody whose antigens are MarvelD3 and CD3. Octet values were used as the characteristic value of binding ability.
- a vector representation obtained using TAPE was used as an input value for the prediction model. This vector representation represents the protein sequence of the bispecific antibody sample.
- MarvelD3 is a tight junction protein with a four-transmembrane structure.
- MarvelD3 was set as a candidate target for anticancer drugs.
- the development of bispecific antibodies that cross-link cancer antigens with antigens on T cells is expected to be applied to cancer treatment.
- the trained prediction model was used to determine candidate anti-MarvelD3 sequences with superior properties from the lead antibodies.
- the octet values of the antibody sequences were measured multiple times, and the antibody sequences and the octet values of the antibody sequences obtained by the measurement were used as inputs to the prediction model.
- in batch measurement, typically 100 or fewer octet values of the antibody sequences are obtained in one measurement.
- the antibody for the measurement was obtained by the following procedure. First, a plasmid encoding a pre-designed heavy or light chain was prepared, and the recombinant antibody was transiently expressed using Expi293F cells. The antibody was captured from the culture supernatant using protein A and eluted into a buffer solution. The eluted buffer solution was mixed under reducing conditions to prepare a MarvelD3/CD3 bispecific antibody.
- Octet values were measured using the Octet HTX system. Extracellular vesicles bearing CD81 and human MarvelD3 proteins on their surfaces were captured on a sensor chip using anti-CD81 antibodies. After a baseline step of 600 seconds in D-PBS(-) solution containing 0.1% BSA, the association and dissociation responses were measured for 900 and 1500 seconds, respectively, in the same buffer containing 20 nM of antibody. The binding ability of the antibody is expressed as a shift in wavelength between the baseline step and the end of the association phase. Measurements were performed at a temperature of 30°C and at a vibration speed of 1000 times per minute during the baseline step, association, and dissociation phases.
- MV-TPE and Mean-TPE were performed as in the first verification example.
- the top 48 sequences in terms of the estimated value of the objective function were evaluated as proposed sequences.
- Figure 19 shows the distribution of the mean and standard deviation of the estimates obtained by sampling at each time for both MV-TPE and Mean-TPE.
- the standard deviation indicates uncertainty, and the higher the mean of the estimates, the better the octet value.
- when the mean is used as the objective function, the most frequent value of the standard deviation is 1.0, whereas when MV is used as the objective function, it is 0.3.
- when the mean is used as the objective function, the most frequent value of the mean is 2.3, whereas when MV is used, it is 1.5.
- Figure 20 shows a t-SNE (t-distributed Stochastic Neighbor Embedding) visualization of sequences obtained by sampling.
- t-SNE is a method for compressing high-dimensional data into low dimensions and visualizing it.
- Figure 20 shows the distribution of vector representations by TAPE representing individual sequences on a two-dimensional plane for each of the training data (Train), mean, and MV.
- Figure 21 shows the average value (GP Mean) of the estimated octet values of the 48 sequences to be evaluated when the objective function of TPE is the mean (Mean) and when it is MV. According to Figure 21, the average value of the octet values obtained using MV is lower than the average value of the octet values obtained using the mean as the objective function.
- Figure 22 shows the standard deviation (GP Std) of the estimated values for both the mean (Mean) and MV. According to Figure 22, the standard deviation of the octet values obtained using MV is lower than the standard deviation of the octet values obtained using the mean as the objective function. From this result, it can be inferred that the sequence sampled by MV has low uncertainty and is therefore less likely to lose stable expression and binding ability.
- Figure 23 shows the distribution of expression levels of the 48 sequences to be evaluated for both mean and MV.
- When the mean is used as the objective function, the expression levels of the top 48 sequences remain extremely low, and sufficient samples for measuring the octet value were not obtained.
- When MV is used as the objective function, almost all of the top 48 sequences are significantly expressed.
- FIG. 24 shows the distribution of octet values for each sequence for both MV and training data. The distribution trends of octet values are similar between MV and training data. This shows that using MV as the objective function makes it possible to obtain sequences with binding ability similar to that of the training data and, therefore, the parent sequence. Furthermore, MV yielded several sequences with octet values higher than the maximum value in the training set.
- the molecular design device 1 of the embodiment configured in this way performs Bayesian optimization with a model that obtains characteristic quality values based on drug candidate molecular sequence information as the objective function. This makes it possible to further reduce the burden required for drug discovery.
- the molecular design device 1 may be implemented using a plurality of information processing devices communicatively connected via a network. In this case, each functional unit of the molecular design device 1 may be distributed and implemented in a plurality of information processing devices.
- the drug discovery system 100 may also include a plant (not shown) that generates an optimal molecule estimated by the molecular design device 1.
- the molecular design device 1 outputs output information indicating the biological sequence of the optimal molecule estimated by the above method to the plant.
- the plant executes a process of generating a molecular compound having a biological sequence indicated by the output information input from the molecular design device 1.
- All or part of the functions of the molecular design device 1 may be realized using hardware such as an ASIC (Application Specific Integrated Circuit), a PLD (Programmable Logic Device), or an FPGA (Field Programmable Gate Array).
- the program may be recorded on a computer-readable recording medium. Examples of computer-readable recording media include portable media such as flexible disks, magneto-optical disks, ROMs, and CD-ROMs, and storage devices such as hard disks built into computer systems.
- the program may be transmitted via a telecommunications line.
- the molecular design device 1 is an example of an estimation device.
- the above functions may be realized by other types of devices, such as devices whose main function is not molecular design, or information devices equipped with general-purpose computer systems.
- the present embodiment may be realized as a system including a processor and a memory, in which the memory is configured to store one or more instructions, and the instructions may be instructions for causing the processor to execute the steps of: calculating a predicted property value of a molecule that is an element of a sequence information set, which is a set of sequence information of a plurality of different molecules, using a prediction model for predicting the property of the molecule from sequence information of the molecule, and estimating the uncertainty of the prediction; and searching for a candidate molecule having a more desired property based on the predicted property value and the uncertainty of the prediction.
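The two steps above (predict a property value with an uncertainty estimate, then search using both) can be sketched as follows. This is a minimal illustration under stated assumptions: a small ensemble of toy additive models stands in for the prediction model, the spread of the ensemble outputs stands in for the uncertainty, and the score `mean - lam * std` is one possible way to combine them. The figures refer to GP estimates, so the ensemble is a swapped-in simplification, and all names and weights are hypothetical.

```python
import statistics
from typing import Callable, Dict, List, Sequence, Tuple

Predictor = Callable[[str], float]


def make_toy_model(weights: Dict[str, float]) -> Predictor:
    """Hypothetical additive model: sums per-residue weights over the sequence."""
    def model(seq: str) -> float:
        return sum(weights.get(ch, 0.0) for ch in seq)
    return model


def predict_with_uncertainty(seq: str, ensemble: Sequence[Predictor]) -> Tuple[float, float]:
    """Return (predicted value, uncertainty) as the mean and spread of the ensemble."""
    outputs = [model(seq) for model in ensemble]
    return statistics.mean(outputs), statistics.pstdev(outputs)


def rank_candidates(sequences: Sequence[str], ensemble: Sequence[Predictor],
                    lam: float = 1.0) -> List[Tuple[float, str]]:
    """Search step: rank sequences by predicted value minus lam * uncertainty."""
    ranked = []
    for seq in sequences:
        mean, std = predict_with_uncertainty(seq, ensemble)
        ranked.append((mean - lam * std, seq))
    return sorted(ranked, reverse=True)


# The toy models agree on 'A' but disagree on 'K', so 'K'-rich sequences are uncertain.
ensemble = [
    make_toy_model({"A": 1.0, "K": 0.5}),
    make_toy_model({"A": 1.0, "K": 1.5}),
    make_toy_model({"A": 1.0, "K": 2.5}),
]

ranked = rank_candidates(["AAA", "KKK", "AKA"], ensemble)
```

Here "KKK" has the highest predicted mean but also the largest disagreement between models, so it ranks last once the uncertainty penalty is applied.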
- the instructions may be instructions to cause a processor to execute an acquisition step of acquiring a sequence information set, which is a collection of sequence information of a plurality of different molecules, in accordance with a combinatorial optimization algorithm; a prediction step of calculating a predicted property value of a molecule that is an element of the sequence information set, using a prediction model for predicting the properties of the molecule from its sequence information; an update step of updating extraction parameters of the combinatorial optimization algorithm, based on the predicted property value for each molecule, so that molecules having more desired properties are included; and a search step of searching for molecules having desired properties based on the predicted property values.
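The acquisition, prediction, update, and search steps can be sketched as a loop. This is a minimal sketch only: it uses a simple per-position probability sampler (in the style of the cross-entropy method) rather than the TPE used in the experiments, and the alphabet, sequence length, stand-in `predict` function, and all parameter values are assumptions chosen for illustration.

```python
import random
from typing import Dict, List, Tuple

ALPHABET = "ACDE"  # toy building-block alphabet (assumption)
LENGTH = 6         # toy sequence length (assumption)


def predict(seq: str) -> float:
    """Stand-in prediction model: rewards 'A' residues (purely illustrative)."""
    return float(seq.count("A"))


def acquire(probs: List[Dict[str, float]], n: int, rng: random.Random) -> List[str]:
    """Acquisition step: sample n sequences from per-position probabilities."""
    letters = list(ALPHABET)
    return [
        "".join(rng.choices(letters, weights=[p[ch] for ch in letters])[0] for p in probs)
        for _ in range(n)
    ]


def update(probs: List[Dict[str, float]], elite: List[str],
           smoothing: float = 0.5) -> List[Dict[str, float]]:
    """Update step: shift the extraction parameters toward the elite sequences."""
    new_probs = []
    for pos, p in enumerate(probs):
        freq = {ch: sum(seq[pos] == ch for seq in elite) / len(elite) for ch in ALPHABET}
        new_probs.append({ch: (1 - smoothing) * p[ch] + smoothing * freq[ch] for ch in ALPHABET})
    return new_probs


def optimise(rounds: int = 10, n: int = 50, n_elite: int = 10, seed: int = 0) -> Tuple[str, float]:
    rng = random.Random(seed)
    probs = [{ch: 1.0 / len(ALPHABET) for ch in ALPHABET} for _ in range(LENGTH)]
    best_seq, best_score = "", float("-inf")
    for _ in range(rounds):
        batch = acquire(probs, n, rng)                                   # acquisition step
        scored = sorted(((predict(s), s) for s in batch), reverse=True)  # prediction step
        probs = update(probs, [s for _, s in scored[:n_elite]])          # update step
        if scored[0][0] > best_score:                                    # search step
            best_score, best_seq = scored[0]
    return best_seq, best_score


best_seq, best_score = optimise()
```

Because each update mixes the old probabilities with the elite frequencies, no building block ever reaches probability zero, so the sampler keeps some exploration while concentrating on sequences with higher predicted values.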
- This embodiment may be realized as a non-transitory computer-readable medium storing one or more instructions.
- the instructions may be instructions for causing a computer to execute the following steps: for a molecule that is an element of a sequence information set, which is a set of sequence information of a plurality of different molecules, using a prediction model for predicting the properties of the molecule from the sequence information of the molecule, to calculate a property prediction value of the molecule and estimate the uncertainty of the prediction; and, based on the property prediction value and the uncertainty of the prediction, to search for a candidate molecule having a more desired property.
- the instructions may also be instructions for causing a computer to execute the following steps: an acquisition step of acquiring a sequence information set, which is a set of sequence information of a plurality of different molecules, according to a combinatorial optimization algorithm; a prediction step of calculating a property prediction value of a molecule that is an element of the sequence information set, using a prediction model for predicting the properties of the molecule from the sequence information of the molecule; an update step of updating extraction parameters of the combinatorial optimization algorithm so that molecules having more desired properties are included based on the property prediction value for each molecule; and a search step of searching for a molecule having a desired property based on the property prediction value.
- The computer-implemented method operates on a candidate set whose elements are drug candidate molecule sequence information, that is, information on the sequence of units that make up each drug candidate molecule. It includes a control step of estimating, by combinatorial optimization, an optimal molecule, which is the drug candidate molecule that gives the optimal value of a desired characteristic among the drug candidate molecules indicated by the elements of the candidate set, and a step of outputting information on the optimal molecule.
- The objective function of the combinatorial optimization is a model that runs a mathematical model to estimate a characteristic value of the desired characteristic from the drug candidate molecule sequence information, obtains the uncertainty of that estimate, and returns a characteristic quality value calculated from the characteristic value and that uncertainty.
- 1... molecular design device, 11... control unit, 12... input unit, 13... communication unit, 14... memory unit, 15... output unit, 100... drug discovery system, 111... inference unit, 112... input control unit, 113... communication control unit, 114... memory control unit, 115... output control unit, 91... processor, 92... memory.
Priority Applications (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
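| JP2024566017A JP7646953B1 (ja) | 2023-08-10 | 2024-07-31 | Information processing system, information processing method, program, and method for producing a molecular compound |
| CN202480050730.4A CN121605481A (zh) | 2023-08-10 | 2024-07-31 | Information processing system, information processing method, program, and method for producing a molecular compound |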
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| JP2023-131649 | 2023-08-10 | | |
| JP2023131649 | 2023-08-10 | | |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2025033296A1 (ja) | 2025-02-13 |
Family
ID=94534678
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/JP2024/027412 Pending WO2025033296A1 (ja) | Information processing system, information processing method, program, and method for producing a molecular compound | 2023-08-10 | 2024-07-31 |
Country Status (3)
| Country | Link |
|---|---|
| JP (1) | JP7646953B1 |
| CN (1) | CN121605481A |
| WO (1) | WO2025033296A1 |
Cited By (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN120636581A (zh) * | 2025-05-29 | 2025-09-12 | Tsinghua University | Model training method, apparatus, medium, and electronic device |
Citations (15)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US4816567A (en) | 1983-04-08 | 1989-03-28 | Genentech, Inc. | Recombinant immunoglobin preparations |
| WO1995001937A1 (fr) | 1993-07-09 | 1995-01-19 | Association Gradient | Process for treating combustion residues and installation for implementing said process |
| US5648237A (en) | 1991-09-19 | 1997-07-15 | Genentech, Inc. | Expression of functional antibody fragments |
| US5789199A (en) | 1994-11-03 | 1998-08-04 | Genentech, Inc. | Process for bacterial production of polypeptides |
| US5840523A (en) | 1995-03-01 | 1998-11-24 | Genetech, Inc. | Methods and compositions for secretion of heterologous polypeptides |
| WO2002020565A2 (en) | 2000-09-08 | 2002-03-14 | Universität Zürich | Collections of repeat proteins comprising repeat modules |
| WO2002032925A2 (en) | 2000-10-16 | 2002-04-25 | Phylos, Inc. | Protein scaffolds for antibody mimics and other binding proteins |
| WO2003029462A1 (en) | 2001-09-27 | 2003-04-10 | Pieris Proteolab Ag | Muteins of human neutrophil gelatinase-associated lipocalin and related proteins |
| WO2004044011A2 (en) | 2002-11-06 | 2004-05-27 | Avidia Research Institute | Combinatorial libraries of monomer domains |
| WO2005040229A2 (en) | 2003-10-24 | 2005-05-06 | Avidia, Inc. | Ldl receptor class a and egf domain monomers and multimers |
| WO2008016854A2 (en) | 2006-08-02 | 2008-02-07 | The Uab Research Foundation | Methods and compositions related to soluble monoclonal variable lymphocyte receptors of defined antigen specificity |
| JP2008174503A (ja) * | 2007-01-19 | 2008-07-31 | Nec Corp | Method and apparatus for virtual screening of compounds |
| WO2018132752A1 (en) | 2017-01-13 | 2018-07-19 | Massachusetts Institute Of Technology | Machine learning based antibody design |
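| WO2020246617A1 (ja) | 2019-06-07 | 2020-12-10 | Chugai Pharmaceutical Co., Ltd. | Information processing system, information processing method, program, and method for producing antigen-binding molecule or protein |
| JP2022150078A (ja) * | 2021-03-26 | 2022-10-07 | Fujitsu Limited | Information processing program, information processing device, and information processing method |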
2024
- 2024-07-31 CN CN202480050730.4A patent/CN121605481A/zh active Pending
- 2024-07-31 JP JP2024566017A patent/JP7646953B1/ja active Active
- 2024-07-31 WO PCT/JP2024/027412 patent/WO2025033296A1/ja active Pending
Non-Patent Citations (7)
| Title |
|---|
| ANNU. REV. BIOPHYS. BIOMOL. STRUCT, vol. 35, 2006, pages 225 - 249 |
| CHARLTON: "Methods in Molecular Biology", vol. 248, 2003, HUMANA PRESS, pages: 245 - 254 |
| CLARKSON ET AL., NATURE, vol. 352, 1991, pages 624 - 628 |
| KINDT ET AL.: "Kuby Immunology", 2007, W.H. FREEMAN AND CO., pages: 91 |
| KUNKEL ET AL., PROC. NATL. ACAD. SCI. USA, vol. 82, 1985, pages 488 - 492 |
| PORTOLANO ET AL., J. IMMUNOL., vol. 150, 1993, pages 880 - 887 |
| PROC. NATL. ACAD. SCI. U.S.A., vol. 100, no. 11, 2003, pages 6353 - 6357 |
Also Published As
| Publication number | Publication date |
|---|---|
| JP7646953B1 (ja) | 2025-03-17 |
| JPWO2025033296A1 | 2025-02-13 |
| CN121605481A (zh) | 2026-03-03 |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | ENP | Entry into the national phase | Ref document number: 2024566017; Country of ref document: JP; Kind code of ref document: A |
| | WWE | WIPO information: entry into national phase | Ref document number: 2024566017; Country of ref document: JP |
| | 121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 24851721; Country of ref document: EP; Kind code of ref document: A1 |
| | WWE | WIPO information: entry into national phase | Ref document number: 2024851721; Country of ref document: EP |
| | NENP | Non-entry into the national phase | Ref country code: DE |