JPWO2020242766A5

Info

Publication number: JPWO2020242766A5
Application number: JP2021571033A
Authority: JP
Publication date: 2023-06-15

Claims

A method, comprising:
training a machine learning model based on a first plurality of blueprint records, or representations thereof, and a first plurality of scores, wherein each blueprint record from the first plurality of blueprint records is associated with a score from the first plurality of scores; and
after the training, executing the machine learning model to generate a second plurality of blueprint records having at least one desired score,
wherein the second plurality of blueprint records is configured to be received as input to computational protein modeling to generate an engineered polypeptide based on the second plurality of blueprint records.
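
The train-then-generate flow recited above can be sketched as follows. This is a minimal illustration, not the claimed implementation: the claim does not fix a model type at this point, so a toy k-nearest-neighbour surrogate is swapped in, and the integer-tuple record encoding, the toy score (number of 1s), the "higher is better" reading of the desired score, and every function name are assumptions made for the example.

```python
import random
from typing import List, Sequence, Tuple

Record = Tuple[int, ...]            # hypothetical encoding of one blueprint record
Model = List[Tuple[Record, float]]  # stored (record, score) pairs


def train(records: Sequence[Record], scores: Sequence[float]) -> Model:
    """Pair each record with its score; a k-NN surrogate needs no further fitting."""
    if len(records) != len(scores):
        raise ValueError("each blueprint record needs exactly one score")
    return list(zip(records, scores))


def predict(model: Model, record: Record, k: int = 3) -> float:
    """Mean score of the k stored records closest to `record` (Hamming distance)."""
    def distance(pair: Tuple[Record, float]) -> int:
        return sum(a != b for a, b in zip(pair[0], record))
    nearest = sorted(model, key=distance)[:k]
    return sum(score for _, score in nearest) / len(nearest)


def generate(model: Model, n_candidates: int, length: int,
             desired_score: float, rng: random.Random) -> List[Record]:
    """Sample random candidates and keep those predicted to reach the desired score."""
    kept = []
    for _ in range(n_candidates):
        candidate = tuple(rng.randint(0, 1) for _ in range(length))
        # Assumption: a higher score is better, so "desired" means ">= threshold".
        if predict(model, candidate) >= desired_score:
            kept.append(candidate)
    return kept


# Toy data: the "score" of a record is simply the number of 1s it contains.
rng = random.Random(0)
first_records = [tuple(rng.randint(0, 1) for _ in range(6)) for _ in range(50)]
first_scores = [float(sum(r)) for r in first_records]

model = train(first_records, first_scores)
second_records = generate(model, n_candidates=200, length=6,
                          desired_score=4.0, rng=rng)
```

In a real pipeline the second plurality of records would then be handed to the computational protein modeling step; that step is outside the scope of this sketch.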

The method of claim 1, wherein:
the method further comprises receiving a representation of a reference target structure for a reference target;
the method further comprises generating the first plurality of blueprint records from a predetermined portion of the reference target structure, wherein each blueprint record from the first plurality of blueprint records comprises target residue positions and scaffold residue positions, each target residue position corresponding to one target residue from a plurality of target residues;
in at least one blueprint record, the target residue positions are non-contiguous;
in at least one blueprint record, one or more target residue positions are in a different order than the order of those target residue positions in a reference target sequence;
the method further comprises labeling the first plurality of blueprint records by, for each blueprint record from the first plurality of blueprint records:
performing computational protein modeling on the blueprint record to generate a polypeptide structure;
calculating a score for the polypeptide structure; and
associating the score with the blueprint record;
the computational protein modeling is based on de novo design without template matching to the reference target structure; or
each score from the first plurality of scores comprises an energy term and a structural constraint match term determined using one or more structural constraints extracted from the representation of the reference target structure.
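
A blueprint record with target and scaffold residue positions, and the per-record labeling loop, might be represented as in the sketch below. The data layout is an assumption for illustration only: a record is a tuple of slots, each slot holding either the reference-sequence index of a target residue or `None` for a scaffold position, and "non-contiguous" is read here as "the target slots are not adjacent within the record". The `model_structure` and `score_structure` callables are hypothetical stand-ins for the modeling and scoring steps.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple

Slot = Optional[int]  # target residue index in the reference sequence, or None for scaffold


@dataclass(frozen=True)
class BlueprintRecord:
    slots: Tuple[Slot, ...]

    def target_order(self) -> List[int]:
        """Target residue indices in the order they appear in the record."""
        return [s for s in self.slots if s is not None]

    def targets_contiguous(self) -> bool:
        """True when all target slots sit next to each other in the record."""
        where = [i for i, s in enumerate(self.slots) if s is not None]
        if not where:
            return True
        return where == list(range(where[0], where[0] + len(where)))

    def targets_reordered(self) -> bool:
        """True when targets appear in a different order than in the reference sequence."""
        order = self.target_order()
        return order != sorted(order)


def make_record(targets: Sequence[int], n_scaffold: int,
                rng: random.Random) -> BlueprintRecord:
    """Place shuffled target residues at random slots among scaffold slots."""
    order = list(targets)
    rng.shuffle(order)
    length = len(order) + n_scaffold
    chosen = sorted(rng.sample(range(length), len(order)))
    slots: List[Slot] = [None] * length
    for position, residue in zip(chosen, order):
        slots[position] = residue
    return BlueprintRecord(tuple(slots))


def label(records: Sequence[BlueprintRecord],
          model_structure: Callable, score_structure: Callable):
    """For each record: model a structure, score it, associate the score with the record."""
    return [(record, score_structure(model_structure(record))) for record in records]


rng = random.Random(1)
targets = [12, 13, 14, 40, 41]  # hypothetical residue indices from the selected portion
records = [make_record(targets, n_scaffold=10, rng=rng) for _ in range(20)]
# Stand-ins only: the "structure" is the record itself, the "score" counts scaffold slots.
labeled = label(records, lambda r: r, lambda s: float(s.slots.count(None)))
```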

The method of claim 1, wherein:
the method further comprises:
determining whether to retrain the machine learning model by calculating a second plurality of scores for the second plurality of blueprint records; and
in response to the determining, retraining the machine learning model based on (1) retraining blueprint records comprising the second plurality of blueprint records and (2) retraining scores comprising the second plurality of scores; and, optionally,
after the retraining of the machine learning model, concatenating the first plurality of blueprint records and the second plurality of blueprint records to generate the retraining blueprint records, and generating the retraining scores, wherein each blueprint record from the retraining blueprint records is associated with a score from the retraining scores;
the at least one desired score is a preset value;
the at least one desired score is determined dynamically; or
when the machine learning model is a supervised machine learning model, the supervised machine learning model comprises (a) a decision tree ensemble, a boosted decision tree algorithm, an eXtreme Gradient Boosting (XGBoost) model, or a random forest, or (b) a support vector machine (SVM), a feedforward machine learning model, a recurrent neural network (RNN), a convolutional neural network (CNN), a graph neural network (GNN), or a transformer neural network; or
the machine learning model is an inductive machine learning model or a generative machine learning model.
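
The retraining decision and the concatenation of the two record sets can be sketched as below. The claim does not state the criterion used to decide whether to retrain, so the rule here (retrain when fewer than a chosen fraction of the generated records actually reach the desired score) is purely an assumption, as are the function names and the `min_hit_fraction` parameter.

```python
from typing import List, Sequence, Tuple


def should_retrain(second_scores: Sequence[float], desired_score: float,
                   min_hit_fraction: float = 0.5) -> bool:
    """Hypothetical rule: retrain when too few generated records reach the desired score."""
    if not second_scores:
        # Assumption: nothing was generated or scored, so the model needs more training.
        return True
    hits = sum(score >= desired_score for score in second_scores)
    return hits / len(second_scores) < min_hit_fraction


def build_retraining_set(first_records: Sequence, first_scores: Sequence[float],
                         second_records: Sequence, second_scores: Sequence[float]
                         ) -> Tuple[List, List[float]]:
    """Concatenate both record sets, keeping every record paired with its score."""
    records = list(first_records) + list(second_records)
    scores = list(first_scores) + list(second_scores)
    if len(records) != len(scores):
        raise ValueError("each retraining record needs exactly one retraining score")
    return records, scores
```

Concatenating rather than replacing keeps the original labeled examples available to the model, which is one reason a pipeline might choose the optional concatenation step.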

The method of claim 2, including at least one of:
performing computational protein modeling on the second plurality of blueprint records to generate the engineered polypeptides;
filtering the engineered polypeptides by static structure comparison with the representation of the reference target structure;
filtering the engineered polypeptides by dynamic structure comparison with the representation of the reference target structure, using molecular dynamics (MD) simulations of the representation of the reference target structure and of the structure of each engineered polypeptide;
performing the MD simulations in parallel using symmetric multiprocessing (SMP); or
a number of blueprint records in the second plurality of blueprint records being less than a number of blueprint records in the first plurality of blueprint records.

A non-transitory processor-readable medium storing code representing instructions to be executed by a processor, the code comprising code to cause the processor to:
train a machine learning model based on a first plurality of blueprint records, or representations thereof, and a first plurality of scores, wherein each blueprint record from the first plurality of blueprint records is associated with a score from the first plurality of scores; and
execute the machine learning model after the training to generate a second plurality of blueprint records having at least one desired score,
wherein the second plurality of blueprint records is configured to be received as input to computational protein modeling to generate an engineered polypeptide based on the second plurality of blueprint records.

The medium of claim 5, wherein:
the code further comprises code to cause the processor to receive a representation of a reference target structure;
the code further comprises code to cause the processor to generate the first plurality of blueprint records from a predetermined portion of the reference target structure, wherein each blueprint record from the first plurality of blueprint records comprises target residue positions and scaffold residue positions, each target residue position from the plurality of target residue positions corresponding to one target residue from a plurality of target residues;
in at least one blueprint record, the target residue positions are non-contiguous;
in at least one blueprint record, one or more target residue positions are in a different order than the order of those target residue positions in a reference target sequence;
the code further comprises code to cause the processor to label the first plurality of blueprint records by, for each blueprint record from the first plurality of blueprint records:
performing computational protein modeling on the blueprint record to generate a polypeptide structure;
calculating a score for the polypeptide structure; and
associating the score with the blueprint record;
the computational protein modeling is based on de novo design without template matching to the reference target structure; or
each score from the first plurality of scores comprises an energy term and a structural constraint match term determined using one or more structural constraints extracted from the representation of the reference target structure.

The medium of claim 5, wherein:
the code further comprises code to cause the processor to:
determine whether to retrain the machine learning model by calculating a second plurality of scores for the second plurality of blueprint records; and
in response to the determining, retrain the machine learning model based on (1) retraining blueprint records comprising the second plurality of blueprint records and (2) retraining scores comprising the second plurality of scores; and, optionally,
after the retraining of the machine learning model, concatenate the first plurality of blueprint records and the second plurality of blueprint records to generate the retraining blueprint records, and generate the retraining scores, wherein each blueprint record from the retraining blueprint records is associated with a score from the retraining scores;
the at least one desired score is a preset value;
the at least one desired score is determined dynamically; or
when the machine learning model is a supervised machine learning model, the supervised machine learning model comprises (a) a decision tree ensemble, a boosted decision tree algorithm, an eXtreme Gradient Boosting (XGBoost) model, or a random forest, or (b) a support vector machine (SVM), a feedforward machine learning model, a recurrent neural network (RNN), a convolutional neural network (CNN), a graph neural network (GNN), or a transformer neural network; or
the machine learning model is an inductive machine learning model or a generative machine learning model.

The medium of claim 5, wherein:
the code further comprises code to cause the processor to perform at least one of:
performing computational protein modeling on the second plurality of blueprint records to generate engineered polypeptides;
filtering the engineered polypeptides by static structure comparison with the representation of the reference target structure;
filtering the engineered polypeptides by dynamic structure comparison with the representation of the reference target structure, using molecular dynamics (MD) simulations of the representation of the reference target structure and of the structure of each engineered polypeptide; or
performing the MD simulations in parallel using symmetric multiprocessing (SMP); or
the number of blueprint records in the second plurality of blueprint records is less than the number of blueprint records in the first plurality of blueprint records.

An apparatus for selecting an engineered polypeptide, comprising:
a first computing device having a processor and a memory storing instructions executable by the processor to:
receive a reference target structure from a second computing device remote from the first computing device;
generate a first plurality of blueprint records from a predetermined portion of the reference target structure, wherein each blueprint record from the first plurality of blueprint records comprises target residue positions and scaffold residue positions, each target residue position corresponding to one target residue from a plurality of target residues;
train a machine learning model based on the first plurality of blueprint records, or representations thereof, and a first plurality of scores, wherein each blueprint record from the first plurality of blueprint records is associated with a score from the first plurality of scores; and
after the training, execute the machine learning model to generate a second plurality of blueprint records having at least one desired score,
wherein the second plurality of blueprint records is configured to be received as input to computational protein modeling to generate an engineered polypeptide based on the second plurality of blueprint records.

The apparatus of claim 9, wherein:
the instructions further cause the processor to:
determine whether to retrain the machine learning model by calculating a second plurality of scores for the second plurality of blueprint records; and
in response to the determining, retrain the machine learning model based on (1) retraining blueprint records comprising the second plurality of blueprint records and (2) retraining scores comprising the second plurality of scores;
at least one of the following holds: the desired score is a preset value, or the desired score is dynamically determined; or
when the machine learning model is a supervised machine learning model, the supervised machine learning model comprises (a) a decision tree ensemble, a boosted decision tree algorithm, an eXtreme Gradient Boosting (XGBoost) model, or a random forest, or (b) a support vector machine (SVM), a feedforward machine learning model, a recurrent neural network (RNN), a convolutional neural network (CNN), a graph neural network (GNN), or a transformer neural network;
the machine learning model is an inductive machine learning model; or
the machine learning model is a generative machine learning model.

The apparatus of claim 9, wherein the instructions comprise code to cause the processor to perform at least one of:
performing computational protein modeling on the second plurality of blueprint records to generate engineered polypeptides;
filtering the engineered polypeptides by static structure comparison with the representation of the reference target structure;
filtering the engineered polypeptides by dynamic structure comparison with the representation of the reference target structure, using molecular dynamics (MD) simulations of the representation of the reference target structure and of the structure of each engineered polypeptide; or
performing the MD simulations in parallel using symmetric multiprocessing (SMP).

An engineered polypeptide produced by the method according to any one of claims 1-4, the non-transitory processor-readable medium according to any one of claims 5-8, or the apparatus according to any one of claims 9-11.

An engineered peptide having a molecular weight of 1 kDa to 10 kDa and comprising up to 50 amino acids, the engineered peptide comprising:
a combination of spatially related topological constraints, one or more of the constraints being reference target-derived constraints,
wherein 10% to 98% of the amino acids of the engineered peptide satisfy the one or more reference target-derived constraints, and
wherein the amino acids satisfying the one or more reference target-derived constraints have less than 8.0 Å backbone root mean square deviation (RMSD) structural homology with the reference target.
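
The numeric limits recited above can be checked mechanically once the relevant quantities are known. The sketch below computes a backbone RMSD between matched atoms and tests the stated ranges. It assumes the two coordinate sets are already superimposed and listed in the same atom order (a real comparison would normally apply an optimal superposition first), and it takes the molecular weight and the count of constraint-satisfying residues as given inputs; all names are hypothetical.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def backbone_rmsd(a: Sequence[Coord], b: Sequence[Coord]) -> float:
    """RMSD in angstroms between matched backbone atoms of two pre-superimposed structures."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum(math.dist(p, q) ** 2 for p, q in zip(a, b))
    return math.sqrt(squared / len(a))


def within_stated_ranges(n_residues: int, n_satisfying: int,
                         mass_kda: float, rmsd: float) -> bool:
    """Check the recited limits: <= 50 residues, 1-10 kDa, 10%-98% satisfying, RMSD < 8.0."""
    fraction = n_satisfying / n_residues
    return (n_residues <= 50
            and 1.0 <= mass_kda <= 10.0
            and 0.10 <= fraction <= 0.98
            and rmsd < 8.0)
```

The RMSD passed to `within_stated_ranges` is meant to be computed only over the residues that satisfy the reference target-derived constraints, matching the wording of the claim.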

The engineered peptide of claim 13, wherein at least one of:
the amino acids satisfying the one or more reference target-derived constraints have between 10% and 90% sequence homology with the reference target;
the combination comprises at least two reference target-derived constraints;
the combination comprises an energy term and a structural constraint match term determined using one or more structural constraints extracted from a representation of a reference target structure;
the one or more non-reference target-derived constraints define desired structural properties, dynamic properties, or any combination thereof;
the reference target comprises one or more atoms associated with a biological response or biological function, and the atomic fluctuations of the one or more atoms in the engineered peptide associated with a biological response or biological function overlap with the atomic fluctuations of the one or more atoms in the reference target associated with a biological response or biological function;
the overlap is a root mean square inner product (RMSIP) greater than 0.25; or
the overlap is a root mean square inner product (RMSIP) greater than 0.75.
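
The RMSIP overlap referred to above is commonly defined over two sets of D orthonormal fluctuation modes as RMSIP = sqrt((1/D) * sum over i and j of (u_i . v_j)^2), giving 1 for identical subspaces and 0 for orthogonal ones. The sketch below implements that common definition. The claims do not say how the modes are obtained or how many are used, so the inputs here (equal-sized lists of orthonormal vectors, for example leading principal components of atomic fluctuations) are an assumption.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def dot(u: Vector, v: Vector) -> float:
    """Inner product of two vectors of equal length."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    return sum(a * b for a, b in zip(u, v))


def rmsip(modes_a: Sequence[Vector], modes_b: Sequence[Vector]) -> float:
    """Root mean square inner product of two equally sized sets of orthonormal modes."""
    if len(modes_a) != len(modes_b) or not modes_a:
        raise ValueError("need two non-empty mode sets of equal size")
    total = sum(dot(u, v) ** 2 for u in modes_a for v in modes_b)
    return math.sqrt(total / len(modes_a))


def overlaps(modes_a: Sequence[Vector], modes_b: Sequence[Vector],
             threshold: float = 0.25) -> bool:
    """True when the RMSIP exceeds the threshold (0.25 and 0.75 are the recited values)."""
    return rmsip(modes_a, modes_b) > threshold
```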

A method of selecting an engineered peptide, comprising:
identifying one or more topological properties of a reference target;
designing a spatially related constraint for each topological property to generate a combination of spatially related topological constraints derived from the reference target;
comparing spatially related topological properties of candidate peptides with the combination of spatially related topological constraints derived from the reference target; and
selecting a candidate peptide having spatially related topological properties that overlap with the combination of spatially related topological constraints derived from the reference target, to generate the engineered peptide.
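
One way to picture the identify, constrain, compare, and select steps above is with pairwise distance constraints. The sketch below is only an illustration under stated assumptions: a "topological property" is taken to be the distance between two residues of the reference target, a constraint is that distance plus a tolerance, candidates share the reference residue indexing, and "overlap" is measured as the fraction of constraints a candidate satisfies. None of these choices, nor any of the names, come from the claims.

```python
import math
from typing import Dict, List, Sequence, Tuple

Coord = Tuple[float, float, float]
# (residue i, residue j, target distance in angstroms, tolerance in angstroms)
Constraint = Tuple[int, int, float, float]


def derive_constraints(reference: Sequence[Coord], pairs: Sequence[Tuple[int, int]],
                       tolerance: float) -> List[Constraint]:
    """Turn selected residue-pair distances of the reference into distance constraints."""
    return [(i, j, math.dist(reference[i], reference[j]), tolerance) for i, j in pairs]


def satisfied_fraction(candidate: Sequence[Coord],
                       constraints: Sequence[Constraint]) -> float:
    """Fraction of constraints whose distance the candidate reproduces within tolerance."""
    if not constraints:
        raise ValueError("at least one constraint is required")
    satisfied = sum(
        abs(math.dist(candidate[i], candidate[j]) - distance) <= tol
        for i, j, distance, tol in constraints
    )
    return satisfied / len(constraints)


def select(candidates: Dict[str, Sequence[Coord]], constraints: Sequence[Constraint],
           min_fraction: float) -> List[str]:
    """Names of candidates that satisfy at least `min_fraction` of the constraints."""
    return [name for name, coords in candidates.items()
            if satisfied_fraction(coords, constraints) >= min_fraction]


reference = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (3.0, 4.0, 0.0)]
constraints = derive_constraints(reference, [(0, 1), (1, 2), (0, 2)], tolerance=0.5)
candidates = {
    "shifted_copy": [(1.0, 1.0, 1.0), (4.0, 1.0, 1.0), (4.0, 5.0, 1.0)],
    "stretched": [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (10.0, 0.0, 0.0)],
}
selected = select(candidates, constraints, min_fraction=0.9)
```

Distance constraints are invariant to translation and rotation, which is why the shifted copy of the reference passes without any prior alignment.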

The method of claim 15, wherein at least one of:
one or more of the constraints are derived from per-residue energy and per-residue atomic distance;
the properties of one or more candidate peptides are determined by computer simulation;
the computer simulation comprises a molecular dynamics simulation, a Monte Carlo simulation, a coarse-grained simulation, a Gaussian network model, machine learning, or any combination thereof;
the amino acids satisfying the one or more reference target-derived constraints have 10% to 90% sequence homology with the reference target; or
the one or more non-reference target-derived constraints describe desired structural and/or dynamic properties.