US12512182B2 - Binding peptide generation for MHC class I proteins with deep reinforcement learning - Google Patents
Binding peptide generation for MHC class I proteins with deep reinforcement learning
- Publication number
- US12512182B2 (application US18/471,591)
- Authority
- US
- United States
- Prior art keywords
- peptide
- mutation
- peptides
- mhc
- policy
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active, expires
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B15/00—ICT specially adapted for analysing two-dimensional [2D] or three-dimensional [3D] molecular structures, e.g. structural or functional relations or structure alignment
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B35/00—ICT specially adapted for in silico combinatorial libraries of nucleic acids, proteins or peptides
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B40/00—ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
- G16B40/20—Supervised data analysis
Definitions
- the present invention relates to peptide generation and, more particularly, to binding peptide generation for Major Histocompatibility Complex (MHC) class I proteins with deep reinforcement learning.
- MHC Major Histocompatibility Complex
- a method for generating binding peptides presented by any given Major Histocompatibility Complex (MHC) protein includes, given a peptide and an MHC protein pair, enabling a Reinforcement Learning (RL) agent to interact with and exploit a peptide mutation environment by repeatedly mutating the peptide and observing an observation score of the peptide, learning to form a mutation policy, via a mutation policy network, to iteratively mutate amino acids of the peptide to obtain desired presentation scores, and generating, based on the desired presentation scores, qualified peptides and binding motifs of MHC Class I proteins.
- RL Reinforcement Learning
- a non-transitory computer-readable storage medium comprising a computer-readable program for generating binding peptides presented by any given Major Histocompatibility Complex (MHC) protein is presented.
- the computer-readable program when executed on a computer causes the computer to perform the steps of, given a peptide and an MHC protein pair, enabling a Reinforcement Learning (RL) agent to interact with and exploit a peptide mutation environment by repeatedly mutating the peptide and observing an observation score of the peptide, learning to form a mutation policy, via a mutation policy network, to iteratively mutate amino acids of the peptide to obtain desired presentation scores, and generating, based on the desired presentation scores, qualified peptides and binding motifs of MHC Class I proteins.
- a system for generating binding peptides presented by any given Major Histocompatibility Complex (MHC) protein includes a memory and one or more processors in communication with the memory configured to, given a peptide and an MHC protein pair, enable a Reinforcement Learning (RL) agent to interact with and exploit a peptide mutation environment by repeatedly mutating the peptide and observing an observation score of the peptide, learn to form a mutation policy, via a mutation policy network, to iteratively mutate amino acids of the peptide to obtain desired presentation scores, and generate, based on the desired presentation scores, qualified peptides and binding motifs of MHC Class I proteins.
- FIG. 1 is a block/flow diagram of an exemplary architecture for generating binding peptides that can be presented by any given Major Histocompatibility Complex (MHC) protein, in accordance with embodiments of the present invention.
- FIG. 2 is a block/flow diagram of an exemplary architecture of a peptide sequence representation network with bi-directional Long Short-Term Memory (LSTM), in accordance with embodiments of the present invention.
- FIG. 3 is a block/flow diagram of an exemplary architecture of a MHC allele representation network, in accordance with embodiments of the present invention.
- FIG. 4 is a block/flow diagram of an exemplary method for a Deep Reinforcement Learning (DRL) approach to peptide vaccine design for immunotherapy, in accordance with embodiments of the present invention.
- FIG. 5 is an exemplary practical application for generating binding peptides that can be presented by any given MHC protein, in accordance with embodiments of the present invention.
- FIG. 6 is an exemplary processing system for generating binding peptides that can be presented by any given MHC protein, in accordance with embodiments of the present invention.
- FIG. 7 is a block/flow diagram of an exemplary method for generating binding peptides that can be presented by any given MHC protein, in accordance with embodiments of the present invention.
- Immunotherapy, which aims at boosting a patient's immune system against intracellular pathogens (e.g., viruses or bacteria) and tumor cells, is a fundamental treatment for human diseases.
- a major branch of such immune responses is triggered by cytotoxic T cells (also known as CD8+ T cells) when they recognize foreign peptides presented by Major Histocompatibility Complex (MHC) Class I proteins on the cell surface.
- these foreign peptides are first degraded from intracellular antigens by proteolytic enzymes within the proteasome, and then transported to the endoplasmic reticulum to bind to MHC Class I proteins. The resulting peptide-MHC complexes are then moved to the cell surface to interact with the CD8+ T cell receptors.
- Leveraging the immune reactions triggered by peptide-MHC complexes has recently shown substantial promise for peptide-based vaccines in the prevention of human diseases. Peptide-based vaccines have better stability and synthesizability than large proteins and may trigger the desired immune responses with fewer side effects.
- the exemplary embodiments formulate the foreign peptide search as a Reinforcement Learning (RL) problem and propose a framework, referred to as PepPPO, to generate qualified peptides and peptide binding motifs.
- the PepPPO learns a mutation policy to optimize the peptides through mutating amino acids, step by step, such that the mutated peptides can be presented by a given MHC protein.
- the exemplary embodiments demonstrate that PepPPO can significantly outperform multiple baselines in terms of finding qualified peptides and can effectively generate binding motifs of MHC proteins.
- the generated motifs are highly robust, with random initial peptides leading to identical motifs after stepwise mutations and are highly correlated with experimentally derived motifs. Furthermore, it is demonstrated that motifs generated by PepPPO can be used in rapid screening for neoantigens through motif matching even for rare MHC class I proteins without experimental data.
- a peptide is represented as a sequence of amino acids $\langle o_1, o_2, \ldots, o_i, \ldots, o_l \rangle$, where $o_i$ is one of 20 types of natural amino acids and $l$ is the length of the sequence, ranging from 8 to 15.
- PepPPO 100 ( FIG. 1 ) aims at generating a binding peptide $p$ of length $l$ that will be presented by a given MHC protein $m$.
- PepPPO 100 leverages a reinforcement learning (RL) agent to explore (interact with) the peptide mutation environment 120 for high-presentation peptide generation.
- the RL agent explores and exploits the peptide mutation environment 120 by repeatedly mutating the current peptide and observing its presentation score.
- the RL agent learns to form a mutation policy $\pi(\cdot)$ to iteratively mutate the amino acids of any given peptide $p$ to have a desired presentation score.
- This learning paradigm is illustrated in FIG. 1 , and there are two main components to fulfill the learning, that is, constructing the peptide mutation environment 120 and learning the mutation policy network 110 , which will be described below.
- the peptide mutation environment 120 enables the RL agent to perform and experience trial-and-error peptide mutations to gradually refine its mutation policy (through tuning the parameters of the mutation policy networks).
- the RL agent keeps mutating peptides and receiving their presentation scores (i.e., reward signal) given by the environment. These rewards thus help reinforce the agent's mutation behaviors. For instance, mutation behaviors resulting in high peptide presentation scores (high rewards) are encouraged while others leading to low scores are discouraged.
- the mutation environment 120 includes three components, that is, a state space, an action space and a reward function.
- the state includes the current mutated peptide and the MHC protein.
- the action and the reward represent the mutation action that may be taken by the RL agent and the resulting new presentation score of the mutated peptide, respectively.
- the exemplary embodiments define the state of the environment $s_t$ at time step $t$ as a pair $(p, m)$ including a peptide and an MHC class I protein.
- the exemplary embodiments represent an MHC protein as a pseudo sequence with 34 amino acids, each of which is in potential contact with the bound peptide within a distance of 4.0 Å, following the previous work for peptide-MHC binding prediction.
- the exemplary embodiments initialize the state $s_0$ by randomly sampling an MHC class I protein and a peptide sequence.
- the exemplary embodiments define the terminal state $s_T$, which will stop mutating a peptide, as the state either with the maximum time step $T$ reached by the RL agent or with the presentation score greater than a predefined threshold $\sigma$.
- the exemplary embodiments use the final reward to guide the optimization of the RL agent. That is, only the terminal states can receive rewards from the peptide mutation environment 120 .
- the exemplary embodiments define the final reward as the presentation score $r(p_T, m)$ between the peptide $p_T$ and the MHC protein $m$ in the terminal state $s_T$.
- the exemplary embodiments leverage the presentation score predicted by the MHCflurry2.0 for learning.
- MHCflurry2.0 is the best existing method able to accurately estimate the presentation scores of peptides with MHC proteins. This score is a composite score of the antigen processing (AP) prediction and the binding affinity (BA) prediction.
- the former predicts the probability for a peptide to be delivered by the transporter associated with antigen processing (TAP) protein complex into the endoplasmic reticulum (ER), where the peptide can bind to MHC proteins.
- TAP transporter associated with antigen processing
- ER endoplasmic reticulum
- the latter predicts the binding strength between the peptide and MHC protein.
- Higher presentation scores require higher AP and BA scores and indicate higher probabilities for peptides to be presented on the cell surface by the given MHC proteins.
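- The state, action, terminal condition and final reward described above can be sketched as a small environment class. This is a minimal illustration, not the patented implementation: the class name `PeptideMutationEnv`, the default values of `max_steps` and `threshold`, and the `toy_score` function are all hypothetical, and `toy_score` merely stands in for a real presentation-score predictor such as MHCflurry 2.0.

```python
from typing import Callable, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 natural amino acids


class PeptideMutationEnv:
    """Toy peptide mutation environment with a terminal-only reward."""

    def __init__(self, score_fn: Callable[[str, str], float],
                 max_steps: int = 8, threshold: float = 0.75):
        self.score_fn = score_fn    # stand-in for a presentation-score predictor
        self.max_steps = max_steps  # maximum time step T (assumed value)
        self.threshold = threshold  # predefined score threshold (assumed value)
        self.peptide, self.mhc, self.t = "", "", 0

    def reset(self, peptide: str, mhc: str) -> Tuple[str, str]:
        """Set the initial state s_0 = (peptide, MHC pseudo sequence)."""
        self.peptide, self.mhc, self.t = peptide, mhc, 0
        return self.peptide, self.mhc

    def step(self, position: int, amino_acid: str):
        """Replace one residue and return (state, reward, done)."""
        if amino_acid not in AMINO_ACIDS:
            raise ValueError("unknown amino acid")
        if amino_acid == self.peptide[position]:
            raise ValueError("a mutation must change the residue")
        self.peptide = (self.peptide[:position] + amino_acid
                        + self.peptide[position + 1:])
        self.t += 1
        score = self.score_fn(self.peptide, self.mhc)
        done = self.t >= self.max_steps or score > self.threshold
        reward = score if done else 0.0  # only terminal states are rewarded
        return (self.peptide, self.mhc), reward, done


def toy_score(peptide: str, mhc: str) -> float:
    """Hypothetical scorer: rewards L at position 2 and V at the last position."""
    return 0.5 * (peptide[1] == "L") + 0.5 * (peptide[-1] == "V")
```

- In this sketch, intermediate steps return a reward of zero, so the agent only receives a learning signal when an episode ends, which mirrors the final-reward design stated above.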
- the RL agent in the PepPPO 100 takes as input the given peptide and the MHC protein. The agent then learns to mutate the amino acids in the peptide sequence, one amino acid at each step, aiming at maximizing the presentation score of the resulting peptide.
- both the peptide and the MHC protein are first encoded into a distributed embedding space. Then, a mapping between the embedding space and the mutation policy is learned by a gradient descent optimization method, as discussed below.
- the exemplary embodiments use a mixture of multiple encoding methods to represent the amino acids within the peptide sequences and the MHC proteins.
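- the exemplary embodiments first embed each amino acid $o_i$ within the peptide sequence $\langle o_1, o_2, \ldots, o_l \rangle$ into a continuous latent vector $h_i$ using a one-layer bidirectional LSTM as below:
- $\overrightarrow{h}_i, \overrightarrow{c}_i = \mathrm{LSTM}(o_i, \overrightarrow{h}_{i-1}, \overrightarrow{c}_{i-1}; \overrightarrow{W}_P), \qquad \overleftarrow{h}_i, \overleftarrow{c}_i = \mathrm{LSTM}(o_i, \overleftarrow{h}_{i+1}, \overleftarrow{c}_{i+1}; \overleftarrow{W}_P)$
- $\overrightarrow{h}_i$ / $\overleftarrow{h}_i$ are the hidden state vectors of the i-th amino acid
- $\overrightarrow{c}_i$ / $\overleftarrow{c}_i$ are the memory cell states of the i-th amino acid
- $\overrightarrow{h}_0$, $\overleftarrow{h}_l$, $\overrightarrow{c}_0$ and $\overleftarrow{c}_l$ are initialized with random noise vectors.
- $\overrightarrow{W}_P$ and $\overleftarrow{W}_P$ are the learnable parameters of the LSTM for the forward and backward direction, respectively.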
- To embed an MHC protein into a continuous latent vector, the exemplary embodiments first flatten the encoding matrix $E_m$ into a vector $m$. Then, the exemplary embodiments learn the continuous latent embedding $h_m$ as:
- the exemplary embodiments optimize the peptide sequence $p_t$ by predicting the mutation of one amino acid with the latent embeddings $h_{p_t}$ and $h_m$. Specifically, the exemplary embodiments first select the amino acid $o_i$ in $p_t$ as the one to be replaced. The exemplary embodiments then predict which amino acid should be used to replace $o_i$. For each amino acid $o_i$ in the peptide sequence, the exemplary embodiments predict the score of replacement as shown below:
- $h_i$ is the hidden latent vector of $o_i$
- the exemplary embodiments measure “how likely” the amino acid $o_i$ can be replaced with another one by looking at its context in $h_i$ (e.g., $o_i$ and the peptide sequence $p_t$) and the MHC protein $h_m$.
- the amino acid to be replaced is determined by sampling from the distribution with normalized scores.
- the exemplary embodiments then predict the type of the amino acid used to replace $o_i$ as shown below:
- the amino acid type is then determined by sampling from the distribution of probabilities of amino acid types excluding the original type of $o_i$.
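- The two-stage action selection described above (first a position, then a replacement type that excludes the original residue) can be sketched as follows. This is a hedged illustration: `position_scores` and `type_scores` are hypothetical placeholders for the outputs of the policy network, and the use of a softmax for normalization is an assumption, since the scoring equations are not reproduced here.

```python
import math
import random
from typing import List, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 natural amino acids


def softmax(scores: List[float]) -> List[float]:
    """Normalize raw scores into a probability distribution."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sample_mutation(peptide: str,
                    position_scores: List[float],
                    type_scores: List[List[float]],
                    rng: random.Random) -> Tuple[int, str]:
    """Sample (position, new amino acid) in two stages.

    position_scores[i]  : replacement score of the i-th residue (placeholder).
    type_scores[i][k]   : score of amino acid AMINO_ACIDS[k] at position i
                          (placeholder for the network's second head).
    """
    # Stage 1: pick the residue to replace from the normalized scores.
    pos_probs = softmax(position_scores)
    pos = rng.choices(range(len(peptide)), weights=pos_probs, k=1)[0]

    # Stage 2: pick the new type, excluding the residue's original type.
    candidates = [a for a in AMINO_ACIDS if a != peptide[pos]]
    cand_scores = [type_scores[pos][AMINO_ACIDS.index(a)] for a in candidates]
    new_aa = rng.choices(candidates, weights=softmax(cand_scores), k=1)[0]
    return pos, new_aa
```

- Renormalizing over the 19 remaining types, as done in stage 2 of this sketch, guarantees that every sampled action actually changes the peptide.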
- the exemplary embodiments adopt Proximal Policy Optimization (PPO), a policy gradient method, to optimize the policy networks.
- PPO Proximal Policy Optimization
- the objective function of the PPO is defined as follows:
- $L^{CLIP}(\theta) = \hat{\mathbb{E}}_t\left[\min\left(r_t(\theta)\hat{A}_t,\ \mathrm{clip}(r_t(\theta), 1-\epsilon, 1+\epsilon)\hat{A}_t\right)\right]$
- $\theta$ is the set of learnable parameters of the policy network
- $r_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{old}}(a_t \mid s_t)}$, which is the probability ratio between the action under current policy $\pi_\theta$ and the action under previous policy $\pi_{\theta_{old}}$.
- $r_t(\theta)$ is clipped to avoid moving $r_t$ outside of the interval $[1-\epsilon, 1+\epsilon]$.
- $\hat{A}_t$ is the advantage at timestep $t$ computed with the generalized advantage estimator, measuring how much better the selected actions are than others on average:
- $V(s_t)$ uses a Multi-Layer Perceptron (MLP) to predict the future return of current state $s_t$ from the MHC embedding $h_m$ and the peptide embedding $h_p$.
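- The generalized advantage estimator mentioned above can be illustrated with the standard recursion from the reinforcement learning literature. This is a sketch under stated assumptions: the discount `gamma` and the smoothing factor `lam` are hypothetical hyperparameters not given in this document, and the value predictions are passed in as plain numbers rather than produced by an MLP.

```python
from typing import List


def gae_advantages(rewards: List[float], values: List[float],
                   gamma: float = 0.99, lam: float = 0.95) -> List[float]:
    """Generalized advantage estimation over one mutation trajectory.

    rewards[t] : reward received after the action at step t
                 (zero everywhere except the last step under a final reward).
    values[t]  : predicted value V(s_t); values has len(rewards) + 1 entries,
                 the last one being the value after the terminal state (0.0).
    """
    advantages = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        # One-step temporal-difference error.
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        # Exponentially weighted sum of future TD errors.
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages
```

- With a terminal-only reward, the recursion propagates the final presentation score backwards, so earlier mutations in a successful trajectory also receive positive advantages.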
- MLP Multi-Layer Perceptron
- The objective function of $V(\cdot)$ is defined as follows:
- $L^{V}(\theta) = \hat{\mathbb{E}}_t\left[\left(V_\theta(s_t) - \hat{R}_t\right)^2\right], \quad (7)$
- the exemplary methods also add the entropy regularization loss $H(\theta)$, a popular strategy used for policy gradient methods, to encourage the exploration of the policy.
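- The three training signals discussed above (the clipped probability ratio, the squared error of the value prediction, and the entropy bonus) are commonly combined into one scalar loss. The sketch below assumes the usual PPO combination; the function name `ppo_loss` and the coefficients `clip_eps`, `value_coef` and `entropy_coef` are illustrative assumptions, not values from this document.

```python
import math
from typing import List


def ppo_loss(new_logp: List[float], old_logp: List[float],
             advantages: List[float], values: List[float],
             returns: List[float], entropies: List[float],
             clip_eps: float = 0.2, value_coef: float = 0.5,
             entropy_coef: float = 0.01) -> float:
    """Scalar PPO loss (to be minimized) over a batch of time steps."""
    n = len(advantages)
    policy_term = 0.0
    value_term = 0.0
    for lp_new, lp_old, adv, v, ret in zip(new_logp, old_logp,
                                           advantages, values, returns):
        # Probability ratio between the current and the previous policy.
        ratio = math.exp(lp_new - lp_old)
        # Keep the ratio inside [1 - eps, 1 + eps].
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # Pessimistic (lower) bound of the two surrogate objectives.
        policy_term += min(ratio * adv, clipped * adv)
        # Squared error between predicted value and observed return.
        value_term += (v - ret) ** 2
    policy_term /= n
    value_term /= n
    entropy = sum(entropies) / n
    # Maximize the surrogate and the entropy, minimize the value error.
    return -policy_term + value_coef * value_term - entropy_coef * entropy
```

- Note how clipping only limits the gain from a favorable action: with a positive advantage the ratio stops contributing beyond $1+\epsilon$, while with a negative advantage an overly large ratio is still fully penalized.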
- the exemplary embodiments derive an expert policy $\pi_{ept}$ from the existing data. Specifically, for each MHC protein $m$ with enough data, the exemplary embodiments calculate the amino acid distributions $p_1(o \ldots$
- $\ldots \mid m) - p_i(o_i \mid m)), \quad (8)$
- the exemplary embodiments utilize the expert policy to pre-train the policy network.
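- The per-position amino acid distributions that underlie the expert policy can be estimated by simple counting. The sketch below is illustrative only: the function names are hypothetical, the peptides in the usage are toy 3-mers rather than real 8 to 15-mers, and `mutation_gain` computes just the difference $p_i(o \mid m) - p_i(o_i \mid m)$ that survives in the text, not the complete expert policy.

```python
from collections import Counter
from typing import Dict, List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 natural amino acids


def position_distributions(peptides: List[str]) -> List[Dict[str, float]]:
    """Estimate p_i(o | m) from equal-length qualified peptides of one MHC protein."""
    length = len(peptides[0])
    if any(len(p) != length for p in peptides):
        raise ValueError("all peptides must have the same length")
    dists = []
    for i in range(length):
        counts = Counter(p[i] for p in peptides)
        dists.append({a: counts[a] / len(peptides) for a in AMINO_ACIDS})
    return dists


def mutation_gain(dists: List[Dict[str, float]], peptide: str,
                  position: int, new_aa: str) -> float:
    """Difference p_i(new_aa | m) - p_i(current residue | m) at one position."""
    return dists[position][new_aa] - dists[position][peptide[position]]


# Toy usage: four made-up "qualified" 3-mers for a single MHC protein.
toy_dists = position_distributions(["ALA", "ALV", "GLV", "AMV"])
toy_gain = mutation_gain(toy_dists, "AMA", 1, "L")  # 0.75 - 0.25
```

- A positive gain indicates that the proposed residue is more common among qualified peptides at that position than the current one.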
- the objective of pretraining is to minimize the following cross entropy loss:
- the exemplary embodiments have included the entropy regularization into the objective function to ensure sufficient exploration.
- however, this strategy cannot explicitly encourage the policy to produce diverse actions that could lead to high rewards.
- the exemplary embodiments design a diversity-promoting experience buffer to store the trajectories that could result in qualified peptides.
- the visited state action pairs of mutation trajectories of qualified peptides are added into this buffer.
- the exemplary embodiments always keep the state-action pairs with infrequent actions and remove those with frequent actions to ensure that the buffer is not dominated by the frequent actions.
- the exemplary embodiments then randomly sample a batch of state-action pairs with infrequent actions from the buffer.
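- One possible shape for such a diversity-promoting buffer is sketched below. The eviction rule used here (when the buffer is full, drop one pair whose action is currently the most frequent) is an assumption chosen to match the description; the class name `DiversityBuffer` and its methods are hypothetical.

```python
import random
from collections import Counter
from typing import List, Tuple

State = Tuple[str, str]   # (peptide, MHC pseudo sequence)
Action = Tuple[int, str]  # (position, new amino acid)


class DiversityBuffer:
    """Bounded buffer that evicts state-action pairs with frequent actions."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.items: List[Tuple[State, Action]] = []
        self.action_counts: Counter = Counter()

    def add(self, state: State, action: Action) -> None:
        """Store one pair from the trajectory of a qualified peptide."""
        self.items.append((state, action))
        self.action_counts[action] += 1
        if len(self.items) > self.capacity:
            self._evict()

    def _evict(self) -> None:
        # Remove the oldest pair whose action is the most frequent one,
        # so that rare actions are never crowded out.
        frequent_action, _ = self.action_counts.most_common(1)[0]
        for idx, (_, action) in enumerate(self.items):
            if action == frequent_action:
                del self.items[idx]
                self.action_counts[action] -= 1
                break

    def sample(self, batch_size: int, rng: random.Random):
        """Draw a random batch of stored pairs without replacement."""
        return rng.sample(self.items, min(batch_size, len(self.items)))
```

- The sampled pairs would then serve as targets for the cross-entropy loss on the policy network, which nudges the policy towards rarely used but successful mutations.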
- the cross-entropy loss $L_B$ is defined as:
- the experimental dataset of MHC binding affinities was used to derive the amino acid distributions of qualified peptides to get the expert policy for PepPPO 100 .
- This dataset includes 149 human MHC class I proteins (alleles) and 319,971 peptides. 3,688 unique pseudo sequences for MHC proteins were retrieved from a previous publication. It is noted that different MHC proteins could be represented with the same pseudo sequences.
- the exemplary embodiments thus present a deep reinforcement learning system with peptide mutation policies for generating binding peptides that are the same as or at most d amino acids different from a library of peptides.
- the pre-defined library of peptides can be derived from the genome of a virus such as SARS-CoV-2 or from sequencing tumor samples of a patient. Therefore, the system 100 can be used for generating peptides for immunotherapy targeting a particular type of virus or tumor. Given a virus genome or some tumor cells, the exemplary methods run sequencing followed by some off-the-shelf peptide processing pipelines to extract a library of peptides that can uniquely identify the virus or tumor cells. Targeting this peptide library from the virus or tumor, the system 100 can generate peptides that bind to MHC and are presented on cell surface, so that immune responses can be triggered to kill the virus or tumor cells.
- a deep neural network is first trained on the public IEDB dataset, or a pre-trained model such as MHCflurry 2.0 is employed, to predict a peptide presentation score (a combination of peptide-MHC binding affinity and antigen processing score) given an MHC allele sequence and a peptide sequence.
- using a pre-trained model for predicting peptide presentation scores from MHC allele and peptide sequences, the exemplary embodiments develop a DRL system with peptide mutation policies to generate peptides with high presentation scores that are the same as or at most d amino acids different from the provided library of peptides.
- the exemplary embodiments then pretrain the DRL system to learn good peptide mutation policies transforming a given random peptide into a peptide with a high presentation score.
- the exemplary embodiments randomly sample batches of peptides from the provided library and follow the policy network to mutate the peptides. During the mutation process, if any mutated peptide already differs from the starting peptide by d amino acids, the exemplary embodiments stop the process and output the peptide as the final peptide. The exemplary embodiments also optionally finetune the policy network on this library of peptides with the similarity constraint enforced.
- the exemplary embodiments output the final mutated peptides for all peptides in the library (each peptide in the library might produce several promising mutated peptides satisfying the similarity constraint) and rank the compiled set of mutated peptides. The top ranked peptides are used as promising drug candidates targeting the specified virus or tumor cells for immunotherapy.
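- The library-constrained mutation and ranking procedure described above can be sketched as follows. This is an illustration under assumptions: `propose` is a hypothetical stand-in for the trained policy network, `score_fn` stands in for the presentation-score predictor, and `max_steps` is an assumed safety bound not mentioned in the text.

```python
from typing import Callable, List, Tuple

Proposal = Callable[[str, str], Tuple[int, str]]  # (peptide, mhc) -> (position, amino acid)
Scorer = Callable[[str, str], float]              # (peptide, mhc) -> presentation score


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length peptides differ."""
    if len(a) != len(b):
        raise ValueError("peptides must have the same length")
    return sum(x != y for x, y in zip(a, b))


def mutate_with_constraint(start: str, mhc: str, propose: Proposal,
                           score_fn: Scorer, d: int,
                           max_steps: int = 10) -> Tuple[str, float]:
    """Mutate a library peptide until it differs from `start` by d residues."""
    peptide = start
    for _ in range(max_steps):
        if hamming(peptide, start) >= d:
            break  # similarity constraint reached: output as final peptide
        pos, aa = propose(peptide, mhc)
        peptide = peptide[:pos] + aa + peptide[pos + 1:]
    return peptide, score_fn(peptide, mhc)


def rank_candidates(library: List[str], mhc: str, propose: Proposal,
                    score_fn: Scorer, d: int,
                    max_steps: int = 10) -> List[Tuple[str, float]]:
    """Mutate every library peptide and rank the results by score."""
    results = [mutate_with_constraint(p, mhc, propose, score_fn, d, max_steps)
               for p in library]
    return sorted(results, key=lambda r: r[1], reverse=True)
```

- Because each step changes a single residue and the loop stops as soon as the distance reaches d, the output of this sketch never differs from its library peptide by more than d amino acids.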
- HLA Human Leukocyte Antigen
- the exemplary embodiments identify qualified peptides and patterns for HLA molecules.
- the exemplary methods further formulate the peptide generation problem as a search problem.
- a reinforcement learning (RL) network is developed to solve the problem; it includes a mutation policy that can generate diverse peptides and has good interpretability.
- the exemplary methods can identify the patterns of each MHC allele by counting the frequency of predicted positions and amino acids to mutate random peptides into qualified peptides with RL.
- FIG. 3 is a block/flow diagram of an exemplary architecture 300 of a MHC allele representation network, in accordance with embodiments of the present invention.
- the exemplary methods first train a deep neural network on the public IEDB dataset or employ a pre-trained model such as MHCflurry 2.0 to predict a peptide presentation score (a combination of peptide-MHC binding affinity and antigen processing score) given an MHC allele sequence and a peptide sequence.
- the exemplary methods develop a DRL system with peptide mutation policies to generate peptides with high presentation scores that are the same as or at most d amino acids different from the provided library of peptides.
- the exemplary embodiments then pretrain a DRL system to learn good peptide mutation policies transforming a given random peptide into a peptide with a high presentation score.
- the exemplary methods randomly sample batches of peptides from the provided library and follow the policy network to mutate the peptides. During the mutation process, if any mutated peptide already differs from the starting peptide by d amino acids, the exemplary embodiments stop the process and output the peptide as the final peptide. The exemplary methods also optionally finetune the policy network on this library of peptides with the similarity constraint enforced.
- the final mutated peptides for all peptides in the library are output (each peptide in the library might produce several promising mutated peptides satisfying the similarity constraint), and the compiled set of mutated peptides is ranked. The top ranked peptides are used as promising drug candidates targeting the specified virus or tumor for immunotherapy.
- the exemplary methods use amino acid embeddings followed by a convolutional layer 320 and fully-connected layers 310 , 312 to get the allele representation, and the exemplary methods further use bi-directional LSTM 210 on top of amino acid embeddings to get peptide representation.
- a deep neural network is used as a policy network to learn the conditional probability of different actions given the state.
- FIG. 4 is a block/flow diagram of an exemplary method for a deep reinforcement learning (DRL) approach to peptide vaccine design for immunotherapy, in accordance with embodiments of the present invention.
- DRL deep reinforcement learning
- FIG. 5 is an exemplary practical application 500 for generating binding peptides that can be presented by any given MHC protein, in accordance with embodiments of the present invention.
- a peptide is processed by the PepPPO 100 within the peptide mutation environment 120 by the mutation policy network 110 to generate new qualified peptides 510 to be displayed on a screen 512 and analyzed by a user 514 .
- FIG. 6 is an exemplary processing system for generating binding peptides that can be presented by any given MHC protein, in accordance with embodiments of the present invention.
- the processing system includes at least one processor (CPU) 904 operatively coupled to other components via a system bus 902 .
- a Graphical Processing Unit (GPU) 905 , a cache 906 , a Read Only Memory (ROM) 908 , a Random Access Memory (RAM) 910 , an Input/Output (I/O) adapter 920 , a network adapter 930 , a user interface adapter 940 , and a display adapter 950 are operatively coupled to the system bus 902 .
- the PepPPO system 100 employs a mutation policy network 110 in a peptide mutation environment 120 .
- a storage device 922 is operatively coupled to system bus 902 by the I/O adapter 920 .
- the storage device 922 can be any of a disk storage device (e.g., a magnetic or optical disk storage device), a solid-state magnetic device, and so forth.
- a transceiver 932 is operatively coupled to system bus 902 by network adapter 930 .
- User input devices 942 are operatively coupled to system bus 902 by user interface adapter 940 .
- the user input devices 942 can be any of a keyboard, a mouse, a keypad, an image capture device, a motion sensing device, a microphone, a device incorporating the functionality of at least two of the preceding devices, and so forth. Of course, other types of input devices can also be used, while maintaining the spirit of the present invention.
- the user input devices 942 can be the same type of user input device or different types of user input devices.
- the user input devices 942 are used to input and output information to and from the processing system.
- a display device 952 is operatively coupled to system bus 902 by display adapter 950 .
- the processing system may also include other elements (not shown), as readily contemplated by one of skill in the art, as well as omit certain elements.
- various other input devices and/or output devices can be included in the system, depending upon the particular implementation of the same, as readily understood by one of ordinary skill in the art.
- various types of wireless and/or wired input and/or output devices can be used.
- additional processors, controllers, memories, and so forth, in various configurations can also be utilized as readily appreciated by one of ordinary skill in the art.
- FIG. 7 is a block/flow diagram of an exemplary method for generating binding peptides that can be presented by any given MHC protein, in accordance with embodiments of the present invention.
- the system 100 can be used for generating peptides for immunotherapy targeting a particular type of virus or tumor.
- the state is an MHC allele sequence and a peptide sequence
- the action at each time step is to first choose a position in the input peptide to determine the position of edits (replace the current amino acid at the position with another one) and then determine the type of amino acid at the predicted position.
- amino acid embeddings are employed followed by a convolutional layer and fully-connected layers to get the allele representation.
- Bi-directional LSTM is used on top of amino acid embeddings to obtain peptide representation, and a deep neural network is employed as a policy network to learn the conditional probability of different actions given the state.
- the reward design is based on the difference of the presentation scores of the peptides before and after mutations (actions).
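- As a worked note on this reward design, assume the per-step reward is exactly the score difference and that no discounting is applied (neither assumption is stated above). The per-step rewards then telescope, so the total reward of a trajectory equals the overall improvement in presentation score:

```latex
r_t = r(p_{t+1}, m) - r(p_t, m), \qquad
\sum_{t=0}^{T-1} r_t = r(p_T, m) - r(p_0, m)
```

- Under these assumptions, maximizing the summed reward is equivalent to maximizing the final presentation score, while each individual mutation still receives immediate feedback.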
- the exemplary embodiments use PepPPO 100 to optimize the DRL model. During the mutation process, if any mutated peptide already differs from the starting peptide by d amino acids, the process stops, and the peptide is output as the final peptide. The exemplary embodiments also optionally finetune the policy network on this library of peptides with the similarity constraint enforced. Using the pretrained presentation-score prediction deep model to define reward functions and starting from random peptides, the exemplary embodiments then pretrain a DRL system to learn good peptide mutation policies transforming a given random peptide into a peptide with a high presentation score.
- the exemplary methods randomly sample batches of peptides from the provided library and follow the policy network to mutate the peptides.
- the final mutated peptides for all peptides in the library are output (each peptide in the library might produce several promising mutated peptides satisfying the similarity constraint) and the compiled set of mutated peptides is ranked.
- the top ranked peptides are used as promising drug candidates targeting the specified virus or tumor for immunotherapy.
- the exemplary methods use binding motifs for MHC alleles with experimental data to guide the pre-training of the policy network.
- the exemplary methods use binding motifs from the most similar MHC alleles with experimental data to guide the pre-training of the policy network.
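- Selecting the most similar MHC allele with experimental data requires some similarity measure between alleles. The sketch below assumes a simple one, the fraction of identical residues between equal-length pseudo sequences; both this measure and the names `sequence_identity` and `most_similar_allele` are illustrative assumptions, and the allele names and sequences in any usage are placeholders.

```python
from typing import Dict


def sequence_identity(a: str, b: str) -> float:
    """Fraction of positions at which two equal-length sequences agree."""
    if len(a) != len(b):
        raise ValueError("pseudo sequences must have the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def most_similar_allele(query_pseudo: str,
                        alleles_with_data: Dict[str, str]) -> str:
    """Return the name of the allele whose pseudo sequence is closest to the query.

    alleles_with_data maps allele name -> pseudo sequence for alleles
    that have experimental binding data.
    """
    return max(
        alleles_with_data,
        key=lambda name: sequence_identity(query_pseudo, alleles_with_data[name]),
    )
```

- The binding motif of the returned allele could then guide pre-training for an allele that has no experimental data of its own.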
- the terms “data,” “content,” “information” and similar terms can be used interchangeably to refer to data capable of being captured, transmitted, received, displayed and/or stored in accordance with various example embodiments. Thus, use of any such terms should not be taken to limit the spirit and scope of the disclosure.
- where a computing device is described herein to receive data from another computing device, the data can be received directly from the other computing device or can be received indirectly via one or more intermediary computing devices, such as, for example, one or more servers, relays, routers, network access points, base stations, and/or the like.
- aspects of the present invention may be embodied as a system, method or computer program product. Accordingly, aspects of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, micro-code, etc.) or an embodiment combining software and hardware aspects that may all generally be referred to herein as a “circuit,” “module,” “calculator,” “device,” or “system.” Furthermore, aspects of the present invention may take the form of a computer program product embodied in one or more computer readable medium(s) having computer readable program code embodied thereon.
- the computer readable medium may be a computer readable signal medium or a computer readable storage medium.
- a computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing.
- a computer readable storage medium may be any tangible medium that can include, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
- a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof.
- a computer readable signal medium may be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
- Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
- Computer program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ or the like and conventional procedural programming languages, such as the “C” programming language or similar programming languages.
- The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server.
- The remote computer may be connected to the user's computer through any type of network, including a LAN or a WAN, or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider).
- These computer program instructions may also be stored in a computer readable medium that can direct a computer, other programmable data processing apparatus, or other devices to function in a particular manner, such that the instructions stored in the computer readable medium produce an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks or modules.
- The computer program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide processes for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks or modules.
- The term “processor” as used herein is intended to include any processing device, such as, for example, one that includes a CPU and/or other processing circuitry. It is also to be understood that the term “processor” may refer to more than one processing device and that various elements associated with a processing device may be shared by other processing devices.
- The term “memory” as used herein is intended to include memory associated with a processor or CPU, such as, for example, RAM, ROM, a fixed memory device (e.g., hard drive), a removable memory device (e.g., diskette), flash memory, etc. Such memory may be considered a computer readable storage medium.
- The phrase “input/output devices” or “I/O devices” as used herein is intended to include, for example, one or more input devices (e.g., keyboard, mouse, scanner, etc.) for entering data to the processing unit, and/or one or more output devices (e.g., speaker, display, printer, etc.) for presenting results associated with the processing unit.
Landscapes
- Engineering & Computer Science (AREA)
- Life Sciences & Earth Sciences (AREA)
- Physics & Mathematics (AREA)
- Health & Medical Sciences (AREA)
- Medical Informatics (AREA)
- Spectroscopy & Molecular Physics (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Theoretical Computer Science (AREA)
- Biophysics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Biotechnology (AREA)
- Evolutionary Biology (AREA)
- General Health & Medical Sciences (AREA)
- Data Mining & Analysis (AREA)
- Chemical & Material Sciences (AREA)
- Software Systems (AREA)
- Evolutionary Computation (AREA)
- Public Health (AREA)
- Databases & Information Systems (AREA)
- Artificial Intelligence (AREA)
- Bioethics (AREA)
- Epidemiology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Molecular Biology (AREA)
- Library & Information Science (AREA)
- Biochemistry (AREA)
- Crystallography & Structural Chemistry (AREA)
- Peptides Or Proteins (AREA)
- Analytical Chemistry (AREA)
- Genetics & Genomics (AREA)
- Proteomics, Peptides & Aminoacids (AREA)
Abstract
Description
which is the probability ratio between the action under the current policy $\pi_\theta$ and the action under the previous policy $\pi_{\theta_{\mathrm{old}}}$
$\hat{R}_t$ is the rewards-to-go. Because only the final rewards are used, that is, $r_i = 0$ if $i \neq T$, the exemplary methods calculate $\hat{R}_t$ from the final reward $r_T$ alone.
The exemplary methods also add the entropy regularization loss H(θ), a popular strategy used for policy gradient methods, to encourage the exploration of the policy.
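The fragments above describe a clipped policy-gradient objective of the kind used in proximal policy optimization (PPO). The following is a minimal sketch under stated assumptions, not the patented implementation: it assumes the standard discounted rewards-to-go, so that with only a final reward the sum collapses to $\gamma^{T-t} r_T$, and the standard PPO clipped surrogate. The function names, the discount factor `gamma`, and the coefficients `clip_eps`, `value_coef`, and `entropy_coef` are hypothetical choices made for illustration and do not come from the source.

```python
import math


def rewards_to_go(final_reward, horizon, gamma=0.99):
    """Discounted rewards-to-go when only the last step T carries a reward.

    Assumption: with r_i = 0 for i != T, sum_{i=t}^{T} gamma^(i-t) * r_i
    reduces to gamma^(T-t) * r_T. Steps are indexed t = 1..T.
    """
    return [gamma ** (horizon - t) * final_reward for t in range(1, horizon + 1)]


def ppo_loss(new_logp, old_logp, advantages, values, returns, entropies,
             clip_eps=0.2, value_coef=0.5, entropy_coef=0.01):
    """Clipped surrogate + value loss - entropy bonus, as a quantity to minimize.

    All arguments are equal-length lists with one entry per time step.
    The coefficient defaults are illustrative, not taken from the source.
    """
    n = len(new_logp)
    policy_term = 0.0
    value_term = 0.0
    for lp_new, lp_old, adv, v, ret in zip(new_logp, old_logp, advantages,
                                           values, returns):
        # Probability ratio between the current and the previous policy.
        ratio = math.exp(lp_new - lp_old)
        clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Pessimistic (minimum) of the unclipped and clipped objectives.
        policy_term += min(ratio * adv, clipped * adv)
        # Squared error between the value estimate and the rewards-to-go.
        value_term += (v - ret) ** 2
    entropy_term = sum(entropies) / n
    # Entropy enters with a negative sign so that minimizing the loss
    # rewards higher-entropy (more exploratory) policies.
    return (-policy_term / n
            + value_coef * value_term / n
            - entropy_coef * entropy_term)


# Example: a 3-step episode with final reward 1.0 and gamma = 0.5.
returns = rewards_to_go(final_reward=1.0, horizon=3, gamma=0.5)
print(returns)  # [0.25, 0.5, 1.0]
```

Because the reward is zero before the final step, earlier steps receive a geometrically discounted share of the final reward, and the entropy term is subtracted so that lowering the loss favors exploration.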
Claims (20)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US18/471,591 US12512182B2 (en) | 2021-09-07 | 2023-09-21 | Binding peptide generation for MHC class I proteins with deep reinforcement learning |
Applications Claiming Priority (3)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US202163241129P | 2021-09-07 | 2021-09-07 | |
| US17/899,004 US12518851B2 (en) | 2021-09-07 | 2022-08-30 | Binding peptide generation for MHC class I proteins with deep reinforcement learning for immunotherapy decision making |
| US18/471,591 US12512182B2 (en) | 2021-09-07 | 2023-09-21 | Binding peptide generation for MHC class I proteins with deep reinforcement learning |
Related Parent Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US17/899,004 Continuation US12518851B2 (en) | 2021-09-07 | 2022-08-30 | Binding peptide generation for MHC class I proteins with deep reinforcement learning for immunotherapy decision making |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| US20240087672A1 (en) | 2024-03-14 |
| US12512182B2 (en) | 2025-12-30 |
Family
ID=85478820
Family Applications (4)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US17/899,004 Active 2044-07-20 US12518851B2 (en) | 2021-09-07 | 2022-08-30 | Binding peptide generation for MHC class I proteins with deep reinforcement learning for immunotherapy decision making |
| US18/471,591 Active 2043-05-14 US12512182B2 (en) | 2021-09-07 | 2023-09-21 | Binding peptide generation for MHC class I proteins with deep reinforcement learning |
| US18/471,597 Active 2043-06-19 US12518852B2 (en) | 2021-09-07 | 2023-09-21 | Binding peptide generation for MHC class I proteins with deep reinforcement learning for immunotherapy decision making |
| US18/471,610 Active 2043-06-09 US12518853B2 (en) | 2021-09-07 | 2023-09-21 | Binding peptide generation for MHC class I proteins with deep reinforcement learning utilizing encoding methods of a blosum matrix |
Family Applications Before (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US17/899,004 Active 2044-07-20 US12518851B2 (en) | 2021-09-07 | 2022-08-30 | Binding peptide generation for MHC class I proteins with deep reinforcement learning for immunotherapy decision making |
Family Applications After (2)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| US18/471,597 Active 2043-06-19 US12518852B2 (en) | 2021-09-07 | 2023-09-21 | Binding peptide generation for MHC class I proteins with deep reinforcement learning for immunotherapy decision making |
| US18/471,610 Active 2043-06-09 US12518853B2 (en) | 2021-09-07 | 2023-09-21 | Binding peptide generation for MHC class I proteins with deep reinforcement learning utilizing encoding methods of a blosum matrix |
Country Status (1)
| Country | Link |
|---|---|
| US (4) | US12518851B2 (en) |
Families Citing this family (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US12518851B2 (en) * | 2021-09-07 | 2026-01-06 | Nec Corporation | Binding peptide generation for MHC class I proteins with deep reinforcement learning for immunotherapy decision making |
| JP2025010852A (en) * | 2023-07-10 | 2025-01-23 | 株式会社日立製作所 | Sequence information processing device and method |
| CN117352058A (en) * | 2023-09-27 | 2024-01-05 | 晶泰智药技术(上海)有限公司 | Protein sequence generation methods, devices, equipment and storage media |
Citations (10)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20080021686A1 (en) * | 2006-02-16 | 2008-01-24 | Microsoft Corporation | Cluster modeling, and learning cluster specific parameters of an adaptive double threading model |
| US20170016075A1 (en) * | 2015-07-14 | 2017-01-19 | Personal Genome Diagnostics, Inc. | Neoantigen analysis |
| US20210012858A1 (en) * | 2018-03-16 | 2021-01-14 | Kotai Biotechnologies, Inc. | Effective clustering of immunological entities |
| US20210033608A1 (en) * | 2019-07-30 | 2021-02-04 | The Board Of Trustees Of The Leland Stanford Junior University | Methods and Systems for Identification of Human Leukocyte Antigen Peptide Presentation and Applications Thereof |
| US20210284738A1 (en) * | 2017-03-31 | 2021-09-16 | Act Genomics (Ip) Co., Ltd. | Ranking system for immunogenic cancer-specific epitopes |
| US20210383892A1 (en) * | 2020-06-03 | 2021-12-09 | Xenotherapeutics, Inc. | Selection and Monitoring Methods for Xenotransplantation |
| US20220327425A1 (en) * | 2021-04-05 | 2022-10-13 | Nec Laboratories America, Inc. | Peptide mutation policies for targeted immunotherapy |
| US20230085160A1 (en) * | 2021-09-07 | 2023-03-16 | Nec Laboratories America, Inc. | Binding peptide generation for mhc class i proteins with deep reinforcement learning |
| US20230282304A1 (en) * | 2021-12-21 | 2023-09-07 | Regeneron Pharmaceuticals, Inc. | Off-target prediction method for antigen-recognition molecules binding to mhc-peptide targets |
| US20240153591A1 (en) * | 2021-03-30 | 2024-05-09 | Pentamedix Co., Ltd. | Method for predicting t cell activity of peptide-mhc, and analysis device |
Family Cites Families (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20190237158A1 (en) * | 2016-08-31 | 2019-08-01 | Medgenome, Inc. | Methods to analyze genetic alterations in cancer to identify therapeutic peptide vaccines and kits therefore |
| CN112912960B (en) * | 2018-08-20 | 2024-10-29 | 南托米克斯有限责任公司 | Methods and systems for improving major histocompatibility complex (MHC)-peptide binding prediction for neoepitopes using recurrent neural network encoders and attention weighting |
| KR20240091046A (en) * | 2018-12-21 | 2024-06-21 | 바이오엔테크 유에스 인크. | Method and systems for prediction of hla class ii-specific epitopes and characterization of cd4+ t cells |
| US20220275050A1 (en) * | 2019-05-06 | 2022-09-01 | Agency For Science, Technology And Research | High yield production and use of enzymatic-exchangeable peptide major histocompatibility complex class i single chain trimer tetramer |
2022
- 2022-08-30 US17/899,004 (US12518851B2), active
2023
- 2023-09-21 US18/471,591 (US12512182B2), active
- 2023-09-21 US18/471,597 (US12518852B2), active
- 2023-09-21 US18/471,610 (US12518853B2), active
Patent Citations (12)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20080021686A1 (en) * | 2006-02-16 | 2008-01-24 | Microsoft Corporation | Cluster modeling, and learning cluster specific parameters of an adaptive double threading model |
| US20170016075A1 (en) * | 2015-07-14 | 2017-01-19 | Personal Genome Diagnostics, Inc. | Neoantigen analysis |
| US20210284738A1 (en) * | 2017-03-31 | 2021-09-16 | Act Genomics (Ip) Co., Ltd. | Ranking system for immunogenic cancer-specific epitopes |
| US20210012858A1 (en) * | 2018-03-16 | 2021-01-14 | Kotai Biotechnologies, Inc. | Effective clustering of immunological entities |
| US20210033608A1 (en) * | 2019-07-30 | 2021-02-04 | The Board Of Trustees Of The Leland Stanford Junior University | Methods and Systems for Identification of Human Leukocyte Antigen Peptide Presentation and Applications Thereof |
| US20210383892A1 (en) * | 2020-06-03 | 2021-12-09 | Xenotherapeutics, Inc. | Selection and Monitoring Methods for Xenotransplantation |
| US20240153591A1 (en) * | 2021-03-30 | 2024-05-09 | Pentamedix Co., Ltd. | Method for predicting t cell activity of peptide-mhc, and analysis device |
| US20220327425A1 (en) * | 2021-04-05 | 2022-10-13 | Nec Laboratories America, Inc. | Peptide mutation policies for targeted immunotherapy |
| US20230085160A1 (en) * | 2021-09-07 | 2023-03-16 | Nec Laboratories America, Inc. | Binding peptide generation for mhc class i proteins with deep reinforcement learning |
| US20240071563A1 (en) * | 2021-09-07 | 2024-02-29 | Nec Laboratories America, Inc. | Binding peptide generation for mhc class i proteins with deep reinforcement learning |
| US20240087673A1 (en) * | 2021-09-07 | 2024-03-14 | Nec Laboratories America, Inc. | Binding peptide generation for mhc class i proteins with deep reinforcement learning |
| US20230282304A1 (en) * | 2021-12-21 | 2023-09-07 | Regeneron Pharmaceuticals, Inc. | Off-target prediction method for antigen-recognition molecules binding to mhc-peptide targets |
Also Published As
| Publication number | Publication date |
|---|---|
| US20240087672A1 (en) | 2024-03-14 |
| US20240071563A1 (en) | 2024-02-29 |
| US20240087673A1 (en) | 2024-03-14 |
| US12518852B2 (en) | 2026-01-06 |
| US12518853B2 (en) | 2026-01-06 |
| US20230085160A1 (en) | 2023-03-16 |
| US12518851B2 (en) | 2026-01-06 |
Similar Documents
| Publication | Title | Publication Date |
|---|---|---|
| US12512182B2 (en) | Binding peptide generation for MHC class I proteins with deep reinforcement learning | |
| EP3821434B1 (en) | Machine learning for determining protein structures | |
| US20240177799A1 (en) | T-cell receptor optimization with reinforcement learning and mutation policies for precision immunotherapy | |
| US20240120022A1 (en) | Predicting protein amino acid sequences using generative models conditioned on protein structure embeddings | |
| US20230377682A1 (en) | Peptide binding motif generation | |
| US20220327425A1 (en) | Peptide mutation policies for targeted immunotherapy | |
| CN110188158A (en) | Keyword and topic label generating method, device, medium and electronic equipment | |
| US12482534B2 (en) | Peptide based vaccine generation system with dual projection generative adversarial networks | |
| US20260011397A1 (en) | T-Cell Receptor Repertoire Selection Prediction with Physical Model Augmented Pseudo-Labeling for Personalized Medicine Decision Making | |
| US20240071571A1 (en) | Peptide search system for immunotherapy | |
| US20220319635A1 (en) | Generating minority-class examples for training data | |
| US20230304189A1 (en) | Tcr engineering with deep reinforcement learning for increasing efficacy and safety of tcr-t immunotherapy | |
| US20250259698A1 (en) | T-cell receptor optimization using quantum variational autoencoders | |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | FEPP | Fee payment procedure | Free format text: ENTITY STATUS SET TO UNDISCOUNTED (ORIGINAL EVENT CODE: BIG.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
| | AS | Assignment | Owner name: NEC LABORATORIES AMERICA, INC., NEW JERSEY Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:MIN, RENQIANG;GRAF, HANS PETER;CHEN, ZIQI;SIGNING DATES FROM 20220805 TO 20220808;REEL/FRAME:065133/0541 |
| | STPP | Information on status: patent application and granting procedure in general | Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION |
| | STPP | Information on status: patent application and granting procedure in general | Free format text: ALLOWED -- NOTICE OF ALLOWANCE NOT YET MAILED |
| | STPP | Information on status: patent application and granting procedure in general | Free format text: NOTICE OF ALLOWANCE MAILED -- APPLICATION RECEIVED IN OFFICE OF PUBLICATIONS |
| | STPP | Information on status: patent application and granting procedure in general | Free format text: AWAITING TC RESP., ISSUE FEE NOT PAID |
| | STPP | Information on status: patent application and granting procedure in general | Free format text: NOTICE OF ALLOWANCE MAILED -- APPLICATION RECEIVED IN OFFICE OF PUBLICATIONS |
| | AS | Assignment | Owner name: NEC CORPORATION, JAPAN Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:NEC LABORATORIES AMERICA, INC.;REEL/FRAME:072938/0913 Effective date: 20251113 |
| | STPP | Information on status: patent application and granting procedure in general | Free format text: PUBLICATIONS -- ISSUE FEE PAYMENT RECEIVED |
| | STPP | Information on status: patent application and granting procedure in general | Free format text: PUBLICATIONS -- ISSUE FEE PAYMENT VERIFIED |
| | STCF | Information on status: patent grant | Free format text: PATENTED CASE |