WO2023139031A1 - Method and system for predicting tcr (t cell receptor)-peptide interactions - Google Patents

Method and system for predicting tcr (t cell receptor)-peptide interactions

Info

Publication number
WO2023139031A1
Authority
WO
WIPO (PCT)
Prior art keywords
multimodal
molecules
peptides
tcr
binding
Prior art date
Application number
PCT/EP2023/050900
Other languages
French (fr)
Inventor
Filippo Grazioli
Anja Moesch
Pierre MACHART
Martin Renqiang MIN
Kai Li
Original Assignee
NEC Laboratories Europe GmbH
Nec Laboratories America, Inc
Priority date
Filing date
Publication date
Application filed by NEC Laboratories Europe GmbH, Nec Laboratories America, Inc filed Critical NEC Laboratories Europe GmbH
Publication of WO2023139031A1 publication Critical patent/WO2023139031A1/en


Classifications

    • GPHYSICS
    • G16INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16BBIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B15/00ICT specially adapted for analysing two-dimensional or three-dimensional molecular structures, e.g. structural or functional relations or structure alignment
    • G16B15/30Drug targeting using structural data; Docking or binding prediction
    • GPHYSICS
    • G16INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16BBIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B20/30Detection of binding sites or motifs
    • GPHYSICS
    • G16INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16BBIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
    • G16B40/20Supervised data analysis
    • GPHYSICS
    • G16INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16BBIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
    • G16B40/30Unsupervised data analysis

Definitions

  • the present invention relates to a computer system and a computer-implemented method of predicting an interaction or binding between T cell receptors (TCRs), and peptides presented on the surface of cells, or peptides presented on the surface of cells in complex with major histocompatibility complex (MHC) molecules.
  • T cells monitor the health status of cells by identifying foreign peptides on their surface. Peptides are presented on the surface of cells in complex with major histocompatibility complex (MHC) molecules. Depending on the cell type, two different MHC molecules are found: class I and class II.
  • the binding of TCRs with peptide-MHC (pMHC) complexes - also known as TCR recognition - constitutes a necessary step for immune response. Only if TCR recognition takes place can cytokines be released, leading to the death of a target cell. Understanding the rules that govern TCR recognition represents a fundamental step towards the development of personalized and more effective cancer treatments and vaccines.
  • the aforementioned object is accomplished by a computer-implemented method of predicting an interaction or binding between T cell receptors, TCRs, and peptides presented on the surface of cells or between peptides presented on the surface of cells and major histocompatibility complex, MHC, molecules, the method comprising a training stage, including: a) providing a training dataset of samples, wherein each sample comprises a multimodal tuple of molecules, and wherein each sample has assigned a ground truth label; b) inferring, for each sample of the training dataset using parametric encoders, a unimodal posterior distribution over latent encodings conditioned by the respective input molecules, and combining the parameters of the inferred unimodal posteriors in a single matrix; c) implicitly learning dependencies among the inferred unimodal posteriors by applying a trainable parametric function leveraging multi-head self-attention onto the matrix and, based thereupon, approximating a multimodal joint posterior distribution over latent encodings
  • the present disclosure provides a system and a method for predicting the interaction between T cell receptors (TCRs) and peptides presented on the surface of cells.
  • the system is configured to approximate unimodal posterior distributions over latent encodings conditioned by the input molecules.
  • the system is configured to aggregate the predicted parameters of the posteriors in a single matrix.
  • the system is configured to approximate a multimodal posterior distribution over latent encodings, which is then used to estimate the probability of binding between TCR and peptides. Due to the multimodal nature of the system, the invention can account for an arbitrarily large set of heterogeneous input modalities.
  • M unimodal posterior distributions over latent encodings conditioned by the respective input molecules may be determined.
  • the parameters of the posteriors may be approximated and stored in a vector.
  • the M vectors of the parameters of the M unimodal posteriors may be combined (e.g. stacked) in a matrix which contains them all.
  • the matrix may serve as input to a trainable parametric function which leverages multi-head self-attention.
  • This function allows the various unimodal posteriors to attend to each other, estimates their relative importance and implicitly learns inter-dependencies among them.
  • this mechanism is sometimes referred to as AoD (Attention-of-Distributions).
  • a multimodal posterior distribution over latent encodings is eventually approximated and output by the system.
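The stacking-and-attention pipeline described in the preceding steps can be sketched as follows. This is a minimal single-head NumPy illustration with random, untrained weights; the actual embodiment uses trainable multi-head attention, and all names here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_of_distributions(P, Wq, Wk, Wv):
    """Let the M unimodal posteriors attend to each other (single head for brevity)."""
    Q, K, V = P @ Wq, P @ Wk, P @ Wv
    weights = softmax(Q @ K.T / np.sqrt(K.shape[-1]))  # M x M relative importance of modalities
    return weights @ V                                 # attended parameter matrix, M x 2*d_z

# Three modalities (e.g. peptide, CDR3-alpha, CDR3-beta), bottleneck size d_z = 4.
M, d_z = 3, 4
# Each unimodal encoder yields a parameter vector (mean, log-variance) of size 2*d_z;
# the M vectors are stacked into a single matrix, as described above.
P = np.stack([rng.normal(size=2 * d_z) for _ in range(M)])
Wq, Wk, Wv = (rng.normal(scale=0.1, size=(2 * d_z, 2 * d_z)) for _ in range(3))
attended = attention_of_distributions(P, Wq, Wk, Wv)
# Pool the attended rows into the parameters of one multimodal joint posterior.
mu_joint, log_var_joint = np.split(attended.mean(axis=0), 2)
```

The mean-pooling at the end is one simple way to reduce the attended matrix to a single posterior; the embodiment may use a different reduction.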
  • the method may include a training stage, in which the system is trained, for instance by performing the following steps:
  • Each sample may consist of a multimodal tuple of molecules, e.g. (peptide, CDR3α, CDR3β) for TCR-peptide interaction prediction, or (peptide, MHC) for peptide-MHC binding prediction.
  • a ground truth label is provided for each sample.
  • the method may include an inference stage in which inference is performed with the pre-trained system. Inference may be performed by executing steps 2-4 above.
  • the methods and systems disclosed herein can be compared with state-of-the-art approaches in two different ways.
  • First, the invention can be compared with general- purpose methods for approximating multimodal joint posteriors in deep variational inference (i.e. without focus on the specific biomedical problems disclosed herein).
  • Second, the invention can be compared with state-of-the-art methods for multimodal TCR-peptide binding prediction.
  • a fundamental step in deep multimodal variational inference consists in approximating a distribution of a latent variable Z (encoding), conditioned by all observed input modalities.
  • Grazioli et al.: “Microbiome-based disease prediction with multimodal variational information bottlenecks", PLOS Computational Biology, 2022, operates a so-called product of experts (PoE), which approximates the variational multimodal joint posterior as a product of M unimodal posterior distributions: q(z | x_1, …, x_M) ∝ p(z) ∏_{m=1}^{M} q(z | x_m).
  • This approximation is only valid under the assumption of conditional independence between the modalities, which is not always true in real-world settings.
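For diagonal Gaussian experts, the PoE combination used by MVAE and MVIB has a closed form in which precisions add; a standard-normal prior expert is included, following Wu and Goodman. The sketch below is illustrative:

```python
import numpy as np

def product_of_experts(mus, variances):
    """Closed-form product of M diagonal Gaussian experts and a N(0, I) prior expert."""
    mus = np.vstack([np.zeros_like(mus[0])] + list(mus))
    variances = np.vstack([np.ones_like(variances[0])] + list(variances))
    precisions = 1.0 / variances
    joint_var = 1.0 / precisions.sum(axis=0)            # precisions of the experts add up
    joint_mu = joint_var * (precisions * mus).sum(axis=0)
    return joint_mu, joint_var

# Two agreeing unimodal posteriors N(1, 1): the joint mean is pulled towards the prior at 0.
mu, var = product_of_experts([np.ones(2), np.ones(2)], [np.ones(2), np.ones(2)])
```

Note that the combination weights are fixed entirely by the experts' variances: no inter-modality dependencies can be learned, which is exactly the limitation the AoD mechanism addresses.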
  • Embodiments of the present invention based on the attention of distributions (AoD) make it possible to learn relationships of conditional dependence among the various data modalities in a data-driven fashion.
  • An AoD module can, in fact, be trained to weight how much a given modality shall attend to the other ones. In presence of sufficient data, this allows for a more flexible approximation of the multimodal joint posterior.
  • A. Montemurro et al.: “NetTCR-2.0 enables accurate prediction of tcr-peptide binding by using paired tcr alpha and beta sequence data", in Communications Biology, 4(1):1-13, 2021, proposes NetTCR-2.0, a deep learning model which predicts TCR-peptide binding by jointly analyzing peptide, CDR3α and CDR3β chains.
  • NetTCR-2.0 has the drawback of not learning a latent probability distribution, but just a mapping from inputs to a predicted binding score. This means there is no way to assess uncertainty.
  • the approach according to the present disclosure makes it possible to observe the contribution of each modality to the predicted score, as a unimodal posterior is learned for each input modality.
  • Fig. 1 is a diagram schematically illustrating the general concept of an attention of distribution (AoD) mechanism for Gaussian posteriors
  • Fig. 2 is a schematic view illustrating an architecture of stochastic encoders deployed in accordance with embodiments of the present disclosure
  • Fig. 3 is a schematic view illustrating a trimodal attentive variational information bottleneck (AVIB) architecture in accordance with embodiments of the present disclosure
  • Fig. 4 is a diagram illustrating TCR-peptide distributions for datasets used in accordance with embodiments of the present disclosure
  • Fig. 5 is a diagram illustrating a length distribution of the amino acid chains for datasets used in accordance with embodiments of the present disclosure
  • Fig. 6 is a diagram illustrating a class distribution of the α+β and β datasets used in accordance with embodiments of the present disclosure
  • Fig. 7 is a table showing TCR-peptide interaction prediction results of experiments performed in accordance with embodiments of the present disclosure
  • Fig. 8 is a table showing multimodal posterior approximation results of experiments performed in accordance with embodiments of the present disclosure.
  • the TCR is a heterodimeric protein, which consists of an α- and a β-chain.
  • the structure of the α- and β-chains determines the interaction with the pMHC complex.
  • Each chain consists of three loops, referred to as complementarity determining regions 1, 2 and 3 (CDR1-3).
  • the genomic recombination of variable, diversity, and joining (V, D, J) TCR-genes determines the diversity of the CDR3s.
  • the β-chain (CDR3β) is the result of a recombination of the V, D and J genes, while the α-chain (CDR3α) derives from the recombination of the V and J genes. This implies that the CDR3βs present higher variability.
  • TCRdist (cf. P. Dash et al.: “Quantifiable predictive features define epitope-specific t cell receptor repertoires”, in Nature, 547(7661 ):89-93, 2017) computes CDR similarity-weighted distances.
  • SETE (cf. Y. Tong et al.: “Sequence-based ensemble learning approach for tcr epitope binding prediction", in Computational Biology and Chemistry, 87:107281, 2020) adopts k-mer feature spaces in combination with principal component analysis (PCA) and decision trees.
  • Various methods adopt Random Forest to operate classification (cf., e.g., N.
  • ERGO (cf. I. Springer et al.: “Prediction of specific tcr-peptide binding from large dictionaries of tcr-peptide pairs", in Frontiers in Immunology, 11:1803, 2020) is a deep learning approach that adopts long short-term memory (LSTM) networks and autoencoders to compute representations of the peptide and CDR3β.
  • the present disclosure provides a system and a method for predicting the interaction between T cell receptors (TCRs) and peptides presented on the surface of cells.
  • the system may first approximate unimodal posterior distributions over latent encodings conditioned by the input molecules. Second, the system may aggregate the predicted parameters of the posteriors in a single matrix.
  • the system may approximate a multimodal posterior distribution over latent encodings, which may then be used to estimate the probability of binding between TCR and peptides. Due to the multimodal nature of the system, embodiments of the present disclosure can account for an arbitrarily large set of heterogeneous input modalities, as will be described in detail for some use cases below.
  • Embodiments of the present disclosure use a variant of the Variational Information Bottleneck (VIB) approach, as disclosed in Alemi, A. A., Fischer, I., Dillon, J. V., and Murphy, K.: “Deep variational information bottleneck”, in arXiv preprint arXiv: 1612.00410, 2016, which is hereby incorporated by reference herein.
  • VIB leverages variational inference to construct a lower bound on the Information Bottleneck (IB) objective (as disclosed in Tishby, N., Pereira, F. C., and Bialek, W.: “The information bottleneck method”, in arXiv preprint physics/0004057, 2000, which is hereby incorporated by reference herein).
  • X, Y,Z are random variables
  • x,y,z are multidimensional instances of random variables
  • S represents a set.
  • Y be a random variable representing a ground truth label associated with an input random variable X.
  • Z be a stochastic encoding of X coming from an intermediate layer of a deep neural network and defined by a parametric encoder p(z | x; θ) representing the upstream part of such a neural model.
  • the goal according to the embodiments of the present disclosure consists in learning an encoding Z which is (a) maximally informative about Y and (b) maximally compressive about X.
  • objective (a) implies maximizing the mutual information I(Z, Y; θ) between the encoding Z and the target Y, where I(Z, Y; θ) = ∫ p(z, y; θ) log [ p(z, y; θ) / (p(z; θ) p(y)) ] dz dy (Equation 1), and objectives (a) and (b) are combined in the IB objective R_IB(θ) = I(Z, Y; θ) − β I(Z, X; θ) (Equation 2).
  • the first term on the right-hand side of Equation 2 causes Z to be predictive of Y, while the second term constrains Z to be a minimal sufficient statistic of X. β controls the trade-off between (a) and (b), and I(Z, Y; θ) is the mutual information between Z and Y parameterized by θ.
  • Equation 2 can be rewritten as the VIB objective J_VIB = (1/N) Σ_{n=1}^{N} E_{ε ∼ p(ε)} [ −log q(y_n | f(x_n, ε)) ] + β KL( p(Z | x_n) ‖ q(Z) ) (Equation 3), where ε is an auxiliary Gaussian noise variable, KL(·‖·) is the Kullback-Leibler divergence and f is a vector-valued parametric deterministic encoding function (e.g., in the context of the present disclosure, a neural network).
  • the introduction of ε constitutes the reparameterization trick (cf. Kingma, D. P. and Welling, M.: “Auto-encoding variational bayes", arXiv preprint arXiv:1312.6114, 2013), which allows writing p(z | x) dz = p(ε) dε, where z = f(x, ε) is now treated as a deterministic variable.
  • This formulation allows the noise variable to be independent of the model parameters.
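The reparameterization trick can be illustrated with a short NumPy sketch (variable names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(42)

def sample_z(mu, log_var, rng):
    """z = mu + sigma * eps with eps ~ N(0, I): the noise is drawn independently of
    the model parameters, so gradients can flow through mu and log_var."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

mu, log_var = np.array([0.0, 2.0]), np.zeros(2)   # sigma = 1 in both dimensions
samples = np.stack([sample_z(mu, log_var, rng) for _ in range(20000)])
```

Averaging many samples recovers mu, confirming that the stochasticity lives entirely in the auxiliary noise variable.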
  • the VIB objective of Equation 3 can be generalized by representing X as a collection of multimodal random input variables (or multiple input sequences)
  • the posterior of Equation 3 is actually the joint posterior p(z | x_1, …, x_M), conditioned by the M jointly available sequences.
  • the M different sequences cannot be simply treated as M different modalities.
  • the Multimodal Variational Autoencoder MVAE (cf. Wu, M. and Goodman, N.: “Multimodal generative models for scalable weakly-supervised learning”, arXiv preprint arXiv: 1802.05335, 2018) and MVIB approximate the joint posterior assuming that the M modalities are conditionally independent, given the common latent variable Z.
  • Fig. 1 is a diagram schematically illustrating the general concept of an Attention of Distribution (AoD) mechanism 100 (herein sometimes also referred to as AoE, Attention of Experts) for Gaussian posteriors according to embodiments of the present disclosure.
  • q(z | x_m; θ_m) is the stochastic Gaussian encoder 101 of the m-th modality.
  • the unimodal posteriors are modelled as Gaussian distributions with diagonal structure: e.g., following the proposals provided in Wu, M. and Goodman, N.: “Multimodal generative models for scalable weakly-supervised learning”, in arXiv preprint arXiv:1802.05335, 2018 or in Grazioli, F., Siarheyeu, R., Pileggi, G., and Meiser, A.: “Microbiome-based disease prediction with multimodal variational information bottlenecks”, in PLOS Computational Biology, 2022, which are both hereby incorporated by reference herein.
  • the dependencies between the M single-sequence posteriors and the multi-sequence joint posterior are implicitly learned by means of multi-head self-attention (as shown at 104 in Fig. 1, including the AoD module), leveraging its power of capturing multiple complex interactions in X and allowing for possibly missing sequences, in particular in accordance with the approach disclosed in A. Vaswani et al.: “Attention Is All You Need", NeurIPS 2017: AoD(P) = MultiHead(P, P, P), where P is the matrix obtained by stacking the parameters of the M unimodal posteriors.
  • MultiHead is the standard multi-head attention block defined in Vaswani et al.
  • the above equation is referred to as Attention of Distributions (AoD) in the context of the present disclosure.
  • a multi-sequence VIB which adopts AoD for modelling the multi-sequence joint posterior is referred to herein as the Attentive Variational Information Bottleneck (AVIB).
  • the AVIB objective is the objective of Equation 3 generalized to multiple input sequences, where the multi-sequence posterior p(Z | x_1, …, x_M) is modelled by means of AoD.
  • Embodiments of the present disclosure provide computer-implemented methods and systems that are configured to apply the Attentive Variational Information Bottleneck (AVIB) for tackling a fundamental problem of immuno-oncology: predicting TCR-peptide interactions. It can be demonstrated that AVIB significantly outperforms existing baseline methods.
  • AVIB may be implemented as a multimodal generalization of the Variational Information Bottleneck (VIB) capable of learning a joint representation of multiple input data modalities through self-attention.
  • the input modalities may include the peptide, the CDR3α, and the CDR3β.
  • the model may learn to predict whether the binding between peptide and the TCR takes place or not.
  • the present disclosure provides methods and systems that utilize an approach for combining unimodal posteriors in a joint multimodal posterior using self-attention, termed Attention of Distribution (AoD) herein.
  • the present disclosure provides methods and systems that use a simple, hyperparameter-free approach for the detection of out-of-distribution (OOD) amino acid chains, which leverages the multimodal posterior distribution over latent encodings learned by AVIB.
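As one concrete possibility for such a hyperparameter-free score (an assumption for illustration, not necessarily the exact embodiment), the divergence of the inferred posterior from the latent prior N(0, I) can serve as an OOD score, since it has a closed form for diagonal Gaussians:

```python
import numpy as np

def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ); larger values suggest the encoding
    lies far from the prior, i.e. a potentially out-of-distribution sample."""
    return 0.5 * np.sum(np.exp(log_var) + mu ** 2 - 1.0 - log_var)

in_dist = kl_to_standard_normal(np.zeros(4), np.zeros(4))      # posterior equals the prior
far_out = kl_to_standard_normal(np.full(4, 3.0), np.zeros(4))  # mean far from the prior
```

Because the score is a closed-form function of the posterior parameters the model already produces, it introduces no additional hyperparameters.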
  • for each input modality, the dedicated stochastic encoder 101 may have the form q(z | x; θ) = N(z; μ(x), σ²(x)), where μ(x) and σ²(x) are the two output branches 105 of the neural stochastic encoder 101, as schematically illustrated in Fig. 2.
  • the neural stochastic encoder 101 of Fig. 2 presents an architecture which is inspired by NetTCR-2.0 (cf. A. Montemurro et al.: “NetTCR-2.0 enables accurate prediction of tcr-peptide binding by using paired tcrα and β sequence data", in Communications Biology, 4(1):1-13, 2021). It encodes the peptides by using BLOSUM50 encoding 110 and operates 1D convolutions 114 of the encoded peptides 112 with kernel 116 sizes 1, 3, 5, 7 and 9. After the convolutions 114, 1D max pooling 118 is operated, followed by ReLU activation functions 120. The obtained vectors are then concatenated, as shown at 122.
  • two parallel fully-connected layers 105 output two vectors of size d_z for μ and σ², where d_z is the size of the bottleneck, i.e. the dimension of Z. d_z can be set via hyperparameter optimization. For a more stable computation, one may let the second branch model the logarithm of the variance, log σ².
  • the corresponding decoder consists of a simple neural network with fully-connected layers and ReLU activations. This implements the binary classification.
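The stochastic encoder described above (BLOSUM-encoded input, parallel 1D convolutions with kernel sizes 1, 3, 5, 7 and 9, max pooling, ReLU, concatenation, and two parallel heads for μ and log σ²) can be sketched with NumPy. Single-output-channel convolutions and random, untrained weights are simplifying assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)

def conv1d_valid(x, kernel):
    """Valid 1D convolution of an encoded sequence x (C x N) with one C x k kernel."""
    n, k = x.shape[1], kernel.shape[1]
    return np.array([(x[:, i:i + k] * kernel).sum() for i in range(n - k + 1)])

def stochastic_encoder(x, d_z, rng):
    feats = []
    for k in (1, 3, 5, 7, 9):                    # parallel convolution branches
        kernel = rng.normal(scale=0.1, size=(x.shape[0], k))
        pooled = conv1d_valid(x, kernel).max()   # 1D max pooling
        feats.append(max(0.0, pooled))           # ReLU activation
    h = np.array(feats)                          # concatenation of the branch outputs
    W_mu = rng.normal(size=(d_z, h.size))
    W_lv = rng.normal(size=(d_z, h.size))
    return W_mu @ h, W_lv @ h                    # two parallel heads: mu and log-variance

x = rng.normal(size=(20, 12))                    # a BLOSUM-encoded chain of 12 residues
mu, log_var = stochastic_encoder(x, d_z=4, rng=rng)
```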
  • Fig. 3 is a schematic view illustrating a trimodal Attentive Variational Information Bottleneck (AVIB) system in accordance with embodiments of the present disclosure.
  • the system comprises parametric encoders 101 , the AoD (or AoE) mechanism 104 described in detail above, and parametric decoders 124.
  • the latent prior may be treated as a d_z-dimensional spherical Gaussian distribution.
  • in an embodiment, the trade-off parameter β may be set via hyperparameter optimization.
  • the networks may be trained using the Adam or Stochastic Gradient Descent optimizers.
  • the optimizer may implement a learning rate of 10⁻³ and L2 weight decay. The batch size may be set to 4096. A drop-out rate of 0.3 may be used at training time.
  • a cosine annealing learning rate scheduler with a period of 10 epochs may be adopted (cf. I. Loshchilov et al.: “SGDR: Stochastic gradient descent with warm restarts", arXiv preprint arXiv:1608.03983, 2016).
  • the training may be performed for 200 epochs and, in order to avoid overfitting, the best model may be selected by saving the weights corresponding to the epoch where the AUROC is maximum on the validation set.
  • the validation set may be obtained via 80/20 stratified random split of the training set. Training and test sets may be obtained via a non-stratified 80/20 split of the whole dataset. Experiments may be repeated 5 times with different training/test splits to ensure unbiased performance evaluation.
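The cosine annealing schedule just described, with a 10-epoch period and base learning rate 10⁻³, can be computed as follows. This is a pure-Python sketch of the warm-restart formula; the cited work additionally supports decaying restarts, which are omitted here:

```python
import math

def cosine_annealing_lr(epoch, base_lr=1e-3, period=10):
    """Learning rate restarts to base_lr every `period` epochs and decays
    along a half cosine in between (warm restarts)."""
    t = epoch % period
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * t / period))
```

For example, the rate starts at 10⁻³, reaches half that value midway through each period, and restarts at every multiple of 10 epochs.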
  • MIRA set is publicly available in the NetTCR-2.0 repository: https://github.com/mnielLab/NetTCR-2.0/tree/main/data.
  • 271,366 human TCR-peptide samples were available. The data were organized by creating the following datasets: α+β set: 117,753 samples out of 271,366 present peptide information, together with both CDR3α and CDR3β chains.
  • this subset is referred to as the α+β set.
  • the ground truth label is a binary variable which represents whether binding between peptide and TCR chains takes place.
  • Human TCR set: The totality of the human TCR-peptide data (i.e. α+β set ∪ β set) is referred to as the Human TCR set.
  • Non-human TCR set: In addition to the human TCR data, 5036 non-human TCR samples were extracted from the VDJdb database, which were used as OOD (out-of-distribution) samples. These samples come from mice and macaques and present peptide and CDR3β information. These samples are referred to as the Non-human TCR set.
  • Human MHC set: A second set of OOD samples was created, composed of 463,684 peptide-MHC pairs.
  • the peptide sequences are taken from the Human TCR set, i.e. the peptide information is shared between the in-distribution and OOD sets.
  • the MHC sequences are amino acid chains corresponding to human MHC alleles (The MHC sequences were extracted from the PUFFIN repository: https://github.com/gifford-lab/PUFFIN). These samples are referred to as Human MHC set.
  • Fig. 4 depicts the distributions of the human TCR data, i.e. peptide, CDR3α and CDR3β distributions, for both the α+β set and the β set.
  • a point on the x- axis represents one unique chain of amino acids.
  • the y-axis represents how many samples present that specific chain. Samples are sorted by count considering the α+β set. It is possible to observe that the two datasets have similar peptide distributions, but present different CDR3β sequences.
  • Fig. 5 depicts a length distribution of the datasets used in the context of the present disclosure.
  • the length is the number of amino acids which constitute the peptides, CDR3α, CDR3β and MHC molecules. All these types of molecules are sequences of amino acids.
  • Fig. 6 depicts the class distributions of the β set and the α+β set.
  • the class distributions of the Non-human TCR set and Human MHC set are not reported, as they were only used for OOD (out-of-distribution) detection experiments, and not for TCR-peptide interaction prediction.
  • TCR data presents many more non-binding samples (0), compared to binding ones (1).
  • This class imbalance may be handled by means of balanced batch sampling, i.e. when a batch is sampled at training time, it may be ensured that the numbers of binding and non-binding samples are essentially equal.
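Balanced batch sampling can be sketched with the standard library as follows; the sample pools below are hypothetical stand-ins for the actual datasets:

```python
import random

def balanced_batch(binding, non_binding, batch_size, rng):
    """Draw a batch with essentially equal numbers of binding (1) and non-binding (0)
    samples, regardless of how imbalanced the underlying pools are."""
    half = batch_size // 2
    batch = rng.choices(binding, k=half) + rng.choices(non_binding, k=batch_size - half)
    rng.shuffle(batch)
    return batch

# Heavily imbalanced pools of (sample_id, label) pairs, as in the TCR data.
binding = [(i, 1) for i in range(10)]
non_binding = [(i, 0) for i in range(1000)]
batch = balanced_batch(binding, non_binding, batch_size=8, rng=random.Random(0))
```

Sampling with replacement from the minority class is one simple design choice; per-class sampling without replacement across an epoch is a common alternative.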
  • Peptides, CDR3α and CDR3β chains are sequences of amino acids.
  • the 20 amino acids translated by the genetic code are in general represented as English alphabet letters.
  • the amino acid sequences are pre-processed using BLOSUM50 encodings (cf. Henikoff, S. and Henikoff, J. G.: “Amino acid substitution matrices from protein blocks", in Proceedings of the National Academy of Sciences, 89(22):10915-10919, 1992).
  • This allows representing a sequence of N amino acids as a 20 × N matrix.
  • the elements of the matrix are integers that represent similarity scores among amino acids.
  • feature standardization may be operated by removing the mean and scaling to unit variance.
  • zero-padding may be operated after the BLOSUM50 encoding. This ensures that all matrices have shape 20 × N_max, where N_max is the length of the longest chain.
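The encoding-plus-padding step can be sketched as follows. The three-letter score table below is a hypothetical stand-in for the full 20×20 BLOSUM50 matrix, so the numbers are illustrative only:

```python
import numpy as np

# Hypothetical 3-letter stand-in for the 20x20 BLOSUM50 substitution matrix.
ALPHABET = ["A", "R", "N"]
SCORES = {
    "A": {"A": 5, "R": -2, "N": -1},
    "R": {"A": -2, "R": 7, "N": -1},
    "N": {"A": -1, "R": -1, "N": 7},
}

def encode_chain(seq, n_max):
    """One substitution-score column per residue, zero-padded to the longest chain length."""
    cols = np.array([[SCORES[a][s] for a in ALPHABET] for s in seq], dtype=float).T
    padded = np.zeros((len(ALPHABET), n_max))
    padded[:, :cols.shape[1]] = cols
    return padded

enc = encode_chain("ARN", n_max=5)   # shape (3, 5): 3 padding-free columns, 2 zero columns
```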
  • AVIB is benchmarked against ERGO II (cf. Springer et al.: “Prediction of specific tcr-peptide binding from large dictionaries of tcr-peptide pairs", in Frontiers in Immunology, 11:1803, 2020) and NetTCR-2.0 (cf. Montemurro et al.: “NetTCR-2.0 enables accurate prediction of tcr-peptide binding by using paired tcrα and β sequence data", in Communications Biology, 4(1):1-13, 2021). Additionally, AVIB is benchmarked against the LUPI-SVM (cf.
  • the method according to embodiments disclosed herein obtains ~4% higher AUROC and ~8% higher AUPR compared to the best baseline, ERGO II.
  • AVIB outperforms ERGO II by achieving ~3% higher AUROC and ~4% higher AUPR.
  • AVIB compares with ERGO II.
  • in the trimodal setting, when also considering the α-chain, AVIB obtains ~1% higher AUROC, ~4% higher AUPR and ~6% higher F1 score.
  • This section provides a comparison of two techniques for approximating Gaussian joint posteriors: AoD as described herein, which is employed by AVIB, and PoE, which is employed by MVIB (cf. Grazioli et al.: “Microbiome-based disease prediction with multimodal variational information bottlenecks", PLOS Computational Biology, 2022).
  • Experiments and benchmark were performed in the bimodal and trimodal settings.
  • Table 2 shown in Fig. 8 presents the TCR-peptide interaction prediction results on the α+β set.
  • AoD achieves best results in both the bimodal and trimodal settings.
  • the AUPR score, in particular, improves on PoE by up to 2%.
  • Tumors harbor mutations, some of which can be recognized by the patients' TCRs.
  • Therapeutic cancer vaccines consist of patient- and therefore tumor-specific mutations meant to stimulate the patient’s immune response against their own tumor.
  • A key question is whether the patient's T cell receptor repertoire is able to recognize the immunogenic mutations (neoantigens), i.e. whether there are TCRs present that are able to bind to the associated peptides. Being able to reliably predict whether TCRs bind to neoantigens can help to improve the selection process for neoantigen candidates that will be used in a therapeutic cancer vaccine.
  • Tumor antigens shared by patients are a highly interesting target for cancer immunotherapy, especially adoptive T cell therapy, where a tumor-recognizing TCR is introduced in the patient’s T cells.
  • These TCRs need to fulfil demanding requirements regarding their safety and efficacy, as shared antigens are usually not strictly tumor-specific. Therefore, prediction of TCR-peptide interaction allows the modelling and engineering of suitable TCRs, which can be evaluated for their safety and efficacy before any wet lab experiments are necessary.
  • TCRs can be used to target tumors or other types of cells depending on the MHC presented peptides.
  • TCRs usually do not bind to one single peptide but to multiple peptides.
  • the system may also be used to predict peptide-MHC binding and presentation.
  • the set of MHC molecules may be taken from a specific patient, and the set of peptides may be based on mutations present in the cancerous cells of the patient. For all (peptide, MHC) combinations, predictions are made.
  • the first modality is the peptide
  • the second modality is the MHC molecule.
  • the predicted scores can represent either the likelihood of having binding between the two molecules, or the likelihood that the peptide-MHC complex is presented on the surface of the cell.
  • the peptides with the highest likelihood of being presented are good candidates to be included in a personalized cancer vaccine for that patient.
  • the predicted candidates are the output of the peptide-MHC interaction prediction system. In modern cancer vaccine design systems, these candidates may then be processed by further downstream modules, e.g. to predict TCR recognition, or to filter out peptides which could trigger an autoimmune response.
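The selection step over all (peptide, MHC) combinations can be sketched as follows, with a hypothetical `predict_presentation` score standing in for the trained model's output:

```python
from itertools import product

def rank_candidates(peptides, mhcs, predict_presentation, top_k=3):
    """Score every (peptide, MHC) combination and keep the pairs most likely presented."""
    scored = [(p, m, predict_presentation(p, m)) for p, m in product(peptides, mhcs)]
    return sorted(scored, key=lambda t: t[2], reverse=True)[:top_k]

# Toy stand-in score: letter overlap between peptide and an MHC pseudo-sequence.
def toy_score(peptide, mhc):
    return len(set(peptide) & set(mhc)) / len(set(peptide) | set(mhc))

candidates = rank_candidates(["SIINFEKL", "GILGFVFTL"], ["HLA-A0201", "HLA-B0702"], toy_score)
```

The top-ranked pairs would then be handed to the downstream modules, e.g. TCR-recognition prediction or autoimmunity filtering.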
  • AoE can be used as a building block of a broad spectrum of multimodal learning systems, beyond the TCR-specific setting, and also with non-molecular data. Approximating a multimodal posterior from heterogeneous unimodal posteriors can in fact be particularly helpful in sensor fusion for combining various types of sensor data. RGB, infrared and depth images, as well as radar and point cloud information can be combined using AoE to detect, segment and/or classify objects in an environment with higher confidence.
  • the output of the sensor fusion module is provided to, e.g., the planning algorithms, or to additional scene understanding algorithms.
  • the system proposed in the present disclosure can be used to predict if neoantigen vaccine candidates are in fact recognized by a patient’s T cells.
  • most existing immune profiling approaches only leverage a so-called distance-from-self score to predict whether a presented neoantigen is in fact recognized by specific T cells. This is based on the optimistic assumption that if a neoantigen is sufficiently different from the healthy genome of the patient, the corresponding neoepitopes will be recognized by some T cell. This is however not always true in practice.
  • predicting the binding between peptides and TCRs in accordance with the methods disclosed herein can be a potential key factor in the development of more effective therapies.
  • embodiments of the present disclosure provide a multimodal generalization of the Variational Information Bottleneck (VIB), which is sometimes briefly denoted AVIB herein and which leverages multi-head self-attention to implicitly approximate the posterior distribution over latent encodings conditioned by multiple input modalities.
  • AVIB may be applied to the TCR-peptide interaction prediction problem, an important challenge in immuno-oncology. It has been shown that the methods disclosed herein significantly improve on the baselines ERGO II and NetTCR-2.0.


Abstract

The invention presents a system and a method for predicting the interaction between T cell receptors (TCRs) and peptides presented on the surface of cells. First, the system is configured to approximate unimodal posterior distributions over latent encodings conditioned by the input molecules. Second, the system is configured to aggregate the predicted parameters of the posteriors in a single matrix. Third, using a mechanism based on multi-head self-attention, the system is configured to approximate a multimodal posterior distribution over latent encodings, which is then used to estimate the probability of binding between TCR and peptides. Due to the multimodal nature of the system, the invention can account for an arbitrarily large set of heterogeneous input modalities.

Description

METHOD AND SYSTEM FOR PREDICTING TCR (T CELL RECEPTOR)-PEPTIDE INTERACTIONS
The present invention relates to a computer system and a computer-implemented method of predicting an interaction or binding between T cell receptors (TCRs), and peptides presented on the surface of cells, or peptides presented on the surface of cells in complex with major histocompatibility complex (MHC) molecules.
In the human immune system, T cells monitor the health status of cells by identifying foreign peptides on their surface. Peptides are presented on the surface of cells in complex with major histocompatibility complex (MHC) molecules. Depending on the cell type, two different MHC molecules are found: class I and class II. The T cell receptors (TCRs) allow the recognition of the anomalous presented peptides. The binding of TCRs with peptide-MHC (pMHC) complexes - also known as TCR recognition - constitutes a necessary step for immune response. Only if TCR recognition takes place can cytokines be released, leading to the death of a target cell. Understanding the rules that govern TCR recognition represents a fundamental step towards the development of personalized and more effective cancer treatments and vaccines.
It is an object of the present invention to improve and further develop a method and system of the initially described type in such a way that precise and effective predictions of binding between T cell receptors (TCRs) and peptides presented on the surface of cells are possible. This problem, in the computational biology literature, is often referred to as TCR-peptide interaction (or binding) prediction. If the major histocompatibility complex (MHC) is also considered together with the peptide (pMHC), the present disclosure refers to TCR-pMHC binding prediction.
In accordance with the invention, the aforementioned object is accomplished by a computer-implemented method of predicting an interaction or binding between T cell receptors, TCRs, and peptides presented on the surface of cells or between peptides presented on the surface of cells and major histocompatibility complex, MHC, molecules, the method comprising a training stage, including: a) providing a training dataset of samples, wherein each sample comprises a multimodal tuple of molecules, and wherein each sample has assigned a ground truth label; b) inferring, for each sample of the training dataset using parametric encoders, a unimodal posterior distribution over latent encodings conditioned by the respective input molecules, and combining the parameters of the inferred unimodal posteriors in a single matrix; c) implicitly learning dependencies among the inferred unimodal posteriors by applying a trainable parametric function leveraging multi-head self-attention onto the matrix and, based thereupon, approximating a multimodal joint posterior distribution over latent encodings; d) sampling the approximated multimodal joint posterior distribution and using parametric decoders to predict ground truth labels for the sampled inputs; and e) updating the parameters of the encoders, of the function leveraging multi-head self-attention and of the decoders by minimizing an objective function that accounts for the error between the ground truth labels and the predictions obtained by step d).
The present disclosure provides a system and a method for predicting the interaction between T cell receptors (TCRs) and peptides presented on the surface of cells. First, the system is configured to approximate unimodal posterior distributions over latent encodings conditioned by the input molecules. Second, the system is configured to aggregate the predicted parameters of the posteriors in a single matrix. Third, using a mechanism based on multi-head self-attention, the system is configured to approximate a multimodal posterior distribution over latent encodings, which is then used to estimate the probability of binding between TCR and peptides. Due to the multimodal nature of the system, the invention can account for an arbitrarily large set of heterogeneous input modalities.
According to embodiments, for each input tuple of M molecules, e.g. (peptide, CDR3β, CDR3α, MHC), by means of M derivable parametric encoders (i.e. each data modality has a dedicated encoder), M unimodal posterior distributions over latent encodings conditioned by the respective input molecules may be determined. For each input molecule, the parameters of the posteriors may be approximated and stored in a vector. The M vectors of the parameters of the M unimodal posteriors may be combined (e.g. stacked) in a matrix which contains them all. According to embodiments, the matrix may serve as input to a trainable parametric function which leverages multi-head self-attention. This function allows the various unimodal posteriors to attend to each other, estimates their relative importance and implicitly learns inter-dependencies among them. In the present disclosure, this mechanism is sometimes referred to as AoD (Attention-of-Distributions). A multimodal posterior distribution over latent encodings is eventually approximated and output by the system.
According to embodiments, the method may include a training stage, in which the system is trained, for instance by performing the following steps:
1. Collecting a dataset of samples. Each sample may consist in a multimodal tuple of molecules, e.g. (peptide, CDR3β, CDR3α) for TCR-peptide interaction prediction, or (peptide, MHC) for peptide-MHC binding prediction. For each sample, a ground truth label is provided.
2. For each sample in the dataset, inferring a unimodal posterior distribution over latent encodings, conditioned by the input molecule. The parameters of the inferred posterior may be combined in a single matrix.
3. Using AoD to approximate a multimodal posterior.
4. Sampling from the multimodal posterior and inferring the ground truth label associated to the input by means of a parametric decoder.
5. Updating the parameters of the encoders, AoD and decoder by minimizing an objective function which accounts for the error between ground truth label and predictions of step 4.
6. Repeating steps 2-5 until convergence of a validation metric.
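The forward pass of steps 2-4 can be sketched as follows. This is a minimal NumPy illustration only: simple linear maps stand in for the parametric encoders, and a plain mean stands in for the attention-based AoD aggregation of step 3; all function names and shapes are illustrative, not the disclosure's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(x, W_mu, W_logvar):
    """Step 2: a toy linear stand-in for a parametric encoder, producing the
    parameters of a unimodal diagonal-Gaussian posterior (the disclosure
    uses CNN encoders; see Fig. 2)."""
    return x @ W_mu, x @ W_logvar

def aggregate(params):
    """Step 3 stand-in: a plain mean over the unimodal posterior parameters.
    The disclosure uses the attention-based AoD module here instead."""
    mus, logvars = zip(*params)
    return np.mean(mus, axis=0), np.mean(logvars, axis=0)

def predict_binding(xs, enc_weights, w_dec):
    """Steps 2-4: encode each modality, aggregate, sample the multimodal
    posterior via the reparameterization trick, decode to a probability."""
    mu, logvar = aggregate([encode(x, *w) for x, w in zip(xs, enc_weights)])
    z = mu + np.exp(0.5 * logvar) * rng.standard_normal(mu.shape)
    return 1.0 / (1.0 + np.exp(-(z @ w_dec)))  # sigmoid "decoder"
```

In step 5, the gradients of the objective would then be backpropagated through this forward pass to update the encoder, aggregation and decoder parameters.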
According to embodiments, the method may include an inference stage in which inference is performed with the pre-trained system. Inference may be performed by executing steps 2-4 above.
The methods and systems disclosed herein can be compared with state-of-the-art approaches in two different ways. First, the invention can be compared with general-purpose methods for approximating multimodal joint posteriors in deep variational inference (i.e. without focus on the specific biomedical problems disclosed herein). Second, the invention can be compared with state-of-the-art methods for multimodal TCR-peptide binding prediction.
Deep Multimodal Variational Inference
A fundamental step in deep multimodal variational inference consists in approximating a distribution of a latent variable Z (encoding), conditioned by all observed input modalities.
Grazioli et al.: “Microbiome-based disease prediction with multimodal variational information bottlenecks”, PLOS Computational Biology, 2022, operates a so-called Product of Experts (PoE), which consists in approximating the variational multimodal joint posterior as a product of a prior and M unimodal posterior distributions:

$$p(z \mid x_1, \ldots, x_M) \propto p(z) \prod_{m=1}^{M} \tilde{q}(z \mid x_m).$$

This approximation is only valid under the assumption of conditional independence between the modalities, which is not always true in real-world settings. Embodiments of the present invention based on the Attention of Distributions (AoD) allow to learn relationships of conditional dependence among the various data modalities in a data-driven fashion. An AoD module can, in fact, be trained to weight how much a given modality shall attend to the other ones. In presence of sufficient data, this allows for a more flexible approximation of the multimodal joint posterior.
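For diagonal Gaussian experts, the PoE combination has a well-known closed form (precision-weighted averaging). The following NumPy sketch, with illustrative function names, combines M unimodal posteriors and a standard-Gaussian prior:

```python
import numpy as np

def product_of_experts(mus, logvars):
    """Combine M unimodal diagonal-Gaussian posteriors, each given by a row
    of `mus`/`logvars`, plus a standard-Gaussian prior N(0, I), into one
    joint Gaussian via a product of experts."""
    # prepend the standard-Gaussian prior N(0, I)
    mus = np.vstack([np.zeros_like(mus[0]), mus])
    logvars = np.vstack([np.zeros_like(logvars[0]), logvars])
    precisions = np.exp(-logvars)             # 1 / sigma^2 per expert
    joint_var = 1.0 / precisions.sum(axis=0)  # product of Gaussians
    joint_mu = joint_var * (mus * precisions).sum(axis=0)
    return joint_mu, joint_var
```

Note that the joint precision is the sum of the unimodal precisions, so every expert can only sharpen the joint posterior, never broaden it; the conditional-independence assumption is baked into this functional form.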
Shi et al.: “Variational mixture-of-experts autoencoders for multi-modal deep generative models”, NeurIPS, 2019 proposes a so-called Mixture of Experts (MoE), i.e. the multimodal joint posterior is approximated as a uniform mixture of unimodal Gaussian distributions:

$$p(z \mid x_1, \ldots, x_M) = \frac{1}{M} \sum_{m=1}^{M} q(z \mid x_m).$$

A possible drawback of this approach compared to AoD according to embodiments of the present disclosure is that the multimodal posterior has a different and more complex distribution compared to the unimodal posteriors (a mixture of Gaussians is not itself Gaussian). This makes the inference step less intuitive.
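Sampling the MoE joint posterior makes the contrast with PoE concrete: one picks a single unimodal expert at random and samples from it, rather than combining all experts into one Gaussian. A minimal NumPy sketch (illustrative names, uniform mixture weights):

```python
import numpy as np

def mixture_of_experts_sample(mus, logvars, rng):
    """Sample the MoE joint posterior: pick one unimodal Gaussian expert
    (one row of `mus`/`logvars`) uniformly at random, then sample from it
    with the reparameterization z = mu + sigma * eps."""
    m = rng.integers(len(mus))
    eps = rng.standard_normal(mus[m].shape)
    return mus[m] + np.exp(0.5 * logvars[m]) * eps
```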
TCR-peptide binding prediction
Montemurro et al.: “Nettcr-2.0 enables accurate prediction of tcr-peptide binding by using paired tcr alpha and beta sequence data”, in Communications Biology, 2021 proposes NetTCR-2.0, a deep learning model which can predict TCR-peptide binding by jointly analyzing the peptide, CDR3α and CDR3β chains. Compared to the methods disclosed herein, NetTCR-2.0 has the drawback of not learning a latent probability distribution, but just a mapping from inputs to a predicted binding score. This means there is no way to assess uncertainty. Additionally, the approach according to the present disclosure allows to observe the contribution of each modality to the predicted score, as a unimodal posterior is learned for each input modality.
There are several ways how to design and further develop the teaching of the present invention in an advantageous way. To this end, it is to be referred to the dependent claims on the one hand and to the following explanation of preferred embodiments of the invention by way of example, illustrated by the figure on the other hand. In connection with the explanation of the preferred embodiments of the invention by the aid of the figure, generally preferred embodiments and further developments of the teaching will be explained. In the drawing
Fig. 1 is a diagram schematically illustrating the general concept of an attention of distribution (AoD) mechanism for Gaussian posteriors,
Fig. 2 is a schematic view illustrating an architecture of stochastic encoders deployed in accordance with embodiments of the present disclosure,
Fig. 3 is a schematic view illustrating a trimodal attentive variational information bottleneck (AVIB) architecture in accordance with embodiments of the present disclosure,
Fig. 4 is a diagram illustrating TCR-peptide distributions for datasets used in accordance with embodiments of the present disclosure,
Fig. 5 is a diagram illustrating a length distribution of the amino acid chains for datasets used in accordance with embodiments of the present disclosure,
Fig. 6 is a diagram illustrating a class distribution of the α + β and β datasets used in accordance with embodiments of the present disclosure,

Fig. 7 is a table showing TCR-peptide interaction prediction results of experiments performed in accordance with embodiments of the present disclosure, and
Fig. 8 is a table showing multimodal posterior approximation results of experiments performed in accordance with embodiments of the present disclosure.
The TCR is a heterodimeric protein, which consists of an α- and a β-chain. The structure of the α- and β-chains determines the interaction with the pMHC complex. Each chain consists of three loops, referred to as complementarity determining regions 1, 2 and 3 (CDR1-3). According to recent studies (for reference, see N. L. La Gruta et al.: “Understanding the drivers of mhc restriction of t cell receptors”, in Nature Reviews Immunology, 18(7):467-478, 2018), it is believed that the CDR3 loops primarily interact with the peptide of a given pMHC complex, while the CDR1 and CDR2 loops interact with the MHC molecule. Hence, the CDR3 loops are primarily responsible for the peptide specificity.
The genomic recombination of variable, diversity, and joining (V, D, J) TCR genes determines the diversity of the CDR3s. The β-chain (CDR3β) is the result of a recombination of the V, D and J genes, while the α-chain (CDR3α) derives from the recombination of the V and J genes. This implies that the CDR3βs present higher variability.
A broad spectrum of recent studies investigates the prediction of TCR-pMHC interactions. Most leverage data from the Immune Epitope Database (IEDB), VDJdb, and McPAS-TCR. These databases mainly contain CDR3β data and lack information on CDR3α.
Various recent studies have demonstrated that both the α- and β-chains carry information on the specificity of the TCR toward its cognate pMHC target. Single-cell (SC) technology is required to investigate the pMHC specificity on paired α-/β-chains. SC is expensive. Hence, the amount of publicly available data with both α- and β-chains is scarce. Several recent works have investigated TCR-pMHC and TCR-peptide interaction prediction. Various proposed approaches operate simple CDR3β alignment (cf. E. Wong et al.: “Trav1-2 cd8 t-cells including oligoclonal expansions of mait cells are enriched in the airways in human tuberculosis”, in Commun Biol 2:203, 2019). TCRdist (cf. P. Dash et al.: “Quantifiable predictive features define epitope-specific t cell receptor repertoires”, in Nature, 547(7661):89-93, 2017) computes CDR similarity-weighted distances. SETE (cf. Y. Tong et al.: “Sequence-based ensemble learning approach for tcr epitope binding prediction”, in Computational Biology and Chemistry, 87:107281, 2020) adopts k-mer feature spaces in combination with principal component analysis (PCA) and decision trees. Various methods adopt Random Forests to operate classification (cf., e.g., N. De Neuter et al.: “On the feasibility of mining cd8+ t cell receptor patterns underlying immunogenic peptide recognition”, in Immunogenetics, 70(3):159-168, 2018). ImRex (cf. P. Moris et al.: “Treating biomolecular interaction as an image classification problem - a case study on t-cell receptor-epitope recognition prediction”, in bioRxiv, 2019) tackles the problem with a method based on convolutional neural networks (CNNs). TCRGP (cf. E. Jokinen et al.: “Determining epitope specificity of t cell receptors with tcrgp”, in bioRxiv, pp. 542332, 2019) is a classification method which leverages a Gaussian process. ERGO (cf. I. Springer et al.: “Prediction of specific tcr-peptide binding from large dictionaries of tcr-peptide pairs”, in Frontiers in Immunology, 11:1803, 2020) is a deep learning approach that adopts long short-term memory (LSTM) networks and autoencoders to compute representations of the peptide and CDR3β. ERGO II (cf. I. Springer et al.: “Contribution of t cell receptor alpha and beta cdr3, mhc typing, v and j genes to peptide binding prediction”, in Frontiers in Immunology, 12, 2021) is an updated version of ERGO which considers additional input modalities, i.e. the CDR3α sequence, V and J genes, MHC and T cell type. NetTCR-1.0 (cf. V. I. Jurtz et al.: “Nettcr: sequence-based prediction of tcr binding to peptide-mhc complexes using convolutional neural networks”, in bioRxiv, pp. 433706, 2018) and NetTCR-2.0 (cf. A. Montemurro et al.: “Nettcr-2.0 enables accurate prediction of tcr-peptide binding by using paired tcr alpha and beta sequence data”, in Communications Biology, 4(1):1-13, 2021) propose a simple 1D CNN-based model, integrating peptide and CDR3 sequence information for the prediction of TCR-peptide specificity.

The present disclosure provides a system and a method for predicting the interaction between T cell receptors (TCRs) and peptides presented on the surface of cells. According to an embodiment, the system may first approximate unimodal posterior distributions over latent encodings conditioned by the input molecules. Second, the system may aggregate the predicted parameters of the posteriors in a single matrix. Third, using a mechanism based on multi-head self-attention (e.g., following the mechanism described in A. Vaswani et al.: “Attention Is All You Need”, NeurIPS 2017, which is hereby incorporated by reference herein), the system may approximate a multimodal posterior distribution over latent encodings, which may then be used to estimate the probability of binding between TCRs and peptides.
Due to the multimodal nature of the system, embodiments of the present disclosure can account for an arbitrarily large set of heterogeneous input modalities, as will be described in detail for some use cases below.
Embodiments of the present disclosure use a variant of the Variational Information Bottleneck (VIB) approach, as disclosed in Alemi, A. A., Fischer, I., Dillon, J. V., and Murphy, K.: “Deep variational information bottleneck”, in arXiv preprint arXiv:1612.00410, 2016, which is hereby incorporated by reference herein. VIB leverages variational inference to construct a lower bound on the Information Bottleneck (IB) objective (as disclosed in Tishby, N., Pereira, F. C., and Bialek, W.: “The information bottleneck method”, in arXiv preprint physics/0004057, 2000, which is hereby incorporated by reference herein). Although those skilled in the art are assumed to be sufficiently familiar with these concepts, the basic principles are described in the following for the ease of understanding. In the present disclosure, the following notation is adopted: $X, Y, Z$ are random variables; $x, y, z$ are multidimensional instances of random variables; $f_\theta$ and $g_\phi$ are functions and $p_\theta$ and $q_\phi$ are probability distributions parametrized by vectors of parameters $\theta$ and $\phi$, respectively; $\mathcal{S}$ represents a set.
Let Y be a random variable representing a ground truth label associated with an input random variable X. Let Z be a stochastic encoding of X coming from an intermediate layer of a deep neural network and defined by a parametric encoder $p_\theta(z \mid x)$ representing the upstream part of such neural model. Following the general Information Bottleneck (IB) approach, the goal according to the embodiments of the present disclosure consists in learning an encoding Z which is (a) maximally informative about Y and (b) maximally compressive about X. Following an information theoretic approach, objective (a) implies maximizing the mutual information $I(Z, Y; \theta)$ between the encoding Z and the target Y, where:

$$I(Z, Y; \theta) = \int dz \, dy \, p_\theta(z, y) \log \frac{p_\theta(z, y)}{p_\theta(z) \, p_\theta(y)}. \quad (1)$$

A trivial solution for maximizing Equation 1 above would be the identity Z = X. This would ensure a maximally informative representation, but (b) places a constraint on Z. In fact, due to (b), one wants to “forget” as much information as possible about X. This leads to the IB objective

$$\max_\theta \; I(Z, Y; \theta) - \beta \, I(Z, X; \theta), \quad (2)$$

where $\beta \geq 0$ is a Lagrange multiplier. The first term on the right hand side of Equation 2 causes Z to be predictive of Y, while the second term constrains Z to be a minimal sufficient statistic of X. $\beta$ controls the trade-off between (a) and (b), and $I(Z, Y; \theta)$ is the mutual information between Z and Y parameterized by $\theta$.
As derived for the VIB, assuming $q_\phi(y \mid z)$ and $r(z)$ are variational approximations of the true $p(y \mid z)$ and $p(z)$, respectively, Equation 2 can be rewritten as:

$$\mathcal{L}_{VIB} = \mathbb{E}_{\epsilon \sim p(\epsilon)} \left[ -\log q_\phi\big(y \mid f_\theta(x, \epsilon)\big) \right] + \beta \, \mathrm{KL}\big( p_\theta(z \mid x) \,\|\, r(z) \big), \quad (3)$$

where $\epsilon \sim \mathcal{N}(0, I)$ is an auxiliary Gaussian noise variable, KL is the Kullback-Leibler divergence and $f_\theta$ is a vector-valued parametric deterministic encoding function (e.g., in the context of the present disclosure, a neural network). The introduction of $\epsilon$ consists in the reparameterization trick (cf. Kingma, D. P. and Welling, M.: “Auto-encoding variational bayes”, arXiv preprint arXiv:1312.6114, 2013), which allows to write $p_\theta(z \mid x)\,dz = p(\epsilon)\,d\epsilon$, where $z = f_\theta(x, \epsilon)$ is now treated as a deterministic variable. This formulation allows the noise variable to be independent of the model parameters. This way, it is easy to compute gradients of the objective in Equation 3 and optimize via backpropagation. In the present disclosure, embodiments of the proposed method let the variational approximate posteriors be multivariate Gaussian distributions with a diagonal covariance structure, $z \sim p_\theta(z \mid x) = \mathcal{N}\big(z; \mu, \mathrm{diag}(\sigma^2)\big)$; a valid reparameterization is $z = \mu + \sigma \odot \epsilon$.

With the variational distribution $r(z)$ set to a standard multivariate Gaussian distribution, $r(z) = \mathcal{N}(0, I)$, as done in practice, VIB can be viewed as a variational encoder-decoder analogous to the Variational Auto-Encoder, VAE (cf. again Kingma, D. P. and Welling, M.: “Auto-encoding variational bayes”, arXiv preprint arXiv:1312.6114, 2013), in which the latent encoding distribution $p_\theta(z \mid x)$ can be viewed as a latent posterior, and the variational decoding distribution $q_\phi(y \mid z)$ can be viewed as a decoder.
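The two ingredients used above — the reparameterized sample and the KL term against the standard Gaussian prior — both have simple closed forms for diagonal Gaussians. A NumPy sketch (function names illustrative):

```python
import numpy as np

def reparameterize(mu, logvar, rng):
    """z = mu + sigma * eps with eps ~ N(0, I): the noise is independent of
    the encoder parameters, so gradients can flow through mu and logvar."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * logvar) * eps

def kl_to_standard_normal(mu, logvar):
    """Closed-form KL( N(mu, diag(exp(logvar))) || N(0, I) )."""
    return 0.5 * np.sum(np.exp(logvar) + mu**2 - 1.0 - logvar)
```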
As derived for the MVIB (cf. Grazioli, F., Siarheyeu, R., Pileggi, G., and Meiser, A.: “Microbiome-based disease prediction with multimodal variational information bottlenecks”, PLOS Computational Biology, 2022), the VIB objective of Equation 3 can be generalized by representing X as a collection of multimodal random input variables (or multiple input sequences) $\{X_m\}_{m=1}^{M}$. In light of this, in the language of a variational encoder-decoder, the posterior $p_\theta(z \mid x)$ of Equation 3 consists actually in the joint posterior $p_\theta(z \mid x_1, \ldots, x_M)$ conditioned by the joint M available sequences. However, it should be noted that for predicting the interaction label Y from X, the M different sequences cannot be simply treated as M different modalities.

The Multimodal Variational Autoencoder, MVAE (cf. Wu, M. and Goodman, N.: “Multimodal generative models for scalable weakly-supervised learning”, arXiv preprint arXiv:1802.05335, 2018) and MVIB approximate the joint posterior $p(z \mid x_{1:M})$ assuming that the M modalities are conditionally independent, given the common latent variable Z. This allows to express the joint posterior as a product of unimodal approximate posteriors $\tilde{q}(z \mid x_m)$ and a prior $p(z)$, referred to as Product of Experts (PoE):

$$p(z \mid x_{1:M}) \propto p(z) \prod_{m=1}^{M} \tilde{q}(z \mid x_m).$$

The Mixture-of-Experts Multimodal Variational Autoencoder, MMVAE (cf. Shi, Y., Siddharth, N., Paige, B., and Torr, P. H.: “Variational mixture-of-experts autoencoders for multi-modal deep generative models”, arXiv preprint arXiv:1911.03393, 2019) factorizes the joint multimodal posterior as a mixture of Gaussian unimodal posteriors, referred to as Mixture of Experts (MoE):

$$p(z \mid x_{1:M}) = \frac{1}{M} \sum_{m=1}^{M} q(z \mid x_m).$$
Fig. 1 is a diagram schematically illustrating the general concept of an Attention of Distributions (AoD) mechanism 100 (herein sometimes also referred to as AoE, Attention of Experts) for Gaussian posteriors according to embodiments of the present disclosure. In Fig. 1, $f_{\theta_m}$ is the stochastic Gaussian encoder 101 of the $m$-th modality. In accordance with embodiments of the present invention, the unimodal posteriors are modelled as Gaussian distributions with diagonal covariance structure, $q(z \mid x_m) = \mathcal{N}\big(z; \mu_m, \mathrm{diag}(\sigma_m^2)\big)$, e.g., following the proposals provided in Wu, M. and Goodman, N.: “Multimodal generative models for scalable weakly-supervised learning”, in arXiv preprint arXiv:1802.05335, 2018 or in Grazioli, F., Siarheyeu, R., Pileggi, G., and Meiser, A.: “Microbiome-based disease prediction with multimodal variational information bottlenecks”, in PLOS Computational Biology, 2022, which are both hereby incorporated by reference herein. As shown at 102 in Fig. 1, by stacking the parameters (represented as column-vectors) $\mu_m$ and $\sigma_m^2$ for all available sequences $m = 1, \ldots, M$, one can define the following two matrices $M_\mu \in \mathbb{R}^{(M+1) \times d_z}$ and $M_{\sigma^2} \in \mathbb{R}^{(M+1) \times d_z}$, where $d_z$ is the dimensionality of the latent single-sequence posteriors:

$$M_\mu = [\mu_0, \mu_1, \ldots, \mu_M]^T, \qquad M_{\sigma^2} = [\sigma_0^2, \sigma_1^2, \ldots, \sigma_M^2]^T, \quad (4)$$

where $\mu_0$ and $\sigma_0^2$ parametrize the latent prior.
According to embodiments of the present disclosure, it may be provided that the dependencies between the M single-sequence posteriors and the multi-sequence joint posterior are implicitly learned by means of multi-head self-attention (as shown at 104 in Fig. 1, including the AoD module), leveraging its power of capturing multiple complex interactions in X and allowing possible missing sequences, in particular in accordance with the approach disclosed in A. Vaswani et al.: “Attention Is All You Need”, NeurIPS 2017:

$$\mu = \mathrm{Pool}\big(\mathrm{MultiHead}(M_\mu, M_\mu, M_\mu)\big), \qquad \sigma^2 = \mathrm{Pool}\big(\mathrm{MultiHead}(M_{\sigma^2}, M_{\sigma^2}, M_{\sigma^2})\big), \quad (5)$$

where $\mathrm{Pool}: \mathbb{R}^{(M+1) \times d_z} \to \mathbb{R}^{d_z}$ is a 1D max pooling function, as shown at 106 in Fig. 1, and MultiHead is the standard multi-head attention block defined in Vaswani et al. The above equation is referred to as Attention of Distributions (AoD) in the context of the present disclosure.
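The AoD aggregation can be sketched in NumPy as follows. This is a simplified single-head variant (the disclosure uses the standard multi-head block of Vaswani et al.); the projection matrices `Wq`, `Wk`, `Wv` and all function names are illustrative.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over the stacked posterior
    parameters X of shape (M+1, d_z): each row attends to all rows."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    A = softmax(Q @ K.T / np.sqrt(K.shape[-1]), axis=-1)
    return A @ V

def aod(M_mu, M_logvar, Wq, Wk, Wv):
    """Attention of Distributions: attend over the unimodal posterior
    parameter matrices, then 1D max-pool across the modality axis."""
    mu = self_attention(M_mu, Wq, Wk, Wv).max(axis=0)
    logvar = self_attention(M_logvar, Wq, Wk, Wv).max(axis=0)
    return mu, logvar
```

The pooling collapses the (M+1) attended rows into a single $d_z$-dimensional parameter vector, so the joint posterior keeps the same Gaussian form as the unimodal posteriors.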
Furthermore, in the context of the present disclosure, a multi-sequence VIB (Variational Information Bottleneck) which adopts AoD for modelling the multi-sequence joint posterior is referred to as Attentive Variational Information Bottleneck (AVIB). The AVIB objective is:

$$\mathcal{L}_{AVIB} = \mathbb{E}_{\epsilon \sim p(\epsilon)} \left[ -\log q_\phi\big(y \mid f_\theta(x_{1:M}, \epsilon)\big) \right] + \beta \, \mathrm{KL}\big( q_\phi(z \mid x_{1:M}) \,\|\, r(z) \big), \quad (6)$$

where the multi-sequence posterior is modelled as $q_\phi(z \mid x_{1:M}) = \mathcal{N}\big(z; \mu, \mathrm{diag}(\sigma^2)\big)$, with $\mu$ and $\sigma^2$ computed by the AoD mechanism.
Embodiments of the present disclosure provide computer-implemented methods and systems that are configured to apply the Attentive Variational Information Bottleneck (AVIB) for tackling a fundamental problem of immuno-oncology: predicting TCR-peptide interactions. It can be demonstrated that AVIB significantly outperforms existing baseline methods. As described above, AVIB may be implemented as a multimodal generalization of the Variational Information Bottleneck (VIB) capable of learning a joint representation of multiple input data modalities through self-attention. In the context of the present disclosure, the input modalities may include the peptide, the CDR3α, and the CDR3β. The model may learn to predict whether the binding between the peptide and the TCR takes place or not.
According to an embodiment, the present disclosure provides methods and systems that utilize an approach for combining unimodal posteriors in a joint multimodal posterior using self-attention, termed Attention of Distribution (AoD) herein.
According to an embodiment, the present disclosure provides methods and systems that use a simple, hyperparameter-free approach for the detection of out-of- distribution (OOD) amino acid chains, which leverages the multimodal posterior distribution over latent encodings learned by AVIB.
According to embodiments of the present disclosure, there are three data modalities: $X_{\mathrm{peptide}}$, $X_{\mathrm{CDR3\alpha}}$ and $X_{\mathrm{CDR3\beta}}$. For each modality, the dedicated stochastic encoder 101 may have the form $f_\theta(x) = \big(\mu(x), \sigma^2(x)\big)$, where $\mu(\cdot)$ and $\sigma^2(\cdot)$ are the two output branches 105 of the neural stochastic encoder 101, as schematically illustrated in Fig. 2.

The neural stochastic encoder 101 of Fig. 2 presents an architecture which is inspired by NetTCR-2.0 (cf. A. Montemurro et al.: “Nettcr-2.0 enables accurate prediction of tcr-peptide binding by using paired tcr alpha and beta sequence data”, in Communications Biology, 4(1):1-13, 2021). It encodes the peptides by using BLOSUM50 encoding 110 and operates 1D convolutions 114 of the encoded peptides 112 with kernel 116 sizes 1, 3, 5, 7 and 9. After the convolutions 114, 1D max pooling 118 is operated, followed by ReLU activation functions 120. The obtained vectors are then concatenated, as shown at 122. Eventually, two parallel fully-connected layers 105 output two vectors of size $d_z$ for $\mu$ and $\sigma^2$, where $d_z$ is the size of the bottleneck, i.e. the dimension of Z. $d_z$ can be set via hyperparameter optimization. For a more stable computation, one may let the encoder model the logarithm of the variance, $\log \sigma^2$.
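The conv-pool-concatenate trunk described above can be sketched in NumPy. This is a toy illustration only: a mean filter stands in for the learned convolution kernels, the input `x` is a BLOSUM-style encoded sequence of shape (L, 20), and all names are illustrative.

```python
import numpy as np

def conv1d_valid(x, kernel_size):
    """Toy 1D 'convolution' (a mean filter) over an encoded sequence of
    shape (L, 20); stands in for learned kernels of sizes 1, 3, 5, 7, 9."""
    L = x.shape[0] - kernel_size + 1
    return np.stack([x[i:i + kernel_size].mean(axis=0) for i in range(L)])

def encoder_features(x, kernel_sizes=(1, 3, 5, 7, 9)):
    """Convolve with each kernel size, 1D max-pool over positions, apply
    ReLU, then concatenate -- mirroring the encoder trunk of Fig. 2."""
    feats = []
    for k in kernel_sizes:
        h = conv1d_valid(x, k).max(axis=0)  # 1D max pooling over length
        feats.append(np.maximum(h, 0.0))    # ReLU
    return np.concatenate(feats)
```

In the full encoder, two parallel fully-connected heads would then map this concatenated feature vector to the $d_z$-dimensional $\mu$ and $\log \sigma^2$ outputs.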
According to embodiments of the present disclosure, it may be provided that the corresponding decoder consists in a simple neural network with fully-connected layers and ReLU activations. This implements the binary classification.
Fig. 3 is a schematic view illustrating a trimodal Attentive Variational Information Bottleneck (AVIB) system in accordance with embodiments of the present disclosure. As will be appreciated by those skilled in the art, higher or lower modalities can be implemented in a corresponding fashion. The system comprises parametric encoders 101 , the AoD (or AoE) mechanism 104 described in detail above, and parametric decoders 124.
According to embodiments, in the loss function (see Equation (6) above), the multi-sequence posterior $q_\phi(z \mid x_{1:M})$ and the latent prior $r(z)$ may be treated as $d_z$-dimensional Gaussian distributions, with $r(z) = \mathcal{N}(0, I)$ spherical. An embodiment may set $\beta$ to a fixed value; $\beta$ can also be set via hyperparameter optimization.
According to embodiments, the networks may be trained using the Adam or Stochastic Gradient Descent optimizers. For example, the optimizer may implement a learning rate of $10^{-3}$ and an L2 weight decay. The batch size may be set to 4096. A drop-out rate of 0.3 may be used at training time. A cosine annealing learning rate scheduler with a period of 10 epochs may be adopted (cf. I. Loshchilov et al.: “Sgdr: Stochastic gradient descent with warm restarts”, arXiv preprint arXiv:1608.03983, 2016). The training may be performed for 200 epochs and, in order to avoid overfitting, the best model may be selected by saving the weights corresponding to the epoch where the AUROC is maximum on the validation set. The validation set may be obtained via an 80/20 stratified random split of the training set. Training and test sets may be obtained via a non-stratified 80/20 split of the whole dataset. Experiments may be repeated 5 times with different training/test splits to ensure unbiased performance evaluation.
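The cosine annealing schedule referred to above can be written down directly. The sketch below is a simplified form of the warm-restart schedule of Loshchilov et al. (fixed period, no period doubling, minimum learning rate of zero); the function name is illustrative.

```python
import math

def cosine_annealing_lr(base_lr, epoch, period):
    """Cosine annealing with warm restarts (simplified): the learning rate
    decays from base_lr to 0 over `period` epochs, then restarts."""
    t = epoch % period
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * t / period))
```

With `base_lr = 1e-3` and `period = 10`, the rate starts at $10^{-3}$, reaches roughly half that value mid-cycle, approaches zero at the end of each cycle, and jumps back to $10^{-3}$ at every restart.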
All experiments may be performed, e.g., on a CentOS Linux 8 machine with NVIDIA GeForce RTX 2080 Ti GPUs and CUDA 10.2 installed. Algorithms may be implemented, e.g., in Python 3.6 using PyTorch version 1.10.
In what follows, the datasets that may be used in connection with the present disclosure for the TCR-peptide interaction prediction problem are described. In particular, human TCR-peptide data extracted from IEDB (cf. Vita et al.: “The immune epitope database (iedb): 2018 update”, in Nucleic Acids Research, 47(D1):D339-D343, 2019), VDJdb (cf. Bagaev et al.: “Vdjdb in 2019: database extension, new analysis infrastructure and a t-cell receptor motif compendium”, in Nucleic Acids Research, 48(D1):D1057-D1062, 2020), and McPAS-TCR (cf. Tickotsky et al.: “Mcpas-tcr: a manually curated catalogue of pathology-associated t-cell receptor sequences”, in Bioinformatics, 33(18):2924-2929, 2017) may be combined. Additionally, the dataset proposed by Klinger et al.: “Multiplex identification of antigen-specific t cell receptors using a combination of immune assays and immune receptor sequencing”, in PLoS One, 10(10):e0141561, 2015, referred to as the MIRA set, is considered (the MIRA set is publicly available in the NetTCR-2.0 repository: https://github.com/mnielLab/NetTCR-2.0/tree/main/data). Overall, 271,366 human TCR-peptide samples were available. The data were organized creating the following datasets:

α + β set. 117,753 samples out of 271,366 present peptide information, together with both CDR3α and CDR3β chains. In the present disclosure, this subset is referred to as the α + β set. The ground truth label is a binary variable which represents whether binding between the peptide and the TCR chains takes place.

β set. 153,613 samples out of 271,366 present peptide and CDR3β information. For these samples, the CDR3α chain is missing. This subset is referred to as the β set.
Human TCR set. The totality of the human TCR-peptide data (i.e. the α + β set ∪ β set) is referred to as the Human TCR set. Non-human TCR set. In addition to the human TCR data, 5,036 non-human TCR samples were extracted from the VDJdb database and used as OOD (out-of-distribution) samples. These samples come from mice and macaques and present peptide and CDR3β information. These samples are referred to as the Non-human TCR set.
Human MHC set. A second set of OOD samples was created, composed of 463,684 peptide-MHC pairs. The peptide sequences are taken from the Human TCR set, i.e. the peptide information is shared among the ID (in-distribution) and OOD sets. The MHC sequences are amino acid chains corresponding to human MHC alleles (the MHC sequences were extracted from the PUFFIN repository: https://github.com/gifford-lab/PUFFIN). These samples are referred to as the Human MHC set.
Fig. 4 depicts the distributions of the human TCR data, i.e. the peptide, CDR3α and CDR3β distributions, for both the α + β set and the β set. In Fig. 4, a point on the x-axis represents one unique chain of amino acids. The y-axis represents how many samples present that specific chain. Samples are sorted by count with respect to the α + β set. It is possible to observe that the two datasets have similar peptide distributions, but present different CDR3β sequences.
Fig. 5 depicts the length distribution of the datasets used in the context of the present disclosure. The length corresponds to the number of amino acids which constitute the peptides, CDR3α, CDR3β and MHC molecules. All of these types of molecules are sequences of amino acids.
Fig. 6 depicts the class distributions of the β set and the α + β set. The class distributions of the Non-human TCR set and the Human MHC set are not reported, as they were only used for OOD (out-of-distribution) detection experiments, and not for TCR-peptide interaction prediction.
It is possible to observe that the TCR data present many more non-binding samples (label 0) than binding ones (label 1). This class imbalance may be handled by means of balanced batch sampling, i.e. when a batch is sampled at training time, it may be ensured that the numbers of binding and non-binding samples are essentially equal.
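A minimal sketch of such a balanced batch sampler is given below. The oversampling-with-replacement strategy is one possible way to equalize the classes per batch, assumed here for illustration rather than taken from the disclosure:

```python
import random

def balanced_batches(samples, labels, batch_size, seed=0):
    """Yield batches with (essentially) equal numbers of binding (1) and
    non-binding (0) samples, drawing each half with replacement so the
    minority class is oversampled.  Illustrative sketch only."""
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    half = batch_size // 2
    n_batches = max(len(pos), len(neg)) // half
    for _ in range(n_batches):
        idx = rng.choices(pos, k=half) + rng.choices(neg, k=half)
        rng.shuffle(idx)
        yield [samples[i] for i in idx], [labels[i] for i in idx]
```

Every yielded batch then contains exactly `batch_size // 2` binding and `batch_size // 2` non-binding samples, regardless of the global class imbalance.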
Pre-processing
Peptides, CDR3α and CDR3β chains are sequences of amino acids. The 20 amino acids translated by the genetic code are generally represented as letters of the English alphabet. In accordance with embodiments of the present disclosure, the amino acid sequences are pre-processed using BLOSUM50 encodings (cf. Henikoff, S. and Henikoff, J. G.: “Amino acid substitution matrices from protein blocks”, in Proceedings of the National Academy of Sciences, 89(22):10915-10919, 1992). This allows representing a sequence of N amino acids as a 20 x N matrix. The elements of the matrix are integers that represent substitution scores among amino acids. After performing the BLOSUM50 encoding, feature standardization may be performed by removing the mean and scaling to unit variance. As the length of the amino acid chains is not constant (see Fig. 5), zero-padding may be applied after the BLOSUM50 encoding. This ensures that all matrices have shape 20 x Nmax, where Nmax is the length of the longest chain.
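The encoding and padding steps can be sketched as follows. The toy substitution matrix is illustrative only — real BLOSUM50 columns can be obtained, e.g., from Biopython's `substitution_matrices` module — and the standardization step (mean removal, unit-variance scaling over the training set) is omitted for brevity:

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def encode_and_pad(sequences, blosum, n_max=None):
    """Encode each sequence as a 20 x N matrix of substitution-score columns,
    then zero-pad on the right so every matrix has shape 20 x N_max.
    `blosum` maps an amino acid to its 20-dimensional score column."""
    n_max = n_max or max(len(s) for s in sequences)
    out = []
    for seq in sequences:
        cols = [blosum[a] for a in seq]             # N columns of length 20
        cols += [[0.0] * 20] * (n_max - len(cols))  # zero-padding to N_max
        # transpose: rows = 20 amino-acid scores, columns = sequence positions
        out.append([[col[i] for col in cols] for i in range(20)])
    return out

# toy "BLOSUM": identity-like scores, for demonstration only
toy = {a: [5.0 if i == j else -1.0 for i in range(20)]
       for j, a in enumerate(AMINO_ACIDS)}
mats = encode_and_pad(["ACD", "AC"], toy)
```

Here the shorter sequence "AC" is padded with one all-zero column so that both matrices share the shape 20 x 3.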
TCR-peptide Interaction Prediction
In order to evaluate the predictive capabilities of AVIB on the TCR-peptide interaction prediction task, experiments were performed on three datasets: the α + β set, the β set and the Human TCR set. For the β set and the Human TCR set, experiments were performed in the bimodal setting, i.e. samples are (xPeptide, xCDR3β) 2-tuples. For the α + β set, bimodal experiments were performed, as well as trimodal experiments considering (xPeptide, xCDR3α, xCDR3β) 3-tuples. The approach proposed herein (briefly referred to as AVIB) is benchmarked against two state-of-the-art deep learning methods for TCR-peptide interaction prediction:
ERGO II (cf. Springer et al.: “Prediction of specific tcr-peptide binding from large dictionaries of tcr-peptide pairs”, in Frontiers in Immunology, 11:1803, 2020) and NetTCR-2.0 (cf. Montemurro et al.: “Nettcr-2.0 enables accurate prediction of tcr-peptide binding by using paired tcrα and β sequence data”, in Communications Biology, 4(1):1-13, 2021). Additionally, AVIB is benchmarked against the LUPI-SVM (cf. Abbasi et al.: “Learning protein binding affinity using privileged information”, in BMC Bioinformatics, 19(1):1-12, 2018), leveraging the α-chain at training time as privileged information. For all benchmark methods, the original publicly available implementations were adopted.
The experimental results are summarized in Table 1 shown in Fig. 7. For evaluation, the area under the receiver operating characteristic curve (AUROC), the area under the precision-recall curve (AUPR) and the F1 score (F1) were computed on the test sets. 5 repeated experiments with different independent 80/20 training/test random splits were performed for unbiased performance evaluation.
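The AUROC metric, for instance, can be computed directly from its rank-statistic formulation (equivalent to the normalized Mann-Whitney U statistic); the sketch below is a generic illustration, not the evaluation code used for Table 1:

```python
def auroc(labels, scores):
    """Area under the ROC curve via the rank formulation: the probability
    that a randomly chosen positive sample is scored above a randomly
    chosen negative one, counting ties as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

In practice, library implementations such as scikit-learn's `roc_auc_score` avoid the quadratic pairwise loop.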
As can be seen, on the β set, the method according to embodiments disclosed herein obtains ~4% higher AUROC and ~8% higher AUPR compared to the best baseline, ERGO II. On the β set ∪ α + β set, AVIB outperforms ERGO II by achieving ~3% higher AUROC and ~4% higher AUPR. On the α + β set, in the bimodal setting, AVIB is on par with ERGO II. In the trimodal setting, when also considering the α-chain, AVIB obtains ~1% higher AUROC, ~4% higher AUPR and ~6% higher F1 score.
Multimodal Posterior Approximation
This section provides a comparison of two techniques for approximating Gaussian joint posteriors: AoE as described herein and PoE (Product of Experts). AVIB, which employs AoE, is benchmarked against MVIB (cf. Grazioli et al.: “Microbiome-based disease prediction with multimodal variational information bottlenecks”, in PLOS Computational Biology, 2022), which employs PoE. Experiments and benchmarks were performed in the bimodal and trimodal settings. Table 2 shown in Fig. 8 presents the TCR-peptide interaction prediction results on the α + β set. AoE achieves the best results in both the bimodal and trimodal settings. The AUPR score, in particular, improves on PoE by up to 2%.
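For reference, the PoE baseline admits a closed-form Gaussian combination — the joint precision is the sum of the unimodal precisions — whereas the attention-based combination proposed herein is learned. A sketch of the PoE formula in the one-dimensional case, for illustration:

```python
def poe(mus, sigmas):
    """Product-of-Experts combination of unimodal Gaussian posteriors
    N(mu_i, sigma_i^2): the joint is again Gaussian, with precision equal
    to the sum of the experts' precisions.  Closed form, no learned
    parameters, in contrast to the attention-based combination."""
    precisions = [1.0 / s ** 2 for s in sigmas]
    var = 1.0 / sum(precisions)
    mu = var * sum(m * p for m, p in zip(mus, precisions))
    return mu, var ** 0.5
```

Two equally confident experts thus average their means while halving the joint variance, which makes PoE rigid: it cannot learn to trust one modality more than its stated variance suggests.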
In the following, some specific use cases of the methods and systems described herein in accordance with embodiments of the present disclosure are described in detail. It is expressly noted that the described use cases are only exemplary and that further applications are straightforward and readily practicable, as will be appreciated by those skilled in the art.
TCR-peptide binding prediction for personalized cancer vaccines
Tumors harbor mutations, some of which can be recognized by the patients’ TCRs. Therapeutic cancer vaccines consist of patient-specific, and therefore tumor-specific, mutations meant to stimulate the patient’s immune response against their own tumor. In order to select those mutations that have the highest likelihood of inducing an immune response, it is important to know whether the patient’s T cell receptor repertoire is able to recognize the immunogenic mutations (neoantigens), i.e. whether there are TCRs present that are able to bind to the associated peptides. Being able to reliably predict whether TCRs bind to neoantigens can help to improve the selection process for neoantigen candidates that will be used in a therapeutic cancer vaccine.
T cell engineering
Tumor antigens shared by patients are a highly interesting target for cancer immunotherapy, especially adoptive T cell therapy, where a tumor-recognizing TCR is introduced in the patient’s T cells. These TCRs need to fulfil demanding requirements regarding their safety and efficacy, as shared antigens are usually not strictly tumor-specific. Therefore, prediction of TCR-peptide interaction allows the modelling and engineering of suitable TCRs, which can be evaluated for their safety and efficacy before any wet lab experiments are necessary.
TCR cross-reactivity assessment
Therapeutic TCRs can be used to target tumors or other types of cells depending on the MHC-presented peptides. However, TCRs usually do not bind to one single peptide but to multiple peptides. In order to identify suitable TCR candidates for therapeutic purposes, it is important to test for potential cross-reactivity, i.e. to verify that the TCR does not recognize healthy cells. Being able to predict the cross-reactivities of a TCR before experimental tests can significantly speed up and lower the costs of the process of candidate identification.

Peptide-MHC binding and interaction prediction for personalized cancer vaccines
Although the present disclosure focuses on TCR-peptide interaction prediction, in an embodiment the system may also be used to predict peptide-MHC binding and presentation. In this embodiment, the set of MHC molecules may be taken from a specific patient, and the set of peptides is based on mutations present in the cancerous cells of the patient. For all (peptide, MHC) combinations, predictions are made. In this setting, the first modality is the peptide, while the second modality is the MHC molecule. Depending on how the model is trained, the predicted scores can represent either the likelihood of binding between the two molecules, or the likelihood that the peptide-MHC complex is presented on the surface of the cell. The peptides with the highest likelihood of being presented are good candidates to be included in a personalized cancer vaccine for that patient. The predicted candidates are the output of the peptide-MHC interaction prediction system. In modern cancer vaccine design systems, these candidates may then be processed by further downstream modules, e.g. to predict TCR recognition, or to filter out peptides which could trigger an autoimmune response.
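The exhaustive scoring of all (peptide, MHC) combinations described above can be sketched as follows; `predict` is a hypothetical stand-in for a trained bimodal peptide-MHC model, not a component named in the disclosure:

```python
from itertools import product

def rank_candidates(peptides, mhc_alleles, predict):
    """Score every (peptide, MHC) combination with a trained bimodal model
    `predict` (a callable returning a binding/presentation likelihood) and
    return the candidates sorted by decreasing score."""
    scored = [(pep, mhc, predict(pep, mhc))
              for pep, mhc in product(peptides, mhc_alleles)]
    return sorted(scored, key=lambda t: t[2], reverse=True)
```

The top-ranked tuples would then feed the downstream modules mentioned above (TCR recognition prediction, autoimmunity filtering).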
AoE for sensor fusion (Computer vision applications)
AoE can be used as a building block of a broad spectrum of multimodal learning systems, beyond the TCR-specific setting, and also with non-molecular data. Approximating a multimodal posterior from heterogeneous unimodal posteriors can in fact be particularly helpful in sensor fusion, i.e. for combining various types of sensor data. RGB, infrared and depth images, as well as radar and point cloud information, can be combined using AoE to detect, segment and/or classify objects in an environment with higher confidence. In robotics, the output of the sensor fusion module is provided to, e.g., planning algorithms, or to additional scene understanding algorithms.
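As a minimal illustration of attention-based fusion over a stack of modality embeddings, a single attention head with identity query/key/value projections can be sketched as follows; the full method uses multiple heads with learned projections over the stacked unimodal posterior parameters, so this is a simplification for exposition only:

```python
import math

def self_attention(rows):
    """Single-head self-attention over M modality embeddings (lists of
    equal length), with identity projections for brevity: each output row
    is a softmax-weighted mixture of all input rows."""
    d = len(rows[0])
    scores = [[sum(q_i * k_i for q_i, k_i in zip(q, k)) / math.sqrt(d)
               for k in rows] for q in rows]
    out = []
    for row_scores in scores:
        m = max(row_scores)                      # stabilized softmax
        exps = [math.exp(s - m) for s in row_scores]
        z = sum(exps)
        w = [e / z for e in exps]
        out.append([sum(w_j * rows[j][i] for j, w_j in enumerate(w))
                    for i in range(d)])
    return out
```

Each fused row thus attends to every modality, which is how dependencies among heterogeneous sensor streams (or unimodal posteriors) can be captured without a hand-designed combination rule.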
Considering the above, methods in accordance with embodiments of the present disclosure can be suitably applied to improve and reinforce existing software frameworks for vaccine design. In particular, the system proposed in the present disclosure can be used to predict whether neoantigen vaccine candidates are in fact recognized by a patient’s T cells. This makes the present invention particularly suited to be combined with immune profiling technology. As of now, most existing immune profiling approaches only leverage a so-called distance-from-self score to predict whether a presented neoantigen is in fact recognized by specific T cells. This is based on the optimistic assumption that if a neoantigen is sufficiently different from the healthy genome of the patient, the corresponding neoepitopes will be recognized by some T cell. This is, however, not always true in practice. Hence, predicting the binding between peptides and TCRs in accordance with the methods disclosed herein can be a potential key factor in the development of more effective therapies.
To summarize, embodiments of the present disclosure provide a multimodal generalization of the Variational Information Bottleneck (VIB), which is sometimes briefly denoted AVIB herein and which leverages multi-head self-attention to implicitly approximate the posterior distribution over latent encodings conditioned by multiple input modalities. In accordance with embodiments of the present disclosure, AVIB may be applied to the TCR-peptide interaction prediction problem, an important challenge in immuno-oncology. It has been shown that the methods disclosed herein significantly improve on the baselines ERGO II and NetTCR-2.0.
Many modifications and other embodiments of the invention set forth herein will come to mind to the one skilled in the art to which the invention pertains having the benefit of the teachings presented in the foregoing description and the associated drawings. Therefore, it is to be understood that the invention is not to be limited to the specific embodiments disclosed and that modifications and other embodiments are intended to be included within the scope of the appended claims. Although specific terms are employed herein, they are used in a generic and descriptive sense only and not for purposes of limitation.

Claims

1. A computer-implemented method of predicting an interaction or binding between T cell receptors, TCRs, and peptides presented on the surface of cells or between peptides presented on the surface of cells and major histocompatibility complex, MHC, molecules, the method comprising a training stage, including: a) providing a training dataset of samples, wherein each sample comprises a multimodal tuple of molecules, and wherein each sample has assigned a ground truth label; b) inferring, for each sample of the training dataset using parametric encoders, a unimodal posterior distribution over latent encodings conditioned by the respective input molecules, and combining the parameters of the inferred unimodal posteriors in a single matrix; c) implicitly learning dependencies among the inferred unimodal posteriors by applying a trainable parametric function leveraging multi-head self-attention onto the matrix and, based thereupon, approximating a multimodal joint posterior distribution over latent encodings; d) sampling the approximated multimodal joint posterior distribution and using parametric decoders (124) to predict ground truth labels for the sampled inputs; and e) updating the parameters of the encoders, of the function leveraging multi-head self-attention and of the decoders by minimizing an objective function that accounts for the error between the ground truth labels and the predictions obtained in step d).
2. The method according to claim 1, further comprising: repeating steps b)-e) until convergence of a predefined validation metric.
3. The method according to claim 1 or 2, the method further comprising an inference stage, including: providing a yet unseen and unlabeled sample comprising a multimodal tuple of molecules; and using the approximated multimodal posterior distribution obtained in the training stage for estimating a probability of binding between TCRs and peptides and/or between peptides and MHC molecules.
4. The method according to any of claims 1 to 3, wherein the parametric encoders (101) operate 1D convolutions (114) of encoded molecules (112), followed by 1D max pooling (118) and Rectified Linear Unit, ReLU, activation functions (120).
5. The method according to any of claims 1 to 4, wherein the parametric decoders (124) include a neural network with a number of fully connected layers and ReLU activation functions.
6. The method according to any of claims 1 to 5, wherein the multimodal tuple of molecules of the samples includes at least a specific TCR and a specific peptide.
7. The method according to any of claims 1 to 6, wherein the multimodal tuple of molecules of the samples includes a peptide, a CDR3β molecule and a CDR3α molecule.
8. The method according to any of claims 1 to 7, wherein the estimated probabilities of binding between TCRs and peptides are used for selecting patient-specific neoantigen candidates.
9. The method according to any of claims 1 to 8, wherein the estimated probabilities of binding between TCRs and peptides are used for modelling and/or engineering tumor-recognizing TCRs.
10. The method according to any of claims 1 to 9, wherein the estimated probabilities of binding between TCRs and peptides are used for predicting cross-reactivities of a TCR.
11. The method according to any of claims 1 to 10, wherein the estimated probabilities of binding and interaction between peptides and MHCs are used for developing personalized cancer vaccines.
12. A system for predicting an interaction or binding between T cell receptors, TCRs, and peptides presented on the surface of cells or between peptides presented on the surface of cells and major histocompatibility complex, MHC, molecules, in particular for execution of a method according to any of claims 1 to 11 , the system comprising one or more processors which, alone or in combination, are configured to provide for execution of the following steps: a) providing a training dataset of samples, wherein each sample comprises a multimodal tuple of molecules, and wherein each sample has assigned a ground truth label; b) inferring, for each sample of the training dataset using parametric encoders, a unimodal posterior distribution over latent encodings conditioned by the respective input molecules, and combining the parameters of the inferred unimodal posteriors in a single matrix; c) implicitly learning dependencies among the inferred unimodal posteriors by applying a trainable parametric function leveraging multi-head self-attention onto the matrix and, based thereupon, approximating a multimodal joint posterior distribution over latent encodings; d) sampling the approximated multimodal joint posterior distribution and using parametric decoders (124) to predict ground truth labels for the sampled inputs; and e) updating the parameters of the encoders (101), of the function leveraging multi-head self-attention and of the decoders (124) by minimizing an objective function that accounts for the error between the ground truth labels and the predictions obtained by step d).
13. The system according to claim 12, wherein the parametric encoders (101) are configured to operate 1D convolutions (114) of encoded molecules (112), followed by 1D max pooling (118) and Rectified Linear Unit, ReLU, activation functions (120).
14. The system according to claim 12 or 13, wherein the parametric decoders (124) include a neural network with a number of fully connected layers and ReLU activation functions.
15. A tangible, non-transitory computer-readable medium containing instructions which, upon execution by one or more processors with access to memory, provide for execution of a method according to any of claims 1 to 11.
PCT/EP2023/050900 2022-01-18 2023-01-16 Method and system for predicting tcr (t cell receptor)-peptide interactions WO2023139031A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
EP22152006.7 2022-01-18
EP22152006 2022-01-18

Publications (1)

Publication Number Publication Date
WO2023139031A1 true WO2023139031A1 (en) 2023-07-27

Family

ID=80445811

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2023/050900 WO2023139031A1 (en) 2022-01-18 2023-01-16 Method and system for predicting tcr (t cell receptor)-peptide interactions

Country Status (1)

Country Link
WO (1) WO2023139031A1 (en)

Non-Patent Citations (29)

* Cited by examiner, † Cited by third party
Title
A. MONTEMURRO ET AL.: "Nettcr-2.0 enables accurate prediction of tcr-peptide binding by using paired tcra and β sequence data", COMMUNICATIONS BIOLOGY, vol. 4, no. 1, 2021, pages 1 - 13, XP055923333, DOI: 10.1038/s42003-021-02610-3
A. VASWANI ET AL.: "Attention Is All You Need", NEURIPS, 2017
ABBASI ET AL.: "Learning protein binding affinity using privileged information", BMC BIOINFORMATICS, vol. 19, no. 1, 2018, pages 1 - 12, XP021262536, DOI: 10.1186/s12859-018-2448-z
ALEMI, A. A., FISCHER, I., DILLON, J. V., MURPHY, K.: "Deep variational information bottleneck", ARXIV:1612.00410, 2016
BAGAEV ET AL.: "Vdjdb in 2019: database extension, new analysis infrastructure and a t-cell receptor motif compendium", NUCLEIC ACIDS RESEARCH, vol. 48, no. D1, 2020, pages D1057 - D1062
CAI MICHAEL ET AL: "TCR-epitope binding affinity prediction using multi-head self attention model", ICML WORKSHOP ON COMPUTATIONAL BIOLOGY, 25 July 2021 (2021-07-25), https://icml-compbio.github.io/icml-website-2021/#papers, pages 1 - 5, XP055928149, Retrieved from the Internet <URL:https://icml-compbio.github.io/2021/papers/WCBICML2021_paper_59.pdf> [retrieved on 20220606] *
E. JOKINEN ET AL.: "Determining epitope specificity of t cell receptors with tcrgp", BIORXIV, 2019, pages 542332
E. WONG ET AL.: "Trav1-2 cd8 t-cells including oligoconal expansions of mait cells are enriched in the airways in human tuberculosis", COMMUN BIOL, vol. 2, 2019, pages 203
GRAZIOLI, F., SIARHEYEU, R., PILEGGI, G., MEISER, A.: "Microbiome-based disease prediction with multimodal variational information bottlenecks", PLOS COMPUTATIONAL BIOLOGY, 2022
HENIKOFF, S.HENIKOFF, J. G.: "Amino acid substitution matrices from protein blocks", PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES, vol. 89, no. 22, 1992, pages 10915 - 10919, XP002599751, DOI: 10.1073/pnas.89.22.10915
I. LOSHCHILOV ET AL.: "Stochastic gradient descent with warm restarts", ARXIV:1608.03983, 2016
I. SPRINGER ET AL.: "Contribution of t cell receptor alpha and beta cdr3, mhc typing, v and j genes to peptide binding prediction", FRONTIERS IN IMMUNOLOGY, vol. 12, 2021
I. SPRINGER ET AL.: "Prediction of specific tcr-peptide binding from large dictionaries of tcr-peptide pairs", FRONTIERS IN IMMUNOLOGY, vol. 11, 2020, pages 1803
KLINGER ET AL.: "Multiplex identification of antigen-specific t cell receptors using a combination of immune assays and immune receptor sequencing", PLOS ONE, vol. 10, no. 10, 2015, pages e0141561, XP055389430, DOI: 10.1371/journal.pone.0141561
N. DE NEUTER ET AL.: "On the feasibility of mining cd8+ t cell receptor patterns underlying immunogenic peptide recognition", IMMUNOGENETICS, vol. 70, no. 3, 2018, pages 159 - 168, XP036432332, DOI: 10.1007/s00251-017-1023-5
N.L. LA GRUTA ET AL.: "Understanding the drivers of mhc restriction of t cell receptors", NATURE REVIEWS IMMUNOLOGY, vol. 18, no. 7, 2018, pages 467 - 478, XP036533499, DOI: 10.1038/s41577-018-0007-5
P. DASH ET AL.: "Quantifiable predictive features define epitope-specific t cell receptor repertoires", NATURE, vol. 547, no. 7661, 2017, pages 89 - 93, XP093003452, DOI: 10.1038/nature22383
P. MORIS ET AL.: "Treating biomolecular interaction as an image classification problem - a case study on t-cell receptor-epitope recognition prediction", BIORXIV, 2019
SHI ET AL.: "Variational mixture-of-experts autoencoders for multi-modal deep generative models", NEURIPS, 2019
SIDHOM JOHN-WILLIAM ET AL: "DeepTCR is a deep learning framework for revealing sequence concepts within T-cell repertoires", NATURE COMMUNICATIONS, vol. 12, no. 1, 11 March 2021 (2021-03-11), XP093038902, Retrieved from the Internet <URL:https://www.nature.com/articles/s41467-021-21879-w> [retrieved on 20230412], DOI: 10.1038/s41467-021-21879-w *
TICKOTSKY ET AL.: "Mcpas-tcr: a manually curated catalogue of pathology-associated t-cell receptor sequences", BIOINFORMATICS, vol. 33, no. 18, 2017, pages 2924 - 2929, XP093009557, DOI: 10.1093/bioinformatics/btx286
TISHBY, N., PEREIRA, F. C., BIALEK, W.: "The information bottleneck method", ARXIV, 2000
V.I. JURTZ ET AL.: "Nettcr: sequence-based prediction of tcr binding to peptide-mhc complexes using convolutional neural networks", BIORXIV, 2018, pages 433706
VITA ET AL.: "The immune epitope database (iedb): 2018 update", NUCLEIC ACIDS RESEARCH, vol. 47, no. D1, 2019, pages D339 - D343
WU, M., GOODMAN, N.: "Multimodal generative models for scalable weakly-supervised learning", ARXIV:1802.05335, 2018
Y. TONG ET AL.: "Sequence-based ensemble learning approach for tcr epitope binding prediction", COMPUTATIONAL BIOLOGY AND CHEMISTRY, vol. 87, 2020, pages 107281, XP086225294, DOI: 10.1016/j.compbiolchem.2020.107281


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23702397

Country of ref document: EP

Kind code of ref document: A1