WO2024115638A1 - Method to evaluate the conspicuousness of an epitope towards the repertoire of t-cell receptors - Google Patents
- Publication number
- WO2024115638A1 (application number PCT/EP2023/083687)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- epitope
- tcr
- epitopes
- score
- conspicuousness
- Prior art date
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B40/00—ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
- G16B40/20—Supervised data analysis
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B20/00—ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
Definitions
- the present invention is in the field of immunology and medicine.
- the present invention is particularly in the field of prediction of immunogenicity of a peptide.
- Developing a new vaccine or other T-cell-based therapy aims to elicit an adaptive immune response for which T-cells are one of the most important drivers.
- the epitopes or antigens included in the therapy are selected based on their ability to elicit this response, which is termed immunogenicity.
- the present invention relates to a method of predicting T-cell response to a query epitope with a known epitope-TCR binding according to claim 1.
- The method comprises calculating a conspicuousness score of said epitope based on the number of known TCR sequences or TCR clusters responsive to said epitope, or by a centrality metric for said query epitope in the epitope-TCR bipartite graph, wherein said calculated conspicuousness score represents the T-cell response to said query epitope.
- the present invention relates to a method of producing a vaccine composition according to claim 20.
- the present invention relates to a vaccine composition according to claim 22.
- the present invention relates to a non-immunogenic composition according to claim 23.
- Figure 1 schematically presents a conspicuousness score calculation and training of the machine learning model.
- Figure 2 shows the Receiver Operator Characteristic for the independent test data.
- the present invention concerns a method for predicting the T-cell response of an epitope by its conspicuousness to a repertoire of TCRs.
- The method is distinct in that it focuses on the TCR diversity of the T-cell response after MHC binding has been achieved, which is novel in view of the prior art.
- The methods disclosed can further predict the T-cell response of an arbitrary epitope without any known epitope-TCR binding by calculating its conspicuousness to a repertoire of TCRs using machine learning algorithms.
- a compartment refers to one or more than one compartment.
- The terms “one or more” or “at least one”, such as one or more or at least one member(s) of a group of members, are clear per se. By means of further exemplification, the terms encompass inter alia a reference to any one of said members, or to any two or more of said members, such as, e.g., any >3, >4, >5, >6 or >7 etc. of said members, and up to all said members.
- As used herein, the terms “protein,” “polypeptide,” and “peptide” refer to a molecule comprising amino acids joined via peptide bonds. In general, “peptide” is used to refer to a sequence of 40 or fewer amino acids and “polypeptide” is used to refer to a sequence of more than 40 amino acids.
- The term “immunogen” refers to a molecule which stimulates a response from the adaptive immune system, such as a T-cell response.
- Non-limiting examples of said responses include an antibody response, a cytotoxic T-cell response, a T helper response, and T-cell memory.
- An immunogen may stimulate an upregulation of the immune response with a resultant inflammatory response or may result in down-regulation or immunosuppression.
- the T-cell response may be a T regulatory response.
- An immunogen also may stimulate a B-cell response and lead to an increase in antibody titer.
- Another term used herein to describe a molecule or combination of molecules which stimulate an immune response is "antigen".
- The term “epitope” refers to a peptide sequence which elicits an immune response from T cells, B cells or antibodies.
- An epitope may be a linear peptide or may comprise several discontinuous sequences which together are folded to form a structural epitope.
- T-cell epitopes are presented bound to a MHC molecule on the surface of an antigen-presenting cell.
- the term “query epitope” or “candidate epitope” refers to a polypeptide sequence that is predicted to bind to a major histocompatibility protein molecule by computerized methods, or as determined experimentally.
- The term “major histocompatibility complex (MHC)” refers to the MHC Class I and MHC Class II genes and the proteins encoded thereby. Molecules of the MHC bind small peptides and present them on the surface of cells for recognition by T-cell receptor-bearing T-cells.
- The T-cell receptor (TCR) is a protein complex found on the surface of T cells, or T lymphocytes, that is responsible for recognizing fragments of antigen as peptides bound to MHC molecules.
- the binding between TCR and antigen peptides is of relatively low affinity and is degenerate: that is, many TCRs recognize the same antigen peptide and many antigen peptides are recognized by the same TCR.
- A “conspicuous epitope” or the “conspicuousness” of an epitope refers to how visible that epitope is to a set of TCRs, or how recognizable the epitope is by a set of TCRs.
- “Conspicuousness score” refers to a calculated value of the probability of that epitope being visible to a set of TCRs. As such, the conspicuousness score represents the conspicuousness of an epitope towards the repertoire of TCRs, effectively approximating the breadth of the T-cell response to an epitope.
- the invention relates to a method of predicting the T-cell response to a query epitope with known epitope-TCR binding.
- The method comprises calculating a conspicuousness score of said epitope, wherein said score is calculated based on the number of known TCR sequences responsive to said epitope, or the number of TCR clusters, grouped by TCR sequence patterns, responsive to said epitope, or by a centrality metric for said epitope in the epitope-TCR graph; wherein said calculated conspicuousness score represents and/or is used to predict the T-cell response to said query epitope.
- The conspicuousness score is calculated by a centrality metric for each said epitope in the epitope-TCR graph, wherein said epitope-TCR graph is a bipartite graph.
- the ways of calculating the conspicuousness score for the epitopes which are present in a database of known epitope-TCR binding pairs are established.
- calculated conspicuousness score represents the conspicuousness of an epitope towards the repertoire of TCRs, effectively approximating the breadth of the T-cell response to an epitope.
- Two main factors can influence the breadth of response: the degeneracy of the TCR sequence that can result in binding and the different possible binding modes that the epitope-TCR complex can exist in.
- degeneracy is an important feature of the immune response mechanism that permits effective T-cell responses to a vast number of potential peptide sequences complexed to MHC molecules with specificity sufficient to distinguish between self and foreign peptides and thus to avoid autoimmune disease.
- The conspicuousness score of an epitope can be calculated from epitope-TCR databases of different sizes.
- There is no upper limit on the size of the database of known epitope-TCR binding pairs.
- Said databases may have more than 75 unique pairs, more than 100, more than 1000, more than 10000, more than 100000, more than 1000000, or more than 10000000 (10 million) unique pairs.
- The databases may have between 100 and 10000000 unique pairs, between 1000 and 10000000, between 10000 and 10000000, between 100000 and 10000000, between 100000 and 1000000, and all the ranges and subranges therein between.
- the conspicuousness score can be calculated for any epitope present in the database.
- The conspicuousness score is calculated based on the number of known TCRs that are responsive to an epitope, for each epitope present in a database. In further embodiments, the conspicuousness score of an epitope can also be calculated based on the number of TCR clusters that are grouped by TCR sequence patterns and are known to interact with said epitope.
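The two count-based scores described above can be illustrated with a minimal Python sketch. The toy pair list, the placeholder identifiers (`EPI_1`, `TCR_a`, `cluster_1`) and the function name `count_scores` are assumptions made for illustration; they are not data or names from the disclosure, and the clustering of TCRs by sequence pattern is assumed to have been done beforehand.

```python
from collections import defaultdict

# Hypothetical toy database of known epitope-TCR binding pairs.
# Each entry is (epitope id, TCR id, TCR cluster id); all values are placeholders.
PAIRS = [
    ("EPI_1", "TCR_a", "cluster_1"),
    ("EPI_1", "TCR_b", "cluster_1"),
    ("EPI_1", "TCR_c", "cluster_2"),
    ("EPI_2", "TCR_d", "cluster_3"),
    ("EPI_2", "TCR_d", "cluster_3"),  # duplicate record, counted once
]

def count_scores(pairs):
    """Return two dicts: unique TCRs per epitope and unique TCR clusters per epitope."""
    tcrs = defaultdict(set)
    clusters = defaultdict(set)
    for epitope, tcr, cluster in pairs:
        tcrs[epitope].add(tcr)
        clusters[epitope].add(cluster)
    by_tcr = {e: len(s) for e, s in tcrs.items()}
    by_cluster = {e: len(s) for e, s in clusters.items()}
    return by_tcr, by_cluster

by_tcr, by_cluster = count_scores(PAIRS)
```

Using sets means repeated records of the same pair do not inflate the score; whether that is the intended behaviour is an assumption of this sketch.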
- Alternatively, the conspicuousness score of an epitope can be calculated by a centrality metric for the epitope in the epitope-TCR bipartite graph, where epitopes belong to one class of nodes, TCRs belong to the other class of nodes, and edges represent known or predicted epitope-TCR associations.
- the centrality metric includes, but is not limited to degree centrality, eccentricity centrality, or PageRank centrality.
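As a rough illustration of two of the centrality metrics named above, the following sketch builds a small bipartite graph and computes degree centrality and PageRank for its nodes. The edge list, the normalisation of degree centrality by the number of TCR nodes, the damping factor of 0.85 and the fixed iteration count are all assumptions of this sketch, not values given in the text.

```python
from collections import defaultdict

# Hypothetical epitope-TCR edges; identifiers are placeholders.
EDGES = [("EPI_1", "TCR_a"), ("EPI_1", "TCR_b"), ("EPI_1", "TCR_c"), ("EPI_2", "TCR_c")]

def build_graph(edges):
    """Return an undirected adjacency map plus the epitope and TCR node sets."""
    adj = defaultdict(set)
    for epitope, tcr in edges:
        adj[epitope].add(tcr)
        adj[tcr].add(epitope)
    epitopes = {e for e, _ in edges}
    tcrs = {t for _, t in edges}
    return dict(adj), epitopes, tcrs

def degree_centrality(adj, epitopes, tcrs):
    """Epitope degree divided by the number of TCR nodes (one common bipartite convention)."""
    return {e: len(adj[e]) / len(tcrs) for e in epitopes}

def pagerank(adj, damping=0.85, iters=100):
    """Plain power iteration; assumes every node has at least one edge."""
    n = len(adj)
    rank = {v: 1.0 / n for v in adj}
    for _ in range(iters):
        new = {v: (1.0 - damping) / n for v in adj}
        for v, neighbours in adj.items():
            share = damping * rank[v] / len(neighbours)
            for u in neighbours:
                new[u] += share
        rank = new
    return rank

adj, epitopes, tcrs = build_graph(EDGES)
degree = degree_centrality(adj, epitopes, tcrs)
rank = pagerank(adj)
```

In this toy graph `EPI_1` is linked to all three TCRs and `EPI_2` to only one, so both metrics rank `EPI_1` as the more conspicuous epitope.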
- One or more transformations are applied to said epitope-TCR bipartite graph to improve the quality of the model and consequently the metric.
- Transformations include, but are not limited to, node filtering or edge weighting by measures including our unique confidence or specificity measures for known epitope-TCR pairs.
- Transformations are applied to said centrality metric to improve the quality of the metric; preferably said transformations are node filtering or edge weighting.
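The following sketch shows one way node filtering and edge weighting could be combined before scoring. The confidence values, the threshold `min_confidence`, the cap `max_epitopes_per_tcr` (used here as a crude stand-in for a specificity measure) and the function name are hypothetical; the text does not specify how its confidence or specificity measures are defined.

```python
from collections import defaultdict

# Hypothetical weighted edges: (epitope id, TCR id, confidence in [0, 1]).
WEIGHTED_EDGES = [
    ("EPI_1", "TCR_a", 0.9),
    ("EPI_1", "TCR_b", 0.8),
    ("EPI_1", "TCR_c", 0.2),
    ("EPI_2", "TCR_c", 0.7),
]

def transformed_scores(edges, min_confidence=0.5, max_epitopes_per_tcr=2):
    """Filter nodes and edges, then return the weighted degree of each epitope."""
    # Node filtering: drop TCRs linked to more epitopes than allowed.
    epitopes_per_tcr = defaultdict(set)
    for epitope, tcr, _ in edges:
        epitopes_per_tcr[tcr].add(epitope)
    kept_tcrs = {t for t, eps in epitopes_per_tcr.items() if len(eps) <= max_epitopes_per_tcr}

    # Edge weighting: sum the confidences of the edges that survive both filters.
    scores = {epitope: 0.0 for epitope, _, _ in edges}
    for epitope, tcr, confidence in edges:
        if tcr in kept_tcrs and confidence >= min_confidence:
            scores[epitope] += confidence
    return scores

default_scores = transformed_scores(WEIGHTED_EDGES)
strict_scores = transformed_scores(WEIGHTED_EDGES, max_epitopes_per_tcr=1)
```

With the stricter cap, `TCR_c` is removed because it is linked to two epitopes, which leaves `EPI_2` with a score of zero.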
- query epitopes with a conspicuousness score above a predefined threshold are selected for vaccine testing.
- Suitable thresholds can be, but are not limited to, in a range of 0.5 to 100 of the raw score, for example 0.5 to 1, 5 to 100, 10 to 100, 15 to 100, 20 to 100, 30 to 100, 40 to 100, 50 to 100, 60 to 100, 70 to 100, 80 to 100, 85 to 100, 90 to 100, 95 to 100, or 0.5 to 90, 0.5 to 80, 0.5 to 70, 0.5 to 60, 0.5 to 50, 0.5 to 40, 0.5 to 30, 0.5 to 20, 0.5 to 15, 0.5 to 10, 0.5 to 5 of the calculated raw value, and all the ranges and subranges therein between.
- suitable threshold values may have Bayesian Poisson Regression (BPR) or P-values in the range of 0.1 to 0.00001, for example 0.05 to 0.00001, 0.01 to 0.00001, 0.001 to 0.00001 and all ranges and subranges therein between.
- the T-cell response of query epitopes with a conspicuousness score above a predefined threshold is measured and/or confirmed by means of biological assay.
- Any biological assay known in the art for assessing the T-cell response to an epitope, antibody or any other T-cell stimulatory molecule can be used to measure or confirm the T-cell response of query epitopes. It is understood that a skilled person will choose suitable biological assays for the assessment of T-cell response.
- Non-limiting examples of known methods of measuring T-cell activation include the assessment of cell proliferation, the assessment of the regulation of activation markers, and the production of effector cytokines.
- the method is a computer-implemented method.
- the present invention relates to a computational method of predicting a T-cell response to a candidate epitope.
- the calculation of the conspicuousness score is extended to any arbitrary epitope including epitopes that are not in a database.
- The computational method comprises generating a training data set by calculating the conspicuousness score of a plurality of epitopes with known epitope-TCR binding, wherein said score is calculated based on the number of known TCR sequences responsive to each said epitope, or the number of TCR clusters, grouped by TCR sequence patterns, responsive to each said epitope, or by a centrality metric for each said epitope in the epitope-TCR graph, preferably in the epitope-TCR bipartite graph; and predicting the conspicuousness score of a candidate epitope using a machine learning algorithm model trained using said training data set.
- calculating the conspicuousness score of a candidate epitope comprises the steps of
- predicting the conspicuousness score of a candidate epitope comprises the steps of
- said method uses a predictive model
- said predictive model is a machine learning model trained using a training dataset.
- the training data set comprises a list of epitopes with known epitope-TCR binding and their conspicuousness score.
- the query and candidate epitopes represent a putative minimum amino acid sequence that can be recognized by an immune system component (e.g. by a T-cell receptor).
- the epitope is preferably a linear epitope.
- the input of the algorithm is any amino acid sequence as a candidate epitope.
- the amino acid sequence is 5 to 35 amino acids in length such as 5 to 30 amino acids, 5 to 25 amino acids, 5 to 20 amino acids, 5 to 15 amino acids, 5 to 10 amino acids, 5 to 7 amino acids and all the ranges and subranges therein between.
- amino acid sequence may be derived from one or more biological or synthetic sources, such as peptides or synthetic peptides.
- the epitope can be an amino acid sequence translated from a nucleic acid sequence.
- the input amino acid sequence is converted into numeric features. These numerical features represent the physicochemical properties of the epitope.
- the physicochemical properties of the epitope include but are not limited to the length of the amino acid sequence, HLA preference, amino acid frequencies, and molecular properties.
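A minimal sketch of the conversion from an amino acid sequence to numeric features is given below. It covers only two of the listed properties, sequence length and amino acid frequencies; HLA preference and other molecular properties are omitted because the text does not say how they are encoded. The function name and the feature ordering are assumptions.

```python
# The 20 standard amino acids in one-letter code, in a fixed order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def epitope_features(sequence):
    """Return [length, freq(A), freq(C), ..., freq(Y)] for an amino acid sequence."""
    sequence = sequence.upper()
    if not sequence:
        raise ValueError("sequence must not be empty")
    length = len(sequence)
    frequencies = [sequence.count(aa) / length for aa in AMINO_ACIDS]
    return [float(length)] + frequencies

# Example input: an arbitrary nine-residue peptide string.
features = epitope_features("GILGFVFTL")
```

The resulting vector has a fixed size of 21 regardless of epitope length, which is what a regression model with a fixed number of input weights requires.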
- the output of the algorithm is a single score and said score is a continuous value that represents the conspicuousness of the candidate epitope.
- conspicuousness score is supplemented by an additional interpretability measure which is preferably generated through a transformation function.
- The transformation function includes, but is not limited to, the log of the probability (p-value) or the expect value (E-value), where the p-value represents the probability that the given conspicuousness score is obtained by chance and the E-value represents the number of hits to be expected by chance.
- Both the p-value and the E-value are generated from the background distribution of a set of random epitope sequences.
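One plausible way to derive these interpretability measures from a background distribution is sketched below. The add-one empirical p-value estimator, the definition of the E-value as the p-value multiplied by the number of candidates screened, and all function names are assumptions of this sketch; the text does not give the exact formulas.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def random_epitopes(n, length, seed=0):
    """Generate n random sequences to be scored as a background set."""
    rng = random.Random(seed)
    return ["".join(rng.choice(AMINO_ACIDS) for _ in range(length)) for _ in range(n)]

def empirical_p_value(score, background_scores):
    """Fraction of background scores at least as high as `score` (add-one estimate)."""
    exceed = sum(1 for s in background_scores if s >= score)
    return (exceed + 1) / (len(background_scores) + 1)

def e_value(p_value, n_candidates):
    """Expected number of chance hits when n_candidates epitopes are screened."""
    return p_value * n_candidates

def neg_log10_p(p_value):
    """Log-transformed p-value; larger values mean less likely by chance."""
    return -math.log10(p_value)

# Toy background scores standing in for model outputs on random epitopes.
background = [1.0, 2.0, 3.0, 4.0]
p = empirical_p_value(3.0, background)
e = e_value(p, n_candidates=10)
```

In practice the background scores would come from running the trained model on the output of `random_epitopes`; the short list here only keeps the arithmetic easy to follow.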
- the operation of the algorithm is based on a machine learning framework which can be any machine learning framework that can translate the numerical input features into a single score.
- the machine learning framework is chosen from the non-limiting list of multivariate regression, Bayesian regression, support vector machines, regression forests, linear regression, gradient descent methods, gradient boosting methods and/or neural networks or any combination therein.
- weights of the machine learning model are fit on training data.
- the training data set comprises a list of epitopes and their known conspicuousness score, derived from a TCR-epitope database using the disclosed method.
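To make the fitting step concrete, the sketch below trains a linear regression by batch gradient descent, two of the listed framework options, on feature vectors paired with conspicuousness scores. The learning rate, epoch count, toy data and function names are illustrative assumptions; any of the other listed frameworks could be substituted.

```python
def fit_linear(X, y, learning_rate=0.05, epochs=2000):
    """Fit weights and bias by minimising mean squared error with gradient descent."""
    n, d = len(X), len(X[0])
    weights, bias = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            error = sum(w * x for w, x in zip(weights, xi)) + bias - yi
            for j in range(d):
                grad_w[j] += 2.0 * error * xi[j] / n
            grad_b += 2.0 * error / n
        weights = [w - learning_rate * g for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b
    return weights, bias

def predict(weights, bias, x):
    """Predicted conspicuousness score for one feature vector."""
    return sum(w * xj for w, xj in zip(weights, x)) + bias

# Toy training set: one feature per epitope, scores following y = 2x + 1.
X_train = [[0.0], [1.0], [2.0], [3.0]]
y_train = [1.0, 3.0, 5.0, 7.0]
weights, bias = fit_linear(X_train, y_train)
```

Once fitted, `predict` can score an epitope that is absent from the database, provided it is first converted to the same feature layout used in training.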
- The conspicuousness score of an epitope is used to predict the T-cell response of said epitope.
- A low conspicuousness score relates to a narrow or no T-cell response, whereas a high conspicuousness score relates to a broad T-cell response.
- the epitopes with the highest scores should be retained.
- the conspicuousness score of an epitope may also be used as a stand-in for B-cell response of the epitope.
- a high conspicuousness score relates to a good antibody response.
- a high conspicuousness score can relate to an antibody response with a high (e.g. > 10 IU) antigen-specific antibody titer after 7 days to 60 days after administration of the epitope or the antigen containing the epitope to an individual or a population.
- epitopes with low conspicuousness scores are expected to receive low or no immune response.
- the epitopes with the lowest scores should be retained.
- the epitopes with low conspicuousness are suitable for biologicals with low immune response, such as drugs or therapeutics.
- the epitopes with high conspicuousness scores are suitable for biologicals with high immune response, such as vaccines.
- The invention relates to a method of producing a vaccine composition, wherein said method comprises a step of predicting a T-cell response to an epitope.
- The method can be applied for predicting a T-cell response to an epitope with known epitope-TCR binding data according to the first aspect of the invention and/or for predicting a T-cell response to an epitope without prior epitope-TCR binding data according to the second aspect of the invention.
- The vaccine composition comprises one or more of:
- a nucleic acid, e.g. double- or single-stranded RNA, DNA, or a DNA-RNA hybrid
- at least one immune system cell, e.g. an antigen-presenting cell, T-cell, B-cell or macrophage
- an infectious agent, e.g. a prokaryotic cell (bacteria)
- a eukaryotic cell, e.g. yeast
- a virus, e.g. a eukaryotic virus
- the epitope is present in the substance or vaccine as part of an amino acid sequence of the same length as the epitope or longer than the epitope, and/or as nucleic acid encoding said amino acid sequence of the same length as the epitope or longer than the epitope.
- an amino acid chain may refer to a protein, a polypeptide or a peptide.
- Nucleic acids refer to double- or single-stranded RNA, DNA and/or DNA-RNA hybrids.
- Immune system cells are known in the art; examples include, but are not limited to, antigen-presenting cells, T-cells, B-cells and macrophages.
- An infectious agent can be a prokaryotic cell, such as bacteria.
- the vaccine composition may comprise a eukaryotic cell, such as yeast.
- In another aspect, the invention relates to a vaccine composition comprising one or more epitopes with high T-cell response identified according to the methods of the present disclosure.
- In another aspect, the invention relates to a non-immunogenic composition for use in a method of treating a disease, comprising one or more peptides with low T-cell response identified according to the disclosed methods.
- Figure 1 presents the workflow for conspicuousness score calculation and training of the machine learning model.
- FIG. 2 shows the Receiver Operator Characteristic for the independent test data.
- An antigen amino acid sequence of a target of interest is split into epitopes based on a sliding window approach with variable lengths of 7 to 25 amino acids.
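The sliding window step can be sketched as follows. The window bounds of 7 and 25 come from the text; the function name, the stride of one residue and the example string are assumptions.

```python
def sliding_window_epitopes(antigen, min_len=7, max_len=25):
    """Return every contiguous subsequence with a length from min_len to max_len."""
    epitopes = []
    longest = min(max_len, len(antigen))
    for length in range(min_len, longest + 1):
        for start in range(len(antigen) - length + 1):
            epitopes.append(antigen[start:start + length])
    return epitopes

# A ten-residue placeholder string yields windows of length 7, 8, 9 and 10.
candidates = sliding_window_epitopes("ABCDEFGHIJ")
```

The number of candidates grows roughly linearly with antigen length for each window size, so a long antigen scanned over all 19 window sizes produces many overlapping epitopes to score.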
- potential epitopes of interest can be entered directly.
- Each epitope in turn is given as input to the trained machine learning model that is deployed on a web platform or API or is packaged into an executable.
- For each epitope, the trained machine learning model returns a conspicuousness score together with an additional interpretability measure, both representing the breadth of the T-cell response to the given epitope.
- Epitopes and antigens with high scores are retained as good candidates for vaccines, while ones with low scores are retained as good candidates for biologicals.
- Standard regression metrics such as R-squared, RMSE and MAE: chosen because of the nature of the problem and the machine learning algorithms used.
- Spearman rank correlation: the errors calculated by regression metrics such as RMSE show how far the predicted conspicuousness score lies from the actual values. However, in the principal scenario where a list of potential candidate epitopes is given, it is more important to accurately rank them from least to most immunogenic. Therefore, the Spearman rank correlation is used to assess the algorithm's ranking ability.
- AUC: Area Under the Curve.
- Cross-validation: the database that is used for training is repeatedly split into training and test partitions, and the aforementioned metrics are calculated and averaged. This additionally allows the selection of the best model.
- Independent data set: an additional data set containing epitopes with known immunogenicity is used to independently verify model performance.
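The ranking-oriented evaluation can be illustrated with a small, dependency-free implementation of the Spearman rank correlation, computed as the Pearson correlation of average ranks so that ties are handled. The function names and toy scores are assumptions; in practice a library routine would normally be used instead.

```python
def average_ranks(values):
    """Rank values from 1 upward, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman correlation; undefined (division by zero) if either input is constant."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5

# Toy example: predictions that preserve the true ordering exactly.
actual = [0.1, 0.4, 0.9, 2.5]
predicted = [1.0, 1.5, 3.0, 3.2]
rho = spearman(actual, predicted)
```

Because only the ordering matters, the predictions above reach a correlation of 1 even though their absolute values differ substantially from the actual scores, which is the property the ranking scenario relies on.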
- Results can be observed in Table 1 and Figure 2, showing a clear signal that the algorithm has the capability to predict the conspicuousness of an epitope.
Abstract
The current invention relates to a method of predicting the T-cell response to a query epitope with known epitope-TCR binding, comprising calculating a conspicuousness score of said epitope based on the number of known TCR sequences or TCR clusters responsive to said epitope, or by a centrality metric for each said epitope in the epitope-TCR graph. The invention further relates to a second method to predict the T-cell response of any arbitrary epitope using a machine learning algorithm. The T-cell response of epitopes can be used for the selection of molecules for vaccine and/or non-immunogenic compositions. The invention further relates to a third method for producing a vaccine, to a vaccine, a biological and a database.
Description
METHOD TO EVALUATE THE CONSPICUOUSNESS OF AN EPITOPE TOWARDS THE REPERTOIRE OF T-CELL RECEPTORS
FIELD OF THE INVENTION
The present invention is in the field of immunology and medicine. The present invention is particularly in the field of prediction of immunogenicity of a peptide.
BACKGROUND
Developing a new vaccine or other T-cell-based therapy aims to elicit an adaptive immune response for which T-cells are one of the most important drivers. The epitopes or antigens included in the therapy are selected based on their ability to elicit this response, which is termed immunogenicity.
Inversely, the development of certain products such as therapeutic biologicals aims to prevent any immune response against the product, as this may cause the degradation of the product, making the therapy inefficient. In such situations, a T-cell response is not desired, and the product should be of low immunogenicity.
The current state of the art uses major histocompatibility complex (MHC) binding as a proxy for T-cell immunogenicity. US11183272, EP2550529, WO2019006022 and US20210104294 disclose several methods to predict whether an epitope can be coupled to and consequently presented by an MHC molecule, as the presentation of the epitope by an MHC molecule is considered to be the most important step for T-cell activation and, consequently, for elicitation of an adaptive immune response. Predicting immunogenicity using only the MHC-binding approach considers one aspect of T-cell activation and neglects the other features that can also be used as indicators of T-cell response.
The T-cell receptor (TCR) is a protein complex found on the surface of T cells, or T lymphocytes, that is responsible for recognizing fragments of antigen as peptides bound to major histocompatibility complex (MHC) molecules. CN111429965 and WO2022072722 disclose other methods that attempt to predict if a single epitope can be bound by a specific T-cell receptor (TCR). However, these methods require prior knowledge of TCR sequences, which are not known in many applications. In addition, the prediction of this binding is highly inaccurate, and most reported performances in this field are due to imbalances or biases in the test set.
US20210147929 discloses a method to address the low prediction accuracy by acquiring an epitope-specific training data set; however, this is only possible for a limited number of epitopes.
There remains a need in the art for an improved method for more precise prediction of the immunogenicity of any potential epitope.
SUMMARY OF THE INVENTION
The present invention and embodiments thereof serve to provide a solution to one or more of the above-mentioned disadvantages. To this end, in a first aspect the present invention relates to a method of predicting T-cell response to a query epitope with a known epitope-TCR binding according to claim 1. The method comprises calculating a conspicuousness score of said epitope based on the number of known TCR sequences or TCR clusters responsive to said epitope or by a centrality metric for said query epitope in the epitope-TCR bipartite graph, wherein said calculated conspicuousness score represents the T-cell response to said query epitope.
In a second aspect, the present invention relates to a method of predicting T-cell response to a candidate epitope according to claim 6. The method comprises generating a training data set by calculating a conspicuousness score of a plurality of epitopes with known epitope-TCR binding, wherein said score is calculated based on the number of known TCR sequences responsive to each said epitope or the number of TCR clusters, grouped by TCR sequence patterns, responsive to each said epitope or by a centrality metric for each said epitope in the epitope-TCR bipartite graph; and calculating a conspicuousness score of said candidate epitope using a machine learning algorithm model trained using said training data set to predict the T-cell response of said candidate epitope. A specific preferred embodiment relates to an invention according to claim 7.
In a third aspect, the present invention relates to a method of producing a vaccine composition according to claim 20.
In a fourth aspect the present invention relates to a vaccine composition according to claim 22.
In a fifth aspect, the present invention relates to a non-immunogenic composition according to claim 23.
In a sixth aspect, the present invention relates to a data set according to claim 24.
DESCRIPTION OF FIGURES
The following description of the figures of specific embodiments of the invention is merely exemplary in nature and is not intended to limit the present teachings, their application or uses.
Figure 1 schematically presents a conspicuousness score calculation and training of the machine learning model.
Figure 2 shows the Receiver Operator Characteristic for the independent test data.
DETAILED DESCRIPTION OF THE INVENTION
The present invention concerns a method for predicting the T-cell response of an epitope by its conspicuousness to a repertoire of TCRs. The method is distinct in that it focuses on the TCR diversity of the T-cell response after MHC binding has been achieved, which is novel in view of the prior art. The disclosed methods can further predict the T-cell response of an arbitrary epitope without any known epitope-TCR binding by calculating its conspicuousness to a repertoire of TCRs using machine learning algorithms.
DEFINITIONS
Unless otherwise defined, all terms used in disclosing the invention, including technical and scientific terms, have the meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. By means of further guidance, term definitions are included to better appreciate the teaching of the present invention.
As used herein, the following terms have the following meanings:
"A", "an", and "the" as used herein refers to both singular and plural referents unless the context clearly dictates otherwise. By way of example, "a compartment" refers to one or more than one compartment.
"About" as used herein referring to a measurable value such as a parameter, an amount, a temporal duration, and the like, is meant to encompass variations of +/- 20% or less, preferably +/-10% or less, more preferably +/-5% or less, even more preferably +/-1% or less, and still more preferably +/-0.1% or less of and from the specified value, in so far such variations are appropriate to perform in the disclosed invention. However, it is to be understood that the value to which the modifier "about" refers is itself also specifically disclosed.
"Comprise", "comprising", and "comprises" and "comprised of" as used herein are synonymous with "include", "including", "includes" or "contain", "containing", "contains" and are inclusive or open-ended terms that specifies the presence of what follows e.g. component and do not exclude or preclude the presence of additional, non-recited components, features, element, members, steps, known in the art or disclosed therein.
Furthermore, the terms first, second, third and the like in the description and in the claims, are used for distinguishing between similar elements and not necessarily for describing a sequential or chronological order, unless specified. It is to be understood that the terms so used are interchangeable under appropriate circumstances and that the embodiments of the invention described herein are capable of operation in other sequences than described or illustrated herein.
The recitation of numerical ranges by endpoints includes all numbers and fractions subsumed within that range, as well as the recited endpoints.
The expression "% by weight", "weight percent", "%wt" or "wt%", here and throughout the description unless otherwise defined, refers to the relative weight of the respective component based on the overall weight of the formulation.
Whereas the terms "one or more" or "at least one", such as one or more or at least one member(s) of a group of members, is clear per se, by means of further exemplification, the term encompasses inter alia a reference to any one of said members, or to any two or more of said members, such as, e.g., any >3, >4, >5, >6 or >7 etc. of said members, and up to all said members.
As used herein, the terms "protein," "polypeptide," and "peptide" refer to a molecule comprising amino acids joined via peptide bonds. In general, "peptide" is used to refer to a sequence of 40 or less amino acids and "polypeptide" is used to refer to a sequence of greater than 40 amino acids.
As used herein, the term, "synthetic polypeptide," "synthetic peptide" and "synthetic protein" refer to peptides, polypeptides, and proteins that are produced by a recombinant process (i.e., expression of exogenous nucleic acid encoding the peptide, polypeptide or protein in an organism, host cell, or cell-free system) or by chemical synthesis.
As used herein, the term "immunogen" refers to a molecule which stimulates a response from the adaptive immune system, such as a T-cell response. The unlimiting examples of said responses may comprise an antibody response, a cytotoxic T-cell response, a T helper response, and a T-cell memory. An immunogen may stimulate an upregulation of the immune response with a resultant inflammatory response or may result in down-regulation or immunosuppression. Thus, the T-cell response may be a T regulatory response. An immunogen also may stimulate a B-cell response and lead to an increase in antibody titer. Another term used herein to describe a molecule or combination of molecules which stimulate an immune response is "antigen".
As used herein the term "epitope" refers to a peptide sequence which elicits an immune response, from either T cells or B cells or antibody. An epitope may be a linear peptide or may comprise several discontinuous sequences which together are folded to form a structural epitope. Typically, T-cell epitopes are presented bound to a MHC molecule on the surface of an antigen-presenting cell.
As used herein, the term "query epitope" or "candidate epitope" refers to a polypeptide sequence that is predicted to bind to a major histocompatibility protein molecule by computerized methods, or as determined experimentally. As used herein, the term "major histocompatibility complex (MHC)" refers to the MHC Class I and MHC Class II genes and the proteins encoded thereby. Molecules of the MHC bind small peptides and present them on the surface of cells for recognition by T-cell receptor-bearing T-cells.
The T-cell receptor (TCR) is a protein complex found on the surface of T cells, or T lymphocytes that is responsible for recognizing fragments of antigen as peptides bound to MHC molecules. The binding between TCR and antigen peptides is of relatively low affinity and is degenerate: that is, many TCRs recognize the same antigen peptide and many antigen peptides are recognized by the same TCR.
As used herein, the term "conspicuous" epitope or "conspicuousness" of an epitope refers to how visible that epitope is to a set of TCRs or how recognizable the epitope is by a set of TCRs. "Conspicuousness score" of an epitope as used herein refers to a calculated value of the probability of that epitope being visible to a set of TCRs. As such, the conspicuousness score represents the conspicuousness of an epitope towards the repertoire of TCRs, effectively approximating the breadth of the T-cell response to an epitope.
Unless otherwise defined, all terms used in disclosing the invention, including technical and scientific terms, have the meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. By means of further guidance, definitions for the terms used in the description are included to better appreciate the teaching of the present invention. The terms or definitions used herein are provided solely to aid in the understanding of the invention.
Reference throughout this specification to "one embodiment" or "an embodiment" means that a particular feature, structure or characteristic described in connection with the embodiment is included in at least one embodiment of the present invention. Thus, appearances of the phrases "in one embodiment" or "in an embodiment" in various places throughout this specification are not necessarily all referring to the same embodiment but may. Furthermore, the particular features, structures or characteristics may be combined in any suitable manner, as would be apparent to a person skilled in the art from this disclosure, in one or more embodiments. Furthermore, while some embodiments described herein include some but not other features included in other embodiments, combinations of features of different embodiments are meant to be within the scope of the invention, and form different embodiments, as would be understood by those in the art. For example, in the following claims, any of the claimed embodiments can be used in any combination.
DETAILED DESCRIPTION
In a first aspect, the invention relates to a method of predicting the T-cell response to a query epitope with known epitope-TCR binding. The method comprises calculating a conspicuousness score of said epitope, wherein said score is calculated based on the number of known TCR sequences responsive to said epitope or the number of TCR clusters, grouped by TCR sequence patterns, responsive to said epitope or by a centrality metric for said epitope in the epitope-TCR graph; wherein said calculated conspicuousness score represents and/or is used to predict the T-cell response to said query epitope.
In embodiments of the disclosed method, the conspicuousness score is calculated by a centrality metric for each said epitope in the epitope-TCR graph, wherein said epitope-TCR graph is a bipartite graph.
In embodiments of the disclosed method, the ways of calculating the conspicuousness score for the epitopes which are present in a database of known epitope-TCR binding pairs are established.
In embodiments, calculated conspicuousness score represents the conspicuousness of an epitope towards the repertoire of TCRs, effectively approximating the breadth of the T-cell response to an epitope. Two main factors can influence the breadth of response: the degeneracy of the TCR sequence that can result in binding and the different possible binding modes that the epitope-TCR complex can exist in.
As known to a skilled person in the art, degeneracy is an important feature of the immune response mechanism that permits effective T-cell responses to a vast number of potential peptide sequences complexed to MHC molecules with specificity sufficient to distinguish between self and foreign peptides and thus to avoid autoimmune disease.
In embodiments, the conspicuousness score of an epitope can be calculated from epitope-TCR databases of different sizes. In embodiments, the upper limit of the size of the database of known epitope-TCR binding pairs is unlimited. In further embodiments, said databases may have more than 75 unique pairs, more than 100 unique pairs, more than 1000, more than 10000, more than 100000, more than 1000000, or more than 10000000 (10 million) unique pairs. In preferred embodiments, the databases have between 100 and 10000000 unique pairs, between 1000 and 10000000, between 10000 and 10000000, between 100000 and 10000000, between 100000 and 1000000, and all the ranges and subranges therein between. According to embodiments, the conspicuousness score can be calculated for any epitope present in the database.
In embodiments, the conspicuousness score is calculated, for each epitope present in a database, based on the number of known TCRs that are responsive to that epitope. In further embodiments, the conspicuousness score of an epitope can also be calculated based on the number of TCR clusters that are grouped by TCR sequence patterns and are known to interact with said epitope.
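By way of non-limiting illustration, the count-based calculation may be sketched as follows. The function name, the pair format and the toy identifiers are assumptions made for this example only, and a TCR without a cluster assignment is treated here as its own singleton cluster, which the disclosure does not prescribe.

```python
from collections import defaultdict


def count_based_scores(pairs, tcr_to_cluster=None):
    """Return {epitope: number of distinct responsive TCRs (or TCR clusters)}.

    pairs          -- iterable of (epitope, tcr) tuples from a binding database
    tcr_to_cluster -- optional mapping tcr -> cluster label (hypothetical input)
    """
    partners = defaultdict(set)
    for epitope, tcr in pairs:
        # With a cluster mapping, count clusters; unclustered TCRs count as
        # singleton clusters (an assumption of this sketch).
        key = tcr_to_cluster.get(tcr, tcr) if tcr_to_cluster else tcr
        partners[epitope].add(key)
    return {epitope: len(keys) for epitope, keys in partners.items()}


# Hypothetical toy database of known epitope-TCR binding pairs.
pairs = [
    ("epitopeA", "tcr1"),
    ("epitopeA", "tcr2"),
    ("epitopeA", "tcr3"),
    ("epitopeB", "tcr3"),
]
clusters = {"tcr1": "cluster1", "tcr2": "cluster1"}

by_sequence = count_based_scores(pairs)
by_cluster = count_based_scores(pairs, clusters)
```

In this toy database, epitopeA is bound by three distinct TCR sequences but only two clusters, so the cluster-based count is less sensitive to near-identical receptors.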
In other embodiments, the conspicuousness score of an epitope can alternatively be calculated by a centrality metric for the epitope in the epitope-TCR bipartite graph, where epitopes belong to one class of nodes, TCRs belong to the other class of nodes, and edges represent known or predicted epitope-TCR associations.
In embodiments, the centrality metric includes, but is not limited to degree centrality, eccentricity centrality, or PageRank centrality.
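By way of non-limiting illustration, two of the named centrality metrics may be sketched on a toy bipartite graph as follows. The helper names, the normalisation of degree centrality by the number of TCR nodes, and the damping factor of 0.85 are assumptions of this example; eccentricity centrality is not shown.

```python
from collections import defaultdict


def build_bipartite(pairs):
    """Undirected bipartite adjacency; nodes are ("epitope", x) or ("tcr", y)."""
    adjacency = defaultdict(set)
    for epitope, tcr in pairs:
        adjacency[("epitope", epitope)].add(("tcr", tcr))
        adjacency[("tcr", tcr)].add(("epitope", epitope))
    return adjacency


def degree_centrality(adjacency):
    """Epitope degree divided by the number of TCR nodes (assumed normalisation)."""
    n_tcrs = sum(1 for kind, _ in adjacency if kind == "tcr")
    return {
        name: len(neighbours) / n_tcrs
        for (kind, name), neighbours in adjacency.items()
        if kind == "epitope"
    }


def pagerank(adjacency, damping=0.85, iterations=100):
    """Plain power-iteration PageRank over all nodes of the graph."""
    nodes = list(adjacency)
    rank = {node: 1.0 / len(nodes) for node in nodes}
    for _ in range(iterations):
        new_rank = {node: (1.0 - damping) / len(nodes) for node in nodes}
        for node, neighbours in adjacency.items():
            # Every node has at least one neighbour because it comes from an edge.
            share = damping * rank[node] / len(neighbours)
            for neighbour in neighbours:
                new_rank[neighbour] += share
        rank = new_rank
    return rank


# Hypothetical toy edges: epitope A binds three TCRs, epitope B binds one.
graph = build_bipartite([("A", "t1"), ("A", "t2"), ("A", "t3"), ("B", "t3")])
degree_scores = degree_centrality(graph)
pagerank_scores = pagerank(graph)
```

Under both metrics epitope A, which is connected to more TCR nodes, receives the higher score in this toy graph.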
In further embodiments, one or more transformations are applied to said epitope-TCR bipartite graph to improve the quality of the model and consequently the metric. These transformations include, but are not limited to, node filtering or edge weighting by measures including our unique confidence or specificity measures for known epitope-TCR pairs.
In preferred embodiments, further transformations are applied to said centrality metric to improve the quality of the metric; preferably, said transformations are node filtering or edge weighting.
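By way of non-limiting illustration, node filtering and edge weighting may be combined with a degree-type metric as sketched below. The confidence values, the parameter `max_epitopes_per_tcr` and the rule of discarding TCRs that bind too many epitopes are hypothetical stand-ins; the confidence and specificity measures of the disclosure are not specified here.

```python
from collections import defaultdict


def weighted_degree(edges, max_epitopes_per_tcr=None):
    """Sum edge weights per epitope after optional TCR node filtering.

    edges -- iterable of (epitope, tcr, confidence) tuples
    max_epitopes_per_tcr -- drop TCR nodes linked to more epitopes than this
                            (a hypothetical specificity filter)
    """
    epitopes_per_tcr = defaultdict(set)
    for epitope, tcr, _ in edges:
        epitopes_per_tcr[tcr].add(epitope)

    scores = defaultdict(float)
    for epitope, tcr, confidence in edges:
        too_promiscuous = (
            max_epitopes_per_tcr is not None
            and len(epitopes_per_tcr[tcr]) > max_epitopes_per_tcr
        )
        if too_promiscuous:
            continue  # node filtering: ignore all edges of this TCR
        scores[epitope] += confidence  # edge weighting
    return dict(scores)


# Hypothetical weighted edges; t3 is linked to two different epitopes.
edges = [("A", "t1", 1.0), ("A", "t2", 0.5), ("A", "t3", 0.9), ("B", "t3", 0.9)]
unfiltered = weighted_degree(edges)
filtered = weighted_degree(edges, max_epitopes_per_tcr=1)
```

With the filter enabled, the cross-reactive receptor t3 is removed, so epitope B no longer receives a score and epitope A is scored on its two remaining edges.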
In embodiments, query epitopes with a conspicuousness score above a predefined threshold are selected for vaccine testing. Examples of suitable thresholds, calculated by using the disclosed method, can be, but are not limited to, values in a range of 0.5 to 100 of raw score, for example 0.5 to 1, 5 to 100, 10 to 100, 15 to 100, 20 to 100, 30 to 100, 40 to 100, 50 to 100, 60 to 100, 70 to 100, 80 to 100, 85 to 100, 90 to 100, 95 to 100 or 0.5 to 90, 0.5 to 80, 0.5 to 70, 0.5 to 60, 0.5 to 50, 0.5 to 40, 0.5 to 30, 0.5 to 20, 0.5 to 15, 0.5 to 10, 0.5 to 5 of calculated raw value, and all the ranges and subranges therein between. In embodiments, suitable threshold values may have Bayesian Poisson Regression (BPR) or P-values in the range of 0.1 to 0.00001, for example 0.05 to 0.00001, 0.01 to 0.00001, 0.001 to 0.00001, and all ranges and subranges therein between. One skilled in the art will appreciate that different threshold values may be chosen in different circumstances; for example, the threshold value may vary depending on available candidate epitopes, severity of the indication and/or urgency of the need for a vaccine.
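By way of non-limiting illustration, threshold-based selection may be sketched as follows. The function name, the example scores and the threshold of 10 are hypothetical values chosen for this example only.

```python
def select_for_vaccine_testing(scores, threshold):
    """Return (epitope, score) pairs strictly above threshold, best first."""
    selected = [(epitope, score) for epitope, score in scores.items() if score > threshold]
    return sorted(selected, key=lambda item: item[1], reverse=True)


# Hypothetical raw conspicuousness scores.
raw_scores = {"epitopeA": 42.0, "epitopeB": 3.5, "epitopeC": 17.0}
shortlist = select_for_vaccine_testing(raw_scores, threshold=10.0)
```

Raising or lowering `threshold` trades the number of candidates carried into biological assays against the expected breadth of their T-cell response.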
In embodiments, the T-cell response of query epitopes with a conspicuousness score above a predefined threshold is measured and/or confirmed by means of a biological assay. Any biological assay known in the art for assessing the T-cell response to an epitope, antibody or any other T-cell stimulatory molecule can be used to measure or confirm the T-cell response of query epitopes. It is understood that a skilled person will choose suitable biological assays for the assessment of T-cell response. Non-limiting examples of known methods of measuring T-cell activation include the assessment of cell proliferation, the assessment of the regulation of activation markers, and the production of effector cytokines.
In embodiments, the method is a computer-implemented method.
In a second aspect, the present invention relates to a computational method of predicting a T-cell response to a candidate epitope.
In embodiments, the calculation of the conspicuousness score is extended to any arbitrary epitope including epitopes that are not in a database.
In embodiments, the computational method comprises generating a training data set by calculating the conspicuousness score of a plurality of epitopes with known epitope-TCR binding, wherein said score is calculated based on the number of known TCR sequences responsive to each said epitope or the number of TCR clusters, grouped by TCR sequence patterns, responsive to each said epitope or by a centrality metric for each said epitope in the epitope-TCR graph, preferably in the epitope-TCR bipartite graph; and predicting the conspicuousness score of the candidate epitope using a machine learning algorithm model trained using said training data set.
In an embodiment, calculating the conspicuousness score of a candidate epitope comprises the steps of
- training the algorithm with training data set;
- feeding the algorithm with an amino acid sequence of said query epitope;
- conversion of said sequence into numeric features wherein said numerical features represent the physicochemical properties of the epitope such as, sequence length, HLA preference, amino acid frequencies, and molecular properties;
- predicting the conspicuousness score of the query epitope using said numeric features.
In an embodiment, predicting the conspicuousness score of a candidate epitope comprises the steps of
- training the algorithm with training data set;
- feeding the algorithm with an amino acid sequence of said query epitope;
- conversion of said sequence into numeric features wherein said numerical features represent the physicochemical properties of the epitope such as, sequence length, HLA preference, amino acid frequencies, and molecular properties;
- predicting the conspicuousness score of the query epitope using said numeric features.
In embodiments, said method uses a predictive model, wherein said predictive model is a machine learning model trained using a training dataset.
According to embodiments, the training data set comprises a list of epitopes with known epitope-TCR binding and their conspicuousness score.
The query and candidate epitopes represent a putative minimum amino acid sequence that can be recognized by an immune system component (e.g. by a T-cell receptor). The epitope is preferably a linear epitope.
In embodiments, the input of the algorithm is any amino acid sequence as a candidate epitope. In a preferred embodiment, the amino acid sequence is 5 to 35 amino acids in length such as 5 to 30 amino acids, 5 to 25 amino acids, 5 to 20 amino acids, 5 to 15 amino acids, 5 to 10 amino acids, 5 to 7 amino acids and all the ranges and subranges therein between.
In embodiments, amino acid sequence may be derived from one or more biological or synthetic sources, such as peptides or synthetic peptides. The epitope can be an amino acid sequence translated from a nucleic acid sequence.
In embodiments, the input amino acid sequence is converted into numeric features. These numerical features represent the physicochemical properties of the epitope.
In embodiments, the physicochemical properties of the epitope include but are not limited to the length of the amino acid sequence, HLA preference, amino acid frequencies, and molecular properties.
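By way of non-limiting illustration, the conversion of an amino acid sequence into numeric features may be sketched as follows. The feature layout is an assumption of this example: it uses the sequence length, the 20 amino acid frequencies, and the mean Kyte-Doolittle hydropathy as a single stand-in molecular property. HLA preference is omitted because it would require an external binding predictor.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy index per residue.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def epitope_features(sequence):
    """Return [length, 20 amino acid frequencies, mean hydropathy] as floats."""
    sequence = sequence.strip().upper()
    if not sequence or any(residue not in KYTE_DOOLITTLE for residue in sequence):
        raise ValueError("sequence must be non-empty and use the 20 standard residues")
    length = len(sequence)
    frequencies = [sequence.count(residue) / length for residue in AMINO_ACIDS]
    mean_hydropathy = sum(KYTE_DOOLITTLE[residue] for residue in sequence) / length
    return [float(length)] + frequencies + [mean_hydropathy]


# Hypothetical 9-residue input used only to show the shape of the output.
features = epitope_features("ACDEFGHIK")
```

The resulting fixed-length vector (22 values in this sketch) can be passed to any regression framework regardless of the length of the input sequence.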
In embodiments, the output of the algorithm is a single score and said score is a continuous value that represents the conspicuousness of the candidate epitope.
In further embodiments, the conspicuousness score is supplemented by an additional interpretability measure which is preferably generated through a transformation function.
In embodiments, the transformation function includes, but is not limited to, the log of the probability (p-value) or the expect value (E-value), where the p-value represents the probability that the given conspicuousness score is obtained by chance and the E-value represents the number of hits to be expected by chance. In embodiments, both the p-value and the E-value are generated from the background distribution of a set of random epitope sequences.
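By way of non-limiting illustration, an empirical p-value and E-value may be derived from a background of scores for random epitope sequences as sketched below. The add-one correction and the definition of the E-value as the p-value multiplied by the number of scored candidates are assumptions of this example, not definitions taken from the disclosure.

```python
import math


def empirical_p_value(score, background_scores):
    """Fraction of background scores at least as high as score (add-one corrected)."""
    exceed = sum(1 for background in background_scores if background >= score)
    return (exceed + 1) / (len(background_scores) + 1)


def expect_value(p_value, n_candidates):
    """Expected number of chance hits among n_candidates scored epitopes."""
    return p_value * n_candidates


# Hypothetical background: scores 0..98 standing in for random epitope sequences.
background = [float(value) for value in range(99)]
p_value = empirical_p_value(98.5, background)
log_p = math.log10(p_value)
e_value = expect_value(p_value, n_candidates=200)
```

The add-one correction keeps the p-value strictly positive, so its logarithm is always defined even when no background score reaches the query score.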
In embodiments, the operation of the algorithm is based on a machine learning framework which can be any machine learning framework that can translate the numerical input features into a single score.
In embodiments, the machine learning framework is chosen from the non-limiting list of multivariate regression, Bayesian regression, support vector machines, regression forests, linear regression, gradient descent methods, gradient boosting methods and/or neural networks or any combination therein.
In embodiments, weights of the machine learning model are fit on training data. In embodiments, the training data set comprises a list of epitopes and their known conspicuousness score, derived from a TCR-epitope database using the disclosed method.
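By way of non-limiting illustration, fitting model weights on training data may be sketched with a linear regression trained by batch gradient descent, which is only one of the listed frameworks. The single toy feature, the learning rate and the number of epochs are hypothetical choices for this example; in practice the targets would be conspicuousness scores derived from a TCR-epitope database.

```python
def fit_linear_model(features, targets, learning_rate=0.05, epochs=5000):
    """Fit weights and bias by batch gradient descent on mean squared error."""
    n_samples = len(features)
    n_features = len(features[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(features, targets):
            error = bias + sum(w * xi for w, xi in zip(weights, x)) - y
            for j, xi in enumerate(x):
                grad_w[j] += error * xi
            grad_b += error
        weights = [w - learning_rate * g / n_samples for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n_samples
    return weights, bias


def predict(model, x):
    """Predict a single continuous score for one feature vector."""
    weights, bias = model
    return bias + sum(w * xi for w, xi in zip(weights, x))


# Hypothetical toy training set: one numeric feature, score = 2 * feature + 1.
train_x = [[0.0], [1.0], [2.0], [3.0]]
train_y = [1.0, 3.0, 5.0, 7.0]
model = fit_linear_model(train_x, train_y)
predicted = predict(model, [4.0])
```

Any of the other listed frameworks could replace `fit_linear_model`, provided that it maps the numeric input features to a single continuous score.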
In embodiments, the conspicuousness score of an epitope is used to predict the T-cell response of said epitope. A low conspicuousness score relates to a narrow or no T-cell response whereas a high conspicuousness score relates to a broad T-cell response. According to embodiments of the present disclosure, when deploying the disclosed method for vaccine design, the epitopes with the highest scores should be retained.
The conspicuousness score of an epitope may also be used as a stand-in for the B-cell response of the epitope. In embodiments, a high conspicuousness score relates to a good antibody response. For example, a high conspicuousness score can relate to an antibody response with a high (e.g. > 10 IU) antigen-specific antibody titer 7 days to 60 days after administration of the epitope or the antigen containing the epitope to an individual or a population.
In embodiments, epitopes with low conspicuousness scores are expected to receive low or no immune response. When deploying this method for prioritizing biologicals according to the lowest probability of an unwanted immune response, the epitopes with the lowest scores should be retained. According to the embodiments of the disclosure, the epitopes with low conspicuousness are suitable for biologicals with low immune response, such as drugs or therapeutics. The epitopes with high conspicuousness scores are suitable for biologicals with high immune response, such as vaccines.
In a further aspect, the invention relates to a method of producing a vaccine composition, wherein said method comprises a step of predicting a T-cell response to an epitope. The method can be applied for predicting a T-cell response to an epitope with known epitope-TCR binding data according to the first aspect of the invention and/or for predicting a T-cell response to an epitope without prior epitope-TCR binding data according to the second aspect of the invention.
In embodiments, the vaccine composition comprises one or more of:
- at least one amino acid chain (protein, polypeptide, peptide)
- at least one nucleic acid (double or single stranded RNA, DNA; DNA-RNA hybrid)
- at least one immune system cell (e.g. antigen presenting cell, T-cell, B-cell, macrophage),
- at least one infectious agent (e.g. prokaryotic cell (bacteria)),
- at least one eukaryotic cell (yeast),
- at least one virus,
- at least one prion,
and the epitope is present in the substance or vaccine as part of an amino acid sequence of the same length as the epitope or longer than the epitope, and/or as nucleic acid encoding said amino acid sequence of the same length as the epitope or longer than the epitope.
As it would be obvious to a skilled person in the art, an amino acid chain may refer to a protein, a polypeptide or a peptide.
In embodiments, nucleic acids refer to double- or single-stranded RNA, DNA and/or DNA-RNA hybrid.
Immune system cells are known in the art; examples of immune system cells include, but are not limited to, antigen-presenting cells, T-cells, B-cells and macrophages.
In embodiments, an infectious agent can be a prokaryotic cell, such as bacteria.
In other embodiments, the vaccine composition may comprise a eukaryotic cell, such as yeast.
In another aspect, the invention relates to a vaccine composition comprising one or more epitopes with high T-cell response identified according to the methods of the present disclosure.
In another aspect, the invention relates to a non-immunogenic composition for use in a method of treating a disease comprising one or more peptides with low T-cell response identified according to the disclosed methods.
The present invention will now be described in more detail with reference to non-limiting examples.
EXAMPLES AND/OR DESCRIPTION OF FIGURES
Figure 1 workflow presents conspicuousness score calculation and training of the machine learning model.
Figure 2 shows the Receiver Operator Characteristic for the independent test data.
The present invention will now be further exemplified with reference to the following examples. The present invention is in no way limited to the given examples or to the embodiments presented in the figures.
Example 1: Operation of the method
An antigen amino acid sequence of a target of interest is split into epitopes based on a sliding window approach with variable lengths of 7 to 25 amino acids. Alternatively, potential epitopes of interest can be entered directly.
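By way of non-limiting illustration, the sliding window step may be sketched as follows. The function name and the removal of duplicate peptides are assumptions of this example; the window lengths of 7 to 25 amino acids are taken from the text above.

```python
def sliding_window_epitopes(antigen, min_len=7, max_len=25):
    """Return all unique sub-sequences of the antigen with min_len..max_len residues."""
    antigen = antigen.strip().upper()
    seen = set()
    epitopes = []
    for length in range(min_len, min(max_len, len(antigen)) + 1):
        for start in range(len(antigen) - length + 1):
            peptide = antigen[start:start + length]
            if peptide not in seen:  # keep each candidate only once
                seen.add(peptide)
                epitopes.append(peptide)
    return epitopes


# Hypothetical 10-residue antigen, used only to show the enumeration.
candidates = sliding_window_epitopes("ACDEFGHIKL")
```

For an antigen of length n and a window of length k there are n - k + 1 windows, so the toy 10-residue antigen yields 4 + 3 + 2 + 1 = 10 candidates for lengths 7 to 10.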
Each epitope in turn is given as input to the trained machine learning model that is deployed on a web platform or API or is packaged into an executable.
For each epitope, the trained machine learning model returns a conspicuousness score together with an additional interpretability measure, both representing the breadth of the T-cell response to the given epitope.
Epitopes and antigens with high scores are retained as good candidates for vaccines, while ones with low scores are retained as good candidates for biologicals.
Example 2: Evaluation of the algorithm for prediction of the conspicuousness of an epitope
Evaluation of the machine learning models is done using 3 different types of metrics:
1. Standard regression metrics such as R-squared, RMSE and MAE, which suit the regression nature of the problem and of the machine learning algorithms used.
2. Spearman rank correlation: the errors calculated by the regression metrics such as RMSE show how far the predicted conspicuousness score lies from the actual values. However, in the principal scenario where a list of potential candidate epitopes is given, it is more important to accurately rank them from least to most immunogenic. Therefore, the Spearman rank correlation is used to assess the algorithm's ranking ability.
3. Area Under the Curve (AUC): the problem can be transformed into a binary classification, with either low immunogenicity or high immunogenicity, for which AUC gives an additional performance indicator.
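By way of non-limiting illustration, the ranking-based metrics listed above may be computed as sketched below. The Spearman correlation is obtained as the Pearson correlation of average ranks, and the AUC is obtained from the rank sum of the positive class; the helper names and the example values are assumptions of this example.

```python
def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda index: values[index])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) for constant input."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    spread_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return covariance / (spread_x * spread_y)


def spearman(x, y):
    """Spearman rank correlation between predicted and actual scores."""
    return pearson(average_ranks(x), average_ranks(y))


def roc_auc(labels, scores):
    """AUC for binary labels (1 = high, 0 = low immunogenicity) via rank sums."""
    ranks = average_ranks(scores)
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    rank_sum = sum(rank for rank, label in zip(ranks, labels) if label == 1)
    return (rank_sum - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)


# Hypothetical predictions and labels.
rho = spearman([1.0, 2.0, 3.0, 4.0], [10.0, 20.0, 30.0, 40.0])
auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

Both metrics depend only on the ordering of the predicted scores, which matches the stated aim of ranking candidate epitopes from least to most immunogenic.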
These metrics are applied using two different evaluation strategies:
1. Cross-validation: the database that is used for training is repeatedly split into training and test partitions and the aforementioned metrics are calculated and averaged. This additionally allows the selection of the best model.
2. Independent data set: an additional data set containing epitopes with known immunogenicity is used to independently verify model performance.
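By way of non-limiting illustration, the repeated splitting used in cross-validation may be sketched as a k-fold partition of sample indices. The number of folds, the fixed random seed and the function name are hypothetical choices for this example.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_indices, test_indices) for each of k disjoint folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # reproducible shuffle
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [index for j, fold in enumerate(folds) if j != i for index in fold]
        yield train, test


# Hypothetical database of 10 scored epitopes split into 5 folds.
splits = list(k_fold_indices(10, k=5))
```

Each epitope appears in exactly one test partition, so metrics averaged over the folds use every database entry once for testing.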
Results can be observed in Table 1 and Figure 2, showing a clear signal that the algorithm has the capability to predict the conspicuousness of an epitope.
The results indicate that the present invention does not require knowledge of the TCR: the score represents the size of the space of responsive TCRs rather than the actual space itself, and it can therefore be calculated for unseen epitopes.
The present invention is in no way limited to the embodiments described in the examples and/or shown in the figures. On the contrary, methods according to the present invention may be realized in many different ways without departing from the scope of the invention.
Claims
1. A method of predicting a T-cell response to a query epitope with known epitope-TCR binding wherein said method comprises
- calculating a conspicuousness score of said query epitope wherein, said score is calculated based on one or more of known TCR sequences responsive to said query epitope or number of TCR clusters, grouped by TCR sequence patterns, responsive to said query epitope or by a centrality metric for said epitope in the epitope-TCR graph; and predicting the T-cell response to query epitope based on said calculated conspicuousness score.
2. The method according to claim 1 wherein said query epitope is present in an epitope-TCR binding pair database.
3. The method according to any of the previous claims, wherein said query epitopes with a conspicuousness score above a predefined threshold are selected for vaccine testing.
4. The method according to any of the previous claims, wherein the T-cell response of said query epitopes with a conspicuousness score above a predefined threshold is measured by means of a T-cell response assay.
5. The method of any of the previous claims, wherein said method is a computer-implemented method.
6. A computational method of predicting T-cell response to a candidate epitope comprising:
a. Generating a training data set by calculating a conspicuousness score of a plurality of epitopes with known epitope-TCR binding wherein said score is calculated based on the number of known TCR sequences responsive to each said epitope or number of TCR clusters responsive to each said epitope or by a centrality metric for each said epitopes in the epitope-TCR graph;
b. Calculating a conspicuousness score of candidate epitope using a machine learning algorithm model trained using said training data set; and
c. Predicting the T-cell response of candidate epitope based on said conspicuousness score.
7. The method according to claim 6, wherein said calculating conspicuousness score of candidate epitope comprises
i. training the algorithm with a training data set;
ii. feeding said algorithm with an amino acid sequence of said candidate epitope;
iii. conversion of said sequence into numeric features wherein said numerical features represent the physicochemical properties of the epitope such as, sequence length, HLA preference, amino acid frequencies, and molecular properties;
iv. predicting the conspicuousness score of the query epitope using said numeric features.
8. The method according to claim 7 wherein, said amino acid sequence is derived from a biological or a synthetic source.
9. The method according to claims 6 to 8, wherein said method uses a predictive model, said predictive model is a machine learning model trained using a training dataset.
10. The method according to claims 6 to 9, wherein said training data set comprises a list of epitopes with known epitope-TCR binding and their conspicuousness score.
11. The method according to claim 10 wherein, said list of epitopes of said training data set is derived from a database of known epitope-TCR binding pairs.
12. The method according to claims 6 to 11 wherein, the operation of said algorithm is based on a machine learning framework wherein said machine learning framework can be chosen from any machine learning framework that can translate the numerical input features into a single score.
The method according to claim 12, wherein said machine learning framework is chosen from multivariate regression, Bayesian regression, support vector machines, regression forests, linear regression, gradient descent methods, gradient boosting methods and neural networks. The method according to any of the previous claims wherein said TCR clusters are clustered based on their TCR sequence patterns. The method according to any of the previous claims wherein, said centrality metric is chosen from degree centrality, eccentricity centrality, or PageRank centrality. The method according to any of the previous claims wherein, further transformations applied to said centrally metric to improve the quality of the metric, preferably said transformations are node filtering or edge weighting. The method according to any of the previous claims wherein, said conspicuousness score of an epitope is used to predict T-Cell response of said epitope wherein, a low score relates to a narrow or no T-Cell response and a high score relates to a broad T-cell response. The method according to any of the previous claims wherein, said epitopes with low conspicuousness scores are suitable for biologicals with low immune response, such as drugs or therapeutics. The method according to any of the previous claims wherein, said epitopes with high conspicuousness scores are suitable for biologicals with high immune response, such as vaccines. A method of producing a vaccine composition, wherein said method comprises a step of predicting a T-cell response to a query epitope according to any of the claims 1 to 5, or a predicting a T-cell response for a candidate epitope according to any of the claims 6 to 19. The method according to claim 20 wherein the vaccine composition comprises one or more of: at least one amino acid chain (protein, polypeptide, peptide)
- at least one nucleic acid (double or single stranded RNA, DNA; DNA-RNA hybrid),
- at least one immune system cell (e.g. antigen presenting cell, T-cell, B-cell, macrophage),
- at least one infectious agent (e.g. prokaryotic cell (bacteria)),
- at least one eukaryotic cell (yeast),
- at least one virus,
- at least one prion,
and the epitope is present in the substance or vaccine as part of an amino acid sequence of the same length as the epitope or longer than the epitope, and/or as nucleic acid encoding said amino acid sequence of the same length as the epitope or longer than the epitope.
A vaccine comprising one or more epitopes with high T-cell response identified according to the method of any of the previous claims.
A non-immunogenic biological for use in a method of treating a disease, comprising one or more peptides with low T-cell response identified according to the method of any of the previous claims.
A data set comprising a list of epitopes present in one or more epitope-TCR binding databases and the conspicuousness score of each said epitope, wherein said conspicuousness score is calculated according to the method of any of the previous claims.
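The centrality metrics named in the claims can be illustrated with a short Python sketch. It assumes, purely for illustration, that epitopes and TCR clusters form an undirected graph in which an edge links an epitope to a TCR cluster that binds it; the graph construction actually used by the method is defined in claims 1 to 5 and may differ. The node labels and edges in `TOY_EDGES` are hypothetical. Degree centrality follows the common convention of dividing the degree by the number of other nodes, and PageRank is computed by power iteration with an assumed damping factor of 0.85. Eccentricity centrality, node filtering and edge weighting are not shown.

```python
from collections import defaultdict


def build_graph(edges):
    """Build an undirected adjacency map from (node, node) pairs."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    return dict(adjacency)


def degree_centrality(adjacency):
    """Degree of each node divided by the number of other nodes."""
    n = len(adjacency)
    if n < 2:
        return {node: 0.0 for node in adjacency}
    return {node: len(nbrs) / (n - 1) for node, nbrs in adjacency.items()}


def pagerank(adjacency, damping=0.85, iterations=100):
    """PageRank by power iteration on an undirected graph.

    Every node comes from an edge, so each node has at least one
    neighbour and no dangling-node correction is needed.
    """
    n = len(adjacency)
    rank = {node: 1.0 / n for node in adjacency}
    for _ in range(iterations):
        updated = {node: (1.0 - damping) / n for node in adjacency}
        for node, nbrs in adjacency.items():
            share = damping * rank[node] / len(nbrs)
            for neighbour in nbrs:
                updated[neighbour] += share
        rank = updated
    return rank


# HYPOTHETICAL epitope-to-TCR-cluster links, invented for illustration.
TOY_EDGES = [
    ("epitope_A", "tcr_cluster_1"),
    ("epitope_A", "tcr_cluster_2"),
    ("epitope_A", "tcr_cluster_3"),
    ("epitope_B", "tcr_cluster_1"),
    ("epitope_C", "tcr_cluster_3"),
]

if __name__ == "__main__":
    graph = build_graph(TOY_EDGES)
    degrees = degree_centrality(graph)
    ranks = pagerank(graph)
    for node in sorted(n for n in graph if n.startswith("epitope_")):
        print(node, round(degrees[node], 3), round(ranks[node], 3))
```

In this toy graph, the epitope linked to three TCR clusters receives a higher centrality than the epitopes linked to one, which matches the reading in the claims that a high score relates to a broad T-cell response.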
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US202263429296P | 2022-12-01 | 2022-12-01 | |
US63/429,296 | 2022-12-01 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2024115638A1 (en) | 2024-06-06 |
Family
ID=89073128
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/EP2023/083687 WO2024115638A1 (en) | 2022-12-01 | 2023-11-30 | Method to evaluate the conspicuousness of an epitope towards the repertoire of t-cell receptors |
Country Status (1)
Country | Link |
---|---|
WO (1) | WO2024115638A1 (en) |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP2550529A1 (en) | 2010-03-23 | 2013-01-30 | Iogenetics, LLC. | Bioinformatic processes for determination of peptide binding |
WO2019006022A1 (en) | 2017-06-27 | 2019-01-03 | The Broad Institute, Inc. | Systems and methods for mhc class ii epitope prediction |
CN111429965A (en) | 2020-03-19 | 2020-07-17 | 西安交通大学 | T cell receptor corresponding epitope prediction method based on multiconnector characteristics |
WO2020174077A1 (en) * | 2019-02-28 | 2020-09-03 | Universiteit Antwerpen | Method for determining responsiveness to an epitope |
US20210104294A1 (en) | 2019-10-02 | 2021-04-08 | The General Hospital Corporation | Method for predicting hla-binding peptides using protein structural features |
US20210147929A1 (en) | 2019-10-11 | 2021-05-20 | Saint Louis University | Immune receptor analysis as diagnostic assay |
US11183272B2 (en) | 2018-12-21 | 2021-11-23 | Biontech Us Inc. | Method and systems for prediction of HLA class II-specific epitopes and characterization of CD4+ T cells |
WO2022072722A1 (en) | 2020-09-30 | 2022-04-07 | The Board Of Regents Of The University Of Texas System | Deep learning system for predicting the t cell receptor binding specificity of neoantigens |
Non-Patent Citations (4)
Title |
---|
BI JINGSHU ET AL: "Prediction of Epitope-Associated TCR by Using Network Topological Similarity Based on Deepwalk", IEEE ACCESS, vol. 7, 18 October 2019 (2019-10-18), pages 151273 - 151281, XP011752446, DOI: 10.1109/ACCESS.2019.2948178 * |
DE NEUTER NICOLAS ET AL: "On the feasibility of mining CD8+ T cell receptor patterns underlying immunogenic peptide recognition", IMMUNOGENETICS, SPRINGER VERLAG, BERLIN, DE, vol. 70, no. 3, 4 August 2017 (2017-08-04), pages 159 - 168, XP036432332, ISSN: 0093-7711, [retrieved on 20170804], DOI: 10.1007/S00251-017-1023-5 * |
PRADYOT DASH ET AL: "Quantifiable predictive features define epitope-specific T cell receptor repertoires", NATURE, vol. 547, no. 7661, 21 June 2017 (2017-06-21), pages 89 - 93, XP055558049, DOI: 10.1038/nature22383 * |
ZVYAGIN IVAN V ET AL: "An overview of immunoinformatics approaches and databases linking T cell receptor repertoires to their antigen specificity", IMMUNOGENETICS, SPRINGER VERLAG, BERLIN, DE, vol. 72, no. 1-2, 18 November 2019 (2019-11-18), pages 77 - 84, XP036992942, ISSN: 0093-7711, [retrieved on 20191118], DOI: 10.1007/S00251-019-01139-4 * |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Mohabatkar et al. | Prediction of allergenic proteins by means of the concept of Chou's pseudo amino acid composition and a machine learning approach | |
Bhasin et al. | Analysis and prediction of affinity of TAP binding peptides using cascade SVM | |
Trolle et al. | NetTepi: an integrated method for the prediction of T cell epitopes | |
Gutierrez-Arcelus et al. | Autoimmune diseases—connecting risk alleles with molecular traits of the immune system | |
Percus et al. | Predicting the size of the T-cell receptor and antibody combining region from consideration of efficient self-nonself discrimination. | |
JP7410138B2 (en) | Methods and systems for binding affinity prediction and methods for generating candidate protein binding peptides | |
US20040072246A1 (en) | System and method for identifying t cell and other epitopes and the like | |
JP2002502522A (en) | Prediction of relative binding motifs of biologically active and mimetic peptides | |
Racle et al. | Machine learning predictions of MHC-II specificities reveal alternative binding mode of class II epitopes | |
Pertseva et al. | Applications of machine and deep learning in adaptive immunity | |
Robinson et al. | Protein and peptide array analysis of autoimmune disease | |
JP6744909B2 (en) | Method and electronic system for predicting at least one fitness value of a protein and associated computer program product | |
EP4229640A1 (en) | Method, system and computer program product for determining peptide immunogenicity | |
Huang et al. | Using random forest to classify linear B-cell epitopes based on amino acid properties and molecular features | |
Ogata et al. | Automatic Sequence Design of Major Histocompatibility Complex Class I Binding Peptides Impairing CD8+ T Cell Recognition* 210 | |
WO2024115638A1 (en) | Method to evaluate the conspicuousness of an epitope towards the repertoire of t-cell receptors | |
US20220238188A1 (en) | Method for determining responsiveness to an epitope | |
Lu et al. | Immunoprofiling correlates of protection against SHIV infection in adjuvanted HIV-1 pox-protein vaccinated Rhesus Macaques | |
Huang et al. | A support vector machine approach for prediction of T cell epitopes | |
Cheng et al. | Prediction of continuous B-cell epitopes using long short term memory networks | |
JP2024514691A (en) | Genetic manipulation of antigen binding proteins | |
Xiao et al. | From bench to bedside via bytes: Multi-omic immunoprofiling and integration using machine learning and network approaches | |
Yasser et al. | Predicting protective linear B-cell epitopes using evolutionary information | |
US20240013860A1 (en) | Methods and systems for personalized neoantigen prediction | |
Rockberg et al. | Prediction of antibody response using recombinant human protein fragments as antigen |