KR20200090855A

KR20200090855A - Reducing junction epitope presentation for neoantigens

Info

Publication number: KR20200090855A
Application number: KR1020207017969A
Authority: KR
Inventors: Brendan Bulik-Sullivan; Thomas Francis Boucher; Roman Yelensky; Jennifer Busby
Original assignee: Gritstone Oncology, Inc.
Priority date: 2017-11-22
Filing date: 2018-11-21
Publication date: 2020-07-29
Also published as: JP2024012365A; IL274799A; CA3083097A1; EP3714275A4; US20210011026A1; AU2018373154A1; WO2019104203A1; JP2021503897A; CN111630602A; EP3714275A1; US11885815B2

Abstract

Given a set of therapeutic epitopes, a cassette sequence is designed to reduce the likelihood that junction epitopes are presented in the patient. The cassette sequence is designed by taking into account the presentation of junction epitopes that span the junction between a pair of therapeutic epitopes in the cassette. The cassette sequence may be designed based on a set of distance metrics associated with the junctions of the cassette. A distance metric may specify the likelihood that one or more of the junction epitopes spanning a pair of adjacent epitopes will be presented.

Description

Reducing junction epitope presentation for neoantigens

Cross-Reference to Related Applications

This application claims the benefit of and priority to U.S. Provisional Application No. 62/590,045, filed November 22, 2017, which is incorporated herein by reference in its entirety.

Therapeutic vaccines based on tumor-specific neoantigens hold great promise as a next generation of personalized cancer immunotherapy.¹⁻³ Cancers with a high mutational burden, such as non-small cell lung cancer (NSCLC) and melanoma, are particularly attractive targets for such therapy given the relatively high likelihood of neoantigen generation.⁴,⁵ Early evidence shows that neoantigen-based vaccination can elicit T-cell responses⁶ and that neoantigen-targeted cell therapy can cause tumor regression in selected patients under certain circumstances.⁷ Both MHC class I and MHC class II affect T-cell responses.⁷⁰⁻⁷¹

One question for neoantigen vaccine design is which of the many coding mutations present in a subject's tumor can generate the "best" therapeutic neoantigens, e.g., antigens that can elicit anti-tumor immunity and cause tumor regression.

Initial methods have been proposed that incorporate mutation-based analysis using next-generation sequencing, RNA gene expression, and prediction of the MHC binding affinity of candidate neoantigen peptides.⁸ However, these proposed methods can fail to model the entire epitope generation process, which contains many steps in addition to gene expression and MHC binding (e.g., for MHC-I: TAP transport, proteasomal cleavage, MHC binding, transport of the peptide-MHC complex to the cell surface, and/or TCR recognition; for MHC-II: endocytosis or autophagy, cleavage via extracellular or lysosomal proteases (e.g., cathepsins), competition with the CLIP peptide for HLA-DM-catalyzed HLA binding, transport of the peptide-MHC complex to the cell surface, and/or TCR recognition).⁹ Consequently, existing methods are likely to suffer from low positive predictive value (PPV) (FIG. 1A).

Indeed, analyses of peptides presented by tumor cells, performed by multiple groups, have shown that fewer than 5% of the peptides predicted to be presented using gene expression and MHC binding affinity can be found on tumor surface MHC¹⁰,¹¹ (FIG. 1B). This low correlation between binding prediction and MHC presentation was further reinforced by recent observations that binding-restricted neoantigens offer no improvement in predictive accuracy for checkpoint inhibitor response over the number of mutations alone.¹²

This low positive predictive value (PPV) of existing methods for predicting presentation poses a problem for neoantigen-based vaccine design. If vaccines are designed using predictions with a low PPV, most patients are unlikely to receive a therapeutic neoantigen, and fewer still are likely to receive more than one (even assuming that all presented peptides are immunogenic). Thus, neoantigen vaccination with current methods is unlikely to succeed in a substantial number of subjects having tumors (FIG. 1C).
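
The PPV argument above can be made concrete with a small sketch. The function names are illustrative, and the second function assumes that each peptide in a vaccine is presented independently with probability equal to the PPV, which is a simplification not stated in the text.

```python
def positive_predictive_value(true_positives: int, false_positives: int) -> float:
    """PPV = TP / (TP + FP): the fraction of peptides predicted to be
    presented that are actually found presented."""
    predicted = true_positives + false_positives
    if predicted == 0:
        raise ValueError("no positive predictions")
    return true_positives / predicted


def prob_at_least_one_presented(ppv: float, n_peptides: int) -> float:
    """Probability that a vaccine of n_peptides contains at least one
    presented peptide, ASSUMING independent presentation with probability ppv."""
    return 1.0 - (1.0 - ppv) ** n_peptides
```

Under this simplified model, raising the PPV of the predictor raises the chance that a fixed-size vaccine carries at least one presented peptide.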

Additionally, previous approaches generated candidate neoantigens using only cis-acting mutations and did not consider additional sources of neo-ORFs, including mutations in splicing factors, which occur in multiple tumor types and lead to aberrant splicing of many genes,¹³ and mutations that create or remove protease cleavage sites.

Standard approaches to tumor genome and transcriptome analysis can miss somatic mutations that give rise to candidate neoantigens because of suboptimal conditions in library construction, exome and transcriptome capture, sequencing, or data analysis. Likewise, standard tumor analysis approaches can inadvertently promote sequence artifacts or germline polymorphisms as neoantigens, leading to inefficient use of vaccine capacity or to auto-immunity risk, respectively.

Neoantigen vaccines are also typically designed as a vaccine cassette, in which a series of therapeutic epitopes are concatenated one after another. The vaccine cassette sequence may or may not include linker sequences between adjacent pairs of therapeutic epitopes. A cassette sequence can give rise to junction epitopes, which are novel but irrelevant epitope sequences that span the junction between a pair of therapeutic epitopes. Junction epitopes have the potential to be presented by a patient's HLA class I or class II alleles and to stimulate a CD8 or CD4 T-cell response, respectively. Such responses are often undesirable because T-cells reactive to junction epitopes have no therapeutic benefit and may diminish the immune response to the selected therapeutic epitopes in the cassette through antigen competition.
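
As a minimal sketch of what "spanning the junction" means, the following enumerates the peptides of a concatenated pair that are not wholly contained in either therapeutic epitope. The function name, the single-letter toy sequences, and the default lengths of 8 to 11 residues (a common range for MHC class I peptides) are assumptions made for illustration, not values taken from the text.

```python
def junction_peptides(left, right, linker="", lengths=(8, 9, 10, 11)):
    """List k-mers of left + linker + right that are not wholly inside
    the left epitope or wholly inside the right epitope."""
    sequence = left + linker + right
    right_start = len(left) + len(linker)  # index where the right epitope begins
    peptides = []
    for k in lengths:
        # A window [s, s + k) leaves the left epitope when s + k > len(left)
        # and has not yet entered the right epitope entirely when s < right_start.
        first = max(0, len(left) - k + 1)
        last = min(right_start, len(sequence) - k + 1)
        for s in range(first, last):
            peptides.append(sequence[s:s + k])
    return peptides
```

For example, `junction_peptides("ABCD", "EFGH", lengths=(3,))` yields the two 3-mers that cross the boundary; adding a linker increases the number of candidate junction peptides.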

Disclosed herein is an optimized approach for identifying and selecting neoantigens for personalized cancer vaccines.

First, optimized tumor exome and transcriptome analysis approaches for neoantigen candidate identification using next-generation sequencing (NGS) are addressed. These methods build on standard approaches for NGS tumor analysis to ensure that the neoantigen candidates advanced have the highest sensitivity and specificity across all classes of genomic alteration. Second, novel approaches for high-PPV neoantigen selection are presented to overcome the specificity problem and to ensure that neoantigens advanced for vaccine inclusion are more likely to elicit anti-tumor immunity. Depending on the embodiment, these approaches include trained statistical regression or nonlinear deep learning models that jointly model peptide-allele mappings as well as per-allele motifs for peptides of multiple lengths, sharing statistical strength across peptides of different lengths. The nonlinear deep learning models in particular can be designed and trained to treat different MHC alleles in the same cell as independent, thereby addressing the problem in linear models of alleles interfering with one another. Finally, additional considerations for personalized vaccine design and manufacturing based on neoantigens are addressed.

Given a set of therapeutic epitopes, a cassette sequence is designed to reduce the likelihood that junction epitopes are presented in the patient. The cassette sequence is designed by taking into account the presentation of junction epitopes that span the junction between a pair of therapeutic epitopes in the cassette. In one embodiment, the cassette sequence is designed based on a set of distance metrics, each associated with a junction of the cassette. A distance metric may specify the likelihood that one or more of the junction epitopes spanning a pair of adjacent epitopes will be presented. In one embodiment, one or more candidate cassette sequences are generated by randomly permuting the order in which the set of therapeutic epitopes is concatenated, and a cassette sequence having a presentation score (e.g., the sum of the distance metrics) below a predetermined threshold is selected. In another embodiment, the therapeutic epitopes are modeled as nodes, and the distance metric for an adjacent pair of epitopes represents the distance between the corresponding nodes. A cassette sequence is selected for which the total distance to "visit" each therapeutic epitope exactly once is below a predetermined threshold.
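
The two selection strategies described above can be sketched as follows. This is an illustration under stated assumptions, not the disclosed implementation: `distance[i][j]` is a hypothetical precomputed distance metric for the junction formed when epitope `i` is placed immediately before epitope `j`, the presentation score is taken to be the sum of those metrics, and all function names are invented here.

```python
import itertools
import random


def presentation_score(order, distance):
    """Sum of the junction distance metrics along a cassette ordering."""
    return sum(distance[a][b] for a, b in zip(order, order[1:]))


def random_candidates(distance, threshold, n_candidates, seed=0):
    """Randomly permute the epitope order and keep the orderings whose
    presentation score falls below the threshold."""
    rng = random.Random(seed)
    n = len(distance)
    selected = set()
    for _ in range(n_candidates):
        order = list(range(n))
        rng.shuffle(order)
        score = presentation_score(order, distance)
        if score < threshold:
            selected.add((score, tuple(order)))
    return sorted(selected)


def best_by_enumeration(distance):
    """Node-and-distance view: visit every epitope exactly once with the
    smallest total distance. Brute force, so only feasible for small sets."""
    n = len(distance)
    return min(itertools.permutations(range(n)),
               key=lambda order: presentation_score(order, distance))
```

Exhaustive enumeration grows factorially with the number of epitopes, so for realistic cassette sizes a random search like the first function, or a heuristic solver for the path problem, would be the more practical choice.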

These and other features, aspects, and advantages of the present invention will become better understood with regard to the following description and accompanying drawings:
FIG. 1A shows current clinical approaches to neoantigen identification.
FIG. 1B shows that fewer than 5% of predicted bound peptides are presented on tumor cells.
FIG. 1C shows the impact of the neoantigen prediction specificity problem.
FIG. 1D shows that binding prediction is not sufficient for neoantigen identification.
FIG. 1E shows the probability of MHC-I presentation as a function of peptide length.
FIG. 1F shows an example peptide spectrum generated from Promega's dynamic range standard.
FIG. 1G shows how the addition of features increases the model positive predictive value.
FIG. 2A is an overview of an environment for identifying likelihoods of peptide presentation in patients, according to one embodiment.
FIGS. 2B and 2C illustrate a method of obtaining presentation information, according to one embodiment.
FIG. 3 is a high-level block diagram illustrating the computer logic components of the presentation identification system, according to one embodiment.
FIG. 4 illustrates an example set of training data, according to one embodiment.
FIG. 5 illustrates an example network model associated with an MHC allele.
FIG. 6A illustrates an example network model NN_H(·) shared by MHC alleles, according to one embodiment. FIG. 6B illustrates an example network model NN_H(·) shared by MHC alleles, according to another embodiment.
FIG. 7 illustrates generating a presentation likelihood for a peptide in association with an MHC allele using an example network model.
FIG. 8 illustrates generating a presentation likelihood for a peptide in association with an MHC allele using example network models.
FIG. 9 illustrates generating a presentation likelihood for a peptide in association with an MHC allele using example network models.
FIG. 10 illustrates generating a presentation likelihood for a peptide in association with an MHC allele using example network models.
FIG. 11 illustrates generating a presentation likelihood for a peptide in association with an MHC allele using example network models.
FIG. 12 illustrates generating a presentation likelihood for a peptide associated with an MHC allele using example network models.
FIG. 13 illustrates determining distance metrics for two example cassette sequences.
FIG. 14 illustrates an example computer for implementing the entities shown in FIGS. 1 and 3.

Detailed Description of the Invention

I. Definitions

In general, terms used in the claims and the specification are intended to be construed as having the plain meaning understood by a person of ordinary skill in the art. Certain terms are defined below to provide additional clarity. In case of conflict between the plain meaning and the provided definitions, the provided definitions are to be used.

As used herein, the term "antigen" is a substance that induces an immune response.

As used herein, the term "neoantigen" is an antigen that has at least one alteration that makes it distinct from the corresponding wild-type, parental antigen, e.g., via a mutation in a tumor cell or a post-translational modification specific to a tumor cell. A neoantigen can include a polypeptide sequence or a nucleotide sequence. A mutation can include a frameshift or non-frameshift indel, a missense or nonsense substitution, a splice site alteration, a genomic rearrangement or gene fusion, or any genomic or expression alteration giving rise to a neo-ORF. A mutation can also include a splice variant. Post-translational modifications specific to a tumor cell can include aberrant phosphorylation. Post-translational modifications specific to a tumor cell can also include a proteasome-generated spliced antigen. See Liepe et al., A large fraction of HLA class I ligands are proteasome-generated spliced peptides; Science. 2016 Oct 21; 354(6310): 354-358.

As used herein, the term "tumor neoantigen" is a neoantigen present in a subject's tumor cell or tissue but not in the subject's corresponding normal cell or tissue.

As used herein, the term "neoantigen-based vaccine" is a vaccine construct based on one or more neoantigens, e.g., a plurality of neoantigens.

As used herein, the term "candidate neoantigen" is a mutation or other aberration giving rise to a new sequence that may represent a neoantigen.

As used herein, the term "coding region" is the portion(s) of a gene that encode protein.

As used herein, the term "coding mutation" is a mutation occurring in a coding region.

As used herein, the term "ORF" means open reading frame.

As used herein, the term "neo-ORF" is a tumor-specific ORF arising from a mutation or other aberration such as splicing.

As used herein, the term "missense mutation" is a mutation causing a substitution of one amino acid for another.

As used herein, the term "nonsense mutation" is a mutation causing a substitution of a stop codon for an amino acid.

As used herein, the term "frameshift mutation" is a mutation causing a change in the reading frame of the protein.

As used herein, the term "indel" is an insertion or deletion of one or more nucleic acids.

As used herein, the term percent "identity," in the context of two or more nucleic acid or polypeptide sequences, refers to two or more sequences or subsequences that have a specified percentage of nucleotides or amino acid residues that are the same when compared and aligned for maximum correspondence, as measured using a sequence comparison algorithm (e.g., BLASTP and BLASTN or other algorithms available to persons of skill) or by visual inspection. Depending on the application, the percent "identity" can exist over a region of the sequences being compared, e.g., over a functional domain, or over the full length of the two sequences to be compared.

For sequence comparison, typically one sequence acts as a reference sequence to which test sequences are compared. When using a sequence comparison algorithm, test and reference sequences are input into a computer, subsequence coordinates are designated if necessary, and sequence algorithm program parameters are designated. The sequence comparison algorithm then calculates the percent sequence identity of the test sequence(s) relative to the reference sequence, based on the designated program parameters. Alternatively, sequence similarity or dissimilarity can be established by the combined presence or absence of particular nucleotides or, for translated sequences, amino acids at selected sequence positions (e.g., sequence motifs).
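
The percent identity calculation can be illustrated with a minimal sketch that operates on two sequences that have already been aligned. The function name, the "-" gap symbol, and the choice of denominator (every alignment column except those where both sequences have a gap) are assumptions; alignment tools such as BLAST apply their own conventions and parameters.

```python
def percent_identity(reference_aligned, test_aligned, gap="-"):
    """Percent of alignment columns in which both sequences carry the
    same residue. Inputs must be pre-aligned and of equal length."""
    if len(reference_aligned) != len(test_aligned):
        raise ValueError("aligned sequences must have equal length")
    # Ignore columns where both sequences have a gap.
    columns = [(r, t) for r, t in zip(reference_aligned, test_aligned)
               if not (r == gap and t == gap)]
    if not columns:
        raise ValueError("no comparable columns")
    matches = sum(1 for r, t in columns if r == t)
    return 100.0 * matches / len(columns)
```

Restricting the inputs to a subsequence, such as a functional domain, gives the identity over that region rather than over the full length.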

Optimal alignment of sequences for comparison can be conducted, e.g., by the local homology algorithm of Smith & Waterman, Adv. Appl. Math. 2:482 (1981); by the homology alignment algorithm of Needleman & Wunsch, J. Mol. Biol. 48:443 (1970); by the search for similarity method of Pearson & Lipman, Proc. Nat'l. Acad. Sci. USA 85:2444 (1988); by computerized implementations of these algorithms (GAP, BESTFIT, FASTA, and TFASTA in the Wisconsin Genetics Software Package, Genetics Computer Group, 575 Science Dr., Madison, Wis.); or by visual inspection (see generally Ausubel et al., infra).

One example of an algorithm suitable for determining percent sequence identity and sequence similarity is the BLAST algorithm, which is described in Altschul et al., J. Mol. Biol. 215:403-410 (1990). Software for performing BLAST analyses is publicly available through the National Center for Biotechnology Information.

As used herein, the term "non-stop or read-through" is a mutation causing the removal of the natural stop codon.

As used herein, the term "epitope" is the specific portion of an antigen typically bound by an antibody or T-cell receptor.

As used herein, the term "immunogenic" is the ability to elicit an immune response, e.g., via T cells, B cells, or both.

As used herein, the terms "HLA binding affinity" and "MHC binding affinity" mean the affinity of binding between a specific antigen and a specific MHC allele.

As used herein, the term "bait" is a nucleic acid probe used to enrich a specific sequence of DNA or RNA from a sample.

As used herein, the term "variant" is a difference between a subject's nucleic acids and the reference human genome used as a control.

As used herein, the term "variant call" is an algorithmic determination of the presence of a variant, typically from sequencing.

As used herein, the term "polymorphism" is a germline variant, i.e., a variant found in all DNA-bearing cells of an individual.

As used herein, the term "somatic variant" is a variant arising in non-germline cells of an individual.

As used herein, the term "allele" is a version of a gene, a version of a genetic sequence, or a version of a protein.

As used herein, the term "HLA type" is the complement of HLA gene alleles.

As used herein, the term "nonsense-mediated decay" or "NMD" is the degradation of an mRNA by a cell due to a premature stop codon.

As used herein, the term "truncal mutation" is a mutation originating early in the development of a tumor and present in a substantial portion of the tumor's cells.

As used herein, the term "subclonal mutation" is a mutation originating later in the development of a tumor and present in only a subset of the tumor's cells.

As used herein, the term "exome" is the subset of the genome that codes for proteins. An exome can be the collective exons of a genome.

As used herein, the term "logistic regression" is a regression model for binary data from statistics in which the logit of the probability that the dependent variable is equal to one is modeled as a linear function of the independent variables.
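
Written out, the logistic regression definition corresponds to the following standard form. The symbols are chosen here for illustration: y is the binary dependent variable, x_1 through x_k are the independent variables, and the β terms are the fitted coefficients.

```latex
\operatorname{logit}(p) = \ln\frac{p}{1-p} = \beta_0 + \sum_{j=1}^{k} \beta_j x_j,
\qquad
p = \Pr(y = 1 \mid x) = \frac{1}{1 + e^{-\left(\beta_0 + \sum_{j=1}^{k} \beta_j x_j\right)}}
```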

As used herein, the term "neural network" is a machine learning model for classification or regression consisting of multiple layers of linear transformations followed by element-wise nonlinearities, typically trained via stochastic gradient descent and back-propagation.
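
The structure in this definition, linear transformations followed by element-wise nonlinearities, can be sketched as a forward pass in plain Python. This is only an illustration: the choice of ReLU hidden activations, a single sigmoid output, and the function names are assumptions, and training by stochastic gradient descent and back-propagation is omitted.

```python
import math


def affine(weights, bias, x):
    """Linear transformation W x + b for one layer."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def relu(values):
    """Element-wise nonlinearity applied after each hidden layer."""
    return [max(0.0, v) for v in values]


def forward(x, layers):
    """Apply each hidden layer (affine then ReLU), then a final affine
    layer with one output unit squashed by a sigmoid into (0, 1)."""
    hidden = x
    for weights, bias in layers[:-1]:
        hidden = relu(affine(weights, bias, hidden))
    weights, bias = layers[-1]
    (z,) = affine(weights, bias, hidden)
    return 1.0 / (1.0 + math.exp(-z))
```

Each entry of `layers` is a `(weights, bias)` pair, where `weights` is a list of rows; the output can be read as a probability for a binary classification task.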

As used herein, the term "proteome" is the set of all proteins expressed and/or translated by a cell, group of cells, or individual.

As used herein, the term "peptidome" is the set of all peptides presented by MHC-I or MHC-II on the cell surface. The peptidome may refer to a property of a cell or of a collection of cells (e.g., the tumor peptidome means the union of the peptidomes of all cells that make up the tumor).

As used herein, the term "ELISPOT" means enzyme-linked immunosorbent spot assay, a common method for monitoring immune responses in humans and animals.

As used herein, the term "dextramer" is a dextran-based peptide-MHC multimer used for antigen-specific T-cell staining in flow cytometry.

As used herein, the term "tolerance" or "immune tolerance" is a state of immune non-responsiveness to one or more antigens, e.g., self-antigens.

As used herein, the term "central tolerance" is tolerance effected in the thymus, either by deleting self-reactive T-cell clones or by promoting the differentiation of self-reactive T-cell clones into immunosuppressive regulatory T-cells (Tregs).

As used herein, the term "peripheral tolerance" is tolerance effected in the periphery by downregulating or anergizing self-reactive T-cells that survive central tolerance, or by promoting the differentiation of these T-cells into Tregs.

The term "sample" can include a single cell, multiple cells, fragments of cells, or an aliquot of body fluid taken from a subject by means including venipuncture, excretion, ejaculation, massage, biopsy, needle aspirate, lavage sample, scraping, surgical incision or intervention, or other means known in the art.

The term "subject" encompasses a cell, tissue, or organism, human or non-human, whether in vivo, ex vivo, or in vitro, male or female. The term subject is inclusive of mammals, including humans.

The term "mammal" encompasses both humans and non-humans and includes, but is not limited to, humans, non-human primates, canines, felines, murines, bovines, equines, and porcines.

The term "clinical factor" refers to a measure of a condition of a subject, e.g., disease activity or severity. "Clinical factor" encompasses all markers of a subject's health status, including non-sample markers and/or other characteristics of a subject such as, without limitation, age and gender. A clinical factor can be a score, a value, or a set of values that can be obtained from evaluation of a sample (or population of samples) from a subject, or of a subject under a determined condition. A clinical factor can also be predicted by markers and/or other parameters such as gene expression surrogates. Clinical factors can include tumor type, tumor subtype, and smoking history.

Abbreviations:

MHC: major histocompatibility complex; HLA: human leukocyte antigen, or the human MHC gene locus; NGS: next-generation sequencing; PPV: positive predictive value; TSNA: tumor-specific neoantigen; FFPE: formalin-fixed, paraffin-embedded; NMD: nonsense-mediated decay; NSCLC: non-small cell lung cancer; DC: dendritic cell.

It should be noted that, as used in the specification and the appended claims, the singular forms "a," "an," and "the" include plural referents unless the context clearly dictates otherwise.

Any terms not directly defined herein shall be understood to have the meanings commonly associated with them as understood within the art of the invention. Certain terms are discussed herein to provide additional guidance to the practitioner in describing the compositions, devices, methods, and the like of aspects of the invention, and how to make or use them. It will be appreciated that the same thing may be said in more than one way. Consequently, alternative language and synonyms may be used for any one or more of the terms discussed herein. No significance is to be placed upon whether or not a term is elaborated or discussed herein. Some synonyms or substitutable methods, materials, and the like are provided. Recital of one or a few synonyms or equivalents does not exclude use of other synonyms or equivalents, unless it is explicitly stated. Use of examples, including examples of terms, is for illustrative purposes only and does not limit the scope and meaning of the aspects of the invention herein.

All references, issued patents, and patent applications cited within the body of the specification are hereby incorporated by reference in their entirety, for all purposes.

II. Methods of Reducing Junction Epitope Presentation

Disclosed herein are methods of identifying a cassette sequence for a neoantigen vaccine. As an example, one such method may comprise the steps of: obtaining, for a patient, at least one of exome, transcriptome, or whole-genome tumor nucleotide sequencing data from the tumor cells and normal cells of the subject, wherein the nucleotide sequencing data is used to obtain data representing the peptide sequence of each of a set of neoantigens identified by comparing the nucleotide sequencing data from the tumor cells with the nucleotide sequencing data from the normal cells, and wherein the peptide sequence of each neoantigen comprises at least one alteration that makes it distinct from the corresponding wild-type, parental peptide sequence identified from the normal cells of the subject and includes information regarding a plurality of amino acids that make up the peptide sequence and a set of positions of the amino acids in the peptide sequence; and inputting the peptide sequences of the neoantigens, using a computer processor, into a machine-learned presentation model to generate a set of numerical presentation likelihoods for the set of neoantigens, each presentation likelihood in the set representing the likelihood that the corresponding neoantigen is presented by one or more MHC alleles on the surface of the tumor cells of the subject.

The machine-learned presentation model comprises a plurality of parameters identified at least based on a training data set, and a function representing a relation between the peptide sequences of the neoantigens received as input and the presentation likelihoods generated as output. The training data set comprises: for each sample in a set of samples, a label obtained by mass spectrometry measuring the presence of peptides bound to at least one MHC allele in a set of MHC alleles identified as present in the sample; and, for each of the samples, training peptide sequences including information regarding a plurality of amino acids that make up the training peptide sequences and a set of positions of the amino acids in the training peptide sequences.

The method may further comprise the steps of: identifying, for the subject, a therapeutic subset of neoantigens from the set of neoantigens, the therapeutic subset corresponding to a predetermined number of neoantigens having presentation likelihoods above a predetermined threshold; and identifying, for the subject, a cassette sequence comprising a sequence of concatenated therapeutic epitopes that each include the peptide sequence of a corresponding neoantigen in the therapeutic subset, wherein the cassette sequence is identified based on presentation of one or more junction epitopes that span corresponding junctions between one or more pairs of adjacent therapeutic epitopes.

The presentation of the one or more junction epitopes may be determined based on presentation likelihoods generated by inputting the sequences of the one or more junction epitopes into the machine-learned presentation model.

The presentation of the one or more junction epitopes may be determined based on binding affinity predictions between the one or more junction epitopes and one or more MHC alleles of the subject.

The presentation of the one or more junction epitopes may be determined based on binding stability predictions of the one or more junction epitopes.

The one or more junction epitopes may include a junction epitope that overlaps with the sequence of a first therapeutic epitope and the sequence of a second therapeutic epitope concatenated after the first therapeutic epitope.

A linker sequence may be placed between the sequence of a first therapeutic epitope and a second therapeutic epitope concatenated after the first therapeutic epitope, in which case the one or more junction epitopes include a junction epitope that overlaps with the linker sequence.
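The notion of a junction epitope described above can be illustrated with a minimal Python sketch. The function name `junction_epitopes`, the default peptide lengths of 8 to 11 residues, and the rule "any window that is not contained entirely within one therapeutic epitope" are assumptions made for illustration; the text itself does not fix them.

```python
def junction_epitopes(first, second, linker="", lengths=(8, 9, 10, 11)):
    """List peptides that span the junction between two concatenated epitopes.

    A window counts as a junction epitope here if it is not contained entirely
    within `first` or entirely within `second`, so windows that overlap an
    optional linker are included. The default lengths are an assumption.
    """
    cassette = first + linker + second
    left_end = len(first)                    # index just past the first epitope
    right_start = left_end + len(linker)     # index where the second epitope begins
    found = []
    for k in lengths:
        for start in range(len(cassette) - k + 1):
            # Keep windows that reach past the first epitope and begin before the second.
            if start < right_start and start + k > left_end:
                found.append(cassette[start:start + k])
    return found


# Toy sequences (not real epitopes), using 3-mers to keep the output short.
print(junction_epitopes("ABCDE", "FGHIJ", lengths=(3,)))
print(junction_epitopes("ABCDE", "FGHIJ", linker="XX", lengths=(3,)))
```

With a linker present, the same rule also returns windows that cover only linker residues plus one flanking epitope, which matches the statement that junction epitopes may overlap the linker sequence.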

Identifying the cassette sequence may further comprise: for each ordered pair of therapeutic epitopes, determining a set of junction epitopes that span the junction between the ordered pair of therapeutic epitopes; and, for each ordered pair of therapeutic epitopes, determining a distance metric indicating presentation of the set of junction epitopes for the ordered pair on the one or more MHC alleles of the subject.
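As a rough sketch of how a distance metric per ordered pair might be tabulated, the Python below fills a matrix `D` in which `D[i][j]` sums a presentation score over the junction peptides created when epitope `j` is concatenated after epitope `i`. The scorer `toy_presentation_likelihood` is a hypothetical stand-in (fraction of hydrophobic residues) and is not a presentation model; summing the scores and using a single peptide length `k` are likewise assumptions.

```python
from itertools import permutations


def toy_presentation_likelihood(peptide):
    """Hypothetical stand-in for a presentation model: fraction of hydrophobic residues."""
    return sum(res in "AILMFVWY" for res in peptide) / len(peptide)


def spanning_kmers(first, second, k):
    """Return the k-mers of first+second that contain residues from both epitopes."""
    joined = first + second
    cut = len(first)
    starts = range(max(0, cut - k + 1), min(cut, len(joined) - k + 1))
    return [joined[s:s + k] for s in starts]


def distance_matrix(epitopes, k=9, score=toy_presentation_likelihood):
    """Build D, where D[i][j] is the summed junction score for epitope i followed by j."""
    n = len(epitopes)
    D = [[0.0] * n for _ in range(n)]
    for i, j in permutations(range(n), 2):
        D[i][j] = sum(score(p) for p in spanning_kmers(epitopes[i], epitopes[j], k))
    return D


# Toy input: the metric is not symmetric, because the order of the pair matters.
D = distance_matrix(["AAGG", "GGAA"], k=2)
print(D)
```

The asymmetry is the reason the pairs are described as ordered: placing one epitope before another creates different junction peptides than the reverse arrangement.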

Identifying the cassette sequence may further comprise: generating a set of candidate cassette sequences corresponding to different orderings of the therapeutic epitopes; for each candidate cassette sequence, determining a presentation score for the candidate cassette sequence based on the distance metrics for each ordered pair of therapeutic epitopes in the candidate cassette sequence; and selecting, as the cassette sequence for the neoantigen vaccine, a candidate cassette sequence associated with a presentation score below a predetermined threshold.

The set of candidate cassette sequences may be randomly generated.
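The random-candidate approach can be sketched as follows, assuming a precomputed matrix `D` of distance metrics and taking the presentation score of an ordering to be the sum of `D` over its adjacent pairs. The function names, the number of candidates, and the choice to return the lowest-scoring ordering among those below the threshold are illustrative assumptions; the text only requires a candidate whose score is below the threshold.

```python
import random


def presentation_score(order, D):
    """Sum the distance metrics over each adjacent ordered pair in the ordering."""
    return sum(D[a][b] for a, b in zip(order, order[1:]))


def select_cassette(D, n_candidates=1000, threshold=1.0, seed=0):
    """Sample random orderings and return (score, order) for the best one below threshold.

    Returns None when no sampled ordering scores below the threshold.
    """
    rng = random.Random(seed)
    indices = list(range(len(D)))
    best = None
    for _ in range(n_candidates):
        order = indices[:]
        rng.shuffle(order)
        score = presentation_score(order, D)
        if score < threshold and (best is None or score < best[0]):
            best = (score, order)
    return best


# Toy distance metrics for three epitopes (values are made up).
D = [[0, 5, 1],
     [1, 0, 5],
     [5, 1, 0]]
print(select_cassette(D, n_candidates=200, threshold=3))
```

Random sampling does not guarantee that the best possible ordering is found; it only needs to surface one ordering that satisfies the threshold.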

Identifying the cassette sequence may further comprise solving for the values of x_km in the following optimization problem:

where v corresponds to the predetermined number of neoantigens, k corresponds to a therapeutic epitope, m corresponds to an adjacent therapeutic epitope concatenated after that therapeutic epitope, and P is a path matrix given by:

where D is a v × v matrix in which element D(k,m) indicates the distance metric of the ordered pair of therapeutic epitopes k,m; and selecting the cassette sequence based on the solved values for x_km.
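The objective and constraints of the optimization problem are not reproduced in this text, so the following Python sketch rests on an assumption: that the goal is to choose an ordering of the v therapeutic epitopes minimizing the sum of D(k,m) over adjacent pairs, with x_km = 1 when epitope m is concatenated directly after epitope k and 0 otherwise. It uses exhaustive search, which is practical only for small v, and it does not construct the path matrix P.

```python
from itertools import permutations


def solve_ordering(D):
    """Exhaustively find the ordering with the smallest summed distance metric.

    Returns the ordering and a v x v 0/1 matrix x, where x[k][m] == 1 means
    epitope m directly follows epitope k. Cost grows as v!, so this is a toy.
    """
    v = len(D)

    def cost(order):
        return sum(D[a][b] for a, b in zip(order, order[1:]))

    best_order = min(permutations(range(v)), key=cost)
    x = [[0] * v for _ in range(v)]
    for k, m in zip(best_order, best_order[1:]):
        x[k][m] = 1
    return list(best_order), x


# Toy distance metrics (made up); the cheapest chain is 0 -> 1 -> 2.
D = [[0, 1, 9],
     [9, 0, 1],
     [9, 9, 0]]
order, x = solve_ordering(D)
print(order)
print(x)
```

For larger numbers of epitopes, an integer programming solver or a traveling-salesman heuristic would be a more realistic choice than enumeration.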

The method may further comprise manufacturing or having manufactured a tumor vaccine comprising the cassette sequence.

Also disclosed herein is a method of identifying a cassette sequence for a neoantigen vaccine, comprising the steps of: obtaining, for a patient, at least one of exome, transcriptome, or whole-genome tumor nucleotide sequencing data from the tumor cells and normal cells of the subject, wherein the nucleotide sequencing data is used to obtain data representing the peptide sequence of each of a set of neoantigens identified by comparing the nucleotide sequencing data from the tumor cells with the nucleotide sequencing data from the normal cells, and wherein the peptide sequence of each neoantigen comprises at least one alteration that makes it distinct from the corresponding wild-type, parental peptide sequence identified from the normal cells of the subject and includes information regarding a plurality of amino acids that make up the peptide sequence and a set of positions of the amino acids in the peptide sequence; identifying, for the subject, a therapeutic subset of neoantigens from the set of neoantigens; and identifying, for the subject, a cassette sequence comprising a sequence of concatenated therapeutic epitopes that each include the peptide sequence of a corresponding neoantigen in the therapeutic subset, wherein the cassette sequence is identified based on presentation of one or more junction epitopes that span corresponding junctions between one or more pairs of adjacent therapeutic epitopes.

The presentation of the one or more junction epitopes may be determined based on presentation likelihoods generated by inputting the sequences of the one or more junction epitopes into a machine-learned presentation model, the presentation likelihoods indicating the likelihood that the one or more junction epitopes are presented by one or more MHC alleles of the patient on the surface of the patient's tumor cells, the set of presentation likelihoods having been identified at least based on received mass spectrometry data.

The presentation of the one or more junction epitopes may be determined based on binding stability predictions of the one or more junction epitopes.

The one or more junction epitopes may include a junction epitope that overlaps with the sequence of a first therapeutic epitope and the sequence of a second therapeutic epitope concatenated after the first therapeutic epitope.

A linker sequence may be placed between the sequence of a first therapeutic epitope and a second therapeutic epitope concatenated after the first therapeutic epitope, in which case the one or more junction epitopes include a junction epitope that overlaps with the linker sequence.

Identifying the cassette sequence may further comprise: for each ordered pair of therapeutic epitopes, determining a set of junction epitopes that span the junction between the ordered pair of therapeutic epitopes; and, for each ordered pair of therapeutic epitopes, determining a distance metric indicating presentation of the set of junction epitopes for the ordered pair on the one or more MHC alleles of the subject.

Identifying the cassette sequence may further comprise: generating a set of candidate cassette sequences corresponding to different orderings of the therapeutic epitopes; for each candidate cassette sequence, determining a presentation score for the candidate cassette sequence based on the distance metrics for each ordered pair of therapeutic epitopes in the candidate cassette sequence; and selecting, as the cassette sequence for the neoantigen vaccine, a candidate cassette sequence associated with a presentation score below a predetermined threshold.

Identifying the cassette sequence may further comprise solving for the values of x_km in the following optimization problem:

where D is a v × v matrix in which element D(k,m) indicates the distance metric of the ordered pair of therapeutic epitopes k,m; and selecting the cassette sequence based on the solved values for x_km.

The method may further comprise manufacturing a tumor vaccine comprising the cassette sequence.

Also disclosed herein is a method of identifying a cassette sequence for a neoantigen vaccine, comprising the steps of: obtaining peptide sequences for a therapeutic subset of shared antigens or a therapeutic subset of shared neoantigens for treating a plurality of subjects, the therapeutic subset corresponding to a predetermined number of peptide sequences having presentation likelihoods above a predetermined threshold; and identifying a cassette sequence comprising a sequence of concatenated therapeutic epitopes that each include a corresponding peptide sequence in the therapeutic subset of shared antigens or the therapeutic subset of shared neoantigens, wherein identifying the cassette sequence comprises: for each ordered pair of therapeutic epitopes, determining a set of junction epitopes that span the junction between the ordered pair of therapeutic epitopes; and, for each ordered pair of therapeutic epitopes, determining a distance metric indicating presentation of the set of junction epitopes for the ordered pair, wherein the distance metric is determined as a combination of a set of weights, each indicating the prevalence of a corresponding MHC allele, with corresponding sub-distance metrics indicating the presentation likelihoods of the set of junction epitopes on that MHC allele.
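One plausible reading of the prevalence-weighted distance metric is a weighted sum over MHC alleles, sketched below in Python. The weighted-sum form, the function name `combined_distance`, and the numeric values (which are not real population frequencies) are assumptions for illustration; the text says only that the weights and the sub-distance metrics are combined.

```python
def combined_distance(sub_distances, prevalence):
    """Combine per-allele sub-distance metrics using allele prevalence as weights.

    sub_distances: maps an MHC allele to the sub-distance metric of one ordered
                   epitope pair on that allele.
    prevalence:    maps an MHC allele to its weight (prevalence in the population
                   to be treated).
    """
    return sum(prevalence[allele] * d for allele, d in sub_distances.items())


# Illustrative numbers only; these are not measured allele frequencies.
sub = {"HLA-A*02:01": 2.0, "HLA-B*07:02": 4.0}
weights = {"HLA-A*02:01": 0.25, "HLA-B*07:02": 0.5}
print(combined_distance(sub, weights))  # 0.25 * 2.0 + 0.5 * 4.0 = 2.5
```

Weighting by prevalence lets a single cassette ordering be chosen for a population of subjects, so that junction epitopes presented on common alleles count more heavily than those presented on rare ones.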

Also disclosed herein is a tumor vaccine comprising a cassette sequence comprising a sequence of concatenated therapeutic epitopes, the cassette sequence identified by performing the steps of: obtaining, for a patient, at least one of exome, transcriptome, or whole-genome tumor nucleotide sequencing data from the tumor cells and normal cells of the subject, wherein the nucleotide sequencing data is used to obtain data representing the peptide sequence of each of a set of neoantigens identified by comparing the nucleotide sequencing data from the tumor cells with the nucleotide sequencing data from the normal cells, and wherein the peptide sequence of each neoantigen comprises at least one alteration that makes it distinct from the corresponding wild-type, parental peptide sequence identified from the normal cells of the subject and includes information regarding a plurality of amino acids that make up the peptide sequence and a set of positions of the amino acids in the peptide sequence; identifying, for the subject, a therapeutic subset of neoantigens from the set of neoantigens; and identifying, for the subject, a cassette sequence comprising a sequence of concatenated therapeutic epitopes that each include the peptide sequence of a corresponding neoantigen in the therapeutic subset, wherein the cassette sequence is identified based on presentation of one or more junction epitopes that span corresponding junctions between one or more pairs of adjacent therapeutic epitopes.

The presentation of the one or more junction epitopes may be determined based on presentation likelihoods generated by inputting the sequences of the one or more junction epitopes into a machine-learned presentation model, the presentation likelihoods indicating the likelihood that the one or more junction epitopes are presented by one or more MHC alleles of the patient on the surface of the patient's tumor cells, the set of presentation likelihoods being identified at least based on received mass spectrometry data.

where D is a v × v matrix in which element D(k,m) indicates the distance metric of the ordered pair of therapeutic epitopes k,m; and selecting the cassette sequence based on the solved values for x_km.

The tumor vaccine of claim 24 may further comprise the step of manufacturing or having manufactured a tumor vaccine comprising the cassette sequence.

Also disclosed herein is a tumor vaccine comprising a cassette sequence comprising a sequence of concatenated therapeutic epitopes, ordered such that each includes the peptide sequence of a corresponding neoantigen in a therapeutic subset of neoantigens, wherein the sequence of therapeutic epitopes is identified based on presentation of one or more junction epitopes that span corresponding junctions between one or more pairs of adjacent therapeutic epitopes, and wherein the junction epitopes of the cassette sequence have an HLA binding affinity below a threshold binding affinity.

The threshold binding affinity may be 1000 nM or greater.

Also disclosed herein is a tumor vaccine comprising a cassette sequence comprising a sequence of concatenated therapeutic epitopes, ordered such that each includes the peptide sequence of a corresponding neoantigen in a therapeutic subset of neoantigens, wherein the sequence of therapeutic epitopes is identified based on presentation of one or more junction epitopes that span corresponding junctions between one or more pairs of adjacent therapeutic epitopes, and wherein at least a threshold percentage of the junction epitopes of the cassette sequence have a presentation likelihood below a threshold presentation likelihood.

The threshold percentage may be 50%.
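The threshold-percentage criterion can be checked with a few lines of Python. In this sketch the function name `meets_threshold`, the example likelihood values, and the cut-off of 0.1 for the threshold presentation likelihood are hypothetical; only the 50% figure comes from the text.

```python
def meets_threshold(likelihoods, max_likelihood, min_fraction=0.5):
    """Check that at least `min_fraction` of junction epitopes fall below the cut-off.

    likelihoods:    presentation likelihoods of the cassette's junction epitopes.
    max_likelihood: threshold presentation likelihood (hypothetical value in the demo).
    min_fraction:   threshold percentage expressed as a fraction (0.5 for 50%).
    """
    if not likelihoods:
        raise ValueError("at least one junction epitope likelihood is required")
    below = sum(p < max_likelihood for p in likelihoods)
    return below / len(likelihoods) >= min_fraction


# Two of four made-up likelihoods are below 0.1, so the 50% criterion is met.
print(meets_threshold([0.01, 0.02, 0.5, 0.9], max_likelihood=0.1))
```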

III. Identification of Tumor-Specific Mutations in Neoantigens

Also disclosed herein are methods for the identification of certain mutations (e.g., the variants or alleles that are present in cancer cells). In particular, these mutations can be present in the genome, transcriptome, proteome, or exome of cancer cells of a subject having cancer but not in normal tissue from the subject.

Genetic mutations in tumors can be considered useful for the immunological targeting of tumors if they lead to changes in the amino acid sequence of a protein exclusively in the tumor. Useful mutations include: (1) non-synonymous mutations leading to different amino acids in the protein; (2) read-through mutations in which a stop codon is modified or deleted, leading to translation of a longer protein with a novel tumor-specific sequence at the C-terminus; (3) splice site mutations that lead to the inclusion of an intron in the mature mRNA and thus a unique tumor-specific protein sequence; (4) chromosomal rearrangements that give rise to a chimeric protein with tumor-specific sequences at the junction of two proteins (i.e., gene fusion); and (5) frameshift mutations or deletions that lead to a new open reading frame with a novel tumor-specific protein sequence. Mutations can also include one or more of non-frameshift indels, missense or nonsense substitutions, splice site alterations, genomic rearrangements or gene fusions, or any genomic or expression alteration giving rise to a neoORF.

Peptides with mutations, or mutated polypeptides, arising from, for example, splice-site, frameshift, read-through, or gene fusion mutations in tumor cells can be identified by sequencing DNA, RNA, or protein in tumor versus normal cells.

Mutations can also include previously identified tumor-specific mutations. Known tumor mutations can be found in the Catalogue of Somatic Mutations in Cancer (COSMIC) database.

A variety of methods are available for detecting the presence of a particular mutation or allele in an individual's DNA or RNA. Advancements in this field have provided accurate, easy, and inexpensive large-scale SNP genotyping. For example, several techniques have been described, including dynamic allele-specific hybridization (DASH), microplate array diagonal gel electrophoresis (MADGE), pyrosequencing, oligonucleotide-specific ligation, the TaqMan system, and various DNA "chip" technologies such as the Affymetrix SNP chips. These methods typically rely on amplification of the target genetic region by PCR. Still other methods are based on the generation of small signal molecules by invasive cleavage followed by mass spectrometry, or on immobilized padlock probes and rolling-circle amplification. Several of the methods known in the art for detecting specific mutations are summarized below.

PCR-based detection means can include multiplex amplification of a plurality of markers simultaneously. For example, it is well known in the art to select PCR primers that generate PCR products which do not overlap in size and can be analyzed simultaneously. Alternatively, it is possible to amplify different markers with primers that are differentially labeled and thus can each be differentially detected. Hybridization-based detection means likewise allow the differential detection of multiple PCR products in a sample. Other techniques that allow multiplex analysis of a plurality of markers are known in the art.

Several methods have been developed to facilitate analysis of single nucleotide polymorphisms in genomic DNA or cellular RNA. For example, a single-base polymorphism can be detected by using a specialized exonuclease-resistant nucleotide, as disclosed, e.g., in Mundy, C.R. (U.S. Pat. No. 4,656,127). According to the method, a primer complementary to the allelic sequence immediately 3' to the polymorphic site is permitted to hybridize to a target molecule obtained from a particular animal or human. If the polymorphic site on the target molecule contains a nucleotide that is complementary to the particular exonuclease-resistant nucleotide derivative present, then that derivative will be incorporated onto the end of the hybridized primer. Such incorporation renders the primer resistant to exonuclease and thereby permits its detection. Since the identity of the exonuclease-resistant derivative of the sample is known, a finding that the primer has become resistant to exonucleases reveals that the nucleotide(s) present in the polymorphic site of the target molecule is complementary to that of the nucleotide derivative used in the reaction. This method has the advantage that it does not require the determination of large amounts of extraneous sequence data.

A solution-based method can be used for determining the identity of the nucleotide at a polymorphic site (Cohen, D. et al., French Patent No. 2,650,840; PCT Appln. No. WO91/02087). As in the Mundy method of U.S. Pat. No. 4,656,127, a primer is employed that is complementary to the allelic sequence immediately 3' to the polymorphic site. The method determines the identity of the nucleotide at that site using labeled dideoxynucleotide derivatives, which, if complementary to the nucleotide of the polymorphic site, will become incorporated onto the terminus of the primer. An alternative method, known as Genetic Bit Analysis or GBA, is described by Goelet, P. et al. (PCT Appln. No. 92/15712). The method of Goelet, P. et al. uses mixtures of labeled terminators and a primer that is complementary to the sequence 3' to a polymorphic site. The labeled terminator that is incorporated is thus determined by, and complementary to, the nucleotide present in the polymorphic site of the target molecule being evaluated. In contrast to the method of Cohen et al. (French Patent No. 2,650,840; PCT Appln. No. WO91/02087), the method of Goelet, P. et al. can be a heterogeneous-phase assay, in which the primer or the target molecule is immobilized to a solid phase.

Several primer-guided nucleotide incorporation procedures for assaying polymorphic sites in DNA have been described (Komher, J.S. et al., Nucl. Acids Res. 17: 7779-7784 (1989); Sokolov, B.P., Nucl. Acids Res. 18: 3671 (1990); Syvanen, A.-C., et al., Genomics 8: 684-692 (1990); Kuppuswamy, M.N. et al., Proc. Natl. Acad. Sci. (U.S.A.) 88: 1143-1147 (1991); Prezant, T.R. et al., Hum. Mutat. 1: 159-164 (1992); Ugozzoli, L. et al., GATA 9: 107-112 (1992); Nyren, P. et al., Anal. Biochem. 208: 171-175 (1993)). These methods differ from GBA in that they use incorporation of labeled deoxynucleotides to discriminate between bases at a polymorphic site. In such a format, because the signal is proportional to the number of deoxynucleotides incorporated, polymorphisms that occur in runs of the same nucleotide can produce signals that are proportional to the length of the run (Syvanen, A.-C., et al., Amer. J. Hum. Genet. 52: 46-59 (1993)).

A number of initiatives obtain sequence information directly from millions of individual molecules of DNA or RNA in parallel. Real-time single-molecule sequencing-by-synthesis technologies rely on the detection of fluorescent nucleotides as they are incorporated into a nascent strand of DNA that is complementary to the template being sequenced. In one method, oligonucleotides 30-50 bases in length are covalently anchored at the 5' end to glass cover slips. These anchored strands perform two functions. First, they act as capture sites for the target template strands if the templates are configured with capture tails complementary to the surface-bound oligonucleotides. Second, they act as primers for the template-directed primer extension that forms the basis of the sequence reading. The capture primers function as a fixed position site for sequence determination using multiple cycles of synthesis, detection, and chemical cleavage of the dye-linker to remove the dye. Each cycle consists of adding the polymerase/labeled nucleotide mixture, rinsing, imaging, and cleavage of the dye. In an alternative method, the polymerase is modified with a fluorescent donor molecule and immobilized on a glass slide, while each nucleotide is color-coded with an acceptor fluorescent moiety attached to a gamma-phosphate. The system detects the interaction between the fluorescently-tagged polymerase and a fluorescently modified nucleotide as the nucleotide is incorporated into the de novo chain. Other sequencing-by-synthesis technologies also exist.

Any suitable sequencing-by-synthesis platform can be used to identify mutations. As described above, four major sequencing-by-synthesis platforms are currently available: the Genome Sequencers from Roche/454 Life Sciences, the 1G Analyzer from Illumina/Solexa, the SOLiD system from Applied BioSystems, and the Heliscope system from Helicos Biosciences. Sequencing-by-synthesis platforms have also been described by Pacific BioSciences and VisiGen Biotechnologies. In some embodiments, a plurality of nucleic acid molecules being sequenced is bound to a support (e.g., a solid support). To immobilize the nucleic acid on a support, a capture sequence/universal priming site can be added at the 3' and/or 5' end of the template. The nucleic acids can be bound to the support by hybridizing the capture sequence to a complementary sequence covalently attached to the support. The capture sequence (also referred to as a universal capture sequence) is a nucleic acid sequence complementary to a sequence attached to a support, and it may dually serve as a universal primer.

As an alternative to a capture sequence, a member of a coupling pair (such as, e.g., antibody/antigen, receptor/ligand, or the avidin-biotin pair as described in, e.g., U.S. Patent Application No. 2006/0252077) can be linked to each fragment, which is then captured on a surface coated with the respective second member of that coupling pair.

After capture, the sequence can be analyzed, for example, by single molecule detection/sequencing, e.g., as described in the Examples and in U.S. Pat. No. 7,283,337, including template-dependent sequencing-by-synthesis. In sequencing-by-synthesis, the surface-bound molecule is exposed to a plurality of labeled nucleotide triphosphates in the presence of polymerase. The sequence of the template is determined by the order of labeled nucleotides incorporated into the 3' end of the growing chain. This can be done in real time or in a step-and-repeat mode. For real-time analysis, a different optical label can be incorporated for each nucleotide, and multiple lasers can be used to stimulate the incorporated nucleotides.

Sequencing can also include other massively parallel sequencing or next generation sequencing (NGS) techniques and platforms. Additional examples of massively parallel sequencing techniques and platforms are the Illumina HiSeq or MiSeq, Thermo PGM or Proton, the Pac Bio RS II or Sequel, Qiagen's Gene Reader, and the Oxford Nanopore MinION. Additional similar current massively parallel sequencing technologies can be used, as well as future generations of these technologies.

Any cell type or tissue can be used to obtain nucleic acid samples for use in the methods described herein. For example, a DNA or RNA sample can be obtained from a tumor or from a bodily fluid, e.g., blood obtained by known techniques (e.g., venipuncture), or saliva. Alternatively, nucleic acid tests can be performed on dry samples (e.g., hair or skin). In addition, a sample can be obtained for sequencing from a tumor and another sample can be obtained for sequencing from normal tissue, where the normal tissue is of the same tissue type as the tumor. Alternatively, a sample can be obtained for sequencing from a tumor and another sample can be obtained for sequencing from normal tissue, where the normal tissue is of a distinct tissue type relative to the tumor.

Tumors can include one or more of lung cancer, melanoma, breast cancer, ovarian cancer, prostate cancer, kidney cancer, gastric cancer, colon cancer, testicular cancer, head and neck cancer, pancreatic cancer, brain cancer, B-cell lymphoma, acute myelogenous leukemia, chronic myelogenous leukemia, chronic lymphocytic leukemia, T cell lymphocytic leukemia, non-small cell lung cancer, and small cell lung cancer.

Alternatively, protein mass spectrometry can be used to identify or validate the presence of mutated peptides bound to MHC proteins on tumor cells. Peptides can be acid-eluted from tumor cells, or from HLA molecules immunoprecipitated from the tumor, and then identified using mass spectrometry.

IV. Neoantigens

Neoantigens can include nucleotides or polypeptides. For example, a neoantigen can be an RNA sequence that encodes a polypeptide sequence. Neoantigens useful in vaccines can therefore include nucleotide sequences or polypeptide sequences.

Disclosed herein are isolated peptides that comprise tumor specific mutations identified by the methods disclosed herein, peptides that comprise known tumor specific mutations, and mutant polypeptides or fragments thereof identified by the methods disclosed herein. Neoantigen peptides can be described in the context of their coding sequence, where a neoantigen includes the nucleotide sequence (e.g., DNA or RNA) that codes for the related polypeptide sequence.

One or more polypeptides encoded by a neoantigen nucleotide sequence can comprise at least one of the following: a binding affinity with MHC with an IC50 value of less than 1000 nM; for MHC class I peptides, a length of 8-15, 8, 9, 10, 11, 12, 13, 14, or 15 amino acids; the presence of sequence motifs within or near the peptide that promote proteasome cleavage; and the presence of sequence motifs that promote TAP transport. For MHC class II peptides: a length of 6-30, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, or 30 amino acids; and the presence of sequence motifs within or near the peptide that promote cleavage by extracellular or lysosomal proteases (e.g., cathepsins) or HLA-DM catalyzed HLA binding.
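
For illustration only, the length and affinity criteria above can be written as a simple filter. The following Python sketch is not part of the disclosed method: the names `Candidate`, `passes_class_i_filter`, and `class_ii_length_ok` are hypothetical, the IC50 values are assumed to come from some external prediction or binding assay, and the motif-based criteria (proteasome cleavage, TAP transport, protease cleavage) are not modeled.

```python
from dataclasses import dataclass

# Length windows and affinity cutoff taken from the paragraph above.
MHC_I_LENGTHS = range(8, 16)    # 8-15 amino acids, inclusive
MHC_II_LENGTHS = range(6, 31)   # 6-30 amino acids, inclusive
IC50_CUTOFF_NM = 1000.0         # binding affinity must be below this value


@dataclass(frozen=True)
class Candidate:
    """Hypothetical record for one candidate peptide."""
    sequence: str    # one-letter amino acid sequence
    ic50_nm: float   # assumed to be supplied by an external assay or predictor


def passes_class_i_filter(candidate: Candidate) -> bool:
    """True if the peptide has a class I length and IC50 below 1000 nM."""
    return (len(candidate.sequence) in MHC_I_LENGTHS
            and candidate.ic50_nm < IC50_CUTOFF_NM)


def class_ii_length_ok(sequence: str) -> bool:
    """True if the peptide length falls in the class II window (6-30)."""
    return len(sequence) in MHC_II_LENGTHS
```

Because the text lists these properties as "at least one of", a real selection step might combine them differently; the conjunction used in `passes_class_i_filter` is only one possible reading.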

One or more neoantigens can be presented on the surface of a tumor.

One or more neoantigens can be immunogenic in a subject having a tumor, e.g., capable of eliciting a T cell response or a B cell response in the subject.

One or more neoantigens that induce an autoimmune response in a subject can be excluded from consideration in the context of vaccine generation for a subject having a tumor.

The size of at least one neoantigenic peptide molecule can comprise, but is not limited to, about 5, about 6, about 7, about 8, about 9, about 10, about 11, about 12, about 13, about 14, about 15, about 16, about 17, about 18, about 19, about 20, about 21, about 22, about 23, about 24, about 25, about 26, about 27, about 28, about 29, about 30, about 31, about 32, about 33, about 34, about 35, about 36, about 37, about 38, about 39, about 40, about 41, about 42, about 43, about 44, about 45, about 46, about 47, about 48, about 49, about 50, about 60, about 70, about 80, about 90, about 100, about 110, about 120 or more amino acid residues, and any range derivable therein. In specific embodiments, the neoantigenic peptide molecules are 50 amino acids or fewer.

Neoantigenic peptides and polypeptides can be: for MHC class I, 15 residues or fewer in length, usually consisting of between about 8 and about 11 residues, particularly 9 or 10 residues; for MHC class II, 6-30 residues, inclusive.

If desired, a longer peptide can be designed in several ways. In one case, when the presentation likelihoods of peptides on HLA alleles are predicted or known, a longer peptide can consist of either: (1) individual presented peptides with an extension of 2-5 amino acids toward the N- and C-terminus of each corresponding gene product; or (2) a concatenation of some or all of the presented peptides with the extended sequences for each. In another case, when sequencing reveals a long (more than 10 residues) neoepitope sequence present in the tumor (e.g., due to a frameshift, read-through, or intron inclusion that leads to a novel peptide sequence), a longer peptide consists of: (3) the entire stretch of novel tumor-specific amino acids, thereby bypassing the need for computational or in vitro test-based selection of the strongest HLA-presented shorter peptide. In both cases, use of a longer peptide allows endogenous processing by patient cells and may lead to more effective antigen presentation and induction of T cell responses.
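
Option (1) above, extending a presented peptide by 2-5 amino acids toward each terminus of its source gene product, can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions rather than the disclosed implementation: the function name `extend_peptide`, the 0-based half-open coordinates, and the choice to truncate the extension at the ends of the protein are all hypothetical.

```python
def extend_peptide(protein: str, start: int, end: int, flank: int) -> str:
    """Return protein[start:end] extended by up to `flank` residues per side.

    `start` and `end` are 0-based, half-open coordinates of the presented
    peptide within the source protein (an assumption of this sketch).
    The extension is truncated where it would run past either terminus.
    """
    if not 2 <= flank <= 5:
        raise ValueError("flank must be between 2 and 5 residues")
    if not 0 <= start < end <= len(protein):
        raise ValueError("peptide coordinates fall outside the protein")
    left = max(0, start - flank)              # extend toward the N-terminus
    right = min(len(protein), end + flank)    # extend toward the C-terminus
    return protein[left:right]


# Usage with a made-up 20-residue protein: a 9-mer at positions 5-13
# gains three residues on each side.
protein = "ACDEFGHIKLMNPQRSTVWY"
longer = extend_peptide(protein, 5, 14, 3)
```

Option (2) would then amount to joining several such extended sequences end to end.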

Neoantigenic peptides and polypeptides can be presented on an HLA protein. In some aspects, neoantigenic peptides and polypeptides are presented on an HLA protein with greater affinity than a wild-type peptide. In some aspects, a neoantigenic peptide or polypeptide can have an IC50 of at least less than 5000 nM, at least less than 1000 nM, at least less than 500 nM, at least less than 250 nM, at least less than 200 nM, at least less than 150 nM, at least less than 100 nM, at least less than 50 nM, or less.

In some aspects, neoantigenic peptides and polypeptides do not induce an autoimmune response and/or invoke immunological tolerance when administered to a subject.

Also provided are compositions comprising at least two or more neoantigenic peptides. In some embodiments, the composition contains at least two distinct peptides. At least two distinct peptides can be derived from the same polypeptide. Distinct polypeptides means that the peptides vary by length, amino acid sequence, or both. The peptides are derived from any polypeptide known to contain, or found to contain, a tumor specific mutation. Suitable polypeptides from which the neoantigenic peptides can be derived can be found, for example, in the COSMIC database. COSMIC curates comprehensive information on somatic mutations in human cancer. The peptide contains the tumor specific mutation. In some aspects, the tumor specific mutation is a driver mutation for a particular cancer type.

Neoantigenic peptides and polypeptides having a desired activity or property can be modified to provide certain desired attributes, e.g., improved pharmacological characteristics, while increasing, or at least retaining, substantially all of the biological activity of the unmodified peptide to bind the desired MHC molecule and activate the appropriate T cell. For instance, neoantigenic peptides and polypeptides can be subject to various changes, such as conservative or non-conservative substitutions, where such changes might provide certain advantages in their use, such as improved MHC binding, stability, or presentation. A conservative substitution means replacing an amino acid residue with another that is biologically and/or chemically similar, e.g., one hydrophobic residue for another, or one polar residue for another. The substitutions include combinations such as Gly, Ala; Val, Ile, Leu, Met; Asp, Glu; Asn, Gln; Ser, Thr; Lys, Arg; and Phe, Tyr. The effect of single amino acid substitutions may also be probed using D-amino acids. Such modifications can be made using well-known peptide synthesis procedures, as described in, e.g., Merrifield, Science 232: 341-347 (1986); Barany & Merrifield, The Peptides, Gross & Meienhofer, eds. (N.Y., Academic Press), pp. 1-284 (1979); and Stewart & Young, Solid Phase Peptide Synthesis (Rockford, Ill., Pierce), 2d Ed. (1984).
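
The substitution groups listed above lend themselves to a small lookup. The Python sketch below is illustrative only: `CONSERVATIVE_GROUPS` simply transcribes the example combinations from the text into one-letter codes, and the helper name `is_conservative` is hypothetical. The text presents these groups as examples ("combinations such as"), so the table should not be read as exhaustive.

```python
# Example conservative-substitution groups from the text, in one-letter codes:
# Gly, Ala; Val, Ile, Leu, Met; Asp, Glu; Asn, Gln; Ser, Thr; Lys, Arg; Phe, Tyr.
CONSERVATIVE_GROUPS = (
    frozenset("GA"),
    frozenset("VILM"),
    frozenset("DE"),
    frozenset("NQ"),
    frozenset("ST"),
    frozenset("KR"),
    frozenset("FY"),
)


def is_conservative(original: str, replacement: str) -> bool:
    """True if both residues differ and share one of the listed groups."""
    a, b = original.upper(), replacement.upper()
    if a == b:
        return False  # identical residues are not a substitution
    return any(a in group and b in group for group in CONSERVATIVE_GROUPS)
```

For example, `is_conservative("L", "I")` is true because both residues belong to the Val, Ile, Leu, Met group, whereas `is_conservative("L", "D")` is false.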

Modifications of peptides and polypeptides with various amino acid mimetics or unnatural amino acids can be particularly useful in increasing the stability of the peptide and polypeptide in vivo. Stability can be assayed in a number of ways. For instance, peptidases and various biological media, such as human plasma and serum, have been used to test stability. See, e.g., Verhoef et al., Eur. J. Drug Metab. Pharmacokin. 11: 291-302 (1986). The half-life of a peptide can be conveniently determined using a 25% human serum (v/v) assay. The protocol is generally as follows. Pooled human serum (type AB, non-heat inactivated) is delipidated by centrifugation before use. The serum is then diluted to 25% with RPMI tissue culture medium and used to test peptide stability. At predetermined time intervals, a small amount of the reaction solution is removed and added to either 6% aqueous trichloroacetic acid or ethanol. The cloudy reaction sample is cooled (4° C.) for 15 minutes and then spun to pellet the precipitated serum proteins. The presence of the peptides is then determined by reversed-phase HPLC using stability-specific chromatography conditions.
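
The text does not say how a half-life is computed from the HPLC readings. One common approach, shown here purely as a sketch, is to assume first-order decay and fit a straight line to the logarithm of the fraction of intact peptide over time. The function name `half_life`, the first-order assumption, and the use of peak areas normalized to the time-zero sample are all assumptions of this example, not statements from the source.

```python
import math


def half_life(times, fractions_remaining):
    """Estimate half-life assuming first-order decay (an assumption).

    Fits ln(fraction) = intercept - k * t by ordinary least squares and
    returns ln(2) / k, in the same time units as `times`.
    `fractions_remaining` are intact-peptide signals (e.g., HPLC peak
    areas) divided by the time-zero signal; all values must be positive.
    """
    if len(times) != len(fractions_remaining) or len(times) < 2:
        raise ValueError("need at least two paired observations")
    logs = [math.log(f) for f in fractions_remaining]
    n = len(times)
    mean_t = sum(times) / n
    mean_y = sum(logs) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    if sxx == 0:
        raise ValueError("time points must not all be identical")
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times, logs))
    slope = sxy / sxx
    if slope >= 0:
        raise ValueError("no decay observed in the data")
    return math.log(2) / -slope


# Usage with made-up readings that halve every 30 minutes.
estimate = half_life([0, 30, 60, 120], [1.0, 0.5, 0.25, 0.0625])
```

Real serum-stability data rarely follow a single exponential this cleanly, so the fit should be checked against the raw time course before the number is reported.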

The peptides and polypeptides can be modified to provide desired attributes other than improved serum half-life. For instance, the ability of the peptides to induce CTL activity can be enhanced by linkage to a sequence that contains at least one epitope capable of inducing a T helper cell response. Immunogenic peptide/T helper conjugates can be linked by a spacer molecule. The spacer is typically composed of relatively small, neutral molecules, such as amino acids or amino acid mimetics, that are substantially uncharged under physiological conditions. The spacers are typically selected from, e.g., Ala, Gly, or other neutral spacers of nonpolar amino acids or neutral polar amino acids. It will be understood that the optionally present spacer need not be composed of the same residues and thus can be a hetero- or homo-oligomer. When present, the spacer will usually be at least one or two residues, more usually three to six residues. Alternatively, the peptide can be linked to the T helper peptide without a spacer.

A neoantigenic peptide can be linked to the T helper peptide either directly or via a spacer, at either the amino or carboxy terminus of the peptide. The amino terminus of either the neoantigenic peptide or the T helper peptide can be acylated. Exemplary T helper peptides include tetanus toxoid 830-843, influenza 307-319, and malaria circumsporozoite 382-398 and 378-389.

Proteins or peptides can be made by any technique known to those of skill in the art, including the expression of proteins, polypeptides, or peptides through standard molecular biological techniques, the isolation of proteins or peptides from natural sources, or the chemical synthesis of proteins or peptides. The nucleotide and protein, polypeptide, and peptide sequences corresponding to various genes have been previously disclosed and can be found in computerized databases known to those of ordinary skill in the art. One such resource is the Genbank and GenPept databases of the National Center for Biotechnology Information, located at the National Institutes of Health website. The coding regions for known genes can be amplified and/or expressed using the techniques disclosed herein or as would be known to those of ordinary skill in the art. Alternatively, various commercial preparations of proteins, polypeptides, and peptides are known to those of skill in the art.

In a further aspect, a neoantigen includes a nucleic acid (e.g., a polynucleotide) that encodes a neoantigenic peptide or a portion thereof. The polynucleotide can be, e.g., DNA, cDNA, PNA, CNA, RNA (e.g., mRNA), either single- and/or double-stranded, or native or stabilized forms of polynucleotides, such as, e.g., polynucleotides with a phosphorothioate backbone, or combinations thereof, and it may or may not contain introns. A still further aspect provides an expression vector capable of expressing a polypeptide or a portion thereof. Expression vectors for different cell types are well known in the art and can be selected without undue experimentation. Generally, DNA is inserted into an expression vector, such as a plasmid, in the proper orientation and in the correct reading frame for expression. If necessary, the DNA can be linked to the appropriate transcriptional and translational regulatory control nucleotide sequences recognized by the desired host, although such controls are generally available in the expression vector. The vector is then introduced into the host through standard techniques. Guidance can be found, e.g., in Sambrook et al. (1989) Molecular Cloning, A Laboratory Manual, Cold Spring Harbor Laboratory, Cold Spring Harbor, N.Y.

V. Vaccine Compositions

Also disclosed herein is an immunogenic composition, e.g., a vaccine composition, capable of raising a specific immune response, e.g., a tumor-specific immune response. Vaccine compositions typically comprise a plurality of neoantigens, e.g., selected using a method described herein. Vaccine compositions can also be referred to as vaccines.

A vaccine can contain between 1 and 30 peptides, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, or 30 different peptides, 6, 7, 8, 9, 10, 11, 12, 13, or 14 different peptides, or 12, 13, or 14 different peptides. Peptides can include post-translational modifications. A vaccine can contain between 1 and 100 or more nucleotide sequences, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, or more different nucleotide sequences, 6, 7, 8, 9, 10, 11, 12, 13, or 14 different nucleotide sequences, or 12, 13, or 14 different nucleotide sequences. A vaccine can contain between 1 and 30 neoantigen sequences, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, or more different neoantigen sequences, 6, 7, 8, 9, 10, 11, 12, 13, or 14 different neoantigen sequences, or 12, 13, or 14 different neoantigen sequences.

In one embodiment, different peptides and/or polypeptides, or nucleotide sequences encoding them, are selected so that the peptides and/or polypeptides are capable of associating with different MHC molecules, such as different MHC class I molecules and/or different MHC class II molecules. In some aspects, one vaccine composition comprises coding sequences for peptides and/or polypeptides capable of associating with the most frequently occurring MHC class I molecules and/or MHC class II molecules. Hence, vaccine compositions can comprise different fragments capable of associating with at least 2 preferred, at least 3 preferred, or at least 4 preferred MHC class I molecules and/or MHC class II molecules.

The vaccine composition can elicit a specific cytotoxic T-cell response and/or a specific helper T-cell response.

The vaccine composition can further comprise an adjuvant and/or a carrier. Examples of useful adjuvants and carriers are given below. The composition can be associated with a carrier such as, for example, a protein or an antigen-presenting cell such as a dendritic cell (DC) capable of presenting the peptide to a T-cell.

An adjuvant is any substance whose admixture into a vaccine composition increases or otherwise modifies the immune response to a neoantigen. A carrier can be a scaffold structure, for example a polypeptide or a polysaccharide, with which a neoantigen can be associated. Optionally, adjuvants are conjugated covalently or non-covalently.

The ability of an adjuvant to increase an immune response to an antigen is typically manifested by a significant or substantial increase in an immune-mediated reaction, or by a reduction in disease symptoms. For example, an increase in humoral immunity is typically manifested by a significant increase in the titer of antibodies raised against the antigen, and an increase in T-cell activity is typically manifested in increased cell proliferation, cellular cytotoxicity, or cytokine secretion. An adjuvant can also alter an immune response, for example by changing a primarily humoral or Th response into a primarily cellular or Th response.

Suitable adjuvants include, but are not limited to, 1018 ISS, alum, aluminium salts, Amplivax, AS15, BCG, CP-870,893, CpG7909, CyaA, dSLIM, GM-CSF, IC30, IC31, Imiquimod, ImuFact IMP321, IS Patch, ISS, ISCOMATRIX, JuvImmune, LipoVac, MF59, monophosphoryl lipid A, Montanide IMS 1312, Montanide ISA 206, Montanide ISA 50V, Montanide ISA-51, OK-432, OM-174, OM-197-MP-EC, ONTAK, PepTel vector system, PLG microparticles, resiquimod, SRL172, virosomes and other virus-like particles, YF-17D, VEGF trap, R848, beta-glucan, Pam3Cys, Aquila's QS21 stimulon (Aquila Biotech, Worcester, Mass., USA), which is derived from saponin, mycobacterial extracts and synthetic bacterial cell wall mimics, and other proprietary adjuvants such as Ribi's Detox, Quil or Superfos. Adjuvants such as incomplete Freund's or GM-CSF are useful. Several immunological adjuvants (e.g., MF59) specific for dendritic cells and their preparation have been described previously (Dupuis M, et al., Cell Immunol. 1998; 186(1): 18-27; Allison A C; Dev Biol Stand. 1998; 92: 3-11). Cytokines can also be used. Several cytokines have been directly linked to influencing dendritic cell migration to lymphoid tissues (e.g., TNF-alpha), accelerating the maturation of dendritic cells into efficient antigen-presenting cells for T-lymphocytes (e.g., GM-CSF, IL-1 and IL-4) (U.S. Pat. No. 5,849,589, specifically incorporated herein by reference in its entirety), and acting as immunoadjuvants (e.g., IL-12) (Gabrilovich D I, et al., J Immunother Emphasis Tumor Immunol. 1996 (6): 414-418).

CpG immunostimulatory oligonucleotides have also been reported to enhance the effects of adjuvants in a vaccine setting. Other TLR-binding molecules, such as RNA-binding TLR 7, TLR 8 and/or TLR 9, can also be used.

Other examples of useful adjuvants include, but are not limited to, chemically modified CpGs (e.g., CpR, Idera), poly(I:C) (e.g., polyI:C12U), non-CpG bacterial DNA or RNA, as well as immunoactive small molecules and antibodies such as cyclophosphamide, sunitinib, bevacizumab, Celebrex, NCX-4016, sildenafil, tadalafil, vardenafil, sorafenib, XL-999, CP-547632, pazopanib, ZD2171, AZD2171, ipilimumab, tremelimumab, and SC58175, which can act therapeutically and/or as an adjuvant. The amounts and concentrations of adjuvants and additives can readily be determined by the skilled artisan without undue experimentation. Additional adjuvants include colony-stimulating factors, such as granulocyte macrophage colony-stimulating factor (GM-CSF, sargramostim).

A vaccine composition can comprise one or more different adjuvants. Furthermore, a therapeutic composition can comprise any adjuvant substance, including any of the above or combinations thereof. It is also contemplated that a vaccine and an adjuvant can be administered together or separately in any appropriate sequence.

A carrier (or excipient) can be present independently of an adjuvant. The function of a carrier can be, for example, to increase the molecular weight, in particular of the mutant, in order to increase activity or immunogenicity, to confer stability, to increase biological activity, or to increase serum half-life. Furthermore, a carrier can aid in presenting peptides to T-cells. A carrier can be any suitable carrier known to the person skilled in the art, for example a protein or an antigen-presenting cell. A carrier protein can be, but is not limited to, keyhole limpet hemocyanin, serum proteins such as transferrin, bovine serum albumin, human serum albumin, thyroglobulin or ovalbumin, immunoglobulins, or hormones, such as insulin or palmitic acid. For immunization of humans, the carrier is generally a physiologically acceptable carrier that is acceptable to humans and safe. However, tetanus toxoid and/or diphtheria toxoid are also suitable carriers. Alternatively, the carrier can be a dextran, for example sepharose.

Cytotoxic T-cells (CTLs) recognize an antigen in the form of a peptide bound to an MHC molecule rather than the intact foreign antigen itself. The MHC molecule itself is located at the cell surface of an antigen-presenting cell (APC). Thus, activation of CTLs is possible if a trimeric complex of peptide antigen, MHC molecule, and APC is present. Correspondingly, the immune response can be enhanced if not only the peptide is used for activation of CTLs but APCs carrying the respective MHC molecule are additionally added. Therefore, in some embodiments a vaccine composition additionally contains at least one antigen-presenting cell.

Neoantigens can also be included in viral vector-based vaccine platforms, such as vaccinia, fowlpox, self-replicating alphavirus, marabavirus, adenovirus (see, e.g., Tatsis et al., Adenoviruses, Molecular Therapy (2004) 10, 616-629), or lentivirus, including but not limited to second, third, or hybrid second/third generation lentivirus and recombinant lentivirus of any generation designed to target specific cell types or receptors (see, e.g., Hu et al., Immunization Delivered by Lentiviral Vectors for Cancer and Infectious Diseases, Immunol Rev. (2011) 239(1): 45-61; Sakuma et al., Lentiviral vectors: basic to translational, Biochem J. (2012) 443(3): 603-18; Cooper et al., Rescue of splicing-mediated intron loss maximizes expression in lentiviral vectors containing the human ubiquitin C promoter, Nucl. Acids Res. (2015) 43(1): 682-690; Zufferey et al., Self-Inactivating Lentivirus Vector for Safe and Efficient In Vivo Gene Delivery, J. Virol. (1998) 72(12): 9873-9880). Depending on the packaging capacity of the above-mentioned viral vector-based vaccine platforms, this approach can deliver one or more nucleotide sequences that encode one or more neoantigen peptides. The sequences can be flanked by non-mutated sequences, can be separated by linkers, or can be preceded by one or more sequences targeting a subcellular compartment (see, e.g., Gros et al., Prospective identification of neoantigen-specific lymphocytes in the peripheral blood of melanoma patients, Nat Med. (2016) 22(4): 433-8; Stronen et al., Targeting of cancer neoantigens with donor-derived T cell receptor repertoires, Science (2016) 352(6291): 1337-41; Lu et al., Efficient identification of mutated cancer antigens recognized by T cells associated with durable tumor regressions, Clin Cancer Res. (2014) 20(13): 3401-10). Upon introduction into a host, infected cells express the neoantigens and thereby elicit a host immune (e.g., CTL) response against the peptide(s). Vaccinia vectors and methods useful in immunization protocols are described in, e.g., U.S. Pat. No. 4,722,848. Another vector is BCG (Bacille Calmette Guerin). BCG vectors are described in Stover et al. (Nature 351: 456-460 (1991)). A wide variety of other vaccine vectors useful for therapeutic administration or immunization of neoantigens, e.g., Salmonella typhi vectors and the like, will be apparent to those skilled in the art from the description herein.

V.A. Neoantigen Cassette

The methods employed for the selection of one or more neoantigens, the cloning and construction of the "cassette", and its insertion into a viral vector are within the skill of the art given the teachings provided herein. By "neoantigen cassette" is meant the combination of a selected neoantigen or plurality of neoantigens and the other regulatory elements necessary to transcribe the neoantigen(s) and express the transcribed product. A neoantigen or plurality of neoantigens can be operatively linked to regulatory components in a manner which permits transcription. Such components include conventional regulatory elements that can drive expression of the neoantigen(s) in a cell transfected with the viral vector. Thus, the neoantigen cassette can also contain a selected promoter which is linked to the neoantigen(s) and located, with other, optional regulatory elements, within the selected viral sequences of the recombinant vector.

Useful promoters can be constitutive promoters or regulated (inducible) promoters, which enable control of the amount of neoantigen(s) to be expressed. For example, a desirable promoter is that of the cytomegalovirus immediate early promoter/enhancer [see, e.g., Boshart et al, Cell, 41:521-530 (1985)]. Another desirable promoter includes the Rous sarcoma virus LTR promoter/enhancer. Still another promoter/enhancer sequence is the chicken cytoplasmic beta-actin promoter [T. A. Kost et al, Nucl. Acids Res., 11(23):8287 (1983)]. Other suitable or desirable promoters can be selected by one of skill in the art.

The neoantigen cassette can also include nucleic acid sequences heterologous to the viral vector sequences, including sequences providing signals for efficient polyadenylation of the transcript (poly-A or pA) and introns with functional splice donor and acceptor sites. A common poly-A sequence employed in the exemplary vectors of this invention is that derived from the papovavirus SV-40. The poly-A sequence generally can be inserted in the cassette following the neoantigen-based sequences and before the viral vector sequences. A common intron sequence can also be derived from SV-40, and is referred to as the SV-40 T intron sequence. A neoantigen cassette can also contain such an intron, located between the promoter/enhancer sequence and the neoantigen(s). Selection of these and other common vector elements is conventional [see, e.g., Sambrook et al, "Molecular Cloning. A Laboratory Manual.", 2d edit., Cold Spring Harbor Laboratory, New York (1989) and references cited therein], and many such sequences are available from commercial and industrial sources as well as from Genbank.

A neoantigen cassette can have one or more neoantigens. For example, a given cassette can include 1-10, 1-20, 1-30, 10-20, 15-25, 15-20, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, or more neoantigens. Neoantigens can be linked directly to one another. Neoantigens can also be linked to one another with linkers. Neoantigens can be in any orientation relative to one another, including N to C or C to N.
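As an illustration of the linkage options above and of the junction epitopes described in the abstract, the following minimal sketch concatenates epitope sequences, optionally through a linker, and enumerates the k-mers that span a junction between two adjacent epitopes. The function names `assemble_cassette` and `junction_kmers`, the toy sequences, and the definition of "spanning" used here (a window that starts in the left epitope and ends in the right one) are assumptions made for illustration and are not taken from the specification.

```python
from typing import List


def assemble_cassette(epitopes: List[str], linker: str = "") -> str:
    """Concatenate epitope sequences in the given order.

    An empty linker corresponds to direct linkage; a non-empty linker is
    placed between every pair of adjacent epitopes.
    """
    return linker.join(epitopes)


def junction_kmers(left: str, right: str, k: int, linker: str = "") -> List[str]:
    """Return the k-mers that contain residues of both `left` and `right`.

    Assumption for this sketch: a window spans the junction when it starts
    inside `left` and ends inside `right`.
    """
    joined = left + linker + right
    right_start = len(left) + len(linker)  # index of the first residue of `right`
    spanning = []
    for i in range(len(joined) - k + 1):
        starts_in_left = i < len(left)
        ends_in_right = i + k > right_start
        if starts_in_left and ends_in_right:
            spanning.append(joined[i:i + k])
    return spanning


if __name__ == "__main__":
    print(assemble_cassette(["AAAA", "CCCC"], linker="GG"))
    print(junction_kmers("AAAA", "CCCC", k=3))
    print(junction_kmers("AAAA", "CCCC", k=4, linker="GG"))
```

Each spanning k-mer is a candidate junction epitope that a presentation model could then score; how that scoring is done is outside the scope of this sketch.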

As stated above, the neoantigen cassette can be located in the site of any selected deletion in the viral vector, such as the site of the E1 gene region deletion or the E3 gene region deletion, among others which may be selected.

V.B. Immune Checkpoints

Vectors described herein, such as the C68 vectors described herein or the alphavirus vectors described herein, can comprise a nucleic acid which encodes at least one neoantigen, and the same or a separate vector can comprise a nucleic acid which encodes at least one immune modulator (e.g., an antibody such as an scFv) which binds to and blocks the activity of an immune checkpoint molecule. Vectors can comprise a neoantigen cassette and one or more nucleic acid molecules encoding a checkpoint inhibitor.

Illustrative immune checkpoint molecules that can be targeted for blocking or inhibition include, but are not limited to, CTLA-4, 4-1BB (CD137), 4-1BBL (CD137L), PDL1, PDL2, PD1, B7-H3, B7-H4, BTLA, HVEM, TIM3, GAL9, LAG3, TIM3, B7H3, B7H4, VISTA, KIR, 2B4 (belongs to the CD2 family of molecules and is expressed on all NK, γδ, and memory CD8+ (αβ) T cells), CD160 (also referred to as BY55), and CGEN-15049. Immune checkpoint inhibitors include antibodies, or antigen-binding fragments thereof, or other binding proteins, that bind to and block or inhibit the activity of one or more of CTLA-4, PDL1, PDL2, PD1, B7-H3, B7-H4, BTLA, HVEM, TIM3, GAL9, LAG3, TIM3, B7H3, B7H4, VISTA, KIR, 2B4, CD160, and CGEN-15049. Illustrative immune checkpoint inhibitors include tremelimumab (CTLA-4 blocking antibody), anti-OX40, PD-L1 monoclonal antibody (anti-B7-H1; MEDI4736), ipilimumab, MK-3475 (PD-1 blocker), nivolumab (anti-PD1 antibody), CT-011 (anti-PD1 antibody), BY55 monoclonal antibody, AMP224 (anti-PDL1 antibody), BMS-936559 (anti-PDL1 antibody), MPDL3280A (anti-PDL1 antibody), MSB0010718C (anti-PDL1 antibody), and Yervoy/ipilimumab (anti-CTLA-4 checkpoint inhibitor). Antibody-encoding sequences can be engineered into vectors such as C68 using ordinary skill in the art. An exemplary method is described in Fang et al., Stable antibody expression at therapeutic levels using the 2A peptide. Nat Biotechnol. 2005 May;23(5):584-90. Epub 2005 Apr 17, herein incorporated by reference for all purposes.

V.A. Additional Considerations for Vaccine Design and Manufacture

V.A.1. Determination of a Set of Peptides that Cover All Tumor Subclones

Truncal peptides, meaning those presented by all or most tumor subclones, can be prioritized for inclusion in the vaccine.⁵³ Optionally, if there are no truncal peptides predicted to be presented and immunogenic with high probability, or if the number of such truncal peptides is small enough that additional non-truncal peptides can be included in the vaccine, then further peptides can be prioritized by estimating the number and identity of the tumor subclones and choosing peptides so as to maximize the number of tumor subclones covered by the vaccine.⁵⁴
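The selection rule in the preceding paragraph can be read as a coverage problem. The sketch below is one possible reading, not the method of the specification: it takes truncal peptides first and then fills any remaining vaccine slots greedily with the peptide that covers the most not-yet-covered subclones. The function name `select_peptides`, the input layout (a mapping from peptide name to the set of subclones presenting it), the simplification that "truncal" means present in every subclone, and the greedy heuristic itself are all assumptions for illustration.

```python
from typing import Dict, List, Set


def select_peptides(
    coverage: Dict[str, Set[str]],
    all_subclones: Set[str],
    max_peptides: int,
) -> List[str]:
    """Pick up to `max_peptides` peptides, truncal ones first.

    coverage: hypothetical mapping peptide -> subclones that present it.
    Ties are broken alphabetically so the result is deterministic.
    """
    chosen: List[str] = []
    covered: Set[str] = set()

    # Step 1: truncal peptides (simplified here to "present in all subclones").
    truncal = sorted(p for p, subs in coverage.items() if subs >= all_subclones)
    for peptide in truncal[:max_peptides]:
        chosen.append(peptide)
        covered |= coverage[peptide]

    # Step 2: greedy fill, maximizing newly covered subclones per added peptide.
    remaining = {p: s for p, s in coverage.items() if p not in chosen}
    while len(chosen) < max_peptides and remaining:
        best = max(sorted(remaining), key=lambda p: len(remaining[p] - covered))
        if not remaining[best] - covered:
            break  # no remaining peptide covers a new subclone
        chosen.append(best)
        covered |= remaining.pop(best)
    return chosen


if __name__ == "__main__":
    subclones = {"A", "B", "C"}
    peptides = {"p1": {"A", "B"}, "p2": {"B", "C"}, "p3": {"C"}}
    print(select_peptides(peptides, subclones, max_peptides=2))
```

A greedy pass does not guarantee the optimal cover in general; it is used here only because it is short and easy to follow.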

V.A.2. Neoantigen Prioritization

After all of the above neoantigen filters are applied, more candidate neoantigens may still be available for vaccine inclusion than the vaccine technology can support. Additionally, uncertainty about various aspects of the neoantigen analysis may remain, and tradeoffs may exist between different properties of candidate vaccine neoantigens. Thus, in place of predetermined filters at each step of the selection process, an integrated multi-dimensional model can be considered that places candidate neoantigens in a space with at least the following axes and optimizes selection using an integrative approach.

1. Risk of auto-immunity or tolerance (risk of germline) (lower risk of auto-immunity is typically preferred)

2. Probability of sequencing artifact (lower probability of artifact is typically preferred)

3. Probability of immunogenicity (higher probability of immunogenicity is typically preferred)

4. Probability of presentation (higher probability of presentation is typically preferred)

5. Gene expression (higher expression is typically preferred)

6. Coverage of HLA genes (a larger number of HLA molecules involved in the presentation of a set of neoantigens may lower the probability that a tumor will escape immune attack via downregulation or mutation of HLA molecules)

7. Coverage of HLA classes (covering both HLA-I and HLA-II may increase the probability of therapeutic response and decrease the probability of tumor escape)
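To make the idea of an integrated multi-dimensional model concrete, the sketch below scores each candidate along axes 1 to 5 and rewards HLA coverage (axis 6) during selection. The text does not specify the model, so everything here is an assumption for illustration: the `Candidate` fields, the multiplicative score, the `new_allele_bonus` factor, and the greedy selection loop. Axis 7 (HLA class coverage) is not modeled.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass(frozen=True)
class Candidate:
    """Hypothetical per-candidate values, each assumed to lie in [0, 1]."""
    name: str
    autoimmunity_risk: float    # axis 1, lower is better
    artifact_prob: float        # axis 2, lower is better
    immunogenicity_prob: float  # axis 3, higher is better
    presentation_prob: float    # axis 4, higher is better
    expression: float           # axis 5, normalized, higher is better
    hla_allele: str             # allele predicted to present the candidate


def score(c: Candidate) -> float:
    """Combine axes 1-5 into one number (an assumed multiplicative form)."""
    return (
        (1.0 - c.autoimmunity_risk)
        * (1.0 - c.artifact_prob)
        * c.immunogenicity_prob
        * c.presentation_prob
        * c.expression
    )


def prioritize(cands: List[Candidate], k: int, new_allele_bonus: float = 1.5) -> List[Candidate]:
    """Greedily pick k candidates, boosting those that add a new HLA allele (axis 6)."""
    pool = list(cands)
    chosen: List[Candidate] = []
    alleles: Set[str] = set()
    while pool and len(chosen) < k:
        best = max(
            pool,
            key=lambda c: score(c) * (new_allele_bonus if c.hla_allele not in alleles else 1.0),
        )
        chosen.append(best)
        alleles.add(best.hla_allele)
        pool.remove(best)
    return chosen


if __name__ == "__main__":
    a = Candidate("a", 0.0, 0.0, 1.0, 0.8, 1.0, "A*02:01")
    b = Candidate("b", 0.0, 0.0, 1.0, 0.6, 1.0, "A*02:01")
    c = Candidate("c", 0.0, 0.0, 1.0, 0.5, 1.0, "B*07:02")
    print([x.name for x in prioritize([a, b, c], k=2)])
```

In this toy input, candidate `c` is chosen over the higher-scoring `b` because it brings in a second HLA allele, which is the kind of tradeoff between axes that the paragraph above describes.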

VI. Therapeutic and Manufacturing Methods

Also provided is a method of inducing a tumor-specific immune response in a subject, vaccinating against a tumor, and treating and/or alleviating a symptom of cancer in a subject by administering to the subject one or more neoantigens, such as a plurality of neoantigens, identified using the methods disclosed herein.

In some aspects, a subject has been diagnosed with cancer or is at risk of developing cancer. A subject can be a human, dog, cat, horse, or any animal in which a tumor-specific immune response is desired. A tumor can be any solid tumor, such as breast, ovarian, prostate, lung, kidney, gastric, colon, testicular, head and neck, pancreas, brain, melanoma, and other tumors of tissue organs, or a hematological tumor, such as lymphomas and leukemias, including acute myelogenous leukemia, chronic myelogenous leukemia, chronic lymphocytic leukemia, T cell lymphocytic leukemia, and B cell lymphomas.

A neoantigen can be administered in an amount sufficient to induce a CTL response.

A neoantigen can be administered alone or in combination with other therapeutic agents. The therapeutic agent is, for example, a chemotherapeutic agent, radiation, or immunotherapy. Any suitable therapeutic treatment for a particular cancer can be administered.

In addition, a subject can be further administered an anti-immunosuppressive/immunostimulatory agent such as a checkpoint inhibitor. For example, the subject can be further administered an anti-CTLA antibody or anti-PD-1 or anti-PD-L1. Blockade of CTLA-4 or PD-L1 by antibodies can enhance the immune response to cancerous cells in the patient. In particular, CTLA-4 blockade has been shown to be effective when following a vaccination protocol.

The optimum amount of each neoantigen to be included in a vaccine composition and the optimum dosing regimen can be determined. For example, a neoantigen or its variant can be prepared for intravenous (i.v.) injection, subcutaneous (s.c.) injection, intradermal (i.d.) injection, intraperitoneal (i.p.) injection, or intramuscular (i.m.) injection. Methods of injection include s.c., i.d., i.p., i.m., and i.v. Methods of DNA or RNA injection include i.d., i.m., s.c., i.p., and i.v. Other methods of administration of the vaccine composition are known to those skilled in the art.

A vaccine can be compiled so that the selection, number, and/or amount of neoantigens present in the composition is tissue-, cancer-, and/or patient-specific. For instance, the exact selection of peptides can be guided by the expression patterns of the parent proteins in a given tissue. The selection can depend on the specific type of cancer, the status of the disease, earlier treatment regimens, the immune status of the patient, and, of course, the HLA haplotype of the patient. Furthermore, a vaccine can contain individualized components according to the personal needs of the particular patient. Examples include varying the selection of neoantigens according to the expression of the neoantigen in the particular patient, or adjustments for secondary treatments following a first round or scheme of treatment.

For a composition to be used as a vaccine for cancer, neoantigens with similar normal self-peptides that are expressed in high amounts in normal tissues can be avoided or be present in low amounts in a composition described herein. On the other hand, if it is known that the tumor of a patient expresses high amounts of a certain neoantigen, the respective pharmaceutical composition for treatment of this cancer can contain that neoantigen in high amounts, and/or a neoantigen specific for this particular neoantigen or for the pathway of this neoantigen can be included.

Compositions comprising a neoantigen can be administered to an individual already suffering from cancer. In therapeutic applications, compositions are administered to a patient in an amount sufficient to elicit an effective CTL response to the tumor antigen and to cure or at least partially arrest symptoms and/or complications. An amount adequate to accomplish this is defined as a "therapeutically effective dose." Amounts effective for this use will depend on, e.g., the composition, the manner of administration, the stage and severity of the disease being treated, the weight and general state of health of the patient, and the judgment of the prescribing physician. It should be kept in mind that compositions can generally be employed in life-threatening or potentially life-threatening situations, especially when the cancer has metastasized. In such cases, in view of the minimization of extraneous substances and the relatively nontoxic nature of a neoantigen, the treating physician may consider it possible and desirable to administer substantial excesses of these compositions.

For therapeutic use, administration can begin at the detection or surgical removal of tumors. This can be followed by boosting doses until at least symptoms are substantially abated and for a period thereafter.

The pharmaceutical compositions (e.g., vaccine compositions) for therapeutic treatment are intended for parenteral, topical, nasal, oral, or local administration. A pharmaceutical composition can be administered parenterally, e.g., intravenously, subcutaneously, intradermally, or intramuscularly. The compositions can be administered at the site of surgical excision to induce a local immune response to the tumor. Disclosed herein are compositions for parenteral administration which comprise a solution of the neoantigen, in which the vaccine composition is dissolved or suspended in an acceptable carrier, e.g., an aqueous carrier. A variety of aqueous carriers can be used, e.g., water, buffered water, 0.9% saline, 0.3% glycine, hyaluronic acid, and the like. These compositions can be sterilized by conventional, well-known sterilization techniques, or can be sterile filtered. The resulting aqueous solutions can be packaged for use as is, or lyophilized, the lyophilized preparation being combined with a sterile solution prior to administration. The compositions can contain pharmaceutically acceptable auxiliary substances as required to approximate physiological conditions, such as pH adjusting and buffering agents, tonicity adjusting agents, wetting agents, and the like, for example sodium acetate, sodium lactate, sodium chloride, potassium chloride, calcium chloride, sorbitan monolaurate, triethanolamine oleate, etc.

Neoantigens can also be administered via liposomes, which target them to a particular tissue, such as lymphoid tissue. Liposomes are also useful in increasing half-life. Liposomes include emulsions, foams, micelles, insoluble monolayers, liquid crystals, phospholipid dispersions, lamellar layers and the like. In these preparations the neoantigen to be delivered is incorporated as part of a liposome, alone or in conjunction with a molecule that binds to, e.g., a receptor prevalent among lymphoid cells, such as a monoclonal antibody that binds to the CD45 antigen, or with other therapeutic or immunogenic compositions. Thus, liposomes filled with a desired neoantigen can be directed to the site of lymphoid cells, where the liposomes then deliver the selected therapeutic/immunogenic compositions. Liposomes can be formed from standard vesicle-forming lipids, which generally include neutral and negatively charged phospholipids and a sterol, such as cholesterol. The selection of lipids is generally guided by consideration of, e.g., liposome size, acid lability and stability of the liposomes in the blood stream. A variety of methods are available for preparing liposomes, as described in, e.g., Szoka et al., Ann. Rev. Biophys. Bioeng. 9:467 (1980), and U.S. Pat. Nos. 4,235,871, 4,501,728, 4,837,028, and 5,019,369.

For targeting to immune cells, a ligand to be incorporated into the liposome can include, e.g., antibodies or fragments thereof specific for cell surface determinants of the desired immune system cells. A liposome suspension can be administered intravenously, locally, or topically, in a dose which varies according to, inter alia, the manner of administration, the peptide being delivered, and the stage of the disease being treated. For therapeutic or immunization purposes, nucleic acids encoding a peptide, and optionally one or more of the peptides described herein, can also be administered to the patient. A number of methods are conveniently used to deliver the nucleic acids to the patient. For instance, the nucleic acid can be delivered directly, as "naked DNA." This approach is described, for instance, in Wolff et al., Science 247: 1465-1468 (1990), as well as U.S. Pat. Nos. 5,580,859 and 5,589,466. The nucleic acids can also be administered using ballistic delivery as described, for instance, in U.S. Pat. No. 5,204,253. Particles comprised solely of DNA can be administered. Alternatively, DNA can be adhered to particles, such as gold particles. Approaches for delivering nucleic acid sequences can include viral vectors, mRNA vectors, and DNA vectors with or without electroporation.

The nucleic acids can also be delivered complexed to cationic compounds, such as cationic lipids. Lipid-mediated gene delivery methods are described, for instance, in WO 96/18372; WO 93/24640; Mannino & Gould-Fogerite, BioTechniques 6(7): 682-691 (1988); U.S. Pat. No. 5,279,833 (Rose); WO 91/06309; and Felgner et al., Proc. Natl. Acad. Sci. USA 84: 7413-7414 (1987).

Neoantigens can also be included in viral vector-based vaccine platforms, such as vaccinia, fowlpox, self-replicating alphavirus, marabavirus, adenovirus (see, e.g., Tatsis et al., Adenoviruses, Molecular Therapy (2004) 10, 616-629), or lentivirus, including but not limited to second, third or hybrid second/third generation lentivirus and recombinant lentivirus of any generation designed to target specific cell types or receptors (see, e.g., Hu et al., Immunization Delivered by Lentiviral Vectors for Cancer and Infectious Diseases, Immunol Rev. (2011) 239(1): 45-61; Sakuma et al., Lentiviral vectors: basic to translational, Biochem J. (2012) 443(3): 603-18; Cooper et al., Rescue of splicing-mediated intron loss maximizes expression in lentiviral vectors containing the human ubiquitin C promoter, Nucl. Acids Res. (2015) 43(1): 682-690; Zufferey et al., Self-Inactivating Lentivirus Vector for Safe and Efficient In Vivo Gene Delivery, J. Virol. (1998) 72(12): 9873-9880). Depending on the packaging capacity of the above-mentioned viral vector-based vaccine platforms, this approach can deliver one or more nucleotide sequences that encode one or more neoantigen peptides. The sequences may be flanked by non-mutated sequences, may be separated by linkers, or may be preceded by one or more sequences targeting a subcellular compartment (see, e.g., Gros et al., Prospective identification of neoantigen-specific lymphocytes in the peripheral blood of melanoma patients, Nat Med. (2016) 22(4): 433-8; Stronen et al., Targeting of cancer neoantigens with donor-derived T cell receptor repertoires, Science (2016) 352(6291): 1337-41; Lu et al., Efficient identification of mutated cancer antigens recognized by T cells associated with durable tumor regressions, Clin Cancer Res. (2014) 20(13): 3401-10). Upon introduction into a host, infected cells express the neoantigens and thereby elicit a host immune (e.g., CTL) response against the peptide(s). Vaccinia vectors and methods useful in immunization protocols are described in, e.g., U.S. Pat. No. 4,722,848. Another vector is BCG (Bacille Calmette Guerin). BCG vectors are described in Stover et al. (Nature 351: 456-460 (1991)). A wide variety of other vaccine vectors useful for therapeutic administration or immunization of neoantigens, e.g., Salmonella typhi vectors and the like, will be apparent to those skilled in the art from the description herein.

One means of administering nucleic acids uses minigene constructs encoding one or multiple epitopes. To create a DNA sequence encoding the selected CTL epitopes (the minigene) for expression in human cells, the amino acid sequences of the epitopes are reverse translated. A human codon usage table is used to guide the codon choice for each amino acid. These epitope-encoding DNA sequences are directly adjoined, creating a continuous polypeptide sequence. To optimize expression and/or immunogenicity, additional elements can be incorporated into the minigene design. Examples of amino acid sequences that can be reverse translated and included in the minigene sequence include helper T lymphocyte epitopes, a leader (signal) sequence, and an endoplasmic reticulum retention signal. In addition, MHC presentation of CTL epitopes can be improved by including synthetic (e.g., poly-alanine) or naturally occurring flanking sequences adjacent to the CTL epitopes.

The minigene sequence is converted to DNA by assembling oligonucleotides that encode the plus and minus strands of the minigene. Overlapping oligonucleotides (30-100 bases long) are synthesized, phosphorylated, purified and annealed under appropriate conditions using well-known techniques. The ends of the oligonucleotides are joined using T4 DNA ligase. This synthetic minigene, encoding the CTL epitope polypeptide, can then be cloned into a desired expression vector.
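
The reverse-translation step described above can be illustrated with a minimal sketch. The codon table below is an assumption made for illustration (one commonly preferred human codon per amino acid); an actual design would draw codon frequencies from a published human codon usage table. The function names `reverse_translate`, `build_minigene`, and `translate` are hypothetical and do not come from the disclosure.

```python
# Illustrative preferred-codon table (one codon per amino acid).
# ASSUMPTION: these choices are for demonstration only; a real design would
# consult a published human codon usage table.
PREFERRED_HUMAN_CODON = {
    "A": "GCC", "C": "TGC", "D": "GAC", "E": "GAG", "F": "TTC",
    "G": "GGC", "H": "CAC", "I": "ATC", "K": "AAG", "L": "CTG",
    "M": "ATG", "N": "AAC", "P": "CCC", "Q": "CAG", "R": "AGA",
    "S": "AGC", "T": "ACC", "V": "GTG", "W": "TGG", "Y": "TAC",
}
# Inverse map, used only to check that the DNA encodes the intended peptide.
CODON_TO_AA = {codon: aa for aa, codon in PREFERRED_HUMAN_CODON.items()}


def reverse_translate(peptide: str) -> str:
    """Return a DNA sequence encoding `peptide`, one preferred codon per residue."""
    try:
        return "".join(PREFERRED_HUMAN_CODON[aa] for aa in peptide.upper())
    except KeyError as exc:
        raise ValueError(f"unsupported residue: {exc.args[0]!r}") from None


def build_minigene(epitopes, spacer: str = "") -> str:
    """Adjoin epitopes into one continuous polypeptide and reverse translate it.

    `spacer` is an optional amino acid flanking sequence placed between
    epitopes (e.g., "AA" for a short poly-alanine spacer).
    """
    polypeptide = spacer.join(epitopes)
    return reverse_translate(polypeptide)


def translate(dna: str) -> str:
    """Translate DNA produced by this sketch back to amino acids."""
    return "".join(CODON_TO_AA[dna[i:i + 3]] for i in range(0, len(dna), 3))


if __name__ == "__main__":
    dna = build_minigene(["YVYVADVAAK", "MK"], spacer="AA")
    print(dna, translate(dna))
```

Because the epitopes are adjoined directly, new peptides spanning each junction are encoded as well; the sketch makes those junctions easy to enumerate from the joined polypeptide.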

Purified plasmid DNA can be prepared for injection using a variety of formulations. The simplest of these is reconstitution of lyophilized DNA in sterile phosphate-buffered saline (PBS). A variety of methods have been described, and new techniques may become available. As noted above, nucleic acids are conveniently formulated with cationic lipids. In addition, glycolipids, fusogenic liposomes, peptides and compounds referred to collectively as protective, interactive, non-condensing (PINC) compounds can also be complexed to purified plasmid DNA to influence variables such as stability, intramuscular dispersion, or trafficking to specific organs or cell types.

Also disclosed is a method of manufacturing a tumor vaccine, comprising performing the steps of a method disclosed herein; and producing a tumor vaccine comprising a plurality of neoantigens or a subset of the plurality of neoantigens.

Neoantigens disclosed herein can be manufactured using methods known in the art. For example, a method of producing a neoantigen or a vector (e.g., a vector including at least one sequence encoding one or more neoantigens) disclosed herein can include culturing a host cell under conditions suitable for expressing the neoantigen or vector, wherein the host cell comprises at least one polynucleotide encoding the neoantigen or vector, and purifying the neoantigen or vector. Standard purification methods include chromatographic techniques, electrophoretic, immunological, precipitation, dialysis, filtration, concentration, and chromatofocusing techniques.

Host cells can include a Chinese hamster ovary (CHO) cell, an NS0 cell, yeast, or a HEK293 cell. Host cells can be transformed with one or more polynucleotides comprising at least one nucleic acid sequence that encodes a neoantigen or vector disclosed herein; optionally, the isolated polynucleotide further comprises a promoter sequence operably linked to the at least one nucleic acid sequence that encodes the neoantigen or vector. In certain embodiments, the isolated polynucleotide can be cDNA.

Ⅶ. Neoantigen Identification

Ⅶ.A. Neoantigen Candidate Identification

Research methods for NGS analysis of tumor and normal exomes and transcriptomes have been described and applied in the neoantigen identification space.⁶,¹⁴,¹⁵ The example below considers certain optimizations for greater sensitivity and specificity for neoantigen identification in the clinical setting. These optimizations can be grouped into two areas: those related to laboratory processes and those related to NGS data analysis.

Ⅶ.A.1. Laboratory Process Optimizations

The process improvements presented here address challenges in high-accuracy neoantigen discovery from clinical specimens with low tumor content and small volumes by extending concepts developed for reliable cancer driver gene assessment in targeted cancer panels¹⁶ to the whole-exome and -transcriptome setting necessary for neoantigen identification. Specifically, these improvements include:

1. Targeting deep (>500×) unique average coverage across the tumor exome to detect mutations present at low mutant allele frequency due to either low tumor content or subclonal state.

2. Targeting uniform coverage across the tumor exome, with <5% of bases covered at <100×, so that the fewest possible neoantigens are missed, by, for instance:

a. Employing DNA-based capture probes with individual probe QC¹⁷

b. Including additional baits for poorly covered regions

3. Targeting uniform coverage across the normal exome, where <5% of bases are covered at <20×, so that the fewest possible neoantigens remain unclassified for somatic/germline status (and thus not usable as TSNAs)

4. To minimize the total amount of sequencing required, sequence capture probes will be designed for coding regions of genes only, as non-coding RNA cannot give rise to neoantigens. Additional optimizations include:

a. Supplementary probes for HLA genes, which are GC-rich and poorly captured by standard exome sequencing¹⁸

b. Exclusion of genes predicted to generate few or no candidate neoantigens, due to factors such as insufficient expression, suboptimal digestion by the proteasome, or unusual sequence features.

5. Tumor RNA will likewise be sequenced at high depth (>100M reads) in order to enable variant detection, quantification of gene and splice-variant ("isoform") expression, and fusion detection. RNA from FFPE samples will be extracted using probe-based enrichment¹⁹, with the same or similar probes used to capture exomes in DNA.
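
The coverage targets in items 1 to 3 above can be expressed as a simple quality check over per-base depths. The following is a minimal sketch under stated assumptions: the function name `coverage_qc`, its parameters, and the representation of coverage as a flat list of per-base depths are hypothetical; only the thresholds (>500× mean and <5% of bases below 100× for tumor, <5% of bases below 20× for normal) come from the text.

```python
from statistics import mean


def coverage_qc(depths, depth_floor, min_mean_depth=0.0, max_fraction_below=0.05):
    """Summarize per-base coverage and test it against uniformity targets.

    depths             -- per-base unique read depths across the targeted exome
    depth_floor        -- depth below which a base counts as poorly covered
    min_mean_depth     -- mean depth that must be exceeded (0.0 leaves only the
                          trivial requirement of non-zero mean coverage)
    max_fraction_below -- fraction of poorly covered bases that must not be reached
    """
    if not depths:
        raise ValueError("no coverage values supplied")
    mean_depth = mean(depths)
    fraction_below = sum(d < depth_floor for d in depths) / len(depths)
    return {
        "mean_depth": mean_depth,
        "fraction_below_floor": fraction_below,
        "passes": mean_depth > min_mean_depth and fraction_below < max_fraction_below,
    }


if __name__ == "__main__":
    # Tumor exome: >500x mean, <5% of bases below 100x (thresholds from the text).
    tumor = coverage_qc([600] * 97 + [50] * 3, depth_floor=100, min_mean_depth=500)
    # Normal exome: <5% of bases below 20x (no mean target is stated in the text).
    normal = coverage_qc([30] * 99 + [10], depth_floor=20)
    print(tumor, normal)
```

Regions that repeatedly fall under the floor would be natural candidates for the additional baits mentioned in item 2b.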

Ⅶ.A.2. NGS Data Analysis Optimizations

Improvements in analysis methods address the suboptimal sensitivity and specificity of common research mutation calling approaches, and specifically consider customizations relevant for neoantigen identification in the clinical setting. These include:

1. Using the HG38 reference human genome or a later version for alignment, as it contains multiple MHC region assemblies and so better reflects population polymorphism than previous genome releases.

2. Overcoming the limitations of single variant callers²⁰ by merging results from different programs.⁵

a. Single-nucleotide variants and indels will be detected from tumor DNA, tumor RNA and normal DNA with a suite of tools including: programs based on comparisons of tumor and normal DNA, such as Strelka²¹ and Mutect²²; and programs that incorporate tumor DNA, tumor RNA and normal DNA, such as UNCeqR, which is particularly advantageous in low-purity samples²³.

b. Indels will be determined with programs that perform local re-assembly, such as Strelka and ABRA²⁴.

c. Structural rearrangements will be determined using dedicated tools such as Pindel²⁵ or Breakseq²⁶.

3. In order to detect and prevent sample swaps, variant calls from samples for the same patient will be compared at a chosen number of polymorphic sites.

4. Extensive filtering of artefactual calls will be performed, for instance, by:

a. Removal of variants found in normal DNA, potentially with relaxed detection parameters in cases of low coverage, and with a permissive proximity criterion in the case of indels

b. Removal of variants due to low mapping quality or low base quality²⁷.

c. Removal of variants stemming from recurrent sequencing artifacts, even if not observed in the corresponding normal²⁷. Examples include variants primarily detected on one strand.

d. Removal of variants detected in an unrelated set of controls²⁷.

5. Accurate HLA calling from normal exome using one of seq2HLA²⁸, ATHLATES²⁹ or Optitype, also combining exome and RNA sequencing data.²⁸ Additional potential optimizations include the adoption of a dedicated assay for HLA typing, such as long-read DNA sequencing³⁰, or the adaptation of a method for joining RNA fragments to retain continuity³¹.

6. Robust detection of neo-ORFs arising from tumor-specific splice variants will be performed by assembling transcripts from RNA-seq data using CLASS³², Bayesembler³³, StringTie³⁴ or a similar program in its reference-guided mode (i.e., using known transcript structures rather than attempting to recreate transcripts in their entirety from each experiment). While Cufflinks³⁵ is commonly used for this purpose, it frequently produces implausibly large numbers of splice variants, many of them far shorter than the full-length gene, and can fail to recover simple positive controls. Coding sequences and nonsense-mediated decay potential will be determined with tools such as SpliceR³⁶ and MAMBA³⁷, with mutant sequences re-introduced. Gene expression will be determined with a tool such as Cufflinks³⁵ or Express (Roberts and Pachter, 2013). Wild-type and mutant-specific expression counts and/or relative levels will be determined with tools developed for these purposes, such as ASE³⁸ or HTSeq³⁹. Potential filtering steps include:

a. Removal of candidate neo-ORFs deemed to be insufficiently expressed.

b. Removal of candidate neo-ORFs predicted to trigger nonsense-mediated decay (NMD).

7. Candidate neoantigens observed only in RNA (e.g., neo-ORFs) that cannot directly be verified as tumor-specific will be categorized as likely tumor-specific according to additional parameters, for instance by considering:

a. Presence of supporting tumor DNA-only cis-acting frameshift or splice-site mutations

b. Presence of a corroborating tumor DNA-only trans-acting mutation in a splicing factor. For instance, in three independently published experiments with R625-mutant SF3B1, the genes exhibiting the most differential splicing were concordant even though one experiment examined uveal melanoma patients⁴⁰, the second a uveal melanoma cell line⁴¹, and the third breast cancer patients⁴².

c. For novel splicing isoforms, presence of corroborating "novel" splice-junction reads in the RNA-seq data.

d. For novel re-arrangements, presence of corroborating juxta-exon reads in tumor DNA that are absent from normal DNA.

e. Absence from gene expression compendia such as GTEx⁴³ (i.e., making germline origin less likely)

8. Complementing the reference genome alignment-based analysis by comparing assembled DNA tumor and normal reads (or k-mers from such reads) directly, to avoid alignment- and annotation-based errors and artifacts (e.g., for somatic variants arising near germline variants or repeat-context indels).
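
The merging of calls from several programs (item 2) and the artefact filters (item 4) can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline of the disclosure: the `Call` record, the function names, and the default thresholds (`min_mapq`, `min_minor_fraction`) are hypothetical, and the caller labels are plain strings rather than invocations of any real tool.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Call:
    chrom: str
    pos: int
    ref: str
    alt: str
    caller: str      # label of the program that produced the call
    mapq: float      # mean mapping quality of the supporting reads
    fwd_reads: int   # supporting reads on the forward strand
    rev_reads: int   # supporting reads on the reverse strand

    @property
    def key(self):
        return (self.chrom, self.pos, self.ref, self.alt)


def merge_calls(calls, min_callers=1):
    """Group calls by variant; keep variants reported by at least `min_callers` programs."""
    by_variant = defaultdict(list)
    for call in calls:
        by_variant[call.key].append(call)
    return {key: support for key, support in by_variant.items()
            if len({c.caller for c in support}) >= min_callers}


def strand_balanced(call, min_minor_fraction=0.1):
    """Reject calls supported almost entirely by one strand (ASSUMED threshold)."""
    total = call.fwd_reads + call.rev_reads
    return total > 0 and min(call.fwd_reads, call.rev_reads) / total >= min_minor_fraction


def filter_variants(merged, normal_variants, control_variants, min_mapq=20.0):
    """Drop variants seen in the matched normal or unrelated controls, and variants
    whose supporting calls show low mapping quality or strand bias."""
    kept = []
    for key, support in merged.items():
        if key in normal_variants or key in control_variants:
            continue
        if all(c.mapq >= min_mapq and strand_balanced(c) for c in support):
            kept.append(key)
    return sorted(kept)
```

Setting `min_callers=1` takes the union of all programs (favoring sensitivity), while higher values require agreement between programs (favoring specificity); which trade-off is appropriate is a design choice the text leaves open.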

In samples with poly-adenylated RNA, the presence of viral and microbial RNA in the RNA-seq data will be assessed using RNA CoMPASS⁴⁴ or a similar method, toward the identification of additional factors that may predict patient response.

Ⅶ.B. Isolation and Detection of HLA Peptides

Isolation of HLA-peptide molecules was performed using classic immunoprecipitation (IP) methods after lysis and solubilization of the tissue sample⁵⁵⁻⁵⁸. A clarified lysate was used for HLA-specific IP.

Immunoprecipitation was performed using antibodies coupled to beads, where the antibody is specific for HLA molecules. For a pan-Class I HLA immunoprecipitation, a pan-Class I CR antibody is used; for Class II HLA-DR, an HLA-DR antibody is used. The antibody is covalently attached to NHS-sepharose beads during overnight incubation. After covalent attachment, the beads were washed and aliquoted for IP.⁵⁹,⁶⁰ Immunoprecipitations can also be performed with antibodies that are not covalently attached to beads. Typically this is done using sepharose or magnetic beads coated with Protein A and/or Protein G to hold the antibody to the column. Some antibodies that can be used to selectively enrich MHC/peptide complexes are listed below.

The clarified tissue lysate is added to the antibody beads for the immunoprecipitation. After immunoprecipitation, the beads are removed from the lysate, and the lysate is stored for additional experiments, including additional IPs. The IP beads are washed to remove non-specific binding, and the HLA/peptide complex is eluted from the beads using standard techniques. The protein components are removed from the peptides using a molecular weight spin column or C18 fractionation. The resulting peptides are taken to dryness by SpeedVac evaporation and in some instances are stored at -20°C prior to MS analysis.

Dried peptides are reconstituted in an HPLC buffer suitable for reverse phase chromatography and loaded onto a C-18 microcapillary HPLC column for gradient elution in a Fusion Lumos mass spectrometer (Thermo). MS1 spectra of peptide mass/charge (m/z) were collected in the Orbitrap detector at high resolution, followed by MS2 low-resolution scans collected in the ion trap detector after HCD fragmentation of the selected ion. Additionally, MS2 spectra can be obtained using either CID or ETD fragmentation methods, or any combination of the three techniques, to attain greater amino acid coverage of the peptide. MS2 spectra can also be measured with high-resolution mass accuracy in the Orbitrap detector.

MS2 spectra from each analysis are searched against a protein database using Comet⁶¹,⁶², and the peptide identifications are scored using Percolator⁶³⁻⁶⁵. Additional sequencing is performed using PEAKS Studio (Bioinformatics Solutions Inc.), and other search engines or sequencing methods can be used, including spectral matching and de novo sequencing⁷⁵.

Ⅶ.B.1. MS Limit of Detection Studies in Support of Comprehensive HLA Peptide Sequencing

Using the peptide YVYVADVAAK, the limits of detection were determined for different amounts of peptide loaded onto the LC column. The amounts of peptide tested were 1 pmol, 100 fmol, 10 fmol, 1 fmol, and 100 amol (Table 1). The results are shown in FIG. 1F. These results indicate that the lowest limit of detection (LoD) is in the attomole range (10⁻¹⁸), that the dynamic range spans five orders of magnitude, and that the signal-to-noise ratio appears sufficient for sequencing in the low femtomole range (10⁻¹⁵).

Ⅷ. Presentation Model

Ⅷ.A. System Overview

FIG. 2A is an overview of an environment 100 for identifying likelihoods of peptide presentation in patients, in accordance with an embodiment. The environment 100 provides context in order to introduce a presentation identification system 160, which itself includes a presentation information store 165.

The presentation identification system 160 is one or more computer models, embodied in a computing system as discussed below with respect to FIG. 14, that receives peptide sequences associated with a set of MHC alleles and determines likelihoods that the peptide sequences will be presented by one or more of the set of associated MHC alleles. The presentation identification system 160 can be applied to both class I and class II MHC alleles. This is useful in a variety of contexts. One specific use case for the presentation identification system 160 is that it is able to receive nucleotide sequences of candidate neoantigens associated with a set of MHC alleles from tumor cells of a patient 110 and determine likelihoods that the candidate neoantigens will be presented by one or more of the associated MHC alleles of the tumor and/or induce immunogenic responses in the immune system of the patient 110. Those candidate neoantigens with high likelihoods, as determined by system 160, can be selected for inclusion in a vaccine 118, so that an anti-tumor immune response can be elicited from the immune system of the patient 110 providing the tumor cells.

The presentation verification system 160 determines presentation likelihoods through one or more presentation models. Specifically, the presentation models generate likelihoods of whether given peptide sequences will be presented for a set of associated MHC alleles, and are generated based on presentation information stored in the store 165. For example, the presentation models may generate the likelihood that the peptide sequence "YVYVADVAAK" will be presented for the set of alleles HLA-A*02:01, HLA-A*03:01, HLA-B*07:02, HLA-B*08:03, HLA-C*01:04 on the cell surface of the sample. The presentation information 165 contains information on whether peptides bind to different types of MHC alleles such that those peptides are presented by the MHC alleles, which in the models is determined depending on the positions of amino acids in the peptide sequences. Based on the presentation information 165, the presentation models can predict whether an unrecognized peptide sequence will be presented in association with an associated set of MHC alleles. As noted above, the presentation models may be applied to both class I and class II MHC alleles.

Ⅷ.B. Presentation information

FIG. 2 illustrates a method of obtaining presentation information, according to one embodiment. The presentation information 165 includes two general categories of information: allele-interaction information and allele-non-interaction information. Allele-interaction information includes information that influences presentation of peptide sequences in a manner that depends on the type of MHC allele. Allele-non-interaction information includes information that influences presentation of peptide sequences independently of the type of MHC allele.

Ⅷ.B.1. Allele-interaction information

Allele-interaction information primarily includes identified peptide sequences that are known to have been presented by one or more identified MHC molecules from humans, mice, etc. Notably, this may or may not include data obtained from tumor samples. The presented peptide sequences may be identified from cells that express a single MHC allele. In this case, the presented peptide sequences are generally collected from single-allele cell lines that are engineered to express a predetermined MHC allele and are subsequently exposed to synthetic protein. Peptides presented on the MHC allele are isolated by techniques such as acid elution and identified through mass spectrometry. FIG. 2B shows an example in which the exemplary peptide YEMFNDKSQRAPDDKMF, presented on the predetermined MHC allele HLA-DRB1*12:01, is isolated and identified through mass spectrometry. Because in this situation peptides are identified through cells engineered to express a single predetermined MHC protein, the direct association between a presented peptide and the MHC protein to which it was bound is definitively known.

The presented peptide sequences may also be collected from cells that express multiple MHC alleles. Typically in humans, 6 different types of MHC-I and up to 12 different types of MHC-II molecules are expressed by a cell. Such presented peptide sequences may be identified from multiple-allele cell lines that are engineered to express multiple predetermined MHC alleles. Such presented peptide sequences may also be identified from tissue samples, either normal tissue samples or tumor tissue samples. In this case in particular, the MHC molecules can be immunoprecipitated from normal or tumor tissue. Peptides presented on the multiple MHC alleles can similarly be isolated by techniques such as acid elution and identified through mass spectrometry. FIG. 2C shows an example in which six exemplary peptides, YEMFNDKSF, HROEIFSHDFJ, FJIEJFOESS, NEIOREIREI, JFKSIFEMMSJDSSUIFLKSJFIEIFJ, and KNFLENFIESOFI, are presented on the identified class I MHC alleles HLA-A*01:01, HLA-A*02:01, HLA-B*07:02, HLA-B*08:01 and class II MHC alleles HLA-DRB1*10:01, HLA-DRB1*11:01, and are isolated and identified through mass spectrometry. In contrast to single-allele cell lines, the direct association between a presented peptide and the MHC protein to which it was bound may be unknown, because the bound peptides are isolated from the MHC molecules before being identified.

Allele-interaction information can also include the mass spectrometry ion current, which depends on both the concentration of peptide-MHC molecule complexes and the ionization efficiency of the peptides. The ionization efficiency varies from peptide to peptide in a sequence-dependent manner. In general, ionization efficiency varies from peptide to peptide over approximately two orders of magnitude, while the concentration of peptide-MHC complexes varies over a larger range than that.

Allele-interaction information can also include measurements or predictions of the binding affinity between a given MHC allele and a given peptide. (72, 73, 74) One or more affinity models can generate such predictions. For example, returning to the example shown in FIG. 1D, the presentation information 165 may include a binding affinity prediction of 1000 nM between the peptide YEMFNDKSF and the class I allele HLA-A*01:01. Peptides with an IC50 greater than 1000 nM are rarely presented by the MHC, and lower IC50 values increase the probability of presentation. The presentation information 165 may also include a binding affinity prediction between the peptide KNFLENFIESOFI and the class II allele HLA-DRB1*11:01.
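The IC50 relationship described above can be sketched as a simple screen. This is a minimal illustration only: the function names are hypothetical, and treating 1000 nM as a hard cutoff is an assumption made for the sketch, since the text describes it as a tendency rather than a rule.

```python
from typing import Dict, List

# Threshold quoted in the text; using it as a hard cutoff is an assumption.
IC50_CUTOFF_NM = 1000.0


def passes_affinity_screen(ic50_nm: float, cutoff_nm: float = IC50_CUTOFF_NM) -> bool:
    """Return True when the predicted IC50 (in nM) is at or below the cutoff."""
    if ic50_nm <= 0:
        raise ValueError("IC50 must be a positive concentration in nM")
    return ic50_nm <= cutoff_nm


def rank_by_affinity(predicted_ic50_nm: Dict[str, float]) -> List[str]:
    """Order peptides from strongest binder (lowest IC50) to weakest."""
    return sorted(predicted_ic50_nm, key=predicted_ic50_nm.get)
```

A lower IC50 means tighter binding, so the ranking places the most likely presented peptides first.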

Allele-interaction information can also include measurements or predictions of the stability of the MHC complex. One or more stability models can generate such predictions. More stable peptide-MHC complexes (i.e., complexes with longer half-lives) are more likely to be presented at high copy number on tumor cells and on antigen-presenting cells that encounter vaccine antigen. For example, returning to the example shown in FIG. 2C, the presentation information 165 may include a stability prediction of a half-life of 1 hour for the class I molecule HLA-A*01:01. The presentation information 165 may also include a stability prediction of a half-life for the class II molecule HLA-DRB1*11:01.

Allele-interaction information can also include the measured or predicted rate of the formation reaction for the peptide-MHC complex. Complexes that form at a higher rate are more likely to be presented on the cell surface at high concentration.

Allele-interaction information can also include the sequence and length of the peptide. MHC class I molecules typically prefer to present peptides with lengths between 8 and 15 amino acids. 60-80% of presented peptides have length 9. MHC class II molecules typically prefer to present peptides with lengths between 6 and 30 amino acids.
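The length preferences above can be expressed as a small lookup, sketched here under the assumption that the quoted ranges are inclusive. The names `LENGTH_RANGES` and `has_typical_length` are hypothetical and do not come from the specification.

```python
# Inclusive length ranges (in amino acids) quoted in the text for each MHC class.
LENGTH_RANGES = {"I": (8, 15), "II": (6, 30)}


def has_typical_length(peptide: str, mhc_class: str) -> bool:
    """Check whether a peptide falls in the typical length range for an MHC class."""
    if mhc_class not in LENGTH_RANGES:
        raise ValueError("mhc_class must be 'I' or 'II'")
    shortest, longest = LENGTH_RANGES[mhc_class]
    return shortest <= len(peptide) <= longest
```

For instance, the 17-residue peptide YEMFNDKSQRAPDDKMF falls outside the class I range but inside the class II range.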

Allele-interaction information can also include the presence of kinase sequence motifs on the neoantigen encoded peptide, and the absence or presence of specific post-translational modifications on the neoantigen encoded peptide. The presence of kinase motifs affects the probability of post-translational modification, which may enhance or interfere with MHC binding.

Allele-interaction information can also include the expression or activity levels of proteins involved in the process of post-translational modification, e.g., kinases (as measured or predicted from RNA-seq, mass spectrometry, or other methods).

Allele-interaction information can also include the probability of presentation of peptides with similar sequence in cells from other individuals expressing the particular MHC allele, as assessed by mass spectrometry proteomics or other means.

Allele-interaction information can also include the expression level of the particular MHC allele in the individual in question (e.g., as measured by RNA-seq or mass spectrometry). Peptides that bind most strongly to an MHC allele expressed at a high level are more likely to be presented than peptides that bind most strongly to an MHC allele expressed at a low level.

Allele-interaction information can also include the overall probability, independent of the neoantigen encoded peptide sequence, of presentation by the particular MHC allele in other individuals who express that MHC allele.

Allele-interaction information can also include the overall peptide-sequence-independent probability of presentation by MHC alleles in the same family of molecules (e.g., HLA-A, HLA-B, HLA-C, HLA-DQ, HLA-DR, HLA-DP) in other individuals. For example, HLA-C molecules are typically expressed at lower levels than HLA-A or HLA-B molecules; consequently, presentation of a peptide by HLA-C is a priori less probable than presentation by HLA-A or HLA-B. As another example, HLA-DP is typically expressed at lower levels than HLA-DR or HLA-DQ; consequently, presentation of a peptide by HLA-DP is a priori less probable than presentation by HLA-DR or HLA-DQ.

Allele-interaction information can also include the protein sequence of the particular MHC allele.

Any MHC allele-non-interaction information listed in the section below can also be modeled as MHC allele-interaction information.

Ⅷ.B.2. Allele-non-interaction information

Allele-non-interaction information can include the C-terminal sequences flanking the neoantigen encoded peptide within its source protein sequence. For MHC-I, C-terminal flanking sequences may affect proteasomal processing of peptides. However, the C-terminal flanking sequence is cleaved from the peptide by the proteasome before the peptide is transported to the endoplasmic reticulum and encounters MHC alleles on the surfaces of cells. Consequently, MHC molecules receive no information about the C-terminal flanking sequence, and thus the effect of the C-terminal flanking sequence cannot vary depending on MHC allele type. For example, returning to the example shown in FIG. 2C, the presentation information 165 may include the C-terminal flanking sequence FOEIFNDKSLDKFJI of the presented peptide FJIEJFOESS, identified from the source protein of the peptide.

Allele-non-interaction information can also include mRNA quantification measurements. For example, mRNA quantification data can be obtained for the same samples that provide the mass spectrometry training data. As described below with reference to FIG. 13H, RNA expression was identified as a strong predictor of peptide presentation. In one embodiment, the mRNA quantification measurements are obtained from the software tool RSEM. A detailed description of the RSEM software tool can be found in Bo Li and Colin N. Dewey, "RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome," BMC Bioinformatics, 12:323, August 2011. In one embodiment, the mRNA quantification is measured in units of fragments per kilobase of transcript per million mapped reads (FPKM).
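The FPKM unit named above follows directly from its definition: fragments divided by transcript length in kilobases and by total mapped fragments in millions. The sketch below only illustrates that arithmetic; it is not RSEM's estimation procedure, and the function and parameter names are hypothetical.

```python
def fpkm(fragments: int, transcript_length_bp: int, total_mapped_fragments: int) -> float:
    """Fragments per kilobase of transcript per million mapped fragments.

    Equivalent to fragments / (length_bp / 1e3) / (total_mapped / 1e6).
    """
    if transcript_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("length and total mapped fragments must be positive")
    return fragments * 1e9 / (transcript_length_bp * total_mapped_fragments)
```

Normalizing by both transcript length and sequencing depth is what lets expression values be compared across genes and across samples.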

Allele-non-interaction information can also include the N-terminal sequences flanking the peptide within its source protein sequence.

Allele-non-interaction information can also include the source gene of the peptide sequence. The source gene may be defined as the Ensembl protein family of the peptide sequence. As another example, the source gene may be defined as the source DNA or the source RNA of the peptide sequence. The source gene can, for example, be represented as a string of nucleotides that encode a protein, or alternatively be represented more categorically based on a named set of known DNA or RNA sequences that are known to encode specific proteins. In another example, allele-non-interaction information can also include the source transcript or isoform, or the set of potential source transcripts or isoforms, of the peptide sequence drawn from a database such as Ensembl or RefSeq.

Allele-non-interaction information can also include the tissue type, cell type, or tumor type of the cells of origin of the peptide sequence.

Allele-non-interaction information can also include the presence of protease cleavage motifs in the peptide, optionally weighted according to the expression of the corresponding proteases in the tumor cells (as measured by RNA-seq or mass spectrometry). Peptides that contain protease cleavage motifs are less likely to be presented, because they will be more readily degraded by proteases and will therefore be less stable within the cell.

Allele-non-interaction information can also include the turnover rate of the source protein as measured in the appropriate cell type. A faster turnover rate (i.e., a shorter half-life) increases the probability of presentation; however, the predictive power of this feature is low if it is measured in a dissimilar cell type.

Allele-non-interaction information can also include the length of the source protein, optionally considering the specific splice variants ("isoforms") most highly expressed in the tumor cells, as measured by RNA-seq or proteome mass spectrometry, or as predicted from the annotation of germline or somatic splicing mutations detected in DNA or RNA sequence data.

Allele-non-interaction information can also include the level of expression of the proteasome, immunoproteasome, thymoproteasome, or other proteases in the tumor cells (which may be measured by RNA-seq, proteome mass spectrometry, or immunohistochemistry). Different proteasomes have different cleavage site preferences. More weight will be given to the cleavage preferences of each type of proteasome in proportion to its expression level.

Allele-non-interaction information can also include the expression of the source gene of the peptide (e.g., as measured by RNA-seq or mass spectrometry). Possible optimizations include adjusting the measured expression to account for the presence of stromal cells and tumor-infiltrating lymphocytes within the tumor sample. Peptides from more highly expressed genes are more likely to be presented. Peptides from genes with undetectable levels of expression can be excluded from consideration.

Allele-non-interaction information can also include the probability that the source mRNA of the neoantigen encoded peptide will be subject to nonsense-mediated decay, as predicted by a model of nonsense-mediated decay, for example, the model from Rivas et al., Science 2015.

Allele-non-interaction information can also include the typical tissue-specific expression of the source gene of the peptide during various stages of the cell cycle. Genes that are expressed at a low level overall (as measured by RNA-seq or mass spectrometry proteomics) but that are known to be expressed at a high level during specific stages of the cell cycle are likely to produce more presented peptides than genes that are stably expressed at very low levels.

Allele-non-interaction information can also include a comprehensive catalog of features of the source protein as given in, e.g., UniProt or PDB (http://www.rcsb.org/pdb/home/home.do/). These features may include, among others, the secondary and tertiary structures of the protein, subcellular localization,¹¹ and Gene Ontology (GO) terms. Specifically, this information may contain annotations that act at the level of the protein, e.g., 5' UTR length, and annotations that act at the level of specific residues, e.g., a helix motif between residues 300 and 310. These features can also include turn motifs, sheet motifs, and disordered residues.

Allele-non-interaction information can also include features describing the properties of the domain of the source protein containing the peptide, for example: secondary or tertiary structure (e.g., alpha helix versus beta sheet); alternative splicing.

Allele-non-interaction information can also include features describing the presence or absence of a presentation hotspot at the position of the peptide in the source protein of the peptide.

Allele-non-interaction information can also include the probability of presentation of peptides from the source protein of the peptide in question in other individuals (after adjusting for the expression level of the source protein in those individuals and the influence of the different HLA types of those individuals).

Allele-non-interaction information can also include the probability that the peptide will not be detected, or will be over-represented, by mass spectrometry due to technical biases.

Allele-non-interaction information can also include the expression of various gene modules/pathways as measured by a gene expression assay such as RNASeq, microarray(s), or targeted panel(s) such as Nanostring, or of single/multi-gene representatives of gene modules measured by assays such as RT-PCR (which need not contain the source protein of the peptide), where that expression is informative about the state of the tumor cells, stroma, or tumor-infiltrating lymphocytes (TILs).

Allele-non-interaction information can also include the copy number of the source gene of the peptide in the tumor cells. For example, peptides from genes that are subject to homozygous deletion in tumor cells can be assigned a probability of presentation of zero.
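The homozygous-deletion rule above amounts to overriding a model's output when no copy of the source gene remains. The following is a minimal sketch of that override, assuming the copy number is available as an integer; the function name is hypothetical.

```python
def apply_copy_number_rule(presentation_probability: float, gene_copy_number: int) -> float:
    """Force the presentation probability to zero for homozygously deleted genes."""
    if not 0.0 <= presentation_probability <= 1.0:
        raise ValueError("presentation_probability must lie in [0, 1]")
    if gene_copy_number < 0:
        raise ValueError("gene_copy_number cannot be negative")
    # A copy number of zero means both copies of the source gene are deleted.
    return 0.0 if gene_copy_number == 0 else presentation_probability
```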

Allele-non-interaction information can also include the probability that the peptide binds to TAP, or the measured or predicted binding affinity of the peptide to TAP. Peptides that are more likely to bind to TAP, or peptides that bind TAP with higher affinity, are more likely to be presented by MHC-I.

Allele-non-interaction information can also include the expression level of TAP in the tumor cells (which may be measured by RNA-seq, proteome mass spectrometry, or immunohistochemistry). For MHC-I, higher TAP expression levels increase the probability of presentation of all peptides.

Allele-non-interaction information can also include the presence or absence of tumor mutations, including, but not limited to:

i. Driver mutations in known cancer driver genes such as EGFR, KRAS, ALK, RET, ROS1, TP53, CDKN2A, CDKN2B, NTRK1, NTRK2, NTRK3

ii. Mutations in genes encoding the proteins involved in the antigen presentation machinery (e.g., B2M, HLA-A, HLA-B, HLA-C, TAP-1, TAP-2, TAPBP, CALR, CNX, ERP57, HLA-DM, HLA-DMA, HLA-DMB, HLA-DO, HLA-DOA, HLA-DOB, HLA-DP, HLA-DPA1, HLA-DPB1, HLA-DQ, HLA-DQA1, HLA-DQA2, HLA-DQB1, HLA-DQB2, HLA-DR, HLA-DRA, HLA-DRB1, HLA-DRB3, HLA-DRB4, HLA-DRB5, or any of the genes coding for components of the proteasome or immunoproteasome). Peptides whose presentation relies on a component of the antigen-presentation machinery that is subject to loss-of-function mutation in the tumor have a reduced probability of presentation.

Allele-non-interaction information can also include the presence or absence of functional germline polymorphisms, including, but not limited to:

i. Polymorphisms in genes encoding the proteins involved in the antigen presentation machinery (e.g., B2M, HLA-A, HLA-B, HLA-C, TAP-1, TAP-2, TAPBP, CALR, CNX, ERP57, HLA-DM, HLA-DMA, HLA-DMB, HLA-DO, HLA-DOA, HLA-DOB, HLA-DP, HLA-DPA1, HLA-DPB1, HLA-DQ, HLA-DQA1, HLA-DQA2, HLA-DQB1, HLA-DQB2, HLA-DR, HLA-DRA, HLA-DRB1, HLA-DRB3, HLA-DRB4, HLA-DRB5, or any of the genes coding for components of the proteasome or immunoproteasome)

Allele-non-interaction information can also include tumor type (e.g., NSCLC, melanoma).

Allele-non-interaction information can also include known functionality of HLA alleles, as reflected by, for instance, HLA allele suffixes. For example, the N suffix in the allele name HLA-A*24:09N indicates a null allele that is not expressed and is therefore unlikely to present epitopes; the full HLA allele suffix nomenclature is described at https://www.ebi.ac.uk/ipd/imgt/hla/nomenclature/suffixes.html.
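Reading the suffix off an allele name can be sketched with a regular expression. The pattern below is an assumption that covers only names of the form used in this description (an "HLA-" prefix, a gene name, an asterisk, colon-separated numeric fields, and an optional single-letter suffix); the helper names are hypothetical.

```python
import re
from typing import List, Tuple

# Assumed name shape: HLA-<gene>*<field>(:<field>)*<optional suffix letter>
_ALLELE_RE = re.compile(r"^HLA-([A-Z0-9]+)\*(\d+(?::\d+)*)([A-Z]?)$")


def parse_allele(name: str) -> Tuple[str, List[str], str]:
    """Split an allele name into gene, numeric fields, and expression suffix."""
    match = _ALLELE_RE.match(name)
    if match is None:
        raise ValueError("unrecognized allele name: " + name)
    gene, fields, suffix = match.groups()
    return gene, fields.split(":"), suffix


def is_null_allele(name: str) -> bool:
    """True when the name carries the N suffix, i.e. a non-expressed allele."""
    return parse_allele(name)[2] == "N"
```

A feature derived this way could, for example, be used to exclude null alleles from a patient's allele set before presentation likelihoods are computed.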

Allele-non-interaction information can also include clinical tumor subtype (e.g., squamous lung cancer versus non-squamous).

Allele-non-interaction information can also include smoking history.

Allele-non-interaction information can also include a history of sunburn, sun exposure, or exposure to other mutagens.

Allele-non-interaction information can also include the typical expression of the source gene of the peptide in the relevant tumor type or clinical subtype, optionally stratified by driver mutation. Genes that are typically expressed at high levels in the relevant tumor type are more likely to be presented.

Allele-non-interaction information can also include the frequency of the mutation in all tumors, or in tumors of the same type, or in tumors from individuals with at least one shared MHC allele, or in tumors of the same type in individuals with at least one shared MHC allele.

In the case of a mutated tumor-specific peptide, the list of features used to predict the probability of presentation may also include the annotation of the mutation (e.g., missense, read-through, frameshift, fusion, etc.) or whether the mutation is predicted to result in nonsense-mediated decay (NMD). For example, peptides from protein segments that are not translated in tumor cells due to homozygous early-stop mutations can be assigned a probability of presentation of zero. NMD results in decreased mRNA translation, which decreases the probability of presentation.

Ⅷ.C. Presentation verification system

FIG. 3 is a high-level block diagram illustrating the computer logic components of the presentation verification system 160, according to one embodiment. In this example embodiment, the presentation verification system 160 includes a data management module 312, an encoding module 314, a training module 316, and a prediction module 320. The presentation verification system 160 also comprises a training data store 170 and a presentation model store 175. Some embodiments of the presentation verification system 160 have different modules than those described here. Similarly, the functions can be distributed among the modules in a different manner than is described here.

Ⅷ.C.1. Data Management Module

The data management module 312 generates sets of training data 170 from the presentation information 165. Each set of training data contains a plurality of data instances, in which each data instance $i$ contains a set of independent variables $z^i$ that include at least a presented or non-presented peptide sequence $p^i$ and one or more MHC alleles $a^i$ associated with the peptide sequence $p^i$, together with a dependent variable $y^i$ that represents the information the presentation identification system 160 is interested in predicting for new values of the independent variables.

In one particular implementation referred to throughout the remainder of the specification, the dependent variable $y^i$ is a binary label indicating whether peptide $p^i$ was presented by the one or more associated MHC alleles $a^i$. In other implementations, however, the dependent variable $y^i$ can represent any other kind of information that the presentation identification system 160 is interested in predicting from the independent variables $z^i$. For example, in another implementation, the dependent variable $y^i$ may be a numerical value indicating the mass spectrometry ion current identified for the data instance.

The peptide sequence $p^i$ for data instance $i$ is a sequence of $k_i$ amino acids, in which $k_i$ may vary between data instances within a range. For example, that range may be 8-15 for MHC class I or 6-30 for MHC class II. In one specific implementation of system 160, all peptide sequences $p^i$ in a training data set may have the same length, e.g., 9. The number of amino acids in a peptide sequence may vary depending on the type of MHC alleles (e.g., MHC alleles in humans, etc.). The MHC alleles $a^i$ for data instance $i$ indicate which MHC alleles were present in association with the corresponding peptide sequence $p^i$.

The data management module 312 may also include additional allele-interacting variables, such as binding affinity predictions $b^i$ and stability predictions $s^i$, in conjunction with the peptide sequences $p^i$ and associated MHC alleles $a^i$ contained in the training data 170. For example, the training data 170 may contain binding affinity predictions $b^i$ between a peptide $p^i$ and each of the associated MHC molecules indicated in $a^i$. As another example, the training data 170 may contain stability predictions $s^i$ for each of the MHC alleles indicated in $a^i$.

The data management module 312 may also include allele-noninteracting variables $w^i$, such as C-terminal flanking sequences and mRNA quantification measurements, in conjunction with the peptide sequences $p^i$.

The data management module 312 also identifies peptide sequences that are not presented by MHC alleles in order to generate the training data 170. Generally, this involves identifying the "longer" source protein sequences that contained the presented peptide sequences prior to presentation. When the presentation information contains engineered cell lines, the data management module 312 identifies a series of peptide sequences in the synthetic protein to which the cells were exposed that were not presented on the MHC alleles of the cells. When the presentation information contains tissue samples, the data management module 312 identifies the source proteins from which the presented peptide sequences originated, and identifies a series of peptide sequences in those source proteins that were not presented on the MHC alleles of the tissue sample cells.
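One way to picture this step is a sliding window over a source protein that keeps every sub-peptide not observed as presented. The sketch below is only an illustration under that assumption; the function name, the toy sequences, and the default length range (the 8-15 class I range mentioned earlier) are not taken from the specification.

```python
def non_presented_peptides(source_protein, presented, lengths=range(8, 16)):
    """Collect sub-peptides of a source protein that were not observed as presented.

    Hypothetical helper: slides a window of each length over the protein and
    keeps every window that is absent from the set of presented peptides.
    """
    presented = set(presented)
    negatives = set()
    for k in lengths:
        for start in range(len(source_protein) - k + 1):
            candidate = source_protein[start:start + k]
            if candidate not in presented:
                negatives.add(candidate)
    return negatives


# Toy example: a 10-residue protein with one presented 8-mer.
toy_negatives = non_presented_peptides("ACDEFGHIKL", {"CDEFGHIK"}, lengths=[8])
```

Each returned peptide would then be labeled as non-presented for the alleles of the sample it came from.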

The data management module 312 may also artificially generate peptides with random sequences of amino acids and label the generated sequences as peptides not presented on MHC alleles. Because the sequences are generated at random, the data management module 312 can easily produce large amounts of synthetic data for non-presented peptides. Since in reality only a small percentage of peptide sequences are presented by MHC alleles, the synthetically generated peptide sequences are highly likely not to have been presented by MHC alleles even if they were contained in proteins processed by cells.
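A minimal sketch of such random negative generation is shown below. It assumes uniform sampling over the 20 standard amino acids and a fixed peptide length; the specification does not state the sampling distribution, and the function name and seed parameter are illustrative only.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard amino acids


def random_negative_peptides(n, length=9, seed=None):
    """Generate n random peptides to be labeled as non-presented (y = 0).

    Assumption: residues are drawn uniformly and independently.
    """
    rng = random.Random(seed)
    return ["".join(rng.choice(AMINO_ACIDS) for _ in range(length)) for _ in range(n)]


synthetic_negatives = random_negative_peptides(5, length=9, seed=0)
```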

FIG. 4 illustrates an example set of training data 170A, according to one implementation. Specifically, the first three data instances in the training data 170A indicate peptide presentation information from a single-allele cell line involving the allele HLA-C*01:03 and three peptide sequences QCEIOWAREFLKEIGJ, FIEUHFWI, and FEWRHRJTRUJR. The fourth data instance in the training data 170A indicates peptide information from a multiple-allele cell line involving the alleles HLA-B*07:02, HLA-C*01:03, and HLA-A*01:01 and the peptide sequence QIEJOEIJE. The first data instance indicates that the peptide sequence QCEIOWARE was not presented by the allele HLA-DRB3:01:01. As discussed in the prior two paragraphs, the negatively labeled peptide sequences may be randomly generated by the data management module 312 or identified from the source protein of presented peptides. The training data 170A also includes a binding affinity prediction of 1000 nM and a stability prediction of a 1-hour half-life for the peptide sequence-allele pair. The training data 170A also includes allele-noninteracting variables, such as the C-terminal flanking sequence of the peptide, FJELFISBOSJFIE, and an mRNA quantification measurement of $10^2$ TPM. The fourth data instance indicates that the peptide sequence QIEJOEIJE was presented by one of the alleles HLA-B*07:02, HLA-C*01:03, or HLA-A*01:01. The training data 170A also includes binding affinity predictions and stability predictions for each of the alleles, as well as the C-terminal flanking sequence of the peptide and the mRNA quantification measurement for the peptide.

Ⅷ.C.2. Encoding Module

The encoding module 314 encodes the information contained in the training data 170 into a numerical representation that can be used to generate the one or more presentation models. In one implementation, the encoding module 314 one-hot encodes sequences (e.g., peptide sequences or C-terminal flanking sequences) over a predetermined 20-letter amino acid alphabet. Specifically, a peptide sequence $p^i$ with $k_i$ amino acids is represented as a row vector of $20 \cdot k_i$ elements, in which a single element among $p^i_{20(j-1)+1}, p^i_{20(j-1)+2}, \ldots, p^i_{20j}$, namely the one corresponding to the amino acid at the $j$-th position of the peptide sequence, has a value of 1. The remaining elements have a value of 0. For example, for the alphabet {A, C, D, E, F, G, H, I, K, L, M, N, P, Q, R, S, T, V, W, Y}, the peptide sequence EAF of 3 amino acids for data instance $i$ may be represented by the row vector of 60 elements $p^i$ = [0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]. The C-terminal flanking sequence $c^i$ can be encoded in the same way, as can the protein sequence $d_h$ for the MHC alleles and other sequence data in the presentation information.
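The one-hot scheme for the EAF example can be sketched in a few lines of Python. The function and constant names are illustrative and not part of the specification; positions are printed 0-based, so they are one less than the 1-based positions used in the text.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20-letter alphabet listed in the text


def one_hot_encode(peptide, alphabet=AMINO_ACIDS):
    """Return a flat list of len(alphabet) * len(peptide) zeros and ones."""
    vector = []
    for residue in peptide:
        block = [0] * len(alphabet)
        block[alphabet.index(residue)] = 1  # ValueError if residue is not in the alphabet
        vector.extend(block)
    return vector


encoded = one_hot_encode("EAF")
ones = [i for i, v in enumerate(encoded) if v == 1]
print(len(encoded), ones)  # 60 [3, 20, 44]
```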

When the training data 170 contains sequences of differing lengths, the encoding module 314 may further encode the peptides into equal-length vectors by adding a PAD character that extends the predetermined alphabet. For example, this may be performed by left-padding the peptide sequences with the PAD character until the length of each peptide sequence reaches that of the longest peptide sequence in the training data 170. Thus, when the longest peptide sequence has $k_{max}$ amino acids, the encoding module 314 numerically represents each sequence as a row vector of $(20+1) \cdot k_{max}$ elements. For example, for the extended alphabet {PAD, A, C, D, E, F, G, H, I, K, L, M, N, P, Q, R, S, T, V, W, Y} and a maximum amino acid length of $k_{max}=5$, the same example peptide sequence EAF of 3 amino acids may be represented by the row vector of 105 elements $p^i$ = [1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]. The C-terminal flanking sequence $c^i$ or other sequence data can be encoded in the same way. Thus, each independent variable, or column, in the encoded peptide sequence $p^i$ or $c^i$ indicates the presence of a particular amino acid at a particular position of the sequence.
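The left-padding variant can be sketched the same way. The `"-"` placeholder standing in for the PAD character and the function name are assumptions made for illustration; the specification only says that a PAD character extends the alphabet.

```python
PAD = "-"  # hypothetical placeholder symbol for the PAD character
EXTENDED_ALPHABET = [PAD] + list("ACDEFGHIKLMNPQRSTVWY")  # 21 symbols, PAD first


def one_hot_encode_padded(peptide, k_max, alphabet=EXTENDED_ALPHABET):
    """Left-pad a peptide to k_max symbols, then one-hot encode each symbol."""
    if len(peptide) > k_max:
        raise ValueError("peptide is longer than k_max")
    padded = [PAD] * (k_max - len(peptide)) + list(peptide)
    vector = []
    for symbol in padded:
        block = [0] * len(alphabet)
        block[alphabet.index(symbol)] = 1
        vector.extend(block)
    return vector


padded_vec = one_hot_encode_padded("EAF", k_max=5)
padded_ones = [i for i, v in enumerate(padded_vec) if v == 1]
print(len(padded_vec), padded_ones)  # 105 [0, 21, 46, 64, 89]
```

The 0-based positions 0, 21, 46, 64, and 89 correspond to the 1-based positions 1, 22, 47, 65, and 90 of the 105-element vector in the text (PAD, PAD, E, A, F).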

Although the above method of encoding sequence data was described with reference to amino acid sequences, it can similarly be extended to other types of sequence data, such as DNA or RNA sequence data.

The encoding module 314 also encodes the one or more MHC alleles $a^i$ for data instance $i$ as a row vector of $m$ elements, in which each element $h = 1, 2, \ldots, m$ corresponds to a unique identified MHC allele. The elements corresponding to the MHC alleles identified for data instance $i$ have a value of 1. The remaining elements have a value of 0. For example, the alleles HLA-B*07:02 and HLA-DRB1*10:01 for a data instance $i$ corresponding to a multiple-allele cell line, among $m=4$ unique identified MHC allele types {HLA-A*01:01, HLA-C*01:08, HLA-B*07:02, HLA-DRB1*10:01}, may be represented by the row vector of 4 elements $a^i$ = [0 0 1 1], in which $a^i_3 = 1$ and $a^i_4 = 1$. Although the example is described here with 4 identified MHC allele types, in practice the number of MHC allele types can be in the hundreds or thousands. As previously discussed, each data instance $i$ typically contains at most 6 different MHC allele types in association with the peptide sequence $p^i$.
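The allele vector $a^i$ in this example is a multi-hot encoding over the list of known alleles. A short sketch, with an illustrative function name and a hypothetical error check for unrecognized alleles:

```python
KNOWN_ALLELES = ["HLA-A*01:01", "HLA-C*01:08", "HLA-B*07:02", "HLA-DRB1*10:01"]


def encode_alleles(sample_alleles, known_alleles=KNOWN_ALLELES):
    """Return an m-element 0/1 vector marking which known alleles are present."""
    present = set(sample_alleles)
    unknown = present - set(known_alleles)
    if unknown:
        raise ValueError(f"unrecognized alleles: {sorted(unknown)}")
    return [1 if allele in present else 0 for allele in known_alleles]


a_i = encode_alleles(["HLA-B*07:02", "HLA-DRB1*10:01"])
print(a_i)  # [0, 0, 1, 1]
```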

The encoding module 314 also encodes the label $y_i$ for each data instance $i$ as a binary variable taking values from the set {0, 1}, in which a value of 1 indicates that peptide $p^i$ was presented by one of the associated MHC alleles $a^i$, and a value of 0 indicates that peptide $p^i$ was not presented by any of the associated MHC alleles $a^i$. When the dependent variable $y_i$ represents the mass spectrometry ion current, the encoding module 314 may additionally scale the values using various functions, such as the log function, which has a range of $(-\infty, \infty)$ for ion current values in $[0, \infty)$.

The encoding module 314 may represent the allele-interacting variables $x_h^i$ for a peptide $p^i$ and an associated MHC allele $h$ as a row vector in which the numerical representations of the allele-interacting variables are concatenated one after the other. For example, the encoding module 314 may represent $x_h^i$ as a row vector equal to $[p^i]$, $[p^i \; b_h^i]$, $[p^i \; s_h^i]$, or $[p^i \; b_h^i \; s_h^i]$, where $b_h^i$ is the binding affinity prediction for peptide $p^i$ and associated MHC allele $h$, and $s_h^i$ is, similarly, the stability prediction. Alternatively, one or more combinations of allele-interacting variables may be stored individually (e.g., as individual vectors or matrices).

In one instance, the encoding module 314 represents binding affinity information by incorporating measured or predicted values for binding affinity in the allele-interacting variables $x_h^i$.

In one instance, the encoding module 314 represents binding stability information by incorporating measured or predicted values for binding stability in the allele-interacting variables $x_h^i$.

In one instance, the encoding module 314 represents binding on-rate information by incorporating measured or predicted values for binding on-rate in the allele-interacting variables $x_h^i$.

In one instance, for peptides presented by class I MHC molecules, the encoding module 314 represents peptide length as a vector $T_k = [\mathbb{1}(L_k = 8) \; \mathbb{1}(L_k = 9) \; \mathbb{1}(L_k = 10) \; \mathbb{1}(L_k = 11) \; \mathbb{1}(L_k = 12) \; \mathbb{1}(L_k = 13) \; \mathbb{1}(L_k = 14) \; \mathbb{1}(L_k = 15)]$, where $\mathbb{1}$ is the indicator function and $L_k$ denotes the length of peptide $p_k$. The vector $T_k$ can be included in the allele-interacting variables $x_h^i$. In another instance, for peptides presented by class II MHC molecules, the encoding module 314 represents peptide length as a vector $T_k = [\mathbb{1}(L_k = 6) \; \mathbb{1}(L_k = 7) \; \cdots \; \mathbb{1}(L_k = 30)]$, where $\mathbb{1}$ is the indicator function and $L_k$ denotes the length of peptide $p_k$. The vector $T_k$ can be included in the allele-interacting variables $x_h^i$.
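The length vector $T_k$ is a one-hot indicator over the allowed peptide lengths. The sketch below assumes the 8-15 (class I) and 6-30 (class II) ranges given earlier in this section; the function name is illustrative, and the peptide is one of the toy sequences from the training data example.

```python
def length_indicator(peptide, min_len, max_len):
    """Return [1(L == min_len), ..., 1(L == max_len)] for peptide length L."""
    return [1 if len(peptide) == n else 0 for n in range(min_len, max_len + 1)]


t_class_i = length_indicator("QIEJOEIJE", 8, 15)   # 9-mer, class I range
t_class_ii = length_indicator("QIEJOEIJE", 6, 30)  # same peptide, class II range
print(t_class_i)  # [0, 1, 0, 0, 0, 0, 0, 0]
```

A peptide whose length falls outside the range yields an all-zero vector under this sketch.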

In one instance, the encoding module 314 represents RNA expression information of MHC alleles by incorporating RNA-seq based expression levels of the MHC alleles in the allele-interacting variables $x_h^i$.

Similarly, the encoding module 314 may represent the allele-noninteracting variables $w^i$ as a row vector in which the numerical representations of the allele-noninteracting variables are concatenated one after the other. For example, $w^i$ may be a row vector equal to $[c^i]$ or $[c^i \; m^i \; w^i]$, in which the $w^i$ on the right-hand side is a row vector representing any other allele-noninteracting variables in addition to the C-terminal flanking sequence of peptide $p^i$ and the mRNA quantification measurement $m^i$ associated with the peptide. Alternatively, one or more combinations of allele-noninteracting variables may be stored individually (e.g., as individual vectors or matrices).

In one instance, the encoding module 314 represents the turnover rate of the source protein of a peptide sequence by incorporating the turnover rate or half-life in the allele-noninteracting variables $w^i$.

In one instance, the encoding module 314 represents the length of the source protein or isoform by incorporating the protein length in the allele-noninteracting variables $w^i$.

In one instance, the encoding module 314 represents activation of the immunoproteasome by incorporating the mean expression of the immunoproteasome-specific proteasome subunits, including the β1i, β2i, and β5i subunits, in the allele-noninteracting variables $w^i$.

In one instance, the encoding module 314 represents the RNA-seq abundance of the source protein of the peptide, or of the gene or transcript of the peptide (quantified in units of FPKM or TPM by techniques such as RSEM), by incorporating that abundance in the allele-noninteracting variables $w^i$.

In one instance, the encoding module 314 represents the probability that the transcript of origin of a peptide will undergo nonsense-mediated decay (NMD), as estimated by, for example, the model in Rivas et al., Science, 2015, by incorporating this probability in the allele-noninteracting variables $w^i$.

In one instance, the encoding module 314 represents the activation status of a gene module or pathway assessed via RNA-seq by, for example, quantifying the expression of the genes in the pathway in units of TPM using RSEM for each gene and then computing a summary statistic, e.g., the mean, across the genes in the pathway. The mean can be incorporated in the allele-noninteracting variables $w^i$.

In one instance, the encoding module 314 represents the copy number of the source gene by incorporating the copy number in the allele-noninteracting variables $w^i$.

In one instance, the encoding module 314 represents the TAP binding affinity by including the measured or predicted TAP binding affinity (e.g., in nanomolar units) in the allele-noninteracting variables $w^i$.

In one instance, the encoding module 314 represents TAP expression levels by including TAP expression levels measured by RNA-seq (and quantified in units of TPM by, e.g., RSEM) in the allele-noninteracting variables $w^i$.

In one instance, the encoding module 314 represents tumor mutations as a vector of indicator variables (i.e., $d^k = 1$ if peptide $p^k$ comes from a sample with a KRAS G12D mutation, and 0 otherwise) in the allele-noninteracting variables $w^i$.

In one instance, the encoding module 314 represents germline polymorphisms in antigen presentation genes as a vector of indicator variables (i.e., $d^k = 1$ if peptide $p^k$ comes from a sample with a specific germline polymorphism in TAP). These indicator variables can be included in the allele-noninteracting variables $w^i$.

In one instance, the encoding module 314 represents tumor type as a length-one one-hot encoded vector over the alphabet of tumor types (e.g., NSCLC, melanoma, colorectal cancer, etc.). These one-hot encoded variables can be included in the allele-noninteracting variables $w^i$.

In one instance, the encoding module 314 represents MHC allele suffixes by treating 4-digit HLA alleles that carry different suffixes as distinct alleles. For example, HLA-A*24:09N is considered a different allele from HLA-A*24:09 for the purposes of the model. Alternatively, because HLA alleles ending in the N suffix are not expressed, the probability of presentation by an N-suffixed MHC allele can be set to zero for all peptides.
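The second option, forcing the presentation probability to zero for N-suffixed alleles, could look like the sketch below. The function name and the idea of passing in a model estimate as a plain number are assumptions for illustration only.

```python
def presentation_probability(allele, model_probability):
    """Return 0.0 for N-suffixed (non-expressed) alleles, else the model's estimate.

    Hypothetical wrapper: model_probability stands in for whatever a trained
    presentation model would output for this peptide-allele pair.
    """
    if allele.endswith("N"):
        return 0.0
    return model_probability


null_case = presentation_probability("HLA-A*24:09N", 0.8)
normal_case = presentation_probability("HLA-A*24:09", 0.8)
```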

In one instance, the encoding module 314 represents tumor subtype as a length-one one-hot encoded vector over the alphabet of tumor subtypes (e.g., lung adenocarcinoma, lung squamous cell carcinoma, etc.). These one-hot encoded variables can be included in the allele-noninteracting variables $w^i$.

In one instance, the encoding module 314 represents smoking history as a binary indicator variable ($d^k = 1$ if the patient has a smoking history, and 0 otherwise) that can be included in the allele-noninteracting variables $w^i$. Alternatively, smoking history can be encoded as a length-one one-hot encoded variable over an alphabet of smoking severity. For example, smoking status can be rated on a 1-5 scale, where 1 indicates a nonsmoker and 5 indicates a current heavy smoker. Because smoking history is primarily relevant to lung tumors, when training a model on multiple tumor types, this variable can also be defined to equal 1 if the patient has a history of smoking and the tumor type is a lung tumor, and 0 otherwise.

In one instance, the encoding module 314 represents sunburn history as a binary indicator variable ($d^k = 1$ if the patient has a history of severe sunburn, and 0 otherwise) that can be included in the allele-noninteracting variables $w^i$. Because severe sunburn is primarily relevant to melanomas, when training a model on multiple tumor types, this variable can also be defined to equal 1 if the patient has a history of severe sunburn and the tumor type is melanoma, and 0 otherwise.

In one instance, the encoding module 314 represents the distribution of expression levels of a particular gene or transcript, for each gene or transcript in the human genome, as summary statistics (e.g., mean, median) of that distribution obtained from a reference database such as TCGA. Specifically, for a peptide $p^k$ in a sample with tumor type melanoma, the allele-noninteracting variables $w^i$ can include not only the measured expression level of the gene or transcript of origin of peptide $p^k$, but also the mean and/or median expression of that gene or transcript in melanomas as measured by TCGA.

In one instance, the encoding module 314 represents mutation type as a length-one one-hot encoded variable over the alphabet of mutation types (e.g., missense, frameshift, NMD-inducing, etc.). These one-hot encoded variables can be included in the allele-noninteracting variables $w^i$.

In one instance, the encoding module 314 represents protein-level features of the source protein (e.g., 5' UTR length) as the value of the annotation in the allele-noninteracting variables $w^i$. In another instance, the encoding module 314 represents residue-level annotations of the source protein for peptide $p^i$ by including in the allele-noninteracting variables $w^i$ an indicator variable that equals 1 if peptide $p^i$ overlaps a helix motif and 0 otherwise, or that equals 1 if peptide $p^i$ is completely contained within a helix motif. In another instance, a feature representing the proportion of residues in peptide $p^i$ that are contained within a helix motif annotation can be included in the allele-noninteracting variables $w^i$.

In one instance, the encoding module 314 represents the type of protein or isoform in the human proteome as an indicator vector $o^k$ whose length equals the number of proteins or isoforms in the human proteome, in which the corresponding element $o^k_i$ is 1 if peptide $p^k$ comes from protein $i$ and 0 otherwise.

In one instance, the encoding module 314 represents the source gene $G = \mathrm{gene}(p^i)$ of peptide $p^i$ as a categorical variable with $L$ possible categories, where $L$ denotes the upper limit of the number of indexed source genes 1, 2, ..., $L$.

In one instance, the encoding module 314 represents the tissue type, cell type, tumor type, or tumor histology type $T = \mathrm{tissue}(p^i)$ of peptide $p^i$ as a categorical variable with $M$ possible categories, where $M$ denotes the upper limit of the number of indexed types 1, 2, ..., $M$. Types of tissue can include, for example, lung tissue, cardiac tissue, intestinal tissue, nerve tissue, and the like. Types of cells can include dendritic cells, macrophages, CD4 T cells, and the like. Types of tumors can include lung adenocarcinoma, lung squamous cell carcinoma, melanoma, non-Hodgkin lymphoma, and the like.

The encoding module 314 may also represent the overall set of variables $z_h^i$ for a peptide $p^i$ and an associated MHC allele $h$ as a row vector in which the numerical representations of the allele-interacting variables $x_h^i$ and the allele-noninteracting variables $w^i$ are concatenated one after the other. For example, the encoding module 314 may represent $z_h^i$ as a row vector equal to $[x_h^i \; w^i]$ or $[w^i \; x_h^i]$.
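The concatenations described in this section can be sketched with plain Python lists. All values below are toy stand-ins (a 3-element vector instead of a real one-hot peptide encoding, and the 1000 nM, 1 h, and $10^2$ TPM figures from the FIG. 4 example); the helper name is illustrative.

```python
def concat(*parts):
    """Concatenate numeric row vectors (lists) one after the other."""
    row = []
    for part in parts:
        row.extend(part)
    return row


p = [0, 1, 0]     # toy stand-in for the one-hot peptide encoding p^i
b_h = [1000.0]    # binding affinity prediction for allele h, in nM
s_h = [1.0]       # stability prediction for allele h, half-life in hours
c = [1, 0]        # toy stand-in for the encoded C-terminal flanking sequence c^i
m = [100.0]       # mRNA quantification measurement, in TPM

x_h = concat(p, b_h, s_h)  # allele-interacting variables  [p^i b_h^i s_h^i]
w = concat(c, m)           # allele-noninteracting variables [c^i m^i]
z_h = concat(x_h, w)       # overall variable set [x_h^i w^i]
```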

Ⅸ. Training Module

The training module 316 constructs one or more presentation models that generate likelihoods of whether peptide sequences will be presented by the MHC alleles associated with them. Specifically, given a peptide sequence $p^k$ and a set of MHC alleles $a^k$ associated with the peptide sequence $p^k$, each presentation model generates an estimate $u_k$ indicating the likelihood that the peptide sequence $p^k$ will be presented by one or more of the associated MHC alleles $a^k$.

Ⅸ.A. Overview

The training module 316 constructs the one or more presentation models based on the training data sets stored in store 170, which are generated from the presentation information stored in 165. Generally, regardless of the specific type of presentation model, all presentation models capture the dependence between the independent variables and the dependent variables in the training data 170 such that a loss function is minimized. Specifically, the loss function $\ell(y_{i\in S}, u_{i\in S}; \theta)$ represents the discrepancy between the values of the dependent variables $y_{i\in S}$ for one or more data instances $S$ in the training data 170 and the estimated likelihoods $u_{i\in S}$ generated by the presentation model for the data instances $S$. In one particular implementation referred to throughout the remainder of the specification, the loss function $\ell(y_{i\in S}, u_{i\in S}; \theta)$ is the negative log-likelihood function given by equation (1a) as follows:

$$\ell(y_{i\in S}, u_{i\in S}; \theta) = -\sum_{i\in S}\big(y_i \log u_i + (1-y_i)\log(1-u_i)\big). \tag{1a}$$
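A minimal sketch of a negative log-likelihood loss of this kind, assuming binary presentation labels $y_i$ and estimated likelihoods $u_i$ in (0, 1). The function name and the clipping constant `eps` are illustrative assumptions, not part of the specification:

```python
import math


def negative_log_likelihood(y, u, eps=1e-12):
    """Bernoulli negative log-likelihood summed over the data instances.

    y: iterable of 0/1 labels (1 = peptide was presented).
    u: iterable of estimated presentation likelihoods.
    eps: clipping constant (an assumption) that avoids log(0).
    """
    total = 0.0
    for y_i, u_i in zip(y, u):
        u_i = min(max(u_i, eps), 1.0 - eps)
        total += y_i * math.log(u_i) + (1 - y_i) * math.log(1.0 - u_i)
    return -total


# Confident, correct estimates give a smaller loss than uninformative ones.
print(negative_log_likelihood([1, 0], [0.9, 0.1]))
print(negative_log_likelihood([1, 0], [0.5, 0.5]))
```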

In practice, however, other loss functions may be used. For example, when predictions are made for the mass spectrometry ion current, the loss function is the mean squared loss given by equation (1b) as follows:

$$\ell(y_{i\in S}, u_{i\in S}; \theta) = \sum_{i\in S}\lVert y_i - u_i\rVert_2^2. \tag{1b}$$

The presentation model may be a parametric model in which one or more parameters $\theta$ mathematically specify the dependence between the independent variables and the dependent variables. Typically, the parameter values that minimize the loss function $\ell(y_{i\in S}, u_{i\in S}; \theta)$ are determined through gradient-based numerical optimization algorithms, such as batch gradient algorithms, stochastic gradient algorithms, and the like. Alternatively, the presentation model may be a non-parametric model in which the model structure is determined from the training data 170 and is not strictly based on a fixed set of parameters.
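To illustrate gradient-based fitting of a parametric model, the sketch below runs plain batch gradient descent on a negative log-likelihood loss. The model form (a logistic function applied to a dot product), the learning rate, the step count, and the toy data are all assumptions made for the example and are not taken from the specification:

```python
import math


def expit(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_batch_gradient(X, y, lr=0.5, steps=2000):
    """Batch gradient descent for u_i = expit(x_i . theta).

    For the negative log-likelihood loss, the gradient with respect to
    theta is sum_i (u_i - y_i) * x_i.
    """
    theta = [0.0] * len(X[0])
    for _ in range(steps):
        grad = [0.0] * len(theta)
        for x_i, y_i in zip(X, y):
            u_i = expit(sum(a * b for a, b in zip(x_i, theta)))
            for j, x_ij in enumerate(x_i):
                grad[j] += (u_i - y_i) * x_ij
        # Average the gradient over the batch before stepping.
        theta = [t - lr * g / len(X) for t, g in zip(theta, grad)]
    return theta


# Toy data: first column is a constant 1 (bias), second is a single feature.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [0, 0, 1, 1]
theta = fit_batch_gradient(X, y)
print(theta)
```

A stochastic variant would update `theta` after each instance or mini-batch instead of after a full pass over the data.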

Ⅸ.B. Per-Allele Models

The training module 316 may construct the presentation models to predict presentation likelihoods of peptides on a per-allele basis. In this case, the training module 316 may train the presentation models based on data instances $S$ in the training data 170 generated from cells expressing single MHC alleles.

In one implementation, the training module 316 models the estimated presentation likelihood $u_k$ for peptide $p^k$ for a specific allele $h$ by:

$$u_k^h = \Pr(p^k \text{ presented; MHC allele } h) = f\big(g_h(x_h^k; \theta_h)\big), \tag{2}$$

where $x_h^k$ denotes the encoded allele-interacting variables for peptide $p^k$ and the corresponding MHC allele $h$, and $f(\cdot)$ is any function, referred to herein as a transformation function for convenience of description. Further, $g_h(\cdot)$ is any function, referred to herein as a dependency function for convenience of description, that generates dependency scores for the allele-interacting variables $x_h^k$ based on a set of parameters $\theta_h$ determined for MHC allele $h$. The values of the set of parameters $\theta_h$ for each MHC allele $h$ can be determined by minimizing the loss function with respect to $\theta_h$, where $i$ is each instance in the subset $S$ of training data 170 generated from cells expressing the single MHC allele $h$.

The output of the dependency function $g_h(x_h^k; \theta_h)$ represents a dependency score for MHC allele $h$ indicating whether MHC allele $h$ will present the corresponding neoantigen, based on at least the allele-interacting features $x_h^k$ and, in particular, on the positions of the amino acids in the peptide sequence of peptide $p^k$. For example, the dependency score for MHC allele $h$ may have a high value if MHC allele $h$ is likely to present peptide $p^k$, and a low value if presentation is unlikely. The transformation function $f(\cdot)$ transforms its input; more specifically, in this case it transforms the dependency score generated by $g_h(x_h^k; \theta_h)$ into an appropriate value indicating the likelihood that peptide $p^k$ will be presented by an MHC allele.

In one particular implementation referred to throughout the remainder of the specification, $f(\cdot)$ is a function having a range within [0, 1] for an appropriate domain. In one example, $f(\cdot)$ is the expit function given by:

$$f(z) = \frac{\exp(z)}{1 + \exp(z)}.$$

As another example, $f(\cdot)$ can be the hyperbolic tangent function given by the following when the values of the domain $z$ are greater than or equal to 0:

$$f(z) = \tanh(z).$$

Alternatively, when predictions are made for the mass spectrometry ion current, which takes values outside the range [0, 1], $f(\cdot)$ can be any function such as the identity function, the exponential function, the log function, and the like.
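The two example transformation functions can be sketched as follows. The numerically stable branch in `expit` and the explicit domain check in `tanh_transform` are implementation choices made for this illustration, not requirements stated in the text:

```python
import math


def expit(z):
    """Logistic function exp(z) / (1 + exp(z)), computed without overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def tanh_transform(z):
    """Hyperbolic tangent, restricted to z >= 0 so the output stays in [0, 1)."""
    if z < 0:
        raise ValueError("tanh_transform expects a non-negative dependency score")
    return math.tanh(z)


for score in (0.0, 1.0, 4.0):
    print(score, expit(score), tanh_transform(score))
```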

Thus, the per-allele likelihood that a peptide sequence $p^k$ will be presented by MHC allele $h$ can be generated by applying the dependency function $g_h(\cdot)$ for MHC allele $h$ to the encoded version of the peptide sequence $p^k$ to generate the corresponding dependency score. The dependency score may then be transformed by the transformation function $f(\cdot)$ to generate the per-allele likelihood that the peptide sequence $p^k$ will be presented by MHC allele $h$.

Ⅸ.B.1. Dependency Functions for Allele-Interacting Variables

In one particular implementation referred to throughout the specification, the dependency function $g_h(\cdot)$ is an affine function given by:

$$g_h(x_h^k; \theta_h) = x_h^k \cdot \theta_h,$$

which linearly combines each allele-interacting variable in $x_h^k$ with a corresponding parameter in the set of parameters $\theta_h$ determined for the associated MHC allele $h$.
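A minimal sketch of an affine dependency function combined with the expit transformation to give a per-allele likelihood. The feature values and parameters below are made up for illustration, and the helper names are hypothetical:

```python
import math


def dot(a, b):
    return sum(a_i * b_i for a_i, b_i in zip(a, b))


def affine_dependency(x_h, theta_h):
    """Dependency score: each variable is weighted by its own parameter."""
    return dot(x_h, theta_h)


def per_allele_likelihood(x_h, theta_h):
    """Apply the expit transformation to the affine dependency score."""
    return 1.0 / (1.0 + math.exp(-affine_dependency(x_h, theta_h)))


x_3 = [1.0, 0.0, 1.0]        # encoded allele-interacting variables (illustrative)
theta_3 = [0.8, -0.3, 0.4]   # parameters for one allele (illustrative)
print(per_allele_likelihood(x_3, theta_3))
```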

In another particular implementation referred to throughout the specification, the dependency function $g_h(\cdot)$ is a network function given by:

$$g_h(x_h^k; \theta_h) = NN_h(x_h^k; \theta_h),$$

represented by a network model $NN_h(\cdot)$ having a series of nodes arranged in one or more layers. A node may be connected to other nodes through connections, each having an associated parameter in the set of parameters $\theta_h$. The value at one particular node may be represented as the sum of the values of the nodes connected to it, weighted by the associated parameters, and mapped by an activation function associated with that node. In contrast to the affine function, network models are advantageous because the presentation model can incorporate non-linearity and process data having amino acid sequences of different lengths. Specifically, through non-linear modeling, network models can capture interactions between amino acids at different positions in a peptide sequence and how these interactions affect peptide presentation.
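The node computation described above (a weighted sum of connected node values passed through an activation function) can be sketched as a small feed-forward pass. The ReLU activation, the linear output layer, and the weights are assumptions chosen for the example and do not describe the network actually used:

```python
def relu(v):
    return max(0.0, v)


def forward(x, layers, activation=relu):
    """Feed-forward pass through fully connected layers.

    layers[l][j][i] is the parameter on the connection from node i of the
    previous layer to node j of layer l. The final layer is left linear so
    the output can serve as an unbounded dependency score.
    """
    values = list(x)
    for depth, weights in enumerate(layers):
        sums = [sum(w * v for w, v in zip(row, values)) for row in weights]
        is_last = depth == len(layers) - 1
        values = sums if is_last else [activation(s) for s in sums]
    return values


# Two inputs -> two hidden nodes -> one output node.
layers = [
    [[1.0, -1.0], [0.5, 0.5]],
    [[1.0, 1.0]],
]
print(forward([1.0, 2.0], layers))
```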

In general, the network model $NN_h(\cdot)$ may be structured as a feed-forward network, such as an artificial neural network (ANN), a convolutional neural network (CNN), or a deep neural network (DNN), and/or as a recurrent network, such as a long short-term memory (LSTM) network, a bi-directional recurrent network, a deep bi-directional recurrent network, and the like.

In one instance referred to throughout the remainder of the specification, each MHC allele $h = 1, 2, \ldots, m$ is associated with a separate network model, and $NN_h(\cdot)$ denotes the output(s) of the network model associated with MHC allele $h$.

FIG. 5 illustrates an example network model $NN_3(\cdot)$ in association with an arbitrary MHC allele $h=3$. As shown in FIG. 5, the network model $NN_3(\cdot)$ for MHC allele $h=3$ includes three input nodes at layer $l=1$, four nodes at layer $l=2$, two nodes at layer $l=3$, and one output node at layer $l=4$. The network model $NN_3(\cdot)$ is associated with a set of ten parameters $\theta_3(1), \theta_3(2), \ldots, \theta_3(10)$. The network model $NN_3(\cdot)$ receives input values (individual data instances including encoded polypeptide sequence data and any other training data used) for three allele-interacting variables $x_3^k(1)$, $x_3^k(2)$, and $x_3^k(3)$ for MHC allele $h=3$ and outputs the value $NN_3(x_3^k)$. The network function may also include one or more network models each taking different allele-interacting variables as input.

In another instance, the identified MHC alleles $h = 1, 2, \ldots, m$ are associated with a single network model $NN_H(\cdot)$, and $NN_h(\cdot)$ denotes one or more outputs of the single network model associated with MHC allele $h$. In such an instance, the set of parameters $\theta_h$ may correspond to a set of parameters for the single network model, and thus the set of parameters $\theta_h$ may be shared by all MHC alleles.

FIG. 6A illustrates an example network model $NN_H(\cdot)$ shared by MHC alleles $h = 1, 2, \ldots, m$. As shown in FIG. 6A, the network model $NN_H(\cdot)$ includes $m$ output nodes, each corresponding to an MHC allele. The network model $NN_3(\cdot)$ receives the allele-interacting variables $x_3^k$ for MHC allele $h=3$ and outputs $m$ values, including the value $NN_3(x_3^k)$ corresponding to MHC allele $h=3$.

As yet another example, the single network model $NN_H(\cdot)$ may be a network model that outputs a dependency score given the allele-interacting variables $x_h^k$ and the encoded protein sequence $d_h$ of an MHC allele $h$. In such an instance, the set of parameters $\theta_h$ may again correspond to a set of parameters for the single network model, and thus the set of parameters $\theta_h$ may be shared by all MHC alleles. Thus, in such an instance, $NN_h(\cdot)$ may denote the output of the single network model $NN_H(\cdot)$ given the inputs $[x_h^k \;\; d_h]$ to the single network model. Such a network model is advantageous because peptide presentation probabilities for MHC alleles that were not seen in the training data can be predicted from their protein sequence alone.

FIG. 6B illustrates an example network model $NN_H(\cdot)$ shared by MHC alleles. As shown in FIG. 6B, the network model $NN_H(\cdot)$ receives the allele-interacting variables and the protein sequence of MHC allele $h=3$ as input, and outputs a dependency score $NN_3(x_3^k)$ corresponding to MHC allele $h=3$.

In yet another example, the dependency function $g_h(\cdot)$ can be expressed as:

$$g_h(x_h^k; \theta_h) = g'_h(x_h^k; \theta'_h) + \theta_h^0,$$

where $g'_h(x_h^k; \theta'_h)$ is the affine function, the network function, or the like, with a set of parameters $\theta'_h$, and the bias parameter $\theta_h^0$ in the set of parameters for the allele-interacting variables of the MHC allele represents a baseline probability of presentation for MHC allele $h$.

In another implementation, the bias parameter $\theta_h^0$ may be shared according to the gene family of MHC allele $h$. That is, the bias parameter $\theta_h^0$ for MHC allele $h$ may be equal to $\theta_{\text{gene}(h)}^0$, where $\text{gene}(h)$ is the gene family of MHC allele $h$. For example, class I MHC alleles HLA-A*02:01, HLA-A*02:02, and HLA-A*02:03 may be assigned to the gene family "HLA-A", and the bias parameter $\theta_h^0$ for each of these MHC alleles may be shared. As another example, class II MHC alleles HLA-DRB1*10:01, HLA-DRB1*11:01, and HLA-DRB3*01:01 may be assigned to the gene family "HLA-DRB", and the bias parameter $\theta_h^0$ for each of these MHC alleles may be shared.
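A small sketch of sharing a bias parameter across a gene family. The regular expression used to derive the family name from an allele name, the helper names, and the bias values are assumptions for illustration; the text only states that alleles in the same family may share the parameter:

```python
import re


def gene_family(allele):
    """Map an allele name to its gene family by dropping the locus number
    and allele fields, e.g. 'HLA-DRB1*10:01' -> 'HLA-DRB'."""
    match = re.match(r"(HLA-[A-Z]+)", allele)
    if match is None:
        raise ValueError(f"unrecognized allele name: {allele}")
    return match.group(1)


def score_with_shared_bias(base_score, allele, family_bias):
    """Add the family-level bias to a dependency score computed elsewhere."""
    return base_score + family_bias[gene_family(allele)]


family_bias = {"HLA-A": -0.5, "HLA-DRB": -1.0}  # illustrative values
for allele in ("HLA-A*02:01", "HLA-A*02:03", "HLA-DRB3*01:01"):
    print(allele, gene_family(allele), score_with_shared_bias(1.0, allele, family_bias))
```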

Returning to equation (2), as an example, the likelihood that peptide $p^k$ will be presented by MHC allele $h=3$, among $m=4$ different identified MHC alleles, using the affine dependency function $g_h(\cdot)$, can be generated by:

$$u_k^3 = f(x_3^k \cdot \theta_3),$$

where $x_3^k$ are the identified allele-interacting variables for MHC allele $h=3$, and $\theta_3$ is the set of parameters determined for MHC allele $h=3$ through loss function minimization.

As another example, the likelihood that peptide $p^k$ will be presented by MHC allele $h=3$, among $m=4$ different identified MHC alleles, using separate network dependency functions $g_h(\cdot)$, can be generated by:

$$u_k^3 = f\big(NN_3(x_3^k; \theta_3)\big),$$

where $x_3^k$ are the identified allele-interacting variables for MHC allele $h=3$, and $\theta_3$ is the set of parameters determined for the network model $NN_3(\cdot)$ associated with MHC allele $h=3$.

FIG. 7 illustrates generating a presentation likelihood for peptide $p^k$ in association with MHC allele $h=3$ using an example network model $NN_3(\cdot)$. As shown in FIG. 7, the network model $NN_3(\cdot)$ receives the allele-interacting variables $x_3^k$ for MHC allele $h=3$ and generates the output $NN_3(x_3^k)$. The output is mapped by the function $f(\cdot)$ to generate the estimated presentation likelihood $u_k$.

Ⅸ.B.2. Per-Allele Models with Allele-Noninteracting Variables

In one implementation, the training module 316 incorporates allele-noninteracting variables and models the estimated presentation likelihood $u_k$ for peptide $p^k$ by:

$$u_k^h = \Pr(p^k \text{ presented}) = f\big(g_w(w^k; \theta_w) + g_h(x_h^k; \theta_h)\big), \tag{8}$$

where $w^k$ denotes the encoded allele-noninteracting variables for peptide $p^k$, and $g_w(\cdot)$ is a function of the allele-noninteracting variables $w^k$ based on a set of parameters $\theta_w$ determined for the allele-noninteracting variables. Specifically, the values of the set of parameters $\theta_h$ for each MHC allele $h$ and the set of parameters $\theta_w$ for the allele-noninteracting variables can be determined by minimizing the loss function with respect to $\theta_h$ and $\theta_w$, where $i$ is each instance in the subset $S$ of training data 170 generated from cells expressing single MHC alleles.

The output of the dependency function $g_w(w^k; \theta_w)$ represents a dependency score for the allele-noninteracting variables, indicating whether peptide $p^k$ will be presented by one or more MHC alleles based on the impact of the allele-noninteracting variables. For example, the dependency score for the allele-noninteracting variables may have a high value if peptide $p^k$ is associated with a C-terminal flanking sequence that is known to positively affect presentation of peptide $p^k$, and a low value if peptide $p^k$ is associated with a C-terminal flanking sequence that is known to negatively affect presentation of peptide $p^k$.

According to equation (8), the per-allele likelihood that a peptide sequence $p^k$ will be presented by MHC allele $h$ can be generated by applying the function $g_h(\cdot)$ for MHC allele $h$ to the encoded version of the peptide sequence $p^k$ to generate the corresponding dependency score for the allele-interacting variables. The function $g_w(\cdot)$ for the allele-noninteracting variables is likewise applied to the encoded version of the allele-noninteracting variables to generate the dependency score for the allele-noninteracting variables. The two scores are combined, and the combined score is transformed by the transformation function $f(\cdot)$ to generate the per-allele likelihood that the peptide sequence $p^k$ will be presented by MHC allele $h$.
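The combination step can be sketched as below, assuming affine forms for both dependency functions and the expit transformation. The helper names and numbers are illustrative only:

```python
import math


def dot(a, b):
    return sum(a_i * b_i for a_i, b_i in zip(a, b))


def combined_likelihood(x_h, theta_h, w, theta_w):
    """Sum the allele-interacting and allele-noninteracting dependency
    scores, then map the combined score through expit."""
    score_interacting = dot(x_h, theta_h)
    score_noninteracting = dot(w, theta_w)
    combined = score_noninteracting + score_interacting
    return 1.0 / (1.0 + math.exp(-combined))


x_h = [1.0, 0.0]          # encoded allele-interacting variables (illustrative)
w = [0.5]                 # encoded allele-noninteracting variable (illustrative)
print(combined_likelihood(x_h, [0.4, 0.1], w, [2.0]))
```

Because the two scores are added before the transformation, a favorable allele-noninteracting context raises the likelihood for every allele by the same amount on the score scale.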

Alternatively, the training module 316 may include the allele-noninteracting variables $w^k$ in the prediction by appending them to the allele-interacting variables $x_h^k$ in equation (2). The presentation likelihood can then be given by:

$$u_k^h = \Pr(p^k \text{ presented; MHC allele } h) = f\big(g_h([x_h^k \;\; w^k]; \theta_h)\big).$$

Ⅸ.B.3. Dependency Functions for Allele-Noninteracting Variables

Similarly to the dependency function $g_h(\cdot)$ for allele-interacting variables, the dependency function $g_w(\cdot)$ for allele-noninteracting variables may be an affine function, or a network function in which a separate network model is associated with the allele-noninteracting variables $w^k$.

Specifically, the dependency function $g_w(\cdot)$ may be an affine function given by:

$$g_w(w^k; \theta_w) = w^k \cdot \theta_w,$$

which linearly combines the allele-noninteracting variables in $w^k$ with the corresponding parameters in the set of parameters $\theta_w$.

The dependency function $g_w(\cdot)$ may also be a network function given by:

$$g_w(w^k; \theta_w) = NN_w(w^k; \theta_w),$$

represented by a network model $NN_w(\cdot)$ having associated parameters in the set of parameters $\theta_w$. The network function may also include one or more network models each taking different allele-noninteracting variables as input.

In another example, the dependency function $g_w(\cdot)$ for the allele-noninteracting variables can be given by:

$$g_w(w^k; \theta_w) = g'_w(w^k; \theta'_w) + h(m^k)\cdot\theta_w^m,$$

where $g'_w(w^k; \theta'_w)$ is the affine function, the network function, or the like, with the set of allele-noninteracting parameters $\theta'_w$; $m^k$ is the mRNA quantification measurement for peptide $p^k$; $h(\cdot)$ is a function transforming the quantification measurement; and $\theta_w^m$ is a parameter in the set of parameters for allele-noninteracting variables that is combined with the mRNA quantification measurement to generate a dependency score for that measurement. In one particular implementation referred to throughout the remainder of the specification, $h(\cdot)$ is the log function; in practice, however, $h(\cdot)$ may be any of a variety of different functions.
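A minimal sketch of adding a log-transformed mRNA term to an allele-noninteracting dependency score. The pseudocount (which keeps the logarithm defined at zero expression), the units, and the parameter value are assumptions added for the example and are not specified in the text:

```python
import math


def noninteracting_score_with_mrna(base_score, mrna_level, theta_m, pseudocount=1.0):
    """base_score stands in for the affine or network term g'_w(w; theta'_w);
    the second term is h(m) * theta_m with h taken to be a logarithm."""
    return base_score + theta_m * math.log(mrna_level + pseudocount)


# Higher expression raises the score when theta_m is positive.
for level in (0.0, 10.0, 100.0):
    print(level, noninteracting_score_with_mrna(0.2, level, theta_m=0.5))
```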

In yet another example, the dependency function $g_w(\cdot)$ for the allele-noninteracting variables can be given by:

$$g_w(w^k; \theta_w) = g'_w(w^k; \theta'_w) + \theta_w^o \cdot o^k,$$

where $g'_w(w^k; \theta'_w)$ is the affine function, the network function, or the like, with the set of allele-noninteracting parameters $\theta'_w$; $o^k$ is the indicator vector described in Section VII.C.2 representing the proteins and isoforms in the human proteome for peptide $p^k$; and $\theta_w^o$ is a set of parameters in the set of parameters for allele-noninteracting variables that is combined with the indicator vector. In one variation, when the dimensionality of $o^k$ and of the set of parameters $\theta_w^o$ is very high, a parameter regularization term such as $\lambda\cdot\lVert\theta_w^o\rVert$, where $\lVert\cdot\rVert$ represents the L1 norm, the L2 norm, a combination, or the like, can be added to the loss function when determining the values of the parameters. The optimal value of the hyperparameter $\lambda$ can be determined through appropriate methods.

In yet another example, the dependency function $g_w(\cdot)$ for the allele-noninteracting variables can be given by:

$$g_w(w^k; \theta_w) = g'_w(w^k; \theta'_w) + \sum_{l=1}^{L} \mathbb{1}\big(\text{gene}(p^k) = l\big)\cdot\theta_w^l,$$

where $g'_w(w^k; \theta'_w)$ is the affine function, the network function, or the like, with the set of allele-noninteracting parameters $\theta'_w$; $\mathbb{1}(\text{gene}(p^k) = l)$ is the indicator function that equals 1 if peptide $p^k$ is derived from source gene $l$, as described above in reference to allele-noninteracting variables; and $\theta_w^l$ is a parameter indicating the "antigenicity" of source gene $l$. In one variation, when $L$ is very high, and thus the number of parameters $\theta_w^{l=1,\ldots,L}$ is very high, a parameter regularization term such as $\lambda\cdot\lVert\theta_w^l\rVert$, where $\lVert\cdot\rVert$ represents the L1 norm, the L2 norm, a combination, or the like, can be added to the loss function when determining the values of the parameters. The optimal value of the hyperparameter $\lambda$ can be determined through appropriate methods.
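As a sketch of the source-gene term and its regularization: because exactly one indicator is non-zero, the sum over genes reduces to a lookup. The gene names, the default of 0.0 for unseen genes, and the helper names are assumptions for illustration:

```python
import math


def gene_antigenicity_term(source_gene, theta_gene):
    """Sum over l of 1(gene(p) = l) * theta_l, which is simply the
    parameter of the peptide's own source gene."""
    return theta_gene.get(source_gene, 0.0)


def regularization_penalty(theta_gene, lam, norm="l2"):
    """lambda * ||theta|| using either the L1 or the L2 norm."""
    values = list(theta_gene.values())
    if norm == "l1":
        return lam * sum(abs(v) for v in values)
    if norm == "l2":
        return lam * math.sqrt(sum(v * v for v in values))
    raise ValueError("norm must be 'l1' or 'l2'")


theta_gene = {"GENE_A": 3.0, "GENE_B": -4.0}  # hypothetical per-gene parameters
print(gene_antigenicity_term("GENE_A", theta_gene))
print(regularization_penalty(theta_gene, lam=0.1, norm="l1"))
print(regularization_penalty(theta_gene, lam=0.1, norm="l2"))
```

The penalty would be added to the training loss, so that with many genes and limited data most per-gene parameters are pulled toward zero.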

In yet another example, the dependency function $g_w(\cdot)$ for the allele-noninteracting variables can be given by:

$$g_w(w^k; \theta_w) = g'_w(w^k; \theta'_w) + \sum_{m=1}^{M}\sum_{l=1}^{L} \mathbb{1}\big(\text{gene}(p^k) = l,\ \text{tissue}(p^k) = m\big)\cdot\theta_w^{lm},$$

where $g'_w(w^k; \theta'_w)$ is the affine function, the network function, or the like, with the set of allele-noninteracting parameters $\theta'_w$; $\mathbb{1}(\text{gene}(p^k) = l,\ \text{tissue}(p^k) = m)$ is the indicator function that equals 1 if peptide $p^k$ is derived from source gene $l$ and from tissue type $m$, as described above in reference to allele-noninteracting variables; and $\theta_w^{lm}$ is a parameter indicating the antigenicity of the combination of source gene $l$ and tissue type $m$. Specifically, the antigenicity of gene $l$ for tissue type $m$ may denote the residual propensity of cells of tissue type $m$ to present peptides from gene $l$ after controlling for RNA expression and peptide sequence context.

In one variation, when $L$ or $M$ is very high, and thus the number of parameters $\theta_w^{lm}$ is very high, a parameter regularization term such as $\lambda\cdot\lVert\theta_w^{lm}\rVert$, where $\lVert\cdot\rVert$ represents the L1 norm, the L2 norm, a combination, or the like, can be added to the loss function when determining the values of the parameters. The optimal value of the hyperparameter $\lambda$ can be determined through appropriate methods. In another variation, a parameter regularization term can be added to the loss function when determining the values of the parameters such that the coefficients for the same source gene do not differ significantly between tissue types. For example, a penalization term such as the following may penalize the standard deviation of antigenicity across different tissue types in the loss function:

$$\lambda\cdot\sum_{l=1}^{L}\sqrt{\sum_{m=1}^{M}\big(\theta_w^{lm} - \overline{\theta_w^{l}}\big)^2},$$

where $\overline{\theta_w^{l}}$ is the average antigenicity across tissue types for source gene $l$.
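A sketch of a penalty on the spread of per-tissue antigenicity for each source gene. The exact form used here (the square root of the summed squared deviations from the per-gene mean, summed over genes and scaled by λ) is an assumption consistent with the description; it is proportional to the per-gene standard deviation when every gene has the same number of tissue types:

```python
import math


def tissue_spread_penalty(theta, lam):
    """theta maps each source gene to its list of per-tissue antigenicity
    parameters; genes whose parameters vary across tissues are penalized."""
    total = 0.0
    for per_tissue in theta.values():
        mean = sum(per_tissue) / len(per_tissue)
        total += math.sqrt(sum((t - mean) ** 2 for t in per_tissue))
    return lam * total


theta = {
    "GENE_A": [1.0, 1.0, 1.0],   # identical across tissues: no penalty
    "GENE_B": [0.0, 2.0],        # differs across tissues: penalized
}
print(tissue_spread_penalty(theta, lam=1.0))
```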

In practice, the additional terms of any of equations (10), (11), (12a), and (12b) may be combined to generate the dependency function $g_w(\cdot)$ for allele-noninteracting variables. For example, the term $h(\cdot)$ indicating the mRNA quantification measurement in equation (10) and the term indicating source gene antigenicity in equation (12) may be summed together, along with any other affine or network function, to generate the dependency function for allele-noninteracting variables.

Taking equation (8) as an example, the likelihood that peptide $p^k$ will be presented by MHC allele $h=3$, among $m=4$ different identified MHC alleles, using the affine dependency functions $g_h(\cdot)$ and $g_w(\cdot)$, can be generated by:

$$u_k^3 = f(w^k\cdot\theta_w + x_3^k\cdot\theta_3),$$

where $w^k$ are the identified allele-noninteracting variables for peptide $p^k$, and $\theta_w$ is the set of parameters determined for the allele-noninteracting variables.

As another example, the likelihood that peptide $p^k$ will be presented by MHC allele $h=3$, among $m=4$ different identified MHC alleles, using the network dependency functions $g_h(\cdot)$ and $g_w(\cdot)$, can be generated by:

$$u_k^3 = f\big(NN_w(w^k; \theta_w) + NN_3(x_3^k; \theta_3)\big),$$

where $w^k$ are the identified allele-noninteracting variables for peptide $p^k$, and $\theta_w$ is the set of parameters determined for the allele-noninteracting variables.

FIG. 8 illustrates generating a presentation likelihood for peptide $p^k$ in association with MHC allele $h=3$ using example network models $NN_3(\cdot)$ and $NN_w(\cdot)$. As shown in FIG. 8, the network model $NN_3(\cdot)$ receives the allele-interacting variables $x_3^k$ for MHC allele $h=3$ and generates the output $NN_3(x_3^k)$. The network model $NN_w(\cdot)$ receives the allele-noninteracting variables $w^k$ for peptide $p^k$ and generates the output $NN_w(w^k)$. The outputs are combined and mapped by the function $f(\cdot)$ to generate the estimated presentation likelihood $u_k$.

Ⅸ.C. Multiple-Allele Models

The training module 316 may also construct the presentation models to predict presentation likelihoods of peptides in a multiple-allele setting where two or more MHC alleles are present. In this case, the training module 316 may train the presentation models based on data instances $S$ in the training data 170 generated from cells expressing single MHC alleles, cells expressing multiple MHC alleles, or a combination thereof.

Ⅸ.C.1. Example 1: Maximum of Per-Allele Models

In one implementation, the training module 316 models the estimated presentation likelihood $u_k$ for peptide $p^k$ in association with a set of multiple MHC alleles $H$ as a function of the presentation likelihoods $u_k^{h \in H}$ determined for each MHC allele $h$ in the set $H$ based on cells expressing a single allele, as described above in conjunction with equations (2)-(11). Specifically, the presentation likelihood $u_k$ can be any function of $u_k^{h \in H}$. In one implementation, as shown in equation (12), the function is the maximum function, and the presentation likelihood $u_k$ is determined as the maximum of the presentation likelihoods over the MHC alleles $h$ in the set $H$:

$$u_k = \max\left(u_k^{h \in H}\right) \qquad (12)$$
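The maximum rule can be illustrated with a minimal Python sketch. The function name `max_of_per_allele`, the allele labels, and the numeric likelihoods are hypothetical and chosen only for illustration; the per-allele likelihoods are assumed to have been computed already by single-allele models.

```python
def max_of_per_allele(per_allele):
    """Combine per-allele likelihoods u_k^h (h in H) with the maximum function."""
    if not per_allele:
        raise ValueError("the allele set H must not be empty")
    return max(per_allele.values())


# Hypothetical per-allele likelihoods for one peptide; values are illustrative only.
per_allele = {"allele_1": 0.12, "allele_2": 0.47, "allele_3": 0.05}
u_k = max_of_per_allele(per_allele)
print(u_k)
```

Under this rule the multi-allele likelihood is driven entirely by the single allele most likely to present the peptide.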

IX.C.2. Example 2.1: Function-of-Sums Models

In one implementation, the training module 316 models the estimated presentation likelihood $u_k$ for peptide $p^k$ by:

$$u_k = f\left(\sum_{h=1}^{m} a_h^k \cdot g_h(x_h^k; \theta_h)\right) \qquad (13)$$

where the elements $a_h^k$ are 1 for the multiple MHC alleles $H$ associated with peptide sequence $p^k$, and $x_h^k$ denotes the encoded allele-interacting variables for peptide $p^k$ and the corresponding MHC allele. The values for the set of parameters $\theta_h$ for each MHC allele $h$ can be determined by minimizing the loss function with respect to $\theta_h$, where $i$ is each instance in the subset $S$ of training data 170 generated from cells expressing a single MHC allele and/or cells expressing multiple MHC alleles. The dependency function $g_h$ may take the form of any of the dependency functions $g_h$ introduced above in Section Ⅷ.B.1.

According to equation (13), the presentation likelihood that peptide sequence $p^k$ will be presented by one or more MHC alleles $h$ can be generated by applying the dependency function $g_h(\cdot)$ to the encoded version of the peptide sequence $p^k$ for each of the MHC alleles $H$, which yields a corresponding score for the allele-interacting variables. The scores for each MHC allele $h$ are combined and transformed by the transformation function $f(\cdot)$ to generate the presentation likelihood that peptide sequence $p^k$ will be presented by the set of MHC alleles $H$.
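The score-then-transform procedure can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: the logistic (sigmoid) choice for $f(\cdot)$, the dot-product form of the affine dependency function, and all numeric values are assumptions made for the example.

```python
import math


def sigmoid(z):
    """Assumed transformation function f; maps any real score into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def affine_score(x, theta):
    """Assumed affine dependency function g_h: dot product of x_h^k and theta_h."""
    return sum(x_i * t_i for x_i, t_i in zip(x, theta))


def function_of_sums(a, x, theta, f=sigmoid, g=affine_score):
    """Sum the scores of the alleles flagged in a, then apply f once."""
    total = sum(a_h * g(x_h, th_h) for a_h, x_h, th_h in zip(a, x, theta))
    return f(total)


# m = 4 identified alleles; the peptide is associated with h = 2 and h = 3.
a = [0, 1, 1, 0]
x = [[1.0, 0.0], [0.5, 1.0], [0.0, 2.0], [1.0, 1.0]]          # hypothetical encodings
theta = [[0.3, -0.2], [0.4, -1.0], [0.1, 0.25], [-0.5, 0.5]]  # hypothetical parameters
u_k = function_of_sums(a, x, theta)
print(round(u_k, 4))
```

Because the indicator $a_h^k$ is zero for alleles 1 and 4, their parameters do not contribute to the likelihood for this peptide.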

The presentation model of equation (13) differs from the per-allele model of equation (2) in that the number of associated alleles for each peptide $p^k$ can be greater than 1. In other words, more than one element of $a_h^k$ can have a value of 1 for the multiple MHC alleles $H$ associated with peptide sequence $p^k$.

As an example, the likelihood that peptide $p^k$ will be presented by MHC alleles $h=2$, $h=3$, among $m=4$ different identified MHC alleles, using the affine transformation functions $g_h(\cdot)$, can be generated by:

$$u_k = f\left(x_2^k \cdot \theta_2 + x_3^k \cdot \theta_3\right)$$

where $x_2^k$, $x_3^k$ are the identified allele-interacting variables for MHC alleles $h=2$, $h=3$, and $\theta_2$, $\theta_3$ are the sets of parameters determined for MHC alleles $h=2$, $h=3$.

As another example, the likelihood that peptide $p^k$ will be presented by MHC alleles $h=2$, $h=3$, among $m=4$ different identified MHC alleles, using the network transformation functions $g_h(\cdot)$, can be generated by:

$$u_k = f\left(NN_2(x_2^k; \theta_2) + NN_3(x_3^k; \theta_3)\right)$$

where $NN_2(\cdot)$, $NN_3(\cdot)$ are the identified network models for MHC alleles $h=2$, $h=3$, and $\theta_2$, $\theta_3$ are the sets of parameters determined for MHC alleles $h=2$, $h=3$.

FIG. 9 illustrates generating a presentation likelihood for peptide $p^k$ in association with MHC alleles $h=2$, $h=3$ using example network models $NN_2(\cdot)$ and $NN_3(\cdot)$. As shown in FIG. 9, the network model $NN_2(\cdot)$ receives the allele-interacting variables $x_2^k$ for MHC allele $h=2$ and generates the output $NN_2(x_2^k)$, and the network model $NN_3(\cdot)$ receives the allele-interacting variables $x_3^k$ for MHC allele $h=3$ and generates the output $NN_3(x_3^k)$. The outputs are combined and mapped by function $f(\cdot)$ to generate the estimated presentation likelihood $u_k$.

IX.C.3. Example 2.2: Function-of-Sums Models with Allele-Noninteracting Variables

The estimated presentation likelihood $u_k$ for peptide $p^k$ is modeled by:

$$u_k = f\left(g_w(w^k; \theta_w) + \sum_{h=1}^{m} a_h^k \cdot g_h(x_h^k; \theta_h)\right) \qquad (14)$$

where $w^k$ denotes the encoded allele-noninteracting variables for peptide $p^k$. Specifically, the values for the set of parameters $\theta_h$ for each MHC allele $h$ and the set of parameters $\theta_w$ for the allele-noninteracting variables can be determined by minimizing the loss function with respect to $\theta_h$ and $\theta_w$, where $i$ is each instance in the subset $S$ of training data 170 generated from cells expressing a single MHC allele and/or cells expressing multiple MHC alleles. The dependency function $g_w$ may take the form of any of the dependency functions $g_w$ introduced above in Section Ⅷ.B.3.

Thus, according to equation (14), the presentation likelihood that peptide sequence $p^k$ will be presented by one or more MHC alleles $H$ can be generated by applying the function $g_h(\cdot)$ to the encoded version of the peptide sequence $p^k$ for each of the MHC alleles $H$, which yields a corresponding dependency score for the allele-interacting variables of each MHC allele $h$. The function $g_w(\cdot)$ is likewise applied to the encoded version of the allele-noninteracting variables to generate a dependency score for the allele-noninteracting variables. The scores are combined, and the combined score is transformed by the transformation function $f(\cdot)$ to generate the presentation likelihood that peptide sequence $p^k$ will be presented by the MHC alleles $H$.

In the presentation model of equation (14), the number of associated alleles for each peptide $p^k$ can be greater than 1. In other words, more than one element of $a_h^k$ can have a value of 1 for the multiple MHC alleles $H$ associated with peptide sequence $p^k$.

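As an example, the likelihood that peptide $p^k$ will be presented by MHC alleles $h=2$, $h=3$, among $m=4$ different identified MHC alleles, using the affine transformation functions $g_h(\cdot)$, $g_w(\cdot)$, can be generated by:

$$u_k = f\left(w^k \cdot \theta_w + x_2^k \cdot \theta_2 + x_3^k \cdot \theta_3\right)$$

where $w^k$ are the identified allele-noninteracting variables for peptide $p^k$, and $\theta_w$ is the set of parameters determined for the allele-noninteracting variables.

As another example, using the network transformation functions $g_h(\cdot)$, $g_w(\cdot)$, the same likelihood can be generated by:

$$u_k = f\left(NN_w(w^k; \theta_w) + NN_2(x_2^k; \theta_2) + NN_3(x_3^k; \theta_3)\right)$$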

FIG. 10 illustrates generating a presentation likelihood for peptide $p^k$ in association with MHC alleles $h=2$, $h=3$ using example network models $NN_2(\cdot)$, $NN_3(\cdot)$, and $NN_w(\cdot)$. As shown in FIG. 10, the network model $NN_2(\cdot)$ receives the allele-interacting variables $x_2^k$ for MHC allele $h=2$ and generates the output $NN_2(x_2^k)$. The network model $NN_3(\cdot)$ receives the allele-interacting variables $x_3^k$ for MHC allele $h=3$ and generates the output $NN_3(x_3^k)$. The network model $NN_w(\cdot)$ receives the allele-noninteracting variables $w^k$ for peptide $p^k$ and generates the output $NN_w(w^k)$. The outputs are combined and mapped by function $f(\cdot)$ to generate the estimated presentation likelihood $u_k$.

Alternatively, the training module 316 may include the allele-noninteracting variables $w^k$ in the prediction by adding the allele-noninteracting variables $w^k$ to the allele-interacting variables $x_h^k$ in equation (15). Thus, the presentation likelihood can be given by:

$$u_k = f\left(\sum_{h=1}^{m} a_h^k \cdot g_h([x_h^k\ w^k]; \theta_h)\right) \qquad (16)$$

IX.C.4. Example 3.1: Models Using Implicit Per-Allele Likelihoods

In another implementation, the training module 316 models the estimated presentation likelihood $u_k$ for peptide $p^k$ by:

$$u_k = r\left(s\left(v = [a_1^k \cdot u'^{1}_{k}(\theta)\ \cdots\ a_m^k \cdot u'^{m}_{k}(\theta)]\right)\right) \qquad (17)$$

where the elements $a_h^k$ are 1 for the multiple MHC alleles $h \in H$ associated with peptide sequence $p^k$, $u'^{h}_{k}$ is an implicit per-allele presentation likelihood for MHC allele $h$, $v$ is a vector whose element $v_h$ corresponds to $a_h^k \cdot u'^{h}_{k}$, $s(\cdot)$ is a function mapping the elements of $v$, and $r(\cdot)$ is a clipping function that clips the input value into a given range. As described in more detail below, $s(\cdot)$ may be the summation function or the second-order function, but in other embodiments $s(\cdot)$ can be any function, such as the maximum function. The values for the set of parameters $\theta$ for the implicit per-allele likelihoods can be determined by minimizing the loss function with respect to $\theta$, where $i$ is each instance in the subset $S$ of training data 170 generated from cells expressing a single MHC allele and/or cells expressing multiple MHC alleles.

The presentation likelihood in the presentation model of equation (17) is modeled as a function of implicit per-allele presentation likelihoods $u'^{h}_{k}$, each of which corresponds to the likelihood that peptide $p^k$ will be presented by an individual MHC allele $h$. The implicit per-allele likelihood is distinct from the per-allele presentation likelihood of Section Ⅷ.B in that the parameters for implicit per-allele likelihoods can be learned from multi-allele settings, in which the direct association between a presented peptide and the corresponding MHC allele is unknown, in addition to single-allele settings. Thus, in a multi-allele setting, the presentation model can estimate not only whether peptide $p^k$ will be presented by the set of MHC alleles $H$ as a whole, but can also provide individual likelihoods $u'^{h \in H}_{k}$ that indicate which MHC allele $h$ most likely presented peptide $p^k$. An advantage of this is that the presentation model can generate the implicit likelihoods without training data for cells expressing a single MHC allele.

In one particular implementation referred to throughout the remainder of the specification, $r(\cdot)$ is a function having the range [0, 1]. For example, $r(\cdot)$ may be the clip function:

$$r(z) = \min(\max(z, 0), 1)$$

where the minimum value between $z$ and 1 is chosen as the presentation likelihood $u_k$. In another implementation, $r(\cdot)$ is the hyperbolic tangent function given by:

$$r(z) = \tanh(z)$$

when the values of the domain $z$ are equal to or greater than 0.
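Both choices of $r(\cdot)$ keep the output within [0, 1]. The short Python sketch below illustrates them; the function names `clip01` and `tanh_nonneg` are hypothetical, and the lower bound of 0 in the clip is an assumption consistent with the stated range.

```python
import math


def clip01(z):
    """Clip function: values above 1 become 1, values below 0 become 0."""
    return min(max(z, 0.0), 1.0)


def tanh_nonneg(z):
    """Hyperbolic tangent restricted to z >= 0, so the output lies in [0, 1)."""
    if z < 0:
        raise ValueError("z must be equal to or greater than 0")
    return math.tanh(z)


for z in (0.0, 0.4, 1.3):
    print(z, clip01(z), tanh_nonneg(z))
```

The clip function is exact below 1 and saturates abruptly, whereas the hyperbolic tangent compresses all non-negative inputs smoothly and never reaches 1.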

IX.C.5. Example 3.2: Sum-of-Functions Model

In one particular implementation, $s(\cdot)$ is a summation function, and the presentation likelihood is given by summing the implicit per-allele presentation likelihoods:

$$u_k = r\left(\sum_{h=1}^{m} a_h^k \cdot u'^{h}_{k}(\theta)\right)$$

In one implementation, the implicit per-allele presentation likelihood for MHC allele $h$ is generated by:

$$u'^{h}_{k} = f\left(g_h(x_h^k; \theta_h)\right) \qquad (18)$$

such that the presentation likelihood is estimated by:

$$u_k = r\left(\sum_{h=1}^{m} a_h^k \cdot f\left(g_h(x_h^k; \theta_h)\right)\right) \qquad (19)$$

According to equation (19), the presentation likelihood that peptide sequence $p^k$ will be presented by one or more MHC alleles $H$ can be generated by applying the function $g_h(\cdot)$ to the encoded version of the peptide sequence $p^k$ for each of the MHC alleles $H$, which yields a corresponding dependency score for the allele-interacting variables. Each dependency score is first transformed by the function $f(\cdot)$ to generate the implicit per-allele presentation likelihoods $u'^{h}_{k}$. The per-allele likelihoods $u'^{h}_{k}$ are combined, and the clipping function may be applied to the combined likelihoods to clip the value into the range [0, 1] and generate the presentation likelihood that peptide sequence $p^k$ will be presented by the set of MHC alleles $H$. The dependency function $g_h$ may take the form of any of the dependency functions $g_h$ introduced above in Section Ⅷ.B.1.
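The order of operations described here (transform each score first, then sum, then clip) can be sketched in Python. This is an illustrative sketch only: the sigmoid choice for $f(\cdot)$ is an assumption, and the dependency scores are hypothetical numbers standing in for the outputs of $g_h(\cdot)$.

```python
import math


def sigmoid(z):
    """Assumed transformation function f."""
    return 1.0 / (1.0 + math.exp(-z))


def sum_of_functions(a, scores, f=sigmoid):
    """Return (u_k, implicit), where implicit[h] = f(score_h) is the implicit
    per-allele likelihood and u_k is the clipped sum over the flagged alleles."""
    implicit = [f(s) for s in scores]
    total = sum(a_h * u_h for a_h, u_h in zip(a, implicit))
    return min(max(total, 0.0), 1.0), implicit


a = [0, 1, 1, 0]                      # peptide associated with alleles h = 2, 3
scores = [-3.0, -2.0, -1.0, 0.0]      # hypothetical dependency scores g_h(x_h^k)
u_k, implicit = sum_of_functions(a, scores)
print(round(u_k, 4), [round(u, 4) for u in implicit])

# With larger scores the raw sum exceeds 1 and the clip takes effect.
u_clipped, _ = sum_of_functions(a, [0.0, 1.0, 2.0, -1.0])
print(u_clipped)
```

Unlike the function-of-sums model, each allele here yields its own likelihood before combination, which is what allows a per-allele attribution even when training data come from multi-allele cells.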

As an example, the likelihood that peptide $p^k$ will be presented by MHC alleles $h=2$, $h=3$, among $m=4$ different identified MHC alleles, using the affine transformation functions $g_h(\cdot)$, can be generated by:

$$u_k = r\left(f(x_2^k \cdot \theta_2) + f(x_3^k \cdot \theta_3)\right)$$

where $x_2^k$, $x_3^k$ are the identified allele-interacting variables for MHC alleles $h=2$, $h=3$, and $\theta_2$, $\theta_3$ are the sets of parameters determined for MHC alleles $h=2$, $h=3$.

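As another example, using the network transformation functions $g_h(\cdot)$, the same likelihood can be generated by:

$$u_k = r\left(f(NN_2(x_2^k; \theta_2)) + f(NN_3(x_3^k; \theta_3))\right)$$

where $NN_2(\cdot)$, $NN_3(\cdot)$ are the identified network models for MHC alleles $h=2$, $h=3$, and $\theta_2$, $\theta_3$ are the sets of parameters determined for MHC alleles $h=2$, $h=3$.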

FIG. 11 illustrates generating a presentation likelihood for peptide $p^k$ in association with MHC alleles $h=2$, $h=3$ using example network models $NN_2(\cdot)$ and $NN_3(\cdot)$. As shown in FIG. 11, the network model $NN_2(\cdot)$ receives the allele-interacting variables $x_2^k$ for MHC allele $h=2$ and generates the output $NN_2(x_2^k)$, and the network model $NN_3(\cdot)$ receives the allele-interacting variables $x_3^k$ for MHC allele $h=3$ and generates the output $NN_3(x_3^k)$. Each output is mapped by function $f(\cdot)$, and the mapped outputs are combined to generate the estimated presentation likelihood $u_k$.

In another implementation, when the predictions are made for the log of mass spectrometry ion currents, $r(\cdot)$ is the log function and $f(\cdot)$ is the exponential function.

IX.C.6. Example 3.3: Sum-of-Functions Models with Allele-Noninteracting Variables

The presentation likelihood is generated by:

$$u_k = r\left(\sum_{h=1}^{m} a_h^k \cdot f\left(g_w(w^k; \theta_w) + g_h(x_h^k; \theta_h)\right)\right) \qquad (21)$$

to incorporate the impact of allele-noninteracting variables on peptide presentation.

According to equation (21), the presentation likelihood that peptide sequence $p^k$ will be presented by one or more MHC alleles $H$ can be generated by applying the function $g_h(\cdot)$ to the encoded version of the peptide sequence $p^k$ for each of the MHC alleles $H$, which yields a corresponding dependency score for the allele-interacting variables of each MHC allele $h$. The function $g_w(\cdot)$ is likewise applied to the encoded version of the allele-noninteracting variables to generate a dependency score for the allele-noninteracting variables. The score for the allele-noninteracting variables is combined with each of the dependency scores for the allele-interacting variables. Each combined score is transformed by the function $f(\cdot)$ to generate the implicit per-allele presentation likelihoods. The implicit likelihoods are combined, and the clipping function may be applied to the combined output to clip the value into the range [0, 1] and generate the presentation likelihood that peptide sequence $p^k$ will be presented by the MHC alleles $H$. The dependency function $g_w$ may take the form of any of the dependency functions $g_w$ introduced above in Section Ⅷ.B.3.

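As an example, the likelihood that peptide $p^k$ will be presented by MHC alleles $h=2$, $h=3$, among $m=4$ different identified MHC alleles, using the affine transformation functions $g_h(\cdot)$, $g_w(\cdot)$, can be generated by:

$$u_k = r\left(f(w^k \cdot \theta_w + x_2^k \cdot \theta_2) + f(w^k \cdot \theta_w + x_3^k \cdot \theta_3)\right)$$

As another example, using the network transformation functions $g_h(\cdot)$, $g_w(\cdot)$, the same likelihood can be generated by:

$$u_k = r\left(f(NN_w(w^k; \theta_w) + NN_2(x_2^k; \theta_2)) + f(NN_w(w^k; \theta_w) + NN_3(x_3^k; \theta_3))\right)$$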

FIG. 12 illustrates generating a presentation likelihood for peptide $p^k$ in association with MHC alleles $h=2$, $h=3$ using example network models $NN_2(\cdot)$, $NN_3(\cdot)$, and $NN_w(\cdot)$. As shown in FIG. 12, the network model $NN_2(\cdot)$ receives the allele-interacting variables $x_2^k$ for MHC allele $h=2$ and generates the output $NN_2(x_2^k)$. The network model $NN_w(\cdot)$ receives the allele-noninteracting variables $w^k$ for peptide $p^k$ and generates the output $NN_w(w^k)$. These outputs are combined and mapped by function $f(\cdot)$. The network model $NN_3(\cdot)$ receives the allele-interacting variables $x_3^k$ for MHC allele $h=3$ and generates the output $NN_3(x_3^k)$, which is again combined with the output $NN_w(w^k)$ of the same network model $NN_w(\cdot)$ and mapped by function $f(\cdot)$. The two mapped outputs are combined to generate the estimated presentation likelihood $u_k$.

In another implementation, the implicit per-allele presentation likelihood for MHC allele $h$ is generated by:

$$u'^{h}_{k} = f\left(g_h([x_h^k\ w^k]; \theta_h)\right) \qquad (22)$$

IX.C.7. Example 4: Second-Order Models

In one implementation, $s(\cdot)$ is a second-order function, and the estimated presentation likelihood $u_k$ for peptide $p^k$ is given by:

$$u_k = \sum_{h=1}^{m} a_h^k \cdot u'^{h}_{k}(\theta) - \sum_{h=1}^{m} \sum_{j<h} a_h^k \cdot a_j^k \cdot u'^{h}_{k}(\theta) \cdot u'^{j}_{k}(\theta) \qquad (23)$$

where the elements $u'^{h}_{k}$ are the implicit per-allele presentation likelihoods for MHC allele $h$. The values for the set of parameters $\theta$ for the implicit per-allele likelihoods can be determined by minimizing the loss function with respect to $\theta$, where $i$ is each instance in the subset $S$ of training data 170 generated from cells expressing a single MHC allele and/or cells expressing multiple MHC alleles. The implicit per-allele presentation likelihoods may take any of the forms shown in equations (18), (20), and (22) described above.

In one aspect, the model of equation (23) may imply that there is a possibility that peptide $p^k$ will be presented by two MHC alleles simultaneously, where presentation by the two HLA alleles is statistically independent.

According to equation (23), the presentation likelihood that peptide sequence $p^k$ will be presented by one or more MHC alleles $H$ can be generated by summing the implicit per-allele presentation likelihoods and subtracting from that sum the likelihood that each pair of MHC alleles will simultaneously present peptide $p^k$.
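The sum-minus-pairwise-products rule can be sketched in Python as follows. The function name `second_order` and the numeric likelihoods are hypothetical. For exactly two associated alleles, the result coincides with the probability that at least one of two independent events occurs, $1 - (1 - u'^{2}_{k})(1 - u'^{3}_{k})$; for three or more alleles it corresponds to the inclusion-exclusion expansion truncated after the pairwise terms.

```python
from itertools import combinations


def second_order(a, implicit):
    """Sum the implicit per-allele likelihoods of the flagged alleles and
    subtract the product for every unordered pair of flagged alleles."""
    active = [u_h for a_h, u_h in zip(a, implicit) if a_h == 1]
    pairwise = sum(u_i * u_j for u_i, u_j in combinations(active, 2))
    return sum(active) - pairwise


a = [0, 1, 1, 0]                   # peptide associated with alleles h = 2, 3
implicit = [0.9, 0.3, 0.5, 0.2]    # hypothetical implicit per-allele likelihoods
u_k = second_order(a, implicit)
print(round(u_k, 4))
```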

As an example, the likelihood that peptide $p^k$ will be presented by HLA alleles $h=2$, $h=3$, among $m=4$ different identified HLA alleles, using the affine transformation functions $g_h(\cdot)$, can be generated by:

$$u_k = f(x_2^k \cdot \theta_2) + f(x_3^k \cdot \theta_3) - f(x_2^k \cdot \theta_2) \cdot f(x_3^k \cdot \theta_3)$$

where $x_2^k$, $x_3^k$ are the identified allele-interacting variables for HLA alleles $h=2$, $h=3$, and $\theta_2$, $\theta_3$ are the sets of parameters determined for HLA alleles $h=2$, $h=3$.

As another example, the likelihood that peptide $p^k$ will be presented by HLA alleles $h=2$, $h=3$, among $m=4$ different identified HLA alleles, using the network transformation functions $g_h(\cdot)$, can be generated by:

$$u_k = f(NN_2(x_2^k; \theta_2)) + f(NN_3(x_3^k; \theta_3)) - f(NN_2(x_2^k; \theta_2)) \cdot f(NN_3(x_3^k; \theta_3))$$

where $NN_2(\cdot)$, $NN_3(\cdot)$ are the identified network models for HLA alleles $h=2$, $h=3$, and $\theta_2$, $\theta_3$ are the sets of parameters determined for HLA alleles $h=2$, $h=3$.

X. Example 5: Prediction Module

The prediction module 320 receives sequence data and selects candidate neoantigens in the sequence data using the presentation models. Specifically, the sequence data may be DNA sequences, RNA sequences, and/or protein sequences extracted from tumor tissue cells of patients. The prediction module 320 parses the sequence data into a plurality of peptide sequences $p^k$ having 8 to 15 amino acids for MHC-I or 6 to 30 amino acids for MHC-II. For example, the prediction module 320 may parse the given sequence "IEFROEIFJEF" into three peptide sequences having 9 amino acids: "IEFROEIFJ", "EFROEIFJE", and "FROEIFJEF". In one embodiment, the prediction module 320 may identify candidate neoantigens that are mutated peptide sequences by comparing sequence data extracted from normal tissue cells of a patient with sequence data extracted from tumor tissue cells of the patient to identify portions containing one or more mutations.
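The parsing step amounts to sliding a fixed-length window along the sequence. A minimal Python sketch follows; the function name `enumerate_peptides` is hypothetical, and the input is the example sequence from the text.

```python
def enumerate_peptides(sequence, length):
    """Return every contiguous subsequence of the given length, in order."""
    return [sequence[i:i + length] for i in range(len(sequence) - length + 1)]


peptides = enumerate_peptides("IEFROEIFJEF", 9)
print(peptides)

# Parsing for a range of lengths, e.g. 8 to 15 residues for MHC-I:
all_mhc1 = [p for n in range(8, 16) for p in enumerate_peptides("IEFROEIFJEF", n)]
print(len(all_mhc1))
```

A sequence of length $L$ yields $L - n + 1$ peptides of length $n$, and none when $n > L$.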

The prediction module 320 applies one or more of the presentation models to the parsed peptide sequences to estimate the presentation likelihoods of the peptide sequences. Specifically, the prediction module 320 may select one or more candidate neoantigen sequences that are likely to be presented on tumor HLA molecules by applying the presentation models to the candidate neoantigens. In one implementation, the prediction module 320 selects candidate neoantigen sequences whose estimated presentation likelihoods exceed a predetermined threshold. In another implementation, the presentation model selects the $v$ candidate neoantigen sequences with the highest estimated presentation likelihoods (where $v$ is generally the maximum number of epitopes that can be delivered in a vaccine). A vaccine including the candidate neoantigens selected for a given patient can be injected into the patient to induce an immune response.
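The two selection rules (fixed threshold versus top $v$) can be sketched in Python. The function names and the likelihood values are hypothetical; the likelihoods are assumed to come from a previously trained presentation model.

```python
def select_by_threshold(likelihoods, threshold):
    """Keep candidates whose estimated likelihood exceeds the threshold."""
    return [pep for pep, u in likelihoods.items() if u > threshold]


def select_top_v(likelihoods, v):
    """Keep the v candidates with the highest estimated likelihoods."""
    ranked = sorted(likelihoods.items(), key=lambda item: item[1], reverse=True)
    return [pep for pep, _ in ranked[:v]]


# Hypothetical estimated presentation likelihoods for three candidates.
likelihoods = {"IEFROEIFJ": 0.81, "EFROEIFJE": 0.07, "FROEIFJEF": 0.35}
print(select_by_threshold(likelihoods, 0.3))
print(select_top_v(likelihoods, 1))
```

The threshold rule returns a variable number of candidates, while the top-$v$ rule always fills the vaccine capacity when enough candidates exist.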

XI. Example 6: Cassette Design Module

XI.A. Overview

The cassette design module 324 generates a vaccine cassette sequence based on the $v$ selected candidate peptides for injection into a patient. Specifically, for a set of selected peptides $p^k$, $k=1, 2, \ldots, v$, chosen for inclusion in a vaccine of capacity $v$, the cassette sequence is given by concatenating a series of therapeutic epitope sequences $p'^k$, $k=1, 2, \ldots, v$, each of which includes the sequence of the corresponding peptide $p^k$. The cassette design module 324 may concatenate the epitopes directly adjacent to one another. For example, a vaccine cassette $C$ may be represented as:

$$C = [p'^{t_1}\ p'^{t_2}\ \cdots\ p'^{t_v}]$$

where $p'^{t_i}$ denotes the $i$-th epitope of the cassette. Thus, $t_i$ corresponds to the index $k=1, 2, \ldots, v$ of the selected peptide at the $i$-th position of the cassette. The cassette design module 324 may also concatenate the epitopes with one or more optional linker sequences between adjacent epitopes. For example, a vaccine cassette $C$ may be represented as:

$$C = [p'^{t_1}\ l_{(t_1,t_2)}\ p'^{t_2}\ l_{(t_2,t_3)}\ \cdots\ l_{(t_{v-1},t_v)}\ p'^{t_v}]$$

where $l_{(t_i,t_j)}$ denotes a linker sequence placed between the $i$-th epitope $p'^{t_i}$ and the $(j=i+1)$-th epitope $p'^{t_j}$ of the cassette. The cassette design module 324 determines which of the selected epitopes $p'^k$, $k=1, 2, \ldots, v$, are arranged at the different positions of the cassette, as well as any linker sequences placed between the epitopes. A cassette sequence $C$ can be loaded as a vaccine based on any of the methods described in the present specification.
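Assembling a cassette from an ordering $t_1, \ldots, t_v$ and optional linkers can be sketched in Python. The function name `build_cassette` and the dictionary-based representation of linkers are assumptions made for illustration; the epitope and linker sequences are borrowed from the FIG. 13 example.

```python
def build_cassette(epitopes, order, linkers=None):
    """Concatenate epitopes in the given order, inserting the linker registered
    for each adjacent pair (t_i, t_j) when one exists."""
    linkers = linkers or {}
    parts = []
    for pos, t in enumerate(order):
        parts.append(epitopes[t])
        if pos + 1 < len(order):
            parts.append(linkers.get((t, order[pos + 1]), ""))
    return "".join(parts)


epitopes = {1: "SINFEKL", 2: "LLLLLVVVV"}
c1 = build_cassette(epitopes, [1, 2], {(1, 2): "AAY"})
c2 = build_cassette(epitopes, [2, 1], {(2, 1): "AAY"})
c_direct = build_cassette(epitopes, [1, 2])   # no linker between epitopes
print(c1, c2, c_direct)
```

Keying linkers by the ordered pair of epitope indices mirrors the notation $l_{(t_i,t_j)}$, so a different ordering may use a different linker at each junction.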

The set of therapeutic epitopes may be generated based on the selected peptides determined by the prediction module 320 to be associated with presentation likelihoods above a predetermined threshold, where the presentation likelihoods are determined by the presentation models. However, it is appreciated that in other embodiments the set of therapeutic epitopes may be generated based on any one or more of a number of methods (alone or in combination), for example, based on binding affinity or predicted binding affinity to HLA class I or class II alleles of the patient, binding stability or predicted binding stability to HLA class I or class II alleles of the patient, random sampling, and the like.

In one embodiment, the therapeutic epitopes $p'^k$ may correspond to the selected peptides $p^k$ themselves. The therapeutic epitopes $p'^k$ may also include C- and/or N-terminal flanking sequences in addition to the selected peptides. For example, an epitope $p'^k$ included in the cassette may be represented as the sequence $[n^k\ p^k\ c^k]$, where $c^k$ is a C-terminal flanking sequence attached to the C-terminus of the selected peptide $p^k$, and $n^k$ is an N-terminal flanking sequence attached to the N-terminus of the selected peptide $p^k$. In one instance referred to throughout the remainder of the specification, the N- and C-terminal flanking sequences are the native N- and C-terminal flanking sequences of the therapeutic vaccine epitope in the context of its source protein. In one instance referred to throughout the remainder of the specification, the therapeutic epitope $p'^k$ represents a fixed-length epitope. In another instance, the therapeutic epitope $p'^k$ can represent a variable-length epitope, in which the length of the epitope varies depending on, for example, the length of the C- or N-terminal flanking sequence. For example, the C-terminal flanking sequence $c^k$ and the N-terminal flanking sequence $n^k$ can each have a length of 2 to 5 residues, resulting in 16 possible choices for the epitope $p'^k$.
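The count of 16 follows from four possible lengths (2, 3, 4, or 5 residues) on each side, giving $4 \times 4$ combinations. A minimal Python sketch of the enumeration follows; the function name `flank_variants` and the source protein sequence are hypothetical, invented only to give the peptide a native context.

```python
def flank_variants(protein, start, end, min_flank=2, max_flank=5):
    """Enumerate epitopes [n p c] where p = protein[start:end] and the native
    N- and C-terminal flanks each span min_flank..max_flank residues."""
    variants = []
    for n_len in range(min_flank, max_flank + 1):
        for c_len in range(min_flank, max_flank + 1):
            # Skip combinations that would run past either end of the protein.
            if start - n_len < 0 or end + c_len > len(protein):
                continue
            variants.append(protein[start - n_len:end + c_len])
    return variants


# Hypothetical source protein containing the peptide SINFEKL.
protein = "MKTAYIAKQRSINFEKLQISFVKSHFSRQ"
start = protein.index("SINFEKL")
variants = flank_variants(protein, start, start + len("SINFEKL"))
print(len(variants))
```

Fewer than 16 variants result when the peptide lies within 5 residues of either end of its source protein.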

The cassette design module 324 generates cassette sequences by taking into account the presentation of junction epitopes that span the junction between a pair of therapeutic epitopes in the cassette. Junction epitopes are novel non-self but irrelevant epitope sequences that arise in the cassette as a result of concatenating therapeutic epitopes and linker sequences. The novel sequences of junction epitopes differ from the therapeutic epitopes of the cassette themselves. A junction epitope spanning epitopes $p'^{t_i}$ and $p'^{t_j}$ may include any epitope sequence that overlaps with both $p'^{t_i}$ and $p'^{t_j}$ and that differs from the sequences of the therapeutic epitopes $p'^{t_i}$ and $p'^{t_j}$ themselves. Specifically, each junction between an epitope $p'^{t_i}$ and an adjacent epitope $p'^{t_j}$ of the cassette, with or without an optional linker sequence $l_{(t_i,t_j)}$, may be associated with $n_{(t_i,t_j)}$ junction epitopes $e_n^{(t_i,t_j)}$, $n=1, 2, \ldots, n_{(t_i,t_j)}$. The junction epitopes may be sequences that at least partially overlap with both epitopes $p'^{t_i}$ and $p'^{t_j}$, or sequences that at least partially overlap with a linker sequence placed between epitopes $p'^{t_i}$ and $p'^{t_j}$. Junction epitopes may be presented by MHC class I, MHC class II, or both.

FIG. 13 shows two example cassette sequences, cassette 1 (C₁) and cassette 2 (C₂). Each cassette has a vaccine capacity of v = 2 and includes therapeutic epitopes p'^t1 = p^1 = SINFEKL and p'^t2 = p^2 = LLLLLVVVV, and a linker sequence l^(t1,t2) = AAY between the two epitopes. Specifically, the sequence of cassette C₁ is given by [p^1 l^(t1,t2) p^2], while the sequence of cassette C₂ is given by [p^2 l^(t1,t2) p^1]. Example junction epitopes e_n^(1,2) of cassette C₁ may be sequences such as EKLAAYLLL, KLAAYLLLLL, and FEKLAAYL that span both epitopes p'^1 and p'^2 in the cassette, and sequences such as AAYLLLLL and YLLLLLVVV that span the linker sequence and a single selected epitope in the cassette. Similarly, example junction epitopes e_m^(2,1) of cassette C₂ may be sequences such as VVVVAAYSIN, VVVVAAY, and AYSINFEK. Although both cassettes contain the same set of sequences p^1, l^(t1,t2), and p^2, the set of identified junction epitopes differs depending on the ordered sequence of the therapeutic epitopes within the cassette.
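The enumeration of junction epitopes in the FIG. 13 example can be sketched as follows. This is a minimal illustration, not the module's actual implementation: the function name `junction_peptides`, the default length range of 8 to 15 residues, and the rule that a window counts as a junction epitope when it is not contained entirely within one therapeutic epitope are assumptions made for the sketch.

```python
def junction_peptides(left, right, linker="", min_len=8, max_len=15):
    """Return the set of peptides that cross the junction between two epitopes.

    Assumed rule: a window is a junction peptide when it is not contained
    entirely within `left` or entirely within `right`.
    """
    cassette = left + linker + right
    left_end = len(left)                    # first index after the left epitope
    right_start = len(left) + len(linker)   # first index of the right epitope
    found = set()
    for length in range(min_len, max_len + 1):
        for start in range(len(cassette) - length + 1):
            end = start + length
            # Keep windows that extend past the left epitope
            # and begin before the right epitope.
            if end > left_end and start < right_start:
                found.add(cassette[start:end])
    return found


p1, p2, linker = "SINFEKL", "LLLLLVVVV", "AAY"
c1_junctions = junction_peptides(p1, p2, linker)   # cassette C1 = [p1 linker p2]
c2_junctions = junction_peptides(p2, p1, linker)   # cassette C2 = [p2 linker p1]
```

The two sets differ even though both cassettes are built from the same three sequences, which is the point the example makes about epitope order.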

Cassette design module 324 generates a cassette sequence that reduces the likelihood that junction epitopes are presented in the patient. Specifically, when the cassette is injected into the patient, junction epitopes have the potential to be presented by the patient's HLA class I or HLA class II alleles and to stimulate a CD8 or CD4 T-cell response, respectively. Such responses are often undesirable because T-cells reactive to the junction epitopes have no therapeutic benefit and may diminish the immune response to the selected therapeutic epitopes in the cassette through antigenic competition.⁷⁶

In one embodiment, cassette design module 324 iterates through one or more candidate cassettes and determines a cassette sequence whose junction epitope presentation score is below a numerical threshold. The junction epitope presentation score is a quantity associated with the presentation likelihoods of the junction epitopes in the cassette, and a higher score indicates a higher likelihood that junction epitopes of the cassette will be presented by HLA class I, HLA class II, or both.

Cassette design module 324 may determine the cassette sequence associated with the lowest junction epitope presentation score among the candidate cassette sequences, or may select a cassette sequence having a presentation score below a predetermined threshold. In one example, the presentation score for a given cassette sequence C is determined based on a set of distance metrics d(e_n^(ti,tj), n = 1, 2, …, n_(ti,tj)) = d_(ti,tj), each associated with a junction in cassette C. Specifically, a distance metric d_(ti,tj) specifies the likelihood that one or more of the junction epitopes spanning the pair of adjacent therapeutic epitopes p'^ti and p'^tj will be presented. The junction epitope presentation score for cassette C can then be determined by applying a function (e.g., summation, statistical function) to the set of distance metrics for cassette C. Mathematically, the presentation score is given by:

where h(·) is some function mapping the distance metrics of each junction to a score. In one particular example referred to throughout the remainder of the specification, the function h(·) is the summation across the distance metrics of the cassette.
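Written out for the summation case, and assuming the cassette concatenates its epitopes in the order t_1, t_2, …, t_v (an indexing convention adopted here for illustration), the score reduces to a sum over the v − 1 junctions of the cassette:

```latex
\mathrm{score}(C)
  = h\big(d_{(t_1,t_2)},\, d_{(t_2,t_3)},\, \ldots,\, d_{(t_{v-1},t_v)}\big)
  = \sum_{i=1}^{v-1} d_{(t_i,\,t_{i+1})}
```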

Cassette design module 324 may iterate through one or more candidate cassette sequences, determine the junction epitope presentation score for each candidate cassette, and identify an optimal cassette sequence associated with a junction epitope presentation score below the threshold. In one particular embodiment referred to throughout the remainder of the specification, the distance metric d(·) for a given junction may be given by the sum of the presentation likelihoods, or by the expected number of presented junction epitopes, as determined by the presentation models described in Sections VII and VIII of the specification. However, in other embodiments, the distance metric may be derived from other factors, alone or in combination with models like the one exemplified above, where these other factors may include distance metrics derived from any one or more (alone or in combination) of the following: HLA binding affinity or stability measurements or predictions for HLA class I or HLA class II, and a presentation or immunogenicity model trained on HLA mass spectrometry or T-cell epitope data, for HLA class I or HLA class II. For example, the distance metric may combine information about HLA class I and HLA class II presentation. For example, the distance metric could be the number of junction epitopes predicted to bind any of the patient's HLA class I or HLA class II alleles with a binding affinity below a threshold. In another example, the distance metric could be the expected number of junction epitopes predicted to be presented by any of the patient's HLA class I or HLA class II alleles.

Cassette design module 324 may further check the one or more candidate cassette sequences to identify whether any of the junction epitopes in the candidate cassette sequences are self-epitopes for the given patient for whom the vaccine is being designed. To accomplish this, cassette design module 324 checks the junction epitopes against a known database such as BLAST. In one embodiment, the cassette design module may be configured to design cassettes that avoid junction self-epitopes by setting the distance metric d_(ti,tj) to a very large value (e.g., 100) for pairs of epitopes t_i, t_j where concatenating epitope t_i to the N-terminus of epitope t_j results in the formation of a junction self-epitope.

Returning to the example in FIG. 13, cassette design module 324 determines (for example) a distance metric d_(t1,t2) = d_(1,2) = 0.39 for the single junction (t1, t2) in cassette C₁, given by the summation of the presentation likelihoods of all possible junction epitopes e_n^(t1,t2) = e_n^(1,2) having lengths of, for example, 8 to 15 amino acids for MHC class I, or 9 to 30 amino acids for MHC class II. Since no other junctions are present in cassette C₁, the junction epitope presentation score, which is the summation across the distance metrics for cassette C₁, is also 0.39. Cassette design module 324 also determines a distance metric d_(t1,t2) = d_(2,1) = 0.068 for the single junction in cassette C₂, given by the summation of the presentation likelihoods of all possible junction epitopes e_n^(t1,t2) = e_n^(2,1) having lengths of 8 to 15 amino acids for MHC class I, or 9 to 30 amino acids for MHC class II. In this example, the junction epitope presentation score for cassette C₂ is likewise given by the distance metric of its single junction, 0.068. Cassette design module 324 outputs the cassette sequence of C₂ as the optimal cassette, since its junction epitope presentation score is lower than that of the cassette sequence of C₁.
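The comparison in this example can be reproduced with a few lines of code. The sketch below assumes the summation form of the score and takes the two distance metrics (0.39 and 0.068) directly from the text; the function and variable names are illustrative only.

```python
def presentation_score(order, distance):
    """Sum the distance metrics over consecutive epitope pairs in `order`."""
    return sum(distance[(a, b)] for a, b in zip(order, order[1:]))


# Distance metrics for the two possible junctions of FIG. 13 (values from the text).
distance = {(1, 2): 0.39, (2, 1): 0.068}

candidates = [(1, 2), (2, 1)]          # cassette C1 and cassette C2
scores = {order: presentation_score(order, distance) for order in candidates}
best = min(scores, key=scores.get)     # ordering with the lowest score
```

Here `best` is the ordering (2, 1), which corresponds to cassette C₂.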

Cassette design module 324 may perform a brute-force approach and iterate through all or most possible candidate cassette sequences to select the sequence with the smallest junction epitope presentation score. However, the number of such candidate cassettes can be prohibitively large as the capacity of the vaccine v increases. For example, for a vaccine capacity of v = 20 epitopes, cassette design module 324 has to iterate through ~10^18 possible candidate cassettes (20! ≈ 2.4 × 10^18 orderings) to determine the cassette with the lowest junction epitope presentation score. This determination may be computationally burdensome (in terms of the computational processing resources required), and sometimes intractable, for cassette design module 324 to complete within a reasonable amount of time to generate the vaccine for the patient. Moreover, accounting for the possible junction epitopes for each candidate cassette can be even more burdensome. Thus, cassette design module 324 may select a cassette sequence based on ways of iterating through a number of candidate cassette sequences that is significantly smaller than the number of candidate cassette sequences for the brute-force approach.

In one embodiment, cassette design module 324 generates a subset of randomly or at least pseudo-randomly generated candidate cassettes, and selects as the cassette sequence a candidate cassette associated with a junction epitope presentation score below a predetermined threshold. Additionally, cassette design module 324 may select the candidate cassette from the subset with the lowest junction epitope presentation score as the cassette sequence. For example, cassette design module 324 may generate a subset of ~1 million candidate cassettes for a set of v = 20 selected epitopes, and select the candidate cassette with the smallest junction epitope presentation score. Although generating a subset of random cassette sequences and selecting a cassette sequence with a low junction epitope presentation score out of the subset may be sub-optimal relative to the brute-force approach, it requires significantly less computational resources, thereby making its implementation technically feasible. Further, performing the brute-force method as opposed to this more efficient technique may result in only a minor or even negligible improvement in junction epitope presentation score, making it not worthwhile from a resource allocation perspective.
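A minimal sketch of the random sampling embodiment follows. It assumes the distance metrics are already available as a v x v list of lists `D` (with `D[k][m]` the metric for placing epitope m after epitope k) and that the score is their sum; the function names, the seed handling, and the toy matrix are illustrative assumptions.

```python
import random


def presentation_score(order, D):
    """Sum D[a][b] over consecutive epitope indices in `order`."""
    return sum(D[a][b] for a, b in zip(order, order[1:]))


def sample_best_cassette(D, n_samples, seed=0):
    """Draw random orderings and keep the one with the lowest score."""
    rng = random.Random(seed)
    v = len(D)
    best_order, best_score = None, float("inf")
    for _ in range(n_samples):
        order = list(range(v))
        rng.shuffle(order)                  # one pseudo-random candidate cassette
        score = presentation_score(order, D)
        if score < best_score:
            best_order, best_score = order, score
    return best_order, best_score


# Toy asymmetric distance matrix for v = 6 epitopes (made-up values).
toy_rng = random.Random(1)
v = 6
D = [[0.0 if k == m else toy_rng.random() for m in range(v)] for k in range(v)]
order, score = sample_best_cassette(D, n_samples=2000)
```

The returned ordering is the best among the sampled candidates only, so it is not guaranteed to be the global optimum.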

In another embodiment, cassette design module 324 determines an improved cassette configuration by formulating the epitope sequence for the cassette as an asymmetric traveling salesman problem (TSP). Given a list of nodes and the distances between each pair of nodes, the TSP determines a sequence of nodes associated with the shortest total distance to visit each node exactly once and return to the original node. For example, given cities A, B, and C with known distances between each other, the solution of the TSP generates a closed sequence of cities for which the total distance traveled to visit each city exactly once is the smallest among the possible routes. The asymmetric version of the TSP determines the optimal sequence of nodes when the distance between a pair of nodes is asymmetric. For example, the "distance" for traveling from node A to node B may be different from the "distance" for traveling from node B to node A.

Cassette design module 324 determines an improved cassette sequence by solving an asymmetric TSP in which each node corresponds to a therapeutic epitope p'^k. The distance from the node corresponding to epitope p'^k to another node corresponding to epitope p'^m is given by the junction epitope distance metric d_(k,m), while the distance from the node corresponding to epitope p'^m to the node corresponding to epitope p'^k is given by the distance metric d_(m,k), which may be different from d_(k,m). By solving for an improved cassette using an asymmetric TSP, cassette design module 324 can find a cassette sequence that results in a reduced presentation score across the junctions between epitopes of the cassette. The solution of the asymmetric TSP indicates a sequence of therapeutic epitopes that corresponds to the order in which the epitopes should be concatenated in the cassette to minimize the junction epitope presentation score across the junctions of the cassette. Specifically, given the set of therapeutic epitopes k = 1, 2, …, v, cassette design module 324 determines the distance metrics d_(k,m), k, m = 1, 2, …, v, for each possible ordered pair of therapeutic epitopes in the cassette. In other words, for a given pair k, m of epitopes, both the distance metric d_(k,m) for concatenating therapeutic epitope p'^m after epitope p'^k and the distance metric d_(m,k) for concatenating therapeutic epitope p'^k after epitope p'^m are determined, since these distance metrics may differ from each other.
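Filling in the distance metrics for every ordered pair might look like the sketch below. The callables `junction_metric` and `forms_self_epitope` are hypothetical stand-ins for the presentation model and the self-epitope check, and the penalty of 100 follows the example value given earlier in this section; the metric values come from the FIG. 13 example.

```python
def build_distance_matrix(epitopes, junction_metric,
                          forms_self_epitope=None, penalty=100.0):
    """Return D with D[k][m] = metric for joining epitope m after epitope k."""
    v = len(epitopes)
    D = [[0.0] * v for _ in range(v)]
    for k in range(v):
        for m in range(v):
            if k == m:
                continue  # an epitope is never joined to itself
            left, right = epitopes[k], epitopes[m]
            if forms_self_epitope is not None and forms_self_epitope(left, right):
                D[k][m] = penalty  # discourage junctions that create self-epitopes
            else:
                D[k][m] = junction_metric(left, right)
    return D


# Metrics from the FIG. 13 example, used here as a lookup-table "model".
fig13 = {("SINFEKL", "LLLLLVVVV"): 0.39, ("LLLLLVVVV", "SINFEKL"): 0.068}
epitopes = ["SINFEKL", "LLLLLVVVV"]
D = build_distance_matrix(epitopes, lambda a, b: fig13[(a, b)])

# Same matrix, with a made-up self-epitope flag on one direction only.
D_penalized = build_distance_matrix(
    epitopes,
    lambda a, b: fig13[(a, b)],
    forms_self_epitope=lambda a, b: a == "SINFEKL",
)
```

Both directions of each pair are evaluated separately, so `D` is in general not symmetric.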

Cassette design module 324 solves the asymmetric TSP through an integer linear programming problem. Specifically, cassette design module 324 generates a (v + 1) x (v + 1) path matrix P given by the following:

$$P = \begin{bmatrix} 0 & 0_{1 \times v} \\ 0_{v \times 1} & D \end{bmatrix}$$

The v x v matrix D is an asymmetric distance matrix, where each element D(k, m), k = 1, 2, …, v; m = 1, 2, …, v, corresponds to the distance metric for the junction from epitope p'^k to epitope p'^m. Rows and columns 2, …, v + 1 of P correspond to the nodes of the original epitopes, while row 1 and column 1 correspond to a "ghost node" that is at zero distance from all other nodes. The addition of the "ghost node" to the matrix encodes the notion that the vaccine cassette is linear rather than circular, so that there is no junction between the first and last epitopes. In other words, the sequence is not circular, and the first epitope is not assumed to be concatenated after the last epitope in the sequence.

Let x_km denote a binary variable whose value is 1 if there is a directed path (i.e., an epitope-epitope junction in the cassette) in which epitope p'^k is concatenated to the N-terminus of epitope p'^m, and 0 otherwise. In addition, let E denote the set of all v therapeutic vaccine epitopes, and let S ⊂ E denote a subset of epitopes. For any such subset S, let out(S) denote the number of epitope-epitope junctions x_km = 1 where k is an epitope in S and m is an epitope in E \ S. Given a known path matrix P, cassette design module 324 finds a path matrix X that solves the following integer linear programming problem:

$$\min_{X} \sum_{k=1}^{v+1} \sum_{m=1}^{v+1} P_{km}\, x_{km} \qquad (27)$$

where P_km denotes element P(k, m) of the path matrix P, subject to the following constraints:

The first two constraints guarantee that each epitope appears exactly once in the cassette. The last constraint ensures that the cassette is connected. In other words, the cassette encoded by x is a connected linear protein sequence.
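One set of constraints consistent with this description is sketched below. It assumes the standard degree and cut-set constraints of the TSP; the exact form, and in particular the range of subsets S, is an assumption and is not taken from the specification. In this sketch the cut is taken over all v + 1 nodes, ghost node included, so that the closed tour through the ghost node is connected.

```latex
\begin{aligned}
&\sum_{k=1}^{v+1} x_{km} = 1, && m = 1, \ldots, v+1 \\
&\sum_{m=1}^{v+1} x_{km} = 1, && k = 1, \ldots, v+1 \\
&x_{kk} = 0, && k = 1, \ldots, v+1 \\
&\sum_{k \in S} \sum_{m \notin S} x_{km} \ge 1, && \emptyset \ne S \subsetneq \{1, \ldots, v+1\}
\end{aligned}
```

The first two lines force exactly one incoming and one outgoing junction per node, the third excludes self-loops, and the fourth rules out disconnected sub-cycles.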

The solutions for x_km, k, m = 1, 2, …, v + 1 in the integer linear programming problem of equation (27) indicate a closed sequence of nodes and ghost nodes that can be used to infer one or more sequences of therapeutic epitopes for the cassette that lower the presentation score of junction epitopes. Specifically, a value of x_km = 1 indicates that a "path" exists from node k to node m, or in other words, that therapeutic epitope p'^m should be concatenated after therapeutic epitope p'^k in the improved cassette sequence. A solution of x_km = 0 indicates that no such path exists, or in other words, that therapeutic epitope p'^m should not be concatenated after therapeutic epitope p'^k in the improved cassette sequence. Collectively, the values of x_km in the integer linear programming problem of equation (27) represent a sequence of nodes and the ghost node, in which the path enters and exits each node exactly once. For example, the values x_(ghost,1) = 1, x_13 = 1, x_32 = 1, and x_(2,ghost) = 1 (0 otherwise) indicate the sequence of nodes and ghost node ghost→1→3→2→ghost.

Once the sequence has been solved for, the ghost node is deleted from the sequence to generate a refined sequence containing only the original nodes corresponding to the therapeutic epitopes in the cassette. The refined sequence indicates the order in which the selected epitopes should be concatenated in the cassette to improve the presentation score. For example, continuing from the example in the previous paragraph, the ghost node may be deleted to generate the refined sequence 1→3→2. The refined sequence indicates one possible way to concatenate the epitopes in the cassette, namely p^1→p^3→p^2.
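The ghost-node bookkeeping can be illustrated with two small helpers: one that pads a distance matrix into a path matrix, and one that reads the epitope order back out of a solved assignment. The sketch assumes the ghost node sits at index 0 (row 1 and column 1 in the text's numbering) and that the solution is given as a 0/1 list of lists; both function names are hypothetical.

```python
def build_path_matrix(D):
    """Prepend a zero-distance ghost node (index 0) to the v x v matrix D."""
    v = len(D)
    P = [[0.0] * (v + 1)]               # ghost row: zero distance to every node
    for row in D:
        P.append([0.0] + list(row))     # ghost column: zero distance from every node
    return P


def decode_order(x, ghost=0):
    """Follow the x[k][m] == 1 entries from the ghost node, then drop the ghost."""
    order = []
    node = x[ghost].index(1)            # first epitope after the ghost node
    while node != ghost:
        order.append(node)
        node = x[node].index(1)         # next node on the path
    return order


# Solution from the text: ghost -> 1 -> 3 -> 2 -> ghost (ghost is index 0).
x = [
    [0, 1, 0, 0],   # ghost -> 1
    [0, 0, 0, 1],   # 1 -> 3
    [1, 0, 0, 0],   # 2 -> ghost
    [0, 0, 1, 0],   # 3 -> 2
]
refined = decode_order(x)               # ghost removed

P = build_path_matrix([[0.0, 0.39], [0.068, 0.0]])
```

Here `refined` is the list [1, 3, 2], matching the refined sequence 1→3→2 in the text.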

When the therapeutic epitopes p'^k are variable-length epitopes, cassette design module 324 determines candidate distance metrics corresponding to different lengths of the therapeutic epitopes p'^k and p'^m, and identifies the distance metric d_(k,m) as the smallest candidate distance metric. For example, epitopes p'^k = [n^k p^k c^k] and p'^m = [n^m p^m c^m] may each include corresponding N- and C-terminal flanking sequences that can vary from (in one embodiment) 2 to 5 amino acids. Thus, the junction between epitopes p'^k and p'^m is associated with 16 different sets of junction epitopes, based on the 4 possible length values of n^k and the 4 possible length values of c^m placed at the junction. Cassette design module 324 may determine a candidate distance metric for each set of junction epitopes and determine the distance metric d_(k,m) as the smallest value. Cassette design module 324 can then construct the path matrix P and solve the integer linear programming problem in equation (27) to determine the cassette sequence.
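Taking the minimum over the 16 flank-length combinations can be sketched as below. The callable `candidate_metric(len_a, len_b)` is a hypothetical stand-in for scoring the junction epitopes produced by one choice of the two flank lengths at the junction, and the toy metric is made up purely to exercise the function.

```python
from itertools import product


def variable_length_metric(candidate_metric, flank_lengths=range(2, 6)):
    """Return (smallest metric, (len_a, len_b)) over all flank-length pairs.

    With flank lengths 2..5 on each side, 4 x 4 = 16 combinations are scored.
    """
    return min(
        (candidate_metric(len_a, len_b), (len_a, len_b))
        for len_a, len_b in product(flank_lengths, repeat=2)
    )


def toy_candidate_metric(len_a, len_b):
    """Made-up metric, lowest at len_a = 3 and len_b = 2."""
    return 0.1 * abs(len_a - 3) + 0.05 * len_b


d_km, chosen_lengths = variable_length_metric(toy_candidate_metric)
n_combinations = len(list(product(range(2, 6), repeat=2)))
```

The chosen flank lengths would be kept alongside d_(k,m) so that the final cassette can be assembled with the same trimming that produced the metric.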

Compared to the random sampling approach, solving for the cassette sequence using the integer programming problem requires determination of v x (v − 1) distance metrics, each corresponding to an ordered pair of therapeutic epitopes in the vaccine. A cassette sequence determined through this approach can result in a sequence with significantly less presentation of junction epitopes while potentially requiring significantly less computational resources than the random sampling approach, especially when the number of generated candidate cassette sequences is large.
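For small v, the same optimum can be found without an integer programming solver. The sketch below uses a Held-Karp dynamic program, which is a different technique from the integer linear program of equation (27) and is shown only to make the optimization concrete: it needs only the v x (v − 1) pairwise metrics as input. Letting an ordering start at any epitope with zero cost plays the role of the zero-distance ghost node. Memory grows as v · 2^v, so this is not practical for v = 20; the matrix values are made up.

```python
def best_order_exact(D):
    """Held-Karp dynamic program for the lowest-scoring ordering of all epitopes.

    dp[(mask, last)] holds (score, previous) for the best ordering of the
    epitopes in bitmask `mask` that ends with epitope `last`.
    """
    v = len(D)
    dp = {(1 << k, k): (0.0, None) for k in range(v)}   # any epitope may start
    for mask in range(1, 1 << v):
        for last in range(v):
            if (mask, last) not in dp:
                continue
            score, _ = dp[(mask, last)]
            for nxt in range(v):
                if mask & (1 << nxt):
                    continue                              # already placed
                key = (mask | (1 << nxt), nxt)
                cand = score + D[last][nxt]
                if key not in dp or cand < dp[key][0]:
                    dp[key] = (cand, last)
    full = (1 << v) - 1
    last = min(range(v), key=lambda k: dp[(full, k)][0])
    best_score = dp[(full, last)][0]
    order, mask = [], full
    while last is not None:                               # walk back to the start
        order.append(last)
        prev = dp[(mask, last)][1]
        mask ^= 1 << last
        last = prev
    return order[::-1], best_score


# Toy asymmetric distance matrix for v = 4 epitopes (made-up values).
D = [
    [0.0, 0.4, 0.9, 0.2],
    [0.1, 0.0, 0.3, 0.8],
    [0.7, 0.5, 0.0, 0.6],
    [0.3, 0.2, 0.1, 0.0],
]
order, score = best_order_exact(D)
```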

XI.B. Comparison of Junction Epitope Presentation for Cassette Sequences Generated by Random Sampling vs. Asymmetric TSP

Two cassette sequences including v = 20 therapeutic epitopes were generated by random sampling of 1,000,000 permutations (cassette sequence C₁), and by solving the integer linear programming problem in equation (27) (cassette sequence C₂). The distance metrics, and thus the presentation score, were determined based on the presentation model described in equation (14), in which f is the sigmoid function, x_h^i is the sequence of peptide p^i, g_h(·) is the neural network function, w includes the flanking sequence, the log transcripts per kilobase million (TPM) of peptide p^i, the antigenicity of the protein of peptide p^i, and the sample ID of origin of peptide p^i, and g_w(·) of the flanking sequence and of the log TPM are neural network functions, respectively. Each of the neural network functions for g_h(·) included one output node of a one-hidden-layer multilayer perceptron (MLP) with input dimension 231 (11 residues x 21 characters per residue, including pad characters), width 256, rectified linear unit (ReLU) activations in the hidden layer, linear activations in the output layer, and one output node per HLA allele in the training data set. The neural network function for the flanking sequence was a one-hidden-layer MLP with input dimension 210 (5 residues of N-terminal flanking sequence + 5 residues of C-terminal flanking sequence x 21 characters per residue, including pad characters), width 32, ReLU activations in the hidden layer, and linear activation in the output layer. The neural network function for the RNA log TPM was a one-hidden-layer MLP with input dimension 1, width 16, ReLU activations in the hidden layer, and linear activation in the output layer. The presentation models were constructed for HLA alleles HLA-A*02:04, HLA-A*02:07, HLA-B*40:01, HLA-B*40:02, HLA-C*16:02, and HLA-C*16:04. The presentation scores of the two cassette sequences, indicating the expected number of presented junction epitopes, were compared. The results showed that the presentation score for the cassette sequence generated by solving equation (27) was associated with a ~4-fold improvement over the presentation score for the cassette sequence generated by random sampling.
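The stated input dimensions follow from encoding each residue position over 21 characters (20 amino acids plus a pad character). The sketch below shows one encoding that yields those dimensions; the one-hot scheme, the right-padding, the alphabet order, and the pad symbol are all assumptions, since the text gives only the dimensions, and the example sequences are arbitrary.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"   # 20 standard residues
PAD = "-"                               # hypothetical pad character
ALPHABET = AMINO_ACIDS + PAD            # 21 characters per position


def one_hot(sequence, length):
    """Right-pad `sequence` to `length` and return a flat one-hot vector."""
    if len(sequence) > length:
        raise ValueError("sequence longer than the encoded length")
    padded = sequence + PAD * (length - len(sequence))
    vector = []
    for residue in padded:
        position = [0] * len(ALPHABET)
        position[ALPHABET.index(residue)] = 1   # mark this residue's slot
        vector.extend(position)
    return vector


peptide_input = one_hot("SINFEKL", 11)                     # 11 x 21 = 231 values
flank_input = one_hot("GMLSQ", 5) + one_hot("AKYLR", 5)    # (5 + 5) x 21 = 210 values
```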

Specifically, the v = 20 epitopes were given by:

p'^1 = YNYSYWISIFAHTMWYNIWHVQWNK

p'^2 = IEALPYVFLQDQFELRLLKGEQGNN

p'^3 = DSEETNTNYLHYCHFHWTWAQQTTV

p'^4 = GMLSQYELKDCSLGFSWNDPAKYLR

p'^5 = VRIDKFLMYVWYSAPFSAYPLYQDA

p'^6 = CVHIYNNYPRMLGIPFSVMVSGFAM

p'^7 = FTFKGNIWIEMAGQFERTWNYPLSL

p'^8 = ANDDTPDFRKCYIEDHSFRFSQTMN

p'^9 = AAQYIACMVNRQMTIVYHLTRWGMK

p'^10 = KYLKEFTQLLTFVDCYMWITFCGPD

p'^11 = AMHYRTDIHGYWIEYRQVDNQMWNT

p'^12 = THVNEHQLEAVYRFHQVHCRFPYEN

p'^13 = QTFSECLFFHCLKVWNNVKYAKSLK

p'^14 = SFSSWHYKESHIALLMSPKKNHNNT

p'^15 = ILDGIMSRWEKVCTRQTRYSYCQCA

p'^16 = YRAAQMSKWPNKYFDFPEFMAYMPI

p'^17 = PRPGMPCQHHNTHGLNDRQAFDDFV

p'^18 = HNIISDETEVWEQAPHITWVYMWCR

p'^19 = AYSWPVVPMKWIPYRALCANHPPGT

p'^20 = HVMPHVAMNICNWYEFLYRISHIGR.

In the first example, 1,000,000 different candidate cassette sequences were randomly generated with the 20 therapeutic epitopes. The presentation score was generated for each of the candidate cassette sequences. The candidate cassette sequence identified as having the lowest presentation score was as follows, with a presentation score of 6.1 expected presented junction epitopes:

C₁ = THVNEHQLEAVYRFHQVHCRFPYENAMHYQMWNTYRAAQMSKWPNKYFDFPEFMAYMPICVHIYNNYPRMLGIPFSVMVSGFAMAYSWPVVPMKWIPYRALCANHPPGTANDDTPDFRKCYIEDHSFRFSQTMNIEALPYVFLQDQFELRLLKGEQGNNDSEETNTNYLHYCHFHWTWAQQTTVILDGIMSRWEKVCTRQTRYSYCQCAFTFKGNIWIEMAGQFERTWNYPLSLSFSSWHYKESHIALLMSPKKNHNNTQTFSECLFFHCLKVWNNVKYAKSLKHVMPHVAMNICNWYEFLYRISHIGRHNIISDETEVWEQAPHITWVYMWCRVRIDKFLMYVWYSAPFSAYPLYQDAKYLKEFTQLLTFVDCYMWITFCGPDAAQYIACMVNRQMTIVYHLTRWGMKYNYSYWISIFAHTMWYNIWHVQWNKGMLSQYELKDCSLGFSWNDPAKYLRPRPGMPCQHHNTHGLNDRQAFDDFV

The median presentation score of the 1,000,000 random sequences was 18.3. The experiment shows that the expected number of presented junction epitopes can be significantly reduced by identifying a cassette sequence among randomly sampled cassettes.
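The random-sampling procedure of the first example can be sketched as follows. This is a minimal illustration, not the implementation used to obtain the reported scores: it assumes that the presentation score of a cassette is the sum of the junction distance metrics along its ordering, and the distance matrix `toy_dist`, the function names, and the sample count are all hypothetical.

```python
import random
import statistics

def presentation_score(order, dist):
    """Sum the junction distances between consecutive epitopes in the ordering."""
    return sum(dist[a][b] for a, b in zip(order, order[1:]))

def best_random_cassette(dist, n_samples, seed=0):
    """Sample random orderings; return (best score, best ordering, median score)."""
    rng = random.Random(seed)
    order = list(range(len(dist)))
    scores = []
    best_score, best_order = float("inf"), None
    for _ in range(n_samples):
        rng.shuffle(order)                      # one random permutation of the epitopes
        score = presentation_score(order, dist)
        scores.append(score)
        if score < best_score:
            best_score, best_order = score, order[:]
    return best_score, best_order, statistics.median(scores)

# Hypothetical 6-epitope asymmetric distance matrix (values are made up).
toy_rng = random.Random(1)
toy_dist = [[0.0 if i == j else toy_rng.random() for j in range(6)] for i in range(6)]
best, order, median = best_random_cassette(toy_dist, n_samples=2000)
print(best, order, median)
```

Comparing `best` with `median` mirrors the comparison in the text between the lowest-scoring sampled cassette and the median of all sampled cassettes.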

In the second example, a cassette sequence C₂ was identified by solving the complete linear programming problem in equation (27). Specifically, the distance metric of each potential junction between a pair of therapeutic epitopes was determined. The distance metrics were then used to solve the programming problem. The cassette sequence identified by this approach had a presentation score of 1.7 and was as follows:

C₂ = IEALPYVFLQDQFELRLLKGEQGNNILDGIMSRWEKVCTRQTRYSYCQCAHVMPHVAMNICNWYEFLYRISHIGRTHVNEHQLEAVYRFHQVHCRFPYENFTFKGNIWIEMAGQFERTWNYPLSLAMHYQMWNTSFSSWHYKESHIALLMSPKKNHNNTVRIDKFLMYVWYSAPFSAYPLYQDAQTFSECLFFHCLKVWNNVKYAKSLKYRAAQMSKWPNKYFDFPEFMAYMPIAYSWPVVPMKWIPYRALCANHPPGTCVHIYNNYPRMLGIPFSVMVSGFAMHNIISDETEVWEQAPHITWVYMWCRAAQYIACMVNRQMTIVYHLTRWGMKYNYSYWISIFAHTMWYNIWHVQWNKGMLSQYELKDCSLGFSWNDPAKYLRKYLKEFTQLLTFVDCYMWITFCGPDANDDTPDFRKCYIEDHSFRFSQTMNDSEETNTNYLHYCHFHWTWAQQTTVPRPGMPCQHHNTHGLNDRQAFDDFV

The presentation score of cassette sequence C₂ showed a ~4-fold improvement over the presentation score of cassette sequence C₁, and a ~11-fold improvement over the median presentation score of the 1,000,000 randomly generated candidate cassettes. The run time to generate cassette C₁ was 20 seconds on a single thread of a 2.30 GHz Intel Xeon E5-2650 CPU. The run time to generate cassette C₂ was 1 second on a single thread of the same CPU. Thus, in this example, the cassette sequence identified by solving the linear programming problem of equation (27) yields a ~4-fold better solution at a 20-fold reduced computational cost.
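The fold improvements quoted above follow directly from the reported scores and run times. The short calculation below only restates that arithmetic; the variable names are illustrative.

```python
# Reported presentation scores (expected number of presented junction epitopes).
score_c1 = 6.1        # best of 1,000,000 random cassettes
score_c2 = 1.7        # cassette from the linear programming problem
score_median = 18.3   # median of the 1,000,000 random cassettes

# Reported single-thread run times in seconds.
runtime_c1_s = 20.0
runtime_c2_s = 1.0

fold_vs_c1 = score_c1 / score_c2          # about 3.6, reported as ~4-fold
fold_vs_median = score_median / score_c2  # about 10.8, reported as ~11-fold
speedup = runtime_c1_s / runtime_c2_s     # 20-fold lower computational cost

print(round(fold_vs_c1, 1), round(fold_vs_median, 1), speedup)
```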

The results show that the linear programming problem can provide a cassette sequence with a lower number of presented junction epitopes than one identified from random sampling, potentially with fewer computational resources.
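For intuition about exact ordering, the sketch below finds the minimum-cost epitope ordering for a small asymmetric distance matrix. It deliberately swaps in a different technique, Held-Karp dynamic programming over an open path, rather than the programming problem of equation (27), and it assumes the cassette score is the sum of junction distances along the ordering. The matrix `toy` and the function name are hypothetical, and the pure-Python version is only practical for a handful of epitopes.

```python
from itertools import permutations

def min_junction_path(dist):
    """Exact minimum-cost ordering of all epitopes (open path, asymmetric costs)."""
    v = len(dist)
    # best[(mask, last)] = (cost, previous node) for paths covering `mask`, ending at `last`
    best = {(1 << i, i): (0.0, None) for i in range(v)}
    for mask in range(1, 1 << v):
        for last in range(v):
            if (mask, last) not in best:
                continue
            cost, _ = best[(mask, last)]
            for nxt in range(v):
                if mask & (1 << nxt):
                    continue                      # epitope already placed
                key = (mask | (1 << nxt), nxt)
                cand = cost + dist[last][nxt]
                if key not in best or cand < best[key][0]:
                    best[key] = (cand, last)
    full = (1 << v) - 1
    end = min(range(v), key=lambda i: best[(full, i)][0])
    total = best[(full, end)][0]
    # Walk the predecessor links back to the first epitope.
    order, mask, last = [], full, end
    while last is not None:
        order.append(last)
        prev = best[(mask, last)][1]
        mask ^= 1 << last
        last = prev
    return total, order[::-1]

# Hypothetical 5-epitope asymmetric distance matrix (values are made up).
toy = [
    [0, 3, 1, 4, 2],
    [2, 0, 5, 1, 3],
    [4, 1, 0, 2, 6],
    [1, 6, 3, 0, 2],
    [5, 2, 4, 1, 0],
]
total, order = min_junction_path(toy)
# Brute-force check over all 120 orderings.
brute = min(sum(toy[a][b] for a, b in zip(p, p[1:])) for p in permutations(range(5)))
print(total, order, brute)
```

Because the matrix is asymmetric, reversing an ordering generally changes its score, which is why the ordering problem is treated as an asymmetric traveling salesman problem rather than a symmetric one.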

XI.C. Comparison of junction epitope presentation for cassette sequence selection generated by MHCflurry and the presentation model

In this example, cassette sequences containing v=20 therapeutic epitopes, selected based on tumor/normal exome sequencing, tumor transcriptome sequencing, and HLA typing of a lung cancer sample, were generated by random sampling of 1,000,000 permutations and by solving the complete linear programming problem in equation (27). The distance metrics, and thus the presentation scores, were determined based on the number of junction epitopes predicted by MHCflurry, an HLA-peptide binding affinity predictor, to bind the patient's HLAs with an affinity below a variety of thresholds (e.g., 50-1000 nM, or higher, or lower). In this example, the 20 nonsynonymous somatic mutations chosen as therapeutic epitopes were selected from among 98 somatic mutations identified in the tumor sample by ranking the mutations according to the presentation model in Section XI.B above. However, it is appreciated that in other embodiments the therapeutic epitopes may be selected based on other criteria, such as stability, or on a combination of criteria such as presentation score, affinity, and so on. In addition, it is appreciated that the criteria used for prioritizing therapeutic epitopes for inclusion in the vaccine need not be the same as the criteria used for determining the distance metric D(k, m) used in the cassette design module 324.
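A count-based distance metric of this kind can be sketched as below. This is an illustration under stated assumptions: peptide lengths of 8 to 11 residues are assumed, a junction epitope is counted once if it binds any of the listed alleles, and `predict_affinity_nm` stands in for a binding affinity predictor. It is a hypothetical callable and not the MHCflurry API; the threshold of 500 nM is one value from the range mentioned in the text.

```python
def junction_peptides(left, right, lengths=(8, 9, 10, 11)):
    """Enumerate peptides that contain residues from both sides of the junction."""
    joined = left + right
    cut = len(left)                               # index of the first residue of `right`
    peptides = []
    for k in lengths:
        first = max(0, cut - k + 1)               # earliest start that still reaches `right`
        last = min(cut, len(joined) - k + 1)      # start must lie inside `left`
        for start in range(first, last):
            peptides.append(joined[start:start + k])
    return peptides

def junction_distance(left, right, alleles, predict_affinity_nm, threshold_nm=500.0):
    """Count junction peptides predicted to bind any allele below the threshold."""
    count = 0
    for peptide in junction_peptides(left, right):
        if any(predict_affinity_nm(peptide, allele) < threshold_nm for allele in alleles):
            count += 1
    return count

# Toy stand-in predictor (made up): peptides containing tryptophan "bind" at 100 nM.
def toy_predictor(peptide, allele):
    return 100.0 if "W" in peptide else 5000.0

alleles = ["HLA-A*01:01", "HLA-A*03:01"]
k_epitope = "SSTPYLYYGTSSVSYQFPMVPGGDR"
m_epitope = "EMAGKIDLLRDSYIFQLFWREAAEP"
print(junction_distance(k_epitope, m_epitope, alleles, toy_predictor))
print(junction_distance(m_epitope, k_epitope, alleles, toy_predictor))
```

Evaluating the function for both orders of an epitope pair fills the two off-diagonal entries of the distance matrix, which in general differ.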

The patient's HLA class I alleles were HLA-A*01:01, HLA-A*03:01, HLA-B*07:02, HLA-B*35:03, HLA-C*07:02, and HLA-C*14:02.

Specifically, in this example, the v=20 therapeutic epitopes were as follows:

SSTPYLYYGTSSVSYQFPMVPGGDR

EMAGKIDLLRDSYIFQLFWREAAEP

ALKQRTWQALAHKYNSQPSVSLRDF

VSSHSSQATKDSAVGLKYSASTPVR

KEAIDAWAPYLPEYIDHVISPGVTS

SPVITAPPSSPVFDTSDIRKEPMNI

PAEVAEQYSEKLVYMPHTFFIGDHA

MADLDKLNIHSIIQRLLEVRGS

AAAYNEKSGRITLLSLLFQKVFAQI

KIEEVRDAMENEIRTQLRRQAAAHT

DRGHYVLCDFGSTTNKFQNPQTEGV

QVDNRKAEAEEAIKRLSYISQKVSD

CLSDAGVRKMTAAVRVMKRGLENLT

LPPRSLPSDPFSQVPASPQSQSSSQ

ELVLEDLQDGDVKMGGSFRGAFSNS

VTMDGVREEDLASFSLRKRWESEPH

IVGVMFFERAFDEGADAIYDHINEG

TVTPTPTPTGTQSPTPTPITTTTTV

QEEMPPRPCGGHTSSSLPKSHLEPS

PNIQAVLLPKKTDSHHKAKGK

The results from this example in the table below compare the number of junction epitopes predicted by MHCflurry to bind the patient's HLAs with an affinity below the value in the threshold column (where nM stands for nanomolar), as found via three example methods. For the first method, the optimal cassette was found via the asymmetric traveling salesman problem (ATSP) formulation described above, with a 1 s run time. For the second method, the optimal cassette was determined by taking the best cassette found after 1 million random samples. For the third method, the median number of junction epitopes was found across the 1 million random samples.

The results of this example illustrate that any one of a number of criteria may be used to identify whether a given cassette design meets design requirements. Specifically, as demonstrated by the prior examples, the cassette sequence selected from among many candidates may be specified as the cassette sequence having the lowest junction epitope presentation score, or at least a score below an identified threshold. This example shows that another criterion, such as binding affinity, may be used to specify whether a given cassette design meets design requirements. For this criterion, a threshold binding affinity (e.g., 50-1000 nM, or greater or lower) may be set, specifying that the cassette design sequence should have fewer than some threshold number of junction epitopes above the threshold (e.g., 0), and any one of a number of methods may be used (e.g., methods one through three illustrated in the table) to identify whether a given candidate cassette sequence meets those requirements. These example methods further illustrate that, depending on the method used, the thresholds may need to be set differently. Other criteria may be envisioned, such as those based on stability, or combinations of criteria such as presentation score, affinity, and so on.

In another example, the same cassettes were generated using the same HLA type and the 20 therapeutic epitopes from earlier in this section (XI.C), but instead of using distance metrics based on binding affinity prediction, the distance metric for epitopes m, k was the number of peptides spanning the m to k junction predicted to be presented by the patient's HLA class I alleles with a probability of presentation above a series of thresholds (probabilities between 0.005 and 0.5, or higher, or lower), where the probabilities of presentation were determined by the presentation model in Section XI.B above. This example further illustrates the breadth of criteria that may be considered in identifying whether a given candidate cassette sequence meets design requirements for use in the vaccine.

The examples above show that the criteria for determining whether a candidate cassette sequence meets design requirements may vary by implementation. Each of these examples has illustrated that a count of the number of junction epitopes falling above or below a criterion may be the quantity used in determining whether the candidate cassette sequence meets that criterion. For example, if the criterion is the number of epitopes meeting or exceeding a threshold binding affinity for HLA, whether the candidate cassette sequence has more or fewer than that number may determine whether it meets the criterion for use as the cassette selected for the vaccine. The same applies if the criterion is the number of junction epitopes exceeding a threshold presentation likelihood.

However, in other embodiments, calculations other than counting can be performed to determine whether a candidate cassette sequence meets the design criteria. For example, rather than the count of epitopes exceeding or falling below some threshold, it may instead be determined what proportion of junction epitopes exceed or fall below the threshold, for example whether the top X% of junction epitopes have a presentation likelihood above some threshold Y, or whether X% of junction epitopes have an HLA binding affinity less than or greater than Z nM. These are merely examples; generally the criteria may be based on any attribute of individual junction epitopes, or on statistics derived from aggregations of some or all of the junction epitopes. Here, X can generally be any number between 0 and 100% (e.g., 75% or less), Y can be any value between 0 and 1, and Z can be any number suitable to the criterion in question. These values may be determined empirically, and depend on the models and criteria used, as well as on the quality of the training data used.
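The count-based and proportion-based criteria described here can be expressed as simple predicates. The sketch below is one possible reading, in which a cassette passes when the number or fraction of junction epitopes whose presentation likelihood exceeds Y stays within a limit; the function names, the limits, and the likelihood values are hypothetical.

```python
def count_criterion(likelihoods, threshold_y, max_count=0):
    """Pass if at most `max_count` junction epitopes exceed likelihood `threshold_y`."""
    return sum(p > threshold_y for p in likelihoods) <= max_count

def fraction_criterion(likelihoods, threshold_y, max_fraction):
    """Pass if at most `max_fraction` of junction epitopes exceed `threshold_y`."""
    if not likelihoods:
        return True                               # no junction epitopes, nothing to exceed
    exceeding = sum(p > threshold_y for p in likelihoods)
    return exceeding / len(likelihoods) <= max_fraction

# Hypothetical presentation likelihoods for the junction epitopes of one cassette.
toy_likelihoods = [0.01, 0.02, 0.3, 0.6]
print(count_criterion(toy_likelihoods, threshold_y=0.5))                        # one epitope exceeds 0.5
print(fraction_criterion(toy_likelihoods, threshold_y=0.5, max_fraction=0.25))  # 1 of 4 exceeds 0.5
```

The same predicates apply to binding affinities if the comparison is reversed so that values below Z nM count as binders.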

As such, in certain aspects, junction epitopes with a high probability of presentation can be removed; junction epitopes with a low probability of presentation can be retained; junction epitopes that bind tightly, i.e., junction epitopes with a binding affinity below 1000 nM or 500 nM or some other threshold, can be removed; and/or junction epitopes that bind weakly, i.e., junction epitopes with a binding affinity above 1000 nM or 500 nM or some other threshold, can be retained.

While the examples above identified candidate sequences using an implementation of the presentation model described above, these principles apply equally to implementations in which the epitopes for arrangement in the cassette sequences are identified using other types of models as well, such as those based on affinity, stability, and so on.

XI.D. Cassette selection for shared antigens and shared neoantigens

Rather than selecting a subset of therapeutic epitopes for a vaccine personalized to an individual patient, the set of therapeutic epitope sequences p'ᵏ, k=1, 2, …, v may be a set of epitopes associated with a high likelihood of presentation in a population of cancer patients. For example, the set of therapeutic epitope sequences may be shared antigen sequences, i.e., sequences from genes identified as overexpressed in cancer patients that are associated with a high likelihood of presentation in the cancer patient population. As another example, the set of therapeutic epitope sequences may be shared neoantigen sequences, i.e., sequences associated with driver mutations common in a cancer patient population that are associated with a high likelihood of presentation. Thus, instead of tailoring the therapeutic epitope sequences of the cassette to an individual patient's sequencing data and HLA allele types, the therapeutic epitope sequences can be shared among multiple patients.

When the cassette sequence is shared, the distance metric d_(ti,tj) between a pair of epitopes t_i and t_j can be determined as a weighted sum of sub-distance metrics, each associated with a corresponding HLA allele. Specifically, the distance metric d_(ti,tj) can be given by:

$$ d_{(t_i,t_j)} = \sum_{h} w_h \, d_{h,(t_i,t_j)} \qquad (28) $$

where d_h,(ti,tj) is a sub-distance metric specifying the likelihood that one or more junction epitopes e_n^(ti,tj), n=1, 2, …, n_(ti,tj), spanning between the pair of adjacent therapeutic epitopes will be presented on HLA allele h, and w_h is a weight representing the prevalence of HLA allele h in a given patient population. By setting the distance metric as in equation (28), or in any other similar manner that uses the prevalence of HLA alleles to weight junction epitope presentation, a cassette sequence can be selected that reduces junction epitope presentation for the HLA alleles estimated to be more prevalent in the patient population.
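The weighted sum of equation (28) can be sketched for whole distance matrices as follows. This is a minimal illustration: the per-allele sub-distance matrices and the prevalence weights below are invented for the example, and the function name is hypothetical.

```python
def weighted_distance_matrix(sub_matrices, weights):
    """Combine per-allele sub-distance matrices into one matrix, weighted by prevalence."""
    alleles = list(sub_matrices)
    v = len(sub_matrices[alleles[0]])
    combined = [[0.0] * v for _ in range(v)]
    for allele in alleles:
        w = weights[allele]
        for i in range(v):
            for j in range(v):
                combined[i][j] += w * sub_matrices[allele][i][j]
    return combined

# Made-up sub-distance matrices for two epitopes and two alleles.
toy_sub = {
    "HLA-A*01:01": [[0, 2], [4, 0]],
    "HLA-A*03:01": [[0, 0], [4, 0]],
}
# Made-up prevalence weights w_h.
toy_weights = {"HLA-A*01:01": 0.25, "HLA-A*03:01": 0.75}

toy_combined = weighted_distance_matrix(toy_sub, toy_weights)
print(toy_combined)
```

The combined matrix can then be passed to any ordering method that accepts an asymmetric distance matrix.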

The sub-distance metric associated with HLA allele h may be given by the sum of the presentation likelihoods, or the expected number, of presented junction epitopes on HLA allele h, as determined by the presentation models described in Sections VII and VIII of this specification. However, it is appreciated that in other embodiments, the sub-distance metric may be derived from other factors alone or in combination with models like those exemplified above, where these other factors may include deriving the sub-distance metric from any one or more of the following (alone or in combination): HLA binding affinity or stability measurements or predictions for HLA class I or HLA class II, and a presentation or immunogenicity model trained on HLA mass spectrometry or T-cell epitope data, for HLA class I or HLA class II. The sub-distance metric may combine information about HLA class I and HLA class II presentation. For example, the sub-distance metric could be the number of junction epitopes predicted to bind any of the patient's HLA class I or HLA class II alleles with a binding affinity below a threshold. In another example, the sub-distance metric could be the expected number of junction epitopes predicted to be presented by any of the patient's HLA class I or HLA class II alleles.

Based on the distance metric defined in equation (28), the cassette design module 324 can iterate through one or more candidate cassette sequences, determine the junction epitope presentation score for the candidate cassettes, and identify an optimal cassette sequence associated with a junction epitope presentation score below a threshold, using any of the methods introduced in Section XI.A above.

XI.E. Comparison of junction epitope presentation for cassette sequences generated by random sampling vs. asymmetric TSP for shared antigens and shared neoantigens

In this example, cassettes were generated using the same 20 therapeutic epitopes from Section XI.C, and the expected numbers of junction epitopes for the cassette sequences found by three example methods were compared. Unlike in Section XI.C, the distance metrics and distance matrices were determined using equation (28). The allele frequencies, denoted w_h in equation (28), were computed using the model training samples of Section XI.B across 28 HLA-A, 43 HLA-B, and 23 HLA-C alleles. These were the alleles supported by the model. Frequencies were computed separately for each gene, HLA-A, HLA-B, and HLA-C. Each distance metric was determined based on the expected number of presented junction epitopes above a threshold presentation likelihood, weighted by the corresponding allele frequency, at different threshold probabilities. Similar to Section XI.B, for the first method, the optimal cassette was found via the asymmetric traveling salesman problem (ATSP) formulation described above. For the second method, the optimal cassette was determined by taking the best cassette found after 1 million random samples. For the third method, the median number of junction epitopes was found across the 1 million random samples. Specifically, the distance matrix for the ATSP method is the weighted sum of the single-allele distance sub-matrices, weighted by allele frequency.
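Computing allele frequencies separately for each gene can be sketched as below. The text does not spell out the exact procedure, so this assumes that each allele's frequency is its count divided by the total count of alleles of the same gene across the samples; the sample genotypes and the function name are hypothetical.

```python
from collections import Counter, defaultdict

def allele_frequencies(sample_alleles):
    """Return per-allele frequencies, normalized within each HLA gene."""
    counts = defaultdict(Counter)
    for alleles in sample_alleles:
        for allele in alleles:
            gene = allele.split("*")[0]           # "HLA-A*01:01" -> "HLA-A"
            counts[gene][allele] += 1
    freqs = {}
    for gene, counter in counts.items():
        total = sum(counter.values())
        for allele, n in counter.items():
            freqs[allele] = n / total             # frequency within this gene only
    return freqs

# Two made-up training samples, each listing two alleles per gene.
toy_samples = [
    ["HLA-A*01:01", "HLA-A*03:01", "HLA-B*07:02", "HLA-B*35:03"],
    ["HLA-A*01:01", "HLA-A*01:01", "HLA-B*07:02", "HLA-B*07:02"],
]
toy_freqs = allele_frequencies(toy_samples)
print(toy_freqs)
```

Frequencies obtained this way sum to 1 within each gene and could serve as the weights w_h of equation (28).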

As shown in the table above, the results are no longer integer values as in Section XI.C, because the distance metric in each method is a prediction of junction epitopes weighted by allele frequency, so the distance matrices are no longer integer-valued. The results show that the linear programming problem can also provide cassette sequences for shared antigens or shared neoantigens that greatly reduce the likelihood of presented junction epitopes in shared (neo-)antigen vaccine cassette packing compared to those identified by random sampling, potentially with fewer computational resources.

In another example, cassettes were generated using the same 20 therapeutic epitopes from Section XI.C, and the expected numbers of junction epitopes for the cassette sequences found by three example methods were compared using MHCflurry. The distance metrics and distance matrices were determined using equation (28). The allele frequencies, denoted w_h in equation (28), were computed using the model training samples across 22 HLA-A, 27 HLA-B, and 9 HLA-C alleles. Frequencies were computed separately for each gene, HLA-A, HLA-B, and HLA-C. Each distance metric was determined based on the expected number of presented junction epitopes above a threshold presentation likelihood, weighted by the corresponding allele frequency, at different threshold probabilities. Similar to Section XI.B, for the first method, the optimal cassette was found via the asymmetric traveling salesman problem (ATSP) formulation described above. For the second method, the optimal cassette was determined by taking the best cassette found after 1 million random samples. For the third method, the median number of junction epitopes was found across the 1 million random samples. Specifically, the distance matrix for the ATSP method is the weighted sum of the single-allele distance sub-matrices, weighted by allele frequency.

The results of this example illustrate that any one of a number of criteria may be used to identify whether a given cassette design meets design requirements. Specifically, this example shows that another criterion, such as binding affinity, may be used to specify whether a given cassette design meets the design requirements for shared antigen and shared neoantigen vaccine cassettes. For this criterion, a threshold binding affinity (e.g., 50-1000 nM, or greater or lower) may be set, specifying that the cassette design sequence should have fewer than some threshold number of junction epitopes above the threshold (e.g., 0), and any one of a number of methods may be used (e.g., methods one through three illustrated in the table) to identify whether a given candidate cassette sequence meets those requirements. These example methods further illustrate that, depending on the method used, the thresholds may need to be set differently. Other criteria may be envisioned, such as those based on stability, or combinations of criteria such as presentation score, affinity, and so on.

XII. Example computer

FIG. 14 illustrates an example computer 1400 for implementing the entities shown in FIGS. 1 and 3. The computer 1400 includes at least one processor 1402 coupled to a chipset 1404. The chipset 1404 includes a memory controller hub 1420 and an input/output (I/O) controller hub 1422. A memory 1406 and a graphics adapter 1412 are coupled to the memory controller hub 1420, and a display 1418 is coupled to the graphics adapter 1412. A storage device 1408, an input device 1414, and a network adapter 1416 are coupled to the I/O controller hub 1422. Other embodiments of the computer 1400 have different architectures.

The storage device 1408 is a non-transitory computer-readable storage medium such as a hard drive, compact disc read-only memory (CD-ROM), DVD, or solid-state memory device. The memory 1406 holds instructions and data used by the processor 1402. The input interface 1414 is a touch-screen interface, a mouse, a trackball or other type of pointing device, a keyboard, or some combination thereof, and is used to input data into the computer 1400. In some embodiments, the computer 1400 may be configured to receive input (e.g., commands) from the input interface 1414 via gestures from the user. The graphics adapter 1412 displays images and other information on the display 1418. The network adapter 1416 couples the computer 1400 to one or more computer networks.

The computer 1400 is adapted to execute computer program modules for providing the functionality described herein. As used herein, the term "module" refers to computer program logic used to provide a specified functionality. Thus, a module can be implemented in hardware, firmware, and/or software. In one embodiment, program modules are stored on the storage device 1408, loaded into the memory 1406, and executed by the processor 1402.

The types of computers 1400 used by the entities of FIG. 1 can vary depending on the embodiment and the processing power required by the entity. For example, the presentation identification system 160 can run on a single computer 1400 or on multiple computers 1400 communicating with each other through a network, such as in a server farm. The computers 1400 can lack some of the components described above, such as the graphics adapter 1412 and the display 1418.


SEQUENCE LISTING <110> GRITSTONE ONCOLOGY, INC. <120> REDUCING JUNCTION EPITOPE PRESENTATION FOR NEOANTIGENS <130> 32669-41062/WO <140> PCT/US2018/062294 <141> 2018-11-21 <150> 62/590,045 <151> 2017-11-22 <160> 76 <170> PatentIn version 3.5 <210> 1 <211> 10 <212> PRT <213> Artificial Sequence <220> <223> Description of Artificial Sequence: Synthetic peptide <400> 1 Tyr Val Tyr Val Ala Asp Val Ala Ala Lys 1 5 10 <210> 2 <211> 17 <212> PRT <213> Artificial Sequence <220> <223> Description of Artificial Sequence: Synthetic peptide <400> 2 Tyr Glu Met Phe Asn Asp Lys Ser Gln Arg Ala Pro Asp Asp Lys Met 1 5 10 15 Phe <210> 3 <211> 9 <212> PRT <213> Artificial Sequence <220> <223> Description of Artificial Sequence: Synthetic peptide <400> 3 Tyr Glu Met Phe Asn Asp Lys Ser Phe 1 5 <210> 4 <211> 11 <212> PRT <213> Artificial Sequence <220> <223> Description of Artificial Sequence: Synthetic peptide <220> <221> MOD_RES <222> (3)..(3) <223> Pyrrolysine <220> <221> MOD_RES <222> (11)..(11) <223> Leu or Ile <400> 4 His Arg Xaa Glu Ile Phe Ser His Asp Phe Xaa 1 5 10 <210> 5 <211> 10 <212> PRT <213> Artificial Sequence <220> <223> Description of Artificial Sequence: Synthetic peptide <220> <221> MOD_RES <222> (2)..(2) <223> Leu or Ile <220> <221> MOD_RES <222> (5)..(5) <223> Leu or Ile <220> <221> MOD_RES <222> (7)..(7) <223> Pyrrolysine <400> 5 Phe Xaa Ile Glu Xaa Phe Xaa Glu Ser Ser 1 5 10 <210> 6 <211> 10 <212> PRT <213> Artificial Sequence <220> <223> Description of Artificial Sequence: Synthetic peptide <220> <221> MOD_RES <222> (4)..(4) <223> Pyrrolysine <400> 6 Asn Glu Ile Xaa Arg Glu Ile Arg Glu Ile 1 5 10 <210> 7 <211> 27 <212> PRT <213> Artificial Sequence <220> <223> Description of Artificial Sequence: Synthetic peptide <220> <221> MOD_RES <222> (1)..(1) <223> Leu or Ile <220> <221> MOD_RES <222> (11)..(11) <223> Leu or Ile <220> <221> MOD_RES <222> (15)..(15) <223> Selenocysteine <220> <221> MOD_RES <222> (21)..(21) <223> Leu or Ile <220> <221> MOD_RES <222> 
(27)..(27) <223> Leu or Ile <400> 7 Xaa Phe Lys Ser Ile Phe Glu Met Met Ser Xaa Asp Ser Ser Xaa Ile 1 5 10 15 Phe Leu Lys Ser Xaa Phe Ile Glu Ile Phe Xaa 20 25 <210> 8 <211> 13 <212> PRT <213> Artificial Sequence <220> <223> Description of Artificial Sequence: Synthetic peptide <220> <221> MOD_RES <222> (11)..(11) <223> Pyrrolysine <400> 8 Lys Asn Phe Leu Glu Asn Phe Ile Glu Ser Xaa Phe Ile 1 5 10 <210> 9 <211> 15 <212> PRT <213> Artificial Sequence <220> <223> Description of Artificial Sequence: Synthetic peptide <220> <221> MOD_RES <222> (2)..(2) <223> Pyrrolysine <220> <221> MOD_RES <222> (14)..(14) <223> Leu or Ile <400> 9 Phe Xaa Glu Ile Phe Asn Asp Lys Ser Leu Asp Lys Phe Xaa Ile 1 5 10 15 <210> 10 <211> 16 <212> PRT <213> Artificial Sequence <220> <223> Description of Artificial Sequence: Synthetic peptide <220> <221> MOD_RES <222> (5)..(5) <223> Pyrrolysine <220> <221> MOD_RES <222> (16)..(16) <223> Leu or Ile <400> 10 Gln Cys Glu Ile Xaa Trp Ala Arg Glu Phe Leu Lys Glu Ile Gly Xaa 1 5 10 15 <210> 11 <211> 8 <212> PRT <213> Artificial Sequence <220> <223> Description of Artificial Sequence: Synthetic peptide <220> <221> MOD_RES <222> (4)..(4) <223> Selenocysteine <400> 11 Phe Ile Glu Xaa His Phe Trp Ile 1 5 <210> 12 <211> 12 <212> PRT <213> Artificial Sequence <220> <223> Description of Artificial Sequence: Synthetic peptide <220> <221> MOD_RES <222> (7)..(7) <223> Leu or Ile <220> <221> MOD_RES <222> (10)..(10) <223> Selenocysteine <220> <221> MOD_RES <222> (11)..(11) <223> Leu or Ile <400> 12 Phe Glu Trp Arg His Arg Xaa Thr Arg Xaa Xaa Arg 1 5 10 <210> 13 <211> 9 <212> PRT <213> Artificial Sequence <220> <223> Description of Artificial Sequence: Synthetic peptide <220> <221> MOD_RES <222> (4)..(4) <223> Leu or Ile <220> <221> MOD_RES <222> (5)..(5) <223> Pyrrolysine <220> <221> MOD_RES <222> (8)..(8) <223> Leu or Ile <400> 13 Gln Ile Glu Xaa Xaa Glu Ile Xaa Glu 1 5 <210> 14 <211> 9 <212> PRT <213> Artificial Sequence <220> <223> Description of Artificial 
Sequence: Synthetic peptide <220> <221> MOD_RES <222> (5)..(5) <223> Pyrrolysine <400> 14 Gln Cys Glu Ile Xaa Trp Ala Arg Glu 1 5 <210> 15 <211> 14 <212> PRT <213> Artificial Sequence <220> <223> Description of Artificial Sequence: Synthetic peptide <220> <221> MOD_RES <222> (2)..(2) <223> Leu or Ile <220> <221> MOD_RES <222> (9)..(9) <223> Pyrrolysine <220> <221> MOD_RES <222> (11)..(11) <223> Leu or Ile <400> 15 Phe Xaa Glu Leu Phe Ile Ser Asx Xaa Ser Xaa Phe Ile Glu 1 5 10 <210> 16 <211> 11 <212> PRT <213> Artificial Sequence <220> <223> Description of Artificial Sequence: Synthetic peptide <220> <221> MOD_RES <222> (5)..(5) <223> Pyrrolysine <220> <221> MOD_RES <222> (9)..(9) <223> Leu or Ile <400> 16 Ile Glu Phe Arg Xaa Glu Ile Phe Xaa Glu Phe 1 5 10 <210> 17 <211> 9 <212> PRT <213> Artificial Sequence <220> <223> Description of Artificial Sequence: Synthetic peptide <220> <221> MOD_RES <222> (5)..(5) <223> Pyrrolysine <220> <221> MOD_RES <222> (9)..(9) <223> Leu or Ile <400> 17 Ile Glu Phe Arg Xaa Glu Ile Phe Xaa 1 5 <210> 18 <211> 9 <212> PRT <213> Artificial Sequence <220> <223> Description of Artificial Sequence: Synthetic peptide <220> <221> MOD_RES <222> (4)..(4) <223> Pyrrolysine <220> <221> MOD_RES <222> (8)..(8) <223> Leu or Ile <400> 18 Glu Phe Arg Xaa Glu Ile Phe Xaa Glu 1 5 <210> 19 <211> 9 <212> PRT <213> Artificial Sequence <220> <223> Description of Artificial Sequence: Synthetic peptide <220> <221> MOD_RES <222> (3)..(3) <223> Pyrrolysine <220> <221> MOD_RES <222> (7)..(7) <223> Leu or Ile <400> 19 Phe Arg Xaa Glu Ile Phe Xaa Glu Phe 1 5 <210> 20 <211> 7 <212> PRT <213> Artificial Sequence <220> <223> Description of Artificial Sequence: Synthetic peptide <400> 20 Ser Ile Asn Phe Glu Lys Leu 1 5 <210> 21 <211> 9 <212> PRT <213> Artificial Sequence <220> <223> Description of Artificial Sequence: Synthetic peptide <400> 21 Leu Leu Leu Leu Leu Val Val Val Val 1 5 <210> 22 <211> 9 <212> PRT <213> Artificial Sequence <220> <223> Description of 
Artificial Sequence: Synthetic peptide <400> 22 Glu Lys Leu Ala Ala Tyr Leu Leu Leu 1 5 <210> 23 <211> 10 <212> PRT <213> Artificial Sequence <220> <223> Description of Artificial Sequence: Synthetic peptide <400> 23 Lys Leu Ala Ala Tyr Leu Leu Leu Leu Leu 1 5 10 <210> 24 <211> 8 <212> PRT <213> Artificial Sequence <220> <223> Description of Artificial Sequence: Synthetic peptide <400> 24 Phe Glu Lys Leu Ala Ala Tyr Leu 1 5 <210> 25 <211> 8 <212> PRT <213> Artificial Sequence <220> <223> Description of Artificial Sequence: Synthetic peptide <400> 25 Ala Ala Tyr Leu Leu Leu Leu Leu 1 5 <210> 26 <211> 9 <212> PRT <213> Artificial Sequence <220> <223> Description of Artificial Sequence: Synthetic peptide <400> 26 Tyr Leu Leu Leu Leu Leu Val Val Val 1 5 <210> 27 <211> 10 <212> PRT <213> Artificial Sequence <220> <223> Description of Artificial Sequence: Synthetic peptide <400> 27 Val Val Val Val Ala Ala Tyr Ser Ile Asn 1 5 10 <210> 28 <211> 7 <212> PRT <213> Artificial Sequence <220> <223> Description of Artificial Sequence: Synthetic peptide <400> 28 Val Val Val Val Ala Ala Tyr 1 5 <210> 29 <211> 8 <212> PRT <213> Artificial Sequence <220> <223> Description of Artificial Sequence: Synthetic peptide <400> 29 Ala Tyr Ser Ile Asn Phe Glu Lys 1 5 <210> 30 <211> 25 <212> PRT <213> Artificial Sequence <220> <223> Description of Artificial Sequence: Synthetic peptide <400> 30 Tyr Asn Tyr Ser Tyr Trp Ile Ser Ile Phe Ala His Thr Met Trp Tyr 1 5 10 15 Asn Ile Trp His Val Gln Trp Asn Lys 20 25 <210> 31 <211> 25 <212> PRT <213> Artificial Sequence <220> <223> Description of Artificial Sequence: Synthetic peptide <400> 31 Ile Glu Ala Leu Pro Tyr Val Phe Leu Gln Asp Gln Phe Glu Leu Arg 1 5 10 15 Leu Leu Lys Gly Glu Gln Gly Asn Asn 20 25 <210> 32 <211> 25 <212> PRT <213> Artificial Sequence <220> <223> Description of Artificial Sequence: Synthetic peptide <400> 32 Asp Ser Glu Glu Thr Asn Thr Asn Tyr Leu His Tyr Cys His Phe His 1 5 10 15 Trp Thr Trp Ala Gln Gln Thr Thr Val 20 25 
<210> 33 <211> 25 <212> PRT <213> Artificial Sequence <220> <223> Description of Artificial Sequence: Synthetic peptide <400> 33 Gly Met Leu Ser Gln Tyr Glu Leu Lys Asp Cys Ser Leu Gly Phe Ser 1 5 10 15 Trp Asn Asp Pro Ala Lys Tyr Leu Arg 20 25 <210> 34 <211> 25 <212> PRT <213> Artificial Sequence <220> <223> Description of Artificial Sequence: Synthetic peptide <400> 34 Val Arg Ile Asp Lys Phe Leu Met Tyr Val Trp Tyr Ser Ala Pro Phe 1 5 10 15 Ser Ala Tyr Pro Leu Tyr Gln Asp Ala 20 25 <210> 35 <211> 25 <212> PRT <213> Artificial Sequence <220> <223> Description of Artificial Sequence: Synthetic peptide <400> 35 Cys Val His Ile Tyr Asn Asn Tyr Pro Arg Met Leu Gly Ile Pro Phe 1 5 10 15 Ser Val Met Val Ser Gly Phe Ala Met 20 25 <210> 36 <211> 25 <212> PRT <213> Artificial Sequence <220> <223> Description of Artificial Sequence: Synthetic peptide <400> 36 Phe Thr Phe Lys Gly Asn Ile Trp Ile Glu Met Ala Gly Gln Phe Glu 1 5 10 15 Arg Thr Trp Asn Tyr Pro Leu Ser Leu 20 25 <210> 37 <211> 25 <212> PRT <213> Artificial Sequence <220> <223> Description of Artificial Sequence: Synthetic peptide <400> 37 Ala Asn Asp Asp Thr Pro Asp Phe Arg Lys Cys Tyr Ile Glu Asp His 1 5 10 15 Ser Phe Arg Phe Ser Gln Thr Met Asn 20 25 <210> 38 <211> 25 <212> PRT <213> Artificial Sequence <220> <223> Description of Artificial Sequence: Synthetic peptide <400> 38 Ala Ala Gln Tyr Ile Ala Cys Met Val Asn Arg Gln Met Thr Ile Val 1 5 10 15 Tyr His Leu Thr Arg Trp Gly Met Lys 20 25 <210> 39 <211> 25 <212> PRT <213> Artificial Sequence <220> <223> Description of Artificial Sequence: Synthetic peptide <400> 39 Lys Tyr Leu Lys Glu Phe Thr Gln Leu Leu Thr Phe Val Asp Cys Tyr 1 5 10 15 Met Trp Ile Thr Phe Cys Gly Pro Asp 20 25 <210> 40 <211> 25 <212> PRT <213> Artificial Sequence <220> <223> Description of Artificial Sequence: Synthetic peptide <400> 40 Ala Met His Tyr Arg Thr Asp Ile His Gly Tyr Trp Ile Glu Tyr Arg 1 5 10 15 Gln Val Asp Asn Gln Met Trp Asn Thr 20 25 <210> 41 <211> 25 <212> PRT <213> 
Artificial Sequence <220> <223> Description of Artificial Sequence: Synthetic peptide <400> 41 Thr His Val Asn Glu His Gln Leu Glu Ala Val Tyr Arg Phe His Gln 1 5 10 15 Val His Cys Arg Phe Pro Tyr Glu Asn 20 25 <210> 42 <211> 25 <212> PRT <213> Artificial Sequence <220> <223> Description of Artificial Sequence: Synthetic peptide <400> 42 Gln Thr Phe Ser Glu Cys Leu Phe Phe His Cys Leu Lys Val Trp Asn 1 5 10 15 Asn Val Lys Tyr Ala Lys Ser Leu Lys 20 25 <210> 43 <211> 25 <212> PRT <213> Artificial Sequence <220> <223> Description of Artificial Sequence: Synthetic peptide <400> 43 Ser Phe Ser Ser Trp His Tyr Lys Glu Ser His Ile Ala Leu Leu Met 1 5 10 15 Ser Pro Lys Lys Asn His Asn Asn Thr 20 25 <210> 44 <211> 25 <212> PRT <213> Artificial Sequence <220> <223> Description of Artificial Sequence: Synthetic peptide <400> 44 Ile Leu Asp Gly Ile Met Ser Arg Trp Glu Lys Val Cys Thr Arg Gln 1 5 10 15 Thr Arg Tyr Ser Tyr Cys Gln Cys Ala 20 25 <210> 45 <211> 25 <212> PRT <213> Artificial Sequence <220> <223> Description of Artificial Sequence: Synthetic peptide <400> 45 Tyr Arg Ala Ala Gln Met Ser Lys Trp Pro Asn Lys Tyr Phe Asp Phe 1 5 10 15 Pro Glu Phe Met Ala Tyr Met Pro Ile 20 25 <210> 46 <211> 25 <212> PRT <213> Artificial Sequence <220> <223> Description of Artificial Sequence: Synthetic peptide <400> 46 Pro Arg Pro Gly Met Pro Cys Gln His His Asn Thr His Gly Leu Asn 1 5 10 15 Asp Arg Gln Ala Phe Asp Asp Phe Val 20 25 <210> 47 <211> 25 <212> PRT <213> Artificial Sequence <220> <223> Description of Artificial Sequence: Synthetic peptide <400> 47 His Asn Ile Ile Ser Asp Glu Thr Glu Val Trp Glu Gln Ala Pro His 1 5 10 15 Ile Thr Trp Val Tyr Met Trp Cys Arg 20 25 <210> 48 <211> 25 <212> PRT <213> Artificial Sequence <220> <223> Description of Artificial Sequence: Synthetic peptide <400> 48 Ala Tyr Ser Trp Pro Val Val Pro Met Lys Trp Ile Pro Tyr Arg Ala 1 5 10 15 Leu Cys Ala Asn His Pro Pro Gly Thr 20 25 <210> 49 <211> 25 <212> PRT <213> Artificial Sequence <220> <223> 
Description of Artificial Sequence: Synthetic peptide <400> 49 His Val Met Pro His Val Ala Met Asn Ile Cys Asn Trp Tyr Glu Phe 1 5 10 15 Leu Tyr Arg Ile Ser His Ile Gly Arg 20 25 <210> 50 <211> 484 <212> PRT <213> Artificial Sequence <220> <223> Description of Artificial Sequence: Synthetic polypeptide <400> 50 Thr His Val Asn Glu His Gln Leu Glu Ala Val Tyr Arg Phe His Gln 1 5 10 15 Val His Cys Arg Phe Pro Tyr Glu Asn Ala Met His Tyr Gln Met Trp 20 25 30 Asn Thr Tyr Arg Ala Ala Gln Met Ser Lys Trp Pro Asn Lys Tyr Phe 35 40 45 Asp Phe Pro Glu Phe Met Ala Tyr Met Pro Ile Cys Val His Ile Tyr 50 55 60 Asn Asn Tyr Pro Arg Met Leu Gly Ile Pro Phe Ser Val Met Val Ser 65 70 75 80 Gly Phe Ala Met Ala Tyr Ser Trp Pro Val Val Pro Met Lys Trp Ile 85 90 95 Pro Tyr Arg Ala Leu Cys Ala Asn His Pro Pro Gly Thr Ala Asn Asp 100 105 110 Asp Thr Pro Asp Phe Arg Lys Cys Tyr Ile Glu Asp His Ser Phe Arg 115 120 125 Phe Ser Gln Thr Met Asn Ile Glu Ala Leu Pro Tyr Val Phe Leu Gln 130 135 140 Asp Gln Phe Glu Leu Arg Leu Leu Lys Gly Glu Gln Gly Asn Asn Asp 145 150 155 160 Ser Glu Glu Thr Asn Thr Asn Tyr Leu His Tyr Cys His Phe His Trp 165 170 175 Thr Trp Ala Gln Gln Thr Thr Val Ile Leu Asp Gly Ile Met Ser Arg 180 185 190 Trp Glu Lys Val Cys Thr Arg Gln Thr Arg Tyr Ser Tyr Cys Gln Cys 195 200 205 Ala Phe Thr Phe Lys Gly Asn Ile Trp Ile Glu Met Ala Gly Gln Phe 210 215 220 Glu Arg Thr Trp Asn Tyr Pro Leu Ser Leu Ser Phe Ser Ser Trp His 225 230 235 240 Tyr Lys Glu Ser His Ile Ala Leu Leu Met Ser Pro Lys Lys Asn His 245 250 255 Asn Asn Thr Gln Thr Phe Ser Glu Cys Leu Phe Phe His Cys Leu Lys 260 265 270 Val Trp Asn Asn Val Lys Tyr Ala Lys Ser Leu Lys His Val Met Pro 275 280 285 His Val Ala Met Asn Ile Cys Asn Trp Tyr Glu Phe Leu Tyr Arg Ile 290 295 300 Ser His Ile Gly Arg His Asn Ile Ile Ser Asp Glu Thr Glu Val Trp 305 310 315 320 Glu Gln Ala Pro His Ile Thr Trp Val Tyr Met Trp Cys Arg Val Arg 325 330 335 Ile Asp Lys Phe Leu Met Tyr Val Trp Tyr Ser Ala Pro Phe Ser Ala 340 345 350 Tyr Pro Leu Tyr 
Gln Asp Ala Lys Tyr Leu Lys Glu Phe Thr Gln Leu 355 360 365 Leu Thr Phe Val Asp Cys Tyr Met Trp Ile Thr Phe Cys Gly Pro Asp 370 375 380 Ala Ala Gln Tyr Ile Ala Cys Met Val Asn Arg Gln Met Thr Ile Val 385 390 395 400 Tyr His Leu Thr Arg Trp Gly Met Lys Tyr Asn Tyr Ser Tyr Trp Ile 405 410 415 Ser Ile Phe Ala His Thr Met Trp Tyr Asn Ile Trp His Val Gln Trp 420 425 430 Asn Lys Gly Met Leu Ser Gln Tyr Glu Leu Lys Asp Cys Ser Leu Gly 435 440 445 Phe Ser Trp Asn Asp Pro Ala Lys Tyr Leu Arg Pro Arg Pro Gly Met 450 455 460 Pro Cys Gln His His Asn Thr His Gly Leu Asn Asp Arg Gln Ala Phe 465 470 475 480 Asp Asp Phe Val <210> 51 <211> 484 <212> PRT <213> Artificial Sequence <220> <223> Description of Artificial Sequence: Synthetic polypeptide <400> 51 Ile Glu Ala Leu Pro Tyr Val Phe Leu Gln Asp Gln Phe Glu Leu Arg 1 5 10 15 Leu Leu Lys Gly Glu Gln Gly Asn Asn Ile Leu Asp Gly Ile Met Ser 20 25 30 Arg Trp Glu Lys Val Cys Thr Arg Gln Thr Arg Tyr Ser Tyr Cys Gln 35 40 45 Cys Ala His Val Met Pro His Val Ala Met Asn Ile Cys Asn Trp Tyr 50 55 60 Glu Phe Leu Tyr Arg Ile Ser His Ile Gly Arg Thr His Val Asn Glu 65 70 75 80 His Gln Leu Glu Ala Val Tyr Arg Phe His Gln Val His Cys Arg Phe 85 90 95 Pro Tyr Glu Asn Phe Thr Phe Lys Gly Asn Ile Trp Ile Glu Met Ala 100 105 110 Gly Gln Phe Glu Arg Thr Trp Asn Tyr Pro Leu Ser Leu Ala Met His 115 120 125 Tyr Gln Met Trp Asn Thr Ser Phe Ser Ser Trp His Tyr Lys Glu Ser 130 135 140 His Ile Ala Leu Leu Met Ser Pro Lys Lys Asn His Asn Asn Thr Val 145 150 155 160 Arg Ile Asp Lys Phe Leu Met Tyr Val Trp Tyr Ser Ala Pro Phe Ser 165 170 175 Ala Tyr Pro Leu Tyr Gln Asp Ala Gln Thr Phe Ser Glu Cys Leu Phe 180 185 190 Phe His Cys Leu Lys Val Trp Asn Asn Val Lys Tyr Ala Lys Ser Leu 195 200 205 Lys Tyr Arg Ala Ala Gln Met Ser Lys Trp Pro Asn Lys Tyr Phe Asp 210 215 220 Phe Pro Glu Phe Met Ala Tyr Met Pro Ile Ala Tyr Ser Trp Pro Val 225 230 235 240 Val Pro Met Lys Trp Ile Pro Tyr Arg Ala Leu Cys Ala Asn His Pro 245 250 255 Pro Gly Thr Cys Val His Ile Tyr Asn Asn 
Tyr Pro Arg Met Leu Gly 260 265 270 Ile Pro Phe Ser Val Met Val Ser Gly Phe Ala Met His Asn Ile Ile 275 280 285 Ser Asp Glu Thr Glu Val Trp Glu Gln Ala Pro His Ile Thr Trp Val 290 295 300 Tyr Met Trp Cys Arg Ala Ala Gln Tyr Ile Ala Cys Met Val Asn Arg 305 310 315 320 Gln Met Thr Ile Val Tyr His Leu Thr Arg Trp Gly Met Lys Tyr Asn 325 330 335 Tyr Ser Tyr Trp Ile Ser Ile Phe Ala His Thr Met Trp Tyr Asn Ile 340 345 350 Trp His Val Gln Trp Asn Lys Gly Met Leu Ser Gln Tyr Glu Leu Lys 355 360 365 Asp Cys Ser Leu Gly Phe Ser Trp Asn Asp Pro Ala Lys Tyr Leu Arg 370 375 380 Lys Tyr Leu Lys Glu Phe Thr Gln Leu Leu Thr Phe Val Asp Cys Tyr 385 390 395 400 Met Trp Ile Thr Phe Cys Gly Pro Asp Ala Asn Asp Asp Thr Pro Asp 405 410 415 Phe Arg Lys Cys Tyr Ile Glu Asp His Ser Phe Arg Phe Ser Gln Thr 420 425 430 Met Asn Asp Ser Glu Glu Thr Asn Thr Asn Tyr Leu His Tyr Cys His 435 440 445 Phe His Trp Thr Trp Ala Gln Gln Thr Thr Val Pro Arg Pro Gly Met 450 455 460 Pro Cys Gln His His Asn Thr His Gly Leu Asn Asp Arg Gln Ala Phe 465 470 475 480 Asp Asp Phe Val <210> 52 <211> 25 <212> PRT <213> Artificial Sequence <220> <223> Description of Artificial Sequence: Synthetic peptide <400> 52 Ser Ser Thr Pro Tyr Leu Tyr Tyr Gly Thr Ser Ser Val Ser Tyr Gln 1 5 10 15 Phe Pro Met Val Pro Gly Gly Asp Arg 20 25 <210> 53 <211> 25 <212> PRT <213> Artificial Sequence <220> <223> Description of Artificial Sequence: Synthetic peptide <400> 53 Glu Met Ala Gly Lys Ile Asp Leu Leu Arg Asp Ser Tyr Ile Phe Gln 1 5 10 15 Leu Phe Trp Arg Glu Ala Ala Glu Pro 20 25 <210> 54 <211> 25 <212> PRT <213> Artificial Sequence <220> <223> Description of Artificial Sequence: Synthetic peptide <400> 54 Ala Leu Lys Gln Arg Thr Trp Gln Ala Leu Ala His Lys Tyr Asn Ser 1 5 10 15 Gln Pro Ser Val Ser Leu Arg Asp Phe 20 25 <210> 55 <211> 25 <212> PRT <213> Artificial Sequence <220> <223> Description of Artificial Sequence: Synthetic peptide <400> 55 Val Ser Ser His Ser Ser Gln Ala Thr Lys Asp Ser Ala Val Gly Leu 1 5 10 15 Lys Tyr 
Ser Ala Ser Thr Pro Val Arg 20 25 <210> 56 <211> 25 <212> PRT <213> Artificial Sequence <220> <223> Description of Artificial Sequence: Synthetic peptide <400> 56 Lys Glu Ala Ile Asp Ala Trp Ala Pro Tyr Leu Pro Glu Tyr Ile Asp 1 5 10 15 His Val Ile Ser Pro Gly Val Thr Ser 20 25 <210> 57 <211> 25 <212> PRT <213> Artificial Sequence <220> <223> Description of Artificial Sequence: Synthetic peptide <400> 57 Ser Pro Val Ile Thr Ala Pro Pro Ser Ser Pro Val Phe Asp Thr Ser 1 5 10 15 Asp Ile Arg Lys Glu Pro Met Asn Ile 20 25 <210> 58 <211> 25 <212> PRT <213> Artificial Sequence <220> <223> Description of Artificial Sequence: Synthetic peptide <400> 58 Pro Ala Glu Val Ala Glu Gln Tyr Ser Glu Lys Leu Val Tyr Met Pro 1 5 10 15 His Thr Phe Phe Ile Gly Asp His Ala 20 25 <210> 59 <211> 22 <212> PRT <213> Artificial Sequence <220> <223> Description of Artificial Sequence: Synthetic peptide <400> 59 Met Ala Asp Leu Asp Lys Leu Asn Ile His Ser Ile Ile Gln Arg Leu 1 5 10 15 Leu Glu Val Arg Gly Ser 20 <210> 60 <211> 25 <212> PRT <213> Artificial Sequence <220> <223> Description of Artificial Sequence: Synthetic peptide <400> 60 Ala Ala Ala Tyr Asn Glu Lys Ser Gly Arg Ile Thr Leu Leu Ser Leu 1 5 10 15 Leu Phe Gln Lys Val Phe Ala Gln Ile 20 25 <210> 61 <211> 25 <212> PRT <213> Artificial Sequence <220> <223> Description of Artificial Sequence: Synthetic peptide <400> 61 Lys Ile Glu Glu Val Arg Asp Ala Met Glu Asn Glu Ile Arg Thr Gln 1 5 10 15 Leu Arg Arg Gln Ala Ala Ala His Thr 20 25 <210> 62 <211> 25 <212> PRT <213> Artificial Sequence <220> <223> Description of Artificial Sequence: Synthetic peptide <400> 62 Asp Arg Gly His Tyr Val Leu Cys Asp Phe Gly Ser Thr Thr Asn Lys 1 5 10 15 Phe Gln Asn Pro Gln Thr Glu Gly Val 20 25 <210> 63 <211> 25 <212> PRT <213> Artificial Sequence <220> <223> Description of Artificial Sequence: Synthetic peptide <400> 63 Gln Val Asp Asn Arg Lys Ala Glu Ala Glu Glu Ala Ile Lys Arg Leu 1 5 10 15 Ser Tyr Ile Ser Gln Lys Val Ser Asp 20 25 <210> 64 <211> 25 
<212> PRT <213> Artificial Sequence <220> <223> Description of Artificial Sequence: Synthetic peptide <400> 64 Cys Leu Ser Asp Ala Gly Val Arg Lys Met Thr Ala Ala Val Arg Val 1 5 10 15 Met Lys Arg Gly Leu Glu Asn Leu Thr 20 25 <210> 65 <211> 25 <212> PRT <213> Artificial Sequence <220> <223> Description of Artificial Sequence: Synthetic peptide <400> 65 Leu Pro Pro Arg Ser Leu Pro Ser Asp Pro Phe Ser Gln Val Pro Ala 1 5 10 15 Ser Pro Gln Ser Gln Ser Ser Ser Gln 20 25 <210> 66 <211> 25 <212> PRT <213> Artificial Sequence <220> <223> Description of Artificial Sequence: Synthetic peptide <400> 66 Glu Leu Val Leu Glu Asp Leu Gln Asp Gly Asp Val Lys Met Gly Gly 1 5 10 15 Ser Phe Arg Gly Ala Phe Ser Asn Ser 20 25 <210> 67 <211> 25 <212> PRT <213> Artificial Sequence <220> <223> Description of Artificial Sequence: Synthetic peptide <400> 67 Val Thr Met Asp Gly Val Arg Glu Glu Asp Leu Ala Ser Phe Ser Leu 1 5 10 15 Arg Lys Arg Trp Glu Ser Glu Pro His 20 25 <210> 68 <211> 25 <212> PRT <213> Artificial Sequence <220> <223> Description of Artificial Sequence: Synthetic peptide <400> 68 Ile Val Gly Val Met Phe Phe Glu Arg Ala Phe Asp Glu Gly Ala Asp 1 5 10 15 Ala Ile Tyr Asp His Ile Asn Glu Gly 20 25 <210> 69 <211> 25 <212> PRT <213> Artificial Sequence <220> <223> Description of Artificial Sequence: Synthetic peptide <400> 69 Thr Val Thr Pro Thr Pro Thr Pro Thr Gly Thr Gln Ser Pro Thr Pro 1 5 10 15 Thr Pro Ile Thr Thr Thr Thr Thr Val 20 25 <210> 70 <211> 25 <212> PRT <213> Artificial Sequence <220> <223> Description of Artificial Sequence: Synthetic peptide <400> 70 Gln Glu Glu Met Pro Pro Arg Pro Cys Gly Gly His Thr Ser Ser Ser 1 5 10 15 Leu Pro Lys Ser His Leu Glu Pro Ser 20 25 <210> 71 <211> 21 <212> PRT <213> Artificial Sequence <220> <223> Description of Artificial Sequence: Synthetic peptide <400> 71 Pro Asn Ile Gln Ala Val Leu Leu Pro Lys Lys Thr Asp Ser His His 1 5 10 15 Lys Ala Lys Gly Lys 20 <210> 72 <211> 18 <212> PRT <213> Artificial Sequence <220> <223> 
Description of Artificial Sequence: Synthetic peptide <400> 72 Tyr Glu Met Phe Asn Asp Lys Ser Phe Gln Arg Ala Pro Asp Asp Lys 1 5 10 15 Met Phe <210> 73 <211> 9 <212> PRT <213> Artificial Sequence <220> <223> Description of Artificial Sequence: Synthetic peptide <220> <221> MOD_RES <222> (6)..(6) <223> Selenocysteine <220> <221> MOD_RES <222> (7)..(8) <223> Pyrrolysine <400> 73 Phe Glu Gly Arg Lys Xaa Xaa Xaa Ile 1 5 <210> 74 <211> 14 <212> PRT <213> Artificial Sequence <220> <223> Description of Artificial Sequence: Synthetic peptide <220> <221> MOD_RES <222> (2)..(2) <223> Leu or Ile <220> <221> MOD_RES <222> (5)..(5) <223> Pyrrolysine <220> <221> MOD_RES <222> (7)..(7) <223> Leu or Ile <220> <221> MOD_RES <222> (8)..(8) <223> Pyrrolysine <220> <221> MOD_RES <222> (10)..(10) <223> Leu or Ile <220> <221> MOD_RES <222> (14)..(14) <223> Pyrrolysine <400> 74 Pro Xaa Phe Ile Xaa Glu Xaa Xaa Ile Xaa Gly Glu Ile Xaa 1 5 10 <210> 75 <211> 19 <212> PRT <213> Artificial Sequence <220> <223> Description of Artificial Sequence: Synthetic peptide <400> 75 Ser Ile Asn Phe Glu Lys Leu Ala Ala Tyr Leu Leu Leu Leu Leu Val 1 5 10 15 Val Val Val <210> 76 <211> 19 <212> PRT <213> Artificial Sequence <220> <223> Description of Artificial Sequence: Synthetic peptide <400> 76 Leu Leu Leu Leu Leu Val Val Val Val Ala Ala Tyr Ser Ile Asn Phe 1 5 10 15 Glu Lys Leu
Thr Phe Val Asp Cys Tyr Met Trp Ile Thr Phe Cys Gly Pro Asp 370 375 380 Ala Ala Gln Tyr Ile Ala Cys Met Val Asn Arg Gln Met Thr Ile Val 385 390 395 400 Tyr His Leu Thr Arg Trp Gly Met Lys Tyr Asn Tyr Ser Tyr Trp Ile 405 410 415 Ser Ile Phe Ala His Thr Met Trp Tyr Asn Ile Trp His Val Gln Trp 420 425 430 Asn Lys Gly Met Leu Ser Gln Tyr Glu Leu Lys Asp Cys Ser Leu Gly 435 440 445 Phe Ser Trp Asn Asp Pro Ala Lys Tyr Leu Arg Pro Arg Pro Gly Met 450 455 460 Pro Cys Gln His His Asn Thr His Gly Leu Asn Asp Arg Gln Ala Phe 465 470 475 480 Asp Asp Phe Val <210> 51 <211> 484 <212> PRT <213> Artificial Sequence <220> <223> Description of Artificial Sequence: Synthetic polypeptide <400> 51 Ile Glu Ala Leu Pro Tyr Val Phe Leu Gln Asp Gln Phe Glu Leu Arg 1 5 10 15 Leu Leu Lys Gly Glu Gln Gly Asn Asn Ile Leu Asp Gly Ile Met Ser 20 25 30 Arg Trp Glu Lys Val Cys Thr Arg Gln Thr Arg Tyr Ser Tyr Cys Gln 35 40 45 Cys Ala His Val Met Pro His Val Ala Met Asn Ile Cys Asn Trp Tyr 50 55 60 Glu Phe Leu Tyr Arg Ile Ser His Ile Gly Arg Thr His Val Asn Glu 65 70 75 80 His Gln Leu Glu Ala Val Tyr Arg Phe His Gln Val His Cys Arg Phe 85 90 95 Pro Tyr Glu Asn Phe Thr Phe Lys Gly Asn Ile Trp Ile Glu Met Ala 100 105 110 Gly Gln Phe Glu Arg Thr Trp Asn Tyr Pro Leu Ser Leu Ala Met His 115 120 125 Tyr Gln Met Trp Asn Thr Ser Phe Ser Ser Trp His Tyr Lys Glu Ser 130 135 140 His Ile Ala Leu Leu Met Ser Pro Lys Lys Asn His Asn Asn Thr Val 145 150 155 160 Arg Ile Asp Lys Phe Leu Met Tyr Val Trp Tyr Ser Ala Pro Phe Ser 165 170 175 Ala Tyr Pro Leu Tyr Gln Asp Ala Gln Thr Phe Ser Glu Cys Leu Phe 180 185 190 Phe His Cys Leu Lys Val Trp Asn Asn Val Lys Tyr Ala Lys Ser Leu 195 200 205 Lys Tyr Arg Ala Ala Gln Met Ser Lys Trp Pro Asn Lys Tyr Phe Asp 210 215 220 Phe Pro Glu Phe Met Ala Tyr Met Pro Ile Ala Tyr Ser Trp Pro Val 225 230 235 240 Val Pro Met Lys Trp Ile Pro Tyr Arg Ala Leu Cys Ala Asn His Pro 245 250 255 Pro Gly Thr Cys Val His Ile Tyr Asn Asn Tyr Pro Arg Met Leu Gly 260 265 270 Ile Pro Phe Ser Val Met Val 
Ser Gly Phe Ala Met His Asn Ile Ile 275 280 285 Ser Asp Glu Thr Glu Val Trp Glu Gln Ala Pro His Ile Thr Trp Val 290 295 300 Tyr Met Trp Cys Arg Ala Ala Gln Tyr Ile Ala Cys Met Val Asn Arg 305 310 315 320 Gln Met Thr Ile Val Tyr His Leu Thr Arg Trp Gly Met Lys Tyr Asn 325 330 335 Tyr Ser Tyr Trp Ile Ser Ile Phe Ala His Thr Met Trp Tyr Asn Ile 340 345 350 Trp His Val Gln Trp Asn Lys Gly Met Leu Ser Gln Tyr Glu Leu Lys 355 360 365 Asp Cys Ser Leu Gly Phe Ser Trp Asn Asp Pro Ala Lys Tyr Leu Arg 370 375 380 Lys Tyr Leu Lys Glu Phe Thr Gln Leu Leu Thr Phe Val Asp Cys Tyr 385 390 395 400 Met Trp Ile Thr Phe Cys Gly Pro Asp Ala Asn Asp Asp Thr Pro Asp 405 410 415 Phe Arg Lys Cys Tyr Ile Glu Asp His Ser Phe Arg Phe Ser Gln Thr 420 425 430 Met Asn Asp Ser Glu Glu Thr Asn Thr Asn Tyr Leu His Tyr Cys His 435 440 445 Phe His Trp Thr Trp Ala Gln Gln Thr Thr Val Pro Arg Pro Gly Met 450 455 460 Pro Cys Gln His His Asn Thr His Gly Leu Asn Asp Arg Gln Ala Phe 465 470 475 480 Asp Asp Phe Val <210> 52 <211> 25 <212> PRT <213> Artificial Sequence <220> <223> Description of Artificial Sequence: Synthetic peptide <400> 52 Ser Ser Thr Pro Tyr Leu Tyr Tyr Gly Thr Ser Ser Val Ser Tyr Gln 1 5 10 15 Phe Pro Met Val Pro Gly Gly Asp Arg 20 25 <210> 53 <211> 25 <212> PRT <213> Artificial Sequence <220> <223> Description of Artificial Sequence: Synthetic peptide <400> 53 Glu Met Ala Gly Lys Ile Asp Leu Leu Arg Asp Ser Tyr Ile Phe Gln 1 5 10 15 Leu Phe Trp Arg Glu Ala Ala Glu Pro 20 25 <210> 54 <211> 25 <212> PRT <213> Artificial Sequence <220> <223> Description of Artificial Sequence: Synthetic peptide <400> 54 Ala Leu Lys Gln Arg Thr Trp Gln Ala Leu Ala His Lys Tyr Asn Ser 1 5 10 15 Gln Pro Ser Val Ser Leu Arg Asp Phe 20 25 <210> 55 <211> 25 <212> PRT <213> Artificial Sequence <220> <223> Description of Artificial Sequence: Synthetic peptide <400> 55 Val Ser Ser His Ser Ser Gln Ala Thr Lys Asp Ser Ala Val Gly Leu 1 5 10 15 Lys Tyr Ser Ala Ser Thr Pro Val Arg 20 25 <210> 56 <211> 25 <212> PRT 
<213> Artificial Sequence <220> <223> Description of Artificial Sequence: Synthetic peptide <400> 56 Lys Glu Ala Ile Asp Ala Trp Ala Pro Tyr Leu Pro Glu Tyr Ile Asp 1 5 10 15 His Val Ile Ser Pro Gly Val Thr Ser 20 25 <210> 57 <211> 25 <212> PRT <213> Artificial Sequence <220> <223> Description of Artificial Sequence: Synthetic peptide <400> 57 Ser Pro Val Ile Thr Ala Pro Pro Ser Ser Pro Val Phe Asp Thr Ser 1 5 10 15 Asp Ile Arg Lys Glu Pro Met Asn Ile 20 25 <210> 58 <211> 25 <212> PRT <213> Artificial Sequence <220> <223> Description of Artificial Sequence: Synthetic peptide <400> 58 Pro Ala Glu Val Ala Glu Gln Tyr Ser Glu Lys Leu Val Tyr Met Pro 1 5 10 15 His Thr Phe Phe Ile Gly Asp His Ala 20 25 <210> 59 <211> 22 <212> PRT <213> Artificial Sequence <220> <223> Description of Artificial Sequence: Synthetic peptide <400> 59 Met Ala Asp Leu Asp Lys Leu Asn Ile His Ser Ile Ile Gln Arg Leu 1 5 10 15 Leu Glu Val Arg Gly Ser 20 <210> 60 <211> 25 <212> PRT <213> Artificial Sequence <220> <223> Description of Artificial Sequence: Synthetic peptide <400> 60 Ala Ala Ala Tyr Asn Glu Lys Ser Gly Arg Ile Thr Leu Leu Ser Leu 1 5 10 15 Leu Phe Gln Lys Val Phe Ala Gln Ile 20 25 <210> 61 <211> 25 <212> PRT <213> Artificial Sequence <220> <223> Description of Artificial Sequence: Synthetic peptide <400> 61 Lys Ile Glu Glu Val Arg Asp Ala Met Glu Asn Glu Ile Arg Thr Gln 1 5 10 15 Leu Arg Arg Gln Ala Ala Ala His Thr 20 25 <210> 62 <211> 25 <212> PRT <213> Artificial Sequence <220> <223> Description of Artificial Sequence: Synthetic peptide <400> 62 Asp Arg Gly His Tyr Val Leu Cys Asp Phe Gly Ser Thr Thr Asn Lys 1 5 10 15 Phe Gln Asn Pro Gln Thr Glu Gly Val 20 25 <210> 63 <211> 25 <212> PRT <213> Artificial Sequence <220> <223> Description of Artificial Sequence: Synthetic peptide <400> 63 Gln Val Asp Asn Arg Lys Ala Glu Ala Glu Glu Ala Ile Lys Arg Leu 1 5 10 15 Ser Tyr Ile Ser Gln Lys Val Ser Asp 20 25 <210> 64 <211> 25 <212> PRT <213> Artificial Sequence <220> <223> Description of 
Artificial Sequence: Synthetic peptide <400> 64 Cys Leu Ser Asp Ala Gly Val Arg Lys Met Thr Ala Ala Val Arg Val 1 5 10 15 Met Lys Arg Gly Leu Glu Asn Leu Thr 20 25 <210> 65 <211> 25 <212> PRT <213> Artificial Sequence <220> <223> Description of Artificial Sequence: Synthetic peptide <400> 65 Leu Pro Pro Arg Ser Leu Pro Ser Asp Pro Phe Ser Gln Val Pro Ala 1 5 10 15 Ser Pro Gln Ser Gln Ser Ser Ser Gln 20 25 <210> 66 <211> 25 <212> PRT <213> Artificial Sequence <220> <223> Description of Artificial Sequence: Synthetic peptide <400> 66 Glu Leu Val Leu Glu Asp Leu Gln Asp Gly Asp Val Lys Met Gly Gly 1 5 10 15 Ser Phe Arg Gly Ala Phe Ser Asn Ser 20 25 <210> 67 <211> 25 <212> PRT <213> Artificial Sequence <220> <223> Description of Artificial Sequence: Synthetic peptide <400> 67 Val Thr Met Asp Gly Val Arg Glu Glu Asp Leu Ala Ser Phe Ser Leu 1 5 10 15 Arg Lys Arg Trp Glu Ser Glu Pro His 20 25 <210> 68 <211> 25 <212> PRT <213> Artificial Sequence <220> <223> Description of Artificial Sequence: Synthetic peptide <400> 68 Ile Val Gly Val Met Phe Phe Glu Arg Ala Phe Asp Glu Gly Ala Asp 1 5 10 15 Ala Ile Tyr Asp His Ile Asn Glu Gly 20 25 <210> 69 <211> 25 <212> PRT <213> Artificial Sequence <220> <223> Description of Artificial Sequence: Synthetic peptide <400> 69 Thr Val Thr Pro Thr Pro Thr Pro Thr Gly Thr Gln Ser Pro Thr Pro 1 5 10 15 Thr Pro Ile Thr Thr Thr Thr Thr Val 20 25 <210> 70 <211> 25 <212> PRT <213> Artificial Sequence <220> <223> Description of Artificial Sequence: Synthetic peptide <400> 70 Gln Glu Glu Met Pro Pro Arg Pro Cys Gly Gly His Thr Ser Ser Ser 1 5 10 15 Leu Pro Lys Ser His Leu Glu Pro Ser 20 25 <210> 71 <211> 21 <212> PRT <213> Artificial Sequence <220> <223> Description of Artificial Sequence: Synthetic peptide <400> 71 Pro Asn Ile Gln Ala Val Leu Leu Pro Lys Lys Thr Asp Ser His His 1 5 10 15 Lys Ala Lys Gly Lys 20 <210> 72 <211> 18 <212> PRT <213> Artificial Sequence <220> <223> Description of Artificial Sequence: Synthetic peptide <400> 72 Tyr Glu 
Met Phe Asn Asp Lys Ser Phe Gln Arg Ala Pro Asp Asp Lys 1 5 10 15 Met Phe <210> 73 <211> 9 <212> PRT <213> Artificial Sequence <220> <223> Description of Artificial Sequence: Synthetic peptide <220> <221> MOD_RES <222> (6)..(6) <223> Selenocysteine <220> <221> MOD_RES <222> (7)..(8) <223> Pyrrolysine <400> 73 Phe Glu Gly Arg Lys Xaa Xaa Xaa Ile 1 5 <210> 74 <211> 14 <212> PRT <213> Artificial Sequence <220> <223> Description of Artificial Sequence: Synthetic peptide <220> <221> MOD_RES <222> (2)..(2) <223> Leu or Ile <220> <221> MOD_RES <222> (5)..(5) <223> Pyrrolysine <220> <221> MOD_RES <222> (7)..(7) <223> Leu or Ile <220> <221> MOD_RES <222> (8)..(8) <223> Pyrrolysine <220> <221> MOD_RES <222> (10)..(10) <223> Leu or Ile <220> <221> MOD_RES <222> (14)..(14) <223> Pyrrolysine <400> 74 Pro Xaa Phe Ile Xaa Glu Xaa Xaa Ile Xaa Gly Glu Ile Xaa 1 5 10 <210> 75 <211> 19 <212> PRT <213> Artificial Sequence <220> <223> Description of Artificial Sequence: Synthetic peptide <400> 75 Ser Ile Asn Phe Glu Lys Leu Ala Ala Tyr Leu Leu Leu Leu Leu Val 1 5 10 15 Val Val Val <210> 76 <211> 19 <212> PRT <213> Artificial Sequence <220> <223> Description of Artificial Sequence: Synthetic peptide <400> 76 Leu Leu Leu Leu Leu Val Val Val Val Ala Ala Tyr Ser Ile Asn Phe 1 5 10 15 Glu Lys Leu

Claims

A method of identifying a cassette sequence for a neoantigen vaccine comprising the following steps:
Obtaining, for a subject, at least one of exome, transcriptome, or whole-genome tumor nucleotide sequencing data from tumor cells and normal cells of the subject, wherein the nucleotide sequencing data is used to obtain data representing the peptide sequence of each of a set of neoantigens identified by comparing the nucleotide sequencing data from the tumor cells with the nucleotide sequencing data from the normal cells, wherein the peptide sequence of each neoantigen comprises at least one alteration that distinguishes it from the corresponding wild-type parental peptide sequence identified from the normal cells of the subject, and includes information regarding a plurality of amino acids that make up the peptide sequence and a set of positions of the amino acids in the peptide sequence;
Inputting, using a computer processor, the peptide sequences of the neoantigens into a machine-learned presentation model to generate a set of numerical presentation likelihoods for the set of neoantigens, each presentation likelihood in the set representing the likelihood that the corresponding neoantigen is presented by one or more MHC alleles on the surface of the tumor cells of the subject, the machine-learned presentation model comprising:
A plurality of parameters identified based on at least a training data set comprising:
For each sample in the sample set, a label obtained by mass spectrometry measuring the presence of a peptide bound to at least one MHC allele of the MHC allele set identified as present in the sample;
For each of the samples, training peptide sequences including information regarding a plurality of amino acids that make up the training peptide sequences and a set of positions of the amino acids in the training peptide sequences; and
A function representing the relationship between the peptide sequence of the neoantigen received as input and the presentation likelihood generated as output;
Identifying, for the subject, a therapeutic subset of neoantigens from the set of neoantigens, the therapeutic subset corresponding to a predetermined number of neoantigens having presentation likelihoods above a predetermined threshold; and
Identifying a cassette sequence comprising a sequence of concatenated therapeutic epitopes, each therapeutic epitope comprising the peptide sequence of a corresponding neoantigen in the therapeutic subset of neoantigens, wherein the cassette sequence is identified based on the presentation of one or more junction epitopes spanning the corresponding junctions between one or more adjacent pairs of the therapeutic epitopes.

The method of claim 1, wherein the presentation of the one or more junction epitopes is determined based on presentation likelihoods generated by inputting the sequences of the one or more junction epitopes into the machine-learned presentation model.

The method of claim 1, wherein the presentation of the one or more junction epitopes is determined based on predicted binding affinity between the one or more junction epitopes and one or more MHC alleles of the subject.

The method of claim 1, wherein the presentation of the one or more junction epitopes is determined based on predicted binding stability of the one or more junction epitopes.

The method of claim 1, wherein the one or more junction epitopes comprise a junction epitope that overlaps the sequence of a first therapeutic epitope and the sequence of a second therapeutic epitope concatenated after the first therapeutic epitope.

The method of claim 1, wherein a linker sequence is located between a first therapeutic epitope and a second therapeutic epitope concatenated after the first therapeutic epitope, and the one or more junction epitopes comprise a junction epitope overlapping the linker sequence.

The method of claim 1, wherein identifying the cassette sequence further comprises the following steps:
For each ordered pair of therapeutic epitopes, determining a set of junction epitopes spanning the junction between the ordered pair of therapeutic epitopes; and
For each ordered pair of therapeutic epitopes, determining a distance metric representing the presentation of the set of junction epitopes for the ordered pair on one or more MHC alleles of the subject.
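
The two steps of this claim can be illustrated with a minimal sketch. It assumes that junction epitopes are the k-mers of the concatenated pair that contain residues from both epitopes, and that the distance metric is the sum of their presentation likelihoods; the claim itself does not fix either choice. The names `junction_epitopes`, `distance_metric`, and `toy_likelihood` are hypothetical, and `toy_likelihood` is a stand-in for a presentation model, not a real predictor.

```python
from typing import Callable, Iterable, List


def junction_epitopes(first: str, second: str,
                      lengths: Iterable[int] = (8, 9, 10, 11)) -> List[str]:
    """Return every k-mer of first+second that contains residues from both epitopes."""
    joined = first + second
    junction = len(first)  # index of the first residue of `second`
    spanning = []
    for k in lengths:
        # A k-mer starting at s spans the junction when s < junction < s + k.
        start_min = max(0, junction - k + 1)
        start_max = min(junction - 1, len(joined) - k)
        for s in range(start_min, start_max + 1):
            spanning.append(joined[s:s + k])
    return spanning


def distance_metric(first: str, second: str,
                    likelihood: Callable[[str], float],
                    lengths: Iterable[int] = (8, 9, 10, 11)) -> float:
    """Assumed aggregation: sum of likelihoods over the junction epitopes."""
    return sum(likelihood(p) for p in junction_epitopes(first, second, lengths))


def toy_likelihood(peptide: str) -> float:
    """Placeholder score (fraction of hydrophobic residues); not a real model."""
    return sum(residue in "AILMFVWY" for residue in peptide) / len(peptide)


if __name__ == "__main__":
    print(junction_epitopes("ACDE", "FGHI", lengths=(3,)))
    print(distance_metric("ACDE", "FGHI", toy_likelihood, lengths=(3,)))
```

Because the metric is computed for an ordered pair, `distance_metric(a, b, ...)` and `distance_metric(b, a, ...)` generally differ, which is why the pairs are treated as ordered.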

The method of claim 1, wherein the method comprises the following steps:
Generating a set of candidate cassette sequences corresponding to different orderings of the therapeutic epitopes;
For each candidate cassette sequence, determining a presentation score for the candidate cassette sequence based on the distance metric for each ordered pair of therapeutic epitopes in the candidate cassette sequence; and
Selecting a candidate cassette sequence associated with a presentation score below a predetermined threshold as the cassette sequence for the neoantigen vaccine.
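
A minimal sketch of the generate, score, and select steps is given below. It assumes that candidates are random permutations of the epitope indices and that the presentation score is the sum of the distance metrics over adjacent pairs; both are illustrative assumptions. The function names and the example matrix are hypothetical.

```python
import random
from typing import List, Optional, Sequence, Tuple


def presentation_score(order: Sequence[int],
                       D: Sequence[Sequence[float]]) -> float:
    """Assumed score: sum of D[k][m] over each adjacent ordered pair (k, m)."""
    return sum(D[k][m] for k, m in zip(order, order[1:]))


def select_cassette(D: Sequence[Sequence[float]], n_candidates: int,
                    threshold: float,
                    seed: int = 0) -> Optional[Tuple[List[int], float]]:
    """Sample random orderings; return the lowest-scoring one below threshold."""
    rng = random.Random(seed)
    best: Optional[Tuple[List[int], float]] = None
    for _ in range(n_candidates):
        order = list(range(len(D)))
        rng.shuffle(order)  # one randomly generated candidate ordering
        score = presentation_score(order, D)
        if score < threshold and (best is None or score < best[1]):
            best = (order, score)
    return best  # None when no sampled candidate falls below the threshold


if __name__ == "__main__":
    # Made-up distance matrix for three epitopes, for illustration only.
    D = [[0, 1, 5],
         [5, 0, 1],
         [1, 5, 0]]
    print(select_cassette(D, n_candidates=200, threshold=3))
```

Random sampling gives no guarantee of finding the best ordering; it only returns the best candidate among those sampled.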

The method of claim 8, wherein the set of candidate cassette sequences is randomly generated.

The method of claim 7, wherein the method of identifying the cassette sequence further comprises the following steps:
Solving for the values of x_km in the following optimization problem:

wherein v corresponds to the predetermined number of neoantigens, k corresponds to a therapeutic epitope, m corresponds to an adjacent therapeutic epitope concatenated after the therapeutic epitope, and P is a path matrix given by:

wherein D is a v x v matrix in which element D(k, m) represents the distance metric of the ordered pair k, m of therapeutic epitopes; and
Selecting the cassette sequence based on the solved values of x_km.
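
The optimization formula and the path matrix P are not reproduced in this text, so the following is not the claimed formulation. It is a minimal sketch under the assumption that x_km is a binary indicator equal to 1 when epitope m is concatenated directly after epitope k, and that the objective is the total distance metric over adjacent pairs. Exhaustive search is used here only because it is easy to verify; it is practical for a handful of epitopes. The name `solve_ordering` is hypothetical.

```python
from itertools import permutations
from typing import List, Sequence, Tuple


def solve_ordering(
    D: Sequence[Sequence[float]],
) -> Tuple[List[int], List[List[int]], float]:
    """Exhaustively find the ordering that minimizes the summed distance metric."""
    v = len(D)
    best_order: Tuple[int, ...] = ()
    best_cost = float("inf")
    for order in permutations(range(v)):
        cost = sum(D[k][m] for k, m in zip(order, order[1:]))
        if cost < best_cost:
            best_order, best_cost = order, cost
    # Encode the chosen ordering as indicators: x[k][m] = 1 if m follows k.
    x = [[0] * v for _ in range(v)]
    for k, m in zip(best_order, best_order[1:]):
        x[k][m] = 1
    return list(best_order), x, best_cost


if __name__ == "__main__":
    # Made-up distance matrix for three epitopes, for illustration only.
    D = [[0, 1, 5],
         [5, 0, 1],
         [1, 5, 0]]
    order, x, cost = solve_ordering(D)
    print(order, cost)
    for row in x:
        print(row)
```

The search visits v! orderings, so a larger cassette would need an integer programming or travelling-salesman-style solver instead; that substitution is a design choice outside what this text specifies.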

The method of claim 1, further comprising manufacturing or having manufactured a tumor vaccine comprising the cassette sequence.

A method of identifying a cassette sequence for a neoantigen vaccine comprising the following steps:
Obtaining, for a subject, at least one of exome, transcriptome, or whole-genome tumor nucleotide sequencing data from tumor cells and normal cells of the subject, wherein the nucleotide sequencing data is used to obtain data representing the peptide sequence of each of a set of neoantigens identified by comparing the nucleotide sequencing data from the tumor cells with the nucleotide sequencing data from the normal cells, wherein the peptide sequence of each neoantigen comprises at least one alteration that distinguishes it from the corresponding wild-type parental peptide sequence identified from the normal cells of the subject, and includes information regarding a plurality of amino acids that make up the peptide sequence and a set of positions of the amino acids in the peptide sequence;
Identifying, for the subject, a therapeutic subset of neoantigens from the set of neoantigens; and
Identifying a cassette sequence comprising a sequence of concatenated therapeutic epitopes, each therapeutic epitope comprising the peptide sequence of a corresponding neoantigen in the therapeutic subset of neoantigens, wherein the cassette sequence is identified based on the presentation of one or more junction epitopes spanning the corresponding junctions between one or more adjacent pairs of the therapeutic epitopes.

The method of claim 12, wherein the presentation of the one or more junction epitopes is determined based on presentation likelihoods generated by inputting the sequences of the one or more junction epitopes into a machine-learned presentation model, the presentation likelihoods indicating the likelihood that the one or more junction epitopes are presented by one or more MHC alleles of the patient on the surface of the tumor cells of the patient, the set of presentation likelihoods having been identified at least based on received mass spectrometry data.

The method of claim 12, wherein the presentation of the one or more junction epitopes is determined based on predicted binding affinity between the one or more junction epitopes and one or more MHC alleles of the subject.

The method of claim 12, wherein the presentation of the one or more junction epitopes is determined based on predicted binding stability of the one or more junction epitopes.

The method of claim 12, wherein the one or more junction epitopes comprise a junction epitope that overlaps the sequence of a first therapeutic epitope and the sequence of a second therapeutic epitope concatenated after the first therapeutic epitope.

The method of claim 12, wherein a linker sequence is located between a first therapeutic epitope and a second therapeutic epitope concatenated after the first therapeutic epitope, and the one or more junction epitopes comprise a junction epitope overlapping the linker sequence.

The method of claim 12, wherein identifying the cassette sequence comprises the following steps:
For each ordered pair of therapeutic epitopes, determining a set of junction epitopes spanning the junction between the ordered pair of therapeutic epitopes; and
For each ordered pair of therapeutic epitopes, determining a distance metric representing the presentation of the set of junction epitopes for the ordered pair on one or more MHC alleles of the subject.

The method of claim 12, wherein identifying the cassette sequence comprises the following steps:
Generating a set of candidate cassette sequences corresponding to different orderings of the therapeutic epitopes;
For each candidate cassette sequence, determining a presentation score for the candidate cassette sequence based on the distance metric for each ordered pair of therapeutic epitopes in the candidate cassette sequence; and
Selecting a candidate cassette sequence associated with a presentation score below a predetermined threshold as the cassette sequence for the neoantigen vaccine.

The method of claim 19, wherein the set of candidate cassette sequences is randomly generated.

The method of claim 18, wherein identifying the cassette sequence further comprises the following step:
Solving for the values of x_km in the following optimization problem:

The method of claim 12, further comprising manufacturing or having manufactured a tumor vaccine comprising the cassette sequence.

A method of identifying a cassette sequence for a neoantigen vaccine comprising the following steps:
Obtaining peptide sequences for a therapeutic subset of shared antigens or a therapeutic subset of shared neoantigens for treating a plurality of subjects, the therapeutic subset corresponding to a predetermined number of peptide sequences having presentation likelihoods above a predetermined threshold; and
Identifying a cassette sequence comprising a sequence of concatenated therapeutic epitopes, each comprising a corresponding peptide sequence in the therapeutic subset of shared antigens or the therapeutic subset of shared neoantigens, wherein identifying the cassette sequence comprises the following steps:
For each ordered pair of therapeutic epitopes, determining a set of junction epitopes spanning the junction between the ordered pair of therapeutic epitopes; and
For each ordered pair of therapeutic epitopes, determining a distance metric representing the presentation of the set of junction epitopes for the ordered pair, wherein the distance metric is determined as a combination of a set of weights, each representing the prevalence of a corresponding MHC allele, with corresponding sub-distance metrics, each representing the presentation likelihood of the set of junction epitopes on that MHC allele.
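
The prevalence-weighted combination in this claim can be sketched as follows, assuming the combination is a weighted sum of per-allele sub-distance metrics. The claim only says "a combination", so the weighted sum is an assumption. The function name, allele labels, and numbers are hypothetical placeholders, not real prevalence data.

```python
from typing import Dict


def weighted_distance(sub_distances: Dict[str, float],
                      prevalence: Dict[str, float]) -> float:
    """Combine per-allele sub-distance metrics using allele prevalence weights."""
    # Alleles without a prevalence entry contribute nothing (weight 0.0).
    return sum(prevalence.get(allele, 0.0) * d
               for allele, d in sub_distances.items())


if __name__ == "__main__":
    # Placeholder allele labels and made-up values, for illustration only.
    sub = {"allele_1": 2.0, "allele_2": 4.0}
    prev = {"allele_1": 0.5, "allele_2": 0.25}
    print(weighted_distance(sub, prev))
```

Weighting by prevalence makes junctions that are likely to be presented on common alleles count more toward the distance than junctions presented only on rare alleles, which suits a cassette intended for a plurality of subjects.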

A tumor vaccine comprising a cassette sequence comprising a sequence of linked therapeutic epitopes, wherein the cassette sequence is identified by performing the following steps:
Obtaining, for a subject, at least one of exome, transcriptome, or whole-genome tumor nucleotide sequencing data from tumor cells and normal cells of the subject, wherein the nucleotide sequencing data is used to obtain data representing the peptide sequence of each of a set of neoantigens identified by comparing the nucleotide sequencing data from the tumor cells with the nucleotide sequencing data from the normal cells, wherein the peptide sequence of each neoantigen comprises at least one alteration that distinguishes it from the corresponding wild-type parental peptide sequence identified from the normal cells of the subject, and includes information regarding a plurality of amino acids that make up the peptide sequence and a set of positions of the amino acids in the peptide sequence;
Identifying, for the subject, a therapeutic subset of neoantigens from the set of neoantigens; and
Identifying a cassette sequence comprising a sequence of concatenated therapeutic epitopes, each therapeutic epitope comprising the peptide sequence of a corresponding neoantigen in the therapeutic subset of neoantigens, wherein the cassette sequence is identified based on the presentation of one or more junction epitopes spanning the corresponding junctions between one or more adjacent pairs of the therapeutic epitopes.

The tumor vaccine of claim 24, wherein the presentation of the one or more junction epitopes is determined based on presentation likelihoods generated by inputting the sequences of the one or more junction epitopes into a machine-learned presentation model, the presentation likelihoods indicating the likelihood that the one or more junction epitopes are presented by one or more MHC alleles of the patient on the surface of the tumor cells of the patient, the set of presentation likelihoods having been identified at least based on received mass spectrometry data.

The tumor vaccine of claim 24, wherein the presentation of the one or more junction epitopes is determined based on predicted binding affinity between the one or more junction epitopes and one or more MHC alleles of the subject.

The tumor vaccine of claim 24, wherein the presentation of the one or more junction epitopes is determined based on predicted binding stability of the one or more junction epitopes.

The tumor vaccine of claim 24, wherein the one or more junction epitopes comprise a junction epitope that overlaps the sequence of a first therapeutic epitope and the sequence of a second therapeutic epitope concatenated after the first therapeutic epitope.

The tumor vaccine of claim 24, wherein a linker sequence is located between a first therapeutic epitope and a second therapeutic epitope concatenated after the first therapeutic epitope, and the one or more junction epitopes comprise a junction epitope overlapping the linker sequence.

The tumor vaccine of claim 24, wherein identifying the cassette sequence comprises the following steps:
For each ordered pair of therapeutic epitopes, determining a set of junction epitopes spanning the junction between the ordered pair of therapeutic epitopes; and
For each ordered pair of therapeutic epitopes, determining a distance metric representing the presentation of the set of junction epitopes for the ordered pair on one or more MHC alleles of the subject.

The tumor vaccine of claim 24, wherein identifying the cassette sequence comprises the following steps:
Generating a set of candidate cassette sequences corresponding to different orderings of the therapeutic epitopes;
For each candidate cassette sequence, determining a presentation score for the candidate cassette sequence based on the distance metric for each ordered pair of therapeutic epitopes in the candidate cassette sequence; and
Selecting a candidate cassette sequence associated with a presentation score below a predetermined threshold as the cassette sequence for the neoantigen vaccine.

The tumor vaccine of claim 31, wherein the set of candidate cassette sequences is randomly generated.

The tumor vaccine of claim 30, wherein identifying the cassette sequence further comprises the following step:
Solving for the values of x_km in the following optimization problem:

The tumor vaccine of claim 24, further comprising manufacturing or having manufactured a tumor vaccine comprising the cassette sequence.

A tumor vaccine comprising a cassette sequence comprising a sequence of concatenated therapeutic epitopes, each therapeutic epitope comprising the peptide sequence of a corresponding neoantigen in a therapeutic subset of neoantigens, wherein the sequence of therapeutic epitopes is identified based on the presentation of one or more junction epitopes spanning the corresponding junctions between one or more adjacent pairs of the therapeutic epitopes, and wherein the junction epitopes of the cassette sequence have an HLA binding affinity below a threshold binding affinity.

The tumor vaccine of claim 35, wherein the threshold binding affinity is 1000 nM or greater.

A tumor vaccine comprising a cassette sequence comprising a sequence of concatenated therapeutic epitopes, each therapeutic epitope comprising the peptide sequence of a corresponding neoantigen in a therapeutic subset of neoantigens, wherein the sequence of therapeutic epitopes is identified based on the presentation of one or more junction epitopes spanning the corresponding junctions between one or more adjacent pairs of the therapeutic epitopes, and wherein at least a threshold percentage of the junction epitopes of the cassette sequence have a presentation likelihood below a threshold presentation likelihood.

The tumor vaccine of claim 37, wherein the threshold percentage is 50%.
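
The threshold-percentage condition of the two preceding claims can be checked with a short sketch. It assumes the presentation likelihoods of a cassette's junction epitopes are already available as a list of numbers. The function names and example values are hypothetical, and the 0.1 likelihood threshold is an arbitrary illustration; the claims do not state one.

```python
from typing import Sequence


def fraction_below(likelihoods: Sequence[float], threshold: float) -> float:
    """Fraction of junction epitopes whose likelihood is below the threshold."""
    if not likelihoods:
        raise ValueError("at least one junction epitope likelihood is required")
    return sum(p < threshold for p in likelihoods) / len(likelihoods)


def meets_threshold_percentage(likelihoods: Sequence[float],
                               likelihood_threshold: float,
                               min_fraction: float = 0.5) -> bool:
    """True when at least `min_fraction` of the junction epitopes fall below."""
    return fraction_below(likelihoods, likelihood_threshold) >= min_fraction


if __name__ == "__main__":
    # Made-up likelihoods for four junction epitopes, for illustration only.
    scores = [0.01, 0.2, 0.03, 0.9]
    print(meets_threshold_percentage(scores, likelihood_threshold=0.1))
```

The default `min_fraction=0.5` mirrors the 50% threshold percentage recited above.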