CN113316818A - Method for identifying neoantigens

Info

Publication number: CN113316818A
Application number: CN202080008090.2A
Authority: CN
Inventors: 姜宁; 张文宏; 田晔; 仇超
Original assignee: Mark Zhun Biotechnology Co ltd
Current assignee: Trace Biomedical Technology Xiamen Co Ltd
Priority date: 2019-03-15
Filing date: 2020-03-13
Publication date: 2021-08-27
Anticipated expiration: 2040-03-13
Also published as: WO2020187143A1; CN113316818B; CN111696628A

Abstract

The present invention relates to the field of tumor immunotherapy. In particular, the invention provides methods and devices for identifying tumor-specific neoantigens in a patient. The neoantigens identified by the methods or devices of the invention can be used to develop vaccines or T cell therapies against the tumor.

Description

Method for identifying neoantigens

Technical Field

Background

Cancer is characterized by abnormal cell proliferation. The success of conventional therapy depends on the type of cancer and the stage at which it is detected. Many treatments involve expensive and painful surgery and chemotherapy and are often unsuccessful, or only moderately prolong the life of the patient. Promising therapies being developed include tumor vaccines or T cell therapies targeting tumor antigens, which enable the patient's immune system to distinguish between tumor and healthy cells and elicit the patient's immune response.

Neoantigens are a class of immunogens that are associated with patient-specific tumor-specific mutations. The neoantigens have shown good promise as targets for anti-tumor immunization technologies, such as personalized tumor vaccines.

Although strategies exist for identifying candidate neoantigens by sequencing and HLA typing, they suffer from disadvantages such as a high false positive rate and a narrow applicable population, which severely limit the development of neoantigen-based anti-tumor vaccines. Thus, there remains a need in the art for new methods for identifying neoantigens.

Brief description of the invention

In one aspect, the present invention provides a method of identifying a tumor neoantigen in a subject, the method comprising the steps of:

(a) analyzing whole exome sequencing results of tumor tissue or cells and of normal tissue or cells of the subject to identify tumor-specific somatic mutations;

(b) analyzing transcriptome sequencing results of the tumor tissue or cells of the subject to further screen the somatic mutations identified in step (a);

(c) analyzing the whole exome sequencing results of the normal tissue or cells of the subject to perform HLA typing of the subject; and

(d) based on the results of steps (b) and (c), analyzing the binding of the mutant peptides corresponding to the somatic mutations to MHC, thereby screening candidate tumor-specific neoantigens.

In another aspect, the present invention provides a device for identifying a tumor neoantigen in a subject, the device comprising: a memory for storing a program; a processor for implementing the method for identifying a tumor neoantigen in a subject of the present invention by executing the program stored in the memory.

In another aspect, the present invention provides a computer-readable storage medium comprising a program executable by a processor to perform the method of the present invention for identifying a tumor neoantigen in a subject.

In another aspect, the present invention provides a device for identifying tumor neoantigens in a subject, the device comprising the following four modules: a tumor-specific somatic mutation identification module I) for identifying tumor-specific somatic mutations based on the whole exome sequencing results of the tumor tissue or cells and the normal tissue or cells of the subject; a tumor-specific somatic mutation screening module II) for further screening the tumor-specific somatic mutations based on the transcriptome sequencing results of the tumor tissue or cells of the subject; an HLA typing module III) for HLA typing based on the whole exome sequencing results of the normal tissue or cells of the subject; and a tumor neoantigen prediction module IV).

In another aspect, the invention provides a tumor neoantigen identified according to the method or device of the invention.

In another aspect, the invention provides a pharmaceutical composition comprising a tumor neoantigen identified according to the method or device of the invention, and a pharmaceutically acceptable carrier.

In another aspect, the invention also provides the use of a tumor neoantigen identified according to the method or device of the invention or a pharmaceutical composition of the invention in the manufacture of a medicament for the treatment and/or prevention of cancer.

In another aspect, the present invention provides a method of treating cancer in a subject, the method comprising:

a) identifying at least one tumor neoantigen of the subject by the method or device of the invention;

b) generating at least one tumor neoantigen identified in step a); and

c) administering to said subject said at least one tumor neoantigen produced in step b).

Drawings

FIG. 1 is a flow chart showing the method of identifying neoantigens according to the present invention.

FIG. 2 shows candidate neoantigens of H22 cells. Neoantigens with RPKM > 0 in H22 cells are shown; the red line marks the threshold RPKM = 1. Those with RPKM ≥ 1 were selected as candidate neoantigens.

Figure 3 shows the animal pharmacodynamics experimental protocol.

Figure 4 shows the grouping of H22 subcutaneous tumor-bearing mice. On day 5 of subcutaneous tumor growth, tumor size was measured with a vernier caliper, and the mice were grouped after the tumor volume was calculated. A total of 6 groups were set up: ctrl, poly I:C, SLPs, anti-PD1, poly I:C + anti-PD1, and SLPs + anti-PD1, with 6 animals per group at the start. The SLPs are synthesized 25-amino-acid long peptides of H22 neoantigens, and poly I:C is the adjuvant. Tumor volume calculation formula: $V_{\text{tumor}} = (L_{\text{long diameter}} \times L_{\text{short diameter}}^{2}) \times 1/2$.

FIG. 5 shows the tumor growth curves of H22 subcutaneous tumor-bearing mice. Tumors were first measured 5 days after tumor inoculation and every 3 days thereafter. After data collection was complete, a tumor growth curve was plotted for each individual mouse. During the experiment, when a mouse's tumor volume grew to between 2000 and 3000 mm³, the experiment was stopped for that mouse and it was sacrificed.

Figure 6 shows pictures of H22 subcutaneous tumor-bearing mice. Mice were sacrificed and photographed 26 days after tumor inoculation. Tumor-bearing pictures of each mouse of each group are shown.

FIG. 7 shows the design of the overlapping ASPs corresponding to an SLP. For each SLP, 4 overlapping assay peptides (ASPs), each 15 amino acids long, were designed. The ASP design is illustrated using H22 SLP1 (Bard1) as an example, with the mutated amino acid marked in red.

FIG. 8 shows the results of the IFN-γ ELISPOT assay of mouse splenocytes. The ASPs of SLP1-17 were used to stimulate mouse splenocytes, and IFN-γ secretion by the splenocytes was measured in vitro by the ELISPOT method. Each dot in the figure represents the number of IFN-γ-secreting cells per 5 × 10⁵ splenocytes of one mouse. Each dot under "All SLPs" represents the total number of IFN-γ-secreting cells per 5 × 10⁵ splenocytes of one mouse that responded to all SLPs. The groups on the abscissa, from left to right, are ctrl, poly I:C, SLPs, anti-PD1, poly I:C + anti-PD1, and SLPs + anti-PD1.

Detailed Description

In some embodiments of this aspect of the invention, the sequencing is high-throughput sequencing, also known as next-generation sequencing ("NGS"). NGS produces thousands to millions of sequences simultaneously in a parallel sequencing process. It is distinguished from "Sanger sequencing" (first-generation sequencing), which is based on electrophoretic separation of chain-termination products in a single sequencing reaction. Sequencing platforms that can be used for NGS in the present invention are commercially available and include, but are not limited to, Roche/454 FLX, Illumina/Solexa Genome Analyzer, and the Applied Biosystems SOLiD system.

Exome sequencing is a genome analysis method in which the DNA of the exonic regions of the whole genome is captured and enriched by sequence capture technology and then subjected to high-throughput sequencing. Because it is highly sensitive to both common and rare variants, only about 2% of the genome needs to be sequenced to discover most disease-related variants in exonic regions.

Transcriptome sequencing uses a next-generation sequencing platform to obtain almost all transcripts and gene sequences of a specific cell or tissue of a species in a given state, and can be used to study gene expression levels, gene function, gene structure, alternative splicing, prediction of new transcripts, and the like.

The normal tissue or cell may be any non-neoplastic tissue or cell, such as peripheral blood (for non-hematologic cancers) or a tissue adjacent to a cancer, preferably peripheral blood.

The tumor tissue or cells include, but are not limited to, the following tumor tissues or cells: liver cancer, lung cancer, ovarian cancer, colon cancer, rectal cancer, melanoma, kidney cancer, bladder cancer, prostate cancer, breast cancer, lymphoma, hematological malignancies, head and neck cancer, glioma, stomach cancer, nasopharyngeal cancer, laryngeal cancer, pancreatic cancer, cervical cancer, esophageal cancer, small intestine cancer, chronic or acute leukemia, and osteosarcoma.

"Somatic mutation" refers to a mutation that occurs in a cell of an organism other than a germ cell. Somatic mutations are not passed on to offspring, but may alter the phenotype of the organism in which they arise, for example by giving rise to a tumor. Somatic mutations generally refer to nucleotide mutations in a DNA sequence. However, as will be understood by those skilled in the art, the term may also refer to the corresponding amino acid mutations in a particular context.

As used herein, the term "antigen" refers to a substance, such as a polypeptide, that induces an immune response. As used herein, the term "neoantigen" is an antigen having at least one alteration that makes it different from the corresponding wild-type parent antigen, e.g., the alteration is a tumor-specific somatic mutation. As used herein, the term "tumor neoantigen" or "tumor-specific neoantigen" is a neoantigen that is present in a tumor cell or tissue of a subject but is substantially absent from a normal cell or tissue of the subject. The term "neoantigen" can be a full-length protein, or a portion thereof that comprises the alteration. For example, a "tumor neoantigen" can be a polypeptide (mutant peptide) comprising a tumor-specific somatic mutation, particularly a polypeptide that is immunogenic (e.g., comprises a T cell epitope), which is truncated from the full-length protein. The polypeptide may be about 8 to about 35 amino acids in length, e.g., 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35 amino acids, or any range therebetween.

As used herein, "subject" means a mammal, including a rodent or primate, e.g., mouse, rat, monkey, human. Preferably the subject is a human.

As used herein, "MHC" refers to the major histocompatibility complex. Human MHC is also called HLA (human leukocyte antigen). It will be understood by those skilled in the art that, when the invention is applied to species other than humans, "HLA typing" as used herein refers in essence to MHC typing.

As mentioned above, the method of identifying a tumor neoantigen of the present invention comprises four main steps. The first step (step (a)) aims at the accurate analysis of tumor-specific somatic mutations in tumor tissue.

By analyzing paired normal and tumor tissues or cells, mutations that are specific to the tumor tissue, or whose proportion is significantly higher in the tumor tissue than in normal tissue of the same individual, are considered to be tumor-specific somatic mutations. In general, tumor genomes are highly dynamic, change as the disease progresses, and are highly heterogeneous. In tumor genome sequencing, the tumor cell purity of many samples does not reach 80%, and for some samples it is even lower. These factors make tumor-specific somatic mutations difficult to identify accurately.

Various algorithms for detecting tumor somatic mutations based on different principles have been published, including but not limited to:

1) Strelka uses a novel Bayesian approach that treats the allele frequencies of the tumor and adjacent normal tissues as continuous values: the adjacent normal tissue is represented as a mixture of germline variation and noise, and the tumor tissue as a mixture of the normal sample and somatic mutations. Strelka can therefore maintain high sensitivity even for impure samples. Strelka searches for candidate indels for subsequent realignment (indel realignment), then calculates the probability of somatic variation from the realigned reads and applies a series of filters to obtain reliable somatic mutation calls.

2) Mutect2 is based on the GATK HaplotypeCaller module. It uses clear evidence of variation to find regions that require further analysis, called ActiveRegions. The algorithm then builds a De Bruijn-like graph to reassemble each ActiveRegion, detects the haplotypes that may be present, and realigns them using the Smith-Waterman algorithm. Using the PairHMM algorithm, each read in the ActiveRegion is aligned pairwise against each haplotype to generate a haplotype likelihood matrix. This matrix is then converted into allele likelihoods for each candidate variant site, from which the probability of a somatic mutation at each site is inferred.

3) Sentieon TNhaplotyper follows the same detection principle as Mutect2. The tumor sample and the matched adjacent normal sample are co-realigned, and somatic SNVs and indels are then called on the resulting aligned BAM file with the TNhaplotyper module.

However, each of the above methods suffers from a high false positive rate and poor accuracy.

To address the shortcomings of existing tumor somatic mutation detection, the inventors constructed an analysis workflow and strategy that effectively reduces the false positive rate and improves the accuracy of somatic mutation detection.

First, the inventors selected a plurality of methods based on different principles, used each of them independently to detect somatic mutations in the tumor tissue from the high-throughput sequencing data, and then took the intersection of the independently obtained results, thereby greatly reducing the false positive rate of detection. The methods for detecting somatic mutations include, but are not limited to, Strelka1 (see https://academic.oup.com/bioinformatics/article/28/14/1811/218573), Strelka2 (see https://www.nature.com/articles/s41592-018-0051-x), VarScan (see http://varscan.sourceforge.net), Mutect2 (see http://www.broadinstitute.org/cancer/cga/mut), and/or MuSE (see https://bioinformatics.mdanderson.org/main/MuSE). Other methods of detecting somatic mutations known in the art may also be applied to the present invention.

In some embodiments of the methods of the invention, step (a) identifies tumor-specific somatic mutations from the whole exome sequencing results independently by each of at least 3, at least 4, or at least 5 different methods, e.g., 3, 4, 5, 6, 7, 8, 9, 10 or more different methods.

In some embodiments, step (a) identifies tumor-specific somatic mutations from the whole exome sequencing results independently by each of at least 3 different methods, and selects the tumor-specific somatic mutations that were identified by all of the at least 3 different methods. For example, the at least 3 different methods are selected from the group consisting of Strelka1, Strelka2, VarScan, Mutect2, and MuSE.

In some preferred embodiments, the tumor-specific somatic mutations are identified using at least 5 different methods, e.g., the at least 5 different methods include Strelka1, Strelka2, VarScan, Mutect2, and MuSE. However, other methods known in the art for detecting somatic mutations may be further included.
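The intersection strategy described above can be illustrated with a minimal Python sketch. It assumes each caller's output has already been parsed into (chromosome, position, reference allele, alternate allele) tuples; the function name `consensus_calls`, the variant key format, and the example calls are hypothetical and are not taken from any of the named tools.

```python
from typing import Dict, Iterable, Set, Tuple

# Assumed variant key: (chromosome, 1-based position, reference allele, alternate allele).
Variant = Tuple[str, int, str, str]


def consensus_calls(calls_by_caller: Dict[str, Iterable[Variant]],
                    min_callers: int = 3) -> Set[Variant]:
    """Return the variants reported by every caller.

    Raises ValueError if fewer than `min_callers` callers were supplied,
    mirroring the "at least 3 different methods" embodiment.
    """
    if len(calls_by_caller) < min_callers:
        raise ValueError(
            f"need at least {min_callers} callers, got {len(calls_by_caller)}"
        )
    call_sets = [set(calls) for calls in calls_by_caller.values()]
    return set.intersection(*call_sets)


# Made-up calls, purely for illustration.
example_calls = {
    "Strelka2": [("chr1", 1000, "A", "T"), ("chr2", 500, "G", "C"), ("chr3", 42, "C", "T")],
    "VarScan": [("chr1", 1000, "A", "T"), ("chr3", 42, "C", "T")],
    "Mutect2": [("chr1", 1000, "A", "T"), ("chr3", 42, "C", "T"), ("chr5", 7, "T", "G")],
}
shared = consensus_calls(example_calls)
print(sorted(shared))
```

Only the two variants reported by all three callers survive; a variant seen by a single caller is dropped, which is how the intersection trades some sensitivity for a lower false positive rate.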

In addition, the parameters of the method can be adjusted according to needs, and the detection threshold value is increased, so that the false positive rate of detection is further reduced.

More importantly, the present inventors have surprisingly found that tumour specific somatic mutations can be obtained more accurately by further filtering the results obtained by setting a specific set of filtering criteria. Thus, in some embodiments, step (a) further screens for somatic mutations that meet the following criteria:

1) the sequencing depth of the tumor tissue or cell and the normal tissue or cell is greater than or equal to 10;

2) in the sequencing data of the tumor tissue or cells, the number of reads comprising the mutation is greater than or equal to 3;

3) in the sequencing data of the tumor tissue or cells, the allele frequency of the mutation is greater than 0.1;

4) in the sequencing data of the normal tissue or cells, the allele frequency of the mutation is less than or equal to 0.01; and

5) the allele frequency of the mutation is less than 0.01 in the whole exome sequencing results of normal tissues or cells from at least 100, at least 200, at least 300 or more normal subjects, for example 200 to 300 normal subjects.
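As an illustration only, the five criteria above can be expressed as a single predicate. The sketch below assumes that the per-site read counts and the panel-of-normals allele frequency have already been extracted; the `SiteEvidence` structure and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SiteEvidence:
    tumor_depth: int          # reads covering the site in the tumor sample
    normal_depth: int         # reads covering the site in the matched normal sample
    tumor_alt_reads: int      # tumor reads carrying the mutation
    normal_alt_reads: int     # normal reads carrying the mutation
    panel_allele_freq: float  # allele frequency across the panel of normal subjects


def passes_filters(site: SiteEvidence) -> bool:
    """Apply criteria 1) to 5) to one candidate somatic mutation."""
    # 1) sequencing depth >= 10 in both tumor and normal samples
    if site.tumor_depth < 10 or site.normal_depth < 10:
        return False
    # 2) at least 3 tumor reads carry the mutation
    if site.tumor_alt_reads < 3:
        return False
    tumor_af = site.tumor_alt_reads / site.tumor_depth
    normal_af = site.normal_alt_reads / site.normal_depth
    # 3) tumor allele frequency > 0.1
    # 4) normal allele frequency <= 0.01
    # 5) panel-of-normals allele frequency < 0.01
    return tumor_af > 0.1 and normal_af <= 0.01 and site.panel_allele_freq < 0.01


# Made-up evidence for two candidate sites.
good_site = SiteEvidence(tumor_depth=50, normal_depth=40, tumor_alt_reads=10,
                         normal_alt_reads=0, panel_allele_freq=0.0)
weak_site = SiteEvidence(tumor_depth=50, normal_depth=40, tumor_alt_reads=4,
                         normal_alt_reads=0, panel_allele_freq=0.0)
print(passes_filters(good_site), passes_filters(weak_site))
```

The second site fails because its tumor allele frequency (4/50 = 0.08) does not exceed 0.1, even though it has enough supporting reads.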

As used herein, "sequencing depth" refers to the ratio of the total number of bases obtained by sequencing to the size (number of bases) of the genome or region sequenced. For example, if a target region 1000 bp in length is sequenced to give a total of 200 reads, each 50 bp in length, the sequencing depth is 200 × 50 bp / 1000 bp = 10.

As used herein, "allele frequency" refers to the proportion of a particular variant among all alleles at the variant site in a sample. For example, in the sequencing data of a sample, the ratio of the number of reads that contain a particular variant to the total number of reads covering that site is the allele frequency of the variant.
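The two definitions above reduce to simple ratios. The following sketch reproduces the depth example from the text (200 reads of 50 bp over a 1000 bp target) and adds a made-up allele frequency example; the function names are illustrative.

```python
def sequencing_depth(read_count: int, read_length_bp: int, target_length_bp: int) -> float:
    """Total sequenced bases divided by the size of the sequenced region."""
    return read_count * read_length_bp / target_length_bp


def allele_frequency(variant_reads: int, total_reads_at_site: int) -> float:
    """Fraction of the reads covering a site that carry the variant."""
    if total_reads_at_site <= 0:
        raise ValueError("site has no coverage")
    return variant_reads / total_reads_at_site


depth = sequencing_depth(read_count=200, read_length_bp=50, target_length_bp=1000)
af = allele_frequency(variant_reads=6, total_reads_at_site=40)  # made-up counts
print(depth, af)
```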

The second step (step (b)) of the method for identifying a tumor neoantigen of the present invention further screens the candidate somatic mutation sites in combination with information such as the gene expression level and the predicted functional effect of the mutation on the gene.

In this step, each somatic mutation obtained in the first step is annotated at the gene structure level and at the mutation function level (i.e., the level of its effect on the gene's coding function), based on the NCBI human genome annotation information database.

In the NCBI annotation database, the annotations of mutation sites at the gene structure level include: exonic, splicing, ncRNA, UTR5/UTR3, intron, upstream/downstream, intergenic, and unknown. In some embodiments of the methods of the invention, the screening priority order is: exonic = splicing > ncRNA > UTR5/UTR3 > intron > upstream/downstream > intergenic > unknown.

In the NCBI annotation database, the annotations of the effect of a mutation site on the coding function of the gene include: stopgain, stoploss, nonsynonymous SNV, synonymous SNV, and unknown. In some embodiments of the methods of the invention, the screening priority order is: stopgain > stoploss > nonsynonymous SNV > synonymous SNV > unknown.

In some preferred embodiments, somatic mutations are selected whose gene structure level annotation is exonic and whose coding function level annotation is nonsynonymous SNV (non-synonymous single nucleotide variation).
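A minimal sketch of this annotation-based selection follows. It assumes each mutation has already been annotated and is represented as a dictionary with hypothetical `region` and `effect` keys; the gene names and records are made up.

```python
from typing import Dict, List

# Functional-level priority order as stated in the text (highest priority first).
FUNCTION_PRIORITY = ["stopgain", "stoploss", "nonsynonymous SNV", "synonymous SNV", "unknown"]


def select_coding_changes(mutations: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep mutations annotated as exonic and nonsynonymous SNV (preferred embodiment)."""
    return [
        m for m in mutations
        if m.get("region") == "exonic" and m.get("effect") == "nonsynonymous SNV"
    ]


def sort_by_effect(mutations: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Order mutations by the functional-level screening priority."""
    rank = {name: i for i, name in enumerate(FUNCTION_PRIORITY)}
    # Labels outside the known list sort after "unknown".
    return sorted(mutations, key=lambda m: rank.get(m.get("effect", "unknown"), len(rank)))


# Made-up annotated mutations.
annotated = [
    {"gene": "GENE_A", "region": "exonic", "effect": "synonymous SNV"},
    {"gene": "GENE_B", "region": "exonic", "effect": "nonsynonymous SNV"},
    {"gene": "GENE_C", "region": "intron", "effect": "unknown"},
    {"gene": "GENE_D", "region": "exonic", "effect": "stopgain"},
]
selected = select_coding_changes(annotated)
ordered = sort_by_effect(annotated)
print([m["gene"] for m in selected], [m["gene"] for m in ordered])
```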

In addition, based on the transcriptome sequencing data of the tumor tissue or cells, the expression levels of all of the approximately 30,000 protein-coding genes annotated in the NCBI human genome annotation information database can be detected. Thus, this step may also include selecting somatic mutations based on gene expression levels.

In some embodiments, somatic mutations located within a highly expressed gene are selected; for example, the highly expressed gene has an RPKM (Reads Per Kilobase per Million mapped reads) greater than or equal to 1. RPKM is the number of reads mapped to the gene (its exons) divided by the product of the total number of reads mapped to the genome (in millions) and the length of the gene (its exons) (in kb).
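The RPKM threshold can be illustrated with a short calculation. The sketch below uses the standard RPKM definition (reads mapped to the gene, divided by total mapped reads in millions and by exon length in kb); the read counts and gene lengths are made up.

```python
def rpkm(gene_reads: int, total_mapped_reads: int, gene_length_bp: int) -> float:
    """Reads Per Kilobase of exon per Million mapped reads."""
    millions_of_reads = total_mapped_reads / 1e6
    length_kb = gene_length_bp / 1e3
    return gene_reads / (millions_of_reads * length_kb)


def is_highly_expressed(value: float, threshold: float = 1.0) -> bool:
    """RPKM >= 1 is the example threshold given in the text."""
    return value >= threshold


high = rpkm(gene_reads=500, total_mapped_reads=20_000_000, gene_length_bp=2000)
low = rpkm(gene_reads=10, total_mapped_reads=50_000_000, gene_length_bp=1000)
print(high, is_highly_expressed(high))
print(low, is_highly_expressed(low))
```

With these made-up numbers the first gene has an RPKM of 12.5 and would be retained, while the second has an RPKM of 0.2 and would be filtered out.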

Through the above steps, tumor-specific somatic mutations that are located within highly expressed genes and that alter the amino acid sequence can be identified. Thus, in some embodiments, the somatic mutation of the present invention is a mutation located in the protein coding sequence of a highly expressed gene, and which results in an amino acid mutation.

In addition, based on transcriptome sequencing data of tumor tissues or cells, the expression level of HLA gene, CD4 gene and/or CD8 gene of the subject can also be evaluated to determine whether the subject is suitable for immunotherapy with tumor neoantigens.

Thus, in some embodiments, step b) further comprises assessing the expression level of an HLA gene, a CD4 gene and/or a CD8 gene in the subject.

The third step (step (c)) of the method of identifying a tumor neoantigen of the present invention performs HLA typing of the subject based on the whole exome sequencing results of normal tissues or cells of the subject.

HLA typing remains a challenge in medicine today. In clinical practice, the current gold-standard method for HLA typing recommended by the World Health Organization (WHO) is PCR-SBT, but this method suffers from ambiguous (non-unique) typing results, low resolution (4 digits), long turnaround time (15 to 20 days), and high cost (2000 yuan per sample).

In the present invention, HLA typing is performed using exome sequencing data from normal tissues or cells (such as peripheral blood) of the subject. Information on every allele at all currently known HLA class I/II loci is integrated, and the exome sequencing data are compared with high precision at two levels, the amino acid sequence and the nucleotide sequence. As a result, typing of the HLA class I/II loci achieves a resolution of 6 digits (2 × 3) or more, the analysis takes no more than 3 hours, and the accuracy exceeds 98% (concordance with the results of the "gold standard" PCR-SBT technique).

In some embodiments, at least one, preferably all, of the following databases are used for HLA typing in step (c): ATHLATES (http://www.broadinstitute.org/scientific-community/science/projects/viral-genomics/athlates), HLA-HD (https://www.genome.med.kyoto-u.ac.jp/HLA-HD/), HLAVBseq (http://nagasakilab.csml.org/HLA), seq2HLA (http://bitbucket.org/sebastian_boegel/seq2HLA), and HLAminer (http://www.bcgsc.ca/platform/bioinfor/software/wrapper/hliner).

The fourth step (step (d)) of the method for identifying a tumor neoantigen of the present invention predicts tumor neoantigens for the specific HLA type, based on the results of the preceding three steps, from the selected tumor-specific somatic mutations that are located in highly expressed genes and alter the amino acid sequence.

In some embodiments, step (d) comprises:

d1) extracting an amino acid sequence corresponding to the somatic mutation, thereby obtaining a mutant peptide corresponding to the somatic mutation;

d2) based on the HLA typing results of step (c), scoring and ranking the extracted mutant peptides independently by MHC binding affinity, MHC binding stability, proteasome digestion, mass spectrometry data, respectively; and

d3) based on the results of step d2), candidate tumor neoantigens are selected by scoring and ranking the mutant peptides by geometric mean.

As used herein, an "amino acid sequence or mutant peptide" corresponding to the somatic mutation refers to an amino acid sequence or peptide comprising the amino acid mutation resulting from the somatic mutation, which is encoded by a nucleotide sequence in the genome of the subject comprising the somatic mutation.

In some embodiments, an amino acid sequence of about 8 to 35 amino acids, preferably about 15 to 27 amino acids, corresponding to said somatic mutation is extracted in step d1). For example, for each tumor-specific somatic mutation identified through the preceding steps, the entire amino acid sequence extending about 7 to about 17 amino acids forward and/or backward from the corresponding mutated amino acid (i.e., the mutated amino acid resulting from the somatic mutation) can be extracted from the amino acid sequence of the protein encoded by the nucleotide sequence in the subject's genome that includes the somatic mutation, thereby obtaining a series of mutant peptides of about 8 to about 35 amino acids in length corresponding to the somatic mutation. Preferably, for each tumor-specific somatic mutation identified through the preceding steps, the entire amino acid sequence extending about 7 to about 13 amino acids forward and backward, centered on the corresponding mutated amino acid, is extracted, thereby obtaining a series of mutant peptides of about 15 to about 27 amino acids in length corresponding to the somatic mutation.
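The window extraction in step d1) can be sketched as a simple slice around the mutated residue. The example below assumes the mutant protein sequence is already available as a string and uses a flank of 12 residues on each side, which yields a 25-amino-acid peptide when the mutation is far enough from both termini; the function name, the default flank, and the toy sequence are hypothetical.

```python
def mutant_peptide(mutant_protein: str, mutation_index: int, flank: int = 12) -> str:
    """Return the mutated residue plus up to `flank` residues on each side.

    `mutation_index` is the 0-based position of the mutated amino acid in
    `mutant_protein`. Near a terminus the window is truncated, not padded.
    """
    if not 0 <= mutation_index < len(mutant_protein):
        raise IndexError("mutation_index is outside the protein sequence")
    start = max(0, mutation_index - flank)
    end = min(len(mutant_protein), mutation_index + flank + 1)
    return mutant_protein[start:end]


# Toy 40-residue sequence with a made-up substitution to R at index 20.
wild_type = "ACDEFGHIKLMNPQRSTVWY" * 2
mutant = wild_type[:20] + "R" + wild_type[21:]

central = mutant_peptide(mutant, mutation_index=20)             # full 25-residue window
short_flank = mutant_peptide(mutant, mutation_index=20, flank=7)  # 15-residue window
print(central, len(central))
print(short_flank, len(short_flank))
```

Choosing the flank between 7 and 13 covers the preferred 15 to 27 residue range described in the text.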

The obtained mutant peptides were then scored and ranked independently for their likelihood of being candidate neoantigens from the perspective of their respective MHC binding affinity, MHC binding stability, proteasome cleavage (i.e., whether the mutant peptides could be produced by proteasome cleavage), and mass spectral data for the corresponding HLA type determined by the foregoing steps.

In some embodiments, the extracted mutant peptides are scored and ranked in step d2) using one or more methods/tools selected from NetMHCcons (http://www.cbs.dtu.dk/services/NetMHCcons), NetMHC (http://www.cbs.dtu.dk/services/NetMHC), NetMHCpan (http://www.cbs.dtu.dk/services/NetMHCpan), PickPocket (http://www.cbs.dtu.dk/services/PickPocket), MHCflurry (https://www.sciencedirect.com/science/article/pii/S2405471218302321?dgcid=rss_sd_all), NetMHCstab (http://www.cbs.dtu.dk/services/NetMHCstab-1.0), and NetChop (http://www.cbs.dtu.dk/services/NetChop). For example, the binding affinity of a mutant peptide to a particular MHC can be analyzed using the NetMHCcons, NetMHC, NetMHCpan, and/or PickPocket tools; the NetMHCstab tool can be used to analyze the binding stability of mutant peptides to a specific MHC; MHCflurry can be used to predict the binding of mutant peptides to MHC on the basis of mass spectrometry data; and NetChop can be used to analyze the likelihood that the mutant peptides are generated by proteasome cleavage.

Finally, the mutant peptides are given a final composite ranking by the geometric mean method, based on the prediction results from the different perspectives. For example, if for a particular mutant peptide the MHC binding affinity rank is 3, the MHC binding stability rank is 2, the proteasome cleavage rank is 2, and the mass spectrometry data rank is 4, then the geometric mean rank is $\sqrt[4]{3 \times 2 \times 2 \times 4} = \sqrt[4]{48} \approx 2.63$.

The mutant peptides can be ranked according to the geometric mean score and candidate tumor neoantigens selected therefrom.
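The composite ranking can be sketched as follows. The example assumes that each peptide already has one rank per criterion (MHC binding affinity, MHC binding stability, proteasome cleavage, mass spectrometry data), that rank 1 is best, and that a lower geometric mean is therefore better; these conventions, the function names, and the peptide labels are assumptions made for illustration.

```python
import math
from typing import Dict, List, Sequence, Tuple


def geometric_mean(ranks: Sequence[float]) -> float:
    """Geometric mean of strictly positive ranks."""
    if not ranks or any(r <= 0 for r in ranks):
        raise ValueError("ranks must be a non-empty sequence of positive numbers")
    return math.exp(sum(math.log(r) for r in ranks) / len(ranks))


def rank_peptides(rank_table: Dict[str, List[int]]) -> List[Tuple[str, float]]:
    """Sort peptides by the geometric mean of their per-criterion ranks (best first)."""
    scored = [(peptide, geometric_mean(ranks)) for peptide, ranks in rank_table.items()]
    return sorted(scored, key=lambda item: item[1])


# Ranks in the order: affinity, stability, proteasome cleavage, mass spectrometry.
# peptide_A uses the ranks from the worked example in the text; the others are made up.
rank_table = {
    "peptide_A": [3, 2, 2, 4],
    "peptide_B": [1, 1, 3, 1],
    "peptide_C": [2, 3, 1, 2],
}
ordered = rank_peptides(rank_table)
for peptide, score in ordered:
    print(peptide, round(score, 2))
```

Compared with an arithmetic mean, the geometric mean is less dominated by a single poor rank, which may be one reason to prefer it when combining heterogeneous predictors.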

By this method, tumor neoantigens can be identified with higher accuracy, and the false positive rate is markedly reduced.

Those skilled in the art will appreciate that all or part of the functions of the above-described method steps may be implemented by hardware, or may be implemented by a computer program. When all or part of the functions of the above method steps are implemented by means of a computer program, the program may be stored in a computer-readable storage medium, and the storage medium may include: a read only memory, a random access memory, a magnetic disk, an optical disk, a hard disk, etc., and the program is executed by a computer to realize the above functions. For example, the program may be stored in a memory of the device, and when the program in the memory is executed by the processor, all or part of the functions described above may be implemented. In addition, when all or part of the functions in the above embodiments are implemented by a computer program, the program may be stored in a storage medium such as a server, another computer, a magnetic disk, an optical disk, a flash disk, or a removable hard disk, and may be downloaded or copied to a memory of a local device, or may be version-updated in a system of the local device, and when the program in the memory is executed by a processor, all or part of the functions in the above embodiments may be implemented.

In a second aspect, the present invention provides a device for identifying a tumor neoantigen in a subject, the device comprising the following four modules: tumor specific somatic mutation identification module I); tumor specific somatic mutation screening module II); HLA typing module III); and tumor neoantigen prediction module IV).

Wherein the tumor specific somatic mutation identification module I) identifies tumor specific somatic mutations based on whole exome sequencing results of the tumor tissue or cells and normal tissue or cells of the subject.

In some embodiments, the tumor-specific somatic mutation identification module I) identifies somatic mutations from the whole exome sequencing results independently by each of at least 3 different methods, and selects the somatic mutations that were identified by all of the at least 3 different methods. For example, the at least 3 different methods are selected from the group consisting of Strelka1, Strelka2, VarScan, Mutect2, and MuSE.

In some preferred embodiments, the tumor-specific somatic mutation identification module I) identifies the somatic mutations using at least 5 different methods, e.g., the at least 5 different methods include Strelka1, Strelka2, VarScan, Mutect2, and MuSE. However, other methods known in the art for detecting somatic mutations may be further included.

In some embodiments, the tumor-specific somatic mutation identification module I) further screens for somatic mutations that meet criteria 1) to 5) set out above for step (a).

Tumor-specific somatic mutation screening module II) further screens the tumor-specific somatic mutations based on the transcriptome sequencing results of the tumor tissue or cells of the subject.

In some embodiments, the tumor-specific somatic mutation screening module II) selects a somatic mutation based on the gene expression level. In some embodiments, it selects for somatic mutations that are located within a highly expressed gene, e.g., the highly expressed gene has an RPKM of 1 or greater.

In some embodiments, the tumor-specific somatic mutation screening module II) selects the somatic mutations at the gene structure level and at the level of their effect on gene coding function, e.g., selecting somatic mutations whose gene structure level annotation is exonic and whose coding function level annotation is nonsynonymous SNV.

In some embodiments, tumor-specific somatic mutation screening module II) also optionally evaluates the expression level of HLA gene, CD4 gene, and/or CD8 gene in the subject.

HLA typing module III) HLA typing is performed based on the whole exome sequencing results of normal tissues or cells of the subject.

In some embodiments, HLA typing module III) HLA types using at least the following databases: ATHLATES, HLA-HD, HLAVBseq, seq2HLA and HLAminer.

Tumor neoantigen prediction module IV) predicts tumor neoantigens based on the results of the three preceding modules.

In some embodiments, tumor neoantigen prediction module IV):

extracting an amino acid sequence corresponding to the somatic mutation, thereby obtaining a mutant peptide corresponding to the somatic mutation, for example, extracting an amino acid sequence of about 8 to 35 amino acids, preferably about 15 to 27 amino acids, for example, 25 amino acids, corresponding to the somatic mutation;

based on the HLA typing results, scoring and ranking the extracted mutant peptides independently by MHC binding affinity, MHC binding stability, proteasomal cleavage and mass spectrometry data, respectively; and

scoring and ranking the mutant peptides overall by a geometric mean method, thereby selecting candidate tumor neoantigens.

In some embodiments, the extracted mutant peptides are scored and ranked using one or more selected from NetMHCcons, NetMHC, NetMHCpan, PickPocket, MHCflurry, netMHCstab and NetChop.

In another aspect, the present invention also provides a device for identifying a tumor neoantigen in a subject, the device comprising: a memory for storing a program; a processor for implementing the method of the first aspect of the invention by executing the program stored in the memory.

In another aspect, the invention also provides a computer readable storage medium comprising a program executable by a processor to perform the method of the first aspect of the invention.

As used herein, a "pharmaceutically acceptable carrier" is a substance that can be added to an active pharmaceutical ingredient to help formulate or stabilize the formulation without causing significant adverse toxicological effects to the patient, including, but not limited to, disintegrants, binders, fillers, buffers, isotonic agents, stabilizers, antioxidants, surfactants, or lubricants.

In some embodiments, the medicament is a tumor vaccine. In some embodiments, the vaccine is a therapeutic vaccine.

In some embodiments, the pharmaceutical composition or the medicament further comprises an adjuvant. For example, the adjuvant is poly I:C.

b) generating at least one tumor neoantigen identified in step a); and

In some embodiments, a plurality of tumor neoantigens are identified, generated, and administered, e.g., at least 5, at least 10, at least 20, at least 30, at least 40, at least 50, or even more tumor neoantigens.

In some embodiments, the tumor neoantigen is administered with an adjuvant. For example, the adjuvant is poly I:C.

In some preferred embodiments, the method further comprises administering to the subject an immune checkpoint inhibitor. The immune checkpoint inhibitors include, but are not limited to, PD1 antibodies, PDL1 antibodies, CTLA-4 antibodies, and the like.

In various aspects and embodiments herein, the cancer includes, but is not limited to, liver cancer, lung cancer, ovarian cancer, colon cancer, rectal cancer, melanoma, kidney cancer, bladder cancer, prostate cancer, breast cancer, lymphoma, hematologic malignancies, head and neck cancer, glioma, stomach cancer, nasopharyngeal cancer, laryngeal cancer, pancreatic cancer, cervical cancer, esophageal cancer, small bowel cancer, chronic or acute leukemia, and osteosarcoma.

In this context, the term "and/or" covers all combinations of items connected by the term, which should be taken as if each combination had been listed individually herein. For example, "A and/or B" encompasses "A", "A and B", and "B". For example, "A, B and/or C" encompasses "A", "B", "C", "A and B", "A and C", "B and C", and "A and B and C".

The invention is explained in more detail below with reference to specific embodiments and the drawings. However, these should not be construed as limiting the invention.

Examples

This study takes a mouse liver cancer model as an example and identifies liver-cancer-specific neoantigens starting from next-generation whole exome sequencing of tumor tissue and peripheral blood and from next-generation transcriptome sequencing results.

Example 1. Accurate analysis of specific somatic mutations in tumor tissues:

1.1 summary of public databases and publicly published algorithms required for this example

TABLE 1

1.2 the specific method steps:

1) Raw sequencing data acquisition and description (raw data) for tumor tissue samples and peripheral blood control samples: whole exome sequencing is a genomic analysis method in which the DNA of the exonic regions of the whole genome is captured and enriched by sequence capture technology and then subjected to high-throughput sequencing. Because it is highly sensitive to both common and rare variants, sequencing only about 2% of the genome is enough to discover most disease-related variants in exonic regions. Whole exome sequencing is targeted, gives deep coverage and high data accuracy, and is simple, economical and efficient.

Tumor tissue samples and peripheral blood control samples were obtained and subjected to high-throughput exome sequencing on the Illumina platform. The raw image data files obtained were converted into sequenced reads (Sequenced Reads) by CASAVA base calling (Base Calling), and the results were stored in FASTQ (fq for short) file format and are called Raw Reads.

The FASTQ file contains the name of each read, the base sequence, and its corresponding sequencing quality information. In the FASTQ format file, each base corresponds to a base Quality character, and the sequencing Quality Score (Phred Quality Score) is obtained by subtracting 33 from the ASCII code value corresponding to each base Quality character. Different Phred Quality Score represents different base sequencing error rates, e.g., values of 20 and 30 for Phred Quality Score indicate a base sequencing error rate of 1.0% and 0.1%, respectively. The FASTQ format is exemplified as follows:

(1) the first line starts with "@", followed by Illumina sequencing tag Identifiers (Sequence Identifiers) and descriptors (optional section);

(2) the second line is the base sequence;

(3) the third line begins with "+", followed by the Illumina sequencing tag identifier (optional);

(4) the fourth line holds the sequencing quality values of the corresponding bases: subtracting 33 from the ASCII value of each character in this line gives the sequencing quality value of the corresponding base in the second line.
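The decoding rule described above (ASCII value minus 33, then the standard Phred relation between score and error rate) can be sketched in a few lines of Python. This is only an illustration: the function names and the three-base record used in the usage note are made up for the example and do not come from the sequencing pipeline itself.

```python
from typing import Dict, List


def phred_scores(quality_line: str, offset: int = 33) -> List[int]:
    """Decode a FASTQ quality string: ASCII value of each character minus the offset."""
    return [ord(ch) - offset for ch in quality_line]


def error_probability(q: int) -> float:
    """Phred relation: a score Q corresponds to an error probability of 10^(-Q/10)."""
    return 10 ** (-q / 10)


def parse_fastq_record(lines: List[str]) -> Dict[str, object]:
    """Split one four-line FASTQ record into name, sequence and Phred scores."""
    header, sequence, separator, quality = (line.rstrip("\n") for line in lines)
    if not header.startswith("@") or not separator.startswith("+"):
        raise ValueError("not a four-line FASTQ record")
    if len(sequence) != len(quality):
        raise ValueError("sequence and quality strings differ in length")
    return {"name": header[1:], "sequence": sequence, "phred": phred_scores(quality)}
```

For instance, the hypothetical record `["@read1", "ACG", "+", "5?I"]` decodes to Phred scores 20, 30 and 40; scores of 20 and 30 correspond to error rates of 1.0% and 0.1%, matching the figures given above.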

2) Quality control and filtering of raw sequencing data (clean data): the quality of the raw sequencing data was assessed with FastQC. The raw sequencing data were processed with Trim Galore using the following criteria: adapter sequences and low-quality bases with a Q value below 20 were removed from the 3' end, and reads shorter than 70 bp were discarded, yielding clean, high-quality reads for subsequent analysis (clean data).

3) Alignment of sequencing data to the reference genome (alignment): the high-quality, quality-controlled sequencing data were aligned to the reference genome using Bowtie2. The alignment results were sorted, and duplicate reads were marked and removed.

4) Analysis of somatic mutations in the tumor tissue sequencing data by comparison with the sequencing results of the peripheral blood control sample: somatic mutations in the tumor tissue were detected independently with each of the Strelka1, Strelka2, VarScan, Mutect2 (Sentieon) and MuSE algorithms, and the intersection of the 5 independent call sets was then taken, which greatly reduces the false positive rate of detection. In addition, the parameters of each algorithm were adjusted to raise the detection threshold, further reducing the false positive rate.

5) Integration and filtering of the 5 independent algorithm results (consensus and filtering): the intersection of the somatic mutation calls from the 5 independent analyses above was taken and filtered to obtain high-quality somatic mutations in the tumor tissue. The filtering criteria were as follows: (i) the sequencing depth of both the tumor tissue and the peripheral blood sample is ≥ 10; (ii) in the tumor sample data, the number of reads supporting the variant is ≥ 3 (de-duplicated data); (iii) in the tumor sample data, the allele frequency of the mutation is > 0.1; (iv) in the peripheral blood sample sequencing data, the allele frequency of the variant is < 0.01; (v) the frequency of the variant is < 0.01 in the exome sequencing data of peripheral blood from 100 normal human subjects established by the inventors.

6) The specific somatic mutations selected were verified by first generation sequencing (Sanger sequencing). The results show that the false positive rate of somatic mutations identified by the methods of the invention is reduced by a factor of 2-3 compared to prior art methods.
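The consensus-and-filter logic of steps 4) and 5) can be illustrated with a short Python sketch. It assumes each caller's output has already been reduced to a set of (chromosome, position, reference allele, alternate allele) tuples; the `Evidence` record, the function names and the strictness of the allele-frequency comparisons (taken here as > 0.1 and < 0.01) are assumptions made for the example, not details confirmed by the pipeline.

```python
from dataclasses import dataclass
from typing import Dict, Set, Tuple

# A variant is keyed by (chromosome, position, reference allele, alternate allele).
Variant = Tuple[str, int, str, str]


@dataclass
class Evidence:
    tumor_depth: int       # reads covering the site in the tumor sample
    normal_depth: int      # reads covering the site in the peripheral blood sample
    tumor_alt_reads: int   # de-duplicated tumor reads supporting the variant
    tumor_af: float        # variant allele frequency in the tumor sample
    normal_af: float       # variant allele frequency in the peripheral blood sample
    panel_af: float        # variant frequency in the panel of normal exomes


def consensus_calls(call_sets: Dict[str, Set[Variant]]) -> Set[Variant]:
    """Keep only variants reported by every caller (set intersection)."""
    if not call_sets:
        return set()
    return set.intersection(*call_sets.values())


def passes_filters(ev: Evidence) -> bool:
    """Apply criteria (i) to (v) as read from the text (thresholds are assumptions)."""
    return (
        ev.tumor_depth >= 10 and ev.normal_depth >= 10   # (i)
        and ev.tumor_alt_reads >= 3                      # (ii)
        and ev.tumor_af > 0.1                            # (iii)
        and ev.normal_af < 0.01                          # (iv)
        and ev.panel_af < 0.01                           # (v)
    )


def high_confidence_mutations(
    call_sets: Dict[str, Set[Variant]], evidence: Dict[Variant, Evidence]
) -> Set[Variant]:
    """Intersect the call sets, then keep variants whose evidence passes all filters."""
    return {
        v for v in consensus_calls(call_sets)
        if v in evidence and passes_filters(evidence[v])
    }
```

Taking the intersection first means a variant missed by any single caller is dropped before the depth and frequency filters are even evaluated, which is what trades sensitivity for a lower false positive rate.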

Example 2 screening of somatic mutation sites based on Gene expression level and prediction of mutated Gene function

For each somatic mutation detected in Example 1, gene-based and region-based annotation of the mutation site was performed based on the NCBI human genome annotation information database.

(1) Annotation information and priority order at the gene structure level: exonic > ncRNA > UTR5/UTR3 > intronic > upstream/downstream > intergenic > unknown.

(2) Annotation information and priority order affecting gene coding function: stopgain > stoploss > nonsynonymous SNV > synonymous SNV > unknown.

In the present invention, only nonsynonymous SNVs (level of effect on gene coding function) located in exonic regions (gene structure level) were selected.

At the same time, the expression levels of all protein-coding genes annotated in the NCBI human genome annotation information database were determined from the transcriptome sequencing data, on the basis of which:

(1) somatic mutations on genes with medium or high expression levels (RPKM ≥ 1) were further screened out;

(2) the expression levels of the HLA genes, CD4 and CD8 in the sample were evaluated.

In this way, somatic mutations that alter the protein coding sequence and are located on genes with medium or high expression levels can be further screened out, and the expression levels of the HLA genes, CD4 and CD8 are evaluated to judge whether the patient is currently suitable for tumor neoantigen immunotherapy.
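The two selection rules above (exonic nonsynonymous SNV, and RPKM ≥ 1 for the host gene) amount to a simple filter over annotated mutations. The following Python sketch shows one way to express it; the `AnnotatedMutation` record and the function name are hypothetical, and the annotation strings are assumed to follow the labels quoted in the text.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class AnnotatedMutation:
    gene: str     # gene symbol carrying the mutation
    region: str   # gene-structure annotation, e.g. "exonic" or "intronic"
    effect: str   # coding-function annotation, e.g. "nonsynonymous SNV"


def select_expressed_coding_mutations(
    mutations: List[AnnotatedMutation],
    rpkm_by_gene: Dict[str, float],
    min_rpkm: float = 1.0,
) -> List[AnnotatedMutation]:
    """Keep exonic nonsynonymous SNVs located on genes with RPKM >= min_rpkm.

    Genes absent from the expression table are treated as not expressed
    (an assumption of this sketch).
    """
    return [
        m for m in mutations
        if m.region == "exonic"
        and m.effect == "nonsynonymous SNV"
        and rpkm_by_gene.get(m.gene, 0.0) >= min_rpkm
    ]
```

Keeping the expression threshold as a parameter makes it easy to see how many candidate mutations are lost or gained if a cut-off other than 1 is considered.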

2.1 summary of public databases and published algorithms required for this example

TABLE 2

2.2 the specific method steps:

1) Raw sequencing data acquisition and description for the tumor tissue sample transcriptome (raw data): tumor tissue samples were obtained, mRNA was captured via its characteristic poly(A) sequence, and next-generation sequencing was performed. The raw image data files obtained by high-throughput sequencing (Illumina) were converted into sequenced reads (Sequenced Reads) by CASAVA base calling (Base Calling), and the results were stored in FASTQ (fq for short) file format and are called Raw Reads.

3) Alignment of sequencing data to the reference genome (alignment): the high-quality, quality-controlled sequencing data were aligned to the reference genome using TopHat2, and the alignment results were sorted.

4) Analysis of gene expression level (gene expression information): the expression level of each gene was evaluated by calculating the RPKM value.

5) Functional annotation analysis of somatic mutations (mutation annotation interpretation): for each somatic mutation, gene-based and region-based annotation of the mutation site was performed based on the NCBI genome annotation information database. Only nonsynonymous SNVs (level of effect on gene coding function) located in exonic regions (gene structure level) were selected.
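Step 4) above relies on the RPKM value (reads per kilobase of transcript per million mapped reads). As a reminder of what that number measures, here is a minimal Python sketch of the standard RPKM definition; the function name and the example read counts are illustrative only and are not taken from the study data.

```python
def rpkm(read_count: int, gene_length_bp: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of transcript per million mapped reads.

    RPKM = reads on the gene * 10^9 / (gene length in bp * total mapped reads),
    i.e. the read count normalized for both gene length and sequencing depth.
    """
    if gene_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("gene length and total mapped reads must be positive")
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)
```

For example, 200 reads on a 2,000 bp gene in a library of 10 million mapped reads gives an RPKM of 10, well above the RPKM ≥ 1 threshold used for screening.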

The results are shown in the following table:

Somatic mutations in the mouse models were further screened based on gene expression levels and annotation information. The number of somatic mutations retained after this further screening is shown in bold for each model.

TABLE 3

Example 3 HLA-I/II typing of test samples based on peripheral blood exon sequencing data

3.1 summary of public databases and published algorithms required for this example

TABLE 4

3.2 the specific method steps:

1) raw sequencing data acquisition and quality control and filtration: the same as in example 1.

2) Based on the information in 5 different HLA genotype databases, the sequencing data were stringently aligned to the annotated HLA gene regions and HLA typing was performed. The HLA type was determined from the combined analysis results of the 5 different databases.

The results are shown in the table below. The method of the present invention achieves typing of more than 6 (2 x 3) HLA loci in 8 individuals, with an accuracy greater than 98% compared with the gold standard PCR-SBT technique.
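The text does not state how the five typing results are reconciled into one call. One simple possibility, offered here purely as an assumption and not as the method actually used, is a per-locus majority vote: a genotype is accepted only if at least 3 of the 5 tools report the same unordered allele pair. The function names and allele strings below are illustrative.

```python
from collections import Counter
from typing import Dict, Optional, Tuple

Genotype = Tuple[str, str]


def consensus_genotype(
    calls_by_tool: Dict[str, Optional[Genotype]], min_support: int = 3
) -> Optional[Genotype]:
    """Majority vote over one locus; returns None when no genotype has enough support.

    Each call is an (allele1, allele2) pair, or None if the tool made no call.
    Allele order is ignored by sorting each pair before counting.
    """
    counts = Counter(tuple(sorted(call)) for call in calls_by_tool.values() if call)
    if not counts:
        return None
    genotype, support = counts.most_common(1)[0]
    return genotype if support >= min_support else None


def consensus_typing(
    calls: Dict[str, Dict[str, Optional[Genotype]]], min_support: int = 3
) -> Dict[str, Optional[Genotype]]:
    """Apply the vote to every locus; `calls` maps locus -> tool -> genotype."""
    return {locus: consensus_genotype(by_tool, min_support) for locus, by_tool in calls.items()}
```

With 5 tools and a support threshold of 3, at most one genotype per locus can win, so ties never need to be broken; loci without a majority are left uncalled rather than guessed.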

TABLE 5

The two columns shown in bold are the typing results for PCR-SBT.

Example 4 screening of personalized tumor neoepitope Using optimized computational model platform

In this example, based on the analysis results of the first 3 examples, tumor neoantigens were predicted for the specific HLA type from somatic mutations that alter the protein coding sequence and are located in genes with medium or high expression levels. This example adopts a strategy of multi-angle analysis and integrated prediction. Although this strategy filters out some true positives, the retained neoantigens are more accurate and the false positive rate is low. Tumor-specific neoantigens are predicted independently in terms of binding affinity, binding stability, proteasomal cleavage and mass spectrometry data, and the results of these independent analyses are then integrated to select neoantigens that score well from several angles. Finally, the predicted neoantigens are ranked by a geometric mean method.

4.1 summary of public databases and publicly published algorithms required for this example

TABLE 6

4.2 the specific method steps:

1) Based on the somatic mutation sites analyzed in examples 1-3, amino acid sequences were extracted by extending 7-13 aa upstream and downstream of each missense mutation site in the protein coding region, with the mutation site at the center.

2) In combination with the predicted HLA types, tumor-specific neoantigens were predicted independently in terms of binding affinity, binding stability, proteasomal cleavage and mass spectrometry data using NetMHCcons, NetMHC, NetMHCpan, PickPocket, MHCflurry, netMHCstab and NetChop, and were ranked from most to least likely by each method.

3) Finally, the predicted neoantigens were given an overall ranking by a geometric mean method applied to the rankings from the different methods.
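The final ranking in step 3) can be illustrated as follows. The sketch assumes each method has already assigned every peptide a rank (1 = most likely) and combines them by the geometric mean of those ranks; whether the actual pipeline averages ranks or raw scores is not specified, so this choice, like the function names, is an assumption.

```python
import math
from typing import Dict, List, Tuple


def geometric_mean(values: List[float]) -> float:
    """n-th root of the product of n positive values, computed via logarithms."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("geometric mean needs at least one value, all positive")
    return math.exp(sum(math.log(v) for v in values) / len(values))


def combined_ranking(ranks_by_method: Dict[str, Dict[str, int]]) -> List[Tuple[str, float]]:
    """Order peptides by the geometric mean of their per-method ranks (lower is better).

    Only peptides ranked by every method are kept; ties are broken by peptide name.
    """
    if not ranks_by_method:
        return []
    peptides = set.intersection(*(set(ranks) for ranks in ranks_by_method.values()))
    scores = {
        p: geometric_mean([ranks[p] for ranks in ranks_by_method.values()])
        for p in peptides
    }
    return sorted(scores.items(), key=lambda item: (item[1], item[0]))
```

Compared with an arithmetic mean, the geometric mean rewards peptides that rank consistently well across all methods and is less dominated by one very poor rank, which fits the stated aim of keeping neoantigens that perform well from several angles.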

Example 5 validation of the method and Effect of neoantigen identification based on H22 mouse tumor model

Screening of tumor neoantigen

Screening for tumor somatic mutations by Whole Exon Sequencing (WES)

First, mouse hepatocellular carcinoma (HCC) H22 cells were purchased from the Fudan cell bank (FDCC). Genomic DNA was extracted from cells cultured to the 10th passage. Balb/C mice (6-8 weeks old), the strain from which H22 cells were derived, were purchased from Beijing Vital River, Inc., and genomic DNA was extracted from mouse tail tissue. Subsequently, the above genomic DNA samples were subjected to 200× WES sequencing by Shanghaineo and provenance. The raw sequencing data were analyzed bioinformatically as described in the above examples, i.e., using the Balb/C gene sequence as the wild-type control, the somatic mutations and mutant allele frequencies of H22 cells were calculated, and the MHC class I molecules of Balb/C mice and H22 cells were typed. The results show that H22 cells have 108 genes with amino acid mutations, i.e., H22 cells contain 108 candidate neoantigens, and that the MHC class I molecules of both H22 cells and Balb/C mice are of the H2-Kd type.

RNA sequencing (RNA-seq) to detect Gene expression levels

H22 cells cultured to the 9th and 10th passages were subjected to RNA-seq to measure the mRNA expression level of each gene, and the average of the two passages was taken to represent the protein expression level. The mRNA expression level is expressed as the RPKM value; the larger the value, the higher the expression level.

MHC class I molecule affinity prediction for neoantigenic peptides

The immunogenicity of a peptide depends on the ability of MHC class I/II molecules to present the peptide (characterized by the MHC affinity of the peptide and the stability of the MHC-peptide complex) and on the ability of the TCR to recognize the MHC-peptide complex. Peptides presented by MHC I are recognized by CD8 T cells, so predicting the MHC I affinity of a peptide helps to predict its ability to activate a CD8 T cell immune response.

Some representative results of the above three steps are shown in the following table.

TABLE 7

Selection of candidate pool of neoantigens

Tumor cells usually contain multiple neoantigens, e.g., 108 in H22, and the gene expression level is the first factor in determining whether a neoantigen is a suitable vaccine target. According to the data fed back by RNA-seq sequencing, a candidate neoantigen library of H22 cells is screened by taking RPKM (RPKM is more than or equal to 1) as a standard, and 23 candidate neoantigens are screened in total (figure 2).

Second, design of the neoantigen vaccine

In general, several new CD8 T cell epitopes containing the mutation site may exist around an amino acid mutation site, usually 8-13 amino acids in length. Any of these epitopes may be a target that CD8 T cells can successfully attack. To cover these epitopes as fully as possible, the immunogen corresponding to a single mutation site was designed according to the following principle: based on the protein amino acid sequence, a 25-amino-acid long peptide centered on the mutation site and extended by 12 amino acids on each side is used as the immunogen. For the 23 neoantigens screened in the H22 model, the long peptide sequences were determined individually and then submitted to GL Biochem (Shanghai) Ltd. for synthesis; in the end, 17 of the 23 long peptides were successfully synthesized. For treatment, the selected neoantigen long peptides were combined to obtain the long peptide vaccine. The long peptide sequences and their synthesis are shown below.
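The design rule above can be checked with a small Python sketch: a 12-residue flank on each side is exactly what is needed for the 25-mer to contain every 8 to 13 residue window that includes the mutation site. The function names and the 1-based position convention are assumptions of the example; truncation near the protein termini is handled by simply shortening the peptide.

```python
from typing import List


def long_peptide(protein: str, position: int, mutant_residue: str, flank: int = 12) -> str:
    """Return the mutated peptide centered on a 1-based position with `flank` residues per side.

    Near the protein termini the peptide is shorter than 2 * flank + 1.
    """
    if not 1 <= position <= len(protein):
        raise ValueError("position lies outside the protein")
    index = position - 1
    mutated = protein[:index] + mutant_residue + protein[index + 1:]
    return mutated[max(0, index - flank): index + flank + 1]


def epitopes_covering(peptide: str, site: int, min_len: int = 8, max_len: int = 13) -> List[str]:
    """List every window of min_len..max_len residues that contains the 0-based `site`."""
    found = []
    for length in range(min_len, max_len + 1):
        first_start = max(0, site - length + 1)
        last_start = min(site, len(peptide) - length)
        for start in range(first_start, last_start + 1):
            found.append(peptide[start:start + length])
    return found
```

For a full 25-mer with the mutation at the central position (index 12), `epitopes_covering` returns 8 + 9 + 10 + 11 + 12 + 13 = 63 windows, so a single long peptide stands in for all candidate short epitopes at that site.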

TABLE 8

	Immunogen long peptide sequence		Immunogen long peptide sequence
Gene	25 amino acid peptides	Gene	25 amino acid peptides
Abcc4	ETLDLSWYLGIYTGLTAVTVLFGIA	Lcp1	VNIGAEDLKEGKLYLVLGLLWQVIK
Agap3	NKEWKKKYVTLCGNGLLTYHPSLHD	Polrmt	QEFVWEASHYLVCQVFKSLQEMFTS
Bard1	CSRCANILKEPVYLGGCEHIFCSGC	Rnf121	QLLDWLRYLVAWKPVIIGLVQGISY
Cep192	VLESLDSAYHQRTHLESELSQLACS	Sdcbp	KVDKVIQAQTAYFANPASQAFVLVD
Dhodh	DLSTQTIREMYARTQGTIPIIGVGG	Sec23ip	YLFALQSHLCYWESEDTALLLLKEI
Dhx37	YQEIVETTKMYMNGVSTVEIQWIPS	Sestd1	EEIESQHSEWFALYVELNQQIAALL
Endog	ELRSYVMPNAPVNETIPLERFLVPI	Slc25a37	RLQMYNSQHQSALSCIRTVWRTEGL
Eya3	HILSVPVSETTYSGQTQYQTLQQSQ	Snd1	LEEKERSASYKPMFVTEITDDLHFY
Fbxo4	QLGSTDHYWNKTVRDPILWRYFLLR	Srr	EDEIKYATQLVWERMKLLIEPTAGV
Hipk1	AIKILKNHPSYASQGQIEVSILSRL	Tiam1	FRFRCYLASLQGWELPNPKRLLAFA
Khnyn	VDFILQREPYCRYINQLSEALLSLN	Vps33a	AAHLSYGRVNLNALREAVRRELREF
Kpnb1	HTSKFYAKGALQCLVPILTQTLTKQ

Third, evaluation of animal pharmacodynamics

The specific pharmacodynamic protocol is shown in figure 3.

Disease model establishment

To establish the H22 tumor-bearing mouse model, 36 SPF-grade Balb/C mice (6-8 weeks old, female) were purchased from Beijing Vital River, Inc. and housed in the SPF-grade animal facility of the institute of medicine, Zhangjiang campus, Fudan University. Once the mice had reached a normal condition, within one week, tumor cell inoculation was started.

On the day of inoculation, a suspension of H22 cells cultured in vitro was collected; after centrifugation the cell pellet was washed twice with sterile PBS, finally resuspended in sterile PBS to 2 × 10⁷ cells/ml, and stored on ice. For subcutaneous H22 tumor inoculation, 0.1 ml of cell suspension (about 2 × 10⁶ cells) was injected subcutaneously into the right flank with a syringe. After inoculation, subcutaneous tumor growth was observed daily; obvious nodules had formed by the third day, and by the fifth day red, swollen nodules of about 5 mm × 7 mm had formed, indicating that the subcutaneous tumor model was successfully established.

Grouping of tumor-bearing mice and treatment results

Five days after tumor inoculation, the length and width of the tumor nodules were measured with a vernier caliper and the tumor volume was calculated for each mouse. Subsequently, the 36 mice were divided into 6 groups according to tumor volume, and the grouping results showed that tumor volume was relatively uniform across the groups (fig. 4).

The first dose was given 6 days after tumor inoculation, and further doses were given on days 9, 13 and 16, for 4 doses in total. The three single agents (SLPs vaccine, anti-PD1 and poly I:C) were prepared each time. To prepare the SLPs vaccine, 2 mg of SLP dry powder was dissolved in 0.1 ml DMSO (SIGMA), and 0.3 ml of 1640 culture medium (GIBCO, without serum or antibiotics) was added to give a 5 mg/ml stock solution, which was aliquoted and stored in a -80 °C freezer. Subsequently, all 17 SLPs were combined at 20 μg/SLP/dose, the adjuvant poly I:C (Sigma, 5 mg/ml) was added at 50 μg/dose, and the volume was finally made up to 0.2 ml/dose with sterile PBS. Anti-PD1 (BE0146) was purchased from BIOXCELL, USA, and the anti-PD1 stock solution was diluted to 1 mg/ml with the pH 7.0 buffer (IP0070) recommended by the same company. To prepare poly I:C alone, the solvent was prepared as in the SLP preparation method, and poly I:C and PBS were then added in the same amounts as in the SLPs vaccine. For administration, SLPs and poly I:C were given by subcutaneous injection (s.c.) on the flank opposite the tumor, and anti-PD1 was given by intraperitoneal injection (i.p.).
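The vaccine formulation described above reduces to a few lines of arithmetic, worked through in the Python sketch below. All quantities are the ones stated in the text; the function name and the returned field names are made up for the example, and the calculation assumes the 5 mg/ml peptide stocks are used undiluted.

```python
from typing import Dict


def vaccine_dose_volumes_ul(
    n_peptides: int = 17,
    peptide_ug_per_dose: float = 20.0,
    peptide_stock_mg_per_ml: float = 5.0,
    adjuvant_ug_per_dose: float = 50.0,
    adjuvant_stock_mg_per_ml: float = 5.0,
    final_volume_ml: float = 0.2,
) -> Dict[str, float]:
    """Volumes (in microlitres) needed for one dose of the SLPs + poly I:C mixture.

    1 mg/ml equals 1 ug/ul, so dividing micrograms by mg/ml gives microlitres.
    """
    per_peptide_ul = peptide_ug_per_dose / peptide_stock_mg_per_ml
    peptides_ul = n_peptides * per_peptide_ul
    adjuvant_ul = adjuvant_ug_per_dose / adjuvant_stock_mg_per_ml
    pbs_ul = final_volume_ml * 1000 - peptides_ul - adjuvant_ul
    if pbs_ul < 0:
        raise ValueError("components exceed the final dose volume")
    return {
        "per_peptide_ul": per_peptide_ul,
        "peptides_total_ul": peptides_ul,
        "adjuvant_ul": adjuvant_ul,
        "pbs_ul": pbs_ul,
        "total_peptide_ug": n_peptides * peptide_ug_per_dose,
    }
```

With the stated values, each dose takes 4 μl of each of the 17 peptide stocks (68 μl, 340 μg peptide in total), 10 μl of poly I:C, and 122 μl of PBS to reach 0.2 ml. The stock concentration itself follows from 2 mg dissolved in 0.1 ml + 0.3 ml = 0.4 ml, i.e. 5 mg/ml.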

Tumor size was measured every 3 days, and data were collected at 8 time points in total to plot tumor growth curves for individual mice (fig. 5). After the measurement at the last time point was completed, the mice were sacrificed and photographs of the tumor-bearing mice were taken (fig. 6). Samples were then collected.

IFN-γ ELISPOT results for splenocytes of tumor-bearing mice after treatment

After the mice were sacrificed, the spleen of each mouse was collected and a single-cell splenocyte suspension was prepared. All 6 mice of the SLPs + anti-PD1 group and 4 mice from each of the remaining 5 groups (the 2 with the largest and the 2 with the smallest tumors) were selected, 26 mice in total, and the neoantigen-specific T cell immune response in their spleens was detected by IFN-γ ELISPOT. The SLPs vaccine contains 17 neoantigens, whose SLP numbering is given in Table 9. To detect immune responses against the SLPs, 4 overlapping assay peptides (ASPs) were designed and synthesized for each of the 17 SLPs (FIG. 7). Prior to the ELISPOT assay, splenocytes were stimulated with ASPs in 96-well plates for an extended period to generate more antigen-specific T cells, as follows: the 4 overlapping ASPs of one SLP were mixed in equal proportions to an ASP concentration of 4 μg/ml; 50 μl of ASPs was mixed with an equal volume of 5 × 10⁵ splenocytes; the cytokine IL-2 was added to a final concentration of 20 U/ml; half the volume was replaced every 3 days with fresh medium containing 2× ASPs and 2× IL-2; and ELISPOT detection was carried out after 11 days of stimulation. For detection, 50 μl of the stimulated culture was mixed with 50 μl of 2× ASPs and incubated overnight in a 96-well plate (BD, 51-2447KC) coated with anti-IFN-γ (BD, 51-2525KC), and IFN-γ secretion was then detected with an anti-IFN-γ detection antibody (BD, 51-1818KZ) according to the kit instructions (BD, 551083). The IFN-γ ELISPOT results show that, overall, the number of IFN-γ-secreting splenocytes in the SLPs + anti-PD1 group was clearly higher than in the other groups, most notably for SLP1/2/6/7/15/16/17; comparing poly I:C + anti-PD1 with SLPs + anti-PD1, the SLPs + anti-PD1 group produced more IFN-γ-secreting splenocytes for all SLPs except SLP5 (FIG. 8).

TABLE 9. Numbering of the H22 neoantigen SLPs and the corresponding genes

SLP numbering	SLP1	SLP2	SLP3	SLP4	SLP5	SLP6
Gene	Bard1	Cep192	Dhodh	Endog	Eya3	Fbxo4
SLP numbering	SLP7	SLP8	SLP9	SLP10	SLP11	SLP12
Gene	Hipk1	Kpnb1	Lcp1	Rnf121	Sdcbp	Sestd1
SLP numbering	SLP13	SLP14	SLP15	SLP16	SLP17
Gene	Slc25a37	Snd1	Srr	Tiam1	Vps33a

Discussion of results

The above results show that: 1. SLPs alone slightly inhibited tumor growth in the early and middle stages (days 7-14); 2. anti-PD1 alone significantly inhibited tumor growth, with tumors disappearing in 2/6 mice; 3. poly I:C, alone or in combination, did not affect tumor growth; 4. SLPs + anti-PD1 treatment showed very strong inhibition in the early and middle stages of tumor growth, which persisted until the tumors disappeared; in the end the tumor of only 1 mouse escaped and grew, but it was still significantly inhibited; 5. treatment with SLPs + anti-PD1 resulted in more IFN-γ-positive splenocytes directed against the neoantigen SLPs, suggesting that SLPs + anti-PD1 treatment achieved tumor clearance by inducing antigen-specific immune responses. These results indicate that the method for identifying neoantigens of the present invention can effectively compute and screen out immunogenic neoantigens.

Claims

A method of identifying a tumor neoantigen in a subject, the method comprising the steps of:

(a) analyzing whole exome sequencing results of tumor tissue or cells and of normal tissue or cells of the subject to identify tumor-specific somatic mutations;

(b) analyzing the subject tumor tissue or cells for transcriptome sequencing results and further screening for somatic mutations identified in step (a);

(c) analyzing the whole exome sequencing results of the normal tissue or cells of the subject, and performing HLA typing on the subject;

(d) analyzing the binding of the mutant peptide corresponding to the somatic mutation to MHC based on the results of steps (b) and (c), thereby screening candidate tumor-specific neoantigens.
The method of claim 1, wherein step (a) separately identifies somatic mutations from the whole exome sequencing results by at least 3 different methods, and selects for somatic mutations that were all identified in the at least 3 different methods, for example the at least 3 different methods are selected from the group consisting of Strelka1, Strelka2, VarScan, Mutect2, and MuSE.
The method of claim 2, wherein step (a) identifies the somatic mutation using at least 5 different methods, e.g., the at least 5 different methods include Strelka1, Strelka2, VarScan, Mutect2, and MuSE.
The method of any one of claims 1-3, wherein step (a) further screens for somatic mutations meeting the following criteria:

1) the sequencing depth of the tumor tissue or cell and the normal tissue or cell is greater than or equal to 10;

2) in the sequencing data of the tumor tissue or cell, the number of reads comprising the mutation is greater than or equal to 3;

3) the allele frequency of the mutation is greater than 0.1 in the sequencing data of the tumor tissue or cell;

4) in the sequencing data of the normal tissue or cell, the mutant allele frequency is less than or equal to 0.01; and

5) the mutation has an allele frequency of less than 0.01 in whole exome sequencing results of normal tissues or cells from at least 100 normal subjects.
The method of any one of claims 1-4, wherein step (b) comprises selecting a somatic mutation based on the level of gene expression.
The method of claim 5, wherein somatic mutations located within highly expressed genes are selected, preferably the highly expressed genes have an RPKM of greater than or equal to 1.
The method of any one of claims 1 to 6, wherein step (b) comprises selecting said somatic mutations at the gene structure level and at the level of effect on gene coding function, preferably selecting a somatic mutation whose gene structure annotation is exonic and whose coding function annotation is nonsynonymous SNV.
The method of any one of claims 1-7, wherein step (b) further comprises assessing the expression level of an HLA gene, a CD4 gene and/or a CD8 gene in the subject.
The method of any one of claims 1-8, wherein in step (c) at least the following databases are used for HLA typing: ATHLATES, HLA-HD, HLAVBseq, seq2HLA and HLAminer.
The method of any one of claims 1-9, wherein step (d) comprises:

d1) extracting an amino acid sequence corresponding to the somatic mutation, for example, an amino acid sequence of about 8 to 35 amino acids, preferably about 15 to 27 amino acids, corresponding to the somatic mutation, thereby obtaining a mutant peptide corresponding to the somatic mutation;

d2) based on the HLA typing results of step (c), scoring and ranking the extracted mutant peptides independently by MHC binding affinity, MHC binding stability, proteasome digestion, mass spectrometry data, respectively; and

d3) based on the results of step d2), candidate tumor neoantigens are selected by scoring and ranking the mutant peptides by geometric mean.
The method of claim 10, wherein the extracted mutant peptides are scored and ranked in step (d2) using one or more selected from the group consisting of NetMHCcons, NetMHC, NetMHCpan, PickPocket, MHCflurry, netMHCstab and NetChop.
A device for identifying a tumor neoantigen in a subject, the device comprising: a memory for storing a program; a processor for implementing the method of any one of claims 1 to 11 by executing a program stored in the memory.
A computer readable storage medium comprising a program executable by a processor to implement the method of any one of claims 1-11.
A device for identifying tumor neoantigens in a subject, the device comprising the following four modules: a somatic mutation identification module I) for identifying tumor-specific somatic mutations based on the results of whole exome sequencing of the tumor tissue or cells and normal tissue or cells of the subject; a tumor-specific somatic mutation screening module II) that further screens for tumor-specific somatic mutations based on the transcriptome sequencing results of the subject tumor tissue or cells; an HLA typing module III) for HLA typing based on the whole exome sequencing results of the normal tissue or cells of the subject; and a tumor neoantigen prediction module IV).
The apparatus of claim 14, wherein the somatic mutation identification module I) identifies the somatic mutations from the whole exome sequencing results by at least 3 different methods, respectively independently, and selects for the somatic mutations that were all identified in the at least 3 different methods, e.g. the at least 3 different methods are selected from the group consisting of Strelka1, Strelka2, VarScan, Mutect2 and MuSE.
The device of claim 15, wherein the somatic mutation identification module I) identifies the somatic mutations using at least 5 different methods, e.g., the at least 5 different methods include Strelka1, Strelka2, VarScan, Mutect2, and MuSE.
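By way of non-limiting illustration, selecting the somatic mutations identified by all of the callers can be expressed as a set intersection. The sketch assumes each caller's output has already been parsed into (chromosome, position, reference allele, alternate allele) tuples; the function name and the example calls are hypothetical.

```python
def consensus_mutations(calls_by_caller, min_callers=3):
    """Return the mutations reported by every caller.

    calls_by_caller maps a caller name to an iterable of
    (chrom, pos, ref, alt) tuples. At least min_callers callers
    must be supplied, mirroring the "at least 3 different methods" wording.
    """
    if len(calls_by_caller) < min_callers:
        raise ValueError(f"need at least {min_callers} callers")
    call_sets = [set(calls) for calls in calls_by_caller.values()]
    return set.intersection(*call_sets)


# Hypothetical parsed calls from three callers.
example_calls = {
    "Strelka2": [("chr1", 100, "A", "T"), ("chr2", 200, "G", "C")],
    "Mutect2": [("chr1", 100, "A", "T"), ("chr3", 300, "C", "A")],
    "MuSE": [("chr1", 100, "A", "T"), ("chr2", 200, "G", "C")],
}
shared = consensus_mutations(example_calls)
```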
The device of any one of claims 14-16, wherein the somatic mutation identification module I) further screens for somatic mutations meeting the following criteria:

1) the sequencing depth of both the tumor tissue or cell and the normal tissue or cell is greater than or equal to 10;

2) in the sequencing data of the tumor tissue or cell, the number of reads comprising the mutation is greater than or equal to 3;

3) in the sequencing data of the tumor tissue or cell, the allele frequency of the mutation is greater than 0.1;

4) in the sequencing data of the normal tissue or cell, the allele frequency of the mutation is less than or equal to 0.01; and

5) the allele frequency of the mutation is less than 0.01 in whole exome next-generation sequencing results of normal tissues or cells from at least 100, at least 200, at least 300 or more, for example 200-300, normal subjects.
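By way of non-limiting illustration, criteria 1) to 5) can be applied as a single predicate per candidate mutation. The sketch assumes the allele frequency is computed as mutation-supporting reads divided by sequencing depth at the site; the record type and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SomaticCall:
    tumor_depth: int       # total reads covering the site in the tumor sample
    normal_depth: int      # total reads covering the site in the normal sample
    tumor_alt_reads: int   # tumor reads carrying the mutation
    normal_alt_reads: int  # normal reads carrying the mutation
    panel_af: float        # allele frequency in the panel of normal exomes


def passes_filters(call):
    """Apply criteria 1) to 5) to one candidate somatic mutation."""
    # 1) depth >= 10 in both tumor and normal
    if call.tumor_depth < 10 or call.normal_depth < 10:
        return False
    # 2) at least 3 tumor reads carry the mutation
    if call.tumor_alt_reads < 3:
        return False
    tumor_af = call.tumor_alt_reads / call.tumor_depth
    normal_af = call.normal_alt_reads / call.normal_depth
    # 3) tumor AF > 0.1, 4) normal AF <= 0.01, 5) panel AF < 0.01
    return tumor_af > 0.1 and normal_af <= 0.01 and call.panel_af < 0.01
```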
The device of any one of claims 14-17, wherein the tumor specific somatic mutation screening module II) selects a somatic mutation based on gene expression level.
The apparatus of claim 18, wherein somatic mutations within highly expressed genes are selected, e.g., the highly expressed genes have an RPKM of greater than or equal to 1.
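By way of non-limiting illustration, RPKM (reads per kilobase of transcript per million mapped reads) and the threshold of 1 can be computed as below. The formula is the standard RPKM definition; the function names and the example counts are hypothetical.

```python
def rpkm(gene_reads, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    if gene_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("gene length and total mapped reads must be positive")
    return gene_reads * 1e9 / (gene_length_bp * total_mapped_reads)


def is_highly_expressed(gene_reads, gene_length_bp, total_mapped_reads, threshold=1.0):
    """True when the gene's RPKM is greater than or equal to the threshold."""
    return rpkm(gene_reads, gene_length_bp, total_mapped_reads) >= threshold


# Hypothetical example: 100 reads on a 2 kb gene in a library of 50 million mapped reads.
example_rpkm = rpkm(100, 2000, 50_000_000)
```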
The device of any one of claims 14-19, wherein the tumor-specific somatic mutation screening module II) selects the somatic mutations at the gene structure level and at the level of effect on gene coding function, e.g., selects somatic mutations annotated as exonic at the gene structure level and annotated as nonsynonymous SNV at the level of effect on gene coding function.
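By way of non-limiting illustration, the annotation-based selection can be sketched as a filter over annotated records. The sketch assumes each mutation is a dictionary with ANNOVAR-style fields `Func.refGene` and `ExonicFunc.refGene`; the claims do not name the annotation tool, so these field names and the example records are assumptions.

```python
def select_coding_changes(annotated_mutations,
                          struct_field="Func.refGene",
                          effect_field="ExonicFunc.refGene"):
    """Keep mutations annotated as exonic and as nonsynonymous SNV."""
    return [
        m for m in annotated_mutations
        if m.get(struct_field) == "exonic"
        and m.get(effect_field) == "nonsynonymous SNV"
    ]


# Hypothetical annotated records.
example_annotated = [
    {"id": "m1", "Func.refGene": "exonic", "ExonicFunc.refGene": "nonsynonymous SNV"},
    {"id": "m2", "Func.refGene": "intronic", "ExonicFunc.refGene": "."},
    {"id": "m3", "Func.refGene": "exonic", "ExonicFunc.refGene": "synonymous SNV"},
]
kept_ids = [m["id"] for m in select_coding_changes(example_annotated)]
```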
The device of any one of claims 14-20, wherein the tumor-specific somatic mutation screening module II) further assesses the expression level of the HLA gene, CD4 gene and/or CD8 gene in said subject.
The device of any one of claims 14-21, wherein the HLA typing module III) performs HLA typing using at least the following tools: ATHLATES, HLA-HD, HLAVBseq, seq2HLA and HLAminer.
The device of any one of claims 14-22, wherein the tumor neoantigen prediction module IV):

extracts an amino acid sequence corresponding to the somatic mutation, for example, an amino acid sequence of about 8 to 35 amino acids, preferably about 15 to 27 amino acids, corresponding to the somatic mutation, thereby obtaining a mutant peptide corresponding to the somatic mutation;

based on the HLA typing results, scores and ranks the extracted mutant peptides independently by each of MHC binding affinity, MHC binding stability, proteasome cleavage, and mass spectrometry data; and

comprehensively scores and ranks the mutant peptides by geometric mean, thereby selecting candidate tumor neoantigens.
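By way of non-limiting illustration, extracting the mutant peptide for a single amino acid substitution might look like the sketch below. It assumes the mutated residue is centered with equal flanks, so that a flank of 13 residues yields a 27-residue peptide away from the protein ends; the claims specify only the length range, and the function name and the example sequence are hypothetical.

```python
def mutant_peptide(protein, pos, alt_aa, flank=13):
    """Return the mutated residue with up to `flank` residues on each side.

    protein: wild-type amino acid sequence (one-letter codes)
    pos:     0-based index of the substituted residue
    alt_aa:  the mutant amino acid
    The window is truncated when the mutation lies near either end.
    """
    if not 0 <= pos < len(protein):
        raise ValueError("pos is outside the protein sequence")
    mutated = protein[:pos] + alt_aa + protein[pos + 1:]
    start = max(0, pos - flank)
    end = min(len(mutated), pos + flank + 1)
    return mutated[start:end]


# Hypothetical 20-residue sequence with an M-to-A substitution at index 10.
example_peptide = mutant_peptide("ACDEFGHIKLMNPQRSTVWY", 10, "A", flank=3)
```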
The device of claim 23, wherein the extracted mutant peptides are scored and ranked using NetMHCcons, NetMHC, NetMHCpan, PickPocket, MHCflurry, NetMHCstab, and NetChop, respectively.
A method of treating cancer in a subject, the method comprising:

a) identifying at least one tumor neoantigen of the subject by the method of any one of claims 1-11;

b) producing the at least one tumor neoantigen identified in step a); and

c) administering to said subject said at least one tumor neoantigen produced in step b).
The method of claim 25, wherein the method further comprises administering to the subject an immune checkpoint inhibitor.